UK expertise informs Google's new dataset search
6 September 2018
Experts from UK Research & Innovation have contributed to a new search tool launched today by Google that aims to help scientists, policymakers and other user groups more easily find the data required for their work and their stories, or simply to satisfy their intellectual curiosity.
In today's world, scientists in many disciplines, and a growing number of journalists, live and breathe data. There are many thousands of data repositories on the web, providing access to millions of datasets, and local and national governments around the world publish their data as well. As part of the UK Research & Innovation (UKRI) commitment to easy access to data, their experts worked with Google to help develop Dataset Search, launched today.
Similar to how Google Scholar works, Dataset Search lets users find datasets wherever they're hosted, whether it's a publisher's site, a digital library or an author's personal web page.
Google approached UKRI's Natural Environment Research Council and Science & Technology Facilities Council (STFC) to help ensure their world-leading environmental datasets were included. The heritage in these organisations, managing huge complex datasets on the atmosphere, oceans, climate change and even data about the solar system, led Dr Sarah Callaghan, Data & Programme Manager at the UKRI's national space laboratory, STFC Rutherford Appleton Laboratory (RAL) Space, to work with Google on the project.
Dr Sarah Callaghan said:
In RAL Space, we manage, archive and distribute thousands of terabytes of data to make it available to scientific researchers and other interested parties. My experience making datasets findable, usable and interoperable enabled me to advise Google on their Dataset Search and how to best display their search results. I was able to draw on my work with NERC and STFC datasets, not only in just archiving and managing data for the long term and the scientific record, but also helping users to understand if a dataset is the right one for their purposes.
To create Dataset Search, Google developed guidelines for dataset providers to describe their data in a way that helps search engines better understand the content of their pages. These guidelines cover the salient information about a dataset: who created it, when it was published, how the data was collected and what the terms are for using it. This enables search engines to collect and link this information, work out where different versions of the same dataset might be held, and find publications that describe or discuss the dataset. The approach is based on an open standard for describing this information. Many STFC and NERC datasets for environmental data are already described in this way and are particularly good examples of findable, user-friendly datasets.
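As a rough illustration of what such a description can look like, the sketch below builds a small metadata record in Python and prints it as JSON-LD. It assumes the open standard in question is the schema.org Dataset vocabulary, which Google's published guidelines for dataset providers use. The function name, the dataset, the organisation and the URLs are hypothetical and are not taken from any NERC or STFC catalogue.

```python
import json


def build_dataset_description(name, description, creator, date_published,
                              measurement_technique, license_url, url):
    """Return a schema.org-style Dataset record as a plain dictionary.

    The keys are schema.org property names; the values are supplied by the
    caller. This helper is illustrative only, not part of any official tool.
    """
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,                                    # what the dataset is called
        "description": description,                      # short summary of its contents
        "creator": {"@type": "Organization", "name": creator},  # who created it
        "datePublished": date_published,                 # when it was published
        "measurementTechnique": measurement_technique,   # how the data was collected
        "license": license_url,                          # terms for using the data
        "url": url,                                      # landing page for the dataset
    }


# Hypothetical example values, for illustration only.
record = build_dataset_description(
    name="Example daily rainfall observations",
    description="Daily rainfall totals from a hypothetical network of rain gauges.",
    creator="Example Data Centre",
    date_published="2018-09-06",
    measurement_technique="Tipping-bucket rain gauge",
    license_url="https://example.org/licence",
    url="https://example.org/datasets/rainfall",
)

# A provider would typically embed this JSON in the dataset's web page.
json_ld = json.dumps(record, indent=2)
print(json_ld)
```

Each field answers one of the questions listed above: creator, publication date, collection method and terms of use. A search engine reading the page can then pick these up without having to interpret free text.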
Dr Sarah Callaghan continued:
Standardised ways of describing data allow us to help researchers by building tools and services to make it easier to find and use data. If people don't know what datasets exist, they won't know how to look for what they need to solve their environmental problems. For example, an ecologist might not know where to go to find, or how to access, the rainfall data needed to understand a changing habitat. Making data easier to find will help introduce researchers from a variety of disciplines to the vast amount of data I and my colleagues manage for NERC and STFC.
The new Google Dataset Search offers references to most datasets in environmental and social sciences, as well as data from other disciplines including government data and data provided by news organisations.
Professor Tim Wheeler, Director of Research & Innovation at NERC, said:
NERC is constantly working to raise awareness of the wealth of environmental information held within its data centres and to improve access to it. This new tool will make it easier than ever for the public, business and science professionals to find and access the data that they're looking for. We want to get as many people as possible interested in and able to benefit from data collected by the environmental science that we fund.
Dr Chris Mutlow, Director of STFC RAL Space, said:
This work builds on RAL Space experience in data management and commitment to making it easily accessible. The expertise that Sarah and our other data scientists have in this area is becoming an ever more important global resource to call upon. The data centres we manage for NERC and STFC play an important role in scientific research and are a facility available to all.
1. NERC is the UK's main agency for funding and managing research, training and knowledge exchange in the environmental sciences. Our work covers the full range of atmospheric, Earth, biological, terrestrial and aquatic science, from the deep oceans to the upper atmosphere and from the poles to the equator. We coordinate some of the world's most exciting research projects, tackling major issues such as climate change, environmental influences on human health, the genetic make-up of life on Earth, and much more. NERC is part of UK Research & Innovation, a non-departmental public body funded by a grant-in-aid from the UK government.
2. RAL Space is an integral part of the STFC's Rutherford Appleton Laboratory (RAL). RAL Space carries out world-class space research and technology development with involvement in over 210 space missions.