
Common Metadata for Climate Modelling Digital Repositories



Nearly all the confidence we have in predictions of global warming is based on repositories of scientific data. These range from large repositories of climate model simulations, such as that assembled for IPCC Working Group One (held at PCMDI, at Lawrence Livermore Laboratory in the US), to the records of individual station measurements such as the Central England Timeseries (held in a number of places, for example: http://badc.nerc.ac.uk/data/cet/). Currently such repositories are poorly connected: the user of one repository may not be aware of what data is available elsewhere and, even if aware, would have to deal with different information models (file formats, metadata structures, documentation methodologies) at each institution. Comparing and contrasting the information about the data, let alone the data itself, is difficult without significant specific expertise.
At the same time, the amount of data arriving in these repositories for storage, retrieval and subsequent interpretation is increasing rapidly: new instruments in space, and models with higher resolution and more complexity (and thus more variables), are producing a veritable deluge of data and information. While the problem of the data deluge is at least readily comprehensible, the issue of understanding the differences between one sensor and another, and one model and another, is less well appreciated. For example, the MPIM-M&D group hold approximately 27,000 different model datasets in their repository, each of which differs primarily in terms of the type of model used, the parameters with which it was run and, in some cases, the reasons why it was run. Future assessments of climate change are likely to require access to even more data than was used during the fourth assessment of the IPCC (AR4).
It is already clear, for example, that the putative AR5 will need to use distributed repositories of simulation data: no one organisation is likely to have the funding (and bandwidth to all the relevant communities) to support a centralised database such as that used in AR4. Thus, in planning for future assessments, we need to start addressing the challenge of finding, manipulating and interpreting datasets in a distributed environment. Distributed geospatial datasets are of course not a new concept: many attempts have been made to catalogue them and, amongst others, the Open Geospatial Consortium (OGC) has developed a variety of web services that can be used to manipulate distributed data. However, no existing development has addressed the full spectrum of information models needed to provide the context within which distributed data can be interpreted and used. Existing standardized metadata structures (e.g. ISO 19139) are completely incapable of discriminating between such datasets without major extensions, and so every institution holding similar types of data has thus far taken a different approach to sorting and recording the key distinctions. While efforts to describe the differences between sensors are well underway (e.g. SensorML or the activities of ESA’s HMA), efforts to describe the differences between simulations have hitherto been mainly sporadic and/or parochial, and yet such distinctions have been, and will continue to be, integral to the proper interpretation of simulation data. In this project we build on expertise being developed within the European Community and abroad to
The impact of this project will be not only on the European contribution to planning for AR5, but also on existing scientific activities which exploit the partner repositories (in particular the ECMWF sharing agreement, CCMVal, etc.).