CMIP5/Variable Structures

(2010/01/24: this DRS description is out of date and needs updating)

For CMIP5, whatever file structure we adopt must also serve as the basis for file replication amongst federation partners.

Current thinking is along these lines.

An atomic dataset will be a collection of data defined by:

activity, institute, model, scenario/experiment, data frequency, variable name, local ensemble member, version 

where the last two may be omitted if there is no ensemble and only one version.
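
To make the idea concrete, here is a minimal sketch of an atomic dataset as a record of the components listed above. The class name, field names, the "." separator and the example values are all assumptions made for illustration; none of them is part of the proposal.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AtomicDataset:
    """Hypothetical record of the components that define an atomic dataset."""
    activity: str
    institute: str
    model: str
    experiment: str                          # scenario/experiment
    frequency: str                           # data frequency
    variable: str                            # variable name
    ensemble_member: Optional[str] = None    # local ensemble member
    version: Optional[str] = None

    def identifier(self, sep: str = ".") -> str:
        # The six leading components are always present.
        parts = [self.activity, self.institute, self.model,
                 self.experiment, self.frequency, self.variable]
        # The last two are appended only when they are set.
        # (Assumption: they are omitted together, as in the text above;
        # setting only one would make the identifier ambiguous.)
        parts += [p for p in (self.ensemble_member, self.version) if p is not None]
        return sep.join(parts)


# Placeholder values, chosen only for illustration.
full = AtomicDataset("cmip5", "inst", "modelx", "exp1", "mon", "tas", "r1", "v2")
short = AtomicDataset("cmip5", "inst", "modelx", "exp1", "mon", "tas")
```

Here `full.identifier()` joins all eight components, while `short.identifier()` stops after the variable name.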

This needs to be married with the proposal for unique identifiers within files, and with the metadata available within the files.

Metafor Metadata Identifiers

The key documents exported from the Metafor questionnaire are CIM document instances, each with a cimtype that is one of:

  • platform
  • simulation
  • experiment
  • data (This needs some discussion, particularly vis-à-vis ESG)
  • component

All CIM document instances include the following identity metadata:

  • a UUID identifier
  • a version number
  • a metadata version string

All such CIM documents can be retrieved using the following URL structure:

  • http://hostname/cimtype/uuid (will return the latest)
  • http://hostname/cimtype/uuid/v where v is an integer (will return a specific version).
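
The URL structure above could be assembled with a small helper such as the following sketch. The function name `cim_document_url` and the validation against the list of cimtypes are assumptions for illustration; "hostname" stays a placeholder, as in the text.

```python
from typing import Optional

# The cimtypes listed earlier on this page.
CIM_TYPES = ("platform", "simulation", "experiment", "data", "component")


def cim_document_url(hostname: str, cimtype: str, uuid: str,
                     version: Optional[int] = None) -> str:
    """Build the retrieval URL for a CIM document (hypothetical helper)."""
    if cimtype not in CIM_TYPES:
        raise ValueError(f"unknown cimtype: {cimtype!r}")
    url = f"http://{hostname}/{cimtype}/{uuid}"
    if version is not None:
        # A trailing integer selects a specific version;
        # without it, the latest version is returned.
        url += f"/{int(version)}"
    return url
```

For example, `cim_document_url("hostname", "simulation", some_uuid)` addresses the latest version, and passing `version=2` addresses version 2 of the same document.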

The following are the semantic rules associated with these identity metadata:

  1. The uuid should only change if the thing described by the document changes. That is, once assigned to a document, the uuid never changes, and once exported, the document should persist.
  2. If the thing itself changes, we should copy the document, give the copy a new uuid, and update it ...
  3. If the metadata is copied to a new metadata version, we
    • update the metadata version, and
    • increment the document version number.
  4. If the metadata is updated, we
    • increment the document version number.
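
One way to read these rules is as three operations on an immutable document record, sketched below. The class and function names are hypothetical, the `content` field stands in for the real document body, and resetting the version to 1 for a copied document is an assumption (the rules above do not say what version a new copy starts at).

```python
from dataclasses import dataclass, replace
from uuid import uuid4


@dataclass(frozen=True)
class CimDocument:
    """Hypothetical record holding the identity metadata of a CIM document."""
    uuid: str
    version: int            # document version number
    metadata_version: str   # metadata version string
    content: str            # stand-in for the document body


def copy_for_changed_thing(doc: CimDocument, new_content: str) -> CimDocument:
    # Rule 2: the thing itself changed, so the copy gets a new uuid.
    # Assumption: the copy starts again at version 1.
    return replace(doc, uuid=str(uuid4()), version=1, content=new_content)


def move_to_metadata_version(doc: CimDocument, new_metadata_version: str) -> CimDocument:
    # Rule 3: update the metadata version and increment the document version.
    return replace(doc, metadata_version=new_metadata_version,
                   version=doc.version + 1)


def update_metadata(doc: CimDocument, new_content: str) -> CimDocument:
    # Rule 4: same thing, same uuid; only the document version increments.
    return replace(doc, content=new_content, version=doc.version + 1)
```

Because the record is frozen, every operation returns a new object and earlier versions remain available, which fits the requirement that an exported document should persist.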

Metafor Document Inventory

An inventory of documents will be available from a set of Atom feeds, to appear at:

  • http://hostname/feeds/cmip5/[cimtype] (one feed for each type), and
  • http://hostname/feeds/cmip5/all (one feed for all types).
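
As a small illustration, the full set of feed addresses could be enumerated as follows. The function name `feed_urls` is an assumption, and "hostname" remains a placeholder.

```python
from typing import Dict

# The cimtypes listed earlier on this page.
CIM_TYPES = ("platform", "simulation", "experiment", "data", "component")


def feed_urls(hostname: str) -> Dict[str, str]:
    """Map each cimtype, plus 'all', to its feed URL (hypothetical helper)."""
    base = f"http://{hostname}/feeds/cmip5"
    urls = {cimtype: f"{base}/{cimtype}" for cimtype in CIM_TYPES}
    urls["all"] = f"{base}/all"   # the single feed covering every type
    return urls
```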

As it stands, the feed will (eventually) have

  • lots of documents,
  • some of which will be later versions of others, so we'd need to do something clever ... (OAI-PMH wouldn't help here; it'd have the same problem). (The created date will be that of the first version, but the updated date will be consistent with the version itself.)

It might be cleverer to have one entry per uri, rather than one entry per (uri+version), and use the links to point to the older versions. Actually, that'd be easy enough to do ... so we'll do it, and update this page when done ...
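
A minimal sketch of that collapsing step, assuming the feed entries are available as (uri, version) pairs: each uri keeps a single entry carrying its latest version, plus links to the older versions built with the uri/v pattern described earlier. The function name and the shape of the returned dictionary are assumptions, not a description of the eventual feed.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def collapse_versions(entries: Iterable[Tuple[str, int]]) -> Dict[str, dict]:
    """Collapse (uri, version) pairs into one entry per uri (hypothetical helper)."""
    versions_by_uri = defaultdict(set)
    for uri, version in entries:
        versions_by_uri[uri].add(version)

    collapsed = {}
    for uri, versions in versions_by_uri.items():
        ordered = sorted(versions, reverse=True)   # newest first
        collapsed[uri] = {
            "latest": ordered[0],
            # Older versions are reachable through uri/v links.
            "older_links": [f"{uri}/{v}" for v in ordered[1:]],
        }
    return collapsed
```

With this shape, a consumer of the feed sees each document once and can follow the links only when an earlier version is needed.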