wiki:tickets/870

Moving towards CIM2.0

(associated with ticket:870)

Interim Plans

(to be discussed at a telco before finalization)

1.5 CMIP5 *** (01/12/10)
Allyn: Includes support for external vocabularies.
1.51 Quality package refactor *** (01/12/10)
Dominic: Just ensure that the quality package can be serialised using FullMoon, so we have some examples in mind before simplification and rationalisation.
1.52 CMIP5 modified ****** (01/01/11)
Allyn: Fix the namespace issue, and serialise to use the external ISO XSD.
1.6 Simplification and Rationalisation ************ (01/03/11)
Kristin: Address a range of issues, to be listed below and in ticket:920 and ticket:949, to be complete in time for the y3 mtg. (Includes O&M compliance.)
1.7 Refactoring to TC211 compliance ************ (01/03/11)
Kristin: Address TC211 compliance where possible. Address as many serialisation concerns as practical within the timescale. Needs to be complete in time for the y3 mtg. ticket:951
1.8 Serialise using FullMoon *************** (01/04/11)
Dominic & Kristin: May involve some new classmaps (hopefully not, since ISO serialisation of MOLES will catch most, if not all, of them). Definitely will involve changes associated with ISO19136 rules. Needs to be done in time to allow the services to refactor to use these concepts. Include UML for vocabularies as a separate package. ticket:950
1.9 RDF serialisation adjustments *************** (01/04/11)
Tobias, Kristen, Maurizio: Addressing any modifications necessary in the base UML model to support RDF serialisation. This is mainly for IS-ENES and future ESG. (The timescale may be allowed to slip since, in practice, no Metafor tools will depend on it, but it does need to be complete in time for integration in V2.0.)
2.0 Final Metafor Version (end of project)
Rupert, Bryan, Dominic: Includes BFG coupling improvements, and aggregates 1.8 and 1.9, which will be developed in parallel.

Notes to accompany table:

  • A named person is responsible for each version; where more than one person is identified, the person in italics will be responsible.
  • O&M is ticket:920 and associated wiki page.
  • It is assumed that the BFG coupling improvements (see ticket:862 and associated wiki page) appear after 1.8 and before 2.0. The exact timing is still tbd.
  • The work on RDF serialisation will include three main drivers:
    • Kristen: Input from EuroGEOSS and vis-a-vis UML/RDF issues (e.g. here).
    • Tobias: Input from IS-ENES.
      • See initial thoughts on the IS-ENES wiki.
      • Note that it might not be desirable to serialise the entire CIM to RDF: we might consider tagging those parts that we consider priorities ...
    • Maurizio and Sylva: Input from ESG and issues associated with serialisation tools.

Impacts

Key dependencies on UML refactoring are:

  • The eXist xqueries!
  • The validation
  • The serialisation from the questionnaire
  • ESG ingestion
  • Existing instances

Consequences:

  • Everything with a CIM dependence should assert a dependency on a CIM version, and raise an error/warning if "later" CIM instances are found.
  • A starting migration plan:
    • Fix software when 1.9 is finalised.
    • Start a migration of software to conform to ApCIM 1.9
      • Expect that 1.9 will inevitably be modified a little, so that 2.0 reflects the 1.9 heritage as modified by practical experience of implementing it in software.
      • We will need to migrate data:
        • this means we need a 1.5 to 2.0 tool;
        • we'll need a 2.0 to 1.5 tool for ESG, unless we can give them a direct RDF tool at the same time (which is possible if we solve the RDF serialisation step in the development of 1.9). (There will be more ramifications to work through in terms of their GUIs etc.)
        • (There will be more than one version of the CIM in the wild at any one time, since we can't, and don't want to, control all producers and consumers of the CIM.)
    • (So no changes in the Metafor software development until this 1.9 milestone delivery, although IS-ENES might start thinking about it at the 1.8 step ...)
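
The first consequence above (software declares the CIM version it depends on, and complains when it meets a later instance) might look something like the following minimal Python sketch. The names SUPPORTED_CIM_VERSION, parse_version and check_cim_version are hypothetical, as is the assumption that an instance carries its version as a string such as "1.51"; none of this is taken from existing Metafor code.

```python
import warnings

# Hypothetical: the CIM version this piece of software was written against.
SUPPORTED_CIM_VERSION = "1.5"


def parse_version(text):
    """Turn a version string such as '1.51' into a comparable tuple.

    Assumption: 1.51 and 1.52 are revisions of 1.5 and so sort before 1.6,
    as in the interim plan. The digits after the dot are therefore compared
    one at a time: '1.51' -> (1, 5, 1).
    """
    major, _, minor = text.partition(".")
    return (int(major),) + tuple(int(digit) for digit in minor)


def check_cim_version(instance_version, supported=SUPPORTED_CIM_VERSION, strict=False):
    """Return True if the instance is no later than the supported version.

    A later instance raises ValueError when strict is set; otherwise it
    emits a warning and returns False.
    """
    if parse_version(instance_version) <= parse_version(supported):
        return True
    message = "CIM instance version %s is later than supported version %s" % (
        instance_version, supported)
    if strict:
        raise ValueError(message)
    warnings.warn(message)
    return False
```

Whether a later instance should be a warning or a hard error is left as a switch here, since the text allows either.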

Generic Issues

  • We may allow ourselves to exploit the INSPIRE serialisation rules to include association classes.
  • With 1.51 objectives:
    • What is the nature of the UML representation of the remote vocabulary members, how is that serialised in XSD, and what should a consumer of an XML document expect to do? See the OGC phenomenon discussions!
  • ISO compliance issues
    • Object property pattern (making the documents much easier to consume)
    • Namespace issue (fixable in 1.5?)
    • Naming issues: lower case classes (avoided by use of the Object Property pattern)
    • How to handle the document stereotype (any others?) in serialisation via FullMoon and friends?
      • This could be fudged for the quality package (1.52) where the document stereotype could be implemented as a class, but we'd have to consider the implications elsewhere wrt multiple inheritance (cf FeatureType).

Specific Issues

With 1.5

These issues might guide simplification and/or rationalisation.

  • <<Document>> stereotype
    • There appears to be redundancy in what is included in the document attributes and some attributes of some classes which carry the stereotype
    • As far as I can tell, these have been eliminated as of r2644 [AT]. There may have been some confusion because the ids & contact details specified as part of the document stereotype refer to the _metadata_ and not to the _artefact_ being described by the metadata.
  • Simulation
    • duration has type calendar on Simulation
      • How are startPoint and endPoint related?
    • Why not simplify the relationship between SimulationRun and SimulationComposite? Surely this can be made simpler!
    • We should not have specific related simulations included via named attributes. We should have a relatedSimulation attribute, with the relationship a property.
    • Many inputs via couplings are not helpful. In the questionnaire we found it helpful to discriminate between ancillary files, inputs, and boundary type inputs. These could be handled by having a convenience class which has a property (extensible, but including the above) and which bundles together the inputs (from a simulation perspective).
  • NumericalActivity
    • Dataholder should not be an attribute of a numerical activity. If this is needed, there ought to be an empty associated data record, rather than attributes of non-existent data tacked onto the numerical activity.
    • Done as of r2645 [AT]
  • ModelComponent
    • Should we bite the bullet and explicitly model configured and unconfigured models?
    • What is a Timing? (I read the text, but didn't really understand why that was a top-level property.)
    • Should we pull out the isChildOf and isParentOf attributes and model them as first class (composition) objects?
  • NumericalExperiment
    • Why is calendar an attribute of the experiment? It might be a requirement attribute but not a numerical experiment attribute?
    • What is the number? How do its characteristics differ from the URI?
  • Platform
    • Currently, every time the compiler is updated, the platform would have to be updated. Likewise for the decomposition onto processor elements. Sadly, these start to feel like properties of the Simulation (i.e. the runtime environment).
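
To make two of the Simulation suggestions above more concrete (a single relatedSimulation attribute whose relationship is a property, and a convenience class that bundles inputs by kind), here is a rough Python sketch. The class names RelatedSimulation and InputBundle, their fields, and the example values are all hypothetical and are not part of the CIM; only the three input kinds come from the questionnaire experience described above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RelatedSimulation:
    """A link to another simulation; the kind of link is data, not a field name."""
    target: str        # identifier of the related simulation (hypothetical form)
    relationship: str  # e.g. "control" or "spinup" (illustrative values only)


@dataclass
class InputBundle:
    """Groups inputs of one kind, as seen from the simulation."""
    kind: str          # "ancillary", "input" or "boundary"; meant to be extensible
    sources: List[str] = field(default_factory=list)


@dataclass
class Simulation:
    name: str
    related: List[RelatedSimulation] = field(default_factory=list)
    inputs: List[InputBundle] = field(default_factory=list)

    def related_by(self, relationship):
        """Return the targets linked with the given relationship."""
        return [r.target for r in self.related if r.relationship == relationship]


# Usage with made-up identifiers.
sim = Simulation("example-run")
sim.related.append(RelatedSimulation("example-control", "control"))
sim.inputs.append(InputBundle("ancillary", ["example-ancillary-file"]))
```

A new kind of relationship then needs a new value rather than a new named attribute on Simulation.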

Some further points from an email:

  1. I am not sure the difference between numericalProperties and scientificProperties (both under componentProperties) is well defined. If the difference is clear for everyone, fine; if it is not, I think we should keep only componentProperties for now.
    • For numericalProperties, the note says "The properties that this model simulates and/or couples. NumericalProperties contain those properties that describe what a model simulates."
    • For scientificProperties, the note says "The properties that this model simulates and/or couples. ScientificProperties contain those properties that describe how a model simulates."
  2. The sequence of elements to define a hierarchical coupled model using the element <childComponent> seems too complex to me (or maybe I did not really understand how to do this). For example, if I am right, to define an oceanic component included in an ocean model included in an ocean-atmosphere model, one has to specify this sequence of elements: <model>/<modelComponent>/<childComponent>/<modelComponent>/<childComponent>/<modelComponent>. For each level of component, one has to use a <modelComponent> in a <childComponent>. Why is that so? Couldn't we just use a <modelComponent> and skip the associated <childComponent>?
    • BNL isn't sure what is going on here ... needs follow up.
  3. In <deployment>, we have <platform> into which we have <unit>. In <deployment> we also have <parallelisation> into which we have <processes> and <rank>. <unit> was included in <platform> to allow the description of a platform composed of more than one computing unit (e.g. a cluster of PCs). This is fine. But then, I think that we would need to associate <processes> and <rank> with <unit>, because we may want to specify the number of processes used to run the component on each computing unit. Specifying the total number of processes for the whole platform is not detailed enough. (In OASIS4, we worked around this problem by using one platform element with its associated processes and ranks for each computing unit, so element <unit> is useless for OASIS.)
    • BNL: there are other problems with platform and deployment. I think we might need a bunch of changes here.
  4. I think that standard netCDF attributes should be available to describe a <dataObject> (useful when <dataObject> is used to describe a netCDF file).
    • BNL: We should be able to point straight to NcML, I reckon!
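
As a possible aid to the follow-up on point 2, the two nestings under discussion can be built and compared with a short Python sketch. The element names come from the email; the helper functions nest and deepest_path, the name attribute and the component labels are hypothetical and only serve the illustration.

```python
import xml.etree.ElementTree as ET


def nest(names, wrap_children=True):
    """Build a <model> holding a chain of nested <modelComponent> elements.

    With wrap_children=True each sub-component sits inside a <childComponent>,
    as in the sequence quoted in the email; with False the wrapper is skipped.
    The 'name' attribute is a hypothetical label used only for this sketch.
    """
    root = ET.Element("model")
    parent = root
    for depth, name in enumerate(names):
        if depth > 0 and wrap_children:
            parent = ET.SubElement(parent, "childComponent")
        parent = ET.SubElement(parent, "modelComponent", name=name)
    return root


def deepest_path(element):
    """Return the tag path from the element down its first-child chain."""
    tags = [element.tag]
    while len(element):
        element = element[0]
        tags.append(element.tag)
    return "/".join(tags)


names = ["ocean-atmosphere", "ocean", "oceanic-component"]
print(deepest_path(nest(names)))
# model/modelComponent/childComponent/modelComponent/childComponent/modelComponent
print(deepest_path(nest(names, wrap_children=False)))
# model/modelComponent/modelComponent/modelComponent
```

The sketch only shows the difference in depth; it says nothing about whether <childComponent> carries information that would be lost by dropping it.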

General:

  • Configured and Unconfigured Models see ticket:868 (web page) and ticket:888 (web page).
  • O&M compliance: see ticket:920 (web page).
  • Should we consider using the OAI-ORE concept of an aggregation for bundling together metafor documents (as one wants to do with a recordset etc)?
  • Metafor never really grasped the nettle of who does what when (which should inform the document stereotype), nor does it exploit the feature type stereotype (which should inform what is likely to be a useful discrete viewable composition of information which might be linked between documents). It also needs to consider the implications of having an addressable URI for most of the interesting entities (are they the same as the feature instances?).
  • We may also want to see how we could exploit the Open Provenance Model Vocabulary (OPMV), particularly for post-processing (?):
    • Classes: Agent, Artifact, Process
    • Properties: used, wasControlledBy, wasDerivedFrom, wasEncodedBy, wasEndedAt, wasGeneratedAt, wasGeneratedBy, wasPerformedAt, wasPerformedBy, wasStartedAt, wasTriggeredBy, wasUsedAt
    • Can we marry O&M and OPMV in any useful way?
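
As a rough idea of what OPMV statements about a post-processing step could look like, the sketch below writes a few of the classes and properties listed above as plain (subject, predicate, object) tuples in Python. The "ex:" identifiers, the regridding scenario and the helper functions are invented for illustration; a real encoding would use an RDF library and the full OPMV namespace.

```python
# Each statement is a (subject, predicate, object) triple. The "opmv:" prefix
# stands for the OPMV namespace; every "ex:" identifier is hypothetical.
TRIPLES = [
    ("ex:model-output", "rdf:type", "opmv:Artifact"),
    ("ex:regridded-output", "rdf:type", "opmv:Artifact"),
    ("ex:regridding", "rdf:type", "opmv:Process"),
    ("ex:analyst", "rdf:type", "opmv:Agent"),
    ("ex:regridding", "opmv:used", "ex:model-output"),
    ("ex:regridding", "opmv:wasControlledBy", "ex:analyst"),
    ("ex:regridded-output", "opmv:wasGeneratedBy", "ex:regridding"),
    ("ex:regridded-output", "opmv:wasDerivedFrom", "ex:model-output"),
]


def objects(subject, predicate, triples=TRIPLES):
    """Return every object stated for the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def inputs_of(artifact, triples=TRIPLES):
    """Follow wasGeneratedBy, then used, to find what an artifact was made from."""
    found = []
    for process in objects(artifact, "opmv:wasGeneratedBy", triples):
        found.extend(objects(process, "opmv:used", triples))
    return found
```

Here inputs_of("ex:regridded-output") walks from the derived artifact back through the process to the data it used, which is the kind of question post-processing provenance would need to answer.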
