wiki:tickets/244

Processing the mindmaps (bundled version) to create XML suitable for ingestion by the CMIP5 questionnaire

At the telco on the 29th September 2009 it was decided to change the format of the mindmaps to support a (hopefully) improved version of the questionnaire. This new format is termed "bundled".

Background

Marie-Pierre et al. are maintaining the "bundled" mindmaps. The bundled mindmaps allow component properties to be grouped together by a label (hence the term bundle). They also allow component properties to be grouped together by a constraint. The two previous versions of the mindmaps (original and flattened) are no longer being maintained.

The bundled mindmap rules are described below. The associated CMIP5 questionnaire XML rules are documented in ticket 343.

Rules

1: The mindmaps should be named <name>_bdl.xml where <name> is one of the eight realms.

2: Controlled Vocabulary (CV) nodes should be to the right on the MindMap (MM). Anything to the left will be ignored. [We can't actually check this; all we can do is check that nodes to the right conform to the rest of the rules defined here.]

3: Specifying work in progress

  • A CV node and its children will be ignored if it includes the yellow triangular warning icon (messagebox_warning). This icon is meant to be used when the node is not yet complete.
  • A CV node and its children will be ignored if its text is in an italic font. This is meant for, e.g., enumerations of controlled vocabularies that have not yet been agreed.
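
Rule 3 can be sketched in Python as follows. This is only an illustration, not the project's XSLT checker. It assumes that the mindmap XML stores icons as <icon BUILTIN="..."/> children and italics as <font ITALIC="true"/> children of a <node> element (neither encoding is stated on this page), and the node names in the sample are invented.

```python
import xml.etree.ElementTree as ET


def is_work_in_progress(node):
    """Return True if rule 3 says this node (and its children) is ignored."""
    # Assumed encoding: <icon BUILTIN="messagebox_warning"/> as a direct child.
    has_warning = any(
        icon.get("BUILTIN") == "messagebox_warning" for icon in node.findall("icon")
    )
    # Assumed encoding: <font ITALIC="true"/> as a direct child.
    is_italic = any(font.get("ITALIC") == "true" for font in node.findall("font"))
    return has_warning or is_italic


# Hypothetical fragment; the node names are invented for illustration.
SAMPLE = """<node TEXT="ExampleComponent">
  <node TEXT="Unfinished"><icon BUILTIN="messagebox_warning"/></node>
  <node TEXT="NotAgreed"><font ITALIC="true"/></node>
  <node TEXT="Ready"/>
</node>"""

root = ET.fromstring(SAMPLE)
kept = [n.get("TEXT") for n in root.findall("node") if not is_work_in_progress(n)]
print(kept)
```

Only the node without a warning icon or italic font survives the filter.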

4: All CV nodes that are not covered by rules 2: and 3: must conform to the following visual rules

a: component | Bold, not Purple (#990099)
b: parameter bundle | Purple (#990099)
c: parameter bundle to be treated as a non-editable component by the questionnaire | Bold and Purple (#990099)
d: constraint | Blue (#0033ff)
e: parameter | Brown (#996600)
f: value | nodestyle fork
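
As a rough sketch of how the visual rules map to node types, the Python function below classifies a single node. It assumes the mindmap XML records colour in a COLOR attribute, bold in a <font BOLD="true"/> child and the fork style in a STYLE attribute; these attribute names are assumptions, and the order of the tests is one possible reading of the rules.

```python
import xml.etree.ElementTree as ET

PURPLE, BLUE, BROWN = "#990099", "#0033ff", "#996600"


def classify(node):
    """Map a <node> element to a rule 4 node type, or None if nothing matches."""
    colour = (node.get("COLOR") or "").lower()  # assumed attribute name
    bold = any(font.get("BOLD") == "true" for font in node.findall("font"))
    if colour == PURPLE:
        # Rule 4c (bold and purple) versus rule 4b (purple only).
        return "non-editable parameter bundle" if bold else "parameter bundle"
    if bold:
        return "component"  # rule 4a: bold, not purple
    if colour == BLUE:
        return "constraint"
    if colour == BROWN:
        return "parameter"
    if node.get("STYLE") == "fork":  # assumed attribute name
        return "value"
    return None


print(classify(ET.fromstring('<node COLOR="#996600" TEXT="Example"/>')))
```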

5: Hierarchy rules:

  • There is one mindmap for each of the eight realm components.
  • a component contains
    • [0..n] components
    • [0..n] parameter bundles
    • at least one component or parameter bundle
  • a parameter bundle contains
    • [0..n] parameters
    • [0..n] constraints
    • at least one parameter or constraint
  • a constraint contains
    • [1..n] parameters
  • a parameter contains
    • [1..n] values
  • values are atomic
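
The containment rules above can be written down as a small table and checked recursively. The Python sketch below works on a hypothetical in-memory tree of (kind, children) tuples rather than on the mindmap XML itself, and it treats the non-editable parameter bundle of rule 4c as an ordinary parameter bundle.

```python
# Which node kinds each kind may contain, as listed in rule 5.
ALLOWED_CHILDREN = {
    "component": {"component", "parameter bundle"},
    "parameter bundle": {"parameter", "constraint"},
    "constraint": {"parameter"},
    "parameter": {"value"},
    "value": set(),  # values are atomic
}


def hierarchy_errors(tree, path=""):
    """Return a list of rule 5 violations for a (kind, children) tree."""
    kind, children = tree
    here = f"{path}/{kind}"
    allowed = ALLOWED_CHILDREN[kind]
    errors = []
    # Every kind except a value needs at least one permitted child.
    if allowed and not children:
        errors.append(f"{here}: must contain at least one of {sorted(allowed)}")
    for child in children:
        if child[0] not in allowed:
            errors.append(f"{here}: may not contain a {child[0]}")
        else:
            errors.extend(hierarchy_errors(child, here))
    return errors


good = ("component", [("parameter bundle", [("parameter", [("value", [])])])])
bad = ("component", [("parameter", [("value", [])])])
print(hierarchy_errors(good))
print(hierarchy_errors(bad))
```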

6: Specifying different types of values (controlled vocab, keyboard input, coupled):

Different types of values are indicated by adding the following icons to a value node

  • keyboard input (numerical value) is indicated by the Purple-1 icon
  • keyboard input (string value) is indicated by the pencil icon
  • a coupled parameter is indicated by the Back arrow
  • controlled vocabulary definition is the default when no icons are found

7: Formatting for keyboard input values

A keyboard input value may optionally include units. These will be specified in the text of the value node.

8: Specifying the multiplicity of controlled vocabulary parameters:

If a parameter has more than one potential controlled vocabulary value associated with it then the mindmaps define whether it is valid to select more than one value for the parameter, or whether only one value may be specified. The following icons, which are added to the controlled vocabulary values nodes, indicate the number of allowed parameter values

  • OR, the tick icon (button_ok)
  • XOR, the cross icon (button_cancel)

A consequence of this rule is that a value node and all its siblings will have the same icon.

The AND option, indicated by the yellow star icon (bookmark), has been deprecated and now produces an error in the checker if used.
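
A minimal Python sketch of rule 8 follows. It takes one set of icon names per value node under a parameter and is not the project's checker. Returning None for a parameter with fewer than two values is an assumption, since the rule only speaks about parameters with more than one controlled vocabulary value.

```python
def multiplicity(sibling_icons):
    """Return "OR" or "XOR" for the value nodes of one parameter.

    sibling_icons holds one set of icon names per value node.
    """
    if any("bookmark" in icons for icons in sibling_icons):
        raise ValueError("the AND (bookmark) icon is deprecated")
    if len(sibling_icons) < 2:
        return None  # assumption: the rule only applies to two or more values
    kinds = set()
    for icons in sibling_icons:
        marks = icons & {"button_ok", "button_cancel"}
        if len(marks) != 1:
            raise ValueError("each value needs exactly one of button_ok, button_cancel")
        kinds |= marks
    if len(kinds) != 1:
        raise ValueError("a value node and all its siblings must have the same icon")
    return "OR" if kinds == {"button_ok"} else "XOR"


print(multiplicity([{"button_ok"}, {"button_ok"}]))
print(multiplicity([{"button_cancel"}, {"button_cancel"}]))
```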

9: Parameters which do not naturally fit into a parameter group will be placed in a default parameter group which has the name "<component name>Attributes". This default parameter group will not appear in the CIM output, or ESG definitions hierarchy. It is used as a convenience for the mindmaps and the questionnaire.

10: Controlled Vocabulary Definitions

Metafor is providing definitions for all controlled vocabulary terms. The bundled mindmaps will store the definitions.

Definitions will be stored as a note (pen and book icon) on the appropriate node.

For historical reasons (and for potential future expansion) the following format is used for notes.

[definition]definition text[/definition]
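
Pulling the definition text out of a note can be sketched with a regular expression. The Python below is an illustration only; it assumes the note text is already available as a plain string and that a note holds at most one definition.

```python
import re

# Non-greedy match so that only the text between the two tags is captured.
DEFINITION_RE = re.compile(r"\[definition\](.*?)\[/definition\]", re.DOTALL)


def extract_definition(note_text):
    """Return the definition text from a note, or None if there is none."""
    match = DEFINITION_RE.search(note_text)
    return match.group(1).strip() if match else None


print(extract_definition("[definition]definition text[/definition]"))
```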

11: Constraint format

If the constraint in a constraint node is going to be validated by the questionnaire then it needs to conform to a certain syntax. The syntax is:

if paramname is [not] "value1" [or "value2"]*

for a parameter which contains a single value (XOR)

if paramname has [not] "value1" [or "value2"]*

for a parameter which may contain multiple values (OR)

In both cases the "not" is optional and acts to negate the expression.

Constraints that do not conform to the above format are allowed but will not be validated.
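
One way to recognise the two constraint forms is a single regular expression. The Python sketch below is not the checker's implementation; it assumes parameter names contain no whitespace and values contain no embedded double quotes, and the parameter and value names in the example are invented.

```python
import re

CONSTRAINT_RE = re.compile(
    r'^if\s+(?P<param>\S+)\s+(?P<op>is|has)\s+(?P<neg>not\s+)?'
    r'(?P<values>"[^"]*"(?:\s+or\s+"[^"]*")*)\s*$'
)


def parse_constraint(text):
    """Parse a rule 11 constraint; return None if it cannot be validated."""
    match = CONSTRAINT_RE.match(text.strip())
    if match is None:
        return None  # allowed by the rules, but not validated
    return {
        "parameter": match.group("param"),
        # "is" is used for single-valued (XOR) parameters, "has" for OR.
        "multiplicity": "XOR" if match.group("op") == "is" else "OR",
        "negated": match.group("neg") is not None,
        "values": re.findall(r'"([^"]*)"', match.group("values")),
    }


# Hypothetical parameter and values, for illustration only.
print(parse_constraint('if ExampleParam is not "first value" or "second"'))
```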

12: Component names must not end in Scheme

This is a guideline rather than a rule.

13: If a component describes a parametrization/numerical scheme then it must have a SchemeType parameter (SchemeName is not mandatory as some schemes have no usual name and are only identified by their type). This rule cannot be checked.

14: Controlled vocabulary component names, parameter bundle names and parameter names may only contain alphabetic characters, numeric characters, hyphen, underscore or dot.

15: Controlled vocabulary component names, parameter bundle names and parameter names will use Camel Case.

16: A parameter which contains a keyboard input value must only contain that input value. i.e. a keyboard input may not have multiple values in the mindmaps.

17: In component names, parameter names and parameter bundle names "_" and "." have a special meaning.

"_" means "ignore everything before" when translating for ESG
"." means "ignore everything after" when translating for ESG

Both "_" and "." are stripped out of the name for the questionnaire.

Associated rules are:

At most one "_" may appear in a name.

At most one "." may appear in a name.

If a name contains both a "_" and a "." then the "_" must occur before the "." in the name.
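
Rules 14 and 17 can be illustrated together in Python. This sketch assumes that "alphabetic characters" means ASCII letters and that the "_" and "." delimiters themselves are dropped from the ESG form; neither point is stated above. The example name is invented.

```python
import re

NAME_RE = re.compile(r"^[A-Za-z0-9._-]+$")  # assumption: ASCII letters only


def name_errors(name):
    """Return the rule 14 and rule 17 violations for a name."""
    errors = []
    if not NAME_RE.match(name):
        errors.append("invalid character")
    if name.count("_") > 1:
        errors.append('more than one "_"')
    if name.count(".") > 1:
        errors.append('more than one "."')
    if "_" in name and "." in name and name.index("_") > name.index("."):
        errors.append('"_" must occur before "."')
    return errors


def esg_name(name):
    """Ignore everything before "_" and everything after "."."""
    name = name.split("_", 1)[-1]
    return name.split(".", 1)[0]


def questionnaire_name(name):
    """Strip "_" and "." out of the name."""
    return name.replace("_", "").replace(".", "")


example = "Atmos_Radiation.Longwave"  # hypothetical name
print(name_errors(example), esg_name(example), questionnaire_name(example))
```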

Checking

A checker has been written to ensure that the bundled mindmaps conform to the above rules. The checker is written in XSLT and can be run directly via an XSLT processor or through a python front end. The code is available in controlled_vocabulary/Software

Using the python front end called MMCheck.py. This front end assumes that the python bindings libxslt and libxml2 are available.

mmcheck.py -h for help and information on command line arguments

mmcheck.py realm_bdl.mm to run.

Calling the XSLT directly. You can use whichever XSLT engine you prefer. xsltproc is one such engine and can be run thus

xsltproc xsl/mmcheck_bdl.xsl realm_bdl.mm

You can control some of the functionality of the checker through the argument list. Constraints checking can be switched between yes and no (default yes) and warnings can be switched between yes and no (default yes). For example:

xsltproc --stringparam CheckConstraints no --stringparam Warning no xsl/mmcheck_bdl.xsl realm_bdl.mm
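
If the checker needs to be driven from a script, the xsltproc command line can be assembled in Python. This is a sketch only and is not the MMCheck.py front end. The function names are invented; the stylesheet path and parameter names are copied from the commands on this page, and xsltproc is assumed to be on the PATH.

```python
import subprocess


def checker_command(mindmap, check_constraints=True, warnings=True,
                    stylesheet="xsl/mmcheck_bdl.xsl"):
    """Build the xsltproc argument list for the bundled mindmap checker."""
    def yes_no(flag):
        return "yes" if flag else "no"

    return [
        "xsltproc",
        "--stringparam", "CheckConstraints", yes_no(check_constraints),
        "--stringparam", "Warning", yes_no(warnings),
        stylesheet,
        mindmap,
    ]


def run_checker(mindmap, **options):
    """Run the checker and return its standard output as text."""
    result = subprocess.run(
        checker_command(mindmap, **options),
        capture_output=True, text=True, check=True,
    )
    return result.stdout


print(" ".join(checker_command("realm_bdl.mm", check_constraints=False, warnings=False)))
```

Passing the arguments as a list avoids shell quoting problems with file names.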

Translating

If the mindmap passes all of the checks in the previous section it can be translated to the xml expected by the questionnaire. The translator is also written in XSLT and has a very similar make-up to the checker. The code is also located in controlled_vocabulary/Software

Again there is a python front end. This is called mm2q.py. Again, this front end assumes that the python bindings libxslt and libxml2 are available.

mm2q.py -h for help and information on command line arguments

mm2q.py realm_bdl.mm to run. This produces a file called realm.xml

Calling the XSLT directly. You can use whichever XSLT engine you prefer. xsltproc is one such engine and can be run thus

xsltproc xsl/mm2q_bdl.xsl realm_bdl.mm > realm_bdl.xml

Translation rules

  1. If definitions are not supplied for components or parameters in the mindmaps then dummy definitions are added.
  2. If a component does not contain a default parameter group then an empty one is added.
  3. Coupled parameters are ignored by default. If ignoring such parameters leaves a constraint with no parameters, that constraint is also ignored.
  4. Default parameter groups are renamed to "Component Attributes".
  5. Parameter names are output in three forms: a: raw, b: questionnaire and c: ESG.
  6. Mindmap version numbers are provided for provenance.
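
The treatment of coupled parameters in translation rule 3 can be sketched in Python on a hypothetical in-memory parameter bundle. The dictionary layout and the "coupled" flag are invented for illustration; the real translator works on the mindmap XML in XSLT.

```python
def drop_coupled(bundle):
    """Return a copy of a parameter bundle without coupled parameters.

    Constraints that end up with no parameters are dropped as well.
    """
    def keep(params):
        return [p for p in params if not p.get("coupled", False)]

    constraints = []
    for constraint in bundle.get("constraints", []):
        remaining = keep(constraint["parameters"])
        if remaining:  # a constraint left with no parameters is ignored too
            constraints.append({**constraint, "parameters": remaining})
    return {
        **bundle,
        "parameters": keep(bundle.get("parameters", [])),
        "constraints": constraints,
    }


# Hypothetical bundle, for illustration only.
bundle = {
    "name": "ExampleBundle",
    "parameters": [{"name": "A"}, {"name": "B", "coupled": True}],
    "constraints": [
        {"text": "first", "parameters": [{"name": "C", "coupled": True}]},
        {"text": "second", "parameters": [{"name": "D"}, {"name": "E", "coupled": True}]},
    ],
}
print(drop_coupled(bundle))
```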



Ensembles questionnaire
An additional translator has been created to translate mindmaps to an "Ensembles" questionnaire format. This adds an 'Unknown' option to the 'other' and 'N/A' options. Use mm2ensq.py instead of mm2q.py





BELOW ARE THE OLD FLATTENED MINDMAP RULES FOR POSTERITY

Processing mindmaps to create xml for the CMIP5 questionnaire

Rupert, Marie-Pierre and Charlotte had a long email discussion about the rules that Rupert will be using to process the mindmaps we are using to gather controlled vocabulary for the questionnaire (subject: mm to xml). The proposed rules for MindMap conformance below are the outcome of this discussion.

Proposed rules for MindMap Conformance

Golden Rule: Don't create ambiguity where no ambiguity exists by using different names for the same thing.

1: Controlled Vocabulary (CV) nodes should be to the right on the MindMap (MM). Anything to the left will be ignored. [We can't actually check this; all we can do is check that nodes to the right conform to the rest of the rules defined here.]

2: Work in progress

  • A CV node and its children will be checked but not go through to the questionnaire if it includes the yellow triangular warning icon (messagebox_warning). This icon is meant to be used when the node is not yet complete. The xsl stylesheet MMtodo.xsl outputs any "to be completed" nodes.
    • In the development phase there will be a version of the questionnaire that does include the work in progress components (so long as they obey the rules). These nodes will come with a warning.
  • A CV node and its children will be ignored if its text is in an italic font. This is meant for, e.g., enumerations of controlled vocabularies that have not yet been agreed.

3: All CV nodes that are not covered by rules 1: and 2: must conform to the following visual rules

a: component | Bold
b: component ref | as a: + red arrow (LINK attribute defined in MM xml)
c: leaf parameter | Brown (#996600)
d: complex parameter | Purple (#990099) NO LONGER ALLOWED ?
e: common property | Blue (#0033ff)
f: common property ref | as e: + red arrow (LINK attribute defined in MM xml)
g: value | NodeStyle fork

Comment: The mm distinguishes between components based on their position in the hierarchy. A root component is 18pt, a child of a root component is 14pt, any other children are 14pt and Purple(#990099). This information is purely a visual aid and is not required in the questionnaire.

4: A value may include the pencil icon or the Purple-1 icon. These icons indicate that a keyboard input is required. The pencil icon indicates that a 'string' is to be input. The Purple-1 icon indicates that a 'numerical' value is to be input. In both cases the text is enclosed in square brackets. This text acts as a description rather than being controlled vocabulary. If a value does not include either icon it is assumed to be a controlled vocabulary definition.

5: If a value node has value node siblings (i.e. more than one value node have the same parent) then it, and all its siblings, must include either the tick icon (button_ok), the cross icon (button_cancel) or the yellow star icon (bookmark). A value node and all its siblings must have the same icon. The tick icon indicates an OR group, the cross icon indicates a XOR group and the yellow star icon indicates an AND group.

6: Notes:

  • a: A controlled vocabulary value node can include a note (pen and book icon) which provides a description of the value. --This is particularly useful for CV lists which contain "other", the note can be used to inform the user of the information we require about the other item--
  • b: A leaf parameter can include a note (pen and book icon) which provides information about the parameter.
  • c: Notes associated with any other nodes will be ignored.

7: Hierarchy rules:

  • a: There is one mindmap per level 1 parent component and one for the root (level 0) component. (This rule implies that it is not valid to have both atmos and ocean in the same mindmap)
  • b: A component will contain 0 or more components and/or component references.
  • c: A component will contain 0 or more common CV nodes. Common CV nodes may also be children of the central node. In this case they apply to all component nodes. Any common CV nodes may be references. Rupert made this up we need to talk about it, we need to come back to this.
  • d: Leaf components (components not containing other components) will contain 0 or more parameters.
  • e: Common CV nodes will contain 1 or more parameters. hmmmm
  • f: complex parameters (a choice based parameter) will contain 1 or more complex parameters and/or parameters NO LONGER REQUIRED but we may be able to flatten these using a piece of code if we come up with rules to govern it.
  • g: parameters will contain one or more values (of the same type - see rule 10) Oh dear. Suggestion that we should have more control over user responses.
  • h: values are atomic, they may not contain any other nodes. [MMCheckValues.xsl] (True for atmosphere not true for ocean right now, we may be able to get round this with the flattening rules in point f)

8: Component names must not end in Scheme.

9: If a component describes a parametrization/numerical scheme then it must have a SchemeType parameter (SchemeName is not mandatory as some schemes have no usual name and are only identified by their type).

10: There must be a one to one mapping between a parameter name and the type of numerical value it contains. This is to reduce confusion when we process the questionnaire responses.

11: the mindmap will interpret the syntax: [name](units) as follows...

  • the contents of [] is the name/description of the parameter that appears next to the keyboard input box in the questionnaire
  • the contents of () are the units that correspond to a numerical parameter.

We need a check for this.

12: Express the option "no/none" with the parameter "modelled" This rule replaces the "no X phenomenon / scheme" in (Scheme)Type drop-list of values.

  • "Modelled" maps onto what is called "represented" in the CIM
  • The suggested possible values of modelled are (to be discussed):
    • yes, activated
    • yes, but not activated
    • no.

13: Icons we are using in the mindmaps

triangle caution | There is still work to do
pencil | A string keyboard input value is required
Purple number 1 | A numerical keyboard input value is required
Question Mark | Author is not sure about this
Green tick | Boolean OR
Red cross | Boolean XOR
Star | Boolean AND
red traffic light * | a current ticket 244 rule does not match
electric light bulb * | suggestion of modification for a current 244 rule
Letter * | 2 options proposed (to be voted... or suggest an alternative representation)

* indicates that these icons are used for communication with each other, not for implementation

Outstanding questions about the MindMap rules

  • Do we care about the font, font size and/or choice of bubble or fork visual representation being consistent for the same types of nodes in the mindmaps? Currently font sizes are consistent (should sizes therefore be part of all clauses in rule 4?). However, the type of font is currently not consistent (in Atmosphere.mm). Should we enforce consistency? If so, for everything, or a subset?
  1. Hmmm, I hadn't even realised that I had been using sans-serif and not Arial for parts of the mindmap; I guess that shows us that it doesn't really matter.

I am not sure what it means to be common so I'm reserving comment on these questions until I have had more time to think.

  • Is it correct to assume that common CV is shared by all components at a lower level of the tree than its definition. If not how is it determined which components treat it as common and which do not?
  • What does it actually mean to be common? Does it mean that each component shares these definitions i.e. we define them once for all these components?
  • Can a component override a common parameter by declaring a local parameter of the same name or must they be distinct?
  • Is it correct to allow common CV to exist at any level (not just at the "top" level)?
  • Is it correct to limit reference nodes to be common CV or a component (of any type) or can other types of nodes be references too e.g. a parameter?

A prototype schema has been created and is in the repository

The translator (MMReader.xsl) has been written and is in the repository. The xml output that the xsl creates from the Atmosphere mindmap has also been added to the repository.

Two constraint checks have been written so far ...

The easiest way to run the xsl is to use the command line program xsltproc although you can obviously use whatever xsl processor you like. To produce the Atmosphere.xml file I ran the following.

xsltproc xsl/MMReader.xsl Atmosphere.mm > Atmosphere.xml


So here is a quick summary of what I am doing.

1: I am finding out what the constraints on the mindmaps should be

2: I am writing a set of checks that test that the mindmaps conform to these constraints. I have written one check so far: MMCheckSchemeAndType.xsl

3: Part of these checks includes the output of all of the "to be done" parts of the mindmaps (MMtodo.xsl).

4: I am writing a translator (MMReader.xsl) that converts the mindmap xml to a more generic xml form as long as they conform to the constraints

5: I will write a schema which defines the more generic xml structure

6: I am writing a translator (MMWriter.xsl) that converts the more generic form into a mindmap.


Charlotte wrote:

I get what you mean now when you talk about complex parameters.

ComponentDomain, Space, HorizontalDiscretisation in the Atmosphere mindmap has three parameters and we need to ask for different information depending on which one the user chooses to describe their model. So yes, we do need to have a concept of Complex Parameter.

Rupert wrote:

Regarding complex parameter I was assuming that a complex parameter is a parameter which contains other parameters but perhaps this should be a component? Such things are coloured purple (but are not bold). There is one example in the atmosphere mindmap which is "Space" in "ComponentDomain". I can't see how that could be a component. Such entities are also used in the ocean mindmap e.g. "Lateral" in "Tracers Scheme" in "OceanAdvection". Advice anyone?

Charlotte wrote:

Here are my amendments/comments on the constraints for MM/xml

1: All leaf components must have Scheme and Type parameters (changed)

2: All MM nodes must conform to the specified formats (agreed)

3: A component may contain another component and/or parameters (agreed)

4: I don't understand what a complex parameter is. (???)

5: A parameter only contains controlled values (agreed, unless its value has been specified as user defined by a pencil icon or number1 icon)

6: Vocab is to the right. (agreed)

Now for your questions...

  1. Must a parameter have a name? Ocean has examples that do not, or are they parameter values?
    • I removed "name" from the Atmosphere mindmap and replaced it with Scheme because "name" was simply referring to the name of the scheme.
  2. What is a common property and why is it needed? Ocean says "mostly common"! as a comment so how does this help?
    • I will need to look more closely at the ocean mindmap to answer that, maybe Eric can tell you what he means. However, there are some examples of common properties in the Atmosphere mindmap too. See the properties of the components in AtmosClouds where some of the properties of the different types of cloud components are the same. We may decide that it makes more sense to move these properties to the left and call them the common properties of AtmosClouds.
  3. Is a complex parameter a required concept?
    • I don't understand what a complex parameter is.
  4. Do all (XOR,AND,OR) options have to be the same at a particular level or can they be mixed?
    • Yes, they have to be the same.
  5. Can a value contain another value?
    • No, we need to flatten our mindmaps so this is not the case.
  6. If something is not a string or float then what is it - it must be a set of defined values?
    • What a marvellously open question.... I don't know, do you have an example?
  7. Do we want to maintain component/subcomponent/scheme info in the xml?
    • Yes
  8. Presumably we want to keep the "notes" in the xml too?
    • Yes, if a user selects "other" as a parameter value I want a text box to appear which uses the note to tell them what further information is required.
  9. How does the keyboard entry keyword relate to the String and Integer options?
    • Keyboard entry is denoted by square brackets. The number 1 icon indicates that a number is required and a pencil icon indicates that a string is required.
  10. valorisation vs. choices for valorisation? what does this mean? Is that the name of a parameter vs the values it can take?
    • Valorisation is French and I think it means valuation in this context. So I reckon "valorisation vs. Choices for valorisation" means "a user enters a number vs. we give the user a list of numbers to choose from".

Hope this helps you Rupert, do shout if you need to know more.

I guess I should wiki-fy this email exchange.

Charlotte


From: Pascoe, CL (Charlotte)

Sent: 28 April 2009 13:02

Subject: RE: MindMap constraints to produce reasonable xml for use by questionaire

Hi Rupert,

I have just committed my latest version of the atmosphere mindmap Atmosphere.mm to the subversion repository at revision 453.

Each leaf component has a Scheme and a Type (exception for the GW parameterisations which have a type for propagation and a type for dissipation). Scheme and Type are offered to the user as XOR choices.

Each leaf component may also have some properties. For many components the properties are called "properties" (having been known as keywords yesterday) but for other components the properties have names like "PrecipitatingHydrometeors" and "NumberOfChannels". If choices are required then properties are offered to the user as OR choices.

I'll review what this means in terms of your constraints when I've had my lunch :-) Charlotte


From: Rupert Ford [mailto:rupert@manchester.ac.uk]

Sent: 28 April 2009 12:26

To: Pascoe, CL (Charlotte); Lawrence, BN (Bryan)

Subject: MindMap constraints to produce reasonable xml for use by questionaire

Hi Charlotte and Bryan,

I said I would pester you about the sort of constraints you would like to have in the MindMaps so here I am. Should I be emailing a larger group of people i.e. the people identified in the telco?

Here are some obvious constraints I've thought of (which may be wrong I suppose) and I've written xsl to check the one we discussed at the telco. Are there any more that you can think of?

Any constraints for MM/xml?

1: All leaf components must have Name and Type parameters - attached xsl checks for this

2: All MM nodes must conform to the specified formats

3: A component may contain another component and/or parameters

4: A complex parameter only contains parameters

5: A parameter only contains controlled values

6: vocab is to the right

I also have a load of questions which relate to how I am going to translate the mindmaps into a cleaner xml structure.

Must a parameter have a name? Ocean has examples that do not, or are they parameter values?

What is a common property and why is it needed? Ocean says "mostly common"! as a comment so how does this help?

Is a complex parameter a required concept?

Do all (XOR,AND,OR) options have to be the same at a particular level or can they be mixed?

Can a value contain another value?

If something is not a string or float then what is it - it must be a set of defined values?

Do we want to maintain component/subcomponent/scheme info in the xml?

Presumably we want to keep the "notes" in the xml too?

How does the keyboard entry keyword relate to the String and Integer options?

valorisation vs. choices for valorisation? what does this mean? Is that the name of a parameter vs the values it can take?

Many thanks for your wisdom and sorry for all the questions.

-- Rupert