METAFOR Year 1 meeting
Cosener’s House, Abingdon Feb. 9-11 2009
List of Actions, Decisions and Issues
1. Actions:
· CMIP5 questionnaire:
o Develop the CMIP5 questionnaire list of requirements and constraints – initial version from mtg minutes to go on wiki (Eric)
o Develop a sequence diagram/flowchart/storyline for the users’ path through the questionnaire – Laurent and Sebastien, based on the initial version by Bryan (see http://metaforclimate.eu/trac/wiki/CMIP5/Storyline)
o Arrange a telco with the USA to discuss the CMIP5 questionnaire storyline; Dean Williams from PCMDI will need to take part.
o Sarah is to create an ingestion diagram, showing where the metadata can be ingested from and when (netcdf files, questionnaire etc)
o Talk with Bob Drach regarding ESG Publisher – Bryan
o Charlotte and Marie-Pierre to liaise with each other and other scientists (e.g. IPSL and GFDL) to fill in the gaps in the controlled vocabulary structure.
o Arrange a dfn and telco session with Sylvia for her to go through the controlled vocabulary mind map with us, taking into account the results for the Dy-Core workshop – Sarah, Marie-Pierre
o Put the mind maps on the Metafor site to allow visitors to interact with them – Sarah + BADC support
o Marie-Pierre, Lois, Charlotte and Eric are to decide on a way to govern the mind maps for the controlled vocabulary, filling in the controlled vocabulary, rules for merging the maps etc. - due 20th Feb
o Work on associated diagnostic variables – Bryan
o Lois is to email Allyn with already built lists of controlled vocabulary – Lois
o Timeline:
§ Database open no earlier than July 1st
§ ESG CV needs by July 1st (not too disruptive when no instances are there)
§ Population of database, Q3 2009
§ CV list ready by mid-April 2009
§ List of questionnaire requirements ready by end of Y1 mtg
§ So questionnaire ready by July 1st
· International scene and CMIP5 (including STAB recommendations):
o We need to seek the endorsement of WGCM (and CMIP panel) (which may already in fact be implied) that METAFOR will lead (and assume primary responsibility) for obtaining from the modelling groups metadata experiment information – Eric/Sarah
o We are to better structure our communication with the CMIP5 panel and our colleagues in the USA. We should build a closer relationship between PCMDI and Metafor and encourage the dissemination and uptake of the CIM by the climate effects and climate modelling communities – All (Sarah/Eric to lead)
o Metadata governance: Metafor should take the lead in developing the governance for the CIM. This could be by analogy with the CF committee that governs the controlled vocabulary – Eric/Karl
· Working with Curator/ESG and related projects (includes dissemination actions):
o Develop a schedule of regular teleconferences between Metafor and Curator to discuss priorities, issues etc. – Sarah
o Finalise the MoU between Metafor and GENESI-DR – Bryan and Luigi
o Investigate the other EC projects to determine if any have areas of common interest – Sarah
o Produce a document on Metafor’s relationship to GEOS – Bryan
o Do a survey of what other projects are around and what we could get from them to target our effort for cross project interactions. This should include identifying datasets and the interfaces to datasets – Sarah
· CIM development:
o Bryan is to give a copy of his old UML about CF to Allyn – Bryan
o Build CIM instances – All
o Take Marie-Pierre’s controlled vocabulary mind map and ingest it into the CIM – Allyn
o Allyn and Bryan are to discuss with software package people how to translate controlled vocabulary lists into xml schema (blocker)
o Allyn is to clean up the CIM with regards to the change property and add attributes to the document type – e.g. parent.
o Allyn is to sort out the coupler concept and break up the simulation class to get rid of the file component.
o Phil to organise a telco on the subject of grids with Sylvia and Balaji
o Sylvia is to email the Metafor list with the UML of the Curator grids package.
o Present slides from the quality package at future telco - Mark
· WP4/5/6 issues:
o Allyn and Bryan to discuss the technology to use for portal searches – Allyn and Bryan
o Demonstration of the Metafor portal to be given via telco and dfn – BADC
o Work plans for WP 4, 5 and 6 should be decided on – due by next telco, Thurs 19th Feb
o Arrange monthly telco to discuss WP 4, 5 and 6 – Sarah/Eric to ensure this happens
· Management and dissemination:
o Give the STAB access to the full Metafor site – Sarah
o Eric to discuss with Metafor EU liaison about interactions with EU policy makers to be made by IS-ENES – Eric
o Develop a video of Metafor activities – Sarah
o Timeline for the year 1 report:
§ Sarah to put template for year 1 report and “Form C” (for finances) onto the metafor site – due Feb 28th
§ Eric and Sarah to update metafor-admin list and find out about year 1 review meeting
§ WP leaders to coordinate production of 3-4 pages per WP
§ WP leaders to provide technical reports to Sarah and Eric – due Mar 15th
§ Partners to provide Form C (giving financial situation up to end Feb) to Sarah and Eric – due Mar 15th
§ Final report assembled – Eric and Sarah – due Mar 31st.
o Sarah and Allyn to coordinate about restructuring the wiki and website to make it easier to find things. They’re also to formulate a working procedure for tracking issues using the wiki and trac system, and inform the list of this procedure.
o Everyone is to review their tickets, and create new ones as issues arise.
o Sarah is to mail the list with instructions on what to do if the website or Trac breaks
o Everything on the website bar the EU-related documents (proposal, deliverables, reports,…) should be made public – Sarah
2. Decisions and information:
· CMIP5 metadata:
o CMIP5 will provide Metafor’s real-world practical example.
o Who governs changing the structure of the CIM in response to CMIP5? We do.
o Metafor would like to recommend mandatory minimum content for CMIP5 metadata documents.
o The CIM tells us how to structure the information we collect from the questionnaire. The questionnaire structure tells us how to collect that information. We need fully fleshed out sequence diagrams/storylines to inform the structure of the questionnaire.
o We need to ingest metadata from the netCDF data files as well as the questionnaire.
o We would like PCMDI to not allow datasets to be published until the metadata is complete. (This is still under negotiation, as is the definition of “complete”)
o Links to other documents should not be included in the documents themselves, instead all documents should have unique identifiers that are linked by a third party register.
o We need one coherent voice when speaking to the modelling groups about metadata. Hence the questionnaire can be branded CMIP5 (or another name) instead of Metafor without any problems.
o The CMIP5 panel will govern CMIP5 controlled vocabulary.
o For the CMIP5 questionnaire, there will be controlled vocabulary for the model name, which will be unique. Model name will provide a starting point for finding out how the model is configured.
o CMIP5 data is expected to come in to the data centres no earlier than July 1st 2009. The data centres will need enough metadata to catalogue the incoming data, and will require the rest to publish the datasets.
o For the list of diagnostic and prognostic variables of interest to scientists, we only need to concentrate on the ones in the CMIP5 output variables list.
o For every model component we should ask about the output and input variables. It was pointed out that the Met Office (and other modelling groups) can produce this information in their own structure, but it will take work to translate that to the CIM.
o We can discuss post-processing in the same way as discussing coupling. These topics will have a different set of questions in the questionnaire.
o We only need to record coupling information at the level required for CMIP5.
o For CMIP5 the change property is not needed as we only need to say that the modeller is basing a new instance on an old instance. Hence for CMIP5, change property should not be used.
o Gridspec netcdf files will be submitted separately from the data files. Charles Doutrieux from PCMDI is writing code to harvest gridspec information from netcdf files into xml.
o We can populate some of the grids package from gridspec netcdf files.
o The BADC and WDCC will work on building empty questionnaires in parallel, one using GeoNetwork and the other building tools from scratch using Django. This will be dependent on the questionnaire sequence diagram/storyline. Once they’re built, all project members should test them.
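The decision above that documents should carry unique identifiers and be linked through a third-party register, rather than embedding links to each other, can be illustrated with a minimal sketch. Everything here (the LinkRegister class, the "usesModel" relation name, the document fields) is a hypothetical illustration and not part of the CIM or of any agreed design.

```python
import uuid
from collections import defaultdict


class LinkRegister:
    """Hypothetical third-party register. Documents know only their own ID;
    relationships between documents are stored here, outside the documents."""

    def __init__(self):
        # source ID -> set of (relation, target ID) pairs
        self._links = defaultdict(set)

    def link(self, source_id, relation, target_id):
        self._links[source_id].add((relation, target_id))

    def targets(self, source_id, relation):
        pairs = self._links.get(source_id, ())
        return sorted(t for r, t in pairs if r == relation)


def new_document(kind, title):
    # The document holds a unique identifier and no links to other documents.
    return {"id": str(uuid.uuid4()), "kind": kind, "title": title}


model = new_document("model", "example model description")
simulation = new_document("simulation", "example simulation description")

register = LinkRegister()
register.link(simulation["id"], "usesModel", model["id"])
print(register.targets(simulation["id"], "usesModel") == [model["id"]])
```

Because the relationship lives only in the register, either document could be revised or republished without editing the other.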
· Controlled vocabulary:
o Modellers will maintain the constraints on the controlled vocabulary, not us.
o Regarding controlled vocabulary, we need to distinguish control of Names and Values from Structure. Structure determines how you compose your vocabulary.
o In Metafor, there is internally governed controlled vocabulary and externally governed vocabulary.
o Freemind is a useful communication tool for interacting with other scientists.
o The question posed to scientists when developing the controlled vocabulary mindmap was “What do you need to know to differentiate many models when looking at their output?”
o The modelling group is expected to run the same model for all experiments in a given focus area, though this may not happen in practice.
o Users of the questionnaire should be allowed to copy the information given for a different experiment and overwrite it to save as a new experiment, capturing the changes from the previous one without having to re-enter the information that hasn’t changed.
o The CMIP5 controlled vocabulary should use the same characteristic/coefficient structure as proposed by Marie-Pierre. It’s possible for the controlled vocabulary to be hierarchical and we want this to be the case.
o Some controlled vocabulary may be component-independent, e.g. Numerical component.
o We need to get copies of all the existing (and soon-to-be-created) CMIP5 controlled vocabulary lists and ingest them into the CIM.
o The controller of the mind maps is to maintain the maps’ versioning and comments.
o In the short term, we will reconcile conflicting scientific information in the controlled vocabulary by making an executive decision ourselves. In the long term, this will be done by the (yet to be formed) standards committee.
o Interact more with the climate modelling communities to get more potential user opinions on the CIM and the controlled vocabulary. People in charge of coordination:
§ Atmosphere: Marie-Pierre (to interview: Michel Dequé, Fred Hourdin, Marco Giorgetta, Met Office person, Bruce Wyman GFDL, Gary Strand NCAR cf. CCSM.ucar.edu/working_groups)
§ Ocean: Eric (to interview: Gurvan Madec, Steve Griffies, Met Office person, Helmut Haak MPI, NCAR)
§ Sea ice: Eric (to interview: LIM person LLN, David Salas, Andrew+Mike Winton GFDL, Helmut Haak)
§ Land surface: Marie-Pierre (to interview: Herve Douville, Jan Polcher, Sergey Malyshev GFDL, NCAR)
§ Atmosphere Chemistry: Lois + Charlotte (to interview: Met Office Graham + Fiona; GFDL Larry Horowitz, NCAR Gary Strand + Laurence Buja, Mainz people, KNMI Peter VVH, Vincent-Henry Puech)
§ Land ice: Marie-Pierre (to interview: Herve Douville, IPSL Geerhart Krinner)
§ Ocean biogeochemistry: Eric (to interview: Lauren Bopp, John Dunne, Helmut Haak)
§ River routing: Marie-Pierre (to interview: Stefan Hagemann, Bertrand Decharme, Jan Polcher)
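The decision above that questionnaire users should be able to copy the information given for one experiment and overwrite it as a new experiment could look roughly like the sketch below. The function names, field names and values are all invented for illustration; how the questionnaire tools will actually store responses has not been decided.

```python
import copy


def derive_response(previous, new_experiment, changes):
    """Copy a previous response and overwrite only the answers that differ."""
    derived = copy.deepcopy(previous)  # the previous response is left untouched
    derived["experiment"] = new_experiment
    derived["answers"].update(changes)
    return derived


def changed_answers(previous, derived):
    """Return the answers in the derived response that differ from the original."""
    return {
        key: value
        for key, value in derived["answers"].items()
        if previous["answers"].get(key) != value
    }


# Hypothetical responses: all names and values are placeholders.
control = {
    "experiment": "experiment_1",
    "answers": {"model_name": "example-model", "forcing": "forcing_a"},
}
perturbed = derive_response(control, "experiment_2", {"forcing": "forcing_b"})
print(changed_answers(control, perturbed))  # {'forcing': 'forcing_b'}
```

Only the changed answer has to be typed in, and the difference from the earlier experiment can be recovered by comparison.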
· CMIP5 questionnaire:
o Requirements:
§ The questionnaire should be as flexible as possible, to make it easy for the users to use. It should also allow users to tell us things in a different order from our proposed question flow.
§ We never want to ask a user the same question twice. The questionnaire should only be changed/updated by us in such a way that new information can be added without users having to update information they have already entered.
§ There should be restrictions on what can be entered as responses to the questionnaire. We should use drop down lists of controlled vocabulary where possible, with extra functionality to make it easy for users to navigate through long lists of controlled vocabulary. Free text feedback should be available to inform us if the drop down list is incomplete.
§ We should start with simple questions and get more complicated (progressive disclosure)
§ We need to capture Curator/ESG controlled vocabulary and keep compatibility with it.
§ Metafor can and will spend a couple of weeks correcting and quality controlling the metadata from the questionnaire. The questionnaire software should be able to flag questions with missing answers and guide the user to where those questions are.
§ The questionnaire needs to be all about CMIP5 and only about CMIP5. It has to include a way to collect the extra information the modellers may want to tell us that we haven’t asked about.
§ It should ask for more information than was provided for CMIP3.
§ It should satisfy the CMIP5 use cases and any information we ask for should produce CIM instances for the use cases we came up with.
§ It should allow the user to save a response, but not validate it, to allow the information to be added in more than one session (re-edit/save/update/increment functionality)
§ Users should be able to edit the questionnaire after submission to correct mistakes in the submitted metadata.
o We shouldn’t rule out having a faceted browse questionnaire, as it allows a lot of flexibility (at the cost of having the questions as triples).
o We will put the questionnaire together and then decide what questions should be mandatory.
o In the case of ensembles, the questions about the model used should be filled in once, while the changes in forcing can be filled in once for each run of the ensemble.
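Several of the requirements above (controlled vocabulary drop-down lists, saving a response without validating it, and flagging questions with missing answers) can be sketched together as follows. This is only an assumed shape for the data: the question identifiers and vocabulary values are invented and are not taken from any CMIP5 list.

```python
# Hypothetical question definitions. A vocabulary of None means free text.
QUESTIONS = {
    "model_name": {"vocabulary": None},
    "atmos_scheme": {"vocabulary": ["scheme_a", "scheme_b"]},
    "ocean_scheme": {"vocabulary": ["scheme_x", "scheme_y"]},
}


def check_draft(questions, answers):
    """Report problems in a saved draft without rejecting it.

    Returns (missing, outside_vocabulary): question IDs with no answer, and
    question IDs whose answer is not in the controlled vocabulary. The second
    list doubles as feedback that a drop-down list may be incomplete.
    """
    missing = [q for q in questions if not answers.get(q)]
    outside_vocabulary = [
        q
        for q, spec in questions.items()
        if answers.get(q)
        and spec["vocabulary"] is not None
        and answers[q] not in spec["vocabulary"]
    ]
    return missing, outside_vocabulary


# A partially completed draft, saved between sessions.
draft = {"model_name": "example-model", "atmos_scheme": "scheme_c"}
missing, outside_vocabulary = check_draft(QUESTIONS, draft)
print(missing)             # ['ocean_scheme']
print(outside_vocabulary)  # ['atmos_scheme']
```

The draft is never refused; the two lists simply tell the user, and later the Metafor quality control pass, where attention is needed.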
· CIM requirements:
o A future objective is to have the CIM applicable to other areas of science, e.g. forecasting, reanalysis etc.
o With regards to the CIM, the following requirements were identified (in collaboration with the climate modelling community):
§ Only describe configured models
§ Change property should be able to add a new component
§ Model properties should only be prognostic variables
§ Detailed sequencing of the model is not needed
§ Metadata from the netCDF files should be preserved and exposed in the CIM
o It’s important to get the controlled vocabulary lists urgently, but it’s not so urgent to get the controlled vocabulary into the CIM.
o The change property is now to be considered as a holding place as we’re not going to implement it for the moment.
o A coupler binds input and output data streams of components either to each other or to a physical file or parameter list. The data stream will have a type. The coupling connection should always be specified at the level of the parent of the two child components being coupled.
o The key question for the use cases for Metafor is “what do people want to search on?”
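The coupler description above (typed data streams, with the connection specified at the level of the parent of the two coupled child components) can be illustrated with a small sketch. The class and field names are assumptions made for this example and do not reflect the actual CIM UML; for brevity it covers only component-to-component coupling, not binding to a file or parameter list.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Coupling:
    source: str       # name of the component providing the data stream
    target: str       # name of the component receiving the data stream
    stream_type: str  # the type of the data stream


@dataclass
class Component:
    name: str
    parent: Optional["Component"] = None
    couplings: List[Coupling] = field(default_factory=list)


def couple(source: Component, target: Component, stream_type: str) -> Component:
    """Record a coupling on the shared parent of two child components."""
    if source.parent is None or source.parent is not target.parent:
        raise ValueError("coupled components must share the same parent")
    parent = source.parent
    parent.couplings.append(Coupling(source.name, target.name, stream_type))
    return parent


# Hypothetical component names and stream type, for illustration only.
coupled_model = Component("coupled_model")
atmosphere = Component("atmosphere", parent=coupled_model)
ocean = Component("ocean", parent=coupled_model)

couple(atmosphere, ocean, "surface_flux")
print(len(coupled_model.couplings))  # 1
print(atmosphere.couplings)          # []
```

Storing the connection on the parent keeps each child component description independent of what it happens to be coupled to in a given configuration.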
· The end users of the Metafor project are:
o Climate modellers
o Climate effects community – professional data users who use climate data (e.g. hydrologists, ecologists) and will use the model outputs
o Climate impact community – policy makers and resource managers
o For the second and third groups, the usage will depend on the tools we create and the ease of navigation. We expect that the scientists will see different metadata than the policy makers. The policy makers will be dealt with by IS-ENES to a large extent.
o The interactions with these groups will affect the structure of the metadata. Metafor should be aware of this, but not try to solve it.
· Management and dissemination:
o We agreed to use the Joomla site for storing admin documents, meeting presentations etc., and to use the wiki and trac for all working documents.
o All issues need to be turned into tickets and reviewed regularly.
3. Issues:
· How deep does the CIM need to go for climate modelling/ being field-specific as opposed to being generalised to include other fields?
· Change structure (resolved Wed)
· CF information model
· Ingestion from data to CIM (THREDDS etc. and ESG Publisher)
· Access control (EGEE versus ESG/NDG)
· Distributed database model
· Controlled vocabularies and UML (discussed during meeting but on-going)
· The BADC have a prototype Metafor portal website, but need instances of the CIM to build the infrastructure around.
· What do we include in the controlled vocabulary with regards to external dictionaries (e.g. CF standard names)?
· How much detail do we need to ask for in the questionnaire?
· Should there be mandatory questions in the questionnaire?
· When Metafor is finished, we need to have some governance of the CIM in place. How do we arrange this?
· How can Metafor and Curator touch base regularly to discuss priorities etc?
· We need to interact more closely with modellers from now on to communicate about controlled vocabulary, train them about the CIM and get their feedback.
· How should we interact with ESA and where should we put our resources for interaction?
· Is there a universal structure for controlled vocabulary we could define and work on, or is it just a flat list with the structure associated with a given activity (e.g. CMIP5)?
· Regarding Marie-Pierre’s work on translating controlled vocabulary into the CIM: discussion needs to continue regarding the fact that characteristic/coefficient/property are all very similar from the UML point of view.
· We need a procedure to map the information we get from the questionnaire onto the CIM.
· How do we make links between the model data and ancillary files?
· How do we ask about coupling information in the questionnaire?
· How do we create the repository that links CIM documents together?
· How do we use the quality package? (The structure of the package is detailed in Mark’s presentation)
Next telco: Thurs 19th February, 9:30 BT/10:30 CET
Notes by Eric Guilyardi and Sarah Callaghan, 17th February 2009