wiki:tickets/281

When should references use XPath, and when should they embed documents locally?

This page provides documentation for ticket:281.

The reference stereotype can either use XPath to link to an external document or embed that document locally at the point of reference. Embedding allows a document containing several references to be stored as a single document instead of a confusing set of interlinked documents.

Storing multiple documents as a single entity is convenient, but those nested documents no longer have their own lifecycle and cannot be separately governed.
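
To make the two forms concrete, here is a minimal sketch in Python. The element and attribute names (`reference`, `xpath`, `modelComponent`, `climateModel`) and the in-memory repository are assumptions made for illustration; they are not taken from the CIM schema. The function resolves each XPath link against a store of separately held documents and returns a copy with the targets embedded, leaving the linked original untouched.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a repository of separately stored documents.
REPOSITORY = ET.fromstring(
    "<repository>"
    "<modelComponent id='atmos'><name>Atmosphere</name></modelComponent>"
    "<modelComponent id='ocean'><name>Ocean</name></modelComponent>"
    "</repository>"
)

# A document that links to an external component with an XPath expression.
LINKED = ET.fromstring(
    "<climateModel>"
    "<reference xpath=\".//modelComponent[@id='ocean']\"/>"
    "</climateModel>"
)


def embed_references(document, repository):
    """Return a copy of `document` with each XPath reference replaced
    by a copy of the element it points to in `repository`."""
    result = copy.deepcopy(document)
    # Snapshot the tree first so that replacing children does not
    # disturb the traversal; embedded copies are not re-scanned.
    for parent in list(result.iter()):
        for index, child in enumerate(list(parent)):
            if child.tag != "reference":
                continue
            target = repository.find(child.get("xpath"))
            if target is None:
                raise LookupError("unresolved reference: " + child.get("xpath"))
            parent[index] = copy.deepcopy(target)
    return result


embedded = embed_references(LINKED, REPOSITORY)
print(ET.tostring(embedded, encoding="unicode"))
```

The linked form keeps `ocean` as a separately governed document, while the embedded copy is self-contained but carries a snapshot that can drift from the original. The sketch does not resolve references nested inside embedded content.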

One proposal is to persist every CIM document as a separate artifact, but to export/transport them as a single artifact. This brings with it other issues. See Phil's email:

Hi Allyn,
 
Some thoughts below. Apologies if they're obvious.
 
Phil

  > From: trubliphone@googlemail.com [mailto:trubliphone@googlemail.com] On Behalf Of A. Treshansky
  > Sent: Monday, July 20, 2009 4:14 PM 

  > To: Elkington, Mark; Bentley, Philip
  > Subject: references - embedded vs. xquery

  > Mark, Phil,

  > Mark's question about how to link to existing documents raised a concern for me.  
  > What happens if you are creating a new CIM document which refers to an existing CIM document?  
  > Both of you had expressed concern about having a large set of interlinked references to documents 
  > and so I modified the CIM to support references which embed the referenced document content instead of just pointing to it via XQuery.
  > However, if it's embedded, it's no longer a separate document with its own lifecycle and it makes it harder (though not impossible)
  > for others to reference it.  
     
    In a way I think this is an example of the 'granularity' problem that we were just discussing during our conference call. 
    Our experiences seem to suggest that current XML technologies are not especially well-suited for persisting and managing large numbers of
    interrelated data granules/atoms. We (well, Mark) got around the problem, for the most part, by using good ole rdbms ... 
    but I'm beginning to sound like a stuck record!
     
  > One solution to this is to force all documents to be stored in repositories as separate documents but for all (or some specific sub-set of) 
  > referenced documents within a single document to be exported as embedded.   
  > That is, a hierarchical set of components is persisted as a set of linked documents but transferred as a single large document.
     
    I think if you were to rephrase that first sentence to read
    "...force all information granules/atoms to be stored in the repository as separate data objects..." you'd be there. 
    I don't think we should be storing documents; rather we should be storing 'records'. 
    As you hint at in the latter half of the sentence, documents are things you generate/export - they're presentational artifacts IMHO.
     
    IIRC, in my response to your recent sequence document, I suggested introducing a ClimateModel entity as a top-level <document> item. 
    This would be a document made up of 1..* ModelComponent definitions. The latter would NOT be documents, but simply data objects in 
    your repository (be it XML or RDBMS). One could extend this strategy across the CIM.


  > Of course, this has its own issues - such as where do the document-specific features of embedded documents come from 
  > (given that they were never entered into a repository as a separate document)?
     
    If you are intent on going down the XML persistence route then the only suggestion I can make is to move all the document-related stuff
    into xslt/mako/genshi/whatever templates and use plain XML to store raw content. 
    Then a document is defined by whatever search/report interface or scripts you provide to end-users. So here at MOHC we have...
     
    db-records + gridspec-mako-template => CIM GridSpec document
    db-records + model-mako-template => CIM Model document
    db-records + expt-mako-template => CIM Experiment document
    etc...
     
    We'll hide the bits on the left-hand side of these equations behind buttons in our web interface. 
    The mako templates define what a document is, not the content stored in the database. We have no documents in our database. 
    I suspect the ESG solution is similar, even if their notion of a document is simply the information you see in your web browser 
    when you perform a search.
      
  > I'm not sure if I'm making sense here or if I'm just confusing myself.  
  > The CIM currently supports both approaches - references via XQuery and references via local embedding.  
  > I think that this is the right approach.  But I need to bottom out when/how one approach is used instead of another. 
  > I think some sequencing [there's that word again] descriptions/diagrams would help here. 

  > Do either of you have an easy answer for this issue?  Or shall I raise it as a ticket?

  > Regards,
  > Allyn
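
Phil's "db-records + template => document" idea from the email above can be sketched roughly as follows. This is an illustration only: the record fields, the template text, and the function name are hypothetical, and Python's standard `string.Template` stands in for Mako so that the example has no external dependencies. It does not describe the actual MOHC implementation.

```python
from string import Template
from xml.sax.saxutils import escape, quoteattr

# Hypothetical raw records, standing in for rows in a database.
# Nothing here is a "document"; these are plain data objects.
RECORDS = {
    "atmos": {"name": "Atmosphere"},
    "ocean": {"name": "Ocean"},
}

# Hypothetical templates: these, not the records, define what a document is.
COMPONENT_TEMPLATE = Template(
    "  <modelComponent id=$id><name>$name</name></modelComponent>"
)
MODEL_TEMPLATE = Template("<climateModel>\n$components\n</climateModel>")


def render_model_document(component_ids, records):
    """Generate a model document on demand from stored records."""
    parts = [
        COMPONENT_TEMPLATE.substitute(
            id=quoteattr(cid),                   # adds quotes and escapes
            name=escape(records[cid]["name"]),   # escapes &, < and >
        )
        for cid in component_ids
    ]
    return MODEL_TEMPLATE.substitute(components="\n".join(parts))


document = render_model_document(["atmos", "ocean"], RECORDS)
print(document)
```

In this arrangement the exported document is generated at request time, so the same records can feed several document types without any of them being stored as a document.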

I should add guidelines to the CIM development guide ( http://metaforclimate.eu/trac/wiki/WP2/DevelopingTheCIM ) on when to use embedding and when to use external referencing. Any suggestions?