
Re: SGML transfer format for discussion



Kate Ord wrote:
> 
> In support of the implementation of the Metadata Guidelines, the ANZLIC
> Working Group on Metadata has developed an SGML transfer format for discussion
> and comment.

Unsurprisingly, metadata got a reasonable work-out at the 
Australian WWW Technical Conference convened by DSTC in Brisbane 
last week ( http://www.dstc.edu.au/aw3tc/ ).  The general focus 
was on "Dublin Core" and its descendants and refinements.  
A couple of papers were presented which also included 
geographic data in case studies (Boston, Cox).  
My understanding of the current state-of-play/expectations 
about how this will all be implemented contains the following layers:

* various metadata semantics (DC, ANZLIC, FGDC ... and combos),
* expressed using PICS-ng ("next generation", believe it or not!) 
	(attribute,value) syntax,
* encoded variously - but XML looks like getting the guernsey 
	as the main human-readable/web-retrievable format, 
	with Z39.50 gateways providing compatibility with that universe. 

I like the look of XML - it is another metalanguage like SGML, 
but is more restricted/prescriptive than SGML, so parsers will 
be easier to implement.  Being a metalanguage, it is of course 
much richer than HTML, yet it looks very familiar to anyone 
who has written HTML.  (Apparently Micro$oft are 
planning to move to XML as their native document format for 
applications, including MSWord!  Sun's documentation has 
already switched.  Imagine, those guys getting together ...)

I can foresee a situation where it would be convenient to use 
XML for all web document storage, with either embedded or 
detached metadata.  Linking mechanisms for XML 
are being very carefully designed.  An HTTP server could then be 
configured to process a document in response to the needs of the 
client, eg
* a "conventional" HTML document could be generated, or
* a PICS description of the document served, which might encode
	* a Dublin Core description, &/or
	* an ANZLIC description
  extracted from the doc, and expressed again in XML syntax.
Specific terms _may_ be equivalent between DC, ANZLIC, etc, in 
which case they may only need to be stored once in the source 
document, and mappings would be made using DSSSL-type processing 
instructions, depending on what was requested.  
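
To make the "store once, map on request" idea concrete, here is a rough 
Python sketch.  It is only an illustration under assumptions: the 
element names in SOURCE, the labels in MAPPINGS and the describe() 
function are all hypothetical, and a plain dictionary lookup stands in 
for the DSSSL-style processing, which is not shown.

```python
import xml.etree.ElementTree as ET

# Hypothetical source document: each term is stored once, under a
# neutral name that belongs to neither DC nor ANZLIC.
SOURCE = """<document>
  <meta>
    <title>Example dataset</title>
    <creator>Example Agency</creator>
    <summary>Illustrative record only.</summary>
  </meta>
  <body>Document text goes here.</body>
</document>"""

# Assumed (illustrative) mappings from the neutral names to the labels
# each metadata scheme might use.  A real mapping would be agreed
# between the schemes, not invented like this.
MAPPINGS = {
    "dc": {"title": "DC.title", "creator": "DC.creator",
           "summary": "DC.description"},
    "anzlic": {"title": "Title", "creator": "Custodian",
               "summary": "Abstract"},
}


def describe(source_xml, scheme):
    """Extract the stored terms and re-express them for one scheme."""
    root = ET.fromstring(source_xml)
    meta = root.find("meta")
    out = ET.Element("description", {"scheme": scheme})
    for neutral, label in MAPPINGS[scheme].items():
        node = meta.find(neutral)
        if node is not None:
            item = ET.SubElement(out, "item", {"name": label})
            item.text = node.text
    # Return the description as XML text, as the server would serve it.
    return ET.tostring(out, encoding="unicode")
```

A server would call describe(SOURCE, "dc") or describe(SOURCE, "anzlic") 
depending on what the client asked for; the source document itself is 
never duplicated.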

At the client, XML coming in will also be processed by 
an XML parser, with reference to the specified XML DTD.  

To be honest, I'm still a little woolly about why the PICS 
layer is inserted - it would not be hard to just encode 
ANZLIC etc in XML without worrying about the extra mapping, 
but everyone is talking PICS for general metadata, so I guess 
there is some compatibility issue here.  
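
For comparison, encoding a record directly in XML without the PICS 
layer might look something like the following Python sketch.  The 
"anzlic-record" container and the field names are made up for 
illustration and are not taken from the ANZLIC transfer format, and 
no DTD validation is attempted (the standard-library parser only 
checks well-formedness).

```python
import xml.etree.ElementTree as ET


def build_record(fields):
    """Encode a flat set of metadata fields as a small XML container."""
    record = ET.Element("anzlic-record")  # hypothetical container name
    for name, value in fields.items():
        child = ET.SubElement(record, name)
        child.text = value
    return ET.tostring(record, encoding="unicode")


def read_record(xml_text):
    """Parse the container back into a dictionary (client side)."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text for child in root}
```

The round trip read_record(build_record(fields)) gives back the 
original fields, which is all the "extra mapping" would need to 
preserve.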

Anyway - I guess my main messages are that 
(i) HTML will be superseded for information storage - 
though it may persist as a presentation language;
(ii) along with this, the <meta> HTML element 
(always a kludge, particularly as it was so flexible 
that it was difficult to control and ensure validity) 
will be superseded by XML "containers", customised 
for the particular metadata semantics;   
(iii) for web applications SGML DTDs will be a thing 
of the past, to be superseded by XML DTDs, which look 
very similar, but are easier to write.  

If anyone spots errors here, please shout!  
-- 
__________________________________________________
Dr Simon Cox - Australian Geodynamics Cooperative Research Centre
CSIRO Exploration & Mining, PO Box 437, Nedlands, WA 6009 Australia
T:  +61 8 9389 8421   F:  +61 8 9389 1906   simon@ned.dem.csiro.au
http://www.ned.dem.csiro.au/unrestricted/people/CoxSimon/