Re: SGML transfer format for discussion
Kate Ord wrote:
>
> In support of the implementation of the Metadata Guidelines, the ANZLIC
> Working Group on Metadata has developed an SGML transfer format for discussion
> and comment.
Unsurprisingly, metadata got a reasonable work-out at the
Australian WWW Technical Conference convened by DSTC in Brisbane
last week ( http://www.dstc.edu.au/aw3tc/ ). The general focus
was on "Dublin Core" and its descendants and refinements.
A couple of papers were presented which also included
geographic data in case studies (Boston, Cox).
My understanding of the current state of play, and of expectations
about how this will all be implemented, involves the following layers:
* various metadata semantics (DC, ANZLIC, FGDC ... and combinations),
* expressed using PICS-ng ("next generation", believe it or not!)
  (attribute, value) syntax,
* encoded variously, but XML looks like getting the guernsey
  as the main human-readable/web-retrievable format,
  with Z39.50 gateways providing compatibility with that universe.
I like the look of XML. It is another metalanguage like SGML,
but is more restricted and prescriptive, so parsers will
be easier to implement. Being a metalanguage, it is of course
much richer than HTML, yet it looks very familiar to anyone
who has written HTML. (Apparently Micro$oft are
planning to move to XML as their native document format for
applications, including MSWord! Sun's documentation has
already switched. Imagine, those guys getting together ...)
I can foresee a situation where it would be convenient to use
XML for all web document storage, with metadata either
embedded or detached. Linking mechanisms for XML
are being very carefully designed. An HTTP server could then be
configured to process a document in response to the needs of the client,
e.g.
* a "conventional" HTML document could be generated, or
* a PICS description of the document could be served, which might encode
    * a Dublin Core description, and/or
    * an ANZLIC description
  extracted from the doc, and expressed again in XML syntax.
Specific terms _may_ be equivalent between DC, ANZLIC, etc, in
which case they may only need to be stored once in the source
document, and mappings would be made using DSSSL-type processing
instructions, depending on what was requested.
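To make the "store once, map on request" idea concrete, here is a rough
sketch in Python. It is only an illustration under assumptions: the source
document layout, the stored term names (title, owner, created) and the
target element names in both mappings are placeholders I made up, not
names taken from the DC or ANZLIC specifications, and a plain lookup table
stands in for the DSSSL-style processing.

```python
import xml.etree.ElementTree as ET

# Hypothetical source document: each term is stored exactly once.
# All element names here are illustrative only.
SOURCE = """<doc>
  <meta>
    <title>Example dataset</title>
    <owner>Example Agency</owner>
    <created>1997-05-12</created>
  </meta>
  <body>Document text goes here.</body>
</doc>"""

# One mapping per metadata scheme. The target names are placeholders,
# not taken from either specification.
MAPPINGS = {
    "dc": {"title": "DC.title", "owner": "DC.creator", "created": "DC.date"},
    "anzlic": {"title": "Title", "owner": "Custodian", "created": "BeginningDate"},
}


def describe(source_xml, scheme):
    """Extract the stored terms and re-express them for the requested scheme."""
    root = ET.fromstring(source_xml)
    out = ET.Element("description", {"scheme": scheme})
    for stored, target in MAPPINGS[scheme].items():
        node = root.find("./meta/" + stored)
        if node is not None:
            # Copy the single stored value under the scheme-specific name.
            ET.SubElement(out, target).text = node.text
    return ET.tostring(out, encoding="unicode")


if __name__ == "__main__":
    print(describe(SOURCE, "dc"))
    print(describe(SOURCE, "anzlic"))
```

A server would pick the scheme from whatever the client asked for; where
terms are not truly equivalent between schemes, a simple table like this
would not be enough.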
At the client, XML coming in will also be processed by
an XML parser, with reference to the specified XML DTD.
To be honest, I'm still a little woolly about why the PICS
layer is inserted - it would not be hard to just encode
ANZLIC etc in XML without worrying about the extra mapping,
but everyone is talking PICS for general metadata, so I guess
there is some compatibility issue here.
Anyway - I guess my main messages are that
(i) HTML will be superseded for information storage,
though it may persist as a presentation language;
(ii) along with this, the <meta> HTML element
(always a kludge, particularly as it was so flexible
that it was difficult to control and ensure validity)
will be superseded by XML "containers", customised
for the particular metadata semantics;
(iii) for web applications, SGML DTDs will be a thing
of the past, to be superseded by XML DTDs, which look
very similar but are easier to write.
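As a rough illustration of point (ii), the following Python sketch pulls
the name/content pairs out of HTML <meta> elements and re-expresses them
in an XML container. The container layout (a "metadata" element holding
"item" children) is purely an assumption for the example, not a proposed
DTD, and the sketch checks only well-formedness, not validity.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class MetaCollector(HTMLParser):
    """Collect (name, content) pairs from <meta> elements."""

    def __init__(self):
        super().__init__()
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            a = dict(attrs)
            # Ignore <meta> elements that lack either attribute.
            if "name" in a and "content" in a:
                self.pairs.append((a["name"], a["content"]))


def meta_to_container(html, container="metadata"):
    """Re-express HTML <meta> pairs as a hypothetical XML container."""
    collector = MetaCollector()
    collector.feed(html)
    root = ET.Element(container)
    for name, value in collector.pairs:
        item = ET.SubElement(root, "item", {"name": name})
        item.text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    page = ('<html><head><meta name="DC.title" content="Example">'
            '<meta name="DC.creator" content="A. Author"></head></html>')
    print(meta_to_container(page))
```

A container customised for particular semantics would use its own element
names and a DTD, so that a validating parser could reject anything that
did not fit, which is exactly what <meta> cannot offer.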
If anyone spots errors here, please shout!
--
__________________________________________________
Dr Simon Cox - Australian Geodynamics Cooperative Research Centre
CSIRO Exploration & Mining, PO Box 437, Nedlands, WA 6009 Australia
T: +61 8 9389 8421 F: +61 8 9389 1906 simon@ned.dem.csiro.au
http://www.ned.dem.csiro.au/unrestricted/people/CoxSimon/