
SGML transfer format for discussion





In support of the implementation of the Metadata Guidelines, the ANZLIC
Working Group on Metadata has developed an SGML transfer format for discussion 
and comment.

To avoid email conversion problems, we have included 
the main documents as text within this email. These documents can also 
be picked up as Microsoft Word or text files from the ERIN ftp site 
(ftp://ftp.erin.gov.au/pub/sgml). They are described below:-

SGMLWHY.DOC
-----------
This document explains SGML, the transfer format and its potential 
for distributed directories. It includes an introduction to SGML. It 
also includes an explanation of why a standard format is required and 
why we have taken this particular path.

SGMLDTD.DOC
-----------
This is the critical document for SGML, called the Document Type Definition
or DTD. It defines the structure and elements of the SGML documents.

SGMLEG.DOC
----------
This document is an example of a marked up SGML document.

ANZMAP.DOC
----------
This document maps ANZLIC elements to their FGDC and GILS equivalents. It does not map
all FGDC elements to all GILS elements, only those shared with ANZLIC. It is a Word table
document and has not been included in the text of this message. It can be picked up from 
the ftp site.


Could you please send any comments to Michael Fox (michael.fox@ogdc.vic.gov.au) or 
Kate Ord (kateo@erin.gov.au) by C.O.B. on Friday 30th May.

Thanks for your consideration.

Kate and Michael

*******************************************************************************
SGMLWHY.TXT and SGMLWHY.DOC
*******************************************************************************
			PROPOSED SGML METADATA FORMAT


What is SGML?
-------------
Standard Generalized Markup Language, also known as SGML, became 
an International Organisation for Standardisation (ISO) standard, ISO 8879, 
in 1986 (http://www.sgmlopen.org:80/sgml/docs/index.htm).  It is used to define 
the structure of ASCII text files or documents. It is concerned primarily 
with the structure and not with the content of the document.  

An SGML document consists of text contained within a series of fields called 
elements, which are delimited by tags at the beginning and end of each field. 
These tags are contained within angle brackets, <>.  The beginning and ending 
tags contain the same name, but the ending tag name is preceded by a 
forward slash, /. 

SGML is designed primarily for exchanging documents and not for direct 
viewing by the user.  But if you did look at an SGML file, what would it 
look like? Below is an extract from an SGML document:-

	<ANZMETA>
	<TITLE>Eucalypts of Australia: 1996</TITLE>
	<ABSTRACT>
	<PARAGRPH>This data is a compilation of Eucalyptus species site 
	data from all over Australia. </PARAGRPH>
	</ABSTRACT>
	........
	</ANZMETA>

This format may look familiar to you, probably because you have 
looked at another group of marked up documents on the Internet: 
Hyper Text Markup Language or HTML documents.  HTML is an application 
of SGML whose tag names have been specially defined and are 
recognised by World Wide Web browsers.
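
Because elements are delimited by matching start and end tags, even a very 
small program can pull the fields out of a fully tagged document. The Python 
sketch below is an illustration only, not a conforming SGML parser: it ignores 
the DTD, assumes every end tag is present and assumes tags carry no attributes. 
The names SAMPLE and collect_fields are made up for this example.

```python
import re

# The extract shown earlier, without the "........" placeholder line.
SAMPLE = """<ANZMETA>
<TITLE>Eucalypts of Australia: 1996</TITLE>
<ABSTRACT>
<PARAGRPH>This data is a compilation of Eucalyptus species site
data from all over Australia. </PARAGRPH>
</ABSTRACT>
</ANZMETA>"""

# A token is a start tag, an end tag, or a run of text between tags.
TOKEN = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)>|([^<]+)")


def collect_fields(text):
    """Return (element path, text) pairs for each non-empty text run."""
    stack, fields = [], []
    for closing, name, data in TOKEN.findall(text):
        if name and not closing:
            # Start tag: SGML names are case-insensitive by default.
            stack.append(name.lower())
        elif name:
            # End tag: close the element only if it is the innermost one.
            if stack and stack[-1] == name.lower():
                stack.pop()
        else:
            cleaned = " ".join(data.split())
            if cleaned:
                fields.append(("/".join(stack), cleaned))
    return fields


for path, value in collect_fields(SAMPLE):
    print(path, "=", value)
```

A real system would use an SGML parser driven by the DTD, since only the DTD 
can say which elements are allowed where.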

The structure and elements contained within a standard SGML document 
are defined in another file called a Document Type Definition 
or DTD.  This DTD is read and used by programs such as word processing 
programs, indexing programs and metadata parsing programs to determine 
the structure of the SGML documents they process.

Why do we need to have a standard SGML format?
---------------------------------------------
The main use of SGML here is as an exchange format for importing and 
exporting metadata entries. It is a very useful format for transferring 
metadata and presenting it in a form which can be read by 
databases and programs for searching, checking, reporting and other 
functions. 

SGML also has the potential to be used as a core component of a national 
distributed directory system. SGML documents can be created directly or 
output from metadatabases.  These documents can then be indexed, 
searched and presented to the user via the World Wide Web using a range 
of technologies for searching distributed sites, including freeWAIS and 
Isite.

Examples of this include two Internet data directories - the Marine and 
Coastal Data Directory of Australia - Blue Pages 
(http://www.environment.gov.au/cgi-bin/mcdd)
and Geographic Data Victoria’s data directory 
(http://www.gisnet.vic.gov.au/). 

Other SGML metadata standards
-----------------------------
The United States has initiated a number of metadata standards which are 
gaining international recognition, including the FGDC’s GEO standard for 
describing Digital Geospatial Metadata 
(http://www.fgdc.gov/clearinghouse/docs/encoding.html) and the GILS or 
Global Information Locator Service standard for describing information 
resources (http://info.er.usgs.gov/gils/index.html). Both of these standards 
have chosen SGML as their transfer format and have defined their 
own DTDs.  By using consistent metadata SGML tags wherever 
possible, we greatly improve interoperability in directory 
searching.

The ISO/TC 211 Working Group on Geographic Information is also 
currently working on the development of an international metadata 
standard which incorporates SGML as a standard transfer format 
for input and output of metadata entries.

Key Elements of the proposed ANZLIC SGML format
-----------------------------------------------
As much as possible, the DTD reflects the structure of the ANZLIC Core 
Elements as outlined in the "Metadata Guidelines - Version 1.0", July 
1996. However, at times the DTD departs from this structure to make 
documents easier to search across the Internet.  

The DTD has a number of key elements which need to be highlighted 
when evaluating its usefulness.

1. Simple structure - The DTD is very simple, with no attributes and 
no entities beyond the declarations for the standard ISO special 
character sets.  The primary aim of this SGML format is to structure the 
text and not to check its content for valid entries. That function will be 
performed by the databases in which it is stored. This approach echoes 
the approaches of other metadata SGML standards.  It also allows 
maximum flexibility for any format or other changes which might occur 
as the ANZLIC metadata guidelines are reviewed and improved.
 
2. Eight character tag names - Tag names are condensed to at most eight 
characters to comply with some SGML program limits and to reduce 
the size of the SGML document.  As much as possible, the DTD uses 
the existing GEO tags. The key for tag names and full element names is 
documented at the beginning of the DTD.
 
 One of the advantages of this approach is that ANZLIC and FGDC 
systems and SGML documents are instantly compatible. This enhances 
the interoperability of our systems.  
 
 The major disadvantage however is that these names are harder to 
understand and may be less accessible to the reader of the SGML 
document itself.
 
3. Additional elements - There are a number of additional elements 
which have been included for the purposes of structuring the ANZLIC 
elements. The unique ID has been included for national directory 
purposes. 
 
There are also four additional elements, the Bounding Coordinates, 
which give summary level geographic information which can be used 
when performing spatial searches on the SGML documents across the 
Internet.
 
The ANZLIC fields "Custodian" and "Jurisdiction" together identify the 
key organisation responsible for the data. This concept of custodianship 
is unique to Australia. While these fields do not map exactly to the 
FGDC and GILS field "Originator", Originator is a key search field across 
international directories. For implementation reasons, an additional 
element, origin, has been proposed, made up of custod and jurisdic.
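
The spatial search that the Bounding Coordinates are meant to support comes 
down to a rectangle overlap test. The Python sketch below shows one way it 
could work. The field names follow the northbc, southbc, eastbc and westbc 
tags; the Bounding class, the overlaps function and the two query windows are 
assumptions made for illustration, and the test assumes that no box crosses 
the 180 degree meridian.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bounding:
    """Bounding coordinates in decimal degrees (south and west negative)."""
    northbc: float
    southbc: float
    eastbc: float
    westbc: float


def overlaps(a, b):
    """True if the two boxes share any area or edge.

    Assumes neither box crosses the 180 degree meridian.
    """
    return (a.southbc <= b.northbc and b.southbc <= a.northbc
            and a.westbc <= b.eastbc and b.westbc <= a.eastbc)


# Continental extent of the kind a national dataset might record.
record = Bounding(northbc=-10, southbc=-44, eastbc=154, westbc=112.5)

# Hypothetical search windows typed in by a directory user.
window_inside = Bounding(northbc=-34, southbc=-39, eastbc=150, westbc=141)
window_outside = Bounding(northbc=10, southbc=0, eastbc=100, westbc=90)

print(overlaps(record, window_inside))
print(overlaps(record, window_outside))
```

Because only four numbers are compared per record, this test is cheap enough 
to run over a whole directory before any detailed polygon comparison.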

Process for developing the proposed ANZLIC SGML format
------------------------------------------------------
A number of members of the Working Group have been consulting with 
Australian and overseas experts to put together a proposed format for 
SGML.  It is hoped that this format will be used in conjunction with the 
ANZLIC Working Group's "Metadata Guidelines - Version 1.0" to enable 
easy transfer of metadata and contribute to national directory initiatives.

The ANZLIC Metadata Working Group now submits the following DTD 
and example SGML format for review by users on the ozmeta-l discussion 
list.  Please send comments to the following members by Friday 30th May 
1997:

Kate Ord
ERIN Regional Information Section
Environment Australia
email: kateo@erin.gov.au
ph:  06 274 1201
fax: 06 274 1333

or 

Michael Fox
Geographic Data Victoria
email: Michael.Fox@ogdc.vic.gov.au
ph:  03 9603 9041
fax: 03 9603 9199

*******************************************************************************
SGMLDTD.TXT and SGMLDTD.DOC
*******************************************************************************
	
<!DOCTYPE ANZMETA [

<!-- Name: 			ANZMETA DTD Version 1.0 -->
<!-- Purpose:			This Document Type Declaration defines the ANZLIC Metadata -->
<!--				Core Element Structure for use with SGML compliant parsers,   -->
<!--				viewers and  other tools.  The DTD aims to be compatible with   -->
<!--				other Z39.50 Metadata profiles. -->
<!-- Reference:			The Australia New Zealand Land Information Council - Metadata -->
<!--				Guidelines July 1996. -->
<!-- Date:			1.5.1997 -->
<!-- Author:			ANZLIC Working Group on Metadata. -->
<!-- File Ref:			sgmldtd.txt -->

<!-- Key to SGML codes -- >
<!-- ANZLIC elements:		Dataset			=	citeinfo	-- >
<!--				Unique Record ID	=	uniqueid	-- >
<!--				Title			=	title		-- >
<!--				Originator		=	origin 		-- >
<!--				Custodian		=	custod 		-- >
<!--				Jurisdiction		=	jurisdic 	-- >
<!--				Description		=	descript	-- >
<!--				Abstract		=	abstract	-- >
<!--				Search Word(s)		=	themekey	-- >
<!--				Geographic Extent Name =	placekey	-- >
<!--				Geographic Extent Polygon =	dsgpolyo	-- >
<!--				Data Currency		=	timeinfo	-- >
<!--				Beginning Date		=	begdate		-- >
<!--				Ending Date		=	enddate		-- >
<!--				Dataset Status		=	status		-- >
<!--				Progress		=	progress	-- >
<!--				Maintenance & Update	=	update		-- >
<!--					Frequency				-- >
<!--				Access			=	distinfo	-- >
<!--				Stored Data Format	=	native		-- >
<!--				Available Format Type	=	avlform 	-- >
<!--				Access Constraints	=	accscons	-- >
<!--				Data Quality		=	dataqual	-- >
<!--				Lineage			=	lineage		-- >
<!--				Positional Accuracy	=	posacc		-- >
<!--				Attribute Accuracy	=	attracc		-- >
<!--				Logical Consistency	=	logic		-- >
<!--				Completeness		=	complete	-- >
<!--				Contact Information	=	cntinfo		-- >
<!--				Contact Organisation	=	cntorg		-- >
<!--				Contact Position	=	cntpos		-- >
<!--				Mail Address 1	& 2	=	address		-- >
<!--				Suburb or City		=	city		-- >
<!--				State			=	state		-- >
<!--				Country			=	country		-- >
<!--				Postcode		=	postal		-- >
<!--				Telephone		=	cntvoice	-- >
<!--				Facsimile		=	cntfax		-- >
<!--				Electronic Mail Address	=	cntemail	-- >
<!--				Metadata Date		=	metd		-- >
<!--				Additional Metadata	=	supplinf	-- >
<!-- Additional elements:	Bounding Coordinates	=	bounding	-- >
<!--				North Bounding Coordinate =	northbc		-- >
<!--				South Bounding Coordinate =	southbc		-- >
<!--				East Bounding Coordinate =	eastbc		-- >
<!--				West Bounding Coordinate =	westbc		-- >
<!--				Longitude		=	long 		-- >
<!--				Latitude		=	lat 		-- >
<!-- Structural elements:	Paragraph		=	paragrph	-- >
<!--				Item			=	item		-- >
<!--				List			=	list	 	-- >

<!-- Defining special characters -->
<!ENTITY % ISOnum PUBLIC		"ISO 8879:1986//ENTITIES Numeric and Special Graphic//EN">
<!ENTITY % ISOlat1 PUBLIC		"ISO 8879:1986//ENTITIES Added Latin 1//EN">
<!ENTITY % ISOgrk1 PUBLIC		"ISO 8879:1986//ENTITIES Greek Letters//EN">
<!ENTITY % ISOpub PUBLIC		"ISO 8879:1986//ENTITIES Publishing//EN">

<!-- Structure Model -->
<!ELEMENT anzmeta			- -	(citeinfo, descript, bounding, timeinfo, status, distinfo, 
						 dataqual, cntinfo+, metd, supplinf)>

<!-- High Level Elements -->
<!ELEMENT citeinfo			- - 	(uniqueid, title, origin)>
<!ELEMENT descript			- -	(abstract, themekey+, (placekey+ | dsgpolyo+))>
<!ELEMENT bounding			- -	(northbc, southbc, eastbc, westbc)>
<!ELEMENT timeinfo			- -	(begdate, enddate)>
<!ELEMENT status			- -	(progress, update)>
<!ELEMENT distinfo			- -	(native+, avlform+, accscons)>
<!ELEMENT dataqual			- -	(lineage, posacc, attracc, logic, complete)>
<!ELEMENT cntinfo  			- - 	(cntorg, cntpos, address, address?, city, state, 
						 country, postal, cntvoice,cntfax, cntemail)>
<!ELEMENT metd				- - 	(#PCDATA | (day?, month?, year))>
<!ELEMENT supplinf			- -	(paragrph+)>

<!-- Component definitions -->

<!-- Dataset-->
<!ELEMENT uniqueid			- -	(#PCDATA)>
<!ELEMENT title				- -	(#PCDATA)>
<!ELEMENT origin			- -	(custod, jurisdic)>
<!ELEMENT custod			- -	(#PCDATA)>
<!ELEMENT jurisdic			- -	(#PCDATA)>

<!-- Description -->     
<!ELEMENT abstract			- -	(paragrph+)>
<!ELEMENT themekey			- - 	((#PCDATA, qualifr?)+)>
<!ELEMENT qualifr			- - 	(#PCDATA)>
<!ELEMENT placekey			- - 	(#PCDATA)>
<!ELEMENT dsgpolyo			- - 	(long, lat, long, lat, long, lat, (long, lat)+)>

<!-- Bounding Coordinates -- >
<!ELEMENT (northbc | southbc | eastbc | westbc) - -  (#PCDATA)>
<!ELEMENT (long | lat)			- -	(#PCDATA)>

<!-- Data Currency -->
<!ELEMENT (begdate | enddate)		- -	(#PCDATA | (day?, month?, year))>
<!ELEMENT day				- - 	(#PCDATA)>
<!ELEMENT month				- - 	(#PCDATA)>
<!ELEMENT year				- - 	(#PCDATA)>

<!-- Dataset Status -->
<!ELEMENT progress			- -	(#PCDATA)>
<!ELEMENT update			- - 	(#PCDATA)>

<!-- Access -->
<!ELEMENT (native | avlform)		- -	(#PCDATA)>
<!ELEMENT accscons 			- -  	(#PCDATA)>

<!-- Data Quality -->
<!ELEMENT lineage			- -	(paragrph+)>
<!ELEMENT posacc			- -	(paragrph+)>
<!ELEMENT attracc			- -	(paragrph+)>
<!ELEMENT logic				- -	(paragrph+)>
<!ELEMENT complete			- -	(paragrph+)>

<!-- Contact Information -->
<!ELEMENT cntorg 			- -	(#PCDATA)>
<!ELEMENT cntpos			- - 	(#PCDATA)>
<!ELEMENT address			- -	(#PCDATA)>
<!ELEMENT city				- -	(#PCDATA)>
<!ELEMENT state				- -	(#PCDATA)>
<!ELEMENT country			- -  	(#PCDATA)>
<!ELEMENT postal			- -  	(#PCDATA)>
<!ELEMENT cntvoice			- -	(#PCDATA)>
<!ELEMENT cntfax			- -	(#PCDATA)>
<!ELEMENT cntemail			- -	(#PCDATA)>

<!ELEMENT paragrph			- -	(#PCDATA) +(list)>
<!ELEMENT list				- -	(item+)>
<!ELEMENT item				- -	(#PCDATA)>

<!-- End of ANZMETA DTD Version 1.0 --> ]>
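
To show how the content models in the DTD constrain a document, the Python 
sketch below turns three of the element-only models into regular expressions 
and checks a list of child element names against them. It is a rough 
illustration and not a substitute for a validating SGML parser: it handles 
only the comma, "|", "?", "+" and "*" notation, and it does not handle #PCDATA, 
the "&" connector or inclusions. MODELS, model_to_regex and children_valid are 
names invented for this example.

```python
import re

# Content models copied from three element declarations in the DTD.
MODELS = {
    "citeinfo": "(uniqueid, title, origin)",
    "cntinfo": "(cntorg, cntpos, address, address?, city, state, "
               "country, postal, cntvoice,cntfax, cntemail)",
    "dsgpolyo": "(long, lat, long, lat, long, lat, (long, lat)+)",
}


def model_to_regex(model):
    """Translate an element-only content model into a compiled regex.

    Each element name becomes "name " so that a sequence of children can
    be matched as one string; commas (sequence) simply drop out.
    """
    parts = []
    for name, symbol in re.findall(r"([a-z][a-z0-9]*)|([()?+*|])", model):
        parts.append("(?:%s )" % name if name else symbol)
    return re.compile("".join(parts))


def children_valid(parent, children):
    """True if the child element names satisfy the parent's content model."""
    sequence = "".join(child + " " for child in children)
    return model_to_regex(MODELS[parent]).fullmatch(sequence) is not None


print(children_valid("citeinfo", ["uniqueid", "title", "origin"]))
print(children_valid("citeinfo", ["title", "uniqueid", "origin"]))
print(children_valid("dsgpolyo", ["long", "lat"] * 3))
```

The last call shows that a polygon needs at least four coordinate pairs under 
this DTD, so a closed triangle (first point repeated) is the smallest shape 
that validates.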


*******************************************************************************
SGMLEG.TXT and SGMLEG.DOC
*******************************************************************************

<ANZMETA>
<CITEINFO>
<UNIQUEID>
ANZCW0301000001
</UNIQUEID>
<TITLE>
AVHRR NDVI fortnightly series covering continental Australia at full resolution.
</TITLE>
<ORIGIN>
<CUSTOD>
Bureau of Meteorology
</CUSTOD>
<JURISDIC>
Australia
</JURISDIC>
</ORIGIN>
</CITEINFO>
<DESCRIPT>
<ABSTRACT>
<PARAGRPH>
AVHRR NDVI (Normalized Difference Vegetation Index) fortnightly series covering continental 
Australia at a 1 kilometre resolution. NDVI is a measure of the absorption of red light by plant 
chlorophyll and the reflection of infrared radiation by water-filled leaf cells. Its values broadly 
measure the density of active foliage.
</PARAGRPH>
</ABSTRACT>
<THEMEKEY>
AGRICULTURE Biodiversity
</THEMEKEY>
<THEMEKEY>
ATMOSPHERE Management
</THEMEKEY>
<THEMEKEY>
ATMOSPHERE Pressure Monitoring
</THEMEKEY>
<THEMEKEY>
CLIMATE AND WEATHER Mapping
</THEMEKEY>
<THEMEKEY>
HAZARDS Pests
</THEMEKEY>
<DSGPOLYO>
<LONG>
112.5
</LONG>
<LAT>
-10
</LAT>
<LONG>
154
</LONG>
<LAT>
-10
</LAT>
<LONG>
154
</LONG>
<LAT>
-44
</LAT>
<LONG>
112.5
</LONG>
<LAT>
-44
</LAT>
<LONG>
112.5
</LONG>
<LAT>
-10
</LAT>
</DSGPOLYO>
</DESCRIPT>
<BOUNDING>
<NORTHBC>
-10
</NORTHBC>
<SOUTHBC>
-44
</SOUTHBC>
<EASTBC>
154
</EASTBC>
<WESTBC>
112.5
</WESTBC>
</BOUNDING>
<TIMEINFO>
<BEGDATE>
<DAY>
01 
</DAY>
<MONTH>
Apr 
</MONTH>
<YEAR>
1991
</YEAR>
</BEGDATE>
<ENDDATE>
Current
</ENDDATE>
</TIMEINFO>
<STATUS>
<PROGRESS>
In Progress
</PROGRESS>
<UPDATE>
Monthly
</UPDATE>
</STATUS>
<DISTINFO>
<NATIVE>
DIGITAL Unsigned 8 bit Generic Binary
</NATIVE>
<AVLFORM>
DIGITAL GIF Image
</AVLFORM>
<ACCSCONS>
Environment Australia internal use only.
</ACCSCONS>
</DISTINFO>
<DATAQUAL>
<LINEAGE>
<PARAGRPH>
These are the basic datasets received from  Marine laboratories.
</PARAGRPH>
</LINEAGE>
<POSACC>
<PARAGRPH>
Positional error should not exceed 1km for the vast majority of pixel centres.
</PARAGRPH>
</POSACC>
<ATTRACC>
<PARAGRPH>
As determined by CSIRO processing of spectral response and computation of NDVI, see 'AVHRR 
DOCUMENTATION' held by Neil Freeman (ERIN).
</PARAGRPH>
</ATTRACC>
<LOGIC>
<PARAGRPH>
The method of aggregation employed selects the maximum fortnightly value for output to each 
monthly pixel, so as to minimize atmospheric contamination - thus the monthly pixel values are cloud 
free.
</PARAGRPH>
</LOGIC>
<COMPLETE>
<PARAGRPH>
All datasets are spatially complete - there are no missing sections. The data is temporally incomplete 
- March to November 1994 are missing.
</PARAGRPH>
</COMPLETE>
</DATAQUAL>
<CNTINFO>
<CNTORG>
Environmental Resources Information Network (ERIN), Environment Australia
</CNTORG>
<CNTPOS>
Scientific Coordinator - Remote Sensing
</CNTPOS>
<ADDRESS>
GPO Box 787
</ADDRESS>
<CITY>
Canberra
</CITY>
<STATE>
ACT
</STATE>
<COUNTRY>
Australia
</COUNTRY>
<POSTAL>
2620
</POSTAL>
<CNTVOICE>
+61 6 274 1203
</CNTVOICE>
<CNTFAX>
+61 6 274 1333
</CNTFAX>
<CNTEMAIL>
shane@erin.gov.au
</CNTEMAIL>
</CNTINFO>
<METD>
<DAY>
21 
</DAY>
<MONTH>
Nov 
</MONTH>
<YEAR>
1996
</YEAR>
</METD>
<SUPPLINF>
<PARAGRPH>
Documentation of the AVHRR Normalized Difference Vegetation Index (NDVI) data can be found at:-
<LIST>
<ITEM>
Folio held by Neil Freeman (ERIN)
</ITEM>
<ITEM>
On-line documentation - http://www.environment.gov.au/land/monitoring/ndvi.html
</ITEM>
</LIST>
</PARAGRPH>
</SUPPLINF>
</ANZMETA>
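
In the example above, the BOUNDING values are simply the extremes of the 
DSGPOLYO coordinates. The Python sketch below shows how a tool could derive 
one from the other. It is an illustration under stated assumptions: the 
function name bounding_from_polygon is invented here, coordinates are taken 
to be plain decimal degrees, and the polygon is assumed not to cross the 
180 degree meridian.

```python
import re

# The polygon from the example document, condensed onto fewer lines.
DSGPOLYO = """<DSGPOLYO>
<LONG>112.5</LONG><LAT>-10</LAT>
<LONG>154</LONG><LAT>-10</LAT>
<LONG>154</LONG><LAT>-44</LAT>
<LONG>112.5</LONG><LAT>-44</LAT>
<LONG>112.5</LONG><LAT>-10</LAT>
</DSGPOLYO>"""


def bounding_from_polygon(sgml):
    """Return the bounding coordinates implied by LONG/LAT pairs."""
    number = r"\s*([-+]?[0-9]+(?:\.[0-9]+)?)\s*"
    longs = [float(v) for v in
             re.findall("<LONG>" + number + "</LONG>", sgml, re.IGNORECASE)]
    lats = [float(v) for v in
            re.findall("<LAT>" + number + "</LAT>", sgml, re.IGNORECASE)]
    if not longs or len(longs) != len(lats):
        raise ValueError("expected matching LONG and LAT values")
    return {
        "northbc": max(lats),
        "southbc": min(lats),
        "eastbc": max(longs),
        "westbc": min(longs),
    }


print(bounding_from_polygon(DSGPOLYO))
```

For this polygon the result agrees with the NORTHBC, SOUTHBC, EASTBC and 
WESTBC values given in the example document.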