FW: NSDI_L NSDI: If not NOW, WHEN?
Sorry for the cross-posting, but I thought this message was a useful
one. When Doug was in Australia he commented on the difference between
the WWW and the Clearinghouse. This message expands upon those
thoughts.
Graham Baker
----------
From: Doug Nebert
To: nsdi_l@fgdc.er.usgs.GOV
Cc: clearhs_wg@fgdclearhs.er.usgs.gov
Subject: Re: NSDI_L NSDI: If not NOW, WHEN?
Date: Tuesday, 3 June 1997 8:03AM
Douglas Nebert wrote:
>
> Subject: NSDI_L NSDI: If not NOW, WHEN?
> Date: Mon, 02 Jun 1997 10:34:26 -0400
> From: Potter at Island Resources Foundation <bpotter@irf.org>
> To: nsdi_l@fgdc.er.usgs.GOV
> CC: bpoore@geochange.er.usgs.GOV
>
> Bill Thoen's "GIS Online" column for the June 1997 GIS WORLD is entitled "A
> Good Dataset is Hard to Find." He raises good questions about the
> usefulness of the FGDC Clearinghouse.
>
> His conclusion is that "the best tools I've found for finding resources
> online are the resource lists compiled by people who have no official
> mandate, but have knowledge and interest in the subject they focus on."
> Also praised are the regular search engines (Yahoo, Excite! and AltaVista),
> where Thoen says, "Using these I was able to find in about two minutes all
> the resources I couldn't find in the FGDC clearinghouses." (Which were such
> esoterica as the USGS Public Land Survey.)
>
Clearinghouse failed to find Public Land Survey because there was no
entry for Public Land Survey in any of the registered
metadata collections. As of Friday there are now placeholder metadata
entries: one for each series of USGS, National Wetlands Inventory, and
Census data under the pseudo-server "Large Spatial Data Collections."
These will get the user some hits by pointing to the top-level directory
of the ftp/web site where one then navigates the hierarchies to see if
the data actually cover the area of interest. Organizations that take
the time to prepare metadata for each downloadable data set can make it
easier for the user to discover the data through formal fields such as
time period of content and precise geographic extent -- something
conventional and even cutting-edge Web indices cannot yet do.
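To make the point about formal fields concrete, here is a minimal sketch of a search on geographic extent and time period of content. The Record class, its field names, and the two sample entries are hypothetical, made up for illustration; this is not how Isite or any Clearinghouse node actually stores or queries metadata.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Record:
    """Hypothetical catalog entry: a title, a bounding box, a time period."""
    title: str
    west: float
    east: float
    south: float
    north: float
    begin: date
    end: date


def bbox_overlaps(rec, west, east, south, north):
    # Two boxes overlap when neither lies entirely to one side of the other.
    # Boxes that cross the 180-degree meridian are not handled here.
    return (rec.west <= east and rec.east >= west
            and rec.south <= north and rec.north >= south)


def period_overlaps(rec, begin, end):
    # Two closed date ranges overlap when each starts before the other ends.
    return rec.begin <= end and rec.end >= begin


def search(records, bbox, period):
    """Return titles of records that overlap both the box and the period."""
    west, east, south, north = bbox
    begin, end = period
    return [r.title for r in records
            if bbox_overlaps(r, west, east, south, north)
            and period_overlaps(r, begin, end)]


# Made-up sample entries, for illustration only.
SAMPLE = [
    Record("Township grid", -110.0, -104.0, 37.0, 41.0,
           date(1990, 1, 1), date(1995, 12, 31)),
    Record("Coastal wetlands", -80.0, -75.0, 30.0, 35.0,
           date(1980, 1, 1), date(1985, 12, 31)),
]
```

A keyword index has no way to answer "what covers this one-degree cell in 1994?", whereas records with explicit extent and period fields answer it with two comparisons each.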
> I think Thoen's comments about the "soft engineering approach" as an
> alternative to the cost of the NSDI make a lot of sense, in view of the
> products of the best efforts of a LOT of hard-working, dedicated folks over
> a not-inconsiderable period of time.
The notion that a brief visit to the Internet search engines reveals
everything one needs for GIS use is an oversimplification. My
experience with search engines is that their contents are incomplete
(some agencies are there, some aren't), different search engines have
different content and index different sites, the level of detail or
access described is not consistent, links are broken or out-of-date, and
there are far too many spurious "hits" to wander through -- often on the
order of thousands. In Clearinghouse, the focus is on
geographically-referenced data so one has no spurious "hits," and links
will be kept current by the distributed nature of Clearinghouse search.
The goal of Clearinghouse is to provide a virtual focal point (like a
Lycos or AltaVista) that is tuned to spatial and temporal data, and to
allow queries against those properties that cannot be made in Excite,
for example. Because we have such knowledge of when and where phenomena
occur, we document them and make them searchable. One means of storing
these fields and making them searchable is via FGDC metadata, as we are
required to do as federal data custodians. The same properties can be
"exposed" to search from other metadata profiles or organizations
(Australia and New Zealand are doing this): your fields map to the
equivalent FGDC fields for searching, while the delivered metadata
keeps its own look.
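The field mapping described above might be sketched as follows. The local field names on the left are invented for illustration and do not come from any real profile; the names on the right are FGDC-style short element names, which should be treated as assumptions here rather than a statement of what any server indexes.

```python
# Hypothetical local field names mapped to FGDC-style search fields.
LOCAL_TO_FGDC = {
    "dataset_name": "title",
    "custodian": "origin",
    "west_limit": "westbc",
    "east_limit": "eastbc",
    "north_limit": "northbc",
    "south_limit": "southbc",
    "start_date": "begdate",
    "finish_date": "enddate",
}


def search_view(local_record, mapping=LOCAL_TO_FGDC):
    """Return only the mapped fields, renamed to their search equivalents.

    The local record itself is not modified, so it can still be delivered
    to the user in its original form.
    """
    return {mapping[key]: value
            for key, value in local_record.items()
            if key in mapping}
```

Fields with no equivalent (a local "lineage_notes", say) are simply left out of the search view but remain in the delivered record.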
Remember also that there are (at least) two levels of purpose to
metadata. Clearinghouse is primarily interested in having access to a
handful of meaningful and searchable fields. Whereas these could be
satisfied by a brief catalog entry, many of us also have an obligation
to actually document the history and quality of the data set for advised
re-use, liability concerns, or just professional courtesy. When an
organization must both document their holdings and make those
descriptions searchable and deliverable, the use of FGDC metadata
fulfills both requirements through current Clearinghouse technology. If
an organization's requirements are only to prepare a catalog record then
obviously FGDC metadata seems like overkill.
To make it easier for non-federal sites to simply provide catalog
records for inclusion in Clearinghouse we have done several things of
interest and are testing some complementary technologies to broaden
participation:
-- MetaLite software was written as an HTML forms-based metadata
collector. Using about 20 fields (only the mandatories) of FGDC
metadata, one can submit and update a metadata entry using a Web
browser. The webserver script saves the entry as SGML, TEXT, and HTML
and kicks off Isite to make it immediately searchable. This is intended
to be installed at participating Clearinghouse sites to gather metadata
from the field using low-tech and familiar capabilities. Because the
entries are stored in SGML and TEXT they can later be loaded into a
different metadata editor and beefed up to a fuller record, if desired.
-- Website as Clearinghouse Node. Many sites do not want to install
Isite and would like to be able to serve their metadata both through
HTML pages and also to make the metadata files accessible via http.
FGDC and MEL have just prototyped a script that can be installed in a
webserver's cgi-bin directory that can be used by centralized sites
(like FGDC and other Gateway access points) to download a compressed
archive of all the website's FGDC metadata in SGML form and index it at
the gateway rather than at the original host website. Provided that
sites are willing to assemble parseable FGDC metadata at their website,
this gatherer technology may solve several problems for us.
-- Bridging to other metadata forms. Isite can be modified to
recognize other tags and data structures but have them map to the same
searchable
constructs as FGDC and related protocols like the Government Information
Locator Service. For people interested in building or linking to their
own search engines yet providing consistent access, we could formalize
field names using a cgi-bin "query" convention such as DAV is attempting
with W3C support. Other groups are talking about embedding metadata in
"META" tags within HTML so they can type in their metadata twice on a
page and have it possibly be indexed by Internet search services. Some
communities are proposing that SQL be extended to the Internet as a
search and retrieval method for information resources we traditionally
don't think of in databases. The entertainment community is also under
the gun to provide a pre-emptive ratings service for potentially every
document/file on the Net. This system, called PICS, is basically
formalized ratings (metadata) on an object (dataset/file) so could
potentially be utilized to support search and retrieval of things we're
interested in.
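As a rough illustration of the first and third items, the sketch below writes one small catalog entry out as plain text, as a flat tagged form, and as HTML "META" tags. It is not the MetaLite script: real FGDC SGML follows a nested DTD, and the flat layout, the "metadata" wrapper element, and the "fgdc." name prefix are all assumptions made for the example.

```python
from html import escape


def to_text(record):
    """One 'field: value' line per entry."""
    return "\n".join(f"{key}: {value}" for key, value in record.items())


def to_tagged(record):
    """Flat tagged form; a stand-in for SGML, not the real FGDC DTD."""
    body = "".join(f"<{key}>{escape(str(value))}</{key}>"
                   for key, value in record.items())
    return f"<metadata>{body}</metadata>"


def to_meta_tags(record, prefix="fgdc"):
    """HTML META tags, one per field; the prefix is a hypothetical convention."""
    return "\n".join(
        f'<meta name="{prefix}.{key}" content="{escape(str(value))}">'
        for key, value in record.items()
    )
```

Keeping the entry in one structure and generating each form from it avoids retyping the same values for every output.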
All these examples are provided to illustrate that there is no consensus
yet -- and much experimentation -- as to how one best characterizes
files on the Internet for general and potentially specialized access.
Note that none of the advancing technologies sound much like "soft
engineering" but, with the right tools and lots of time, may become
mainstream. We will move with the appropriate technology to protect
people's investment in their data and metadata and to minimize costs of
collection and maintenance. In the meantime, to rely on conventional
general-purpose web indexing services misses the mark of being able to
discover organized data collections, interoperate with related
approaches in the OpenGIS, imagery, and library communities, and
someday provide two-step access from discovery of data resources to
invocation of the data across the wire -- real Internet GIS.
We're all interested in making Clearinghouse a more useful, and used,
resource and we have some technical hurdles to leap before declaring it
a success. One hurdle is in having critical mass -- enough entries must
be there to make it a comprehensive and authoritative reference. Because
we are not yet at critical mass I have been reluctant to advertise the
Clearinghouse as formally open and ready for business. Another hurdle is
the slowdown in Internet speeds over the past six months, which has
often caused a distributed search of many sites to time out. We are
interested in making this work and, with the gathering technology
mentioned above, should have it working quickly very soon. The third hurdle
is reliability. We are actively engaging sites in becoming parallel
gateway sites to the FGDC to provide regional but identical access to
the indexed resources of Clearinghouse. This will allow a user to
complete a search even if they can't get to the FGDC. When these
features are in place we will invite Bill and the general GIS public to
come back and give it a spin again.
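The timeout and reliability hurdles can be pictured with a small sketch. The nodes and gateways here are ordinary Python callables standing in for remote sites, and the function names are hypothetical; nothing below reflects the actual Clearinghouse gateway code. The first function queries every node at once and returns whatever arrived before the deadline; the second tries equivalent gateways in order until one answers.

```python
from concurrent.futures import ThreadPoolExecutor, wait


def distributed_search(nodes, query, timeout):
    """Query all nodes in parallel; report hits and the nodes that were skipped.

    nodes maps a node name to a callable taking the query and returning a
    list of hits. Nodes that fail or miss the deadline are listed as skipped.
    """
    pool = ThreadPoolExecutor(max_workers=max(1, len(nodes)))
    futures = {pool.submit(fn, query): name for name, fn in nodes.items()}
    done, _ = wait(futures, timeout=timeout)
    hits, skipped = [], []
    for future, name in futures.items():
        if future in done and future.exception() is None:
            hits.extend(future.result())
        else:
            skipped.append(name)
    # Do not block on stragglers; their results are simply discarded.
    pool.shutdown(wait=False)
    return hits, skipped


def search_via_gateways(gateways, query):
    """Try each (name, callable) gateway in order; return the first answer."""
    for name, search in gateways:
        try:
            return name, search(query)
        except ConnectionError:
            continue
    raise ConnectionError("no gateway reachable")
```

Indexing gathered metadata at the gateway, as described earlier, sidesteps the first problem entirely, since the search then runs against one local index instead of waiting on many remote sites.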
Doug Nebert
Clearinghouse Coordinator