National Vegetation Information System Taxonomic Review

National Vegetation Information System
Centre for Plant Biodiversity Research
Department of the Environment and Heritage, 2004

4.0 Discussion and recommendations

Discussion and recommendations on taxonomic issues, database linkages and database structure are provided. It should be noted that many of these recommendations involve work that is already being effectively done by NVIS collaborators. The recommendations are more an attempt to develop a consistent framework that NVIS managers and collaborators can use for developing and sharing data.

4.01 General principles

4.011 The taxonomic process

The complex nature of botanical taxonomy has resulted in problems over time that knowledge managers find difficult to address. There is no magic bullet for taxonomic problems found in biodiversity databases and the NVIS database is no exception. The dynamic nature of the taxonomic process means that data improvement in this area will take many years and will never be entirely finished. The taxonomic data within NVIS will improve over time with the gradual improvement of data quality in State-based projects and between the States and the NVIS database. Improvement of data quality will depend on constant and on-going feedback cycles between data custodians, stakeholders and clients. Thus taxonomic maintenance will need to be an on-going and integral part of NVIS. The following sections outline some of the developments that should assist NVIS managers and collaborators in this process.

4.012 The 'Consensus Census'

As part of the Australia's Virtual Herbarium (AVH) project, the Council of Heads of Australian Herbaria (CHAH) has initiated a project to develop an agreed national consensus taxonomy of Australian plants against which State-based censuses can be mapped. With most plant names (estimated to be c. 95%) there will be agreement between States, but there will be instances of genuine differences of interpretation and NVIS will have to deal with these. It is recommended, as a priority, that NVIS participants form close relationships with their State herbaria so that they may receive regular and timely notification of changes in taxonomy (recommendation 1). The Australian Plant Name Index (APNI) is to be a foundation for this 'Consensus Census'.

4.013 Reference to a common taxonomic standard

It is also recommended that NVIS adopt a common/standard database for plant nomenclature and taxonomic validation and that this be applied at all levels from data gathering to data integration. To take advantage of the taxonomic work taking place as part of the national 'Consensus Census', it is recommended that the Australian Plant Name Index/What's Its Name (APNI/WIN) database and its contributing State censuses should be used for this purpose (recommendation 2).

4.014 Other national databases

The Federal Department of the Environment and Heritage (DEH) runs a number of national databases (including plant names databases) to serve various purposes. Of particular relevance is the SPRAT database, which is found on the same Oracle backbone as NVIS. The plant data in SPRAT is sourced from APNI/WIN, although some discrepancies remain between the two databases. Ideally neither SPRAT nor NVIS would hold taxonomic information; both would pick this up through direct links to APNI/WIN. In reality, for operational reasons, it is more efficient to store a local extract of relevant information and update this regularly. To avoid proliferation of redundant information and problems of inconsistency, it is recommended that NVIS use the SPRAT taxonomy files for validation and that links between SPRAT and APNI/WIN be improved to allow SPRAT to be updated rapidly in response to changes in taxonomy (recommendation 3). This linkage is discussed in greater detail in Section 4.3.

4.015 Common names

While useful at a local level, common names are problematic when employed at a national level as the same common name may not be used across all States or the same common name may be applied to different taxa in different States. Common names should have no role in the NVIS Veg_Description table (recommendation 4); if they are required for presentational or interpretative purposes, they can be picked up from APNI/WIN and other taxonomic databases. Despite some taxonomic problems (see Section 4.011), the use of scientific names will provide greater consistency between NVIS stakeholders.

4.016 Linking managers and custodians of taxonomic data

State and Commonwealth herbaria are custodians and managers of plant taxonomic information. These herbaria have the mandate, staff, expertise and resources to manage this data effectively. In many instances, NVIS surveys and data management are undertaken by agencies other than those responsible for maintaining the census of plants in the State. In all cases it is recommended that strong links be forged between NVIS data custodians and local State or Territory herbaria to minimise the risks of working with outdated or inappropriate taxonomy (recommendation 1). Sections 4.21 and 4.22 will discuss these linkages in greater detail.

A more consistent approach to database structure and data exchange is also recommended (recommendation 5) by following the standards and protocols outlined in HISPID (Conn 1996). Section 4.23 outlines the benefits to NVIS managers and collaborators in adopting a more structured, uniform approach to nomenclatural data. The atomizing of plant name data into single elements is also recommended (recommendation 22), to simplify the comparison of data. This recommendation also has the benefit of separating taxonomic opinion from the simple nomenclatural elements, the separation of qualification from the end of specific epithets being a case in point (recommendation 14) (see Sections 4.143, 4.41).

4.1 Taxonomic issues and database structure

Section 3 and Table 3.1 detail a wide range of taxonomic issues encountered in the Taxon_Lists dataset. Discussion and recommendations on each of these issues follow. For the most part, the solutions involve a simple editing of existing records. Most of these solutions are presented as a brief set of instructions in the CPBR_SOLUTION field as described in Section 2.23 and as shown in Appendix F.

Apart from examination of the data, comment was sought on the database architecture. As discussed in Section 2.31, an understanding of the nature and extent of taxonomic issues was needed before useful advice could be provided on the database structure. It is impossible to talk about the taxonomic issues and the database structure separately as they are so strongly interlinked. Thus, recommendations relating to many of the taxonomic issues will include recommendations relating to the structure of the dataset. A number of the solutions relate to different ways of storing information and to enhancing structural links to other databases.

4.11 Record duplication

There was considerable duplication of taxa within the Taxon_Lists dataset as discussed in Section 3.11. 3,684 taxa are represented by 5,447 records in Taxon_Lists; these records are used to generate NVIS Vegetation Descriptions. Each record was assessed for duplication and marked in the MASTER_RECORD field with "M" for master records, "D" for duplicates and blank for unique records.

The recommendation for dealing with this duplication is consolidation of the records used to generate the records in the Veg_Descriptions table (recommendation 6). Duplicate records, apart from the "D" marking, were also accompanied by referral statements in the CPBR_SOLUTION field (see Section 3.11). For all duplicate records that are linked to Vegetation Descriptions, the links should be shifted to the records marked as master records. The referral statement provides the TAXDSC_ID number to help find the master record. Unique, unmarked records can stay as is, no consolidation being needed.
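The consolidation step might be sketched as follows. This is a minimal illustration only, not the NVIS procedure: it assumes the referral statements in CPBR_SOLUTION have already been read into a mapping from each duplicate TAXDSC_ID to its master TAXDSC_ID, and the function name, the layout of the links and the ID values are all hypothetical.

```python
def repoint_links(veg_links, duplicate_to_master):
    """Return a copy of veg_links with duplicate taxon IDs replaced by master IDs.

    veg_links: list of (veg_description_id, taxdsc_id) pairs (hypothetical layout).
    duplicate_to_master: dict mapping a duplicate TAXDSC_ID to its master TAXDSC_ID.
    """
    repointed = []
    for veg_id, taxdsc_id in veg_links:
        # Master and unique records are absent from the mapping, so they stay as is.
        repointed.append((veg_id, duplicate_to_master.get(taxdsc_id, taxdsc_id)))
    return repointed


# Hypothetical IDs for illustration only.
links = [(1, 101), (2, 102), (3, 200)]
mapping = {102: 101}  # record 102 is marked "D" and refers to master record 101
consolidated = repoint_links(links, mapping)
```

Only the links are changed in this sketch; the duplicate records themselves are left in the table.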

Most duplication was found when comparing data provided by one State with that of another. This problem could be reduced, even eliminated, by comparing NVIS data at the custodial level with a nationally agreed data set, such as APNI/WIN and the developing 'Consensus Census' (see Section 4.012). Comparison against official State censuses would also help reduce duplication, and the effectiveness of this would increase as each State and national census resolved their inconsistencies.

The large group case studies (Section 3.3) also showed that there was duplication of names when examining the data of a single State. This problem could be reduced, even eliminated, if NVIS custodians were to compare their data against a standard list or State (or national) census.

It is not recommended in the short term that duplicate records be discarded as they represent unique contributions from each of the States (recommendation 7). Although the names may appear to be the same, they may in fact apply to different taxonomic concepts (e.g. a species in a broad or in a narrow sense). In most cases it will be quite acceptable to combine records with the same name, but in some cases it may not. The resolution of these inconsistencies can be quite complex and rely on specialist knowledge and database structures. Databases are being developed to track taxonomic concepts and it may be possible in the future to link taxa in NVIS with databases such as APNI/WIN; unique identifiers exist in both databases and may help to facilitate this.

In rare cases involving historical taxonomy, the same name has been given to two completely different and distinct plant species. These duplicate names are known as homonyms and are unlikely to occur in NVIS data (see Section 4.15).

4.12 Infraspecific name issues

It was found that, of the 538 records that related to infraspecific taxa, 42% had rank identifiers that were misspelt. Considerable variation was encountered, largely relating to how "subspecies" and "variety" are abbreviated in the INFRA_SPECIES_RANK field. Table 3.12 documents this variation along with the recommended forms of abbreviation.

The short term solution for these records is to correct the typographic errors following the instructions written in the CPBR_SOLUTION field (recommendation 8). Greater care is needed with the quality of data provided by NVIS collaborators. If the spelling of the rank identifier is checked for consistency, future data transfers will be more easily linked with other States' records. The abbreviations recommended (recommendation 24) are the same as used in the Australian Vegetation Attribute Manual (ESCAVI 2003) and we recommend State data providers refer to this publication for further guidance (recommendation 23).
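A consistency check of this kind could look like the sketch below. It is an assumption-laden illustration: the variant spellings are invented for the example and are not taken from Table 3.12, and the target forms "subsp." and "var." are assumed here rather than quoted from the Australian Vegetation Attribute Manual. The function name is hypothetical.

```python
# Illustrative variant spellings only; the real variation is documented in Table 3.12.
RANK_VARIANTS = {
    "subsp.": {"subsp", "subsp.", "ssp", "ssp.", "subspecies"},
    "var.": {"var", "var.", "variety"},
}


def normalise_rank(raw):
    """Map a raw INFRA_SPECIES_RANK value to an assumed standard abbreviation.

    Returns None when the value is not recognised, so that the record can be
    referred back to the data provider rather than silently guessed.
    """
    cleaned = raw.strip().lower()
    for standard, variants in RANK_VARIANTS.items():
        if cleaned in variants:
            return standard
    return None
```

Returning None for unrecognised values keeps the correction conservative: anything outside the known variants is flagged for human review.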

Section 3.12 also discussed the presence of records from South Australia that possessed an infraspecific rank identifier, but no infraspecific name. It was speculated that these names were lost during the data transfer process. The advice provided in the CPBR_SOLUTION field is to refer these records back to their data source, in this case South Australia. The data provider should be able to check what data was lost and provide the correct infraspecific name. If no name can be provided, the rank identifier should be removed, taking the record back to a species level identification (recommendation 9).

Overall the presence of records identified to infraspecific level in the NVIS database is considered to be worthwhile. These records are informative to a finer level than records identified only to species level. Furthermore, well-recognised taxa at the species level are routinely reclassified as varieties or subspecies of another species. The reverse is also common, where varieties and subspecies are elevated to the rank of species as a result of further taxonomic research or new evidence. This is part of the normal taxonomic process and many examples exist within current NVIS data. For NVIS the important issue is that a taxon is recognisable and distinct and that the correct name is applied to it, not the rank of the name. To choose arbitrarily not to use taxa below a certain rank cannot be justified scientifically and seriously limits the value and utility of NVIS. It is recommended that, where possible and appropriate, infraspecific level taxa be used (recommendation 10). Linking to State and national censuses will be extremely beneficial in this regard.

4.13 Synonymy

A total of 69 records were found to possess synonymous or non-current names when compared to the APNI/WIN database, as discussed in Section 3.13.

The immediate solution to the synonymy found in Taxon_Lists is to change the records used to generate Vegetation Descriptions from the synonymous names to the name written in the CPBR_WIN_NAME field (recommendation 11).

To improve the NVIS taxonomic currency it is recommended that a closer linkage to the APNI/WIN database be established (recommendation 3). When a change in the name of a taxon is accepted in APNI/WIN, this change could be communicated to NVIS. APNI/WIN attempts to keep track of the consensus view of what is a plant's current name. As taxonomic understanding changes and is published in botanical journals, APNI/WIN is updated and this new change is made available online (www.anbg.gov.au/win).

Alternate names and synonymy are real issues when comparing datasets between herbaria. While APNI/WIN is a good nomenclatural database, it currently represents the taxonomic opinion of only one institution, the CPBR. It is hoped that the 'Consensus Census' (Section 4.012) will go to the next level, establishing closer ties between all government herbaria (recommendation 1), resulting in a consensus view of plant name currency, with synonymy being part of this.

4.14 Other taxonomic problems

Apart from issues relating to record duplication, infraspecific taxa and synonymy, eight other taxonomic issues were identified in the Taxon_Lists dataset. Tables 3.1 and 3.14 provide details of these issues and their frequency. The solutions to addressing these issues are outlined in the following subsections.

4.141 Alternate family name

There were 415 cases where the family name found in Taxon_Lists did not match what is used in the APNI/WIN database. The three reasons for this mismatch are described in Section 3.141. Some cases related to a valid alternate taxonomic view, others related to simple typographic errors.

While the reasons for variation are mixed, the short term solution recommended for each of these issues is the same: the alternate names should be replaced with the family names written in the CPBR_SOLUTION field (recommendation 12). While some of the family names are valid alternate taxonomic viewpoints, in the interests of simplifying the dataset, a consensus view should be adopted. The advantages of this should be self-evident for groups such as Acacia, where the species of this genus listed in the NVIS Taxon_Lists table are currently linked to several family names (see Table 3.141a).

As for the long term solution, it is recommended, as a priority, that NVIS data providers ignore family names and that NVIS data import routines discard family name data on import (recommendation 13). Family names will be retrofitted after import from a standard database such as APNI/WIN when/if they are needed for a particular purpose. NVIS data custodians can continue to use family names locally if they feel a particular local need, but need to be aware that these will be discarded and replaced as part of data integration at the national level.
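The discard-then-retrofit approach could be sketched as below. This is a hypothetical illustration, not an NVIS import routine: records are represented as plain dictionaries, the GENUS field name is assumed, and the genus-to-family lookup stands in for a local extract from a standard database such as APNI/WIN.

```python
def strip_family(record):
    """Return a copy of an incoming record without its FAMILY value."""
    return {field: value for field, value in record.items() if field != "FAMILY"}


def retrofit_family(record, genus_to_family):
    """Return a copy of the record with FAMILY filled from a standard lookup.

    genus_to_family stands in for an extract of a standard database;
    FAMILY is set to None when the genus is not present in the extract.
    """
    result = dict(record)
    result["FAMILY"] = genus_to_family.get(record["GENUS"])
    return result


# The supplied family name is misspelt on purpose; it is discarded on import.
incoming = {"FAMILY": "Juncacae", "GENUS": "Juncus", "SPECIES": "spp."}
standard = {"Juncus": "Juncaceae"}  # hypothetical one-row extract
imported = strip_family(incoming)
reported = retrofit_family(imported, standard)
```

Because the family is always looked up from the standard source, a missing or misspelt family name in the supplied data has no effect on the integrated dataset.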

4.142 Missing family name

Only one record was found with the FAMILY field blank. For the short term, the correct family name should be added to this record, as written in the CPBR_SOLUTION field (recommendation 12).

Longer term, recommendation 13 outlined in Section 4.141 for handling family names will accommodate completely those situations where a family name is missing.

4.143 Name qualifier present

Section 3.143 provided commentary on some of the difficulties found in plant taxonomy. Five different qualifiers, used in an attempt to account for this complexity, were found in the Taxon_Lists dataset (see Table 2.22). Records originating from Victorian and South Australian data providers had qualifiers following the species epithet in the SPECIES field. Herbarium specimen databases such as ANHSIR (http://www.anbg.gov.au/cgi-bin/anhsir) provide separate fields for name qualification, in accordance with HISPID standards (Conn 1996); see Section 4.23 for more details.

The recommendation for qualifiers is that they be moved into a new QUALIFIER field (recommendation 14). The present lumping of name and qualifier in the same field makes it difficult to compare data sourced from different States. Separating the species name from the qualification will allow for easier comparison, and the qualification can then be ignored where appropriate. The establishment of this new field will help NVIS Taxon_Lists become more consistent with the fields outlined in the HISPID standards (Conn 1996), which are used by all government herbaria. This will allow for a more streamlined data exchange process between NVIS collaborators and managers (see Section 4.23). It should also be noted that SPRAT currently has no qualifier field; for consistency it is recommended that this field also be added to that database.
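Moving a trailing qualifier out of the SPECIES field might be done along the following lines. This is a sketch under stated assumptions: the qualifier strings are hypothetical examples (the five actually found are listed in Table 2.22), and the function name is invented for illustration.

```python
# Hypothetical qualifier strings; the five actually found are listed in Table 2.22.
KNOWN_QUALIFIERS = ("s.l.", "s.s.", "agg.")


def split_qualifier(species_value):
    """Split a SPECIES value into (epithet, qualifier).

    The qualifier is an empty string when no known qualifier follows the epithet.
    """
    text = species_value.strip()
    for qualifier in KNOWN_QUALIFIERS:
        suffix = " " + qualifier
        if text.endswith(suffix):
            return text[: -len(suffix)].strip(), qualifier
    return text, ""
```

With the qualifier held separately, two records of the same taxon compare equal on the epithet alone, whether or not one of them was qualified.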

4.144 Double epithets

Records that possessed double epithets in the SPECIES field are discussed in Section 3.144. Hybridisation and identification difficulties were speculated as possible reasons for double epithets. "Refer to origin" was added to the CPBR_SOLUTION column, as source clarification is needed. It is recommended that referral back to the host institution be made to enable the appropriate checking of the original data source (recommendation 15). This would likely result in a change in the SPECIES field data, hopefully with a reduction to one epithet.

4.145 Status unknown

Taxon_Lists was found to have 72 records identified to either genus or family level, followed by some form of species qualifier in the SPECIES field (see Section 3.145). With this qualification it was difficult to determine whether a single species or multiple species were being referred to. The currency in the CPBR_WIN_CURR field was listed as being "unknown". APNI/WIN relates to definitive names, not vague, qualified taxa.

There are a number of options with these qualified names; keeping them as is would be considered an unsatisfactory solution. The preferred short term option would be to remove the qualification in the SPECIES field, leaving the identification at the genus (or family) level as shown in Table 4.145 (recommendation 16). This option will help such records compare better with other States' data and with other botanical databases like APNI/WIN.

Table 4.145 Species qualifier correction

         Family      Genus    Species
Before   Juncaceae   Juncus   spp.
After    Juncaceae   Juncus
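The correction shown in Table 4.145 could be applied programmatically as in the sketch below. The set of placeholder spellings is hypothetical (the actual variation is recorded in Table 3.145), as are the function name, the GENUS field name and the dictionary representation of a record.

```python
# Hypothetical placeholder spellings; the real variation is recorded in Table 3.145.
SPECIES_PLACEHOLDERS = {"sp", "sp.", "spp", "spp.", "species"}


def reduce_to_genus(record):
    """Return a copy of the record with a vague species qualifier blanked out."""
    result = dict(record)
    species = (result.get("SPECIES") or "").strip().lower()
    if species in SPECIES_PLACEHOLDERS:
        # Leave the identification at genus (or family) level.
        result["SPECIES"] = ""
    return result


before = {"FAMILY": "Juncaceae", "GENUS": "Juncus", "SPECIES": "spp."}
after = reduce_to_genus(before)
```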

The more difficult but more informative solution would be to try to determine which species are actually being referred to. These records would need to be referred back to the State data providers, who in most cases are Tasmania, South Australia or Victoria (see Table 3.14). This might involve rechecking herbarium specimens or resurveying sites in an attempt to improve the quality of past species lists. Future data provided by State collaborators will hopefully be of a superior quality, possessing fewer of these vague qualified records. It is recommended that, where possible, these records be referred back to the host institution (recommendation 17).

Another, less desirable option would be to change all the qualifiers to make them more consistent. The variation found in Table 3.145 could be rationalised, all "species" qualifiers being abbreviated to a standard "spp.". It should be noted that this abbreviation relates only to multiple species within a genus (or family) and should be used only in these cases.

4.146 Name misspelt

The six examples of typographic errors discussed in Section 3.146 were listed in the CPBR_WIN_CURR field as being "not current". The recommendation for these errors is to replace these names with the correct spellings found in the relevant CPBR_WIN_NAME fields (recommendation 18).

4.147 Phrase names

Twelve correctly cited phrase names were found within the Taxon_Lists dataset (Section 3.147). As these names were not found in the APNI/WIN database, they were marked in the CPBR_WIN_CURR field as being unknown. This exercise has helped CPBR database managers to realise that there are a number of gaps in APNI/WIN when it comes to phrase names. No editing of these names is required in Taxon_Lists until the taxa are formally published; however, APNI/WIN needs updating with the twelve NVIS phrase names.

There were other cases where phrase names have subsequently been formally described and published. For the purposes of this assessment these were treated as synonyms, being linked in the CPBR_WIN_NAME field to the current names.

4.148 Non-plant taxon

There was one fungal record identified and discussed in Section 3.148. As the NVIS database relates to plant profiles, the recommendation is that this record be removed from the Taxon_Lists dataset (recommendation 19).

4.15 Author Name issues

The Taxon_Lists dataset possesses two columns that relate to plant name authors, AUTHOR and INFRA_AUTHOR. When surveyed, these fields were found to possess three types of authority issue: authority missing, author incorrect and author name in wrong field.

These issues were discussed in Section 3.2 where it was revealed 25% of the records assessed had some form of author inconsistency or error. The correction of these issues and keeping author data accurate in the future is considered to be a high maintenance task. The recommendation is that the two author columns be removed from the NVIS database and from NVIS data interchange formats (recommendation 20). Specialist plant name databases such as APNI/WIN are dedicated to keeping such nomenclatural information current and can be used as sources of this information if it is needed. For NVIS to maintain a separate list of authors in parallel to APNI/WIN would be double handling and a poor use of resources. The general recommendation is that author names are not needed for databases such as NVIS and that to use, check and maintain them is an unnecessary and expensive overhead.

Author names serve little purpose outside the arcane world of historical taxonomy, where they constitute a flag to indicate that the same name has been used for different taxa. The only risk of removing authors from NVIS is that nomenclatural homonyms would be harder to detect. Homonyms are cases where a botanist describes a new species using a name that is already preoccupied, that is, already applied to another species. Authors are useful in detecting these cases, as they will identify the different botanists involved. The risk of missing homonyms in the NVIS context is considered to be negligible, as they were more of an issue in the early years of plant taxonomy. Today homonyms are rare; the tools (i.e. online databases) for checking if a name is preoccupied are more advanced and accessible.

4.16 Large group case studies

Recommendations relating to issues uncovered with the large group case studies (Proteaceae, the wattles and the eucalypts) have been sufficiently dealt with by Sections 4.1 through to 4.15. The greater detail shown with these groups included a State-by-State comparison to help data providers develop a greater awareness of the inconsistencies encountered when attempting to marry taxonomic data from multiple sources.

4.2 NVIS in relation to other botanical databases

4.21 NVIS relationship with Commonwealth and national databases

As a general rule, NVIS is a vegetation and ecological database and as such should not seek to duplicate taxonomic effort taking place in other institutions dedicated to this task. NVIS should endeavour to use the results of these other projects, freeing up NVIS resources for other activities. There are numerous taxonomic database projects at national, state and even local level competing for maintenance resources and NVIS should not add to this list. Many of these projects are seeking ways to collaborate, reduce duplication and achieve higher levels of consistency. NVIS should position itself to take advantage of this by linking in with them rather than maintaining its own independent taxonomy.

At the Commonwealth level, the CPBR and Australian Biological Resources Study (ABRS) compile and maintain the Australian Plant Name Index (APNI) as a list of all names of plants known to occur in Australia, including details of published synonymy. An overlay to this database, What's Its Name (WIN) attempts to crystallise contemporary taxonomic thought from the alternatives offered in the botanical literature.

APNI/WIN has links to and contributes to the International Plant Name Index (IPNI), ensuring that names used in Australia are in line with those used by the international botanical community.

Acknowledging the confusion and difficulties caused by differences in taxonomy used by different State herbaria and in different State censuses, CHAH have embarked on a 'Consensus Census' project that will be striving for a single national view of Australian plant taxonomy. The WIN interface to APNI has been offered as the platform for the 'Consensus Census' and will reflect the combined and compromise view of all Australian government herbaria. The expectation is that over time there will be a gradual convergence and agreement of taxonomy in all states.

The recommendation from this report is that NVIS align its taxonomy with the CHAH national 'Consensus Census' project when it becomes available (recommendation 2). This can be done both directly, through APNI/WIN and indirectly through links to State herbarium plant census projects. A closer alliance between NVIS taxonomy and national botanical projects like the CHAH endorsed 'Consensus Census' will remove the perception of a Commonwealth imposed taxonomy since the content of the 'Consensus Census' is being provided by the States themselves, with the Commonwealth as just another equal partner. Furthermore, this association is likely to foster greater collaboration between NVIS and the botanical resources of State and Commonwealth herbaria and provide greater opportunity for NVIS to take advantage of, have input to, and influence the direction and priorities of the AVH project.

Acknowledging practical difficulties in establishing direct database links between APNI/WIN and NVIS, this report recommends interim solutions involving periodic updates of taxonomic information from APNI/WIN into the immediate NVIS database environment (see Section 4.3).

The logical place for this is DEH's SPRAT database in the same Oracle RDBMS as NVIS. SPRAT contains taxonomic tables of all plant and animal names of interest to DEH and already has direct data links to APNI/WIN.

The Taxon_Lists table is made up of 59,513 records (Section 1.34) of which 5,447 records (3,684 taxa) are used to create plant community profiles in the NVIS Veg_Description table (Section 3.0). After a clean-up of the data following the recommendations of Section 4.1, it is recommended (recommendation 21, in part) that these linked records be migrated from NVIS into SPRAT as described in greater detail in Section 4.3.

It is recommended that the links between SPRAT and APNI/WIN be enhanced and that formal arrangements be established to ensure timely updates of SPRAT from APNI/WIN (recommendation 3).

4.22 The role of State and Territory botanical databases

It is recommended that survey agencies contributing to NVIS build, as a priority, active working links with the State herbaria, which are custodians of census information about plants occurring in each State (recommendation 1).

Taxonomic lists for each State/Territory (which typically hold many more species than are used in NVIS vegetation descriptions) such as the State and Territory censuses maintained by herbaria should be used to support updates of NVIS taxonomic data. Examples of these State censuses include The National Herbarium of NSW's PlantNet (http://plantnet.rbgsyd.gov.au/PlantNet/NSWplants/nswplants.htm) and Perth Herbarium's FloraBase (http://florabase.calm.wa.gov.au). Links to all available State censuses can be found on Australia's Virtual Herbarium web site (http://www.chah.gov.au/avh).

The Commonwealth Government has a requirement for a separate combined national list of plant names, including indications of unresolved taxonomic issues between jurisdictions. The CHAH endorsed 'Consensus Census' is seen as the appropriate vehicle for this and this report recommends that NVIS align itself with the taxonomy that will be contained in this census (recommendation 2).

State and Territory custodians of NVIS are responsible for maintaining the content of this database, and it is highly advisable that they consider adopting a 'Consensus Census' view of taxonomy before data is dispatched to the national NVIS database as illustrated in Figure 4.3c. This will radically reduce the amount of work required in the data compilation stage. Comparing the initial NVIS taxon data against the local State census would be a valuable first step in this process.

Discrepancies and inconsistencies that are detected during the NVIS data loading process will be fed back to NVIS data custodians. This will initiate a dialogue towards agreement on an acceptable taxonomy.

A simpler data structure, with no families (recommendation 13) or authors (recommendation 20), will help future transfers of data get through to "validation successful" without the need for human intervention.

4.23 HISPID standards

Managers of Australian Commonwealth, State and Territory government herbaria collectively form the Council of Heads of Australian Herbaria (CHAH), which has a technical subcommittee, the Herbarium Information Systems Committee (HISCOM). This committee develops and maintains the Herbarium Information Standards and Protocols for Interchange of Data (HISPID), which specifies how herbarium data is to be recorded and transferred between databases. Computer operating systems and database platforms vary between herbaria, and HISPID attempts to store herbarium specimen data in consistent or compatible fields. Coupled with this are consistent protocols and data delivery mechanisms to allow for a simplified exchange of data between institutions. This has been essential to the AVH project, and specimen exchange accompanied by electronic field data reduces data entry duplication.

The current HISPID standards (Version 3, Conn 1996) (http://plantnet.rbgsyd.nsw.gov.au/HISCOM/HISPID/HISPID3/hispidright.html) are actively used by the major Australian government herbaria, key collaborators in the NVIS database. For future NVIS data exchange, it is recommended that NVIS managers and collaborators adopt the use of the relevant HISPID conventions and standards for the representation of botanical data (recommendation 5). By adopting these standards, data can be made more consistent between NVIS stakeholders, allowing for simpler uploading of data from collaborators to NVIS managers.

One current area of difference between the HISPID standards and data provided to NVIS by some collaborators is the combining of multiple nomenclatural elements into one string (or field). It is our recommendation (recommendation 22) that collaborators should follow the nomenclatural database structure outlined in the HISPID standards, effectively atomizing data into single elements (or fields). These name elements can be recombined later if necessary for presentational or reporting purposes. Table 4.23 below shows a hypothetical example of how combined data could be broken up to match HISPID standards.

Table 4.23 Atomizing of nomenclatural data elements

Before
Genus name   Species epithet   Infraspecific epithet
Acacia       excelsa Benth.    subsp. angusta Pedley

After
Genus name   Species epithet   Infraspecific rank   Infraspecific epithet
Acacia       excelsa           subsp.               angusta

Note: In this example the authors have been omitted in accordance with recommendation 20.

Whilst the 'before' layout appears to be simpler, with fewer fields, the multiple data elements in a single field complicate comparisons with other records. As mentioned in Section 4.143, the presence of name qualifiers at the end of some records made comparison with unqualified records difficult, even when multiple records of the same taxon were compared. Moving the qualification, or other data, into a separate field (recommendation 14) simplifies the data elements and allows users greater control.
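
The atomizing step illustrated in Table 4.23 could be sketched in Python as follows. This is a minimal illustration only, not part of the HISPID standards: the function name atomize, the AtomizedName structure and the short list of recognised rank abbreviations are assumptions made for this example, and author strings are dropped by position rather than by lookup against an author list.

```python
from typing import NamedTuple, Optional

# Rank abbreviations recognised by this sketch (assumed, deliberately short).
# "f." (forma) is left out because it also occurs in author strings.
RANKS = ("subsp.", "var.")


class AtomizedName(NamedTuple):
    genus: str
    species: str
    infra_rank: Optional[str]
    infra_epithet: Optional[str]


def atomize(name: str) -> AtomizedName:
    """Split a combined name string into single elements, omitting authors."""
    tokens = name.split()
    genus, species = tokens[0], tokens[1]
    infra_rank = infra_epithet = None
    # The infraspecific epithet is taken to be the token after the rank;
    # everything else after the species epithet is treated as author text.
    for i in range(2, len(tokens) - 1):
        if tokens[i] in RANKS:
            infra_rank, infra_epithet = tokens[i], tokens[i + 1]
            break
    return AtomizedName(genus, species, infra_rank, infra_epithet)


example = atomize("Acacia excelsa Benth. subsp. angusta Pedley")
```

Applied to the hypothetical string in Table 4.23, the sketch yields the four elements Acacia, excelsa, subsp. and angusta. Real data would need more careful handling of authors, hybrids and phrase names than is attempted here.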

4.3 Future Test Measures

The on-going validity of NVIS will depend on the reliability of occurrence records in the database, for which an identification based on correct and consistent taxonomy is critical. This can only be achieved by regular checking of data, comparison against reliable data sets serving as a standard for the project, and the updating of NVIS and contributing databases to these standards.

Databases against which NVIS can and should be checked are themselves being checked, enhanced and updated — taxonomy is governed by contemporary acceptance and best practice rather than legislation and continual change is the only constant.

This report recommends (recommendation 21) a multi-stage approach to the checking of taxonomic data, with iterative checks on the databases against which NVIS is being checked:

  • NVIS data to be compared (and updated) at source against local State censuses which in turn will be compared against a combined national 'Consensus Census';
  • National plant list used by ERIN be updated regularly from 'Consensus Census' information sourced through APNI/WIN (quarterly or ad hoc preload updates may be sufficient);
  • Changes or updates detected or required at the national level be communicated to NVIS custodians, and dialogue to scope and resolve differences be initiated;
  • NVIS abandon the concept of maintaining its own taxonomic authority file for internal and incoming data checking and validation;
  • APNI/WIN and the evolving 'Consensus Census' be used as the taxonomic resource for NVIS; for reasons of pragmatism, expediency and practicality, use of this resource need not be direct if there is an appropriate, up-to-date local gateway to views of this information; SPRAT fulfils these requirements;
  • NVIS adopt an existing accessible taxonomic authority file for this purpose. From a practical point of view, the taxon tables of the SPRAT database are well-placed for this role:
    • The SPRAT taxon tables are pre-populated and available
    • SPRAT taxon data is a reduced subset of the unnecessarily complex (for NVIS purposes) APNI/WIN data
    • They are Oracle database tables on the same database server as NVIS and can be made accessible as part of NVIS
    • They are managed by the same database unit which manages the NVIS database
    • SPRAT uses APNI/WIN as a source of taxonomic and nomenclatural data
    • Mechanisms are already in place to provide SPRAT with direct database access to APNI/WIN data
    • APNI/WIN is managed by DEH and is continually maintained and updated in response to the contemporary taxonomic literature and input from the Australian botanical community
    • APNI/WIN is a key dataset of the peak body, CHAH and is the foundation of their 'Consensus Census' project to build and maintain an agreed contemporary taxonomy for Australian plants
    • Through its links to APNI/WIN and APNI's links to CHAH and the State Herbaria, SPRAT is likely to be a reasonable reflection of names and taxonomy employed by NVIS State and Territory contributing agencies;
  • NVIS enter into an MOU or Service Level Agreement with SPRAT establishing a clear understanding of the use(s) to which SPRAT taxonomic data will be put, NVIS requirements and expectations of SPRAT taxonomic data, and mechanisms for dealing with inconsistencies and unexpected events;
  • Given that NVIS data providers have not empowered the NVIS database to change provided taxon names, NVIS will need to map provided names to an NVIS-endorsed name; in most cases the mapping will be 1:1 with no change involved; a small percentage will be either errors or inconsistencies which will require communication between NVIS, SPRAT and the data provider;
  • Existing NVIS data will be compared against SPRAT and inconsistencies will be rectified; changes will need to be communicated with NVIS data providers and agreements reached on how errors and legitimate differences will be flagged and handled, both within NVIS and within the supplying databases; and
  • NVIS data supplied for amalgamation will be compared (and updated) on load with the national NVIS plant list, stored as fields in SPRAT, which will reflect the taxonomy of APNI/WIN and the 'Consensus Census'. As previously mentioned, changes will need to be communicated with NVIS data providers and agreements reached on how errors and legitimate differences will be flagged and handled by both NVIS and the data providers.
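
The mapping of provided names to NVIS-endorsed names described in the list above might be sketched as follows. This is an illustration under stated assumptions rather than a description of SPRAT or the NVIS load process: the function map_to_endorsed, the in-memory sets standing in for the SPRAT taxon tables, and the 'Exemplum' names are all hypothetical. The supplied name is never altered; the endorsed name is stored alongside it.

```python
from typing import Dict, List, Optional, Set, Tuple

# (genus, species epithet, infraspecific rank, infraspecific epithet)
Name = Tuple[str, str, Optional[str], Optional[str]]


def map_to_endorsed(
    supplied: List[Name],
    endorsed: Set[Name],
    synonyms: Dict[Name, Name],
) -> Tuple[Dict[Name, Name], List[Name]]:
    """Map each supplied name to an endorsed name without altering it."""
    mapping: Dict[Name, Name] = {}
    unresolved: List[Name] = []
    for name in supplied:
        if name in endorsed:
            mapping[name] = name            # 1:1 mapping, no change involved
        elif name in synonyms:
            mapping[name] = synonyms[name]  # synonym mapped to current name
        else:
            unresolved.append(name)         # refer back to the data provider
    return mapping, unresolved


# Hypothetical authority data standing in for the SPRAT taxon tables.
endorsed = {
    ("Acacia", "excelsa", "subsp.", "angusta"),
    ("Exemplum", "novum", None, None),
}
synonyms = {("Exemplum", "vetus", None, None): ("Exemplum", "novum", None, None)}

supplied = [
    ("Acacia", "excelsa", "subsp.", "angusta"),
    ("Exemplum", "vetus", None, None),
    ("Exemplum", "ignotum", None, None),
]
mapping, unresolved = map_to_endorsed(supplied, endorsed, synonyms)
```

In this sketch the first name maps to itself, the second maps to its current name, and the third is returned in the unresolved list for communication between NVIS, SPRAT and the data provider.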

From time to time it might be advisable to conduct an independent audit or focused evaluation of taxonomic data in the NVIS database, with a scope similar to this report. While this could be conducted annually or with each major data re-supply or significant database restructure, a detailed evaluation of NVIS taxonomy will probably not be necessary for another 2-3 years. In this time the CHAH 'Consensus Census' is likely to have been implemented at the national level; this would provide an appropriate and useful trigger for a new evaluation and update of NVIS taxonomic data.

Schematics of how the checking and validation process might be implemented and progressed over time have been developed in discussion with ERIN and are outlined in Figures 4.3a, 4.3b and 4.3c. Option 1 and Option 2 are initial and intermediate transition configurations, leading to a fairly simple final stage in Option 3 when links between State and national censuses and the NVIS database are more mature.

Table 4.3 indicates actions that might be taken by NVIS to address inconsistencies, incompatibilities, differences and errors in NVIS data files supplied by NVIS data providers. In line with the Figures mentioned above, it also details what issues may be detected and measured as part of an automated process.

Table 4.3 Summary of discrepancy handling options

Discrepancy | Action | Reported automatically | Reported quantitatively

RECORD NUMBERS
Single (unique) occurrence names | No action; all should match an entry in taxon name table | no | In summary report
Master records | No action; not relevant as all master entries are in external taxon name table (SPRAT) | n/a | n/a
Duplicate records | n/a in the context of supplied data | n/a | n/a
Total taxa | | yes | In summary report

INFRASPECIFIC NAME ISSUES
Rank misspelled | Correct manually and reload | yes | In summary report
Infra name in wrong field | Correct manually and reload | yes | In summary report
Infra name missing | Refer to data supplier, correct and reload | yes | In summary report

SYNONYMY
Nomenclatural synonym | Map to current name; notify data supplier | yes | In summary report
Taxonomic synonym | Map to current name; notify data supplier | yes | In summary report

OTHER TAXONOMIC ISSUES
Alternate family name | n/a; family names to be ignored and supplied through SPRAT | no | In summary report
Missing family name | n/a; family names to be ignored and supplied through SPRAT | yes | In summary report
Name misspelt | Correct manually and reload; notify data supplier | yes | In summary report
Phrase name | Treat as normal names; must match existing name in names table | yes | In summary report
Non-plant taxon | Refer to data supplier, correct and reload if appropriate | yes | In summary report
Double epithet | Correct and reload manually, notify data supplier | yes | In summary report
Status unknown | Refer to data supplier; correct and reload if appropriate | yes | In summary report
Name qualifier present | | yes | In summary report
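
As an indication of how some of the automatically reported items in Table 4.3 might be detected, the following Python sketch checks the three infraspecific name issues and tallies them for a summary report. The field names, the set of preferred rank abbreviations and the detection rules are assumptions made for illustration; they are not drawn from the NVIS or SPRAT schemas, and the remaining discrepancy types would need lookups against the taxon name table.

```python
from collections import Counter
from typing import Dict, List, Optional

# Preferred rank abbreviations (the examples cited from Section 4.12).
PREFERRED_RANKS = {"subsp.", "var."}

Record = Dict[str, Optional[str]]


def check_record(rec: Record) -> List[str]:
    """Return the Table 4.3 infraspecific issues detected in one record."""
    issues: List[str] = []
    species = (rec.get("species") or "").strip()
    rank = (rec.get("infra_rank") or "").strip()
    infra = (rec.get("infra_name") or "").strip()
    if rank and rank not in PREFERRED_RANKS:
        issues.append("Rank misspelled")
    if rank and not infra:
        issues.append("Infra name missing")
    # Assumed rule: a rank abbreviation inside the species field means the
    # infraspecific name has been entered in the wrong field.
    if any(token in PREFERRED_RANKS for token in species.split()):
        issues.append("Infra name in wrong field")
    return issues


def summary_report(records: List[Record]) -> Dict[str, int]:
    """Count each detected issue across all supplied records."""
    counts: Counter = Counter()
    for rec in records:
        counts.update(check_record(rec))
    return dict(counts)


records = [
    {"species": "excelsa", "infra_rank": "subsp.", "infra_name": "angusta"},
    {"species": "excelsa", "infra_rank": "ssp", "infra_name": "angusta"},
    {"species": "excelsa subsp. angusta", "infra_rank": None, "infra_name": None},
    {"species": "excelsa", "infra_rank": "var.", "infra_name": ""},
]
report = summary_report(records)
```

The sketch only flags problems; in line with Table 4.3, correction is left as a manual step or referred to the data supplier.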



Figure 4.3a Option 1: Species Taxonomy in NVIS Vegetation Descriptions (short term arrangement)




Figure 4.3b Option 2: Species Taxonomy in NVIS Vegetation Descriptions (mid-term arrangement)




Figure 4.3c Option 3: Species Taxonomy in NVIS Vegetation Descriptions (long term arrangement)

4.4 Guidance material for incorporation in future NVIS manuals

The data that is amalgamated to make up the NVIS Taxon_Lists table comes from multiple sources at different times from collaborators with differing backgrounds and research emphases. With this variation in mind, the Executive Steering Committee for Australian Vegetation Information developed the 2003 Australian Vegetation Attributes Manual (ESCAVI manual). This manual provides "nationally agreed guidelines for translating and compiling mapped vegetation datasets into the NVIS database through describing the NVIS attribute framework and links to the NVIS database" (ESCAVI 2003: 3). It is the recommendation of this report that collaborators, in the interests of providing NVIS more consistent data, become well familiar with this manual's guidelines (recommendation 23).

Data is periodically added to the NVIS database and existing data regularly reviewed. Guidance is provided (Section 4.41) to data collaborators to help simplify and improve the accuracy of the data of which they are custodians, data that is ultimately incorporated into NVIS.

Field collection techniques and appropriate vouchering procedures are also outlined (Section 4.42) to encourage data collaborators to increase the number of vouchered NVIS records.

4.41 Provision of data to NVIS

In terms of the taxonomic data provided by the NVIS data custodians, this report concurs generally with the data structure of the Taxon_Lists table of the NVIS database. In this instance, simplicity is key, and all that is really required are fields for the essential elements of a taxon name. All other details of taxonomy can be obtained from other databases of plant names (e.g. SPRAT) or specialist taxonomic databases (e.g. APNI/WIN) as outlined in Section 4.21. This supplementary information can be obtained through a dynamic link between databases or can be imported into appropriate database fields if necessary. The long-term recommendation (recommendation 21) to simplify the process even further is to migrate this Taxon_Lists data into SPRAT (see Section 4.3).

If NVIS accepts the recommendations of this report to ignore family names and author names (recommendations 13 and 20) for data transfer, and adds a new Qualifier field (recommendation 14), then the Taxon_Lists dataset will be much simpler:

  • Genus
  • Species
  • Infraspecies rank
  • Infraspecies name
  • Qualifier (new field)

This will provide a unique taxon match for the purposes of data loading and for feedback of problems to NVIS data custodians. This will also mirror the fields and standards outlined under HISPID, allowing greater consistency of data exchange between key stakeholders, such as herbaria (see Section 4.23). Under the International Code of Botanical Nomenclature (ICBN) (Greuter et al. 2000), it is not necessary to include all ranks, but it is recommended that data be supplied to the finest possible level, such as subspecies and varieties, for greater information clarity, as per Section 4.12 (recommendation 10).
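
The simplified five-field record and the unique taxon match it supports can be sketched as follows. The class name TaxonListEntry, the attribute names and the normalisation rules (collapsing whitespace and ignoring case) are assumptions for illustration, not the actual Taxon_Lists column definitions; the qualifier 's. lat.' is likewise only an example value.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


def _norm(value: Optional[str]) -> str:
    """Collapse whitespace and ignore case (an assumed matching rule)."""
    return " ".join((value or "").split()).casefold()


@dataclass(frozen=True)
class TaxonListEntry:
    genus: str
    species: str
    infra_rank: Optional[str] = None
    infra_name: Optional[str] = None
    qualifier: Optional[str] = None  # new field (recommendation 14)

    def match_key(self) -> Tuple[str, str, str, str]:
        """Key used for taxon matching; the qualifier is deliberately excluded."""
        return (
            _norm(self.genus),
            _norm(self.species),
            _norm(self.infra_rank),
            _norm(self.infra_name),
        )


plain = TaxonListEntry("Acacia", "excelsa", "subsp.", "angusta")
qualified = TaxonListEntry("Acacia", "excelsa", "subsp.", "angusta", qualifier="s. lat.")
same_taxon = plain.match_key() == qualified.match_key()
```

Because the qualifier sits in its own field, the two records above compare as the same taxon on load while the qualification itself is retained for reporting.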

NVIS database managers may choose to manage additional taxon fields such as family name and taxon authors. If so, these should be sourced from an authoritative taxonomic database such as APNI/WIN or SPRAT (recommendation 25). See comments in Section 4.15 on the limited utility of author names.

Whilst simplification of NVIS data is recommended above, the data present in the remaining fields will need to be improved. Section 4.4 stressed the importance of data consistency and the role of the ESCAVI manual. One example of where greater familiarity with the ESCAVI manual will benefit managers and data collaborators is with infraspecific names. As Section 3.12 shows, State providers need to take greater care with the quality of the infraspecific rank data they provide. Section 4.12 makes a recommendation (recommendation 24) on the preferred rank abbreviations (e.g. subsp., var.). These abbreviations match those recommended in the ESCAVI manual. If collaborators follow these guidelines, data will be much more consistent, allowing for improved correlation of information in NVIS (recommendation 23).

4.42 Field data collection and herbarium vouchers

It is the recommendation of this report that future data incorporated in NVIS be based on herbarium vouchers lodged with recognised Australian state and national herbaria wherever possible. However, it is understood that full vouchering of all vegetation survey data is impractical, both from a field and herbarium standpoint. Material in the field may be inadequate or insufficient for herbarium specimens, and vouchering all taxa encountered in a field survey is essentially an impossible task. Nonetheless, the judicious collection of vouchers, where possible, is strongly recommended (recommendation 26).

For NVIS collaborators not associated with herbaria, we would encourage the development of closer ties to the State or Territory herbarium that is most appropriate (recommendation 1). Most herbaria will accept specimen donations and manage them indefinitely for relatively little cost.

There are definite advantages in data that is linked to a tangible voucher specimen. Views on taxonomy change over time and it is not uncommon for names associated with herbarium specimens to change. As herbarium specimens are each given a unique numerical identifier (HISPID Accession number, for details see http://plantnet.rbgsyd.nsw.gov.au/HISCOM), this allows specimen name changes to be rapidly checked and communicated, for example to NVIS. Data provided to NVIS from non-vouchered sources cannot be so readily updated and over time may come to represent out-dated taxonomic concepts, which in turn may lessen the value of the data.

Most data providers will be familiar with the process of collecting herbarium vouchers and associated data. For collaborators who need guidance on these techniques, a number of useful references are available:

Australian National Botanic Gardens (2003) How to Collect Plants (http://www.anbg.gov.au/cpbr/herbarium/collecting/index.html)

Royal Botanic Gardens Sydney (1995) Collection, Preparation and Preservation of Plant Specimens, 2nd edn., Royal Botanic Gardens, Sydney.

Bridson, D. & Forman, L. (eds.) (1998) The Herbarium Handbook, 3rd edn., Royal Botanic Gardens, Kew.