Herbarium Datasets for Download:

The following downloads provide access to entire datasets from individual herbaria, and to the entire Consortium database. These datasets are intended for those performing large-scale data harvesting or analyses.

Data Formats:

Each dataset is formatted as a ZIP archive containing multiple tab-delimited text files in a simple relational structure. Files listed under the "DwC Archive" column follow the Darwin Core Archive (DwCA) specification for field names. Files listed under the "Native Format" column have field names matching the structure of the underlying Consortium database, and include additional fields that cannot easily be represented in DwCA.
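As a rough sketch of how a script might tell the two formats apart after download, the Python below opens a dataset ZIP with the standard `zipfile` module and checks for the file names described on this page (`occurrence.txt` plus `meta.xml` for DwCA, `occurrences.txt` for Native). The function name `detect_format` is hypothetical, and the check assumes the archive keeps those file names unchanged.

```python
import io
import zipfile

# File names taken from the descriptions on this page.
DWCA_MARKERS = {"occurrence.txt", "meta.xml"}
NATIVE_MARKERS = {"occurrences.txt"}


def detect_format(source) -> str:
    """Return "dwca" or "native" for a downloaded dataset ZIP.

    `source` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(source) as zf:
        # Compare base names so an archive with a top-level folder still matches.
        names = {name.rsplit("/", 1)[-1] for name in zf.namelist()}
    if DWCA_MARKERS <= names:
        return "dwca"
    if NATIVE_MARKERS <= names:
        return "native"
    raise ValueError(f"unrecognized dataset layout: {sorted(names)}")


if __name__ == "__main__":
    # Build a tiny in-memory archive to exercise the function.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("occurrence.txt", "id\r\n")
        zf.writestr("meta.xml", "<archive/>")
    buf.seek(0)
    print(detect_format(buf))  # prints: dwca
```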

Each dataset contains a text file called "occurrences.txt" (Native) or "occurrence.txt" (DwCA) holding one record per specimen with all core specimen fields (herbarium info, current identification, collector info, dates, location, notes, etc.). Each occurrence record is uniquely identified within the PNW Herbaria database by a field called OccurrenceID (Native) or id (DwCA).
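A minimal sketch, assuming the occurrence file has already been extracted and decoded to a string: index the records by their key column and fail loudly if an ID repeats, since each ID is meant to be unique. `index_occurrences` is a hypothetical helper; apart from `OccurrenceID` and `id`, any column names used with it (such as `scientificName` below) are illustrative only.

```python
import csv
import io

# Key column for each format, as described above.
ID_FIELD = {"native": "OccurrenceID", "dwca": "id"}


def index_occurrences(text: str, fmt: str) -> dict:
    """Map each occurrence ID to its record (a dict of field name -> value).

    `text` is the decoded contents of occurrences.txt or occurrence.txt.
    Raises ValueError if an ID appears more than once.
    """
    key = ID_FIELD[fmt]
    # newline="" leaves the CR+LF endings for the csv module to handle.
    reader = csv.DictReader(
        io.StringIO(text, newline=""), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    index = {}
    for row in reader:
        occ_id = row[key]
        if occ_id in index:
            raise ValueError(f"duplicate {key}: {occ_id}")
        index[occ_id] = row
    return index
```

For example, `index_occurrences("OccurrenceID\tscientificName\r\n101\tAbies grandis\r\n", "native")["101"]` returns the row for record 101 as a dict.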

Each dataset contains a text file called "annotations.txt" (Native) or "identifications.txt" (DwCA) that holds annotations/identifications. There may be multiple annotations/identifications linked to a single specimen record, using the OccurrenceID field (Native) or coreid field (DwCA; coreid = id in occurrence.txt).
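Because several annotation rows can point at one specimen, a script will usually group them by their link field before joining them to the occurrence records. The sketch below does that with `csv.DictReader` and `collections.defaultdict`; the helper name `group_by_specimen` is hypothetical, and the `scientificName` column in the test data is an assumption about the file's contents.

```python
import csv
import io
from collections import defaultdict

# Column that links each row back to its specimen, per format.
LINK_FIELD = {"native": "OccurrenceID", "dwca": "coreid"}


def group_by_specimen(text: str, fmt: str) -> dict:
    """Group rows of a linked file (e.g. annotations.txt) by specimen ID.

    Returns {specimen_id: [row, row, ...]}; one specimen may have many rows.
    """
    key = LINK_FIELD[fmt]
    reader = csv.DictReader(
        io.StringIO(text, newline=""), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    groups = defaultdict(list)
    for row in reader:
        groups[row[key]].append(row)
    return dict(groups)
```

The same grouping should apply to media.txt / multimedia.txt and to types.txt, which link to occurrences through the same OccurrenceID (Native) or coreid (DwCA) fields.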

Each dataset contains a text file called "media.txt" (Native) or "multimedia.txt" (DwCA) that holds metadata for images and other media associated with each specimen. There may be multiple images linked to a single specimen, again using OccurrenceID (Native) or coreid (DwCA). The image files themselves are stored on our server and can be accessed via the URLs given in media.txt and multimedia.txt. The Native format media.txt file provides direct links to three image formats provided by PNW Herbaria: a thumbnail image, a full-resolution JPEG image, and an online image viewer with zoom and pan capabilities. The DwCA format provides links only to the full-resolution JPEG images; however, programmers can readily modify these links to recreate the links to the thumbnails or the online image viewer.

Native format datasets contain a fourth file, "types.txt", holding scientific names (basionyms), type designations, and other relevant information for nomenclatural type specimens. There may be multiple type names associated with each occurrence record, each linked to a record in occurrences.txt by the OccurrenceID field.

DwCA format dataset files also contain two XML files with dataset metadata. The file "meta.xml" contains ordered lists of field names within each of the enclosed text files. The file "eml.xml" contains general metadata about the dataset as a whole, including provider and collection information, contact information, and license terms.
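Rather than hard-coding column orders, a DwCA consumer can read them from meta.xml. This sketch assumes meta.xml follows the Darwin Core Text Guide layout: an `<archive>` root in the `http://rs.tdwg.org/dwc/text/` namespace, with `<core>` and `<extension>` sections that each list their `<files>`, an `<id>` or `<coreid>` column, and indexed `<field>` elements. `fields_by_file` is a hypothetical name, and term URIs are shortened to their last path segment for readability.

```python
import xml.etree.ElementTree as ET

NS = {"dwc": "http://rs.tdwg.org/dwc/text/"}


def fields_by_file(meta_xml: str) -> dict:
    """Map each data file named in meta.xml to its column names, in index order.

    An id/coreid column with no term of its own is reported as "id" or "coreid".
    """
    root = ET.fromstring(meta_xml)
    result = {}
    for section in root.findall("dwc:core", NS) + root.findall("dwc:extension", NS):
        location = section.find("dwc:files/dwc:location", NS).text.strip()
        columns = {}
        for field in section.findall("dwc:field", NS):
            index = field.get("index")
            if index is None:  # constant-valued fields (default=...) have no column
                continue
            # e.g. ".../dwc/terms/scientificName" -> "scientificName"
            columns[int(index)] = field.get("term").rstrip("/").rsplit("/", 1)[-1]
        for key_tag in ("id", "coreid"):
            key = section.find(f"dwc:{key_tag}", NS)
            if key is not None:
                columns.setdefault(int(key.get("index")), key_tag)
        result[location] = [columns[i] for i in sorted(columns)]
    return result
```

The returned lists can be compared against the header row of each text file as a sanity check before parsing.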

All text files are encoded as Unicode (UTF-8) text, with Windows-style line endings (CR+LF) and fields separated by tabs. Field names and values are not enclosed in quotes.
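Because values are never quoted, a double quote inside a field (for example an inch mark in a locality) should be read literally. A minimal Python sketch of a reader configured for the encoding, line endings, and delimiter described above; `read_table` is a hypothetical helper, and the `utf-8-sig` codec is a defensive choice that also tolerates a byte-order mark, which this page does not say is present.

```python
import csv
import io
import zipfile


def read_table(zf: zipfile.ZipFile, member: str):
    """Yield each row of a dataset text file as a dict keyed by field name."""
    # UTF-8 text; newline="" lets the csv module handle the CR+LF endings.
    with io.TextIOWrapper(zf.open(member), encoding="utf-8-sig", newline="") as text:
        # Tab-delimited and unquoted, so '"' carries no special meaning.
        yield from csv.DictReader(text, delimiter="\t", quoting=csv.QUOTE_NONE)
```

Setting `quoting=csv.QUOTE_NONE` matters here: with the csv module's default quoting, a field that begins with `"` would be parsed as a quoted field and could absorb the following tabs and line breaks.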

Data Usage Policy:

Use of these datasets requires agreement to the terms and conditions in our Data Usage Policy.

Sensitive data (e.g., localities for rare taxa) are omitted from these datasets. Copies of these files with restricted data intact can be accessed by logging in. Some records in our database are not available for public release; such records are excluded from the datasets and record counts below.

Herbarium Collection | # Specimens | # Images | DwC Archive | Native Format | Last Updated
