Darwin Core Basics

Why Darwin Core?

There is an existing standard, Darwin Core, that works well for sharing many types of biological data, especially data related to ecology. It has been around for about 25 years, and at least 3.5 billion data records have been shared via Darwin Core through repositories like the Ocean Biodiversity Information System (OBIS).

For communities such as the one that has grown up around offshore development, this means the data you help collect can travel further and be reused more quickly. More practically, Darwin Core also saves time for anyone managing data, because it cuts down on the cleaning and translation work usually needed to merge different formats. In short, it helps your data stay connected, useful, and impactful well beyond the original project.

What kind of things fit into Darwin Core?

Diagram: Data types that fit Darwin Core

Generally speaking, Darwin Core can accommodate observations of biological organisms. The image above shows some of the data types that have already been published with Darwin Core to repositories like OBIS and GBIF.

What about things that don’t fit into Darwin Core?

Invariably, scientists measure variables that are unique to their data. So how do we share these in a standardized way? The folks at OBIS came up with a nice extension to Darwin Core that does just that: the Extended Measurement or Fact (eMoF) extension.

As you’ll see in the example from the Animal Telemetry Network (ATN), sometimes communities have strong standards in place, like the Movebank Attribute Dictionary, which prescribes standard definitions for variables related to telemetry. The eMoF extension lets these variables be expressed in a generalized, standard structure so that they can interoperate with, and be reused alongside, standards from other communities, such as the Climate and Forecast (CF) Standard Names.
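To make the idea concrete, here is a minimal Python sketch of how variables without their own Darwin Core term might be reshaped into eMoF rows. It assumes a hypothetical "wide" record in which extra measurements (here `body_length` and `body_mass`, with made-up units) sit alongside core terms. Each measurement becomes one row, linked back to its occurrence by `occurrenceID` and described with the eMoF terms `measurementType`, `measurementValue`, and `measurementUnit`.

```python
# Minimal sketch: reshape extra measurement columns into
# Extended Measurement or Fact (eMoF) rows.
# The column names, values, and units below are hypothetical examples.

CORE_TERMS = {"occurrenceID", "eventDate", "scientificName"}

# Hypothetical units for the non-core columns in this example.
UNITS = {"body_length": "cm", "body_mass": "g"}


def to_emof(record: dict) -> list[dict]:
    """Return one eMoF row per non-core column in `record`."""
    rows = []
    for column, value in record.items():
        if column in CORE_TERMS:
            continue  # core terms stay in the occurrence table
        rows.append({
            "occurrenceID": record["occurrenceID"],  # link back to the occurrence
            "measurementType": column,
            "measurementValue": value,
            "measurementUnit": UNITS.get(column, ""),  # empty if unit unknown
        })
    return rows


record = {
    "occurrenceID": "occ-001",
    "eventDate": "2023-06-01",
    "scientificName": "Caretta caretta",
    "body_length": 82.5,
    "body_mass": 91000,
}

for row in to_emof(record):
    print(row)
```

In a real dataset you would usually also fill `measurementTypeID` with a URI from a controlled vocabulary, such as a community standard like those mentioned above, so that the meaning of each `measurementType` is unambiguous.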

Darwin Core Basics

OK, we’re ready for some nitty-gritty. Darwin Core is essentially a list of terms you can use as column headers in your data. Of course, these terms can also be used in more sophisticated structures than spreadsheets, like relational databases, but let’s keep it simple for now.
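As a minimal illustration, assuming a plain CSV table is your sharing format, the sketch below writes two hypothetical observations using Darwin Core terms as the column headers. The records are made up, and the output goes to an in-memory buffer rather than a file; only the header names (`occurrenceID`, `eventDate`, `scientificName`, `decimalLatitude`, `decimalLongitude`) are Darwin Core terms.

```python
import csv
import io

# Darwin Core terms used directly as column headers.
FIELDNAMES = ["occurrenceID", "eventDate", "scientificName",
              "decimalLatitude", "decimalLongitude"]

# Hypothetical example records.
records = [
    {"occurrenceID": "occ-001", "eventDate": "2023-06-01",
     "scientificName": "Caretta caretta",
     "decimalLatitude": 35.2, "decimalLongitude": -75.5},
    {"occurrenceID": "occ-002", "eventDate": "2023-06-02",
     "scientificName": "Tursiops truncatus",
     "decimalLatitude": 35.1, "decimalLongitude": -75.4},
]

# An in-memory buffer keeps the example self-contained; a real script
# would typically open a file with newline="" instead.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES)
writer.writeheader()
writer.writerows(records)
print(buffer.getvalue())
```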

An example of this is describing the date some data was collected. If we don’t use Darwin Core, you might call it date and I might call it collection day. If we agree to use Darwin Core, we know it will be eventDate and we will never have to discuss it.
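To see how that agreement plays out in practice, here is a minimal Python sketch that renames local column names to Darwin Core terms. The local names (`date`, `collection day`, `species`, `lat`, `lon`) and the mapping itself are hypothetical; you would build your own mapping from your data dictionary.

```python
# Hypothetical local column names mapped to Darwin Core terms.
LOCAL_TO_DWC = {
    "date": "eventDate",
    "collection day": "eventDate",
    "species": "scientificName",
    "lat": "decimalLatitude",
    "lon": "decimalLongitude",
}


def rename_to_dwc(record: dict) -> dict:
    """Rename known local columns to Darwin Core terms; keep others unchanged."""
    return {LOCAL_TO_DWC.get(key, key): value for key, value in record.items()}


mine = {"date": "2023-06-01", "species": "Caretta caretta"}
yours = {"collection day": "2023-06-01", "species": "Caretta caretta"}

print(rename_to_dwc(mine))
print(rename_to_dwc(yours))
```

Once both datasets pass through the same mapping, they share the header `eventDate` and can be merged without further discussion. One caveat of this simple design: if a single record contains two local columns that map to the same term, the later one silently overwrites the earlier one, so a production mapping should check for that.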

Diagram: Single Darwin Core term example

Example Term

For example, in Darwin Core, countryCode is a standard term that records the country in which an observation occurred. Its value is a two-letter ISO 3166-1 alpha-2 country code (like US for the United States or CA for Canada).

Although there are many good arguments for using standards early in your data workflow, you don’t have to create an entirely new workflow; you can map to countryCode, and other Darwin Core terms, anywhere in your data process. That way, whether you’re collecting records in the field, managing them in a database, or publishing them online, the meaning stays consistent and others will immediately understand what that field represents.
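Because mapping can happen at any stage, one lightweight approach (an assumption, not the only way) is a small lookup that converts free-text country names to codes whenever records pass through your pipeline. The lookup table below is hypothetical and deliberately tiny; a real pipeline would draw on a complete ISO 3166-1 alpha-2 list.

```python
# Hypothetical lookup from free-text country names to
# ISO 3166-1 alpha-2 codes, the format countryCode expects.
COUNTRY_CODES = {
    "united states": "US",
    "usa": "US",
    "canada": "CA",
}


def to_country_code(name: str) -> str:
    """Map a free-text country name to a two-letter code, or raise if unknown."""
    key = name.strip().lower()
    if key not in COUNTRY_CODES:
        raise ValueError(f"No countryCode mapping for {name!r}")
    return COUNTRY_CODES[key]


print(to_country_code("United States"))  # US
print(to_country_code(" Canada "))       # CA
```

Raising an error on unknown names, rather than passing them through, keeps unmapped values from slipping into the published countryCode column.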

What is required?

One of the most frequent questions we get is: which terms are required in Darwin Core? The answer is none. The standard itself does not prescribe any required terms; however, repositories like OBIS and GBIF do.

For example, OBIS requires:

eventID
eventDate
decimalLatitude
decimalLongitude
occurrenceID
occurrenceStatus (e.g., present/absent)
basisOfRecord (e.g., HumanObservation)
scientificName

When you look at those requirements, they’re rather simple:

what did you see: scientificName
when: eventDate
where: decimalLatitude/decimalLongitude
did you see it: occurrenceStatus
how did you see it: basisOfRecord
how do we reference this record: eventID and occurrenceID
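A quick completeness check can catch missing required terms before you submit. This Python sketch assumes each record is a dictionary keyed by Darwin Core term and uses the OBIS list given above; it only checks that each term is present and non-empty, not whether the values are valid (for example, whether eventDate is a proper ISO 8601 date). The example record is hypothetical.

```python
# Terms listed above as required by OBIS. Check OBIS's own
# documentation for the current list before relying on this.
OBIS_REQUIRED = [
    "eventID", "eventDate", "decimalLatitude", "decimalLongitude",
    "occurrenceID", "occurrenceStatus", "basisOfRecord", "scientificName",
]


def missing_terms(record: dict, required=OBIS_REQUIRED) -> list[str]:
    """Return required terms that are absent or empty in `record`."""
    # A value of 0 (e.g. latitude 0.0) counts as present; only None/"" are missing.
    return [term for term in required if record.get(term) in (None, "")]


record = {
    "eventID": "cruise1-station5",
    "eventDate": "2023-06-01",
    "decimalLatitude": 35.2,
    "decimalLongitude": -75.5,
    "occurrenceID": "occ-001",
    "occurrenceStatus": "present",
    "basisOfRecord": "HumanObservation",
    "scientificName": "",
}

print(missing_terms(record))  # ['scientificName']
```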

But if that were all the information Darwin Core could communicate, it wouldn’t be very useful. Beyond the required terms, what you include comes down to what your scientific community expects of you. For example, if it is normal in your field to record the sex of the individuals you observe, it’s best to include sex as a term. On the next page we look at some specific examples.
