What data standards are and what they do

What is a data standard?

A technical specification that details the structure, organization, documentation, and format of data.

Technical data standards pertain to organizing the data and documenting how data was collected; they do not provide guidelines for how data should be collected. They are usually machine-testable, and enable machines to exchange data.
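To make "machine-testable" concrete, here is a minimal sketch of a conformance check in Python. The field names (`event_date`, `latitude`, `longitude`) and their rules are a hypothetical mini-standard invented for illustration, not taken from any real specification.

```python
from datetime import date


def _is_iso_date(value):
    """True if value is a string in ISO 8601 calendar-date form (YYYY-MM-DD)."""
    try:
        date.fromisoformat(value)
        return True
    except (TypeError, ValueError):
        return False


# Hypothetical mini-standard: each required field and the check it must pass.
REQUIRED_FIELDS = {
    "event_date": _is_iso_date,
    "latitude": lambda v: isinstance(v, (int, float)) and -90 <= v <= 90,
    "longitude": lambda v: isinstance(v, (int, float)) and -180 <= v <= 180,
}


def validate(record):
    """Return a list of problems; an empty list means the record conforms."""
    problems = []
    for field, check in REQUIRED_FIELDS.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not check(record[field]):
            problems.append(f"invalid value for {field}: {record[field]!r}")
    return problems
```

Because the rules are written down as code rather than held in someone's head, any machine receiving a record can decide whether it conforms without contacting the person who produced it.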

Some common scenarios

A drawing of two people exchanging data without a standard

No standard scenario

No standard

The data is not referenced to any standard, and the only way to understand it is to talk to the people who created it or to extract context from related work, such as published papers. This scenario relies on human intuition, institutional knowledge, and irregular documentation.

A drawing of a person giving data to a closed database

Closed standard scenario

Closed standard

The data is referenced to a closed standard, one that is only accessible to a select group of people (e.g., a private company). This scenario relies on closed documentation and institutional knowledge. This is different from FAIR data that has restricted access.

A drawing of data being exchanged across the world via an open standard

Open standard scenario

Open standard

Humans and machines never have to interact to be able to reuse each other’s data. This scenario relies on FAIR, open data standards, where the standard is well documented and machine-readable. Two of the best arguments for using open standards are that they increase collective efficiency, which makes bigger and better products possible, and that they help you get and track credit for your work.

Why use a data standard?

In studying the US Atlantic Ocean, implementing standards enables us to go from data to regional syntheses efficiently.

  • Using data standards allows for the consistent collection of data, and aids in data aggregation, sharing and reuse, and interoperability of that data across different systems, sources, and users.
  • Data standards can save time and effort if used during data collection, but may be applied at other points in the data lifecycle by restructuring and re-documenting the data.
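As a rough illustration of applying a standard after collection, the sketch below restructures one ad-hoc row into a standardized record. The source column names (`Date`, `Lat`, `Lon`), the target field names, and the month/day/year date format are all assumptions made up for this example.

```python
from datetime import datetime

# Hypothetical mapping from one lab's ad-hoc column names to standard field names.
FIELD_MAP = {"Date": "event_date", "Lat": "latitude", "Lon": "longitude"}


def to_standard(row):
    """Rename known fields and normalize their values for one ad-hoc row."""
    # Keep only the columns the (hypothetical) standard knows about.
    out = {FIELD_MAP[key]: value for key, value in row.items() if key in FIELD_MAP}
    # Assumption: the source recorded dates as month/day/year.
    parsed = datetime.strptime(out["event_date"], "%m/%d/%Y")
    out["event_date"] = parsed.date().isoformat()
    # Coordinates arrive as text and are stored as decimal degrees.
    out["latitude"] = float(out["latitude"])
    out["longitude"] = float(out["longitude"])
    return out
```

A retrofit like this has to encode assumptions about the original data (here, the date format), which is one reason adopting the standard at collection time tends to be less work.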

Diagram showing the flow of data from collection through standards to synthesis

General data flow diagram

In short, using data standards can save you time and effort. Many standards exist, and they apply to many facets of the data we work with as scientists. Integrating them into your workflows will increase your efficiency and the value of your data for anyone who reuses it, including yourself.

Diagram illustrating how standards improve efficiency in data workflows

Efficiency benefits of standards