How to encode data

The best way to integrate with the GCW Data Portal at the data level is to serve data (observational or analysed/simulated) through the Open-source Project for a Network Data Access Protocol (OPeNDAP). OPeNDAP and the Network Common Data Format (NetCDF) both use the Common Data Model, which simplifies data handling. Data served through OPeNDAP must be encoded according to the Climate and Forecast (CF) conventions. Starting with version 1.6 of the CF conventions, standardised approaches to encoding:

  • gridded data
  • timeseries at stations
  • profiles at stations
  • trajectories
  • trajectories of profiles

have been developed. When serving data that are not gridded, it is important to add the global attribute featureType so that the portal can identify the type of data and the proper services to offer.

Unless you have a web service (OAI-PMH or OGC CSW) offering discovery metadata as GCMD DIF or ISO 19115 (using GCMD Science Keywords and OSGEO keywords for URL identification), or would like to provide discovery metadata manually (not recommended), ACDD (Attribute Convention for Data Discovery) elements must be added to the data files. The GCW Data Portal can then extract discovery metadata automatically. The required elements are:

ACDD global attributes required (extract from ACDD documentation).
Attribute Description
id An identifier for the data set, provided by and unique within its naming authority. The combination of the "naming authority" and the "id" should be globally unique, but the id can be globally unique by itself also. IDs can be URLs, URNs, DOIs, meaningful text strings, a local key, or any other unique string of characters. The id should not include white space characters.
naming_authority The organization that provides the initial id (see above) for the dataset. The naming authority should be uniquely specified by this attribute. We recommend using reverse-DNS naming for the naming authority; URIs are also acceptable. Example: 'edu.ucar.unidata'.
title A short phrase or sentence describing the dataset. In many discovery systems, the title will be displayed in the results list from a search, and therefore should be human readable and reasonable to display in a list of such names. This attribute is also recommended by the NetCDF Users Guide and the CF conventions.
summary A paragraph describing the dataset, analogous to an abstract for a paper.
keywords A comma-separated list of key words and/or phrases. Keywords may be common words or phrases, terms from a controlled vocabulary (GCMD is required), or URIs for terms from a controlled vocabulary (see also "keywords_vocabulary" attribute).
geospatial_lat_min Describes a simple lower latitude limit; may be part of a 2- or 3-dimensional bounding region. geospatial_lat_min specifies the southernmost latitude covered by the dataset. Must be decimal degrees north.
geospatial_lat_max Describes a simple upper latitude limit; may be part of a 2- or 3-dimensional bounding region. geospatial_lat_max specifies the northernmost latitude covered by the dataset. Must be decimal degrees north.
geospatial_lon_min Describes a simple longitude limit; may be part of a 2- or 3-dimensional bounding region. geospatial_lon_min specifies the westernmost longitude covered by the dataset. See also geospatial_lon_max. Must be decimal degrees east.
geospatial_lon_max Describes a simple longitude limit; may be part of a 2- or 3-dimensional bounding region. geospatial_lon_max specifies the easternmost longitude covered by the dataset. Cases where geospatial_lon_min is greater than geospatial_lon_max indicate the bounding box extends from geospatial_lon_max, through the longitude range discontinuity meridian (either the antimeridian for -180:180 values, or Prime Meridian for 0:360 values), to geospatial_lon_min; for example, geospatial_lon_min=170 and geospatial_lon_max=-175 incorporates 15 degrees of longitude (ranges 170 to 180 and -180 to -175). Must be decimal degrees east.
time_coverage_start Describes the time of the first data point in the data set. Use the ISO 8601:2004 date format, preferably the extended format as recommended in the Attribute Content Guidance section, i.e. YYYY-MM-DDTHH:MM:SSZ (always use UTC).
time_coverage_end Describes the time of the last data point in the data set. Use the ISO 8601:2004 date format, preferably the extended format as recommended in the Attribute Content Guidance section, i.e. YYYY-MM-DDTHH:MM:SSZ (always use UTC).
Conventions A comma-separated list of the conventions that are followed by the dataset. For files that follow this version of ACDD, include the string 'ACDD-1.3'. (This attribute is described in the NetCDF Users Guide.)
history Provides an audit trail for modifications to the original data. This attribute is also in the NetCDF Users Guide: 'This is a character array with a line for each invocation of a program that has modified the dataset. Well-behaved generic netCDF applications should append a line containing: date, time of day, user name, program name and command arguments.' To include a more complete description you can append a reference to an ISO Lineage entity; see NOAA EDM ISO Lineage guidance.
source The method of production of the original data. If it was model-generated, source should name the model and its version. If it is observational, source should characterize it. This attribute is defined in the CF Conventions. Examples: 'temperature from CTD #1234'; 'world model v.0.1'.
processing_level A textual description of the processing (or quality control) level of the data.
date_created The date on which this version of the data was created. (Modification of values implies a new version, hence this would be assigned the date of the most recent values modification.) Metadata changes are not considered when assigning the date_created. The ISO 8601:2004 extended date format is recommended, as described in the Attribute Content Guidance section.
creator_type Specifies type of creator with one of the following: 'person', 'group', 'institution', or 'position'. If this attribute is not specified, the creator is assumed to be a person.
creator_institution The institution of the creator; should uniquely identify the creator's institution. This attribute's value should be specified even if it matches the value of publisher_institution, or if creator_type is institution.
creator_name The name of the person (or other creator type specified by the creator_type attribute) principally responsible for creating this data.
creator_email The email address of the person (or other creator type specified by the creator_type attribute) principally responsible for creating this data.
creator_url The URL of the person (or other creator type specified by the creator_type attribute) principally responsible for creating this data.
institution The name of the institution principally responsible for originating this data. This attribute is recommended by the CF convention.
publisher_name The name of the person (or other entity specified by the publisher_type attribute) responsible for publishing the data file or product to users, with its current metadata and format.
publisher_email The email address of the person (or other entity specified by the publisher_type attribute) responsible for publishing the data file or product to users, with its current metadata and format.
publisher_url The URL of the person (or other entity specified by the publisher_type attribute) responsible for publishing the data file or product to users, with its current metadata and format.
project The name of the project(s) principally responsible for originating this data. Multiple projects can be separated by commas, as described under Attribute Content Guidelines. Examples: 'PATMOS-X', 'Extended Continental Shelf Project'.
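
As a minimal sketch of how the required attributes above might be checked before a file is published, the Python below collects the attribute names from the table, formats times as YYYY-MM-DDTHH:MM:SSZ, and computes the longitude extent of a bounding box that may cross the discontinuity meridian. The helper names (iso_utc, missing_acdd, lon_extent) and all example values are hypothetical; they are not part of ACDD or of the GCW Data Portal.

```python
from datetime import datetime, timezone

# Required ACDD global attributes, copied from the table above.
REQUIRED_ACDD = (
    "id", "naming_authority", "title", "summary", "keywords",
    "geospatial_lat_min", "geospatial_lat_max",
    "geospatial_lon_min", "geospatial_lon_max",
    "time_coverage_start", "time_coverage_end",
    "Conventions", "history", "source", "processing_level",
    "date_created", "creator_type", "creator_institution",
    "creator_name", "creator_email", "creator_url", "institution",
    "publisher_name", "publisher_email", "publisher_url", "project",
)


def iso_utc(moment):
    """Format a timezone-aware datetime as YYYY-MM-DDTHH:MM:SSZ in UTC."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def missing_acdd(attrs):
    """Return the required attribute names that are absent or empty."""
    return [name for name in REQUIRED_ACDD if attrs.get(name) in (None, "")]


def lon_extent(lon_min, lon_max):
    """Degrees of longitude covered, from lon_min eastwards to lon_max.

    When lon_min is greater than lon_max, the box is taken to cross the
    longitude discontinuity, as described for geospatial_lon_max.
    """
    span = lon_max - lon_min
    if span < 0:
        span += 360
    return span


# Hypothetical, deliberately incomplete set of global attributes.
example = {
    "id": "example-station-001",
    "naming_authority": "org.example",
    "title": "Hypothetical snow depth time series",
    "geospatial_lon_min": 170.0,
    "geospatial_lon_max": -175.0,
    "time_coverage_start": iso_utc(datetime(2020, 1, 1, tzinfo=timezone.utc)),
    "Conventions": "CF-1.6, ACDD-1.3",
}

print("Missing:", missing_acdd(example))
print("Longitude extent:",
      lon_extent(example["geospatial_lon_min"], example["geospatial_lon_max"]))
```

If the netCDF4 Python package is used to write the file, a complete dictionary of this kind can be attached as global attributes with Dataset.setncatts. That step is left out here so the sketch has no third-party dependencies.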

Note that GCW places no requirements on how data are stored. Data may be stored in a different form (e.g. in a relational database), as long as they are transformed into the appropriate form when shared. A number of open source tools that facilitate data sharing through OPeNDAP are available, and some are listed below:

When serving data through OPeNDAP, it is recommended to keep datasets simple, i.e. do not collect many stations in one dataset. For observational data, specify one station per dataset. Please contact the GCW Data Portal if further details are required.
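
The one-station-per-dataset recommendation can be sketched as follows. This is only an illustration under stated assumptions: the record layout (station identifier, UTC timestamp, value), the function name split_by_station, and the sample values are hypothetical. The featureType value "timeSeries" is the one defined by the CF conventions for time series at stations.

```python
from collections import defaultdict


def split_by_station(records):
    """Group (station_id, timestamp, value) records into one dataset per station."""
    per_station = defaultdict(list)
    for station_id, timestamp, value in records:
        per_station[station_id].append((timestamp, value))

    datasets = {}
    for station_id, series in per_station.items():
        # ISO 8601 UTC strings in one fixed format sort chronologically.
        series.sort()
        datasets[station_id] = {
            "attrs": {
                "featureType": "timeSeries",
                "time_coverage_start": series[0][0],
                "time_coverage_end": series[-1][0],
            },
            "series": series,
        }
    return datasets


# Hypothetical observations from two stations, mixed in one list.
records = [
    ("station-a", "2020-01-01T00:00:00Z", -12.5),
    ("station-b", "2020-01-01T00:00:00Z", -8.1),
    ("station-a", "2020-01-01T01:00:00Z", -12.9),
]

datasets = split_by_station(records)
for station_id, dataset in sorted(datasets.items()):
    print(station_id, dataset["attrs"])
```

Each entry in the result would then be written to its own file or served as its own OPeNDAP dataset, together with the remaining required ACDD attributes.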