ObsPack Data Product
Each ObsPack data product includes
- a unique ObsPack name
- prepared data sets
- metadata
- a product summary
- an e-mail distribution list of all data providers
ObsPack Name
Each ObsPack data product has a unique ObsPack name using the following structure.
obspack_<trace gas identifier>_<preparation lab number>_<product name>_<product version number>_<preparation date>
Please note that the <product name> part of this structure is optional.
The version numbering scheme is major.minor[.minor] where a major release is indicated by the first number in the sequence and minor revisions are indicated by the second and third (optional) numbers in the sequence. Below are a few examples.
obspack_co2_1_GLOBALVIEWplus_v1.0_2015-07-30 (first major release of GLOBALVIEWplus data product)
obspack_co2_1_PROTOTYPE_v1.0.4b_2014-02-13 (minor revision to PROTOTYPE data product)
obspack_co2_1_PROTOTYPE_v1.0.4_2013-11-25 (minor revision to PROTOTYPE data product)
obspack_co2_1_PROTOTYPE_v1.0.0_2012-11-06 (first major release of PROTOTYPE data product)
Please note: The latest minor revision of a major release includes all changes included in intermediate minor revisions if they exist. We can expect a considerable number of minor revisions while the ObsPack framework is being developed. Once the framework has been thoroughly vetted, the number of minor revisions should be greatly reduced.
The ObsPack name is used throughout.
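The naming structure above can be checked programmatically. The following is a minimal sketch, not part of the ObsPack software: the function name `parse_obspack_name` and the returned keys are hypothetical, and the pattern assumes (based only on the examples above) that the version is prefixed with "v", may end in a single lowercase letter, and that the preparation date is in YYYY-MM-DD form.

```python
import re
from datetime import date

# Pattern inferred from the documented structure and examples.
# The product name is optional, so it sits in an optional group.
_OBSPACK_NAME = re.compile(
    r"^obspack_"
    r"(?P<trace_gas>[^_]+)_"
    r"(?P<lab_number>\d+)_"
    r"(?:(?P<product_name>.+)_)?"
    r"v(?P<version>\d+\.\d+(?:\.\d+)?[a-z]?)_"
    r"(?P<prep_date>\d{4}-\d{2}-\d{2})$"
)


def parse_obspack_name(name):
    """Split an ObsPack name into its components (hypothetical helper)."""
    match = _OBSPACK_NAME.match(name)
    if match is None:
        raise ValueError(f"not a recognizable ObsPack name: {name!r}")
    parts = match.groupdict()
    parts["lab_number"] = int(parts["lab_number"])
    parts["prep_date"] = date.fromisoformat(parts["prep_date"])
    return parts


if __name__ == "__main__":
    print(parse_obspack_name("obspack_co2_1_PROTOTYPE_v1.0.4b_2014-02-13"))
```

When the product name is omitted, `product_name` is returned as `None`.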
Prepared Data Sets
An ObsPack data set is 1) a collection of measurements for a single trace gas species, 2) derived from a single laboratory-project, and 3) prepared according to a set of instructions. A set of instructions, specific to each data set, configures ObsPack software to subset data, average data, or pass data through without alteration. Multiple instruction sets for a given measurement record will create multiple unique data sets. For example, the NOAA quasi-continuous CO2 measurement record from the 396 magl intake height on the Wisconsin tall tower site (LEF) could be subset into two data sets: one consisting of average values of afternoon measurements only, and a second consisting of average values of nighttime measurements only. The ways in which data are prepared depend on the intended use of the data product.
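To illustrate the kind of subsetting and averaging described above, here is a rough sketch that averages measurements falling inside a local-time window, one mean per local day. It is not the ObsPack software: the function name, the input layout (pairs of POSIX time and value), and the window hours are all assumptions chosen for illustration, and the sketch only handles windows that do not cross midnight.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def window_daily_means(samples, utc2lst_hours, start_hour, end_hour):
    """Average values whose local standard time hour is in [start_hour, end_hour).

    samples: iterable of (posix_seconds_utc, value) pairs.
    utc2lst_hours: hour offset from UTC to local standard time.
    Returns {local_date: (mean, n)} where n is the number of values averaged.
    """
    buckets = defaultdict(list)
    for seconds, value in samples:
        utc = datetime.fromtimestamp(seconds, tz=timezone.utc)
        local = utc + timedelta(hours=utc2lst_hours)
        if start_hour <= local.hour < end_hour:
            buckets[local.date()].append(value)
    return {day: (sum(v) / len(v), len(v)) for day, v in buckets.items()}


if __name__ == "__main__":
    def ts(*args):
        return datetime(*args, tzinfo=timezone.utc).timestamp()

    # Illustrative values only; -6 is an assumed UTC-to-LST offset for a
    # site in the US Central time zone.
    data = [(ts(2012, 7, 1, 18), 390.0), (ts(2012, 7, 1, 20), 392.0)]
    print(window_daily_means(data, -6, 12, 16))
```

Running the same record through a different window (for example, nighttime hours) would yield a second, distinct data set, which is the idea behind multiple instruction sets for one measurement record.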
Data sets are presented as individual files. File names are unique and include the trace gas species identifier, alphanumeric site/project/campaign code, measurement project, laboratory identification number, a data selection tag, and the file type identifier, e.g., "nc" (netCDF4) and "txt" (ASCII text). The file name structure is as follows.
<trace gas identifier>_<site code>_<project>_<lab number>_<selection tag>.<filetype extension>
Below are a few examples.
co2_lef_aircraft-pfp_1_allvalid.txt
co2_lef_surface-pfp_1_representative.txt
co2_lef_tower-insitu_1_afternoon-396magl.txt
co2_lef_tower-insitu_1_nighttime-396magl.txt
co2_nat_surface-flask_26_marine.nc
co2_songnex2015_aircraft-insitu_114_allvalid.nc
co2_con_aircraft-flask_20_allvalid.nc
The selection tag is intended to convey a very general notion of how the data have been selected. A fuller description, including relevant literature references, is included in the file.
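Because the fields are joined by underscores, a file name can be split back into its parts. The sketch below is illustrative only: `DatasetFileName` and `parse_dataset_filename` are hypothetical names, and the code assumes that no individual field contains an underscore, which holds for the examples above.

```python
from typing import NamedTuple


class DatasetFileName(NamedTuple):
    trace_gas: str
    site_code: str
    project: str
    lab_number: int
    selection_tag: str
    extension: str


def parse_dataset_filename(filename):
    """Split a data set file name into its documented fields."""
    stem, dot, extension = filename.rpartition(".")
    if not dot:
        raise ValueError(f"missing file type extension: {filename!r}")
    fields = stem.split("_")
    if len(fields) != 5:
        raise ValueError(f"expected 5 underscore-separated fields: {filename!r}")
    trace_gas, site_code, project, lab_number, selection_tag = fields
    return DatasetFileName(
        trace_gas, site_code, project, int(lab_number), selection_tag, extension
    )


if __name__ == "__main__":
    print(parse_dataset_filename("co2_lef_tower-insitu_1_afternoon-396magl.txt"))
```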
Metadata
Each data set includes comprehensive metadata describing the sampling location, sampling strategy, preparation strategy, and contact information for the contributing laboratory and data providers. Also included in each data set are the contributing lab's logo and country flag (where available). These metadata provide users with all the information required to give proper attribution when displaying data from an ObsPack product. Figure 1 is constructed entirely from data and metadata extracted from a single data set.
Inside the Data File
Each data file includes a single prepared data set and associated metadata. Each data item in a data set includes the sample collection time, position, reported mole fraction or isotope ratio, estimated uncertainty (when available), the number (n) of individual measurements contributing to the reported value, and a unique ID that distinguishes the item from all other data items in the product. Metadata are presented as global attributes that describe general features of the data set and variable attributes that describe characteristics of the variables associated with each data item. Tables 1 and 2 describe global and variable attributes included in a typical ObsPack netCDF data file.
Global Attributes | |
---|---|
Name | Description |
site_code | site code |
site_name | Standard site name (e.g., Park Falls, Wisconsin) |
site_country,site_country_flag | Country in which site is located and link to image of flag |
site_longitude | Longitude (decimal degree) at representative site location |
site_latitude | Latitude (decimal degree) at representative site location |
site_elevation | Ground or surface elevation at representative site location |
site_elevation_unit | site_elevation is reported in meters above sea level (masl) |
site_map, dataset_map | Link to world map highlighting site location (this key has been replaced by "dataset_map") |
site_utc2lst | Hour conversion from UTC to LST |
site_url | URL link to site web page |
site_comment | Additional relevant site information |
dataset_creation_date | Creation date of dataset in ISO format |
dataset_num | Integer that uniquely identifies the data set in the ObsPack data product |
dataset_name | Character string that uniquely identifies the data set in the ObsPack data product. Data set naming is described under Prepared Data Sets. |
dataset_map | Link to world map highlighting the data set sampling location (replaces "site_map"; file type is png) |
dataset_parameter | Identifies trace gas species included in data set (e.g., co2, c13co2) |
dataset_process | String description of ObsPack data preparation (e.g., PassThru, TimeStepAverage) |
dataset_project | Typically identifies sampling platform and strategy (e.g., surface-flask, tower-insitu, aircraft-pfp) |
dataset_db | Boolean T/F. Indicates source data are from NOAA operational database. |
dataset_archive_dir | Source data archive directory. |
dataset_archive_file | Source data file or file filter. |
dataset_contribution | A short text summary of those responsible for the data set. |
dataset_intake_ht | This attribute is set when it is necessary to subset source data by sample intake height. |
dataset_intake_ht_unit | dataset_intake_ht is reported in meters above ground level (magl). |
dataset_time_window_utc | Attribute set when necessary to subset source data by sample collection time (UTC). |
dataset_time_window_lst | Attribute set when necessary to subset source data by sample collection time (LST). |
dataset_time_window_exclusion | T or F for determining if data are excluded based on time_window. |
dataset_time_fill | T or F for determining if dataset has time filled values. |
dataset_parse_function | Python module used to read source data. |
dataset_data_frequency | Measurement frequency of source data. |
dataset_data_frequency_unit | Indicates the time unit of the dataset_data_frequency attribute. |
dataset_platform | Fixed or Mobile. |
dataset_start_date | Date of first item in data set (ISO 8601 format). |
dataset_stop_date | Date of last item in data set (ISO 8601 format). |
dataset_selection | Brief description of how data have been selected by data contributor or prepared by NOAA. |
dataset_selection_tag | Short descriptor to help convey how data have been selected by data contributor or prepared by NOAA. The selection tag is included in the data set name. |
dataset_comment | Additional relevant data set information |
dataset_description | Description of the dataset. This may be the ObsPack product's description or additional relevant descriptive information regarding the specific dataset. |
dataset_calibration_scale | Measurements are relative to reported calibration scale. |
dataset_fair_use | This is the ObsPack fair use statement agreed upon by data providers. |
dataset_reciprocity | Statement on the reciprocity of the data within a dataset. |
dataset_reference_total_listed | Formerly called "dataset_reference number". Number indicating how many references to published literature to expect in this file. |
dataset_reference_#_name | Reference provided by data contributor. # represents a number from 1 to "dataset_reference_total_listed". |
dataset_globalview_prefix | Character string of equivalent GLOBALVIEW file name prefix (GLOBALVIEW products are outdated with no plans to resume production. It is recommended to use the annually updated GLOBALVIEWplus products). |
dataset_globalview_mbl_designation | Marine boundary layer site designation (GLOBALVIEW products are outdated with no plans to resume production. It is recommended to use the annually updated GLOBALVIEWplus products). |
dataset_globalview_weight_yearspan | The suggested relative weights for GLOBALVIEW (GLOBALVIEW products are outdated with no plans to resume production. It is recommended to use the annually updated GLOBALVIEWplus products). |
dataset_globalview_weight | The suggested relative weights for GLOBALVIEW (GLOBALVIEW products are outdated with no plans to resume production. It is recommended to use the annually updated GLOBALVIEWplus products). |
dataset_globalview_weight_rsd | Residual standard deviation (RSD) of the measurements about the smooth curve, S(t), with annual resolution for GLOBALVIEW (GLOBALVIEW products are outdated with no plans to resume production. It is recommended to use the annually updated GLOBALVIEWplus products). |
dataset_globalview_weight_n | The number of residuals per year used in the RSD determination for GLOBALVIEW (GLOBALVIEW products are outdated with no plans to resume production. It is recommended to use the annually updated GLOBALVIEWplus products). |
lab_total_listed | Number of contributing laboratories associated with the data set. |
lab_#_number | Laboratory identification number. See Lab Table. # represents a number from 1 to "lab_total_listed". |
lab_#_abbr | Laboratory abbreviation or acronym (e.g., CONTRAIL, UHEI-IUP) |
lab_#_name | Laboratory name |
lab_#_address | |
lab_#_country, lab_#_country_flag | |
lab_#_parameter | Identifies which parameter(s) in the dataset this lab has contributed |
lab_#_url | |
lab_#_logo | |
lab_#_ongoing_atmospheric_air_comparison | If "T", lab participates in at least one ongoing direct atmospheric air comparison experiment. |
lab_#_comparison_activity | Brief description of measurement comparison activities |
campaign_#_abbr [ _name, _num, _url, _logo ] | Additional metadata fields used to identify specific attributes of a campaign. |
program_total_listed | Number of contributing programs associated with the data set. |
program_#_abbr [ _number, _name, _address, _country, _country_flag, _url, _logo ] | Providers may make a distinction between the measurement lab and over-arching research programs (e.g., NACP, ICOS). # represents a number from 1 to "program_total_listed". |
provider_total_listed | Number of providers (Principal Investigators) associated with the data set. |
provider_#_name | # represents a number from 1 to "provider_total_listed". |
provider_#_address | |
provider_#_country | |
provider_#_affiliation | |
provider_#_affiliation_abbr | |
provider_#_parameter | Identifies which parameter(s) in the dataset this provider has contributed |
provider_#_email | |
provider_#_tel | Telephone number |
partner_total_listed | Number of partners associated with the data set. Partners are individuals and organizations that provide critical logistical, physical, or financial support for the measurements. |
partner_#_abbr [ _name, _address, _country, _affiliation, _affiliation_abbr, _email, _tel, _url, _logo, _flag ] | Partners may be individuals or organizations. # represents a number from 1 to "partner_total_listed". |
obspack_originator_lab_total_listed | Number of laboratories responsible for preparing the ObsPack product. |
obspack_originator_lab_#_abbr [ _name, _number ] | |
obspack_originator_individual_total_listed | Number of individuals responsible for preparing the ObsPack product. |
obspack_originator_individual_#_name [ _email, _affiliation ] | |
obspack_data_time_step | Time interval at which ObsPack data are presented (e.g., day, hour). |
obspack_name | Unique ObsPack identification string. Structure is obspack_<parameter>_<preparation/distribution lab number>_<product name>_<version number>_<preparation date> (e.g., obspack_co2_1_PROTOTYPE_v0.9.1_2012-07-20). |
obspack_description | Brief description of data product contents. |
obspack_version | ObsPack software version number. |
obspack_creation_date | Date when the ObsPack data product was prepared. |
obspack_citation | Required ObsPack citation. This citation is in addition to the requirements of the ObsPack Fair Use statements. |
obspack_fair_use | These cooperative data products are made freely available to the scientific community and are intended to stimulate and support carbon cycle modeling studies. We rely on the ethics and integrity of the user to assure that each contributing national and university laboratory receives fair credit for their work. Fair credit will depend on the nature of the work and the requirements of the institutions involved. Your use of an ObsPack data product implies an agreement to contact each contributing laboratory to discuss the nature of the work and the appropriate level of acknowledgement. If an ObsPack data product is essential to the work, or if an important result or conclusion depends on an ObsPack product, co-authorship may be appropriate. This should be discussed with each data provider at an early stage in the work. Contacting the data providers is not optional; if you use an ObsPack data product, you must contact the data providers. To help you meet your obligation, each data product includes an e-mail distribution list of all data providers. ObsPack data products must be obtained directly from the ObsPack Data Portal at gml.noaa.gov/ccgg/obspack/ and may not be re-distributed. Beginning November 2013, all new ObsPack data products will have a unique Digital Object Identifier (DOI) registered with the International DOI Foundation. In addition to the conditions of fair use as stated above, users must also include the ObsPack product citation in any publication or presentation using the product. The required citation is included in every data product and in the automated e-mail sent to the user during product download. Beginning November 2013, there are no longer any exceptions to this policy; it applies to all ObsPack products including GLOBALVIEW. |
obspack_warning | Every effort is made to create the most accurate and precise data product possible. Contributors reserve the right to make corrections to this product and data based on recalibration of standard gases or for other reasons deemed scientifically justified. Contributors to this product are not responsible for results and conclusions based on use of this product without regard to this warning. |
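Several attribute families in the table above are numbered (e.g., lab_#_name, provider_#_email), with the count given by the matching "_total_listed" attribute. As a minimal sketch of how a reader might regroup them, the hypothetical helper below works on a plain dictionary of global attributes; the attribute values in the example are placeholders, not real metadata.

```python
def collect_numbered(attrs, prefix):
    """Group numbered attributes (prefix_1_x, prefix_2_x, ...) per entry.

    attrs: mapping of global attribute names to values.
    prefix: attribute family, e.g. "lab", "provider", "partner".
    Returns a list with one dict per entry, in numeric order.
    """
    total = int(attrs.get(f"{prefix}_total_listed", 0))
    entries = []
    for i in range(1, total + 1):
        stem = f"{prefix}_{i}_"
        entry = {
            name[len(stem):]: value
            for name, value in attrs.items()
            if name.startswith(stem)
        }
        entries.append(entry)
    return entries


if __name__ == "__main__":
    # Placeholder values for illustration only.
    example = {
        "lab_total_listed": 2,
        "lab_1_abbr": "CONTRAIL",
        "lab_2_abbr": "UHEI-IUP",
        "site_code": "LEF",
    }
    print(collect_numbered(example, "lab"))
```

With the netCDF4 Python package, such a dictionary could be built from an open file `ds` as `{name: ds.getncattr(name) for name in ds.ncattrs()}`.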
Variable Attributes | |
---|---|
Name | Description |
obs_num | Unique observation number in a single data set. Ranges from 1 to UNLIMITED (netCDF). |
obs_id | Unique identification string that distinguishes the data item from all other data items in the ObsPack data product. It includes dataset_name and obs_num. |
obspack_num | Unique observation index number across all data sets in the ObsPack distribution. Ranges from 1 to max_obspack_num. |
obspack_id | Unique identification string that distinguishes the data item from all other data items in any ObsPack data product. It includes obspack_name, dataset_name, and obspack_num delimited by a tilde (~). |
time | Air sample collection time (UTC). POSIX time (number of seconds since January 1, 1970 in UTC). |
time_decimal | Air sample collection time (UTC) in decimal year notation (e.g., 2012.4523312). |
time_components | Air sample collection time (UTC) represented as a 6-element array [year, month, day, hour, minute, second]. Calendar time components as integers. |
solartime_components | Air sample collection time (solar time) represented as a 6-element array [year, month, day, hour, minute, second]. UTC time is converted to local solar time based on longitude and day-of-year. Solar time components as integers. |
analysis_datetime | Air sample measurement date and time in UTC. Units depend on trace gas species. |
time_interval | Total number of seconds of the averaging interval. |
value_original_scale | Values supplied by data providers that are not on the WMO CO2 X2019 calibration scale are reported in this variable. |
value_unc | This is the estimated uncertainty of the reported value. |
value_stddev | This is the standard deviation of the reported mean value when nvalue is greater than 1. |
inst_repeatability | This is the standard deviation of the measurement instrument when measuring a constant air stream, e.g. from a standard or zero gas tank. |
nvalue | Number of individual measurements used to compute reported value. |
latitude | Latitude at which air sample was collected (units: decimal degrees). |
longitude | Longitude at which air sample was collected (units: decimal degrees, range: -180° to +180°). |
altitude | Altitude (surface elevation plus sample intake height) at which air sample was collected. Units are meters above sea level (masl). |
pressure | Ambient pressure at time of sampling. Units are hectopascal (hPa) where 1 hPa = 100 Pa. This variable is not always available. |
elevation | Surface or ground elevation at which air sample was collected. Units are meters above sea level (masl). |
intake_height | Height above ground at which air sample was collected. Units are meters above ground level (magl). |
qcflag | This is the quality control flag provided by the contributing PIs. |
instrument | Instrument ID used to detect atmospheric parameter. |
method | Air sample collection method. |
air_sample_container_id | ID of air sample container. |
event_number | Many laboratories identify each discrete air sample collected at some time and location using a unique sample event number. The event number (reported as a string) can be used to relate measurements of different trace gases and isotopes from the same sample. |
processing_time | Some datasets use a combination of data processes (TimeStepAverage, TimeWindowAverage, PassThru). This value indicates the window or step for a given observation in seconds. A value of 0 indicates data that use PassThru. |
obs_flag | Representation flag indicates that reported value has large spatial scale representation (1) or is locally influenced (0). This attribute is derived from the data provider's source data. The implementation of this flag is still being developed. Suggestions welcome. |
source_id | The upstream data provider can optionally include a source_id string to identify or provide context for a particular observation in the source data. See provider_comment if available. |
flight_id | If data item was sourced from an air campaign, the data provider can optionally provide a flight identification string. |
profile_id | The upstream data provider can optionally include a profile_id, generally for aircraft or shipboard programs, that can be used to identify unique profiles in the data. |
unique_sample_location_num | This variable uniquely identifies a sample location and datetime. The number assigned to each observation in this variable will be the same in all future ObsPack products including ones for other species measured in that sample. |
temperature | Temperature at time of sampling in Kelvin. |
pressure_altitude | Pressure Altitude in meters above sea level derived from ambient pressure at time of sampling. |
gps_altitude | GPS Altitude in meters above sea level taken at time of sampling. |
u | Eastward (westerly) wind component in meters per second. |
v | Northward (southerly) wind component in meters per second. |
h2o | Water vapor mole fraction reported in units of micromol mol⁻¹ (10⁻⁶ mol per mol of dry air); equivalent to ppm (parts per million). |
assimilation_concerns | Values in this array indicate if the given observation has the assimilation concern defined by each column. A value of 0 means that there is no concern or it is not known to exist, and a non-zero value means that this concern does exist. |
CT_sampling_strategy | Flag indicating how an observation should be sampled from the atmospheric model. Values are: 1 = 4-hour model average; 2 = 1-hour model average; 3 = 90-minute average; 4 = quasi-instantaneous model value. |
CT_MDM | CarbonTracker model-data mismatch error value (in ppm). This is the error value placed on each measurement in the assimilation system, and is meant to express the statistics of simulated-minus-observed CO2 residuals expected if CarbonTracker were using perfect surface fluxes. |
CT_RMSE | Root-mean-square error (in ppm CO2) from which the CT_MDM values are generated. These are statistics of model performance for these observations, generated from other simulations. |
CT_assim | 1=assimilate; 0=do-not-assimilate; 2=assimilable but withheld for cross-validation |
CT_may_reject | CT internal EnKF flag, logical: 0 (FALSE)=may not reject; 1 (TRUE)=may reject |
CT_may_localize | CT internal EnKF flag, logical: 0 (FALSE)=may not localize; 1 (TRUE)=may localize |
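Two of the variables above lend themselves to a short illustration: obspack_id is a tilde-delimited string, and time is POSIX seconds that correspond to the 6-element time_components array. The sketch below uses hypothetical helper names, and the example identifier is made up to match the documented structure rather than taken from a real product.

```python
from datetime import datetime, timezone


def split_obspack_id(obspack_id):
    """Split an obspack_id into (obspack_name, dataset_name, obspack_num)."""
    obspack_name, dataset_name, obspack_num = obspack_id.split("~")
    return obspack_name, dataset_name, int(obspack_num)


def posix_to_components(seconds):
    """Convert POSIX seconds (UTC) to [year, month, day, hour, minute, second]."""
    t = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return [t.year, t.month, t.day, t.hour, t.minute, t.second]


if __name__ == "__main__":
    # Made-up identifier for illustration only.
    example_id = (
        "obspack_co2_1_PROTOTYPE_v1.0.4_2013-11-25"
        "~co2_lef_tower-insitu_1_afternoon-396magl~12345"
    )
    print(split_obspack_id(example_id))
    print(posix_to_components(0))
```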
Product Summary
The ObsPack product summary (<product name>_dataset_summary.txt) briefly summarizes the contents of the data product including 1) the ObsPack Fair Use Statement, 2) a brief description of the data product and its intended use, 3) the total number of data sets (max_dataset_num) and the total number of observations (max_obs_num) included in the package, and 4) a list of all data sets in the data product. Listed with each data set is the contributing laboratory abbreviation; the start and end date of the included data; indication of lab participation in ongoing direct atmospheric air comparison experiments; and a short phrase indicating the data selection strategy used by the data provider.
Summary files for currently available data products can be found by clicking on the information icon located next to the list of available product versions.
Data Provider E-mail Distribution List
Use of an ObsPack data product implies agreement to contact each contributing laboratory to discuss the nature of the work and the appropriate level of acknowledgement, which may include co-authorship (see the ObsPack Fair Use Statement). To help users meet this obligation, each data product includes an e-mail distribution list of all data providers. The text file <product name>_data_provider_email_list.txt provides the e-mail list in two formats to facilitate use. The list includes e-mail addresses for those data providers who have contributed to the particular data product.