Flexible YAML and OG 1.0 Output#

PyGlider 1.0 makes the deployment YAML more flexible so that variable names, derived-variable computations, and the netCDF dimension name are no longer hardcoded. This lets you produce OceanGliders 1.0 (OG 1.0) trajectory files — with uppercase OG vocabulary names and the N_MEASUREMENTS dimension — using exactly the same processing pipeline as the legacy IOOS GDAC format.

Top-level YAML keys#

Two new top-level keys control the output format.

output_conventions#

Declares the naming convention used in the file. This value is recorded in the global attributes and signals to downstream tools which vocabulary the variable names follow.

output_conventions: OG-1.0   # or IOOS_GDAC (default)

output_dimension#

Sets the name of the observation dimension in the output netCDF file. Internally pyglider always works with a dimension called time; this key causes it to be renamed on write and restored on read.

output_dimension: N_MEASUREMENTS   # default: time

If omitted, the dimension name is time (IOOS GDAC style).
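As a minimal sketch of how the two keys and their documented defaults could be read from a parsed deployment YAML: the helper name `output_settings` and the `DEFAULTS` table are hypothetical and are not part of pyglider's API.

```python
# Hypothetical helper, not pyglider's own loader: it only illustrates the
# documented defaults for the two top-level keys.
DEFAULTS = {"output_conventions": "IOOS_GDAC", "output_dimension": "time"}


def output_settings(deployment: dict) -> dict:
    """Return both output keys, filling in the default when a key is omitted."""
    return {key: deployment.get(key, default) for key, default in DEFAULTS.items()}


# A legacy YAML that sets neither key keeps the IOOS GDAC behaviour.
legacy = output_settings({"metadata": {}})
# An OG 1.0 YAML sets both keys explicitly.
og10 = output_settings(
    {"output_conventions": "OG-1.0", "output_dimension": "N_MEASUREMENTS"}
)
print(legacy)
print(og10)
```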

Variable flexibility via processing_role#

In previous versions, the processing pipeline looked for variables by their literal name — ds['pressure'], ds['conductivity'], etc. — which forced the YAML to use those exact keys.

Now each variable entry in netcdf_variables can carry a processing_role that tells the pipeline what role the variable plays, regardless of what it is called in the output file.

netcdf_variables:

  PRES:                          # OG 1.0 output name
    source:          sci_water_pressure
    processing_role: pressure    # pipeline looks this up by role
    long_name:       Pressure (measured variable)
    units:           dbar
    ...

  LATITUDE:
    source:          m_lat
    processing_role: latitude
    ...

Required roles#

The following roles are used directly by the processing pipeline regardless of any processing_method entries. If you use non-standard variable names (e.g. OG 1.0 names), you must declare these roles explicitly:

Role            Default IOOS GDAC name    Typical OG 1.0 name
--------------  ------------------------  -------------------
time            time                      TIME
latitude        latitude                  LATITUDE
longitude       longitude                 LONGITUDE
pressure        pressure                  PRES
depth           depth                     DEPTH
profile_index   profile_index             PROFILE_NUMBER

If a required role cannot be resolved to a variable that exists in the dataset, pyglider raises a ValueError with a message pointing to the YAML fix needed.

Legacy fallback roles#

Variables like temperature, conductivity, and oxygen are only looked up by role in the legacy processing path (when no processing_method entries cover thermodynamic variables). If you supply processing_method blocks for salinity and density — as OG 1.0 YAMLs should — you do not need processing_role on these variables; the method inputs name them explicitly.

If processing_role is absent and no processing_method covers a variable, pyglider falls back to looking for a variable whose name matches the role string, so existing IOOS GDAC YAMLs continue to work without modification.
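The lookup order described above (explicit processing_role first, then a variable named after the role, otherwise a ValueError) can be sketched as follows. The function name `resolve_role` is an assumption, and for simplicity this sketch checks the parsed netcdf_variables mapping rather than the dataset itself.

```python
def resolve_role(netcdf_variables: dict, role: str) -> str:
    """Return the name of the variable that plays `role` (illustrative only)."""
    # 1. An explicit processing_role declaration wins.
    for name, attrs in netcdf_variables.items():
        if attrs.get("processing_role") == role:
            return name
    # 2. Legacy fallback: a variable whose name matches the role string.
    if role in netcdf_variables:
        return role
    # 3. Nothing matched: point the user at the YAML fix.
    raise ValueError(
        f"No variable plays the role {role!r}; add 'processing_role: {role}' "
        "to the matching entry in netcdf_variables."
    )


og10_vars = {
    "PRES": {"source": "sci_water_pressure", "processing_role": "pressure"},
    "LATITUDE": {"source": "m_lat", "processing_role": "latitude"},
}
legacy_vars = {"pressure": {"source": "sci_water_pressure"}}

print(resolve_role(og10_vars, "pressure"))
print(resolve_role(legacy_vars, "pressure"))
```

The second call shows why existing IOOS GDAC YAMLs keep working: with no processing_role present, the variable named pressure is found by name.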

Derived variables via processing_method#

Previously, salinity, density, depth, and profile numbering were computed by hardcoded calls inside the processing functions, always consuming variables named conductivity, temperature, pressure, etc.

Now you can specify how each derived variable is computed and which named inputs to use:

  PSAL:
    processing_method:
      practical_salinity:
        conductivity: CNDC
        temperature:  TEMP
        pressure:     PRES
    long_name:    Sea water practical salinity
    units:        "1"
    ...

  DEPTH:
    processing_method:
      depth_from_pressure:
        pressure: PRES
        latitude: LATITUDE
    processing_role: depth
    ...

  PROFILE_NUMBER:
    processing_method:
      find_profiles:
        pressure: PRES
    processing_role: profile_index
    ...

The processing_method key contains a single-entry mapping from a method name to its named inputs. The inputs normally reference other variable names in netcdf_variables; gps_fixes_from_nav is the exception, taking a role and navigation-file column names instead.
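A minimal sketch of splitting such a single-entry mapping into its method name and inputs; `parse_processing_method` is a hypothetical name used only for illustration.

```python
def parse_processing_method(entry: dict) -> tuple:
    """Split a processing_method mapping into (method_name, inputs)."""
    if not isinstance(entry, dict) or len(entry) != 1:
        raise ValueError(
            "processing_method must map exactly one method name to its inputs"
        )
    # Unpack the only (key, value) pair of the mapping.
    (method_name, inputs), = entry.items()
    return method_name, dict(inputs or {})


# The PSAL entry from the YAML above, as it would look after parsing.
psal_entry = {
    "practical_salinity": {
        "conductivity": "CNDC",
        "temperature": "TEMP",
        "pressure": "PRES",
    }
}
method, inputs = parse_processing_method(psal_entry)
print(method, inputs)
```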

Built-in method names#

Method                    Computes                                     Required inputs
------------------------  -------------------------------------------  ----------------------------------------------------
practical_salinity        SP via TEOS-10                               conductivity, temperature, pressure
potential_temperature     θ via TEOS-10                                salinity, temperature, pressure
potential_density_sigma0  σ₀ via TEOS-10                               salinity, temperature, pressure, latitude, longitude
density                   in-situ density                              salinity, temperature, pressure, latitude, longitude
depth_from_pressure       depth (m)                                    pressure, latitude
find_profiles             profile index and direction                  pressure
distance_over_ground      cumulative distance                          latitude, longitude
gps_fixes_from_nav        sparse GPS fix variable (SeaExplorer only)   role, lat_source, lon_source
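The required inputs listed above lend themselves to a quick pre-flight check of a YAML entry. The sketch below copies the list into a dictionary; `REQUIRED_INPUTS` and `missing_inputs` are hypothetical names, and pyglider's own validation may work differently.

```python
# Required inputs per built-in method, transcribed from the list above.
REQUIRED_INPUTS = {
    "practical_salinity": {"conductivity", "temperature", "pressure"},
    "potential_temperature": {"salinity", "temperature", "pressure"},
    "potential_density_sigma0": {
        "salinity", "temperature", "pressure", "latitude", "longitude",
    },
    "density": {"salinity", "temperature", "pressure", "latitude", "longitude"},
    "depth_from_pressure": {"pressure", "latitude"},
    "find_profiles": {"pressure"},
    "distance_over_ground": {"latitude", "longitude"},
    "gps_fixes_from_nav": {"role", "lat_source", "lon_source"},
}


def missing_inputs(method: str, inputs: dict) -> list:
    """Return the required input names that the YAML entry did not supply."""
    return sorted(REQUIRED_INPUTS[method] - set(inputs))


# A PSAL entry that forgot to name its pressure input.
print(missing_inputs("practical_salinity", {"conductivity": "CNDC", "temperature": "TEMP"}))
# A complete DEPTH entry.
print(missing_inputs("depth_from_pressure", {"pressure": "PRES", "latitude": "LATITUDE"}))
```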

GPS fix variables (SeaExplorer OG 1.0)#

OG 1.0 requires three sparse variables that record the glider’s actual GPS surface fixes: LATITUDE_GPS, LONGITUDE_GPS, and TIME_GPS. These are non-NaN only at the measurement timestamps that are closest to a real GPS fix; all other values are NaN. They are derived from the SeaExplorer navigation (gli) files, not from the sensor payload.

Use processing_method: gps_fixes_from_nav to declare them in the YAML. Three separate entries are needed, one per output variable, each with a role input that selects which array to write (latitude, longitude, or time). The lat_source and lon_source inputs name the NMEA-format columns in the gli file (almost always Lat and Lon).

LATITUDE_GPS:
  processing_method:
    gps_fixes_from_nav:
      role:       latitude
      lat_source: Lat
      lon_source: Lon
  long_name:     Latitude of each GPS surface fix
  standard_name: latitude
  units:         degrees_north
  observation_type: measured
  vocabulary:    http://vocab.nerc.ac.uk/collection/OG1/current/LAT/

LONGITUDE_GPS:
  processing_method:
    gps_fixes_from_nav:
      role:       longitude
      lat_source: Lat
      lon_source: Lon
  long_name:     Longitude of each GPS surface fix
  standard_name: longitude
  units:         degrees_east
  observation_type: measured
  vocabulary:    http://vocab.nerc.ac.uk/collection/OG1/current/LON/

TIME_GPS:
  processing_method:
    gps_fixes_from_nav:
      role:       time
      lat_source: Lat
      lon_source: Lon
  long_name:     Time of each GPS surface fix
  calendar:      gregorian
  units:         seconds since 1970-01-01T00:00:00Z
  observation_type: measured

Pyglider reads the merged gli parquet file, filters to rows where DeadReckoning is 0 (or NavState is not 116 when DeadReckoning is absent), converts NMEA coordinates to decimal degrees, and maps each fix to the nearest timestamp on the sensor time grid. The three variables are produced in one pass; the role input determines which result is assigned to each YAML entry. All other YAML attributes (units, long_name, vocabulary, etc.) are written to the variable as usual.
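The three steps above (filter to real fixes, convert NMEA coordinates, map each fix to the nearest sensor timestamp) can be sketched in plain Python. All function names here are hypothetical, the NMEA layout is assumed to be DDMM.mmm, and the real implementation works on the gli parquet file rather than on lists and dicts.

```python
import math


def is_gps_fix(row: dict) -> bool:
    """Keep rows that are real fixes, following the rule described above."""
    if "DeadReckoning" in row:
        return row["DeadReckoning"] == 0
    return row.get("NavState") != 116


def nmea_to_decimal(value: float) -> float:
    """Convert an assumed DDMM.mmm value to signed decimal degrees."""
    sign = -1.0 if value < 0 else 1.0
    value = abs(value)
    degrees = math.floor(value / 100.0)
    minutes = value - degrees * 100.0
    return sign * (degrees + minutes / 60.0)


def sparse_fixes(grid_times: list, fix_times: list, fix_values: list) -> list:
    """Place each fix at the nearest grid timestamp; every other slot is NaN."""
    out = [math.nan] * len(grid_times)
    for t, value in zip(fix_times, fix_values):
        nearest = min(range(len(grid_times)), key=lambda i: abs(grid_times[i] - t))
        out[nearest] = value
    return out


rows = [
    {"time": 11, "Lat": 4830.0, "Lon": -12530.0, "DeadReckoning": 0},
    {"time": 19, "Lat": 4831.0, "Lon": -12531.0, "DeadReckoning": 1},
    {"time": 29, "Lat": 4845.0, "Lon": -12545.0, "DeadReckoning": 0},
]
fixes = [row for row in rows if is_gps_fix(row)]
latitude_gps = sparse_fixes(
    [0, 10, 20, 30],
    [row["time"] for row in fixes],
    [nmea_to_decimal(row["Lat"]) for row in fixes],
)
print(latitude_gps)
```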

Custom methods#

If the method name contains a dot (.), it is treated as a dotted Python import path. The function must have the following signature (xr is xarray):

def my_method(ds: xr.Dataset, inputs: dict, output_name: str) -> xr.DataArray:
    ...

where inputs is {role_name: variable_name_in_ds, ...} as declared in the YAML.
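One way a dotted method name could be turned into a callable is with importlib. This is a sketch of the general technique, not pyglider's actual loader: `resolve_method` and the `builtin_methods` registry are hypothetical, and a standard-library function stands in for a user-supplied method.

```python
import importlib


def resolve_method(name: str, builtin_methods: dict):
    """Return the callable for a built-in name or a dotted import path."""
    if "." not in name:
        # Plain names are looked up in the (hypothetical) built-in registry.
        return builtin_methods[name]
    # Split 'package.module.function' into module path and attribute name.
    module_path, _, func_name = name.rpartition(".")
    module = importlib.import_module(module_path)
    return getattr(module, func_name)


# math.sqrt stands in for a user function such as mypackage.methods.my_method.
func = resolve_method("math.sqrt", builtin_methods={})
print(func(9.0))
```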

Dimension handling#

The time dimension is always used internally. On write, if output_dimension is set to something other than time, the dimension is renamed just before the file is saved. On read (e.g. when loading a timeseries file to make grids or profiles), pyglider detects the non-standard dimension by finding the coordinate with standard_name: time and renames it back to time. This is transparent to any intermediate processing steps.
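The rename-on-write and restore-on-read round trip can be illustrated without any netCDF machinery. In this sketch a dataset is modelled as a plain dictionary of variables, each with a dims tuple and an attrs dictionary; the real code operates on xarray datasets, and both function names are hypothetical.

```python
def rename_dimension(variables: dict, old: str, new: str) -> dict:
    """Return a copy of the variables with dimension `old` renamed to `new`."""
    return {
        name: {**var, "dims": tuple(new if d == old else d for d in var["dims"])}
        for name, var in variables.items()
    }


def restore_time_dimension(variables: dict) -> dict:
    """Find the variable with standard_name 'time' and rename its dimension to 'time'."""
    for var in variables.values():
        if var["attrs"].get("standard_name") == "time" and len(var["dims"]) == 1:
            return rename_dimension(variables, var["dims"][0], "time")
    return variables


internal = {
    "TIME": {"dims": ("time",), "attrs": {"standard_name": "time"}},
    "PRES": {"dims": ("time",), "attrs": {"units": "dbar"}},
}
# On write: the internal dimension is renamed to the configured output_dimension.
on_disk = rename_dimension(internal, "time", "N_MEASUREMENTS")
# On read: the dimension is detected via standard_name and renamed back.
reloaded = restore_time_dimension(on_disk)
print(on_disk["PRES"]["dims"], reloaded["PRES"]["dims"])
```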

OG 1.0 examples#

Slocum#

# OG 1.0 compliant deployment YAML for dfo-rosie713-20190615
# Based on tests/example-slocum/deploymentRealtime.yml
#
# Key differences from IOOS GDAC format:
#   - output_dimension: N_MEASUREMENTS (instead of time as dimension)
#   - Variable names follow OG 1.0 vocabulary (uppercase, e.g. CNDC, TEMP, PRES)
#   - processing_role: maps output variable name to its role in the processing pipeline
#   - processing_method: specifies how derived variables are computed and from which inputs
#   - QC variables are auto-named as {varname}_QC

output_dimension: N_MEASUREMENTS
output_conventions: OG-1.0

metadata:
  acknowledgement: Funding from Fisheries and Oceans Canada, Canadian Foundation
                   for Innovation, BC Knowledge Development Fund
  comment:        "Calvert Island test deployment June 2019."
  contributor_name: James Pegg, Jody Klymak, Tetjana Ross, Cailin Burmaster
  contributor_email: jpegg@uvic.ca, jklymak@uvic.ca, tross@uvic.ca, cburmaster@uvic.ca
  contributor_role: Technical Coordinator, PI, PI, Operator
  contributor_role_vocabulary: "http://vocab.nerc.ac.uk/collection/W08/current/CONT0005/,http://vocab.nerc.ac.uk/collection/W08/current/CONT0004/,http://vocab.nerc.ac.uk/collection/W08/current/CONT0004/,http://vocab.nerc.ac.uk/collection/W08/current/CONT0003/"
  creator_email: jklymak@uvic.ca
  creator_name:  Jody Klymak
  creator_url:   http://cproof.uvic.ca
  data_mode: 'R'
  deployment_id: '1'
  deployment_name: 'dfo-rosie713-20190615'
  deployment_start: '2019-06-15'
  deployment_end: '2019-06-30'
  glider_name: dfo-rosie
  glider_serial: '713'
  glider_model: Slocum G3 Deep
  glider_instrument_name: slocum
  glider_wmo: "999999"
  institution: C-PROOF
  contributing_institutions: C-PROOF
  contributing_institutions_role: Operator
  contributing_institutions_role_vocabulary: "http://vocab.nerc.ac.uk/collection/W08/current/CONT0003/"
  keywords: "AUVS, Autonomous Underwater Vehicles, Oceans, Ocean Pressure,
             Water Pressure, Oceans, Ocean Temperature, Water Temperature,
             Oceans, Salinity/Density, Conductivity, Oceans,
             Salinity/Density, Density, Oceans, Salinity/Density, Salinity"
  keywords_vocabulary: GCMD Science Keywords
  license: "This data may be redistributed and used without restriction or
            warranty"
  metadata_link: "https://cproof.uvic.ca"
  metadata_conventions: CF-1.10, ACDD-1.3, OG-1.0
  Conventions: CF-1.10, ACDD-1.3, OG-1.0
  naming_authority: "ca.uvic.cproof"
  platform:         "sub-surface gliders"
  platform_serial:  "713"
  platform_vocabulary: "http://vocab.nerc.ac.uk/collection/L06/current/27/"
  platform_type:    "Slocum Glider"
  processing_level: "Data provided as is with no expressed or implied
                     assurance of quality assurance or quality control."
  project: SaanichInletTest19
  project_url: http://cproof.uvic.ca
  publisher_email: jklymak@uvic.ca
  publisher_name:  Jody Klymak
  publisher_url:   https://cproof.uvic.ca
  references:     cproof toolbox URL
  rtqc_method:    "No QC applied"
  sea_name:   Coastal Waters of Southeast Alaska and British Columbia
  source:     Observational data from a profiling glider.
  standard_name_vocabulary: CF Standard Name Table v83
  summary: Manufacturer test in Saanich Inlet.
  transmission_system: IRIDIUM
  wmo_id: "999999"
  deployment_vessel: "Vellela Vellela"
  deployment_station: "Imperial Eagle"
  deployment_latitude: "48.873728"
  deployment_longitude: "-125.212218"


glider_devices:
  pressure:
    make: Micron
    model: Pressure
    serial: '104702'
  ctd:
    sensor_name: SENSOR_CTD
    long_name: CTD Metadata
    make_model: Seabird SlocumCTD
    maker: Seabird Scientific
    model: SlocumCTD
    type: CTD
    type_vocabulary: "https://vocab.nerc.ac.uk/collection/L05/current"
    # pyglider-only fields (used for processing, not written to netCDF):
    make: Seabird
    serial: '9507'
    factory_calibrated: " "
    calibration_date: " "
    calibration_report: " "
    comment: 'Constants for ctd_9507 are found using dfo-rosie713-20230810.'
    Thermal_lag_constants_[alpha,tau]: [0.2, 2]
    dTdC: 0
  optics:
    sensor_name: SENSOR_FLUOROMETER
    long_name: Fluorometer Metadata
    make_model: Wetlabs FLBBCDSLC
    maker: Wetlabs
    model: FLBBCDSLC
    type: fluorometer_chla
    type_vocabulary: "http://vocab.nerc.ac.uk/collection/R25/current/"
    # pyglider-only:
    make: Wetlabs
    serial: '5059'
  oxygen:
    sensor_name: SENSOR_DOXY
    long_name: Oxygen Sensor Metadata
    make_model: AADI Optode4831
    maker: AADI
    model: Optode4831
    type: OPTODE_DOXY
    type_vocabulary: "http://vocab.nerc.ac.uk/collection/R25/current/"
    # pyglider-only:
    make: AADI
    serial: '665'


netcdf_variables:

  # -------------------------------------------------------------------------
  # Coordinates — processing_role tells the pipeline what each variable is
  # -------------------------------------------------------------------------

  TIME:
    source:        sci_m_present_time
    processing_role: time
    long_name:     Time elapsed since 1970-01-01T00:00:00Z
    standard_name: time
    calendar:      gregorian
    units:         seconds since 1970-01-01T00:00:00Z
    axis:          T
    observation_type: measured
    vocabulary:    http://vocab.nerc.ac.uk/collection/OG1/current/TIME/

  LATITUDE:
    source:        m_lat
    processing_role: latitude
    long_name:     Latitude north (WGS84)
    standard_name: latitude
    units:         degrees_north
    axis:          Y
    comment:       "Estimated between surface fixes"
    observation_type: measured
    reference:     WGS84
    valid_max:     90.0
    valid_min:     -90.0
    vocabulary:    http://vocab.nerc.ac.uk/collection/OG1/current/LAT/

  LONGITUDE:
    source:        m_lon
    processing_role: longitude
    long_name:     Longitude east (WGS84)
    standard_name: longitude
    units:         degrees_east
    axis:          X
    comment:       "Estimated between surface fixes"
    observation_type: measured
    reference:     WGS84
    valid_max:     180.0
    valid_min:     -180.0
    vocabulary:    http://vocab.nerc.ac.uk/collection/OG1/current/LON/

  # -------------------------------------------------------------------------
  # Measured CTD variables
  # -------------------------------------------------------------------------

  CNDC:
    source:        sci_water_cond
    processing_role: conductivity
    long_name:     Electrical conductivity of the water body by CTD
    standard_name: sea_water_electrical_conductivity
    units:         S m-1
    sensor:        SENSOR_CTD
    valid_min:     0.0
    valid_max:     10.0
    observation_type: measured
    accuracy:      0.0003
    precision:     0.0001
    resolution:    0.00002
    vocabulary:    http://vocab.nerc.ac.uk/collection/OG1/current/CNDC/

  TEMP:
    source:        sci_water_temp
    processing_role: temperature
    long_name:     Temperature of the water body by CTD
    standard_name: sea_water_temperature
    units:         degree_C
    sensor:        SENSOR_CTD
    valid_min:     -5.0
    valid_max:     50.0
    observation_type: measured
    accuracy:      0.002
    precision:     0.001
    resolution:    0.0002
    vocabulary:    http://vocab.nerc.ac.uk/collection/OG1/current/TEMP/

  PRES:
    source:        sci_water_pressure
    processing_role: pressure
    long_name:     Pressure (measured variable)
    standard_name: sea_water_pressure
    units:         dbar
    conversion:    bar2dbar
    sensor:        SENSOR_CTD
    valid_min:     0.0
    valid_max:     2000.0
    positive:      down
    reference_datum: sea-surface
    observation_type: measured
    accuracy:      1.0
    precision:     2.0
    resolution:    0.02
    vocabulary:    http://vocab.nerc.ac.uk/collection/OG1/current/PRES/

  # -------------------------------------------------------------------------
  # Derived coordinate variables — processing_method specifies inputs
  # -------------------------------------------------------------------------

  DEPTH:
    processing_method:
      depth_from_pressure:
        pressure: PRES
        latitude: LATITUDE
    processing_role: depth
    long_name:     Depth below surface of the water body
    standard_name: depth
    units:         m
    axis:          Z
    positive:      down
    reference_datum: sea-surface
    observation_type: calculated
    valid_min:     0.0
    valid_max:     2000.0
    vocabulary:    http://vocab.nerc.ac.uk/collection/OG1/current/DEPTH/

  PROFILE_NUMBER:
    processing_method:
      find_profiles:
        pressure: PRES
    processing_role: profile_index
    long_name:     Profile number
    comment:       "PROFILE_NUMBER increments by one each time the glider starts
                    an ascending or descending profile."
    valid_min:     1
    valid_max:     2147483647

  PROFILE_DIRECTION:
    processing_method:
      find_profiles:
        pressure: PRES
    processing_role: profile_direction
    long_name:     Vertical direction of profile
    comment:       "1 = descending, -1 = ascending, 0 = not in a profile"

  DISTANCE_OVER_GROUND:
    processing_method:
      distance_over_ground:
        latitude: LATITUDE
        longitude: LONGITUDE
    long_name:     Distance over ground flown since mission start
    units:         km

  # -------------------------------------------------------------------------
  # Derived thermodynamic variables (TEOS-10)
  # Salinity, potential temperature, potential density, and density are each
  # computed by their own processing_method.  Each entry names the method and
  # the inputs to use, allowing for multiple CTDs.
  # -------------------------------------------------------------------------

  PSAL:
    processing_method:
      practical_salinity:
        conductivity: CNDC
        temperature:  TEMP
        pressure:     PRES
    long_name:     Sea water practical salinity
    standard_name: sea_water_practical_salinity
    units:         "1"
    comment:       "raw, uncorrected salinity"
    valid_min:     0.0
    valid_max:     40.0
    observation_type: calculated
    sensor:        SENSOR_CTD
    accuracy:      0.01
    precision:     0.01
    resolution:    0.001
    vocabulary:    http://vocab.nerc.ac.uk/collection/OG1/current/PSAL/

  THETA:
    processing_method:
      potential_temperature:
        salinity:     PSAL
        temperature:  TEMP
        pressure:     PRES
    long_name:     Water potential temperature
    standard_name: sea_water_potential_temperature
    units:         degree_C
    observation_type: calculated
    accuracy:      0.002
    precision:     0.001
    resolution:    0.0001
    vocabulary:    http://vocab.nerc.ac.uk/collection/OG1/current/THETA/

  SIGTHETA:
    processing_method:
      potential_density_sigma0:
        salinity:     PSAL
        temperature:  TEMP
        pressure:     PRES
        latitude:     LATITUDE
        longitude:    LONGITUDE
    long_name:     Water potential density referenced to surface
    standard_name: sea_water_potential_density
    units:         kg m-3
    observation_type: calculated
    accuracy:      0.01
    precision:     0.01
    resolution:    0.001
    vocabulary:    http://vocab.nerc.ac.uk/collection/OG1/current/SIGTHETA/

  DENSITY:
    processing_method:
      density:
        salinity:     PSAL
        temperature:  TEMP
        pressure:     PRES
        latitude:     LATITUDE
        longitude:    LONGITUDE
    long_name:     Density
    standard_name: sea_water_density
    units:         kg m-3
    observation_type: calculated
    valid_min:     990.0
    valid_max:     1040.0
    vocabulary:    http://vocab.nerc.ac.uk/collection/OG1/current/DENSITY/

  # -------------------------------------------------------------------------
  # Other measured variables
  # -------------------------------------------------------------------------

  HEADING:
    source:        m_heading
    long_name:     Glider heading angle
    standard_name: platform_orientation
    units:         degrees
    vocabulary:    http://vocab.nerc.ac.uk/collection/OG1/current/HEADING/

  PITCH:
    source:        m_pitch
    long_name:     Glider pitch angle
    standard_name: platform_pitch_angle
    units:         degrees
    vocabulary:    http://vocab.nerc.ac.uk/collection/OG1/current/PITCH/

  ROLL:
    source:        m_roll
    long_name:     Glider roll angle
    standard_name: platform_roll_angle
    units:         degrees
    vocabulary:    http://vocab.nerc.ac.uk/collection/OG1/current/ROLL/

  WAYPOINT_LATITUDE:
    source:        c_wpt_lat
    long_name:     Waypoint latitude
    standard_name: latitude
    units:         degrees_north

  WAYPOINT_LONGITUDE:
    source:        c_wpt_lon
    long_name:     Waypoint longitude
    standard_name: longitude
    units:         degrees_east

  CHLA:
    source:        sci_flbbcd_chlor_units
    long_name:     Chlorophyll-a concentration
    standard_name: mass_concentration_of_chlorophyll_a_in_sea_water
    units:         mg m-3
    sensor:        SENSOR_FLUOROMETER
    vocabulary:    http://vocab.nerc.ac.uk/collection/OG1/current/CHLA/

  CDOM:
    source:        sci_flbbcd_cdom_units
    long_name:     CDOM
    units:         ppb
    sensor:        SENSOR_FLUOROMETER
    vocabulary:    http://vocab.nerc.ac.uk/collection/OG1/current/CDOM/

  BBP700:
    source:        sci_flbbcd_bb_units
    long_name:     700 nm wavelength backscatter
    units:         "1"
    sensor:        SENSOR_FLUOROMETER
    vocabulary:    http://vocab.nerc.ac.uk/collection/OG1/current/BBP700/

  DOXY:
    source:        sci_oxy4_oxygen
    long_name:     Dissolved oxygen
    standard_name: moles_of_oxygen_per_unit_mass_in_sea_water
    units:         umol kg-1
    sensor:        SENSOR_DOXY
    valid_min:     0.0
    valid_max:     500.0
    vocabulary:    http://vocab.nerc.ac.uk/collection/OG1/current/DOXY/

  # -------------------------------------------------------------------------
  # QC variables — auto-named {varname}_QC, no source needed
  # average_method: QC_protocol tells the gridder to use max-flag binning
  # -------------------------------------------------------------------------

  CNDC_QC:
    average_method: QC_protocol

  TEMP_QC:
    average_method: QC_protocol

  PRES_QC:
    average_method: QC_protocol

  PSAL_QC:
    average_method: QC_protocol

  DENSITY_QC:
    average_method: QC_protocol

  DOXY_QC:
    average_method: QC_protocol

  CHLA_QC:
    average_method: QC_protocol


scalar_variables:

  PLATFORM_MODEL:
    from_metadata: glider_model
    long_name: Glider model name

  WMO_IDENTIFIER:
    from_metadata: glider_wmo
    long_name: WMO identifier

  PLATFORM_SERIAL_NUMBER:
    from_metadata: glider_serial
    long_name: Glider serial number

  DEPLOYMENT_TIME:
    from_metadata: deployment_start
    long_name: Deployment start time
    units: seconds since 1970-01-01T00:00:00Z

  DEPLOYMENT_LATITUDE:
    from_metadata: deployment_latitude
    long_name: Deployment latitude
    units: degrees_north

  DEPLOYMENT_LONGITUDE:
    from_metadata: deployment_longitude
    long_name: Deployment longitude
    units: degrees_east


# NOTE: profile_variables is used by extract_timeseries_profiles (IOOS GDAC
# profile files).  OG 1.0 does not define separate per-profile files — the
# trajectory file itself is the deliverable.  This section is retained for
# users who also want IOOS GDAC profile output, but variable names will need
# updating if OG 1.0 naming is required there too.
profile_variables:
  profile_id:
    comment: Sequential profile number within the trajectory.
    long_name: 'Profile ID'
    valid_max: 2147483646
    valid_min: 1

  profile_time:
    comment:           Timestamp corresponding to the mid-point of the profile
    long_name:         Profile Center Time
    observation_type:  calculated
    standard_name:     time

  profile_time_start:
    comment:           Timestamp corresponding to the start of the profile
    long_name:         Profile Start Time
    observation_type:  calculated
    standard_name:     time

  profile_time_end:
    comment:           Timestamp corresponding to the end of the profile
    long_name:         Profile End Time
    observation_type:  calculated
    standard_name:     time

  profile_lat:
    comment:           Interpolated latitude at the mid-point of the profile
    long_name:         Profile Center Latitude
    observation_type:  calculated
    standard_name:     latitude
    units:             degrees_north
    valid_max:         90.0
    valid_min:         -90.0

  profile_lon:
    comment:           Interpolated longitude at the mid-point of the profile
    long_name:         Profile Center Longitude
    observation_type:  calculated
    standard_name:     longitude
    units:             degrees_east
    valid_max:         180.0
    valid_min:         -180.0

  u:
    comment:  Depth-averaged eastward current estimate
    long_name:         Depth-Averaged Eastward Sea Water Velocity
    observation_type:  calculated
    standard_name:     eastward_sea_water_velocity
    units:             m s-1
    valid_max:         10.0
    valid_min:         -10.0

  v:
    comment:  Depth-averaged northward current estimate
    long_name:         Depth-Averaged Northward Sea Water Velocity
    observation_type:  calculated
    standard_name:     northward_sea_water_velocity
    units:             m s-1
    valid_max:         10.0
    valid_min:         -10.0

  lon_uv:
    comment:           Not computed
    long_name:         Longitude
    observation_type:  calculated
    standard_name:     longitude
    units:             degrees_east
    valid_max:         180.0
    valid_min:         -180.0

  lat_uv:
    comment:           Not computed
    long_name:         Latitude
    observation_type:  calculated
    standard_name:     latitude
    units:             degrees_north
    valid_max:         90.0
    valid_min:         -90.0

  time_uv:
    comment:       Not computed
    long_name:     Time
    standard_name: time
    calendar:      gregorian
    units:         seconds since 1970-01-01T00:00:00Z
    observation_type: calculated

  instrument_ctd:
    comment:    pumped CTD
    calibration_date:     "2017-12-24"
    factory_calibrated:  "yes"
    long_name:           Seabird Glider Payload CTD
    make_model:          Seabird GPCTD
    serial_number:       "9507"
    type:                platform

Processing script for the deployment YAML above:

# -*- coding: utf-8 -*-
"""
Process dfo-rosie713 realtime data to OG 1.0 format.

This script mirrors tests/example-slocum/process_deploymentRealTime.py but
targets OG 1.0 output.  It uses the same raw binary data; only the YAML and
output directories differ.

NOTE: This script documents the *intended* API once pyglider has been updated
to support processing_role, processing_method, output_dimension, and the
_load_dataset/_save_dataset helpers.  Some calls will not work correctly until
those changes are made to slocum.py, utils.py, and ncprocess.py.
"""

import logging
import os
import pyglider.ncprocess as ncprocess
import pyglider.slocum as slocum
import pyglider.utils as pgutils

logging.basicConfig(level='INFO')

# Raw data lives in the existing example-slocum directory.
# Outputs go into subdirectories here.
binarydir = '../example-slocum/realtime_raw/'
cacdir    = '../example-slocum/cac/'

deploymentyaml = './deploymentRealtime_og10.yml'

l1tsdir  = './L0-timeseries-og10/'
# profiledir = './L0-profiles-og10/'
griddir  = './L0-gridfiles-og10/'

# ------------------------------------------------------------------------
# Step 1: binary → OG 1.0 timeseries netCDF
#
# binary_to_timeseries will need to:
#   - read processing_role to identify pressure/temperature/conductivity/
#     latitude/longitude variables by role rather than by hardcoded name
#   - read processing_method entries to compute derived variables
#     (DEPTH, PSAL, THETA, SIGTHETA, DENSITY, PROFILE_NUMBER,
#      PROFILE_DIRECTION, DISTANCE_OVER_GROUND) using the named inputs
#   - call _save_dataset instead of ds.to_netcdf so that the time dimension
#     is renamed to N_MEASUREMENTS before writing
# ------------------------------------------------------------------------
outname = slocum.binary_to_timeseries(
    binarydir, cacdir, l1tsdir, deploymentyaml,
    search='*.[st]bd',  # glob character class: matches .sbd and .tbd files
    profile_filt_time=20,
    profile_min_time=20,
)

# ------------------------------------------------------------------------
# Step 2: timeseries → per-profile netCDF files (IOOS GDAC style)
#
# extract_timeseries_profiles will need to:
#   - call _load_dataset instead of xr.open_dataset so that the
#     N_MEASUREMENTS dimension is normalised back to time for processing
#   - use processing_role to find profile_index (PROFILE_NUMBER),
#     latitude (LATITUDE), longitude (LONGITUDE), etc.
#   - call _save_dataset when writing each profile file
#
# NOTE: OG 1.0 does not define per-profile files; the trajectory file is
# the primary deliverable.  This step produces IOOS GDAC profile files as
# a secondary output for users who need them.
# ------------------------------------------------------------------------
# ncprocess.extract_timeseries_profiles(outname, profiledir, deploymentyaml)

# ------------------------------------------------------------------------
# Step 3: timeseries → gridded netCDF
#
# make_gridfiles will need to:
#   - call _load_dataset to normalise the dimension on load
#   - use processing_role to find depth (DEPTH), latitude (LATITUDE),
#     longitude (LONGITUDE) for gridding axes
#   - call _save_dataset when writing the gridded file
# ------------------------------------------------------------------------
outname2 = ncprocess.make_gridfiles(outname, griddir, deploymentyaml)

SeaExplorer#

# OG 1.0 compliant deployment YAML for dfo-eva035-20190718
# Based on tests/example-data/example-seaexplorer/deploymentRealtime.yml
#
# Key differences from IOOS GDAC format:
#   - output_dimension: N_MEASUREMENTS (instead of time as dimension)
#   - Variable names follow OG 1.0 vocabulary (uppercase, e.g. CNDC, TEMP, PRES)
#   - processing_role: maps output variable name to its role in the pipeline
#   - processing_method: specifies how derived variables are computed and from which inputs
#   - QC variables are auto-named as {varname}_QC

output_dimension: N_MEASUREMENTS
output_conventions: OG-1.0

metadata:
  acknowledgement: Funding from Fisheries and Oceans Canada, Canadian Foundation
                   for Innovation, BC Knowledge Development Fund
  comment:        "Explorer Seamount cruise on Tully"
  contributor_name: James Pegg, Jody Klymak, Tetjana Ross
  contributor_email: jpegg@uvic.ca, jklymak@uvic.ca, tross@uvic.ca
  contributor_role: Technical Coordinator, PI, PI
  contributor_role_vocabulary: "http://vocab.nerc.ac.uk/collection/W08/current/CONT0005/,http://vocab.nerc.ac.uk/collection/W08/current/CONT0004/,http://vocab.nerc.ac.uk/collection/W08/current/CONT0004/"
  creator_email: jklymak@uvic.ca
  creator_name:  Jody Klymak
  creator_url:   http://cproof.uvic.ca
  data_mode: 'R'
  deployment_id: '1'
  deployment_name: 'dfo-eva035-20190718'
  deployment_start: '2019-07-18'
  deployment_end: '2019-12-30'
  glider_name: dfo-eva035
  glider_serial: '035'
  glider_model: SeaExplorer
  glider_instrument_name: seaexplorer
  glider_wmo: '999999'
  institution: C-PROOF
  contributing_institutions: C-PROOF
  contributing_institutions_role: Operator
  contributing_institutions_role_vocabulary: "http://vocab.nerc.ac.uk/collection/W08/current/CONT0003/"
  keywords: "AUVS, Autonomous Underwater Vehicles, Oceans, Ocean Pressure,
             Water Pressure, Oceans, Ocean Temperature, Water Temperature,
             Oceans, Salinity/Density, Conductivity, Oceans,
             Salinity/Density, Density, Oceans, Salinity/Density, Salinity"
  keywords_vocabulary: GCMD Science Keywords
  license: "This data may be redistributed and used without restriction or
            warranty"
  metadata_link: "https://cproof.uvic.ca"
  metadata_conventions: CF-1.10, ACDD-1.3, OG-1.0
  Conventions: CF-1.10, ACDD-1.3, OG-1.0
  naming_authority: "ca.uvic.cproof"
  platform:         "sub-surface gliders"
  platform_vocabulary: "http://vocab.nerc.ac.uk/collection/L06/current/27/"
  platform_type:    "SeaExplorer Glider"
  processing_level: "Data provided as is, with no express or implied
                     assurance of quality or quality control."
  project: ExplorerSeamount19
  project_url: http://cproof.uvic.ca
  publisher_email: jklymak@uvic.ca
  publisher_name:  Jody Klymak
  publisher_url:   http://cproof.uvic.ca
  references:     cproof toolbox URL
  rtqc_method:    "No QC applied"
  sea_name:   BC Coastal Waters
  source:     Observational data from a profiling glider.
  standard_name_vocabulary: CF Standard Name Table v83
  summary: Short deployment off Tully near Explorer Seamount.
  transmission_system: IRIDIUM
  wmo_id: "999999"
  deployment_latitude: "48.91"
  deployment_longitude: "-130.61"


glider_devices:
  pressure:
    make: Micron
    model: Pressure
    serial: '104702'
  ctd:
    sensor_name: SENSOR_CTD
    long_name: CTD Metadata
    make_model: Seabird GPCTD
    maker: Seabird Scientific
    model: GPCTD
    type: CTD
    type_vocabulary: "https://vocab.nerc.ac.uk/collection/L05/current"
    # pyglider-only fields (used for processing, not written to netCDF):
    make: Seabird
    serial: '0278'
    factory_calibrated: "Yes"
    calibration_date: "02/11/2018"
    calibration_report: " "
    Thermal_lag_constants_[alpha,tau]: [0.34, 4.6]
    dTdC: 0
    comment: 'Constants were found using dfo-bb046-20220707.'
  optics:
    sensor_name: SENSOR_FLUOROMETER
    long_name: Fluorometer Metadata
    make_model: Wetlabs FLBBCDSLC
    maker: Wetlabs
    model: FLBBCDSLC
    type: fluorometer_chla
    type_vocabulary: "http://vocab.nerc.ac.uk/collection/R25/current/"
    # pyglider-only:
    make: Wetlabs
    serial: '4741'
  oxygen:
    make: AROD_FT
    model: Optode4831
    serial: '0022'


netcdf_variables:

  timebase:
    source:       GPCTD_TEMPERATURE

  # -------------------------------------------------------------------------
  # Coordinates — processing_role tells the pipeline what each variable is
  # -------------------------------------------------------------------------

  TIME:
    source:        time
    processing_role: time
    long_name:     Time elapsed since 1970-01-01T00:00:00Z
    standard_name: time
    calendar:      gregorian
    units:         seconds since 1970-01-01T00:00:00Z
    axis:          T
    observation_type: measured
    vocabulary:    http://vocab.nerc.ac.uk/collection/OG1/current/TIME/

  LATITUDE:
    source:        NAV_LATITUDE
    processing_role: latitude
    long_name:     Latitude north (WGS84)
    standard_name: latitude
    units:         degrees_north
    axis:          Y
    conversion:    nmea2deg
    comment:       "Estimated between surface fixes"
    observation_type: measured
    reference:     WGS84
    valid_max:     90.0
    valid_min:     -90.0
    vocabulary:    http://vocab.nerc.ac.uk/collection/OG1/current/LAT/

  LONGITUDE:
    source:        NAV_LONGITUDE
    processing_role: longitude
    long_name:     Longitude east (WGS84)
    standard_name: longitude
    units:         degrees_east
    axis:          X
    conversion:    nmea2deg
    comment:       "Estimated between surface fixes"
    observation_type: measured
    reference:     WGS84
    valid_max:     180.0
    valid_min:     -180.0
    vocabulary:    http://vocab.nerc.ac.uk/collection/OG1/current/LON/

  # -------------------------------------------------------------------------
  # Measured CTD variables
  # -------------------------------------------------------------------------

  CNDC:
    source:        GPCTD_CONDUCTIVITY
    processing_role: conductivity
    long_name:     Electrical conductivity of the water body by CTD
    standard_name: sea_water_electrical_conductivity
    units:         S m-1
    sensor:        SENSOR_CTD
    valid_min:     0.0
    valid_max:     10.0
    observation_type: measured
    accuracy:      0.0003
    precision:     0.0001
    resolution:    0.00002
    vocabulary:    http://vocab.nerc.ac.uk/collection/OG1/current/CNDC/

  TEMP:
    source:        GPCTD_TEMPERATURE
    processing_role: temperature
    long_name:     Temperature of the water body by CTD
    standard_name: sea_water_temperature
    units:         degree_C
    sensor:        SENSOR_CTD
    valid_min:     -5.0
    valid_max:     50.0
    observation_type: measured
    accuracy:      0.002
    precision:     0.001
    resolution:    0.0002
    vocabulary:    http://vocab.nerc.ac.uk/collection/OG1/current/TEMP/

  PRES:
    source:        GPCTD_PRESSURE
    processing_role: pressure
    long_name:     Pressure (measured variable)
    standard_name: sea_water_pressure
    units:         dbar
    sensor:        SENSOR_CTD
    valid_min:     0.0
    valid_max:     2000.0
    positive:      down
    reference_datum: sea-surface
    observation_type: measured
    accuracy:      1.0
    precision:     2.0
    resolution:    0.02
    vocabulary:    http://vocab.nerc.ac.uk/collection/OG1/current/PRES/

  # -------------------------------------------------------------------------
  # Derived coordinate variables — processing_method specifies inputs
  # -------------------------------------------------------------------------

  DEPTH:
    processing_method:
      depth_from_pressure:
        pressure: PRES
        latitude: LATITUDE
    processing_role: depth
    long_name:     Depth below surface of the water body
    standard_name: depth
    units:         m
    axis:          Z
    positive:      down
    reference_datum: sea-surface
    observation_type: calculated
    valid_min:     0.0
    valid_max:     2000.0
    vocabulary:    http://vocab.nerc.ac.uk/collection/OG1/current/DEPTH/

  PROFILE_NUMBER:
    processing_method:
      find_profiles:
        pressure: PRES
    processing_role: profile_index
    long_name:     Profile number
    comment:       "PROFILE_NUMBER increments by one each time the glider starts
                    an ascending or descending profile."
    valid_min:     1
    valid_max:     2147483647

  PROFILE_DIRECTION:
    processing_method:
      find_profiles:
        pressure: PRES
    processing_role: profile_direction
    long_name:     Vertical direction of profile
    comment:       "1 = descending, -1 = ascending, 0 = not in a profile"

  DISTANCE_OVER_GROUND:
    processing_method:
      distance_over_ground:
        latitude: LATITUDE
        longitude: LONGITUDE
    long_name:     Distance over ground flown since mission start
    units:         km

  # -------------------------------------------------------------------------
  # Sparse GPS fix variables (non-NaN only at actual surface fixes)
  # -------------------------------------------------------------------------

  LATITUDE_GPS:
    processing_method:
      gps_fixes_from_nav:
        role:       latitude
        lat_source: Lat
        lon_source: Lon
    long_name:     Latitude of each GPS surface fix
    standard_name: latitude
    units:         degrees_north
    observation_type: measured
    vocabulary:    http://vocab.nerc.ac.uk/collection/OG1/current/LAT/

  LONGITUDE_GPS:
    processing_method:
      gps_fixes_from_nav:
        role:       longitude
        lat_source: Lat
        lon_source: Lon
    long_name:     Longitude of each GPS surface fix
    standard_name: longitude
    units:         degrees_east
    observation_type: measured
    vocabulary:    http://vocab.nerc.ac.uk/collection/OG1/current/LON/

  TIME_GPS:
    processing_method:
      gps_fixes_from_nav:
        role:       time
        lat_source: Lat
        lon_source: Lon
    long_name:     Time of each GPS surface fix
    calendar:      gregorian
    units:         seconds since 1970-01-01T00:00:00Z
    observation_type: measured

  # -------------------------------------------------------------------------
  # Derived thermodynamic variables (TEOS-10)
  # -------------------------------------------------------------------------

  PSAL:
    processing_method:
      practical_salinity:
        conductivity: CNDC
        temperature:  TEMP
        pressure:     PRES
    long_name:     Sea water practical salinity
    standard_name: sea_water_practical_salinity
    units:         "1"
    comment:       "raw, uncorrected salinity"
    valid_min:     0.0
    valid_max:     40.0
    observation_type: calculated
    sensor:        SENSOR_CTD
    accuracy:      0.01
    precision:     0.01
    resolution:    0.001
    vocabulary:    http://vocab.nerc.ac.uk/collection/OG1/current/PSAL/

  THETA:
    processing_method:
      potential_temperature:
        salinity:     PSAL
        temperature:  TEMP
        pressure:     PRES
    long_name:     Water potential temperature
    standard_name: sea_water_potential_temperature
    units:         degree_C
    observation_type: calculated
    accuracy:      0.002
    precision:     0.001
    resolution:    0.0001
    vocabulary:    http://vocab.nerc.ac.uk/collection/OG1/current/THETA/

  SIGTHETA:
    processing_method:
      potential_density_sigma0:
        salinity:     PSAL
        temperature:  TEMP
        pressure:     PRES
        latitude:     LATITUDE
        longitude:    LONGITUDE
    long_name:     Water potential density referenced to surface
    standard_name: sea_water_potential_density
    units:         kg m-3
    observation_type: calculated
    accuracy:      0.01
    precision:     0.01
    resolution:    0.001
    vocabulary:    http://vocab.nerc.ac.uk/collection/OG1/current/SIGTHETA/

  DENSITY:
    processing_method:
      density:
        salinity:     PSAL
        temperature:  TEMP
        pressure:     PRES
        latitude:     LATITUDE
        longitude:    LONGITUDE
    long_name:     Density
    standard_name: sea_water_density
    units:         kg m-3
    observation_type: calculated
    valid_min:     990.0
    valid_max:     1040.0
    vocabulary:    http://vocab.nerc.ac.uk/collection/OG1/current/DENSITY/

  # -------------------------------------------------------------------------
  # Other measured variables
  # -------------------------------------------------------------------------

  HEADING:
    source:        Heading
    long_name:     Glider heading angle
    standard_name: platform_orientation
    units:         degrees
    vocabulary:    http://vocab.nerc.ac.uk/collection/OG1/current/HEADING/

  PITCH:
    source:        Pitch
    long_name:     Glider pitch angle
    standard_name: platform_pitch_angle
    units:         degrees
    vocabulary:    http://vocab.nerc.ac.uk/collection/OG1/current/PITCH/

  ROLL:
    source:        Roll
    long_name:     Glider roll angle
    standard_name: platform_roll_angle
    units:         degrees
    vocabulary:    http://vocab.nerc.ac.uk/collection/OG1/current/ROLL/

  CHLA:
    source:        FLBBCD_CHL_SCALED
    long_name:     Chlorophyll-a concentration
    standard_name: mass_concentration_of_chlorophyll_a_in_sea_water
    units:         mg m-3
    sensor:        SENSOR_FLUOROMETER
    vocabulary:    http://vocab.nerc.ac.uk/collection/OG1/current/CHLA/

  CDOM:
    source:        FLBBCD_CDOM_SCALED
    long_name:     CDOM
    units:         ppb
    sensor:        SENSOR_FLUOROMETER
    vocabulary:    http://vocab.nerc.ac.uk/collection/OG1/current/CDOM/

  BBP700:
    source:        FLBBCD_BB_700_SCALED
    long_name:     700 nm wavelength backscatter
    units:         "1"
    sensor:        SENSOR_FLUOROMETER
    vocabulary:    http://vocab.nerc.ac.uk/collection/OG1/current/BBP700/

  # -------------------------------------------------------------------------
  # QC variables
  # -------------------------------------------------------------------------

  CNDC_QC:
    average_method: QC_protocol

  TEMP_QC:
    average_method: QC_protocol

  PRES_QC:
    average_method: QC_protocol

  PSAL_QC:
    average_method: QC_protocol

  DENSITY_QC:
    average_method: QC_protocol

  CHLA_QC:
    average_method: QC_protocol


scalar_variables:

  PLATFORM_MODEL:
    from_metadata: glider_model
    long_name: Glider model name

  WMO_IDENTIFIER:
    from_metadata: glider_wmo
    long_name: WMO identifier

  PLATFORM_SERIAL_NUMBER:
    from_metadata: glider_serial
    long_name: Glider serial number

  DEPLOYMENT_TIME:
    from_metadata: deployment_start
    long_name: Deployment start time
    units: seconds since 1970-01-01T00:00:00Z

  DEPLOYMENT_LATITUDE:
    from_metadata: deployment_latitude
    long_name: Deployment latitude
    units: degrees_north

  DEPLOYMENT_LONGITUDE:
    from_metadata: deployment_longitude
    long_name: Deployment longitude
    units: degrees_east
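
As a rough illustration of how a role-based lookup could work, the sketch below builds a role-to-name map from a hand-written dictionary that mirrors a few of the `netcdf_variables` entries above. The helper name `roles_to_names` and the set of roles being checked are assumptions made for this example; this is not pyglider's internal implementation, and the dictionary stands in for whatever a YAML parser would return.

```python
def roles_to_names(netcdf_variables):
    """Map each declared processing_role to the variable name that carries it."""
    mapping = {}
    for name, attrs in netcdf_variables.items():
        role = (attrs or {}).get("processing_role")
        if role is None:
            continue  # variables such as CHLA carry no role
        if role in mapping:
            raise ValueError(
                f"role {role!r} declared by both {mapping[role]!r} and {name!r}"
            )
        mapping[role] = name
    return mapping


# Hand-written stand-in for a parsed deployment YAML (subset of the example).
netcdf_variables = {
    "TIME": {"source": "time", "processing_role": "time"},
    "LATITUDE": {"source": "NAV_LATITUDE", "processing_role": "latitude"},
    "PRES": {"source": "GPCTD_PRESSURE", "processing_role": "pressure"},
    "CHLA": {"source": "FLBBCD_CHL_SCALED"},
}

role_map = roles_to_names(netcdf_variables)

# Illustrative set of roles to check; not an authoritative list.
wanted = {"time", "latitude", "longitude", "pressure"}
missing = sorted(wanted - set(role_map))

print(role_map["pressure"])  # PRES
print(missing)               # ['longitude']
```

Running a check like this before processing would flag a forgotten `processing_role` (here, `longitude`) before the pipeline fails on a missing variable.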

# -*- coding: utf-8 -*-
"""
Process dfo-eva035 realtime data to OG 1.0 format.

This script mirrors process_deploymentRealTime.py but targets OG 1.0 output.
It reuses the raw parquet files produced by the standard pipeline; only the
YAML and output directories differ.
"""

import logging
import pyglider.seaexplorer as seaexplorer
import pyglider.ncprocess as ncprocess

logging.basicConfig(level='INFO')

rawncdir       = './realtime_rawnc/'
deploymentyaml = './deploymentRealtime_og10.yml'
l0tsdir        = './L0-timeseries-og10/'
griddir        = './L0-gridfiles-og10/'

# Step 1: timeseries (raw_to_rawnc and merge_parquet already run by the
# standard pipeline — we reuse the parquet files in realtime_rawnc/).
outname = seaexplorer.raw_to_timeseries(
    rawncdir, l0tsdir, deploymentyaml, kind='sub'
)

# Step 2: gridded netCDF
outname2 = ncprocess.make_gridfiles(outname, griddir, deploymentyaml)
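
Because the script above assumes the standard pipeline has already populated `realtime_rawnc/`, a small pre-flight check can make a missing input fail early with a clear message. The following is a minimal sketch only: `missing_inputs` is a hypothetical helper (not part of pyglider), and the `*.parquet` file pattern is an assumption based on the comment in the script.

```python
from pathlib import Path


def missing_inputs(rawncdir, deploymentyaml):
    """Return a list of human-readable problems with the inputs (empty if none)."""
    problems = []
    raw = Path(rawncdir)
    if not raw.is_dir():
        problems.append(f"raw directory not found: {raw}")
    elif not any(raw.glob("*.parquet")):
        # Assumed naming: the reused raw files end in .parquet
        problems.append(f"no parquet files in: {raw}")
    if not Path(deploymentyaml).is_file():
        problems.append(f"deployment YAML not found: {deploymentyaml}")
    return problems


# Same paths as in the script above.
problems = missing_inputs("./realtime_rawnc/", "./deploymentRealtime_og10.yml")
for problem in problems:
    print(problem)
```

If the returned list is non-empty, the caller can stop before invoking `raw_to_timeseries`, for example by raising `FileNotFoundError` with the joined messages.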