pyglider.utils#

Utilities that are used for processing scripts.

Attributes#

Functions#

get_distance_over_ground(ds[, varnames])

Add a distance over ground variable to a netcdf structure

get_glider_depth(ds[, varnames])

Get glider depth from pressure sensor.

get_profiles_new(ds[, min_dp, filt_time, ...])

Find profiles in a glider timeseries:

get_derived_eos_raw(ds[, varnames])

Calculate salinity, potential density, density, and potential temperature

fill_metadata(ds, metadata, sensor_data[, varnames])

Add metadata to a Dataset

nmea2deg(nmea)

Convert a NMEA float to a decimal degree float. e.g. -12640.3232 = -126.6721

oxygen_concentration_correction(ds, ncvar)

Correct oxygen signal for salinity signal

_get_varnames(deployment)

Build a role → variable-name mapping from the deployment YAML.

_resolve_role(ds, varnames, role)

Resolve a processing role to its variable name and verify it exists in ds.

_dispatch_processing_methods(ds, ncvar)

Compute all variables in ncvar that carry a processing_method entry.

_load_dataset(filename)

Open a netCDF dataset and normalise the primary dimension to time.

_save_dataset(ds, filename, deployment, **kwargs)

Write a dataset to netCDF, renaming the time dimension when required.

flag_conductivity_in_depth_space(ts0[, d_profile, dz, ...])

Flag conductivity as QC1 (good) or QC4 (bad) using profile bins and depth bins.

interpolate_over_salinity_NANs(ds)

Function applied to the dataset before finding the internal temperature.

apply_thermal_lag(ds, fn, alpha, tau[, interpolate_filter])

Function from Garau et al. (2011): estimates temperature inside the

flag_CTD_data(ts0[, clean_stdev, accuracy])

Wrapper function to flag CTD data.

adjust_CTD(ts, deploymentyaml[, alpha, tau, dTdC, ...])

Pulls correction constants from deploymentyaml. If alpha, tau, or dTdC

maskQC4(ds)

Optional:

Module Contents#

pyglider.utils.get_distance_over_ground(ds, varnames=None)#

Add a distance over ground variable to a netcdf structure

Parameters:
dsxarray.Dataset

Must have variable latitude and longitude indexed by time dimension.

varnamesdict, optional

Role → variable-name mapping from _get_varnames(). Uses roles latitude and longitude. Defaults to those literal names when absent.

Returns:
ds.xarray.Dataset

With distance_over_ground key.

pyglider.utils.get_glider_depth(ds, varnames=None)#

Get glider depth from pressure sensor.

Parameters:
dsxarray.Dataset

Must have variables pressure and latitude indexed by time dimension. Assume pressure sensor in dbar.

varnamesdict, optional

Role → variable-name mapping from _get_varnames(). Uses roles pressure, latitude, and depth (output). Defaults to those literal names when absent.

Returns:
ds.xarray.Dataset

With depth variable (named per varnames['depth'], default 'depth') added.

pyglider.utils.get_profiles_new(ds, min_dp=10.0, filt_time=100, profile_min_time=300, varnames=None)#

Find profiles in a glider timeseries:

Parameters:
dsxarray.Dataset

Must have time coordinate and pressure as a variable

min_dpfloat, default=10.0

Minimum distance a profile must transit to be considered a profile, in dbar.

filt_timefloat, default=100

Approximate length of time filter, in seconds. Note that the filter is really implemented by sample, so the number of samples is filt_time / dt where dt is the median time between samples in the time series.

profile_min_timefloat, default=300

Minimum time length of profile in s.

varnamesdict, optional

Role → variable-name mapping from _get_varnames(). Uses roles pressure and profile_index (output). Defaults to those literal names when absent.

pyglider.utils.get_derived_eos_raw(ds, varnames=None)#

Calculate salinity, potential density, density, and potential temperature

Parameters:
dsxarray.Dataset

Must have time coordinate and temperature, conductivity, pressure, and latitude and longitude as variables.

varnamesdict, optional

Role → variable-name mapping from _get_varnames(). Uses roles conductivity, temperature, pressure, latitude, and longitude for inputs. Output variables are always written as salinity, potential_density, density, and potential_temperature (IOOS GDAC convention); for OG 1.0 use processing_method in the deployment YAML instead.

Returns:
dsxarray.Dataset

with salinity, potential_density, density, and potential_temperature as new variables.

Notes

Thermodynamic variables derived from the Gibbs seawater toolbox import gsw.

  • salinity::

    gsw.conversions.SP_from_C(r, ds.temperature, ds.pressure)

  • potential_density::
    sa = gsw.SA_from_SP(ds[‘salinity’], ds[‘pressure’],

    ds[‘longitude’], ds[‘latitude’])

    ct = gsw.CT_from_t(sa, ds[‘temperature’], ds[‘pressure’]) ds[‘potential_density’] = ((‘time’),

    1000 + gsw.density.sigma0(sa, ct).values)

  • density::
    ds[‘density’] = ((‘time’), gsw.density.rho(

    ds.salinity, ds.temperature, ds.pressure).values)

  • potential_temperature::
    ds[‘potential_temperature’] = ((‘time’), gsw.conversions.pt0_from_t(

    ds.salinity, ds.temperature, ds.pressure).values)

pyglider.utils.fill_metadata(ds, metadata, sensor_data, varnames=None)#

Add metadata to a Dataset

Parameters:
dsxarray.Dataset

Dataset must have longitude, latitude, and time values

metadatadict

dictionary of attributes to add to the global attributes. Usually taken from deployment.yml file.

sensor_datadict

dictionary of device data to add to the global attributes.

varnamesdict, optional

Role → variable-name mapping from _get_varnames(). Uses roles latitude and longitude. Defaults to those literal names when absent.

Returns:
dsxarray.Dataset

Dataset with attributes filled out.

pyglider.utils.nmea2deg(nmea)#

Convert a NMEA float to a decimal degree float. e.g. -12640.3232 = -126.6721

pyglider.utils.oxygen_concentration_correction(ds, ncvar)#

Correct oxygen signal for salinity signal

Parameters:
dsxarray.Dataset

Should have oxygen_concentration, potential_temperature, salinity, on a time coordinate.

ncvardict

dictionary with netcdf variable definitions in it. Should have oxygen_concentration as a key, which itself should specify a reference_salinity and have correct_oxygen set to "True".

Returns:
dsxarray.Dataset

With oxygen_concentration corrected for the salinity effect.

pyglider.utils._get_varnames(deployment)#

Build a role → variable-name mapping from the deployment YAML.

Each entry in netcdf_variables that carries a processing_role attribute contributes a {role: varname} pair. For YAML files that follow the IOOS GDAC convention (no explicit processing_role), any variable whose name matches a known role is mapped to itself so that existing processing code continues to work without changes.

Parameters:
deploymentdict

Deployment metadata loaded from the YAML file.

Returns:
varnamesdict

Mapping of role string to the actual variable name used in the dataset, e.g. {'pressure': 'PRES', 'temperature': 'TEMP', ...}.

Notes

Required roles — looked up directly by the pipeline; must be declared with processing_role when non-standard variable names are used: time, latitude, longitude, pressure, depth, profile_index.

Legacy fallback roles — only used when no processing_method covers the relevant derived variable; do not need processing_role if processing_method entries name their inputs explicitly: temperature, conductivity, salinity, profile_direction, oxygen_concentration.

pyglider.utils._resolve_role(ds, varnames, role)#

Resolve a processing role to its variable name and verify it exists in ds.

Parameters:
dsxarray.Dataset

Dataset to check against.

varnamesdict

Role → variable-name mapping from _get_varnames().

rolestr

The role to look up (e.g. 'latitude', 'depth').

Returns:
str

The variable name in ds that corresponds to role.

Raises:
ValueError

If the resolved name is not present in ds. The message names the role, the resolved variable name, and how to fix the YAML.

pyglider.utils.BUILTIN_METHODS#
pyglider.utils._dispatch_processing_methods(ds, ncvar)#

Compute all variables in ncvar that carry a processing_method entry.

Variables are processed in YAML order, so later entries can depend on earlier ones (e.g. POTENTIAL_TEMPERATURE uses PSAL which must be computed first). Methods already handled by explicit utility calls (depth_from_pressure, find_profiles) are not recomputed, but any YAML attributes (e.g. vocabulary, long_name) are applied to the variable if it already exists in ds — so callers should invoke the explicit utility functions before calling this dispatcher.

Parameters:
dsxarray.Dataset

Dataset containing measured variables.

ncvardict

Mapping of variable name → YAML attributes (netcdf_variables).

Returns:
dsxarray.Dataset

Dataset with dispatched derived variables added.

pyglider.utils._load_dataset(filename)#

Open a netCDF dataset and normalise the primary dimension to time.

Files saved with a custom output_dimension (e.g. N_MEASUREMENTS for OG 1.0) carry their time coordinate under a different dimension name. This function detects that case via standard_name: time on the coordinate variable and renames both the dimension and (if necessary) the coordinate back to time so the rest of the processing pipeline can use a single consistent name.

Parameters:
filenamestr or Path

Path to the netCDF file.

Returns:
dsxarray.Dataset

Dataset with time as the primary dimension and coordinate.

pyglider.utils._save_dataset(ds, filename, deployment, **kwargs)#

Write a dataset to netCDF, renaming the time dimension when required.

The deployment YAML may specify output_dimension (e.g. N_MEASUREMENTS for OG 1.0). If set, the time dimension is renamed to output_dimension before writing so the file conforms to the target convention. Internal processing always works with time; this rename only affects what is written to disk.

Parameters:
dsxarray.Dataset

Dataset whose primary dimension is time.

filenamestr or Path

Output file path.

deploymentdict

Deployment metadata loaded from the YAML file. The optional key output_dimension controls the dimension name written to the file (default: 'time').

**kwargs

Passed through to xarray.Dataset.to_netcdf().

pyglider.utils.flag_conductivity_in_depth_space(ts0, d_profile=50, dz=5, clean_stdev=3, accuracy=None)#

Flag conductivity as QC1 (good) or QC4 (bad) using profile bins and depth bins.

Conductivity data are grouped into profile bins of width d_profile and depth bins of width dz. Within each depth bin, points farther than 5 standard deviations from the mean are temporarily excluded. A second mean and standard deviation are then computed from the remaining points, and values farther than clean_stdev standard deviations from that cleaned mean are flagged as QC4. If accuracy is provided, deviations smaller than accuracy are not flagged.

Parameters:
ts0xarray.Dataset

Timeseries dataset containing conductivity, depth, and profile_index.

d_profilefloat, optional

Width of the profile bins.

dzfloat, optional

Width of the depth bins in meters.

clean_stdevfloat, optional

Number of standard deviations used in the second-pass flagging step.

accuracyfloat or None, optional

Sensor accuracy threshold. Deviations smaller than this are not flagged.

Returns:
qcnp.ndarray

Array of QC flags with the same shape as ts0.conductivity. Good data are flagged as 1, bad data as 4.

pyglider.utils.interpolate_over_salinity_NANs(ds)#

Function applied to the dataset before finding the internal temperature. Function interpolates temperature over bad data and small data gaps to prevent errors from affecting the neighbouring cells.

Parameters:
ds: DataArray

Timeseries of mission data

Returns:
interp: DataArray

Timeseries of interpolated temperature

pyglider.utils.apply_thermal_lag(ds, fn, alpha, tau, interpolate_filter=None)#

Function from Garau et al. (2011): estimates temperature inside the conductivity cell then recalculates salinity

Parameters:
ds: DataArray

Timeseries of mission data

fn: float

Sampling frequency of the sensor

alphafloat

Thermal lag strength constant for the sensor.

tau: float

Thermal lag time constant for the sensor.

interpolate_filter: callable or None, optional

Function applied to the dataset before finding the internal temperature. Function interpolates over bad data and small data gaps to prevent errors from affecting the neighbouring cells.

Returns:
sal: DataArray

Timeseries of salinity_adjusted calculated using the internal temperature of the conductivity cell.

pyglider.utils.flag_CTD_data(ts0, clean_stdev=3, accuracy=None)#

Wrapper function to flag CTD data.

Uses flag_conductivity_in_depth_space to flag conductivity as QC1 (good) or QC4 (bad) in profile-depth space. Conductivity and salinity are then flagged as QC4 wherever conductivity is flagged as QC4.

Creates conductivity_QC, salinity_QC, and temperature_QC if they do not already exist.

Parameters:
ts0xarray.Dataset

Timeseries of mission data.

clean_stdevfloat, optional

Number of standard deviations from the cleaned mean for data to be flagged as QC4.

Returns:
tsxarray.Dataset

Timeseries of mission data with conductivity_QC, salinity_QC, and temperature_QC.

pyglider.utils.adjust_CTD(ts, deploymentyaml, alpha=None, tau=None, dTdC=None, interpolate_filter=None)#

Pulls correction constants from deploymentyaml. If alpha, tau, or dTdC differ from the values in the YAML file, the values provided as function arguments are used and a warning is issued.

Applies conductivity–temperature lag correction and thermal lag correction when the corresponding constants are not None or 0. This produces the variables temperature_adjusted and salinity_adjusted. The variables potential_density_adjusted and potential_temperature_adjusted are derived from the adjusted temperature and salinity.

Parameters:
tsxarray.Dataset

Time series of mission data.

deploymentyamlstr or list

Path to a YAML file containing deployment information for the glider.

If a list is provided, YAML files are read in order, and top-level keys in later files overwrite those in earlier files.

alphafloat, optional

Thermal lag correction parameter alpha. Default is None.

taufloat, optional

Thermal lag correction parameter tau. Default is None.

dTdCfloat, optional

Time lag (seconds) between temperature and conductivity sensors. Default is None.

interpolate_filter: callable or None, optional

Function applied to the dataset before finding the internal temperature. Function interpolates over bad data and small data gaps to prevent errors from affecting the neighbouring cells. Default is None.

Returns
——-
tsxarray.Dataset

Time series dataset with the additional variables: temperature_adjusted, salinity_adjusted, potential_density_adjusted, and potential_temperature_adjusted. Metadata are updated to reflect applied corrections.

pyglider.utils.maskQC4(ds)#

Optional: Masks QC4 samples in data variables (set to NaN) so gridding ignores them. Only QC1 (good) data are gridded.

Parameters:
ds: DataArray

Timeseries of a data

Returns:
ds: DataArray

Timeseries of a data with QC4 data masked