pyglider.utils#
Utilities that are used for processing scripts.
Attributes#
Functions#
|
Add a distance over ground variable to a netcdf structure |
|
Get glider depth from pressure sensor. |
|
Find profiles in a glider timeseries: |
|
Calculate salinity, potential density, density, and potential temperature |
|
Add metadata to a Dataset |
|
Convert a NMEA float to a decimal degree float. e.g. -12640.3232 = -126.6721 |
|
Correct oxygen signal for salinity signal |
|
Build a role → variable-name mapping from the deployment YAML. |
|
Resolve a processing role to its variable name and verify it exists in ds. |
|
Compute all variables in ncvar that carry a |
|
Open a netCDF dataset and normalise the primary dimension to |
|
Write a dataset to netCDF, renaming the time dimension when required. |
|
Flag conductivity as QC1 (good) or QC4 (bad) using profile bins and depth bins. |
Function applied to the dataset before finding the internal temperature. |
|
|
Function from Garau et al. (2011): estimates temperature inside the |
|
Wrapper function to flag CTD data. |
|
Pulls correction constants from deploymentyaml. If alpha, tau, or dTdC |
|
Optional: |
Module Contents#
- pyglider.utils.get_distance_over_ground(ds, varnames=None)#
Add a distance over ground variable to a netcdf structure
- Parameters:
- dsxarray.Dataset
Must have variable
latitudeandlongitudeindexed bytimedimension.- varnamesdict, optional
Role → variable-name mapping from
_get_varnames(). Uses roleslatitudeandlongitude. Defaults to those literal names when absent.
- Returns:
- ds.xarray.Dataset
With
distance_over_groundkey.
- pyglider.utils.get_glider_depth(ds, varnames=None)#
Get glider depth from pressure sensor.
- Parameters:
- dsxarray.Dataset
Must have variables
pressureandlatitudeindexed bytimedimension. Assume pressure sensor in dbar.- varnamesdict, optional
Role → variable-name mapping from
_get_varnames(). Uses rolespressure,latitude, anddepth(output). Defaults to those literal names when absent.
- Returns:
- ds.xarray.Dataset
With depth variable (named per
varnames['depth'], default'depth') added.
- pyglider.utils.get_profiles_new(ds, min_dp=10.0, filt_time=100, profile_min_time=300, varnames=None)#
Find profiles in a glider timeseries:
- Parameters:
- dsxarray.Dataset
Must have time coordinate and pressure as a variable
- min_dpfloat, default=10.0
Minimum distance a profile must transit to be considered a profile, in dbar.
- filt_timefloat, default=100
Approximate length of time filter, in seconds. Note that the filter is really implemented by sample, so the number of samples is
filt_time / dtwhere dt is the median time between samples in the time series.- profile_min_timefloat, default=300
Minimum time length of profile in s.
- varnamesdict, optional
Role → variable-name mapping from
_get_varnames(). Uses rolespressureandprofile_index(output). Defaults to those literal names when absent.
- pyglider.utils.get_derived_eos_raw(ds, varnames=None)#
Calculate salinity, potential density, density, and potential temperature
- Parameters:
- dsxarray.Dataset
Must have time coordinate and temperature, conductivity, pressure, and latitude and longitude as variables.
- varnamesdict, optional
Role → variable-name mapping from
_get_varnames(). Uses rolesconductivity,temperature,pressure,latitude, andlongitudefor inputs. Output variables are always written assalinity,potential_density,density, andpotential_temperature(IOOS GDAC convention); for OG 1.0 useprocessing_methodin the deployment YAML instead.
- Returns:
- dsxarray.Dataset
with salinity, potential_density, density, and potential_temperature as new variables.
Notes
Thermodynamic variables derived from the Gibbs seawater toolbox
import gsw.- salinity::
gsw.conversions.SP_from_C(r, ds.temperature, ds.pressure)
- potential_density::
- sa = gsw.SA_from_SP(ds[‘salinity’], ds[‘pressure’],
ds[‘longitude’], ds[‘latitude’])
ct = gsw.CT_from_t(sa, ds[‘temperature’], ds[‘pressure’]) ds[‘potential_density’] = ((‘time’),
1000 + gsw.density.sigma0(sa, ct).values)
- density::
- ds[‘density’] = ((‘time’), gsw.density.rho(
ds.salinity, ds.temperature, ds.pressure).values)
- potential_temperature::
- ds[‘potential_temperature’] = ((‘time’), gsw.conversions.pt0_from_t(
ds.salinity, ds.temperature, ds.pressure).values)
- pyglider.utils.fill_metadata(ds, metadata, sensor_data, varnames=None)#
Add metadata to a Dataset
- Parameters:
- dsxarray.Dataset
Dataset must have longitude, latitude, and time values
- metadatadict
dictionary of attributes to add to the global attributes. Usually taken from deployment.yml file.
- sensor_datadict
dictionary of device data to add to the global attributes.
- varnamesdict, optional
Role → variable-name mapping from
_get_varnames(). Uses roleslatitudeandlongitude. Defaults to those literal names when absent.
- Returns:
- dsxarray.Dataset
Dataset with attributes filled out.
- pyglider.utils.nmea2deg(nmea)#
Convert a NMEA float to a decimal degree float. e.g. -12640.3232 = -126.6721
- pyglider.utils.oxygen_concentration_correction(ds, ncvar)#
Correct oxygen signal for salinity signal
- Parameters:
- dsxarray.Dataset
Should have oxygen_concentration, potential_temperature, salinity, on a time coordinate.
- ncvardict
dictionary with netcdf variable definitions in it. Should have oxygen_concentration as a key, which itself should specify a reference_salinity and have correct_oxygen set to
"True".
- Returns:
- dsxarray.Dataset
With oxygen_concentration corrected for the salinity effect.
- pyglider.utils._get_varnames(deployment)#
Build a role → variable-name mapping from the deployment YAML.
Each entry in
netcdf_variablesthat carries aprocessing_roleattribute contributes a{role: varname}pair. For YAML files that follow the IOOS GDAC convention (no explicitprocessing_role), any variable whose name matches a known role is mapped to itself so that existing processing code continues to work without changes.- Parameters:
- deploymentdict
Deployment metadata loaded from the YAML file.
- Returns:
- varnamesdict
Mapping of role string to the actual variable name used in the dataset, e.g.
{'pressure': 'PRES', 'temperature': 'TEMP', ...}.
Notes
Required roles — looked up directly by the pipeline; must be declared with
processing_rolewhen non-standard variable names are used:time,latitude,longitude,pressure,depth,profile_index.Legacy fallback roles — only used when no
processing_methodcovers the relevant derived variable; do not needprocessing_roleifprocessing_methodentries name their inputs explicitly:temperature,conductivity,salinity,profile_direction,oxygen_concentration.
- pyglider.utils._resolve_role(ds, varnames, role)#
Resolve a processing role to its variable name and verify it exists in ds.
- Parameters:
- dsxarray.Dataset
Dataset to check against.
- varnamesdict
Role → variable-name mapping from
_get_varnames().- rolestr
The role to look up (e.g.
'latitude','depth').
- Returns:
- str
The variable name in ds that corresponds to role.
- Raises:
- ValueError
If the resolved name is not present in ds. The message names the role, the resolved variable name, and how to fix the YAML.
- pyglider.utils.BUILTIN_METHODS#
- pyglider.utils._dispatch_processing_methods(ds, ncvar)#
Compute all variables in ncvar that carry a
processing_methodentry.Variables are processed in YAML order, so later entries can depend on earlier ones (e.g.
POTENTIAL_TEMPERATUREusesPSALwhich must be computed first). Methods already handled by explicit utility calls (depth_from_pressure,find_profiles) are not recomputed, but any YAML attributes (e.g.vocabulary,long_name) are applied to the variable if it already exists in ds — so callers should invoke the explicit utility functions before calling this dispatcher.- Parameters:
- dsxarray.Dataset
Dataset containing measured variables.
- ncvardict
Mapping of variable name → YAML attributes (
netcdf_variables).
- Returns:
- dsxarray.Dataset
Dataset with dispatched derived variables added.
- pyglider.utils._load_dataset(filename)#
Open a netCDF dataset and normalise the primary dimension to
time.Files saved with a custom
output_dimension(e.g.N_MEASUREMENTSfor OG 1.0) carry their time coordinate under a different dimension name. This function detects that case viastandard_name: timeon the coordinate variable and renames both the dimension and (if necessary) the coordinate back totimeso the rest of the processing pipeline can use a single consistent name.- Parameters:
- filenamestr or Path
Path to the netCDF file.
- Returns:
- dsxarray.Dataset
Dataset with
timeas the primary dimension and coordinate.
- pyglider.utils._save_dataset(ds, filename, deployment, **kwargs)#
Write a dataset to netCDF, renaming the time dimension when required.
The deployment YAML may specify
output_dimension(e.g.N_MEASUREMENTSfor OG 1.0). If set, thetimedimension is renamed tooutput_dimensionbefore writing so the file conforms to the target convention. Internal processing always works withtime; this rename only affects what is written to disk.- Parameters:
- dsxarray.Dataset
Dataset whose primary dimension is
time.- filenamestr or Path
Output file path.
- deploymentdict
Deployment metadata loaded from the YAML file. The optional key
output_dimensioncontrols the dimension name written to the file (default:'time').- **kwargs
Passed through to
xarray.Dataset.to_netcdf().
- pyglider.utils.flag_conductivity_in_depth_space(ts0, d_profile=50, dz=5, clean_stdev=3, accuracy=None)#
Flag conductivity as QC1 (good) or QC4 (bad) using profile bins and depth bins.
Conductivity data are grouped into profile bins of width d_profile and depth bins of width dz. Within each depth bin, points farther than 5 standard deviations from the mean are temporarily excluded. A second mean and standard deviation are then computed from the remaining points, and values farther than clean_stdev standard deviations from that cleaned mean are flagged as QC4. If accuracy is provided, deviations smaller than accuracy are not flagged.
- Parameters:
- ts0xarray.Dataset
Timeseries dataset containing conductivity, depth, and profile_index.
- d_profilefloat, optional
Width of the profile bins.
- dzfloat, optional
Width of the depth bins in meters.
- clean_stdevfloat, optional
Number of standard deviations used in the second-pass flagging step.
- accuracyfloat or None, optional
Sensor accuracy threshold. Deviations smaller than this are not flagged.
- Returns:
- qcnp.ndarray
Array of QC flags with the same shape as ts0.conductivity. Good data are flagged as 1, bad data as 4.
- pyglider.utils.interpolate_over_salinity_NANs(ds)#
Function applied to the dataset before finding the internal temperature. Function interpolates temperature over bad data and small data gaps to prevent errors from affecting the neighbouring cells.
- Parameters:
- ds: DataArray
Timeseries of mission data
- Returns:
- interp: DataArray
Timeseries of interpolated temperature
- pyglider.utils.apply_thermal_lag(ds, fn, alpha, tau, interpolate_filter=None)#
Function from Garau et al. (2011): estimates temperature inside the conductivity cell then recalculates salinity
- Parameters:
- ds: DataArray
Timeseries of mission data
- fn: float
Sampling frequency of the sensor
- alphafloat
Thermal lag strength constant for the sensor.
- tau: float
Thermal lag time constant for the sensor.
- interpolate_filter: callable or None, optional
Function applied to the dataset before finding the internal temperature. Function interpolates over bad data and small data gaps to prevent errors from affecting the neighbouring cells.
- Returns:
- sal: DataArray
Timeseries of salinity_adjusted calculated using the internal temperature of the conductivity cell.
- pyglider.utils.flag_CTD_data(ts0, clean_stdev=3, accuracy=None)#
Wrapper function to flag CTD data.
Uses flag_conductivity_in_depth_space to flag conductivity as QC1 (good) or QC4 (bad) in profile-depth space. Conductivity and salinity are then flagged as QC4 wherever conductivity is flagged as QC4.
Creates conductivity_QC, salinity_QC, and temperature_QC if they do not already exist.
- Parameters:
- ts0xarray.Dataset
Timeseries of mission data.
- clean_stdevfloat, optional
Number of standard deviations from the cleaned mean for data to be flagged as QC4.
- Returns:
- tsxarray.Dataset
Timeseries of mission data with conductivity_QC, salinity_QC, and temperature_QC.
- pyglider.utils.adjust_CTD(ts, deploymentyaml, alpha=None, tau=None, dTdC=None, interpolate_filter=None)#
Pulls correction constants from deploymentyaml. If alpha, tau, or dTdC differ from the values in the YAML file, the values provided as function arguments are used and a warning is issued.
Applies conductivity–temperature lag correction and thermal lag correction when the corresponding constants are not None or 0. This produces the variables temperature_adjusted and salinity_adjusted. The variables potential_density_adjusted and potential_temperature_adjusted are derived from the adjusted temperature and salinity.
- Parameters:
- tsxarray.Dataset
Time series of mission data.
- deploymentyamlstr or list
Path to a YAML file containing deployment information for the glider.
If a list is provided, YAML files are read in order, and top-level keys in later files overwrite those in earlier files.
- alphafloat, optional
Thermal lag correction parameter alpha. Default is None.
- taufloat, optional
Thermal lag correction parameter tau. Default is None.
- dTdCfloat, optional
Time lag (seconds) between temperature and conductivity sensors. Default is None.
- interpolate_filter: callable or None, optional
Function applied to the dataset before finding the internal temperature. Function interpolates over bad data and small data gaps to prevent errors from affecting the neighbouring cells. Default is None.
- Returns
- ——-
- tsxarray.Dataset
Time series dataset with the additional variables: temperature_adjusted, salinity_adjusted, potential_density_adjusted, and potential_temperature_adjusted. Metadata are updated to reflect applied corrections.
- pyglider.utils.maskQC4(ds)#
Optional: Masks QC4 samples in data variables (set to NaN) so gridding ignores them. Only QC1 (good) data are gridded.
- Parameters:
- ds: DataArray
Timeseries of a data
- Returns:
- ds: DataArray
Timeseries of a data with QC4 data masked