pyglider.utils
==============

.. py:module:: pyglider.utils

.. autoapi-nested-parse::

   Utilities that are used for processing scripts.

   ..
       !! processed by numpydoc !!


Attributes
----------

.. autoapisummary::

   pyglider.utils.BUILTIN_METHODS


Functions
---------

.. autoapisummary::

   pyglider.utils.get_distance_over_ground
   pyglider.utils.get_glider_depth
   pyglider.utils.get_profiles_new
   pyglider.utils.get_derived_eos_raw
   pyglider.utils.fill_metadata
   pyglider.utils.nmea2deg
   pyglider.utils.oxygen_concentration_correction
   pyglider.utils._get_varnames
   pyglider.utils._resolve_role
   pyglider.utils._dispatch_processing_methods
   pyglider.utils._load_dataset
   pyglider.utils._save_dataset
   pyglider.utils.flag_conductivity_in_depth_space
   pyglider.utils.interpolate_over_salinity_NANs
   pyglider.utils.apply_thermal_lag
   pyglider.utils.flag_CTD_data
   pyglider.utils.adjust_CTD
   pyglider.utils.maskQC4


Module Contents
---------------

.. py:function:: get_distance_over_ground(ds, varnames=None)

   Add a distance-over-ground variable to a netCDF dataset.

   :Parameters:

       **ds** : `xarray.Dataset`
           Must have variables ``latitude`` and ``longitude`` indexed by the ``time`` dimension.

       **varnames** : dict, optional
           Role → variable-name mapping from :func:`_get_varnames`. Uses roles ``latitude``
           and ``longitude``. Defaults to those literal names when absent.

   :Returns:

       **ds** : `xarray.Dataset`
           With the ``distance_over_ground`` variable added.

   ..
       !! processed by numpydoc !!

.. py:function:: get_glider_depth(ds, varnames=None)

   Get glider depth from pressure sensor.

   :Parameters:

       **ds** : `xarray.Dataset`
           Must have variables ``pressure`` and ``latitude`` indexed by the ``time`` dimension.
           The pressure sensor is assumed to report in dbar.

       **varnames** : dict, optional
           Role → variable-name mapping from :func:`_get_varnames`. Uses roles ``pressure``,
           ``latitude``, and ``depth`` (output). Defaults to those literal names when absent.

   :Returns:

       **ds** : `xarray.Dataset`
           With depth variable (named per ``varnames['depth']``, default ``'depth'``) added.

   ..
       !! processed by numpydoc !!

.. py:function:: get_profiles_new(ds, min_dp=10.0, filt_time=100, profile_min_time=300, varnames=None)

   Find profiles in a glider timeseries.

   :Parameters:

       **ds** : `xarray.Dataset`
           Must have a *time* coordinate and *pressure* as a variable.

       **min_dp** : float, default=10.0
           Minimum distance a profile must transit to be considered a profile, in dbar.

       **filt_time** : float, default=100
           Approximate length of time filter, in seconds. Note that the filter is really
           implemented by sample, so the number of samples is ``filt_time / dt`` where *dt*
           is the median time between samples in the time series.

       **profile_min_time** : float, default=300
           Minimum duration of a profile, in seconds.

       **varnames** : dict, optional
           Role → variable-name mapping from :func:`_get_varnames`. Uses roles ``pressure``
           and ``profile_index`` (output). Defaults to those literal names when absent.

   ..
       !! processed by numpydoc !!

.. py:function:: get_derived_eos_raw(ds, varnames=None)

   Calculate salinity, potential density, density, and potential temperature.

   :Parameters:

       **ds** : `xarray.Dataset`
           Must have a *time* coordinate and *temperature*, *conductivity*, *pressure*,
           *latitude*, and *longitude* as variables.

       **varnames** : dict, optional
           Role → variable-name mapping from :func:`_get_varnames`. Uses roles
           ``conductivity``, ``temperature``, ``pressure``, ``latitude``, and ``longitude``
           for inputs. Output variables are always written as ``salinity``,
           ``potential_density``, ``density``, and ``potential_temperature`` (IOOS GDAC
           convention); for OG 1.0 use ``processing_method`` in the deployment YAML instead.

   :Returns:

       **ds** : `xarray.Dataset`
           With *salinity*, *potential_density*, *density*, and *potential_temperature*
           as new variables.

   .. rubric:: Notes

   Thermodynamic variables are derived with the Gibbs SeaWater toolbox (``import gsw``):


   - *salinity*::

         gsw.conversions.SP_from_C(r, ds.temperature, ds.pressure)

   - *potential_density*::

         sa = gsw.SA_from_SP(ds['salinity'], ds['pressure'], ds['longitude'], ds['latitude'])
         ct = gsw.CT_from_t(sa, ds['temperature'], ds['pressure'])
         ds['potential_density'] = (('time'), 1000 + gsw.density.sigma0(sa, ct).values)

   - *density*::

         ds['density'] = (('time'), gsw.density.rho(
               ds.salinity, ds.temperature, ds.pressure).values)

   - *potential_temperature*::

         ds['potential_temperature'] = (('time'), gsw.conversions.pt0_from_t(
               ds.salinity, ds.temperature, ds.pressure).values)

   ..
       !! processed by numpydoc !!

.. py:function:: fill_metadata(ds, metadata, sensor_data, varnames=None)

   Add metadata to a Dataset.

   :Parameters:

       **ds** : `xarray.Dataset`
           Dataset must have *longitude*, *latitude*, and *time* values.

       **metadata** : dict
           Dictionary of attributes to add to the global attributes. Usually taken from
           the *deployment.yml* file.

       **sensor_data** : dict
           Dictionary of device data to add to the global attributes.

       **varnames** : dict, optional
           Role → variable-name mapping from :func:`_get_varnames`. Uses roles ``latitude``
           and ``longitude``. Defaults to those literal names when absent.

   :Returns:

       **ds** : `xarray.Dataset`
           Dataset with attributes filled out.

   ..
       !! processed by numpydoc !!

.. py:function:: nmea2deg(nmea)

   Convert an NMEA float to a decimal degree float, e.g. ``-12640.3232`` becomes ``-126.6721``.

   ..
       !! processed by numpydoc !!

.. py:function:: oxygen_concentration_correction(ds, ncvar)

   Correct the oxygen signal for the effect of salinity.

   :Parameters:

       **ds** : `xarray.Dataset`
           Should have *oxygen_concentration*, *potential_temperature*, and *salinity*
           on a *time* coordinate.

       **ncvar** : dict
           Dictionary with netCDF variable definitions in it. Should have
           *oxygen_concentration* as a key, which itself should specify a
           *reference_salinity* and have *correct_oxygen* set to ``"True"``.

   :Returns:

       **ds** : `xarray.Dataset`
           With *oxygen_concentration* corrected for the salinity effect.

   ..
       !! processed by numpydoc !!

.. py:function:: _get_varnames(deployment)

   Build a role → variable-name mapping from the deployment YAML.

   Each entry in ``netcdf_variables`` that carries a ``processing_role`` attribute
   contributes a ``{role: varname}`` pair. For YAML files that follow the IOOS GDAC
   convention (no explicit ``processing_role``), any variable whose name matches a
   known role is mapped to itself so that existing processing code continues to work
   without changes.

   :Parameters:

       **deployment** : dict
           Deployment metadata loaded from the YAML file.

   :Returns:

       **varnames** : dict
           Mapping of role string to the actual variable name used in the dataset,
           e.g. ``{'pressure': 'PRES', 'temperature': 'TEMP', ...}``.

   .. rubric:: Notes

   **Required roles**: looked up directly by the pipeline; must be declared with
   ``processing_role`` when non-standard variable names are used:
   ``time``, ``latitude``, ``longitude``, ``pressure``, ``depth``, ``profile_index``.

   **Legacy fallback roles**: only used when no ``processing_method`` covers the
   relevant derived variable; do not need ``processing_role`` if ``processing_method``
   entries name their inputs explicitly:
   ``temperature``, ``conductivity``, ``salinity``, ``profile_direction``,
   ``oxygen_concentration``.

   ..
       !! processed by numpydoc !!

.. py:function:: _resolve_role(ds, varnames, role)

   Resolve a processing role to its variable name and verify it exists in *ds*.

   :Parameters:

       **ds** : xarray.Dataset
           Dataset to check against.

       **varnames** : dict
           Role → variable-name mapping from :func:`_get_varnames`.

       **role** : str
           The role to look up (e.g. ``'latitude'``, ``'depth'``).

   :Returns:

       str
           The variable name in *ds* that corresponds to *role*.

   :Raises:

       ValueError
           If the resolved name is not present in *ds*. The message names the role,
           the resolved variable name, and how to fix the YAML.

   ..
       !! processed by numpydoc !!

.. py:data:: BUILTIN_METHODS

.. py:function:: _dispatch_processing_methods(ds, ncvar)

   Compute all variables in *ncvar* that carry a ``processing_method`` entry.

   Variables are processed in YAML order, so later entries can depend on earlier ones
   (e.g. ``POTENTIAL_TEMPERATURE`` uses ``PSAL``, which must be computed first).

   Methods already handled by explicit utility calls (``depth_from_pressure``,
   ``find_profiles``) are not recomputed, but any YAML attributes (e.g. ``vocabulary``,
   ``long_name``) are applied to the variable if it already exists in *ds*. Callers
   should therefore invoke the explicit utility functions before calling this dispatcher.

   :Parameters:

       **ds** : xarray.Dataset
           Dataset containing measured variables.

       **ncvar** : dict
           Mapping of variable name → YAML attributes (``netcdf_variables``).

   :Returns:

       **ds** : xarray.Dataset
           Dataset with dispatched derived variables added.

   ..
       !! processed by numpydoc !!

.. py:function:: _load_dataset(filename)

   Open a netCDF dataset and normalise the primary dimension to ``time``.

   Files saved with a custom ``output_dimension`` (e.g. ``N_MEASUREMENTS`` for OG 1.0)
   carry their time coordinate under a different dimension name. This function detects
   that case via ``standard_name: time`` on the coordinate variable and renames both the
   dimension and (if necessary) the coordinate back to ``time`` so the rest of the
   processing pipeline can use a single consistent name.

   :Parameters:

       **filename** : str or Path
           Path to the netCDF file.

   :Returns:

       **ds** : xarray.Dataset
           Dataset with ``time`` as the primary dimension and coordinate.

   ..
       !! processed by numpydoc !!

.. py:function:: _save_dataset(ds, filename, deployment, **kwargs)

   Write a dataset to netCDF, renaming the time dimension when required.

   The deployment YAML may specify ``output_dimension`` (e.g. ``N_MEASUREMENTS`` for
   OG 1.0). If set, the ``time`` dimension is renamed to ``output_dimension`` before
   writing so the file conforms to the target convention.
   Internal processing always works with ``time``; this rename only affects what is
   written to disk.

   :Parameters:

       **ds** : xarray.Dataset
           Dataset whose primary dimension is ``time``.

       **filename** : str or Path
           Output file path.

       **deployment** : dict
           Deployment metadata loaded from the YAML file. The optional key
           ``output_dimension`` controls the dimension name written to the file
           (default: ``'time'``).

       **\*\*kwargs**
           Passed through to :func:`xarray.Dataset.to_netcdf`.

   ..
       !! processed by numpydoc !!

.. py:function:: flag_conductivity_in_depth_space(ts0, d_profile=50, dz=5, clean_stdev=3, accuracy=None)

   Flag conductivity as QC1 (good) or QC4 (bad) using profile bins and depth bins.

   Conductivity data are grouped into profile bins of width `d_profile` and depth bins
   of width `dz`. Within each depth bin, points farther than 5 standard deviations from
   the mean are temporarily excluded. A second mean and standard deviation are then
   computed from the remaining points, and values farther than `clean_stdev` standard
   deviations from that cleaned mean are flagged as QC4. If `accuracy` is provided,
   deviations smaller than `accuracy` are not flagged.

   :Parameters:

       **ts0** : xarray.Dataset
           Timeseries dataset containing conductivity, depth, and profile_index.

       **d_profile** : float, optional
           Width of the profile bins.

       **dz** : float, optional
           Width of the depth bins in meters.

       **clean_stdev** : float, optional
           Number of standard deviations used in the second-pass flagging step.

       **accuracy** : float or None, optional
           Sensor accuracy threshold. Deviations smaller than this are not flagged.

   :Returns:

       **qc** : np.ndarray
           Array of QC flags with the same shape as ts0.conductivity. Good data are
           flagged as 1, bad data as 4.

   ..
       !! processed by numpydoc !!

.. py:function:: interpolate_over_salinity_NANs(ds)

   Function applied to the dataset before finding the internal temperature.
   Function interpolates temperature over bad data and small data gaps to prevent
   errors from affecting the neighbouring cells.

   :Parameters:

       **ds** : DataArray
           Timeseries of mission data.

   :Returns:

       **interp** : DataArray
           Timeseries of interpolated temperature.

   ..
       !! processed by numpydoc !!

.. py:function:: apply_thermal_lag(ds, fn, alpha, tau, interpolate_filter=None)

   Function from Garau et al. (2011): estimates temperature inside the conductivity
   cell, then recalculates salinity.

   :Parameters:

       **ds** : DataArray
           Timeseries of mission data.

       **fn** : float
           Sampling frequency of the sensor.

       **alpha** : float
           Thermal lag strength constant for the sensor.

       **tau** : float
           Thermal lag time constant for the sensor.

       **interpolate_filter** : callable or None, optional
           Function applied to the dataset before finding the internal temperature.
           Function interpolates over bad data and small data gaps to prevent errors
           from affecting the neighbouring cells.

   :Returns:

       **sal** : DataArray
           Timeseries of salinity_adjusted calculated using the internal temperature
           of the conductivity cell.

   ..
       !! processed by numpydoc !!

.. py:function:: flag_CTD_data(ts0, clean_stdev=3, accuracy=None)

   Wrapper function to flag CTD data.

   Uses `flag_conductivity_in_depth_space` to flag conductivity as QC1 (good) or QC4
   (bad) in profile-depth space. Conductivity and salinity are then flagged as QC4
   wherever conductivity is flagged as QC4. Creates `conductivity_QC`, `salinity_QC`,
   and `temperature_QC` if they do not already exist.

   :Parameters:

       **ts0** : xarray.Dataset
           Timeseries of mission data.

       **clean_stdev** : float, optional
           Number of standard deviations from the cleaned mean for data to be flagged
           as QC4.

   :Returns:

       **ts** : xarray.Dataset
           Timeseries of mission data with `conductivity_QC`, `salinity_QC`, and
           `temperature_QC`.

   ..
       !! processed by numpydoc !!

.. py:function:: adjust_CTD(ts, deploymentyaml, alpha=None, tau=None, dTdC=None, interpolate_filter=None)

   Pulls correction constants from `deploymentyaml`.
   If `alpha`, `tau`, or `dTdC` differ from the values in the YAML file, the values
   provided as function arguments are used and a warning is issued.

   Applies conductivity–temperature lag correction and thermal lag correction when the
   corresponding constants are not `None` or 0. This produces the variables
   `temperature_adjusted` and `salinity_adjusted`. The variables
   `potential_density_adjusted` and `potential_temperature_adjusted` are derived from
   the adjusted temperature and salinity.

   :Parameters:

       **ts** : xarray.Dataset
           Time series of mission data.

       **deploymentyaml** : str or list
           Path to a YAML file containing deployment information for the glider.
           If a list is provided, YAML files are read in order, and top-level keys in
           later files overwrite those in earlier files.

       **alpha** : float, optional
           Thermal lag correction parameter alpha. Default is None.

       **tau** : float, optional
           Thermal lag correction parameter tau. Default is None.

       **dTdC** : float, optional
           Time lag (seconds) between temperature and conductivity sensors.
           Default is None.

       **interpolate_filter** : callable or None, optional
           Function applied to the dataset before finding the internal temperature.
           Function interpolates over bad data and small data gaps to prevent errors
           from affecting the neighbouring cells. Default is None.

   :Returns:

       **ts** : xarray.Dataset
           Time series dataset with the additional variables `temperature_adjusted`,
           `salinity_adjusted`, `potential_density_adjusted`, and
           `potential_temperature_adjusted`. Metadata are updated to reflect applied
           corrections.

   ..
       !! processed by numpydoc !!

.. py:function:: maskQC4(ds)

   Optional: masks QC4 samples in data variables (sets them to NaN) so gridding ignores
   them. Only QC1 (good) data are gridded.

   :Parameters:

       **ds** : DataArray
           Timeseries of data.

   :Returns:

       **ds** : DataArray
           Timeseries of data with QC4 samples masked.

   ..
       !! processed by numpydoc !!
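
The masking described for ``maskQC4`` above can be illustrated with a minimal, dependency-free sketch. The helper name ``mask_qc4_sketch``, the sample values, and the use of plain Python lists are assumptions made for illustration only; pyglider itself operates on xarray objects, and this is not its implementation.

```python
import math


def mask_qc4_sketch(values, qc_flags):
    """Return a copy of *values* with QC4 (bad) samples replaced by NaN.

    Hypothetical helper for illustration only; not part of pyglider.
    """
    if len(values) != len(qc_flags):
        raise ValueError("values and qc_flags must have the same length")
    # Keep each sample unless its flag is 4 (bad); flagged samples become NaN.
    return [math.nan if flag == 4 else value
            for value, flag in zip(values, qc_flags)]


# Made-up conductivity samples; the third one carries a QC4 flag.
conductivity = [3.41, 3.42, 9.99, 3.43]
conductivity_qc = [1, 1, 4, 1]
masked = mask_qc4_sketch(conductivity, conductivity_qc)
```

The input list is left unchanged, and any NaN-aware averaging applied afterwards would skip the masked sample rather than letting it contaminate a gridded value.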