`pyglider.seaexplorer`

SeaExplorer-specific processing routines.

Functions

raw_to_rawnc(): Convert seaexplorer text files to raw parquet pandas files.
merge_parquet(): Merge all the raw netcdf files in indir. These are meant to be the raw flight and science files from the slocum.
raw_to_timeseries(): Convert raw seaexplorer data to a timeseries netcdf file.

pyglider.seaexplorer.raw_to_rawnc(indir, outdir, deploymentyaml, incremental=True, min_samples_in_file=5, dropna_subset=None, dropna_thresh=1)

Convert seaexplorer text files to raw parquet pandas files.

Parameters:

indir (str): Directory where the raw files are kept. Recommend naming this directory “raw”.
outdir (str): Directory to write the matching *.nc files. Recommend naming this directory “rawnc”.
deploymentyaml (str): YAML text file with deployment information for this glider.
incremental (bool, optional): If True (default), only netcdf files that are older than the binary files are re-parsed.
min_samples_in_file (int): Minimum number of samples in a raw file to trigger writing a netcdf file. Defaults to 5.
dropna_subset (list of strings, default None): If more values than dropna_thresh of the variables listed here are empty (NaN), then drop this line of data. Useful for raw payload files that are heavily oversampled. Get the variable names from the raw text file. See pandas.DataFrame.dropna.
dropna_thresh (integer, default 1): Number of variables listed in dropna_subset that can be empty before the line is dropped.

Returns:

status (bool): True on success.
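The `incremental` behaviour described above amounts to a file-age comparison. The following is a minimal sketch of that idea using only the standard library; it is not pyglider's internal code, and the helper name `needs_reparse` and the file names are hypothetical.

```python
import os
import tempfile
from pathlib import Path


def needs_reparse(raw_file: Path, out_file: Path) -> bool:
    """Return True if out_file is missing or older than raw_file."""
    if not out_file.exists():
        return True
    return out_file.stat().st_mtime < raw_file.stat().st_mtime


# Small demonstration with throwaway files and explicit modification times.
with tempfile.TemporaryDirectory() as tmp:
    raw = Path(tmp) / "example.raw"  # hypothetical file names
    out = Path(tmp) / "example.out"
    raw.write_text("raw data")

    # No output file exists yet, so the raw file must be parsed.
    missing_output = needs_reparse(raw, out)

    out.write_text("parsed data")
    os.utime(raw, (1_000, 1_000))  # raw file last modified at t = 1000 s
    os.utime(out, (2_000, 2_000))  # output written later, so it is current
    output_is_current = not needs_reparse(raw, out)

    os.utime(raw, (3_000, 3_000))  # raw file updated after the output
    stale_output = needs_reparse(raw, out)
```

With `incremental=False`, the equivalent of this check would simply be skipped and every file re-parsed.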

Notes

This process can be slow for many files.

For the dropna functionality, list one variable for each of the sensors that is not over-sampled. For instance, we had an AROD, GPCTD, and FLBBCD; the AROD was grossly oversampled, whereas the other two were not, but they were not sampled synchronously. In that case we chose dropna_subset=['GPCTD_TEMPERATURE', 'FLBBCD_CHL_COUNT'] to keep all rows where either of these was good, and dropped all other rows.
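The row rule described for `dropna_subset` and `dropna_thresh` can be illustrated in plain Python. This is only a sketch of the rule as stated in the parameter descriptions (drop a row when more than `dropna_thresh` of the listed variables are NaN), not the library's implementation; the helper `keep_row`, the column name `AROD_FT_DO`, and the sample values are invented for illustration.

```python
import math


def keep_row(row, dropna_subset, dropna_thresh=1):
    """Keep a row unless more than dropna_thresh of the listed variables are NaN."""
    n_empty = sum(1 for name in dropna_subset if math.isnan(row[name]))
    return n_empty <= dropna_thresh


nan = math.nan
# Hypothetical payload rows: the oxygen sensor reports on every line,
# while the CTD and the fluorometer each report only on some lines.
rows = [
    {"GPCTD_TEMPERATURE": 10.1, "FLBBCD_CHL_COUNT": nan, "AROD_FT_DO": 200.0},
    {"GPCTD_TEMPERATURE": nan, "FLBBCD_CHL_COUNT": nan, "AROD_FT_DO": 201.0},
    {"GPCTD_TEMPERATURE": nan, "FLBBCD_CHL_COUNT": 55.0, "AROD_FT_DO": 202.0},
    {"GPCTD_TEMPERATURE": nan, "FLBBCD_CHL_COUNT": nan, "AROD_FT_DO": 203.0},
]

subset = ["GPCTD_TEMPERATURE", "FLBBCD_CHL_COUNT"]
kept_indices = [i for i, row in enumerate(rows) if keep_row(row, subset)]
print(kept_indices)
```

Rows that carry only the oversampled oxygen value are dropped, while any row with a temperature or a chlorophyll count survives.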

pyglider.seaexplorer.merge_parquet(indir, outdir, deploymentyaml, incremental=False, kind='raw')

Merge all the raw netcdf files in indir. These are meant to be the raw flight and science files from the slocum.

Parameters:

indir (str): Directory where the raw *.ebd.nc and *.dbd.nc files are. Recommend: ./rawnc
outdir (str): Directory where merged raw netcdf files will be put. Recommend: ./rawnc/. Note that the netcdf files will be named following the data in deploymentyaml: glider_nameglider_serial-YYYYmmddTHHMM-rawebd.nc and ...rawdbd.nc.
deploymentyaml (str): YAML text file with deployment information for this glider.
incremental (bool): Only add new files….

pyglider.seaexplorer.raw_to_timeseries(indir, outdir, deploymentyaml, kind='raw', profile_filt_time=100, profile_min_time=300, maxgap=10, interpolate=False, start_time=None, fnamesuffix='', deadreckon=False, replace_attrs=None)

Convert raw seaexplorer data to a timeseries netcdf file.

Parameters:

indir (str): Directory where the raw files are kept.
outdir (str): Directory to write the matching *.nc files.
deploymentyaml (str): YAML text file with deployment information for this glider.
kind ('raw' or 'sub'): The type of data to process. 'raw' is the full-resolution data, 'sub' is the sub-sampled data. The default is 'raw'. Note that realtime data is typically sub-sampled.
profile_filt_time (float): Time in seconds to use for filtering the profiles. Default is 100.
profile_min_time (float): Minimum time in seconds for a profile to be considered a valid profile. Default is 300.
maxgap (float): Maximum gap in seconds to interpolate over. Default is 10.
interpolate (bool): If True, interpolate the data to fill in gaps. Default is False.
start_time (str or None): Drop data from before this date; sometimes there are bad times. Default is None.
fnamesuffix (str): Suffix to add to the output file name. Default is ''.
deadreckon (bool): If True, use the dead-reckoned latitude and longitude data from the glider. If False (the default, and recommended to avoid unphysical underwater jumps), latitude and longitude are linearly interpolated between surface fixes.
replace_attrs (dict or None): Replace global attributes in the metadata after reading the metadata file in. Helpful when processing runs with only a couple of things that change.
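To make the interplay of `interpolate` and `maxgap` concrete, here is a small pure-Python sketch of gap-limited linear interpolation. It is an illustration under stated assumptions, not the routine pyglider uses: the helper `interpolate_small_gaps` is hypothetical, and a "gap" is assumed to mean the time between the two good samples that bracket a run of NaNs.

```python
import math


def interpolate_small_gaps(times, values, maxgap=10.0):
    """Linearly fill NaN runs whose bracketing good samples are at most maxgap seconds apart."""
    filled = list(values)
    good = [i for i, v in enumerate(values) if not math.isnan(v)]
    for left, right in zip(good, good[1:]):
        if right - left < 2:
            continue  # adjacent good samples, nothing to fill
        gap = times[right] - times[left]
        if gap > maxgap:
            continue  # gap too long: leave the NaNs in place
        for i in range(left + 1, right):
            frac = (times[i] - times[left]) / gap
            filled[i] = values[left] + frac * (values[right] - values[left])
    return filled


nan = math.nan
times = [0.0, 2.0, 4.0, 6.0, 30.0, 32.0]     # seconds
values = [10.0, nan, nan, 13.0, nan, 14.0]   # e.g. a temperature channel

filled = interpolate_small_gaps(times, values, maxgap=10.0)
print(filled)
```

The 6-second gap between the first and fourth samples is filled, while the 26-second gap before the last sample is left as NaN.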

Returns:

outname (str): Name of the output netcdf file.
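The three functions on this page can plausibly be chained into one processing run. The sketch below uses only the signatures listed above; the order of the calls, the wrapper name `process_deployment`, and the directory layout are assumptions inferred from the recommended directory names, not something this page states.

```python
def process_deployment(rawdir, rawncdir, l0tsdir, deploymentyaml, kind="raw"):
    """Run the three SeaExplorer steps in sequence and return the output file name."""
    # Imported lazily so this helper can be defined without pyglider installed.
    import pyglider.seaexplorer as seaexplorer

    # 1. Parse the glider's text files into per-file raw output.
    seaexplorer.raw_to_rawnc(rawdir, rawncdir, deploymentyaml)
    # 2. Merge the per-file output (assumed to be written back into rawncdir).
    seaexplorer.merge_parquet(rawncdir, rawncdir, deploymentyaml, kind=kind)
    # 3. Build the timeseries netcdf file from the merged data.
    outname = seaexplorer.raw_to_timeseries(
        rawncdir, l0tsdir, deploymentyaml, kind=kind
    )
    return outname
```

A call might look like `process_deployment("./raw", "./rawnc", "./L0-timeseries", "deployment.yml", kind="sub")`, where all four paths are placeholders for a real deployment.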
