pyglider.seaexplorer

SeaExplorer-specific processing routines.

Functions

pyglider.seaexplorer.raw_to_rawnc(indir, outdir, deploymentyaml, incremental=True, min_samples_in_file=5, dropna_subset=None, dropna_thresh=1)

Convert seaexplorer text files to raw parquet pandas files.

Parameters:
indir : str

Directory where the raw files are kept. Recommend naming this directory “raw”.

outdir : str

Directory to write the matching *.nc files. Recommend rawnc.

deploymentyaml : str

YAML text file with deployment information for this glider.

incremental : bool, optional

If True (default), a raw file is re-parsed only when its existing netcdf output is older than the raw file.

min_samples_in_file : int, optional

Minimum number of samples in a raw file to trigger writing a netcdf file. Defaults to 5.

dropna_subset : list of strings, default None

If more values than dropna_thresh of the variables listed here are empty (NaN), then drop this line of data. Useful for raw payload files that are heavily oversampled. Get the variable names from the raw text file. See pandas.DataFrame.dropna.

dropna_thresh : integer, default 1

Number of variables listed in dropna_subset that can be empty before the line is dropped.

Returns:
status : bool

True on success.

Notes

This process can be slow for many files.
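With incremental=True, raw files whose output is already up to date are skipped on repeated runs. The sketch below shows one way such a staleness test could look, assuming file modification times are compared. The helper name needs_reparse is hypothetical, and this is not necessarily how pyglider implements the check.

```python
import os


def needs_reparse(raw_path, out_path):
    """Return True if out_path is missing or older than raw_path."""
    if not os.path.exists(out_path):
        # No output yet, so the raw file has never been parsed.
        return True
    # Re-parse only when the raw file changed after the output was written.
    return os.path.getmtime(out_path) < os.path.getmtime(raw_path)
```

Passing incremental=False would correspond to ignoring this test and re-parsing every file.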

For the dropna functionality, list one variable for each of the sensors that is not over-sampled. For instance, we had an AROD, GPCTD, and FLBBCD. The AROD was grossly oversampled, whereas the other two were not, but they were not sampled synchronously. In that case we chose dropna_subset=['GPCTD_TEMPERATURE', 'FLBBCD_CHL_COUNT'] to keep all rows where either of these was good, and dropped all other rows.
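The row-dropping rule can be illustrated in plain Python. This is a minimal sketch of the rule as documented (a row is dropped when more than dropna_thresh of the listed variables are empty), not pyglider's implementation. The helper names and the AROD_FT_DO column are assumptions made for the example.

```python
import math


def is_empty(value):
    """Treat None and NaN as empty."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def drop_sparse_rows(rows, dropna_subset, dropna_thresh=1):
    """Keep rows in which at most dropna_thresh of the listed variables are empty."""
    kept = []
    for row in rows:
        n_empty = sum(is_empty(row.get(name)) for name in dropna_subset)
        if n_empty <= dropna_thresh:
            kept.append(row)
    return kept


nan = float("nan")
rows = [
    # GPCTD sample present: kept
    {"AROD_FT_DO": 210.0, "GPCTD_TEMPERATURE": 8.1, "FLBBCD_CHL_COUNT": nan},
    # FLBBCD sample present: kept
    {"AROD_FT_DO": 211.0, "GPCTD_TEMPERATURE": nan, "FLBBCD_CHL_COUNT": 52.0},
    # only the oversampled AROD reported: dropped
    {"AROD_FT_DO": 212.0, "GPCTD_TEMPERATURE": nan, "FLBBCD_CHL_COUNT": nan},
]
kept = drop_sparse_rows(rows, ["GPCTD_TEMPERATURE", "FLBBCD_CHL_COUNT"])
print(len(kept))  # 2
```

The oversampled sensor is deliberately left out of dropna_subset, so a row that carries only its data does not count as good.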

pyglider.seaexplorer.merge_parquet(indir, outdir, deploymentyaml, incremental=False, kind='raw')

Merge all the raw netcdf files in indir. These are meant to be the raw flight and science files from the slocum.

Parameters:
indir : str

Directory where the raw *.ebd.nc and *.dbd.nc files are. Recommend: ./rawnc

outdir : str

Directory where merged raw netcdf files will be put. Recommend: ./rawnc/. Note that the netcdf files will be named following the data in deploymentyaml: glider_nameglider_serial-YYYYmmddTHHMM-rawebd.nc and ...rawdbd.nc.

deploymentyaml : str

YAML text file with deployment information for this glider.

incremental : bool

Only add new files….

pyglider.seaexplorer.raw_to_timeseries(indir, outdir, deploymentyaml, kind='raw', profile_filt_time=100, profile_min_time=300, maxgap=10, interpolate=False, start_time=None, fnamesuffix='', deadreckon=False, replace_attrs=None)

Convert raw seaexplorer data to a timeseries netcdf file.

Parameters:
indir : str

Directory where the raw files are kept.

outdir : str

Directory to write the matching *.nc files.

deploymentyaml : str

YAML text file with deployment information for this glider.

kind : 'raw' or 'sub'

The type of data to process. 'raw' is the full-resolution data, 'sub' is the sub-sampled data. The default is 'raw'. Note that realtime data is typically sub-sampled.

profile_filt_time : float

Time in seconds to use for filtering the profiles. Default is 100.

profile_min_time : float

Minimum time in seconds for a profile to be considered a valid profile. Default is 300.

maxgap : float

Maximum gap in seconds to interpolate over. Default is 10.

interpolate : bool

If True, interpolate the data to fill in gaps. Default is False.

start_time : str or None

Drop data recorded before this date; the raw files sometimes contain bad timestamps. Default is None.

fnamesuffix : str

Suffix to add to the output file name. Default is '' (empty string).

deadreckon : bool

If True, use the dead-reckoned latitude and longitude from the glider. If False (the default, and recommended), latitude and longitude are linearly interpolated between surface fixes, which avoids unphysical underwater jumps.

replace_attrs : dict or None

Replace global attributes in the metadata after the metadata file has been read. Helpful for processing runs in which only a couple of attributes change.

Returns:
outname : str

Name of the output netcdf file.
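The three functions documented here are typically chained: parse the raw files, merge the per-file output, then build the timeseries. The following is a minimal sketch of that sequence, assuming the merged files are written back into the same intermediate directory. The wrapper name process_deployment, its seaexplorer argument, and the directory and file names in the usage note are hypothetical. Only the three pyglider calls and their arguments come from the signatures above.

```python
def process_deployment(rawdir, rawncdir, tsdir, deploymentyaml,
                       kind="raw", seaexplorer=None):
    """Run the three documented steps in order; return the timeseries file name."""
    if seaexplorer is None:
        # Imported lazily so the sketch can be read without pyglider installed.
        import pyglider.seaexplorer as seaexplorer

    # 1. Parse each raw text file into a per-file intermediate file.
    seaexplorer.raw_to_rawnc(rawdir, rawncdir, deploymentyaml)
    # 2. Merge the per-file output (assumed here to stay in rawncdir).
    seaexplorer.merge_parquet(rawncdir, rawncdir, deploymentyaml, kind=kind)
    # 3. Build the timeseries netcdf file from the merged data.
    return seaexplorer.raw_to_timeseries(rawncdir, tsdir, deploymentyaml, kind=kind)
```

A call might look like process_deployment("./raw", "./rawnc", "./L0-timeseries", "./deployment.yml", kind="sub") for realtime data. The same kind is passed to both merge_parquet and raw_to_timeseries so that the second reads what the first wrote.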