pyglider.seaexplorer
====================

.. py:module:: pyglider.seaexplorer

.. autoapi-nested-parse::

   SeaExplorer-specific processing routines.

   .. !! processed by numpydoc !!


Functions
---------

.. autoapisummary::

   pyglider.seaexplorer.raw_to_rawnc
   pyglider.seaexplorer.merge_parquet
   pyglider.seaexplorer.raw_to_timeseries


Module Contents
---------------

.. py:function:: raw_to_rawnc(indir, outdir, deploymentyaml, incremental=True, min_samples_in_file=5, dropna_subset=None, dropna_thresh=1)

   Convert seaexplorer text files to raw parquet pandas files.

   :Parameters:

       **indir** : str
           Directory where the raw files are kept. Recommend naming this
           directory ``raw``.

       **outdir** : str
           Directory to write the matching ``*.nc`` files. Recommend ``rawnc``.

       **deploymentyaml** : str
           YAML text file with deployment information for this glider.

       **incremental** : bool, optional
           If *True* (default), only netcdf files that are older than the
           binary files are re-parsed.

       **min_samples_in_file** : int
           Minimum number of samples in a raw file to trigger writing a
           netcdf file. Defaults to 5.

       **dropna_subset** : list of strings, default None
           If more values than *dropna_thresh* of the variables listed here
           are empty (NaN), then drop this line of data. Useful for raw
           payload files that are heavily oversampled. Get the variable names
           from the raw text file. See `pandas.DataFrame.dropna`.

       **dropna_thresh** : integer, default 1
           Number of variables listed in *dropna_subset* that can be empty
           before the line is dropped.

   :Returns:

       **status** : bool
           *True* on success.

   .. rubric:: Notes

   This process can be slow for many files.

   For the *dropna* functionality, list one variable for each of the sensors
   that is *not* over-sampled. For instance, we had an AROD, GPCTD, and FLBBCD
   and the AROD was grossly oversampled, whereas the other two were not, but
   were not sampled synchronously.
   In that case we chose
   ``dropna_subset=['GPCTD_TEMPERATURE', 'FLBBCD_CHL_COUNT']`` to keep all
   rows where either of these were good, and dropped all other rows.

   .. !! processed by numpydoc !!

.. py:function:: merge_parquet(indir, outdir, deploymentyaml, incremental=False, kind='raw')

   Merge all the raw netcdf files in indir. These are meant to be the raw
   flight and science files from the slocum.

   :Parameters:

       **indir** : str
           Directory where the raw ``*.ebd.nc`` and ``*.dbd.nc`` files are.
           Recommend: ``./rawnc``

       **outdir** : str
           Directory where merged raw netcdf files will be put. Recommend:
           ``./rawnc/``. Note that the netcdf files will be named following
           the data in *deploymentyaml*:
           ``glider_nameglider_serial-YYYYmmddTHHMM-rawebd.nc`` and
           ``...rawdbd.nc``.

       **deploymentyaml** : str
           YAML text file with deployment information for this glider.

       **incremental** : bool
           Only add new files....

   .. !! processed by numpydoc !!

.. py:function:: raw_to_timeseries(indir, outdir, deploymentyaml, kind='raw', profile_filt_time=100, profile_min_time=300, maxgap=10, interpolate=False, start_time=None, fnamesuffix='', deadreckon=False, replace_attrs=None)

   Convert raw seaexplorer data to a timeseries netcdf file.

   :Parameters:

       **indir** : str
           Directory where the raw files are kept.

       **outdir** : str
           Directory to write the matching ``*.nc`` files.

       **deploymentyaml** : str
           YAML text file with deployment information for this glider.

       **kind** : 'raw' or 'sub'
           The type of data to process. 'raw' is the full resolution data,
           'sub' is the sub-sampled data. The default is 'raw'. Note that
           realtime data is typically sub-sampled.

       **profile_filt_time** : float
           Time in seconds to use for filtering the profiles. Default is 100.

       **profile_min_time** : float
           Minimum time in seconds for a profile to be considered a valid
           profile. Default is 300.

       **maxgap** : float
           Maximum gap in seconds to interpolate over. Default is 10.

       **interpolate** : bool
           If *True*, interpolate the data to fill in gaps.
           Default is *False*.

       **start_time** : str or None
           Drop data before this date; sometimes there are bad times.
           Default is *None*.

       **fnamesuffix** : str
           Suffix to add to the output file name. Default is ''.

       **deadreckon** : bool
           If *True*, use the dead reckoning latitude and longitude data from
           the glider. Default is *False*, in which case latitude and
           longitude are linearly interpolated between surface fixes; this is
           recommended to avoid unphysical underwater jumps.

       **replace_attrs** : dict or None
           Replace global attributes in the metadata after reading the
           metadata file in. Helpful when processing runs differ in only a
           couple of attributes.

   :Returns:

       **outname** : str
           Name of the output netcdf file.

   .. !! processed by numpydoc !!
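
The row selection described in the Notes for ``raw_to_rawnc`` (keep a row when either ``GPCTD_TEMPERATURE`` or ``FLBBCD_CHL_COUNT`` holds a value) can be illustrated with a small plain-Python sketch. This is not the pyglider implementation, which the docstring says follows `pandas.DataFrame.dropna`. The helper ``keep_row``, its ``min_valid`` argument, the ``AROD_FT_DO`` column name, and all sample values are hypothetical and exist only for illustration.

```python
import math


def keep_row(row, subset, min_valid=1):
    """Return True if at least ``min_valid`` of the ``subset`` values are not NaN.

    Hypothetical helper: it mimics the "keep rows where either sensor is good"
    rule from the Notes, not pyglider's actual code.
    """
    valid = sum(
        1 for name in subset if not math.isnan(row.get(name, math.nan))
    )
    return valid >= min_valid


# Made-up payload rows: the oversampled sensor reports on every line,
# while the two other sensors report only occasionally and not together.
rows = [
    {"AROD_FT_DO": 210.0, "GPCTD_TEMPERATURE": 8.1, "FLBBCD_CHL_COUNT": math.nan},
    {"AROD_FT_DO": 211.0, "GPCTD_TEMPERATURE": math.nan, "FLBBCD_CHL_COUNT": math.nan},
    {"AROD_FT_DO": 212.0, "GPCTD_TEMPERATURE": math.nan, "FLBBCD_CHL_COUNT": 55.0},
]

subset = ["GPCTD_TEMPERATURE", "FLBBCD_CHL_COUNT"]

# Keep only rows where at least one of the listed variables has a value.
kept = [row for row in rows if keep_row(row, subset)]
```

Here the middle row is dropped because only the oversampled sensor reported on that line; the first and last rows survive because one of the two listed variables is present.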