pyglider.seaexplorer
====================

.. py:module:: pyglider.seaexplorer

.. autoapi-nested-parse::

   SeaExplorer-specific processing routines.

   ..
       !! processed by numpydoc !!


Functions
---------

.. autoapisummary::

   pyglider.seaexplorer.raw_to_rawnc
   pyglider.seaexplorer.merge_parquet
   pyglider.seaexplorer.raw_to_timeseries


Module Contents
---------------

.. py:function:: raw_to_rawnc(indir, outdir, deploymentyaml, incremental=True, min_samples_in_file=5, dropna_subset=None, dropna_thresh=1)

   
   Convert seaexplorer text files to raw parquet pandas files.


   :Parameters:

       **indir** : str
           Directory where the raw files are kept.  Recommend naming this
           directory ``raw``.

       **outdir** : str
           Directory to write the matching ``*.nc`` files. Recommend ``rawnc``.

       **deploymentyaml** : str
           YAML text file with deployment information for this glider.

       **incremental** : bool, optional
           If *True* (default), only netcdf files that are older than the
           binary files are re-parsed.

       **min_samples_in_file** : int
           Minimum number of samples in a raw file to trigger writing a netcdf
           file. Defaults to 5.

       **dropna_subset** : list of strings, default None
           If more than *dropna_thresh* of the variables listed here are
           empty (NaN), drop this line of data.  Useful for raw payload files
           that are heavily oversampled.  Get the variable names from the raw text
           file.  See `pandas.DataFrame.dropna`.

       **dropna_thresh** : integer, default 1
           Number of variables listed in dropna_subset that can be empty before
           the line is dropped.


   :Returns:

       **status** : bool
           *True* on success.


   .. rubric:: Notes

   This process can be slow for many files.

   For the *dropna* functionality, list one variable for each of the sensors
   that is *not* over-sampled.  For instance, we had an AROD, GPCTD, and
   FLBBCD.  The AROD was grossly oversampled, whereas the other two were not,
   but were not sampled synchronously.  In that case we chose
   ``dropna_subset=['GPCTD_TEMPERATURE', 'FLBBCD_CHL_COUNT']`` to keep all
   rows where either of these was good, and dropped all other rows.


   ..
       !! processed by numpydoc !!
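
The row-dropping rule described for ``dropna_subset`` and ``dropna_thresh`` can be illustrated with a small pure-Python sketch. This is not the library's implementation; it only mirrors the documented rule that a row is dropped when more than ``dropna_thresh`` of the listed variables are empty. The helper names ``is_missing`` and ``drop_sparse_rows``, the column ``AROD_VALUE``, and the sample values are hypothetical.

```python
import math


def is_missing(value):
    """Treat None and float NaN as empty (assumption for this sketch)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def drop_sparse_rows(rows, dropna_subset, dropna_thresh=1):
    """Keep rows where at most `dropna_thresh` of the subset columns are empty."""
    kept = []
    for row in rows:
        # Count how many of the listed variables are empty in this row.
        n_empty = sum(is_missing(row.get(name)) for name in dropna_subset)
        if n_empty <= dropna_thresh:
            kept.append(row)
    return kept


nan = float("nan")
# Hypothetical payload rows: the AROD column is always populated (oversampled),
# while the GPCTD and FLBBCD columns are only filled on some rows.
rows = [
    {"AROD_VALUE": 210.1, "GPCTD_TEMPERATURE": 10.2, "FLBBCD_CHL_COUNT": nan},
    {"AROD_VALUE": 210.3, "GPCTD_TEMPERATURE": nan, "FLBBCD_CHL_COUNT": nan},
    {"AROD_VALUE": 210.2, "GPCTD_TEMPERATURE": nan, "FLBBCD_CHL_COUNT": 85.0},
]

kept = drop_sparse_rows(
    rows, ["GPCTD_TEMPERATURE", "FLBBCD_CHL_COUNT"], dropna_thresh=1
)
```

With ``dropna_thresh=1`` the middle row is removed because both listed variables are empty, while rows with at least one good GPCTD or FLBBCD value are kept.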

.. py:function:: merge_parquet(indir, outdir, deploymentyaml, incremental=False, kind='raw')

   
   Merge all the raw netcdf files in *indir*.  These are meant to be
   the raw flight and science files from the glider.


   :Parameters:

       **indir** : str
           Directory where the raw ``*.ebd.nc`` and ``*.dbd.nc`` files are.
           Recommend: ``./rawnc``

       **outdir** : str
           Directory where merged raw netcdf files will be put. Recommend:
           ``./rawnc/``.  Note that the netcdf files will be named following
           the data in *deploymentyaml*:
           ``glider_nameglider_serial-YYYYmmddTHHMM-rawebd.nc`` and
           ``...rawdbd.nc``.

       **deploymentyaml** : str
           YAML text file with deployment information for this glider.

       **incremental** : bool
           Only add new files....


   ..
       !! processed by numpydoc !!

.. py:function:: raw_to_timeseries(indir, outdir, deploymentyaml, kind='raw', profile_filt_time=100, profile_min_time=300, maxgap=10, interpolate=False, start_time=None, fnamesuffix='', deadreckon=False, replace_attrs=None)

   
   Convert raw seaexplorer data to a timeseries netcdf file.


   :Parameters:

       **indir** : str
           Directory where the raw files are kept.

       **outdir** : str
           Directory to write the matching ``*.nc`` files.

       **deploymentyaml** : str
           YAML text file with deployment information for this glider.

       **kind** : 'raw' or 'sub'
           The type of data to process.  'raw' is the full resolution data, 'sub'
           is the sub-sampled data.  The default is 'raw'.  Note that realtime data is
           typically sub-sampled.

       **profile_filt_time** : float
           Time in seconds to use for filtering the profiles.  Default is 100.

       **profile_min_time** : float
           Minimum time in seconds for a profile to be considered a valid profile.
           Default is 300.

       **maxgap** : float
           Maximum gap in seconds to interpolate over.  Default is 10.

       **interpolate** : bool
           If *True*, interpolate the data to fill in gaps.  Default is False.

       **start_time** : str or None
           Drop data before this date; sometimes there are bad times.  Default is *None*.

       **fnamesuffix** : str
           Suffix to add to the output file name.  Default is ''.

       **deadreckon** : bool
           If *True*, use the dead reckoning latitude and longitude data from the glider.
           If *False* (the default), latitude and longitude are linearly interpolated
           between surface fixes.  *False* is recommended to avoid unphysical underwater jumps.

       **replace_attrs** : dict or None
           Global attributes to replace in the metadata after the metadata
           file is read.  Helpful when processing runs in which only a couple
           of things change.


   :Returns:

       **outname** : str
           Name of the output netcdf file.


   ..
       !! processed by numpydoc !!
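
The three functions above are typically chained. The following is a minimal sketch of one plausible call order, using only the signatures documented on this page. The wrapper ``process_deployment`` is hypothetical, as are the directory and file names in the usage note, and the order of the steps is an assumption. The module is passed in as an argument so the sketch can be exercised without the package installed.

```python
def process_deployment(seaexplorer, rawdir, rawncdir, outdir, deploymentyaml):
    """Run the three documented steps in sequence (assumed order).

    `seaexplorer` is expected to be the `pyglider.seaexplorer` module, or any
    object exposing functions with the same names and signatures.
    """
    # Step 1: convert the raw text files into per-file raw output.
    seaexplorer.raw_to_rawnc(rawdir, rawncdir, deploymentyaml)

    # Step 2: merge the per-file output; here the merged files are written
    # back into the same directory (an assumption for this sketch).
    seaexplorer.merge_parquet(rawncdir, rawncdir, deploymentyaml, kind="raw")

    # Step 3: build the timeseries file and return its name.
    return seaexplorer.raw_to_timeseries(
        rawncdir, outdir, deploymentyaml, kind="raw", deadreckon=False
    )
```

A call might look like ``process_deployment(pyglider.seaexplorer, "raw", "rawnc", "timeseries", "deployment.yml")`` after ``import pyglider.seaexplorer``, where the output directory and YAML file name are placeholders.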