clouddrift.adapters.gdp.gdp1h

This module provides functions and metadata that can be used to convert the hourly Global Drifter Program (GDP) data to a clouddrift.RaggedArray instance.
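As background, the ragged-array layout stores trajectories of unequal length in one flat array together with the length of each row. The sketch below illustrates that idea in plain Python. It is not clouddrift's implementation; the drifter IDs, the values, and the names `rowsize`, `offsets`, and `row` are illustrative assumptions.

```python
from itertools import accumulate

# Three hypothetical drifter trajectories of different lengths (made-up values).
trajectories = {
    101: [10.0, 10.5, 11.0],
    102: [-3.2, -3.1],
    103: [45.0, 45.2, 45.4, 45.6],
}

# Ragged layout: one flat observation list plus the length of each row.
ids = list(trajectories)
rowsize = [len(trajectories[i]) for i in ids]
flat = [value for i in ids for value in trajectories[i]]

# Row start offsets follow from the cumulative sum of the row sizes.
offsets = [0, *accumulate(rowsize)]


def row(k):
    """Return the observations of the k-th trajectory from the flat list."""
    return flat[offsets[k]:offsets[k + 1]]


print(rowsize)  # [3, 2, 4]
print(row(1))   # [-3.2, -3.1]
```

Storing the data this way avoids padding short trajectories to the length of the longest one.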

Functions

download([url, tmp_path, drifter_ids, ...])

Download individual NetCDF files from the AOML server.

preprocess(index, **kwargs)

Extract and preprocess the Lagrangian data and attributes.

to_raggedarray([drifter_ids, n_random_id, ...])

Download and process individual GDP hourly files and return a RaggedArray instance with the data.

clouddrift.adapters.gdp.gdp1h.download(url: str = 'https://www.aoml.noaa.gov/ftp/pub/phod/buoydata/hourly_product/v2.01', tmp_path: str = '/tmp/clouddrift/gdp', drifter_ids: list[int] | None = None, n_random_id: int | None = None)

Download individual NetCDF files from the AOML server.

Parameters

url : str

URL from which to download the data.

tmp_path : str

Path to the directory where the individual NetCDF files are stored.

drifter_ids : list[int], optional

List of drifter IDs to retrieve (default: all).

n_random_id : int, optional

Randomly select n_random_id drifter IDs to download (default: None).

Returns

out : list

List of retrieved drifters
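The way drifter_ids and n_random_id narrow down a download can be sketched without touching the network. The helper below is hypothetical: `select_ids`, its `available_ids` and `seed` arguments, and the made-up catalogue are assumptions for illustration, and the order in which the real function applies the two filters is not stated here.

```python
import random


def select_ids(available_ids, drifter_ids=None, n_random_id=None, seed=None):
    """Hypothetical helper: decide which drifter IDs to fetch."""
    # Default: every available ID; otherwise only the requested IDs that exist.
    if drifter_ids is None:
        chosen = list(available_ids)
    else:
        wanted = set(drifter_ids)
        chosen = [i for i in available_ids if i in wanted]
    # Optionally reduce the selection to a random subset of at most n_random_id IDs.
    if n_random_id is not None:
        rng = random.Random(seed)
        chosen = rng.sample(chosen, min(n_random_id, len(chosen)))
    return chosen


available = [44136, 54680, 83463, 90001, 90002]  # made-up catalogue of IDs
print(select_ids(available, drifter_ids=[54680, 83463]))  # [54680, 83463]
print(len(select_ids(available, n_random_id=3, seed=0)))  # 3
```

Passing a seed makes the random subset reproducible, which can be convenient in tests.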

clouddrift.adapters.gdp.gdp1h.preprocess(index: int, **kwargs) -> Dataset

Extract and preprocess the Lagrangian data and attributes.

This function takes a drifter identification number, which can be used to build a file or URL pattern or to select data from a DataFrame. It then preprocesses the data and returns a clean xarray Dataset.

Parameters

index : int

Drifter's identification number

Returns

ds : xr.Dataset

xarray Dataset containing the data and attributes

clouddrift.adapters.gdp.gdp1h.to_raggedarray(drifter_ids: list[int] | None = None, n_random_id: int | None = None, url: str = 'https://www.aoml.noaa.gov/ftp/pub/phod/buoydata/hourly_product/v2.01', tmp_path: str | None = None) -> RaggedArray

Download and process individual GDP hourly files and return a RaggedArray instance with the data.

Parameters

drifter_ids : list[int], optional

List of drifter IDs to retrieve (default: all).

n_random_id : int, optional

Randomly select n_random_id drifter NetCDF files (default: None).

url : str

URL from which to download the data (default: GDP_DATA_URL). Alternatively, it can be GDP_DATA_URL_EXPERIMENTAL.

tmp_path : str, optional

Path to the directory where the individual NetCDF files are stored (the default varies by operating system; /tmp/clouddrift/gdp on Linux).

Returns

out : RaggedArray

A RaggedArray instance of the requested dataset
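An operating-system-dependent default for tmp_path could be derived from the platform's temporary directory. This is a minimal sketch of that idea, not necessarily how clouddrift computes its default; the helper name `default_tmp_path` is hypothetical.

```python
import os
import tempfile


def default_tmp_path():
    """Hypothetical helper: a per-OS scratch directory for GDP files."""
    # tempfile.gettempdir() returns the platform's temporary directory,
    # which is typically /tmp on Linux.
    return os.path.join(tempfile.gettempdir(), "clouddrift", "gdp")


path = default_tmp_path()
print(path)
```

Passing an explicit tmp_path instead keeps the downloaded files in a location of your choosing.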

Examples

Invoke to_raggedarray without any arguments to download all drifter data from the 2.01 GDP feed:

>>> from clouddrift.adapters.gdp.gdp1h import to_raggedarray
>>> ra = to_raggedarray()

To download a random sample of 100 drifters, for example for development or testing, use the n_random_id argument:

>>> ra = to_raggedarray(n_random_id=100)

To download a specific list of drifters, use the drifter_ids argument:

>>> ra = to_raggedarray(drifter_ids=[44136, 54680, 83463])

To download the experimental 2.01 GDP feed, use the url argument to specify the experimental feed URL:

>>> from clouddrift.adapters.gdp.gdp1h import GDP_DATA_URL_EXPERIMENTAL, to_raggedarray
>>> ra = to_raggedarray(url=GDP_DATA_URL_EXPERIMENTAL)

Finally, to_raggedarray returns a RaggedArray instance, which provides a convenience method to emit an xarray.Dataset instance:

>>> ds = ra.to_xarray()

To write the ragged array dataset to a NetCDF file on disk, call to_netcdf on the dataset:

>>> ds.to_netcdf("gdp1h.nc", format="NETCDF4")

Alternatively, to write the ragged array to a Parquet file, first convert it to an Awkward Array:

>>> import awkward as ak
>>> arr = ra.to_awkward()
>>> ak.to_parquet(arr, "gdp1h.parquet")