clouddrift.adapters.gdp.gdp6h
This module provides functions and metadata that can be used to convert the
6-hourly Global Drifter Program (GDP) data to a clouddrift.RaggedArray
instance.
Functions
- download: Download individual NetCDF files from the AOML server.
- preprocess: Extract and preprocess the Lagrangian data and attributes.
- to_raggedarray: Download and process individual GDP 6-hourly files and return a RaggedArray instance with the data.
- clouddrift.adapters.gdp.gdp6h.download(url: str = 'https://www.aoml.noaa.gov/ftp/pub/phod/buoydata/6h', tmp_path: str = '/tmp/clouddrift/gdp6h', drifter_ids: list[int] | None = None, n_random_id: int | None = None, skip_download: bool = False)
Download individual NetCDF files from the AOML server.
Parameters
- url: str, optional
URL from which to download the data (default: GDP_DATA_URL). Alternatively, it can be GDP_DATA_URL_EXPERIMENTAL.
- tmp_path: str, optional
Path to the directory where the individual NetCDF files are stored (the default varies by operating system; /tmp/clouddrift/gdp6h on Linux).
- drifter_ids: list[int], optional
List of drifters to retrieve (default: all).
- n_random_id: int, optional
Randomly select n_random_id drifter IDs to download (default: None).
- skip_download: bool, optional
If True, make no network requests: discover drifter IDs by scanning tmp_path for existing drifter_6h_*.nc files and use locally cached dirfl_*.dat metadata files. Default is False.
Returns
- out: list
List of retrieved drifters.
Raises
- ValueError
If no matching drifter files are found for the requested selection.
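The local discovery described for skip_download=True can be pictured with a short, standard-library sketch. This is not clouddrift's implementation: the helper name discover_local_drifter_ids is hypothetical, and it assumes that the part of the file name matched by the * in drifter_6h_*.nc is the integer drifter ID. It scans a directory for matching files, ignores everything else, and raises ValueError when nothing matches.

```python
import re
import tempfile
from pathlib import Path

# Assumption: file names look like drifter_6h_<integer id>.nc
_ID_PATTERN = re.compile(r"drifter_6h_(\d+)\.nc")


def discover_local_drifter_ids(tmp_path: str) -> list:
    """Hypothetical helper: return sorted drifter IDs found in tmp_path."""
    ids = []
    for path in Path(tmp_path).glob("drifter_6h_*.nc"):
        match = _ID_PATTERN.fullmatch(path.name)
        if match:
            ids.append(int(match.group(1)))
    if not ids:
        # Mirrors the documented behaviour of raising when nothing matches.
        raise ValueError(f"No drifter files found in {tmp_path}")
    return sorted(ids)


# Demonstrate on a throwaway directory holding empty placeholder files.
with tempfile.TemporaryDirectory() as cache_dir:
    for drifter_id in (126934, 54375, 114956):
        (Path(cache_dir) / f"drifter_6h_{drifter_id}.nc").touch()
    # A placeholder metadata file; it does not match the NetCDF pattern.
    (Path(cache_dir) / "dirfl_example.dat").touch()
    local_ids = discover_local_drifter_ids(cache_dir)

print(local_ids)
```

Sorting the result keeps the order independent of how the file system lists the directory.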
- clouddrift.adapters.gdp.gdp6h.preprocess(index: int, **kwargs) -> Dataset
Extract and preprocess the Lagrangian data and attributes.
This function takes an identification number that can be used to create a file or URL pattern or to select data from a DataFrame. It then preprocesses the data and returns a clean xarray Dataset.
Parameters
- index: int
Drifter's identification number.
Returns
- ds: xr.Dataset
xarray Dataset containing the data and attributes.
- clouddrift.adapters.gdp.gdp6h.to_raggedarray(drifter_ids: list[int] | None = None, n_random_id: int | None = None, tmp_path: str = '/tmp/clouddrift/gdp6h', skip_download: bool = False) -> RaggedArray
Download and process individual GDP 6-hourly files and return a RaggedArray instance with the data.
Parameters
- drifter_ids: list[int], optional
List of drifters to retrieve (default: all).
- n_random_id: int, optional
Randomly select n_random_id drifter NetCDF files.
- tmp_path: str, optional
Path to the directory where the individual NetCDF files are stored (the default varies by operating system; /tmp/clouddrift/gdp6h on Linux).
- skip_download: bool, optional
If True, make no network requests: discover drifter IDs by scanning tmp_path for existing drifter_6h_*.nc files and use locally cached dirfl_*.dat metadata files. Default is False.
Returns
- out: RaggedArray
A RaggedArray instance of the requested dataset.
Raises
- ValueError
If no matching drifter files are found for the requested selection.
Examples
Invoke to_raggedarray without any arguments to download all drifter data from the 6-hourly GDP feed:
>>> from clouddrift.adapters.gdp.gdp6h import to_raggedarray
>>> ra = to_raggedarray()
To download a random sample of 100 drifters, for example for development or testing, use the n_random_id argument:
>>> ra = to_raggedarray(n_random_id=100)
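As a rough picture of what selecting a random subset of drifter IDs involves, here is a standard-library sketch. It is not how clouddrift does it: the helper pick_random_ids and its seed parameter are hypothetical, and capping the sample size at the number of available IDs is an assumption made only for this illustration.

```python
import random
from typing import Optional


def pick_random_ids(available_ids: list, n_random_id: int,
                    seed: Optional[int] = None) -> list:
    """Hypothetical helper: pick distinct IDs without replacement."""
    rng = random.Random(seed)  # a fixed seed makes the choice repeatable
    # Assumption: never ask for more IDs than are available.
    n = min(n_random_id, len(available_ids))
    return sorted(rng.sample(available_ids, n))


# Made-up ID range, used only to exercise the helper.
available = list(range(1000, 1500))
subset = pick_random_ids(available, n_random_id=100, seed=0)

print(len(subset))
```

Passing a seed is useful in tests, where the same subset should be drawn on every run.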
To download a specific list of drifters, use the drifter_ids argument:
>>> ra = to_raggedarray(drifter_ids=[54375, 114956, 126934])
The function to_raggedarray returns a RaggedArray instance, which provides a convenience method to produce an xarray.Dataset instance for analysis:
>>> ds = ra.to_xarray()
To write the ragged array dataset to a NetCDF file or a Zarr file on disk, you can use the to_netcdf or to_zarr method of the xarray.Dataset instance:
>>> ds.to_netcdf("gdp6h.nc")
>>> ds.to_zarr("gdp6h.zarr", mode="w")
To write the ragged array dataset to a Parquet file, you can directly use the to_parquet method of the RaggedArray instance:
>>> ra.to_parquet("gdp6h.parquet")