clouddrift.adapters.gdp.gdpsource


Functions

to_raggedarray([tmp_path, skip_download, ...])

Convert GDP source data into a ragged array format and return it as an xarray Dataset.

clouddrift.adapters.gdp.gdpsource.to_raggedarray(tmp_path: str = '/tmp/clouddrift/gdpsource', skip_download: bool = False, max: int | None = None, chunk_size: int = 100000, use_fill_values: bool = True, max_chunks: int | None = None) -> Dataset

Convert GDP source data into a ragged array format and return it as an xarray Dataset.

This function processes drifter data from the NOAA GDP (Global Drifter Program) source, organizes it into a ragged array format, and returns the resulting dataset. It supports downloading, filtering, and parallel processing of the data.

Args:
    tmp_path (str): Path to the temporary directory for storing downloaded files.
        Defaults to _TMP_PATH ('/tmp/clouddrift/gdpsource').
    skip_download (bool): If True, skips downloading the data and assumes it is
        already available in tmp_path. Defaults to False.
    max (int | None): Maximum number of requests to process; intended for testing.
        If None, processes all requests. Defaults to None.
    chunk_size (int): Number of observations to process in each chunk.
        Defaults to 100,000.
    use_fill_values (bool): Whether to use fill values for missing data.
        Defaults to True.
    max_chunks (int | None): Maximum number of chunks to process. If None,
        processes all chunks. Defaults to None.
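A call with a deliberately small workload might look like the sketch below. The parameter names and defaults come from the signature above; the two helper functions, small_run_kwargs and load_small_sample, are hypothetical and not part of clouddrift. It assumes that setting max=1 and max_chunks=1 keeps the run short, which this page implies but does not quantify.

```python
def small_run_kwargs(tmp_path="/tmp/clouddrift/gdpsource"):
    """Keyword arguments for a small test run (hypothetical helper)."""
    return {
        "tmp_path": tmp_path,      # where downloaded files are stored
        "skip_download": False,    # set to True if the files are already in tmp_path
        "max": 1,                  # process a single request
        "chunk_size": 100_000,     # the documented default
        "use_fill_values": True,   # the documented default
        "max_chunks": 1,           # process a single chunk
    }


def load_small_sample():
    """Call to_raggedarray with the small-run arguments (hypothetical helper)."""
    # Imported lazily so that defining this helper does not require clouddrift.
    from clouddrift.adapters.gdp.gdpsource import to_raggedarray

    return to_raggedarray(**small_run_kwargs())
```

Calling load_small_sample() requires clouddrift to be installed and, with skip_download=False, network access to the GDP source.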

Returns:
    xr.Dataset: An xarray Dataset containing the processed GDP drifter data in a
        ragged array format. The dataset includes both observation and trajectory
        metadata variables, with appropriate attributes added.
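To make the ragged array layout concrete, here is a minimal pure-Python sketch of the idea: trajectories of different lengths are concatenated into one flat observation array, and a per-trajectory row count records where each one begins and ends. The names id, start_date, lon, and rowsize are illustrative assumptions; this page does not list the variables of the returned dataset.

```python
from itertools import accumulate


def to_ragged(trajectories):
    """Flatten variable-length trajectories into one array plus row sizes."""
    # Order trajectories by start date, as the returned dataset is described to be.
    ordered = sorted(trajectories, key=lambda t: t["start_date"])
    ids = [t["id"] for t in ordered]            # one value per trajectory
    rowsize = [len(t["lon"]) for t in ordered]  # observations per trajectory
    lon = [x for t in ordered for x in t["lon"]]  # one value per observation
    return ids, rowsize, lon


def trajectory_slice(rowsize, i):
    """Slice that selects the observations of trajectory i in the flat array."""
    ends = list(accumulate(rowsize))
    return slice(ends[i] - rowsize[i], ends[i])


# Two made-up trajectories of different lengths.
trajectories = [
    {"id": 2, "start_date": "2001-05-01", "lon": [10.0, 10.1, 10.2]},
    {"id": 1, "start_date": "2000-01-01", "lon": [-30.0, -29.9]},
]
ids, rowsize, lon = to_ragged(trajectories)
second = lon[trajectory_slice(rowsize, 1)]  # observations of the later-starting drifter
```

Storing one flat array avoids padding short trajectories to the length of the longest one.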

Raises:
    Any exceptions raised during file operations, data processing, or async tasks
        will propagate to the caller.

Notes:
  • The function performs parallel processing of drifter data using asyncio.

  • The resulting dataset is sorted by the start date of each drifter.

  • Metadata attributes for variables are added based on predefined mappings.
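The chunked, asyncio-based processing mentioned in the notes can be pictured with the generic sketch below. It is not clouddrift's implementation: split_into_chunks, process_chunk, and process_all are hypothetical names, and the per-chunk work is a placeholder that only counts items. It shows how chunk_size and max_chunks could bound the work before the chunks are awaited together with asyncio.gather.

```python
import asyncio


def split_into_chunks(items, chunk_size):
    """Split a list into consecutive chunks of at most chunk_size items."""
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]


async def process_chunk(chunk):
    """Placeholder for real per-chunk work; here it just counts the items."""
    await asyncio.sleep(0)  # yield control so other chunk tasks can run
    return len(chunk)


async def process_all(items, chunk_size, max_chunks=None):
    """Process chunks concurrently, optionally limiting how many are handled."""
    chunks = split_into_chunks(items, chunk_size)
    if max_chunks is not None:
        chunks = chunks[:max_chunks]
    # gather returns results in the same order as the chunks were passed in.
    return await asyncio.gather(*(process_chunk(c) for c in chunks))


counts = asyncio.run(process_all(list(range(10)), chunk_size=4))
limited = asyncio.run(process_all(list(range(10)), chunk_size=4, max_chunks=2))
```

Because asyncio.gather re-raises the first exception from any task by default, a sketch like this is consistent with the Raises section above: errors propagate to the caller.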