clouddrift.adapters.gdp.gdpsource.to_raggedarray
- clouddrift.adapters.gdp.gdpsource.to_raggedarray(tmp_path: str = '/tmp/clouddrift/gdpsource', skip_download: bool = False, max: int | None = None, chunk_size: int = 100000, use_fill_values: bool = True, max_chunks: int | None = None) -> Dataset
Convert GDP source data into a ragged array format and return it as an xarray Dataset.
This function processes drifter data from the NOAA GDP (Global Drifter Program) source, organizes it into a ragged array format, and returns the resulting dataset. It supports downloading, filtering, and parallel processing of the data.
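As a rough illustration of what "ragged array format" means here, the sketch below stores trajectories of different lengths in one flat observation list plus a per-trajectory row count. The variable names (`rowsize`, `lon`, `ids`) and the sample values are assumptions made for this example; the actual layout and variable names of the returned Dataset should be checked against the library.

```python
from itertools import accumulate

# Hypothetical per-drifter observations (values are made up for illustration).
trajectories = {
    101: [10.0, 10.5, 11.0],
    102: [-3.0],
    103: [42.0, 42.2],
}

# Trajectory-level variables: one value per drifter.
ids = list(trajectories)
rowsize = [len(obs) for obs in trajectories.values()]

# Observation-level variable: all trajectories concatenated end to end.
lon = [value for obs in trajectories.values() for value in obs]

# Start offset of each trajectory within the flat observation list.
offsets = [0, *accumulate(rowsize)]


def trajectory_slice(i):
    """Return the observations belonging to the i-th trajectory."""
    return lon[offsets[i]:offsets[i + 1]]


print(trajectory_slice(0))  # observations of drifter 101
```

Storing only the row counts avoids padding short trajectories to the length of the longest one.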
- Args:
- tmp_path (str): Path to the temporary directory for storing downloaded files.
Defaults to _TMP_PATH ('/tmp/clouddrift/gdpsource').
- skip_download (bool): If True, skips downloading the data and assumes it is
already available in tmp_path. Defaults to False.
- max (int | None): Maximum number of requests to process; intended for testing.
If None, processes all requests. Defaults to None.
- chunk_size (int): Number of observations to process in each chunk. Defaults to 100,000.
- use_fill_values (bool): Whether to use fill values for missing data. Defaults to True.
- max_chunks (int | None): Maximum number of chunks to process. If None, processes all
chunks. Defaults to None.
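The arguments above can be combined to keep a trial run small. The following is a minimal sketch: the keyword names come from the documented signature, while the chosen values and the helper name `build_trial_dataset` are assumptions for illustration only. The helper is defined but not called, since calling it would download data.

```python
# Keyword arguments taken from the documented signature; the values are
# arbitrary choices for a quick trial run, not recommended settings.
trial_kwargs = {
    "tmp_path": "/tmp/clouddrift/gdpsource",
    "skip_download": False,
    "max": 2,              # process only a couple of requests
    "chunk_size": 10_000,  # smaller chunks than the 100,000 default
    "use_fill_values": True,
    "max_chunks": 1,       # stop after the first chunk
}


def build_trial_dataset():
    """Run the adapter on a small subset (hypothetical helper, not called here)."""
    # Imported lazily so this snippet loads even without clouddrift installed.
    from clouddrift.adapters.gdp import gdpsource

    return gdpsource.to_raggedarray(**trial_kwargs)
```

Once the files are present in tmp_path, setting `skip_download` to True should avoid fetching them again, per the parameter description.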
- Returns:
- xr.Dataset: An xarray Dataset containing the processed GDP drifter data in a
ragged array format. The dataset includes both observation and trajectory metadata variables, with appropriate attributes added.
- Raises:
Any exceptions raised during file operations, data processing, or async tasks will propagate to the caller.
- Notes:
The function performs parallel processing of drifter data using asyncio.
The resulting dataset is sorted by the start date of each drifter.
Metadata attributes for variables are added based on predefined mappings.
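The first two notes can be pictured with a small standalone sketch: tasks are run concurrently with asyncio, and the results are then ordered by start date. This is not the adapter's implementation; `process_drifter`, `process_all`, and the sample data are hypothetical stand-ins.

```python
import asyncio
from datetime import date


# Hypothetical stand-in for per-drifter work; the real adapter's tasks differ.
async def process_drifter(drifter_id, start):
    await asyncio.sleep(0)  # yield control, as an I/O-bound task would
    return {"id": drifter_id, "start_date": start}


async def process_all(drifters):
    # Run all tasks concurrently; an exception in any task propagates here.
    results = await asyncio.gather(
        *(process_drifter(i, d) for i, d in drifters)
    )
    # Order the results by each drifter's start date.
    return sorted(results, key=lambda r: r["start_date"])


drifters = [
    (3, date(2021, 5, 1)),
    (1, date(2019, 1, 15)),
    (2, date(2020, 8, 30)),
]
ordered = asyncio.run(process_all(drifters))
print([r["id"] for r in ordered])  # prints [1, 2, 3]
```

Because `asyncio.gather` re-raises the first exception from its tasks by default, errors reach the caller, which matches the behavior described under Raises.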