clouddrift.raggedarray.RaggedArray#

class clouddrift.raggedarray.RaggedArray(coords: dict, metadata: dict, data: dict, attrs_global: dict = {}, attrs_variables: dict = {}, name_dims: dict[str, Literal['rows', 'obs'] | str] = {}, coord_dims: dict[str, str] = {}, var_dims: dict[str, list[str]] = {})[source]#

Bases: object

__init__(coords: dict, metadata: dict, data: dict, attrs_global: dict = {}, attrs_variables: dict = {}, name_dims: dict[str, Literal['rows', 'obs'] | str] = {}, coord_dims: dict[str, str] = {}, var_dims: dict[str, list[str]] = {})[source]#

Methods

`__init__`(coords, metadata, data[, ...])
`allocate`(preprocess_func, indices, rowsize, ...)	Iterate through the files and fill the ragged array with the coordinates and the selected metadata and data variables.
`attributes`(ds, name_coords, name_meta, name_data)	Return global attributes and the attributes of all variables (name_coords, name_meta, and name_data) from an Xarray Dataset.
`from_awkward`(array, name_coords, name_dims, ...)	Load a RaggedArray instance from an Awkward Array.
`from_files`(indices, preprocess_func, name_coords)	Generate a ragged array archive from a list of files.
`from_netcdf`(filename[, rows_dim_name, ...])	Read a ragged array archive from a NetCDF file.
`from_parquet`(filename, name_coords, ...)	Read a ragged array from a parquet file.
`from_xarray`(ds[, rows_dim_name, obs_dim_name])	Populate a RaggedArray instance from an xarray Dataset instance.
`number_of_observations`(rowsize_func, ...)	Iterate through the files and evaluate the number of observations.
`to_awkward`()	Convert ragged array object to an Awkward Array.
`to_netcdf`(filename)	Export ragged array object to a NetCDF file.
`to_parquet`(filename)	Export ragged array object to a parquet file.
`to_xarray`()	Convert ragged array object to an xarray Dataset.
`validate_attributes`()	Validate that each variable has an assigned attribute tag.

static allocate(preprocess_func: Callable[[int], Dataset], indices: list, rowsize: list | ndarray | DataArray, name_coords: list, name_meta: list, name_data: list, name_dims: dict[str, Literal['rows', 'obs'] | str], **kwargs) → tuple[dict, dict, dict, dict, dict][source]#

Iterate through the files and fill the ragged array with the coordinates and the selected metadata and data variables.

Parameters#

preprocess_func: Callable[[int], xr.Dataset]: Returns a processed xarray Dataset from an identification number.
indices: list: List of indices separating rows in the ragged array.
rowsize: list: List of the number of observations per row.
name_coords: list: Name of the coordinate variables to include in the archive.
name_meta: list, optional: Name of the metadata variables to include in the archive (Defaults to []).
name_data: list, optional: Name of the data variables to include in the archive (Defaults to []).
name_dims: dict[str, DimNames]: Dimension alias mapped to the name used by clouddrift.

Returns#

Tuple[dict, dict, dict, dict, dict]: Dictionaries containing numerical data and attributes of coordinates, metadata and data variables.
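
The fill step can be pictured with a minimal pure-Python sketch. This is not clouddrift's implementation: `fake_preprocess` is a hypothetical stand-in for `preprocess_func` that returns a plain dict rather than an xarray Dataset, `allocate_flat` is an illustrative name, and only a single variable (`temp`) is handled. The point it illustrates is that `rowsize` fixes where each row lands in one preallocated flat buffer.

```python
from itertools import accumulate


def fake_preprocess(index: int) -> dict:
    """Hypothetical stand-in for preprocess_func: one 'temp' series per index."""
    return {"temp": [float(index)] * (index + 1)}


def allocate_flat(preprocess, indices, rowsize, name):
    """Preallocate one flat buffer and copy each row into its own slice."""
    # Row boundaries in the flat buffer: 0, r0, r0 + r1, ...
    offsets = [0, *accumulate(rowsize)]
    flat = [None] * offsets[-1]
    for i, index in enumerate(indices):
        values = preprocess(index)[name]
        start, stop = offsets[i], offsets[i + 1]
        if len(values) != stop - start:
            raise ValueError(f"row {index}: expected {stop - start} observations")
        flat[start:stop] = values
    return flat


indices = [0, 1, 2]
rowsize = [1, 2, 3]
flat_temp = allocate_flat(fake_preprocess, indices, rowsize, "temp")
print(flat_temp)  # [0.0, 1.0, 1.0, 2.0, 2.0, 2.0]
```

Because the buffer size is known up front from `rowsize`, each row is written exactly once and no intermediate concatenation is needed.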

static attributes(ds: Dataset, name_coords: list, name_meta: list, name_data: list) → tuple[dict, dict][source]#

Return global attributes and the attributes of all variables (name_coords, name_meta, and name_data) from an Xarray Dataset.

Parameters#

ds: xr.Dataset: Xarray Dataset from which to read the attributes
name_coords: list, optional: Name of the coordinate variables to include in the archive (default is [])
name_meta: list, optional: Name of the metadata variables to include in the archive (default is [])
name_data: list, optional: Name of the data variables to include in the archive (default is [])

Returns#

Tuple[dict, dict]: The global attributes and the variable attributes

classmethod from_awkward(array: Array, name_coords: list, name_dims: dict[str, Literal['rows', 'obs'] | str], coord_dims: dict[str, str])[source]#

Load a RaggedArray instance from an Awkward Array.

Parameters#

array: ak.Array: Awkward Array instance to load the data from
name_coords: list, optional: Names of the coordinate variables in the ragged arrays
name_dims: dict: Map an alias to a dimension.
coord_dims: dict: Map a coordinate to a dimension alias.

Returns#

RaggedArray: A RaggedArray instance
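
The two mappings compose: `coord_dims` takes a coordinate to a dimension alias, and `name_dims` takes that alias to the name clouddrift uses. The sketch below shows that lookup in plain Python. The alias `traj`, the coordinate names `id` and `time`, and the helper `resolve_dims` are all hypothetical, chosen only for illustration; the sketch does not call clouddrift.

```python
# Hypothetical aliases: the dataset calls its row dimension "traj".
name_dims = {"traj": "rows", "obs": "obs"}
# Each coordinate is attached to one dimension alias.
coord_dims = {"id": "traj", "time": "obs"}


def resolve_dims(name_dims: dict, coord_dims: dict) -> dict:
    """Return the clouddrift dimension name for each coordinate."""
    resolved = {}
    for coord, alias in coord_dims.items():
        if alias not in name_dims:
            raise KeyError(f"coordinate {coord!r} uses unknown alias {alias!r}")
        resolved[coord] = name_dims[alias]
    return resolved


canonical = resolve_dims(name_dims, coord_dims)
print(canonical)  # {'id': 'rows', 'time': 'obs'}
```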

classmethod from_files(indices: list[int], preprocess_func: Callable[[int], Dataset], name_coords: list, name_meta: list = [], name_data: list = [], name_dims: dict[str, Literal['rows', 'obs'] | str] = {}, rowsize_func: Callable[[int], int] | None = None, attrs_global: dict | None = None, attrs_variables: dict | None = None, **kwargs)[source]#

Generate a ragged array archive from a list of files.

Parameters#

indices: list: List of identification numbers to iterate over
preprocess_func: Callable[[int], xr.Dataset]: Returns a processed xarray Dataset from an identification number
name_coords: list: Name of the coordinate variables to include in the archive
name_meta: list, optional: Name of the metadata variables to include in the archive (Defaults to [])
name_data: list, optional: Name of the data variables to include in the archive (Defaults to [])
name_dims: dict: Map an alias to a dimension.
rowsize_func: Optional[Callable[[int], int]], optional: Returns the number of observations from an identification number (to speed up processing) (Defaults to None)

Returns#

RaggedArray: A RaggedArray instance

classmethod from_netcdf(filename: str, rows_dim_name='rows', obs_dim_name='obs')[source]#

Read a ragged array archive from a NetCDF file.

This is a thin wrapper around from_xarray().

Parameters#

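filename: str: File name of the NetCDF archive to read.
rows_dim_name: str, optional: Name of the row dimension in the NetCDF file (default is 'rows')
obs_dim_name: str, optional: Name of the observations dimension in the NetCDF file (default is 'obs')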

Returns#

RaggedArray: A ragged array instance

classmethod from_parquet(filename: str, name_coords: list, name_dims: dict[str, Literal['rows', 'obs'] | str], coord_dims: dict[str, str])[source]#

Read a ragged array from a parquet file.

Parameters#

filename: str: File name of the parquet archive to read.
name_coords: list, optional: Names of the coordinate variables in the ragged arrays
name_dims: dict: Map an alias to a dimension.
coord_dims: dict: Map a coordinate to a dimension alias.

Returns#

RaggedArray: A ragged array instance

classmethod from_xarray(ds: Dataset, rows_dim_name: str = 'rows', obs_dim_name: str = 'obs')[source]#

Populate a RaggedArray instance from an xarray Dataset instance.

Parameters#

ds: xr.Dataset: Xarray Dataset from which to load the RaggedArray
rows_dim_name: str, optional: Name of the row dimension in the xarray Dataset
obs_dim_name: str, optional: Name of the observations dimension in the xarray Dataset

Returns#

RaggedArray: A RaggedArray instance
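
To make the two dimensions concrete, here is a small sketch of the ragged layout itself, written without xarray or clouddrift. It assumes the usual contiguous ragged convention: a per-row count (called `rowsize` here) lives on the rows dimension, and observation variables are concatenated along the obs dimension. The sample values and the helper `split_rows` are hypothetical.

```python
from itertools import accumulate

# Hypothetical ragged layout: two rows holding 2 and 3 observations.
rowsize = [2, 3]                      # one value per row ("rows" dimension)
lon = [10.0, 10.5, 20.0, 20.5, 21.0]  # one value per observation ("obs" dimension)


def split_rows(flat: list, rowsize: list) -> list:
    """Cut a flat observation-wise list into one sub-list per row."""
    if sum(rowsize) != len(flat):
        raise ValueError("rowsize must sum to the number of observations")
    stops = list(accumulate(rowsize))
    starts = [0, *stops[:-1]]
    return [flat[a:b] for a, b in zip(starts, stops)]


rows = split_rows(lon, rowsize)
print(rows)  # [[10.0, 10.5], [20.0, 20.5, 21.0]]
```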

static number_of_observations(rowsize_func: Callable[[int], int], indices: list, **kwargs) → ndarray[source]#

Iterate through the files and evaluate the number of observations.

Parameters#

rowsize_func: Callable[[int], int]: Function that returns the number of observations of a row from its identification number
indices: list: List of identification numbers to iterate over

Returns#

np.ndarray: Number of observations
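
A minimal sketch of this contract, assuming `rowsize_func` is simply called once per identification number. The lookup table `fake_sizes` and both function names are hypothetical, the sketch returns a plain list where the real method returns an `np.ndarray`, and forwarding `**kwargs` to `rowsize_func` is an assumption of this sketch.

```python
def count_observations(rowsize_func, indices, **kwargs):
    """Call rowsize_func once per identification number and collect the counts."""
    return [rowsize_func(index, **kwargs) for index in indices]


# Hypothetical lookup table standing in for per-file observation counts.
fake_sizes = {101: 4, 102: 0, 103: 7}


def fake_rowsize(index: int, **kwargs) -> int:
    """Hypothetical rowsize_func: look the count up instead of opening a file."""
    return fake_sizes[index]


counts = count_observations(fake_rowsize, [101, 102, 103])
print(counts)  # [4, 0, 7]
```

A cheap `rowsize_func` like this avoids running the full preprocessing on every file just to learn how many observations it holds.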

to_awkward()[source]#

Convert ragged array object to an Awkward Array.

Returns#

ak.Array: Awkward Array containing the ragged array and its attributes

to_netcdf(filename: str)[source]#

Export ragged array object to a NetCDF file.

Parameters#

filename: str: Name of the NetCDF file to create.

to_parquet(filename: str)[source]#

Export ragged array object to a parquet file.

Parameters#

filename: str: Name of the parquet file to create.

to_xarray()[source]#

Convert ragged array object to an xarray Dataset.

Parameters#

cast_to_float32: bool, optional: Cast all float64 variables to float32 (default is True). This option aims at minimizing the size of the xarray dataset.

Returns#

xr.Dataset: Xarray Dataset containing the ragged arrays and their attributes

validate_attributes()[source]#: Validate that each variable has an assigned attribute tag.
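
The check can be sketched as a lookup of every variable name in the attribute dictionary. This is an illustration only: the helper `missing_attributes` and the sample variables are hypothetical, and what the real method does when an entry is missing is not stated here.

```python
def missing_attributes(coords: dict, metadata: dict, data: dict, attrs_variables: dict) -> list:
    """Return the names of variables that have no entry in attrs_variables."""
    names = [*coords, *metadata, *data]
    return [name for name in names if name not in attrs_variables]


# Hypothetical contents of a small ragged array.
coords = {"id": [1, 2], "time": [0, 1, 2]}
metadata = {"rowsize": [1, 2]}
data = {"temp": [12.1, 12.3, 12.2]}
attrs_variables = {
    "id": {"long_name": "identifier"},
    "time": {"units": "s"},
    "temp": {"units": "degC"},
}

missing = missing_attributes(coords, metadata, data, attrs_variables)
print(missing)  # ['rowsize']
```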
