clouddrift.raggedarray.RaggedArray#
- class clouddrift.raggedarray.RaggedArray(coords: dict, metadata: dict, data: dict, attrs_global: dict = {}, attrs_variables: dict = {}, name_dims: dict[str, Literal['rows', 'obs']] = {}, coord_dims: dict[str, str] = {})[source]#
Bases: object
- __init__(coords: dict, metadata: dict, data: dict, attrs_global: dict = {}, attrs_variables: dict = {}, name_dims: dict[str, Literal['rows', 'obs']] = {}, coord_dims: dict[str, str] = {})[source]#
Methods
- __init__(coords, metadata, data[, ...])
- allocate(preprocess_func, indices, rowsize, ...): Iterate through the files and fill the ragged array for the coordinate variables and the selected metadata and data variables.
- attributes(ds, name_coords, name_meta, name_data): Return global attributes and the attributes of all variables (name_coords, name_meta, and name_data) from an xarray Dataset.
- from_awkward(array, name_coords, name_dims, ...): Load a RaggedArray instance from an Awkward Array.
- from_files(indices, preprocess_func, name_coords): Generate a ragged array archive from a list of files.
- from_netcdf(filename[, rows_dim_name, ...]): Read a ragged array archive from a NetCDF file.
- from_parquet(filename, name_coords, ...): Read a ragged array from a parquet file.
- from_xarray(ds[, rows_dim_name, obs_dim_name]): Populate a RaggedArray instance from an xarray Dataset instance.
- number_of_observations(rowsize_func, ...): Iterate through the files and evaluate the number of observations.
- to_awkward(): Convert ragged array object to an Awkward Array.
- to_netcdf(filename): Export ragged array object to a NetCDF file.
- to_parquet(filename): Export ragged array object to a parquet file.
- to_xarray(): Convert ragged array object to an xarray Dataset.
- validate_attributes(): Validate that each variable has an assigned attribute tag.
- static allocate(preprocess_func: Callable[[int], Dataset], indices: list, rowsize: list | ndarray | DataArray, name_coords: list, name_meta: list, name_data: list, name_dims: dict[str, Literal['rows', 'obs']], **kwargs) tuple[dict, dict, dict, dict] [source]#
Iterate through the files and fill the ragged array for the coordinate variables and the selected metadata and data variables.
Parameters#
- preprocess_func : Callable[[int], xr.Dataset]
Returns a processed xarray Dataset from an identification number.
- indices : list
List of indices separating rows in the ragged arrays.
- rowsize : list | np.ndarray | xr.DataArray
List of the number of observations per row.
- name_coords : list
Names of the coordinate variables to include in the archive.
- name_meta : list
Names of the metadata variables to include in the archive.
- name_data : list
Names of the data variables to include in the archive.
- name_dims : dict[str, DimNames]
Dimension alias mapped to the name used by clouddrift.
Returns#
- Tuple[dict, dict, dict, dict]
Dictionaries containing numerical data and attributes of coordinates, metadata and data variables.
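The filling step described above can be pictured with a minimal pure-Python sketch. It is not clouddrift's implementation: `fill_ragged`, the `FILES` dictionary, and the use of plain dictionaries and lists in place of xarray Datasets and NumPy arrays are all assumptions made for illustration. It only shows how `rowsize` determines where each row's observations land in one contiguous array per variable.

```python
from itertools import accumulate


def fill_ragged(preprocess, indices, rowsize, name_data):
    """Concatenate per-row observations into one flat list per variable.

    Hypothetical helper: `preprocess` stands in for preprocess_func and
    returns a plain dict of lists instead of an xarray Dataset.
    """
    # Offsets of each row in the flat arrays: [0, n0, n0 + n1, ...]
    offsets = [0, *accumulate(rowsize)]
    total = offsets[-1]

    # Preallocate one flat list per data variable.
    data = {name: [None] * total for name in name_data}

    for row, index in enumerate(indices):
        record = preprocess(index)
        start, stop = offsets[row], offsets[row + 1]
        for name in name_data:
            values = record[name]
            if len(values) != stop - start:
                raise ValueError(
                    f"row {index!r}: expected {stop - start} observations, "
                    f"got {len(values)}"
                )
            data[name][start:stop] = values
    return data


# Hypothetical in-memory "files", keyed by identification number.
FILES = {
    10: {"temp": [1.0, 2.0]},
    11: {"temp": [3.0]},
    12: {"temp": [4.0, 5.0, 6.0]},
}

filled = fill_ragged(FILES.__getitem__, [10, 11, 12], [2, 1, 3], ["temp"])
```

Preallocating from the known row sizes avoids repeatedly growing the arrays while the files are read, which is presumably why `rowsize` is passed in rather than discovered during the fill.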
- static attributes(ds: Dataset, name_coords: list, name_meta: list, name_data: list) tuple[dict, dict] [source]#
Return global attributes and the attributes of all variables (name_coords, name_meta, and name_data) from an Xarray Dataset.
Parameters#
- ds : xr.Dataset
Xarray Dataset from which to read the attributes.
- name_coords : list
Names of the coordinate variables to include in the archive.
- name_meta : list
Names of the metadata variables to include in the archive.
- name_data : list
Names of the data variables to include in the archive.
Returns#
- Tuple[dict, dict]
The global attributes and the variable attributes.
- classmethod from_awkward(array: Array, name_coords: list, name_dims: dict[str, Literal['rows', 'obs']], coord_dims: dict[str, str])[source]#
Load a RaggedArray instance from an Awkward Array.
Parameters#
- array : ak.Array
Awkward Array instance to load the data from.
- name_coords : list
Names of the coordinate variables in the ragged arrays.
- name_dims : dict
Map an alias to a dimension.
- coord_dims : dict
Map a coordinate to a dimension alias.
Returns#
- RaggedArray
A RaggedArray instance
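The two mappings compose: `coord_dims` names an alias for each coordinate, and `name_dims` resolves that alias to `'rows'` or `'obs'`. The sketch below illustrates that relationship only. `canonical_coord_dims` is a hypothetical helper, not part of clouddrift, and the alias `"traj"` and the coordinate names are made-up examples.

```python
from typing import Literal

DimName = Literal["rows", "obs"]


def canonical_coord_dims(
    coord_dims: dict[str, str], name_dims: dict[str, DimName]
) -> dict[str, DimName]:
    """Resolve each coordinate's dimension alias to 'rows' or 'obs'."""
    resolved = {}
    for coord, alias in coord_dims.items():
        if alias not in name_dims:
            raise KeyError(
                f"coordinate {coord!r} uses unknown dimension alias {alias!r}"
            )
        resolved[coord] = name_dims[alias]
    return resolved


# Example mappings (assumed names): the dataset calls its row dimension "traj".
name_dims = {"traj": "rows", "obs": "obs"}
coord_dims = {"id": "traj", "time": "obs"}

resolved = canonical_coord_dims(coord_dims, name_dims)
```

Here `id` resolves to the row dimension and `time` to the observation dimension.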
- classmethod from_files(indices: list[int], preprocess_func: Callable[[int], Dataset], name_coords: list, name_meta: list = [], name_data: list = [], name_dims: dict[str, Literal['rows', 'obs']] = {}, rowsize_func: Callable[[int], int] | None = None, attrs_global: dict | None = None, attrs_variables: dict | None = None, **kwargs)[source]#
Generate a ragged array archive from a list of files.
Parameters#
- indices : list
List of identification numbers to iterate over.
- preprocess_func : Callable[[int], xr.Dataset]
Returns a processed xarray Dataset from an identification number.
- name_coords : list
Names of the coordinate variables to include in the archive.
- name_meta : list, optional
Names of the metadata variables to include in the archive (defaults to []).
- name_data : list, optional
Names of the data variables to include in the archive (defaults to []).
- name_dims : dict
Map an alias to a dimension.
- rowsize_func : Optional[Callable[[int], int]], optional
Returns the number of observations for an identification number; supplying it speeds up processing (defaults to None).
Returns#
- RaggedArray
A RaggedArray instance
- classmethod from_netcdf(filename: str, rows_dim_name='rows', obs_dim_name='obs')[source]#
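Read a ragged array archive from a NetCDF file.
This is a thin wrapper around from_xarray().
Parameters#
- filename : str
File name of the NetCDF archive to read.
- rows_dim_name : str, optional
Name of the row dimension in the archive (default is 'rows').
- obs_dim_name : str, optional
Name of the observations dimension in the archive (default is 'obs').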
Returns#
- RaggedArray
A ragged array instance
- classmethod from_parquet(filename: str, name_coords: list, name_dims: dict[str, Literal['rows', 'obs']], coord_dims: dict[str, str])[source]#
Read a ragged array from a parquet file.
Parameters#
- filename : str
File name of the parquet archive to read.
- name_coords : list
Names of the coordinate variables in the ragged arrays.
- name_dims : dict
Map an alias to a dimension.
- coord_dims : dict
Map a coordinate to a dimension alias.
Returns#
- RaggedArray
A ragged array instance
- classmethod from_xarray(ds: Dataset, rows_dim_name: str = 'rows', obs_dim_name: str = 'obs')[source]#
Populate a RaggedArray instance from an xarray Dataset instance.
Parameters#
- ds : xr.Dataset
Xarray Dataset from which to load the RaggedArray.
- rows_dim_name : str, optional
Name of the row dimension in the xarray Dataset (default is 'rows').
- obs_dim_name : str, optional
Name of the observations dimension in the xarray Dataset (default is 'obs').
Returns#
- RaggedArray
A RaggedArray instance
- static number_of_observations(rowsize_func: Callable[[int], int], indices: list, **kwargs) ndarray [source]#
Iterate through the files and evaluate the number of observations.
Parameters#
- rowsize_func : Callable[[int], int]
Function that returns the number of observations in a row from its identification number.
- indices : list
List of identification numbers to iterate over.
Returns#
- np.ndarray
Number of observations
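Conceptually, this step applies `rowsize_func` to every identification number and collects the results in order. The sketch below makes that explicit under stated assumptions: `count_observations` and the `LENGTHS` table are hypothetical, the result is a plain list rather than an `np.ndarray`, and forwarding `**kwargs` to `rowsize_func` is assumed rather than confirmed by this page.

```python
def count_observations(rowsize_func, indices, **kwargs):
    """Return the number of observations of each row, in the order of `indices`.

    Assumption: extra keyword arguments are forwarded to rowsize_func.
    """
    return [rowsize_func(index, **kwargs) for index in indices]


# Hypothetical lookup table standing in for a cheap per-file size query,
# for example reading a header instead of loading the whole file.
LENGTHS = {10: 2, 11: 1, 12: 3}


def rowsize_from_table(index, scale=1):
    """Hypothetical rowsize function; `scale` only demonstrates kwargs."""
    return LENGTHS[index] * scale


rowsize = count_observations(rowsize_from_table, [10, 11, 12])
total_observations = sum(rowsize)
```

The row sizes computed here are the values a fill step needs in order to know how much space each row occupies.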
- to_awkward()[source]#
Convert ragged array object to an Awkward Array.
Returns#
- ak.Array
Awkward Array containing the ragged array and its attributes
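An Awkward Array holds variable-length rows as nested lists, so the conversion amounts to regrouping the contiguous observations by row. The following sketch shows that regrouping with plain Python lists. `unflatten` is a hypothetical helper written for this example and does not reproduce how clouddrift builds the `ak.Array`.

```python
from itertools import accumulate


def unflatten(flat, rowsize):
    """Split a flat list of observations into one list per row."""
    # Row boundaries in the flat list: [0, n0, n0 + n1, ...]
    offsets = [0, *accumulate(rowsize)]
    if offsets[-1] != len(flat):
        raise ValueError(
            f"rowsize sums to {offsets[-1]} but there are {len(flat)} observations"
        )
    return [flat[start:stop] for start, stop in zip(offsets, offsets[1:])]


# Six observations stored contiguously, belonging to rows of length 2, 1, and 3.
rows = unflatten([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], [2, 1, 3])
```

Each inner list then corresponds to one row of the ragged array.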
- to_netcdf(filename: str)[source]#
Export ragged array object to a NetCDF file.
Parameters#
- filename : str
Name of the NetCDF file to create.
- to_parquet(filename: str)[source]#
Export ragged array object to a parquet file.
Parameters#
- filename : str
Name of the parquet file to create.
- to_xarray()[source]#
Convert ragged array object to an xarray Dataset.
Parameters#
- cast_to_float32 : bool, optional
Cast all float64 variables to float32 (default is True). This option aims to minimize the size of the xarray Dataset.
Returns#
- xr.Dataset
Xarray Dataset containing the ragged arrays and their attributes