clouddrift.binning.binned_statistics

clouddrift.binning.binned_statistics(coords: ndarray | list[ndarray], data: ndarray | list[ndarray] | None = None, bins: int | list = 10, bins_range: list | None = None, dim_names: list[str] | None = None, output_names: list[str] | None = None, statistics: str | list | Callable[[ndarray], float] = 'count') -> Dataset

Perform N-dimensional binning and compute statistics of values in each bin. The result is returned as an Xarray Dataset.

Parameters

coords : array-like or list of array-like

Array(s) of Lagrangian data coordinates to be binned. For 1D, provide a single array. For N-dimensions, provide a list of N arrays, each giving coordinates along one dimension.

data : array-like or list of array-like, optional

Data values associated with the Lagrangian coordinates in coords. Can be a single array or a list of arrays for multiple variables. Complex values are supported for all statistics except ‘min’, ‘max’, and ‘median’.

bins : int or list, optional

Number of bins or bin edges per dimension. It can be:

  • an int: the same number of bins for all dimensions,

  • a list of ints or arrays: one per dimension, specifying either the bin count or the bin edges,

  • None: defaults to 10 bins per dimension.

bins_range : list of tuples, optional

Outer bin limits for each dimension.
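To make the relationship between a bin count, the outer limits, and the resulting per-bin statistics concrete, here is a minimal one-dimensional sketch in plain Python. It is not clouddrift's implementation: the helper names are hypothetical, and both the edge convention (half-open bins, with the last bin also including its right edge) and the use of NaN for empty bins are assumptions made for illustration.

```python
def equal_width_edges(lo, hi, n_bins):
    """Return n_bins + 1 evenly spaced edges spanning [lo, hi]."""
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(n_bins + 1)]


def bin_index(x, edges):
    """Return the index of the bin containing x, or None if x is out of range.

    Assumption: bins are half-open [left, right), except the last bin,
    which also includes its right edge.
    """
    if x < edges[0] or x > edges[-1]:
        return None
    if x == edges[-1]:
        return len(edges) - 2
    for i in range(len(edges) - 1):
        if edges[i] <= x < edges[i + 1]:
            return i
    return None


def binned_count_and_mean(coords, data, n_bins, limits):
    """Count the samples in each bin and average their data values."""
    edges = equal_width_edges(limits[0], limits[1], n_bins)
    counts = [0] * n_bins
    sums = [0.0] * n_bins
    for x, value in zip(coords, data):
        i = bin_index(x, edges)
        if i is not None:  # samples outside the outer limits are skipped
            counts[i] += 1
            sums[i] += value
    # Assumption: empty bins are reported as NaN.
    means = [s / c if c else float("nan") for s, c in zip(sums, counts)]
    return edges, counts, means


coords = [0.5, 1.5, 1.7, 3.2, 9.0]   # the last sample lies outside the limits
data = [10.0, 20.0, 40.0, 5.0, 99.0]
edges, counts, means = binned_count_and_mean(coords, data, n_bins=4, limits=(0.0, 4.0))
```

With four bins over the limits (0, 4), the edges are the integers 0 through 4, the sample at 9.0 is ignored, and the third bin is empty.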

statistics : str or list of str, Callable[[np.ndarray], float] or list[Callable[[np.ndarray], float]]

Statistics to compute for each bin. It can be:

  • a string; supported values are ‘count’, ‘sum’, ‘mean’, ‘median’, ‘std’, ‘min’, and ‘max’ (default: ‘count’),

  • a custom function, passed as a callable, for univariate statistics; it takes a 1D array of values and returns a single value. The callable is applied to each variable of data.

  • a tuple of (output_name, callable) for multivariate statistics. ‘output_name’ is used to identify the resulting variable. In this case, the callable receives the list of arrays provided in data. For example, to compute a kinetic-energy proxy (the root-mean-square speed) from velocity components u and v, pass data=[u, v] and statistics=("ke", lambda data: np.sqrt(np.mean(data[0] ** 2 + data[1] ** 2))).

  • a list containing any combination of the above, e.g., ['mean', np.nanmax, ('ke', lambda data: np.sqrt(np.mean(data[0] ** 2 + data[1] ** 2)))].
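The difference between a univariate callable and a multivariate (output_name, callable) tuple can be sketched in plain Python as follows. This only illustrates the calling convention described above for the values that fall in one bin. The apply_statistic helper and the "data_0", "data_1" keys are hypothetical, and the ke function mirrors the expression in the example above using lists instead of NumPy arrays.

```python
import math


def apply_statistic(stat, variables):
    """Apply one statistic to the values that fell into a single bin.

    variables: list of lists, one per data variable.
    stat: a callable (univariate) or a (name, callable) tuple (multivariate).
    """
    if isinstance(stat, tuple):
        # Multivariate: the callable receives all variables at once.
        name, func = stat
        return {name: func(variables)}
    # Univariate: the callable is applied to each variable separately.
    return {f"data_{i}": stat(values) for i, values in enumerate(variables)}


def ke(data):
    """Square root of the mean of u**2 + v**2, as in the 'ke' example."""
    u, v = data
    squared = [a * a + b * b for a, b in zip(u, v)]
    return math.sqrt(sum(squared) / len(squared))


# Velocity components of the samples in one bin.
u = [3.0, 0.0]
v = [4.0, 0.0]

per_variable = apply_statistic(max, [u, v])     # one result per variable
combined = apply_statistic(("ke", ke), [u, v])  # one result for all variables
```

The univariate call yields one value per variable, while the tuple form yields a single value stored under the given output name.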

dim_names : list of str, optional

Names for the dimensions of the output xr.Dataset. If None, default names are “coord_0”, “coord_1”, etc.

output_names : list of str, optional

Names for output variables in the xr.Dataset. If None, default names are “data_0_{statistic}”, “data_1_{statistic}”, etc.
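The default naming patterns described for dim_names and output_names can be spelled out with a short sketch. The helper functions are hypothetical, the ordering of the generated names is an assumption, and only string statistics are covered, since the text does not say how custom callables are named.

```python
def default_dim_names(n_dims):
    """Default dimension names: coord_0, coord_1, ..."""
    return [f"coord_{i}" for i in range(n_dims)]


def default_output_names(n_vars, statistics):
    """Default variable names following the data_{i}_{statistic} pattern."""
    return [f"data_{i}_{stat}" for i in range(n_vars) for stat in statistics]


dims = default_dim_names(2)
outputs = default_output_names(2, ["mean", "std"])
```

For two coordinate arrays and two data variables with the 'mean' and 'std' statistics, this gives coord_0 and coord_1 as dimension names and data_0_mean, data_0_std, data_1_mean, data_1_std as variable names.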

Returns

xr.Dataset

Xarray dataset containing the requested statistics for each variable in each bin.