clouddrift.ragged.segment
- clouddrift.ragged.segment(x: ndarray, tolerance: float | timedelta64 | timedelta | Timedelta, rowsize: ndarray[int] = None) -> ndarray[int]
Divide an array into segments based on a tolerance value.
Parameters
- x : list, np.ndarray, or xr.DataArray
An array to divide into segments.
- tolerance : float, np.timedelta64, timedelta, pd.Timedelta
The maximum signed difference between consecutive points in a segment. The array x will be segmented wherever differences exceed the tolerance.
- rowsize : np.ndarray[int], optional
The size of rows if x is originally a ragged array. If present, x will be divided both by gaps that exceed the tolerance and by the original rows of the ragged array.
Returns
- np.ndarray[int]
An array of row sizes that divides the input array into segments.
Examples
The simplest use of segment is to provide a tolerance value that is used to divide an array into segments:
>>> from clouddrift.ragged import segment, subset
>>> import numpy as np
>>> x = [0, 1, 1, 1, 2, 2, 3, 3, 3, 3, 4]
>>> segment(x, 0.5)
array([1, 3, 2, 4, 1])
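The returned row sizes are lengths, not indices, so a cumulative sum is needed to recover the segments themselves. The following is a minimal, dependency-free sketch of that step; `split_by_rowsize` is a hypothetical helper written for illustration and is not part of clouddrift.

```python
from itertools import accumulate


def split_by_rowsize(values, sizes):
    """Split `values` into consecutive chunks whose lengths are given by `sizes`.

    Hypothetical helper for illustration only; not a clouddrift function.
    """
    if sum(sizes) != len(values):
        raise ValueError("sizes must sum to len(values)")
    ends = list(accumulate(sizes))   # exclusive end index of each chunk
    starts = [0] + ends[:-1]         # start index of each chunk
    return [list(values[s:e]) for s, e in zip(starts, ends)]


x = [0, 1, 1, 1, 2, 2, 3, 3, 3, 3, 4]
sizes = [1, 3, 2, 4, 1]  # the row sizes shown in the example above
pieces = split_by_rowsize(x, sizes)
print(pieces)
```

Each chunk here holds values whose consecutive differences stay within the tolerance of 0.5.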
If the array is already segmented (e.g. multiple rows in a ragged array), then the rowsize argument can be used to preserve the original segments:
>>> x = [0, 1, 1, 1, 2, 2, 3, 3, 3, 3, 4]
>>> rowsize = [3, 2, 6]
>>> segment(x, 0.5, rowsize)
array([1, 2, 1, 1, 1, 4, 1])
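One way to picture the effect of rowsize is that each original row is segmented independently and the results are concatenated, so a row boundary always remains a segment boundary. The pure-Python sketch below reproduces the two outputs shown so far under that assumption. It handles positive tolerances only, and `sizes_from_gaps` and `segment_sketch` are hypothetical names, not the library's implementation.

```python
def sizes_from_gaps(row, tolerance):
    """Row sizes for one row, splitting where the forward difference exceeds `tolerance`."""
    if not row:
        return []
    sizes, current = [], 1
    for prev, nxt in zip(row, row[1:]):
        if nxt - prev > tolerance:
            sizes.append(current)  # close the current segment at the gap
            current = 1
        else:
            current += 1
    sizes.append(current)          # close the last segment
    return sizes


def segment_sketch(x, tolerance, rowsize=None):
    """Illustrative stand-in for segment(); positive tolerances only."""
    x = list(x)
    if rowsize is None:
        return sizes_from_gaps(x, tolerance)
    if sum(rowsize) != len(x):
        raise ValueError("rowsize must sum to len(x)")
    sizes, start = [], 0
    for r in rowsize:
        # Segment each original row on its own so its boundaries are kept.
        sizes.extend(sizes_from_gaps(x[start:start + r], tolerance))
        start += r
    return sizes


x = [0, 1, 1, 1, 2, 2, 3, 3, 3, 3, 4]
without_rows = segment_sketch(x, 0.5)
with_rows = segment_sketch(x, 0.5, rowsize=[3, 2, 6])
print(without_rows, with_rows)
```

Note that the total number of elements is unchanged: both results sum to the length of x.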
The tolerance can also be negative. In this case, the input array is segmented wherever the difference between consecutive points is more negative than the tolerance, i.e. where x[n+1] - x[n] < tolerance:
>>> x = [0, 1, 2, 0, 1, 2]
>>> segment(x, -0.5)
array([3, 3])
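To make the sign convention concrete, the sketch below lists the forward differences for this example and marks where a break occurs. It assumes the comparison is made against the signed tolerance itself (a difference below -0.5 triggers a break), a reading inferred from the outputs shown in this section rather than from the library source.

```python
x = [0, 1, 2, 0, 1, 2]
tolerance = -0.5

# Forward differences x[n+1] - x[n].
diffs = [b - a for a, b in zip(x, x[1:])]

# Assumed rule for a negative tolerance: break where the difference is
# more negative than the tolerance. Each entry is the index at which a
# new segment starts.
breaks = [i + 1 for i, d in enumerate(diffs) if d < tolerance]

# Convert break positions into row sizes.
bounds = [0] + breaks + [len(x)]
sizes = [hi - lo for lo, hi in zip(bounds, bounds[1:])]
print(diffs, breaks, sizes)
```

Only the drop from 2 to 0 is negative enough to start a new segment; the positive steps of 1 are ignored when the tolerance is negative.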
To segment an array for both positive and negative gaps, invoke the function twice, once for a positive tolerance and once for a negative tolerance. The result of the first invocation can be passed as the rowsize argument to the second segment invocation:
>>> x = [1, 1, 2, 2, 1, 1, 2, 2]
>>> segment(x, 0.5, rowsize=segment(x, -0.5))
array([2, 2, 2, 2])
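Because the second pass keeps every boundary found by the first, the combined result breaks wherever a gap exceeds the tolerance in either direction. A single-pass check on the absolute difference should therefore give the same row sizes. The sketch below illustrates that equivalence in plain Python; it is a different technique from the two-call approach above and does not call clouddrift.

```python
x = [1, 1, 2, 2, 1, 1, 2, 2]
tolerance = 0.5

diffs = [b - a for a, b in zip(x, x[1:])]

# Break where the gap exceeds the tolerance in either direction.
breaks = [i + 1 for i, d in enumerate(diffs) if abs(d) > tolerance]

bounds = [0] + breaks + [len(x)]
sizes = [hi - lo for lo, hi in zip(bounds, bounds[1:])]
print(sizes)
```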
If the input array contains time objects, the tolerance must be a time interval:
>>> x = np.array([np.datetime64("2023-01-01"), np.datetime64("2023-01-02"),
...               np.datetime64("2023-01-03"), np.datetime64("2023-02-01"),
...               np.datetime64("2023-02-02")])
>>> segment(x, np.timedelta64(1, "D"))
array([3, 2])
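The reason a time interval is required is that subtracting two time values yields a duration, and a duration cannot be ordered against a plain number. The sketch below shows this with the standard library's `date` and `timedelta` types instead of NumPy's `datetime64`, which is an assumption made to keep the example dependency-free; the same dates as above are used.

```python
from datetime import date, timedelta

dates = [date(2023, 1, 1), date(2023, 1, 2), date(2023, 1, 3),
         date(2023, 2, 1), date(2023, 2, 2)]

# Differences between consecutive dates are timedelta objects.
gaps = [later - earlier for earlier, later in zip(dates, dates[1:])]

tolerance = timedelta(days=1)
# Index at which a new segment starts: only the 29-day gap exceeds one day.
breaks = [i + 1 for i, gap in enumerate(gaps) if gap > tolerance]

# Comparing a timedelta with a float is not defined in Python.
try:
    gaps[0] > 1.0
    float_comparison_ok = True
except TypeError:
    float_comparison_ok = False

print(breaks, float_comparison_ok)
```

A single break before the fourth element corresponds to the row sizes 3 and 2 returned above.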