clouddrift.ragged.segment
- clouddrift.ragged.segment(x: ndarray, tolerance: float | timedelta64 | timedelta | Timedelta, rowsize: ndarray[int] = None) -> ndarray[int]
- Divide an array into segments based on a tolerance value.
- Parameters
- x : list, np.ndarray, or xr.DataArray
- An array to divide into segments.
- tolerance : float, np.timedelta64, timedelta, pd.Timedelta
- The maximum signed difference between consecutive points in a segment. The array x will be segmented wherever differences exceed the tolerance. 
- rowsize : np.ndarray[int], optional
- The size of rows if x is originally a ragged array. If present, x will be divided both by gaps that exceed the tolerance, and by the original rows of the ragged array. 
- Returns
- np.ndarray[int]
- An array of row sizes that divides the input array into segments. 
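The returned row sizes are counts, not indices, so they have to be accumulated before they can be used to slice the input. The sketch below shows one way to do that in plain Python. The helper name split_by_rowsize is hypothetical and is not part of clouddrift; the row sizes are the ones shown for segment(x, 0.5) in the examples.

```python
from itertools import accumulate


def split_by_rowsize(values, rowsize):
    """Split a flat sequence into consecutive chunks of the given sizes.

    Hypothetical helper for illustration only; not part of clouddrift.
    """
    ends = list(accumulate(rowsize))   # running totals give the end index of each chunk
    starts = [0] + ends[:-1]           # each chunk starts where the previous one ended
    return [list(values[s:e]) for s, e in zip(starts, ends)]


x = [0, 1, 1, 1, 2, 2, 3, 3, 3, 3, 4]
rowsize = [1, 3, 2, 4, 1]  # row sizes documented for segment(x, 0.5)
chunks = split_by_rowsize(x, rowsize)
print(chunks)  # [[0], [1, 1, 1], [2, 2], [3, 3, 3, 3], [4]]
```

The row sizes sum to the length of the input, so every element lands in exactly one chunk.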
- Examples
- The simplest use of segment is to provide a tolerance value that is used to divide an array into segments:
  >>> from clouddrift.ragged import segment, subset
  >>> import numpy as np
  >>> x = [0, 1, 1, 1, 2, 2, 3, 3, 3, 3, 4]
  >>> segment(x, 0.5)
  array([1, 3, 2, 4, 1])
- If the array is already segmented (e.g. multiple rows in a ragged array), the rowsize argument can be used to preserve the original segments:
  >>> x = [0, 1, 1, 1, 2, 2, 3, 3, 3, 3, 4]
  >>> rowsize = [3, 2, 6]
  >>> segment(x, 0.5, rowsize)
  array([1, 2, 1, 1, 1, 4, 1])
- The tolerance can also be negative. In this case, the input array is segmented where the difference between consecutive points falls below the negative tolerance, i.e. where x[n+1] - x[n] < tolerance:
  >>> x = [0, 1, 2, 0, 1, 2]
  >>> segment(x, -0.5)
  array([3, 3])
- To segment an array for both positive and negative gaps, invoke the function twice, once with a positive tolerance and once with a negative tolerance. The result of the first invocation can be passed as the rowsize argument to the second segment invocation:
  >>> x = [1, 1, 2, 2, 1, 1, 2, 2]
  >>> segment(x, 0.5, rowsize=segment(x, -0.5))
  array([2, 2, 2, 2])
- If the input array contains time objects, the tolerance must be a time interval:
  >>> x = np.array([np.datetime64("2023-01-01"), np.datetime64("2023-01-02"),
  ...               np.datetime64("2023-01-03"), np.datetime64("2023-02-01"),
  ...               np.datetime64("2023-02-02")])
  >>> segment(x, np.timedelta64(1, "D"))
  array([3, 2])
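To make the segmentation rule concrete, here is a minimal pure-Python sketch of the behaviour the examples describe. It is an illustration under stated assumptions, not clouddrift's implementation. The name segment_sketch is hypothetical. It assumes that a non-negative tolerance splits where the difference is greater than the tolerance, that a negative tolerance splits where the difference is less than the tolerance, and that rows given by rowsize are processed independently. It uses datetime and timedelta from the standard library in place of NumPy time types.

```python
from datetime import datetime, timedelta


def segment_sketch(x, tolerance, rowsize=None):
    """Return segment sizes for x; hypothetical sketch, not clouddrift's code."""
    x = list(x)
    if rowsize is None:
        rowsize = [len(x)]          # treat the whole input as a single row
    zero = tolerance - tolerance    # 0 for numbers, timedelta(0) for time intervals
    sizes = []
    start = 0
    for n in rowsize:
        row = x[start:start + n]
        start += n
        if not row:
            continue                # an empty row contributes no segments
        count = 1
        for prev, cur in zip(row, row[1:]):
            diff = cur - prev
            # Assumed rule: positive gaps for tolerance >= 0, negative gaps otherwise.
            gap = diff > tolerance if tolerance >= zero else diff < tolerance
            if gap:
                sizes.append(count)  # close the current segment at the gap
                count = 1
            else:
                count += 1
        sizes.append(count)          # close the last segment of the row
    return sizes


x = [1, 1, 2, 2, 1, 1, 2, 2]
down = segment_sketch(x, -0.5)                    # split on negative gaps first
both = segment_sketch(x, 0.5, rowsize=down)       # then on positive gaps within each row
print(down, both)  # [4, 4] [2, 2, 2, 2]

t = [datetime(2023, 1, 1), datetime(2023, 1, 2), datetime(2023, 1, 3),
     datetime(2023, 2, 1), datetime(2023, 2, 2)]
days = segment_sketch(t, timedelta(days=1))
print(days)  # [3, 2]
```

Under these assumptions the sketch reproduces the outputs shown in the examples, including the two-pass call in which the row sizes from the negative-tolerance pass are fed to the positive-tolerance pass.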
