CloudDrift, a platform for accelerating research with Lagrangian climate data


Lagrangian data typically refers to oceanic and atmospheric information acquired by observing platforms that drift with the flow in which they are embedded. More broadly, the term also covers data from uncrewed platforms, vehicles, and animals that gather observations along their unrestricted and often complex paths. Because each path spans both space and time, Lagrangian data intertwine spatial and temporal information in ways that do not map readily onto common data structures, standard file formats, or the libraries and conventions built around them.

Lagrangian data therefore present challenges for both data originators and data users, and the CloudDrift project aims to overcome them. This project is funded by the NSF EarthCube program through EarthCube Capabilities Grant No. 2126413.

Motivations

The Global Drifter Program (GDP) of the US National Oceanic and Atmospheric Administration (NOAA) has released nearly 25,000 drifting buoys, or drifters, to date, with the goal of obtaining observations of oceanic velocity, sea surface temperature, and sea level pressure. From these drifter observations, the GDP generates two data products of oceanic variables estimated along drifter trajectories: one at hourly time steps and one at six-hourly time steps.

There are a few ways to retrieve the data, but all typically require time-consuming preprocessing before the data are ready for analysis. For example, the datasets can be retrieved through an ERDDAP server, but requests are limited in size. The latest six-hourly dataset is distributed as a collection of thousands of individual NetCDF files or as a series of ASCII files. Until recently, the hourly dataset was also distributed as a collection of individual NetCDF files (17,324 for version 1.04c), but thanks to the work of the CloudDrift project it is now distributed by NOAA NCEI as a single NetCDF file containing a series of ragged arrays. A single file simplifies data distribution, reduces metadata redundancy, and efficiently stores a collection of Lagrangian trajectories of uneven lengths.
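As a minimal sketch of the idea behind storing uneven trajectories together (not CloudDrift's own API, and not a NetCDF writer), the Python below assumes each trajectory is an in-memory sequence of observations of a single variable. It concatenates the trajectories into one flat array and records how many observations each one contributed. The helper `pack_ragged` and the sample values are hypothetical.

```python
import numpy as np


def pack_ragged(trajectories):
    """Pack variable-length trajectories into a flat array plus row sizes.

    Hypothetical helper for illustration; not part of the CloudDrift API.
    """
    # Number of observations contributed by each trajectory.
    rowsize = np.array([len(t) for t in trajectories], dtype=np.int64)
    if rowsize.size == 0:
        return np.empty(0, dtype=float), rowsize
    # All observations end to end, one trajectory after another.
    data = np.concatenate([np.asarray(t, dtype=float) for t in trajectories])
    return data, rowsize


# Three hypothetical drifters with 3, 1 and 2 velocity observations (cm/s).
trajectories = [[10, 20, 30], [50], [70, 80]]
data, rowsize = pack_ragged(trajectories)
print(data)     # [10. 20. 30. 50. 70. 80.]
print(rowsize)  # [3 1 2]
```

Here the row sizes 3, 1, and 2 add up to the six stored observations, whereas a padded two-dimensional array sized to the longest trajectory would need 3 × 3 = 9 cells, three of them fill values. The CF conventions describe this layout as a contiguous ragged array, with a count variable playing the role of `rowsize`.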

CloudDrift’s analysis functions are centered around the ragged-array data structure:

Figure: Ragged array schematic
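To illustrate why a flat data array plus a row-size array is enough to work with, the sketch below uses a small hypothetical ragged array of six observations from three drifters. Cumulative sums of `rowsize` give each trajectory's start offset, which lets you slice out one trajectory or reduce every trajectory at once without first building a list of per-trajectory arrays. It relies on NumPy only; `row_offsets`, `trajectory_slice`, and `per_trajectory_mean` are illustrative helpers, not CloudDrift functions, and the code assumes every row size is positive.

```python
import numpy as np


def row_offsets(rowsize):
    """Start index of each trajectory in the flat arrays (hypothetical helper)."""
    # Each trajectory starts where all preceding trajectories end.
    return np.concatenate(([0], np.cumsum(rowsize)[:-1])).astype(np.int64)


def trajectory_slice(rowsize, i):
    """Slice selecting trajectory i from any flat observation array."""
    start = int(row_offsets(rowsize)[i])
    return slice(start, start + int(rowsize[i]))


def per_trajectory_mean(data, rowsize):
    """Mean of `data` over each trajectory in a single vectorized call.

    Assumes every row size is positive: np.add.reduceat does not return
    zero for an empty segment.
    """
    sums = np.add.reduceat(data, row_offsets(rowsize))
    return sums / rowsize


# Hypothetical ragged array: 6 velocity observations (cm/s) from 3 drifters.
data = np.array([10.0, 20.0, 30.0, 50.0, 70.0, 80.0])
rowsize = np.array([3, 1, 2])

print(data[trajectory_slice(rowsize, 2)])  # [70. 80.]
print(per_trajectory_mean(data, rowsize))  # [20. 50. 75.]
```

Because the slice is computed from `rowsize` alone, in a layout like this the same slice selects the matching observations from every variable (time, latitude, longitude, velocity) stored along the shared observation dimension.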

CloudDrift’s goals are to simplify the necessary steps to get started with Lagrangian datasets and to provide a cloud-ready library to accelerate Lagrangian analysis.

Getting started

Reference