Problem
SedonaDB's raster type models data as 2D spatial grids (width × height) with bands as a flat list. This can't represent multi-dimensional geospatial datasets such as climate models with time dimensions, hyperspectral imagery, atmospheric profiles with pressure levels, or Zarr/NetCDF datacubes. Users must either flatten the data into 2D+band (losing the dimension semantics) or leave SedonaDB.
Approach
Upgrade each band's data from a 2D tile to an N-D chunk with named dimensions and shape. The band/variable structure is preserved — a Zarr variable or GeoTIFF band maps to a band, but each band can now have shape [time=12, y=256, x=256] instead of just [y=256, x=256]. Legacy rasters load as bands with shape [y, x] — zero change for existing workloads.
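As a rough illustration of the shape model described above, the sketch below wraps a legacy 2D tile as an N-D band with dimensions [y, x]. `BandShape` and its methods are hypothetical names invented for this example; they are not SedonaDB types.

```rust
/// Hypothetical descriptor for one band's N-D chunk (illustration only,
/// not the SedonaDB type).
#[derive(Debug, Clone, PartialEq)]
pub struct BandShape {
    pub dim_names: Vec<String>,
    pub shape: Vec<u64>,
}

impl BandShape {
    /// Wrap a legacy 2D tile as an N-D band with dimensions [y, x].
    pub fn from_legacy_2d(height: u64, width: u64) -> Self {
        BandShape {
            dim_names: vec!["y".to_string(), "x".to_string()],
            shape: vec![height, width],
        }
    }

    /// Size of a named dimension, or None if the band does not have it.
    pub fn dim_size(&self, name: &str) -> Option<u64> {
        self.dim_names
            .iter()
            .position(|d| d.as_str() == name)
            .map(|i| self.shape[i])
    }

    /// Total number of elements in the chunk.
    pub fn num_elements(&self) -> u64 {
        self.shape.iter().product()
    }
}
```

Under this model a legacy raster is simply the two-dimensional case, so code written against named dimensions does not need a separate 2D path.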
Key decisions
- Band = variable, each band is N-D: Zarr's temperature, pressure, wind_u become 3 bands, each an N-D chunk. GeoTIFF bands map directly. Band math (in[0], in[1]) is unchanged.
- Named dimensions per band: Each band stores dim_names + shape. y/x (or lat/lon) are the spatial axes with hard-coded meaning; all others (time, wavelength, pressure, ...) are arbitrary. Bands may have different dimension sets but must agree on shared dimension sizes.
- RS_DimToBand / RS_BandToDim: Bridge between "everything is a dimension" (Zarr) and the band model. RS_DimToBand(raster, 'wavelength') promotes a within-band dimension into separate bands so standard band math works.
- Two execution paths: Native impls for metadata, coordinate conversion, predicates, and new N-D functions. GDAL-backed impls for compute-heavy spatial ops (clip, zonal stats, map algebra); these extract y/x slices and operate per spatial slice.
- Single schema: Legacy 2D schema retired. All loaders produce N-D layout directly. No runtime schema detection needed.
- Trait-based band storage: NdBandRef trait with nd_buffer() (returns raw buffer + shape + strides for zero-copy access) and contiguous_data() (flat bytes, copies only if strided). Implementations: InMemoryBand (Phase 1), ZarrBand + LazySlicedBand (Phase 2), GeoTiffBand (Phase 2/3). Strided views are just InMemoryBand with non-standard strides; Arrow BinaryView refcounting handles lifetime.
- Affine transform: Single transform: List<Float64> (GDAL GeoTransform convention) at raster level. Applies to y/x dims only.
- OutDb references: Single outdb_uri field per band with scheme-based dispatch (zarr://..., geotiff://...).
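The rule that bands may carry different dimension sets yet must agree on the sizes of shared dimensions could be checked roughly as follows. This is a minimal sketch: `check_shared_dims` and its tuple-based band representation are hypothetical and only stand in for whatever validation the real loaders perform.

```rust
use std::collections::HashMap;

/// Check that every dimension name used by more than one band has a single
/// size. Each band is given as (dim_names, shape). Hypothetical helper, not
/// part of SedonaDB.
pub fn check_shared_dims(bands: &[(Vec<&str>, Vec<u64>)]) -> Result<(), String> {
    let mut seen: HashMap<&str, u64> = HashMap::new();
    for (band_idx, (names, shape)) in bands.iter().enumerate() {
        if names.len() != shape.len() {
            return Err(format!(
                "band {band_idx}: dim_names and shape differ in length"
            ));
        }
        for (name, size) in names.iter().zip(shape.iter()) {
            match seen.get(name).copied() {
                // Same name seen before with a different size: conflict.
                Some(prev) if prev != *size => {
                    return Err(format!(
                        "dimension '{name}' has size {size} in band {band_idx} but {prev} in an earlier band"
                    ));
                }
                Some(_) => {}
                None => {
                    seen.insert(*name, *size);
                }
            }
        }
    }
    Ok(())
}
```

A band with dimensions [time, y, x] and another with only [y, x] pass the check as long as their y and x sizes match.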
Arrow schema
Struct {
  crs: Utf8View,
  transform: List<Float64>,  -- [origin_x, scale_x, skew_x, origin_y, skew_y, scale_y]
  bands: List<Struct {
    name: Utf8,              -- e.g. "temperature" (nullable)
    dim_names: List<Utf8>,   -- ["time", "y", "x"]
    shape: List<UInt64>,     -- [12, 256, 256]
    data_type: UInt32,
    nodata: Binary,
    strides: List<Int64>,    -- per-dim byte strides
    offset: UInt64,
    outdb_uri: Utf8,         -- "zarr://s3://bucket/store#var/0.0.0" (nullable)
    data: BinaryView,        -- row-major N-D array (eager) or empty (lazy)
  }>
}
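To make the coefficient order in the transform comment concrete, here is a small sketch that maps a pixel position to world coordinates, assuming the six values are laid out exactly as listed above. `pixel_to_world` is a hypothetical helper name, not an existing SedonaDB function.

```rust
/// Map a (col, row) pixel position to world coordinates using coefficients
/// in the order [origin_x, scale_x, skew_x, origin_y, skew_y, scale_y].
/// Hypothetical helper for illustration; it applies to the y/x dims only.
pub fn pixel_to_world(t: &[f64; 6], col: f64, row: f64) -> (f64, f64) {
    let x = t[0] + col * t[1] + row * t[2];
    let y = t[3] + col * t[4] + row * t[5];
    (x, y)
}
```

For a north-up raster the skew terms are zero and scale_y is negative, so rows increase as the world y coordinate decreases.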
Core traits
use std::borrow::Cow;

pub struct NdBuffer<'a> {
    pub buffer: &'a [u8],
    pub shape: &'a [u64],
    pub strides: &'a [i64],
    pub offset: u64,
    pub data_type: BandDataType,
}

pub trait NdBandRef {
    fn ndim(&self) -> usize;
    fn dim_names(&self) -> &[&str];
    fn shape(&self) -> &[u64];
    fn dim_size(&self, name: &str) -> Option<u64>;
    fn data_type(&self) -> BandDataType;
    fn nodata(&self) -> Option<&[u8]>;

    /// Raw buffer + strides, for zero-copy consumers (numpy, Arrow FFI).
    /// Triggers load for lazy impls.
    fn nd_buffer(&self) -> Result<NdBuffer<'_>>;

    /// Contiguous row-major bytes; copies only if strides are non-standard.
    /// Most RS_* functions use this and never think about strides.
    fn contiguous_data(&self) -> Result<Cow<'_, [u8]>>;
}
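The "copies only if strides are non-standard" behaviour of contiguous_data() can be sketched as below. This is an assumption-laden illustration rather than the planned implementation: `row_major_strides` and `contiguous` are hypothetical free functions, strides are assumed to be non-negative byte strides, and the element size is passed in directly instead of being derived from BandDataType.

```rust
use std::borrow::Cow;

/// Row-major (C-order) byte strides for `shape` with `elem_size`-byte elements.
pub fn row_major_strides(shape: &[u64], elem_size: u64) -> Vec<i64> {
    let mut strides = vec![0i64; shape.len()];
    let mut step = elem_size as i64;
    for (i, dim) in shape.iter().enumerate().rev() {
        strides[i] = step;
        step *= *dim as i64;
    }
    strides
}

/// Return contiguous row-major bytes: borrow when the strides are already
/// standard, otherwise gather the elements into a new buffer.
pub fn contiguous<'a>(
    buffer: &'a [u8],
    shape: &[u64],
    strides: &[i64],
    offset: u64,
    elem_size: usize,
) -> Cow<'a, [u8]> {
    let n = shape.iter().product::<u64>() as usize;
    let start = offset as usize;
    if strides == row_major_strides(shape, elem_size as u64).as_slice() {
        return Cow::Borrowed(&buffer[start..start + n * elem_size]);
    }
    let mut out = Vec::with_capacity(n * elem_size);
    let mut index = vec![0u64; shape.len()];
    for _ in 0..n {
        // Byte position of the current logical index in the strided buffer.
        let rel: i64 = index.iter().zip(strides).map(|(i, s)| *i as i64 * *s).sum();
        let pos = (offset as i64 + rel) as usize;
        out.extend_from_slice(&buffer[pos..pos + elem_size]);
        // Advance the index like an odometer, last dimension fastest.
        for d in (0..shape.len()).rev() {
            index[d] += 1;
            if index[d] < shape[d] {
                break;
            }
            index[d] = 0;
        }
    }
    Cow::Owned(out)
}
```

Returning a Cow keeps the common contiguous case allocation-free while still giving callers a flat row-major slice for strided views.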
Phases
Phase 1 (this issue): N-D schema, NdRasterRef/NdBandRef traits, InMemoryBand, reimplement all 33 SedonaDB RS_* functions against traits, new N-D functions (RS_NumDimensions, RS_DimNames, RS_DimSize, RS_Shape, RS_Slice, RS_DimToBand, RS_BandToDim). Strides always contiguous. Crates: sedona-schema, sedona-raster, sedona-raster-functions.
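To illustrate what promoting a dimension into bands involves, the sketch below handles only the simplest case: a contiguous row-major band whose outermost dimension is the one being promoted, so each new band is a plain sub-slice. `split_outer_dim` is a hypothetical helper; the actual RS_DimToBand would also need to handle inner dimensions and strided input.

```rust
/// Split a contiguous row-major band along its outermost dimension.
/// Returns one (dim_names, shape, bytes) triple per index of that dimension,
/// each borrowing from `data` without copying. Hypothetical helper.
pub fn split_outer_dim<'a>(
    dim_names: &[&'a str],
    shape: &[u64],
    data: &'a [u8],
    elem_size: usize,
) -> Vec<(Vec<&'a str>, Vec<u64>, &'a [u8])> {
    if shape.is_empty() || dim_names.len() != shape.len() {
        return Vec::new();
    }
    // Bytes occupied by one slice of the remaining (inner) dimensions.
    let inner_bytes = shape[1..].iter().product::<u64>() as usize * elem_size;
    if inner_bytes == 0 {
        return Vec::new();
    }
    data.chunks_exact(inner_bytes)
        .take(shape[0] as usize)
        .map(|chunk| (dim_names[1..].to_vec(), shape[1..].to_vec(), chunk))
        .collect()
}
```

For a band shaped [wavelength=2, y=2, x=2] this yields two bands shaped [y=2, x=2], which is the form standard band math expects.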
Phase 2: Zarr I/O. Add ZarrBand (lazy load on first access) and LazySlicedBand (wraps lazy band + slice spec so RS_DimToBand stays lazy). Chunk-level caching inside impls. RS_NormalizedDifference(RS_DimToBand(data, 'wavelength'), 77, 54) loads only the chunks for wavelengths 77 and 54.
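The "lazy load on first access" idea can be sketched with a cell that is filled at most once. `LazyBytes` and its loader closure are hypothetical stand-ins for ZarrBand and its chunk fetch; the real design (chunk-level caching, slice specs, error handling) is not specified here and is not modelled. The sketch assumes Rust 1.70 or later for std::cell::OnceCell.

```rust
use std::cell::OnceCell;

/// Hypothetical lazily loaded byte buffer: the loader runs on first access
/// and the result is cached for later calls.
pub struct LazyBytes<F: Fn() -> Vec<u8>> {
    loader: F,
    cache: OnceCell<Vec<u8>>,
}

impl<F: Fn() -> Vec<u8>> LazyBytes<F> {
    pub fn new(loader: F) -> Self {
        LazyBytes { loader, cache: OnceCell::new() }
    }

    /// True once the data has been fetched.
    pub fn is_loaded(&self) -> bool {
        self.cache.get().is_some()
    }

    /// Fetch on first call, then return the cached bytes.
    pub fn bytes(&self) -> &[u8] {
        self.cache.get_or_init(|| (self.loader)())
    }
}
```

Because nothing is fetched until bytes() is called, bands that a query never touches never trigger I/O, which is the property the wavelength example above relies on.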
Phase 3: N-D aggregations (reduce along a dimension), coordinate label arrays, dimension algebra.
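As a sketch of what reducing along a dimension means for a row-major chunk, the function below sums over one axis of an f64 array and returns the result with that axis removed. `sum_along_axis` is a hypothetical name, and it assumes already-decoded f64 values with `axis` in range; nodata handling and other data types are left out.

```rust
/// Sum a row-major N-D array along `axis`, returning the flattened result
/// whose shape is `shape` with that axis removed. Hypothetical helper.
pub fn sum_along_axis(data: &[f64], shape: &[usize], axis: usize) -> Vec<f64> {
    let outer: usize = shape[..axis].iter().product();
    let len = shape[axis];
    let inner: usize = shape[axis + 1..].iter().product();
    let mut out = vec![0.0; outer * inner];
    for o in 0..outer {
        for k in 0..len {
            for i in 0..inner {
                // Element (o, k, i) of the array viewed as [outer, len, inner].
                out[o * inner + i] += data[(o * len + k) * inner + i];
            }
        }
    }
    out
}
```

Reducing a [time, y, x] band over time this way leaves a [y, x] band, which can then flow into the existing 2D functions.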