N-Dimensional Raster Type Extension #746

@james-willis

Problem

SedonaDB's raster type models data as 2D spatial grids (width × height) with bands as a flat list. This can't represent multi-dimensional geospatial datasets such as climate models with time dimensions, hyperspectral imagery, atmospheric profiles with pressure levels, or Zarr/NetCDF datacubes. Users must either flatten their data into 2D+band, losing the dimension semantics, or leave SedonaDB.

Approach

Upgrade each band's data from a 2D tile to an N-D chunk with named dimensions and shape. The band/variable structure is preserved — a Zarr variable or GeoTIFF band maps to a band, but each band can now have shape [time=12, y=256, x=256] instead of just [y=256, x=256]. Legacy rasters load as bands with shape [y, x] — zero change for existing workloads.
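
A minimal sketch of the per-band shape model described above, assuming a plain Rust struct. BandShape and its methods are hypothetical names for illustration, not the proposed SedonaDB types; the point is only that a legacy 2D tile is the special case [y, x] of the same descriptor.

```rust
/// Hypothetical descriptor for one band's N-D chunk (illustrative only).
#[derive(Debug, Clone, PartialEq)]
pub struct BandShape {
    pub dim_names: Vec<String>,
    pub shape: Vec<u64>,
}

impl BandShape {
    /// Build a descriptor from (name, size) pairs, outermost dimension first.
    pub fn new(dims: &[(&str, u64)]) -> Self {
        BandShape {
            dim_names: dims.iter().map(|(n, _)| n.to_string()).collect(),
            shape: dims.iter().map(|(_, s)| *s).collect(),
        }
    }

    /// A legacy 2D tile becomes a band with dimensions [y, x].
    pub fn from_legacy_2d(height: u64, width: u64) -> Self {
        Self::new(&[("y", height), ("x", width)])
    }

    /// Size of a named dimension, or None if the band does not have it.
    pub fn dim_size(&self, name: &str) -> Option<u64> {
        self.dim_names
            .iter()
            .position(|n| n == name)
            .map(|i| self.shape[i])
    }

    /// Total number of elements in the chunk.
    pub fn num_elements(&self) -> u64 {
        self.shape.iter().product()
    }
}
```

With this sketch, BandShape::new(&[("time", 12), ("y", 256), ("x", 256)]) and BandShape::from_legacy_2d(256, 256) go through the same code path.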

Key decisions

  1. Band = variable, each band is N-D — Zarr's temperature, pressure, wind_u become 3 bands, each an N-D chunk. GeoTIFF bands map directly. Band math (in[0], in[1]) unchanged.

  2. Named dimensions per band — Each band stores dim_names + shape. y/x (or lat/lon) are the spatial axes with hard-coded meaning; all others (time, wavelength, pressure, ...) are arbitrary. Bands may have different dimension sets but must agree on shared dimension sizes.

  3. RS_DimToBand / RS_BandToDim — Bridge between "everything is a dimension" (Zarr) and the band model. RS_DimToBand(raster, 'wavelength') promotes a within-band dimension into separate bands so standard band math works.

  4. Two execution paths — Native impls for metadata, coordinate conversion, predicates, and new N-D functions. GDAL-backed impls for
    compute-heavy spatial ops (clip, zonal stats, map algebra) — these extract y/x slices and operate per spatial slice.

  5. Single schema — Legacy 2D schema retired. All loaders produce N-D layout directly. No runtime schema detection needed.

  6. Trait-based band storage: NdBandRef trait with nd_buffer() (returns raw buffer + shape + strides for zero-copy access) and
    contiguous_data() (flat bytes, copies only if strided). Implementations: InMemoryBand (Phase 1), ZarrBand + LazySlicedBand (Phase 2), GeoTiffBand (Phase 2/3). Strided views are just InMemoryBand with non-standard strides — Arrow BinaryView refcounting handles lifetime.

  7. Affine transform — Single transform: List<Float64> (GDAL GeoTransform convention) at raster level. Applies to y/x dims only.

  8. OutDb references — Single outdb_uri field per band with scheme-based dispatch (zarr://..., geotiff://...).
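
The scheme-based dispatch in decision 8 could look roughly like the sketch below. OutDbBackend and dispatch_outdb are hypothetical names, and returning None for unknown schemes is an assumption; the issue only specifies that the scheme of outdb_uri selects the backend.

```rust
/// Hypothetical backend selector derived from the outdb_uri scheme.
#[derive(Debug, PartialEq)]
pub enum OutDbBackend<'a> {
    /// Everything after "zarr://", e.g. "s3://bucket/store#var/0.0.0".
    Zarr(&'a str),
    /// Everything after "geotiff://".
    GeoTiff(&'a str),
}

/// Split "scheme://rest" at the first "://" and dispatch on the scheme.
/// Returns None when the scheme is missing or not recognised.
pub fn dispatch_outdb(uri: &str) -> Option<OutDbBackend<'_>> {
    let (scheme, rest) = uri.split_once("://")?;
    match scheme {
        "zarr" => Some(OutDbBackend::Zarr(rest)),
        "geotiff" => Some(OutDbBackend::GeoTiff(rest)),
        _ => None,
    }
}
```

Splitting at the first "://" keeps a nested location such as "s3://bucket/store" intact for the backend to interpret.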

Arrow schema

Struct {
  crs:       Utf8View,
  transform: List<Float64>,     -- [origin_x, scale_x, skew_x, origin_y, skew_y, scale_y]
  bands: List<Struct {
    name:      Utf8,            -- e.g. "temperature" (nullable)
    dim_names: List<Utf8>,      -- ["time", "y", "x"]
    shape:     List<UInt64>,    -- [12, 256, 256]
    data_type: UInt32,
    nodata:    Binary,
    strides:   List<Int64>,     -- per-dim byte strides
    offset:    UInt64,
    outdb_uri: Utf8,            -- "zarr://s3://bucket/store#var/0.0.0" (nullable)
    data:      BinaryView,      -- row-major N-D array (eager) or empty (lazy)
  }>
}
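
For reference, a sketch of how the six transform coefficients map an (x, y) pixel index to world coordinates, using the element order given in the schema comment above. pixel_to_world is a hypothetical helper, and only the y/x dimensions take part; other dimensions such as time are unaffected.

```rust
/// Apply a GDAL-style affine transform to a pixel position.
/// `transform` is [origin_x, scale_x, skew_x, origin_y, skew_y, scale_y];
/// `col` indexes the x dimension and `row` indexes the y dimension.
pub fn pixel_to_world(transform: &[f64; 6], col: f64, row: f64) -> (f64, f64) {
    let [origin_x, scale_x, skew_x, origin_y, skew_y, scale_y] = *transform;
    let world_x = origin_x + col * scale_x + row * skew_x;
    let world_y = origin_y + col * skew_y + row * scale_y;
    (world_x, world_y)
}
```

For a north-up raster the skew terms are zero and scale_y is negative, so increasing row moves south.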

Core traits

pub struct NdBuffer<'a> {
    pub buffer: &'a [u8],
    pub shape: &'a [u64],
    pub strides: &'a [i64],
    pub offset: u64,
    pub data_type: BandDataType,
}

pub trait NdBandRef {
    fn ndim(&self) -> usize;
    fn dim_names(&self) -> &[&str];
    fn shape(&self) -> &[u64];
    fn dim_size(&self, name: &str) -> Option<u64>;
    fn data_type(&self) -> BandDataType;
    fn nodata(&self) -> Option<&[u8]>;

    /// Raw buffer + strides — for zero-copy consumers (numpy, Arrow FFI).
    /// Triggers load for lazy impls.
    fn nd_buffer(&self) -> Result<NdBuffer<'_>>;

    /// Contiguous row-major bytes — copies only if strides are non-standard.
    /// Most RS_* functions use this and never think about strides.
    fn contiguous_data(&self) -> Result<Cow<'_, [u8]>>;
}
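
A sketch of the "copies only if strides are non-standard" rule behind contiguous_data(), written as free functions over raw slices so it stands alone. row_major_strides and contiguous_bytes are hypothetical helpers; the sketch assumes byte strides and a byte offset, and that every computed position lands inside the buffer.

```rust
use std::borrow::Cow;

/// Row-major byte strides for `shape` with `elem_size`-byte elements.
pub fn row_major_strides(shape: &[u64], elem_size: usize) -> Vec<i64> {
    let mut strides = vec![0i64; shape.len()];
    let mut step = elem_size as i64;
    for i in (0..shape.len()).rev() {
        strides[i] = step;
        step *= shape[i] as i64;
    }
    strides
}

/// Borrow the buffer when strides are already row-major, else gather a copy.
pub fn contiguous_bytes<'a>(
    buffer: &'a [u8],
    shape: &[u64],
    strides: &[i64],
    offset: u64,
    elem_size: usize,
) -> Cow<'a, [u8]> {
    let n = shape.iter().product::<u64>() as usize;
    let start = offset as usize;
    if strides == row_major_strides(shape, elem_size).as_slice() {
        return Cow::Borrowed(&buffer[start..start + n * elem_size]);
    }
    let mut out = Vec::with_capacity(n * elem_size);
    let mut index = vec![0u64; shape.len()];
    for _ in 0..n {
        // Byte position of the current element in the strided view.
        let rel: i64 = index.iter().zip(strides).map(|(&i, &s)| i as i64 * s).sum();
        let pos = (offset as i64 + rel) as usize;
        out.extend_from_slice(&buffer[pos..pos + elem_size]);
        // Advance the N-D index like an odometer, last dimension fastest.
        for d in (0..shape.len()).rev() {
            index[d] += 1;
            if index[d] < shape[d] {
                break;
            }
            index[d] = 0;
        }
    }
    Cow::Owned(out)
}
```

A transposed view (shape [3, 2], strides [1, 3] over a 2×3 byte buffer) takes the copying path, while the original layout is returned as a borrow.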

Phases

Phase 1 (this issue): N-D schema, NdRasterRef/NdBandRef traits, InMemoryBand, reimplement all 33 SedonaDB RS_* functions against traits, new N-D functions (RS_NumDimensions, RS_DimNames, RS_DimSize, RS_Shape, RS_Slice, RS_DimToBand, RS_BandToDim). Strides always contiguous. Crates: sedona-schema, sedona-raster, sedona-raster-functions.
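
To make the RS_DimToBand semantics concrete, here is an eager sketch that splits one contiguous row-major band along a named dimension. Band and dim_to_band are hypothetical stand-ins, not the planned signatures, and the element size is passed explicitly instead of being derived from a data type.

```rust
/// Hypothetical in-memory band: contiguous row-major bytes plus named dims.
#[derive(Debug, Clone)]
pub struct Band {
    pub dim_names: Vec<String>,
    pub shape: Vec<u64>,
    pub data: Vec<u8>,
}

/// Promote dimension `dim` into separate bands, one per index along it.
/// Returns None if the band has no dimension with that name.
pub fn dim_to_band(band: &Band, dim: &str, elem_size: usize) -> Option<Vec<Band>> {
    let axis = band.dim_names.iter().position(|n| n == dim)?;
    // Number of blocks before the axis, entries along it, bytes after it.
    let outer = band.shape[..axis].iter().product::<u64>() as usize;
    let count = band.shape[axis] as usize;
    let inner_bytes = band.shape[axis + 1..].iter().product::<u64>() as usize * elem_size;

    let mut names = band.dim_names.clone();
    names.remove(axis);
    let mut shape = band.shape.clone();
    shape.remove(axis);

    let bands = (0..count)
        .map(|k| {
            let mut data = Vec::with_capacity(outer * inner_bytes);
            for o in 0..outer {
                let start = (o * count + k) * inner_bytes;
                data.extend_from_slice(&band.data[start..start + inner_bytes]);
            }
            Band { dim_names: names.clone(), shape: shape.clone(), data }
        })
        .collect();
    Some(bands)
}
```

Each output band drops the promoted dimension from both dim_names and shape, so a [wavelength, y, x] band yields bands of shape [y, x] that existing band math can consume.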

Phase 2: Zarr I/O. Add ZarrBand (lazy load on first access) and LazySlicedBand (wraps lazy band + slice spec so RS_DimToBand stays lazy). Chunk-level caching inside impls. RS_NormalizedDifference(RS_DimToBand(data, 'wavelength'), 77, 54) loads only the chunks for wavelengths 77 and 54.
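
The "lazy load on first access" behavior can be sketched with std::cell::OnceCell. LazyBand, its loader closure, and load_count are hypothetical and exist only to show the pattern; this version caches the whole band once, whereas the proposal calls for chunk-level caching.

```rust
use std::cell::{Cell, OnceCell};

/// Hypothetical lazy band: bytes are fetched by `loader` on first access.
pub struct LazyBand<F: Fn() -> Vec<u8>> {
    loader: F,
    cache: OnceCell<Vec<u8>>,
    loads: Cell<u32>,
}

impl<F: Fn() -> Vec<u8>> LazyBand<F> {
    pub fn new(loader: F) -> Self {
        LazyBand { loader, cache: OnceCell::new(), loads: Cell::new(0) }
    }

    /// Return the band bytes, running the loader only on the first call.
    pub fn data(&self) -> &[u8] {
        self.cache.get_or_init(|| {
            self.loads.set(self.loads.get() + 1);
            (self.loader)()
        })
    }

    /// How many times the loader has actually run (for illustration).
    pub fn load_count(&self) -> u32 {
        self.loads.get()
    }
}
```

Metadata calls that never touch data() would leave load_count() at zero, which is the property that lets a wrapper like LazySlicedBand defer I/O.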

Phase 3: N-D aggregations (reduce along a dimension), coordinate label arrays, dimension algebra.
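
As an illustration of "reduce along a dimension", the sketch below sums a row-major f64 array along one axis. sum_along_axis is a hypothetical helper; it ignores nodata values and works on decoded f64 values rather than raw band bytes, both of which a real implementation would need to handle.

```rust
/// Sum a row-major N-D array along `axis`.
/// Returns the reduced values and the shape with that axis removed.
pub fn sum_along_axis(data: &[f64], shape: &[usize], axis: usize) -> (Vec<f64>, Vec<usize>) {
    let outer: usize = shape[..axis].iter().product();
    let count = shape[axis];
    let inner: usize = shape[axis + 1..].iter().product();

    let mut out = vec![0.0; outer * inner];
    for o in 0..outer {
        for k in 0..count {
            for i in 0..inner {
                // Element (o, k, i) of the array viewed as [outer, count, inner].
                out[o * inner + i] += data[(o * count + k) * inner + i];
            }
        }
    }

    let mut out_shape = shape.to_vec();
    out_shape.remove(axis);
    (out, out_shape)
}
```

Reducing a [time, y, x] band along axis 0 this way yields a [y, x] result, which is again a legacy-shaped band.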
