rust/sedona-spatial-join: Improve memory estimation and serialization for EvaluatedGeometryArray

There is at least one place where we may be double counting a large chunk of memory in the EvaluatedBatch:

```rust
        // NOTE: sometimes `geom_array` will reuse the memory of `batch`, especially when
        // the expression for evaluating the geometry is a simple column reference. In this case,
        // the in_mem_size will be overestimated. It is a conservative estimation so there's no risk
        // of running out of memory because of underestimation.
        let record_batch_size = get_record_batch_memory_size(&self.batch)?;
        let geom_array_size = self.geom_array.in_mem_size()?;
        Ok(record_batch_size + geom_array_size)
```

This might explain an issue we ran into when trying to enable this by default: we determined that we'd need to set the memory pool size to more than twice the memory actually required for a join used in the release post. My reading of that comment is that we would be reserving ~2x as much memory as required for most joins, but I have not investigated.
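To illustrate the suspected double counting, here is a minimal, self-contained sketch. It does not use the actual Arrow or `sedona-spatial-join` types: `Buffer`, `naive_size`, and `deduped_size` are hypothetical stand-ins, and it assumes that shared memory can be detected by comparing allocation pointers. A real fix would need to look at how `geom_array` shares buffers with `batch`.

```rust
use std::collections::HashSet;
use std::sync::Arc;

/// Hypothetical stand-in for a shared, immutable Arrow-style buffer.
type Buffer = Arc<Vec<u8>>;

/// Naive estimate: sums every buffer, so a buffer referenced by both
/// the batch and the geometry array is counted twice.
fn naive_size(batch: &[Buffer], geom: &[Buffer]) -> usize {
    batch.iter().chain(geom.iter()).map(|b| b.len()).sum()
}

/// Deduplicated estimate: counts each distinct allocation once,
/// using the allocation pointer as its identity.
fn deduped_size(batch: &[Buffer], geom: &[Buffer]) -> usize {
    let mut seen = HashSet::new();
    batch
        .iter()
        .chain(geom.iter())
        .filter(|b| seen.insert(Arc::as_ptr(*b)))
        .map(|b| b.len())
        .sum()
}

fn main() {
    // A batch with an id column and a geometry column.
    let ids: Buffer = Arc::new(vec![0u8; 1_000]);
    let wkb: Buffer = Arc::new(vec![0u8; 9_000]);
    let batch = vec![ids, Arc::clone(&wkb)];

    // The geometry expression is a plain column reference, so the
    // evaluated geometry array reuses the batch's geometry buffer.
    let geom = vec![Arc::clone(&wkb)];

    println!("naive:   {}", naive_size(&batch, &geom)); // 19000
    println!("deduped: {}", deduped_size(&batch, &geom)); // 10000
}
```

In this toy case the naive estimate is nearly twice the deduplicated one because the geometry column dominates the batch, which would be consistent with the ~2x reservation described above.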

rust/sedona-spatial-join: Improve memory estimation and serialization for EvaluatedGeometryArray #745
