There is at least one place where we may be double counting a large chunk of memory in the EvaluatedBatch:
// NOTE: sometimes `geom_array` will reuse the memory of `batch`, especially when
// the expression for evaluating the geometry is a simple column reference. In this case,
// the in_mem_size will be overestimated. It is a conservative estimation so there's no risk
// of running out of memory because of underestimation.
let record_batch_size = get_record_batch_memory_size(&self.batch)?;
let geom_array_size = self.geom_array.in_mem_size()?;
Ok(record_batch_size + geom_array_size)
This might explain an issue we ran across when trying to enable this by default: we determined we would need to set the memory pool size to more than twice the memory actually required for a join used in the release post. (My reading of that comment is that we would be reserving roughly 2x as much memory as required for most joins, but I have not investigated.)
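To illustrate the double-counting mechanism: when `geom_array` is a zero-copy view over a column already owned by `batch`, summing the two sizes counts the shared allocation twice. A minimal sketch, using hypothetical `Buffer` holders (stand-ins for the Arrow buffers inside `batch` and `geom_array`, not the actual types) and deduplicating by allocation pointer:

```rust
use std::collections::HashSet;
use std::sync::Arc;

// Hypothetical stand-in for a buffer holder; both the batch column and the
// geometry array may hold an Arc to the same underlying allocation.
struct Buffer(Arc<Vec<u8>>);

// Mirrors the current estimation: sum every holder's size, so memory
// shared between `batch` and `geom_array` is counted once per holder.
fn naive_size(bufs: &[&Buffer]) -> usize {
    bufs.iter().map(|b| b.0.len()).sum()
}

// One possible fix: count each distinct allocation once, keyed by its
// data pointer, so a zero-copy column reference adds nothing extra.
fn dedup_size(bufs: &[&Buffer]) -> usize {
    let mut seen = HashSet::new();
    bufs.iter()
        .filter(|b| seen.insert(Arc::as_ptr(&b.0)))
        .map(|b| b.0.len())
        .sum()
}

fn main() {
    let shared = Arc::new(vec![0u8; 1024]);
    let batch_col = Buffer(Arc::clone(&shared)); // column owned by the batch
    let geom_array = Buffer(Arc::clone(&shared)); // geometry array reusing it
    assert_eq!(naive_size(&[&batch_col, &geom_array]), 2048); // ~2x overestimate
    assert_eq!(dedup_size(&[&batch_col, &geom_array]), 1024); // counted once
    println!("naive={}", naive_size(&[&batch_col, &geom_array]));
}
```

If this matches what happens in `EvaluatedBatch`, a simple column reference would make the estimate roughly double the true footprint, which lines up with the ~2x memory pool observation above.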