NPE in ST_Union_Aggr in Spark #2816

@umartin

Description

Steps to reproduce:

spark.sql("""
select st_union_aggr(geom) from values (st_point(1,1)), (st_point(2,2)), (null) as t(geom)
""").show()

Result:

ERROR Executor: Exception in task 0.0 in stage 20.0 (TID 23)
java.lang.NullPointerException: Cannot invoke "org.locationtech.jts.geom.Geometry.apply(org.locationtech.jts.geom.GeometryFilter)" because "geom" is null
	at org.locationtech.jts.operation.union.InputExtracter.add(InputExtracter.java:136)
	at org.locationtech.jts.operation.union.InputExtracter.add(InputExtracter.java:128)
	at org.locationtech.jts.operation.union.InputExtracter.extract(InputExtracter.java:49)
	at org.locationtech.jts.operation.union.UnaryUnionOp.extract(UnaryUnionOp.java:156)
	at org.locationtech.jts.operation.union.UnaryUnionOp.<init>(UnaryUnionOp.java:137)
	at org.locationtech.jts.operation.overlayng.OverlayNGRobust.union(OverlayNGRobust.java:82)
	at org.apache.spark.sql.sedona_sql.expressions.ST_Union_Aggr.finish(AggregateFunctions.scala:88)
	at org.apache.spark.sql.sedona_sql.expressions.ST_Union_Aggr.finish(AggregateFunctions.scala:57)
	at org.apache.spark.sql.execution.aggregate.ScalaAggregator.eval(udaf.scala:532)
	at org.apache.spark.sql.catalyst.expressions.aggregate.TypedImperativeAggregate.eval(interfaces.scala:594)
	at org.apache.spark.sql.execution.aggregate.AggregationIterator.$anonfun$generateResultProjection$5(AggregationIterator.scala:260)
	at org.apache.spark.sql.execution.aggregate.ObjectAggregationIterator.next(ObjectAggregationIterator.scala:100)
	at org.apache.spark.sql.execution.aggregate.ObjectAggregationIterator.next(ObjectAggregationIterator.scala:35)
	at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:403)
	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:901)
	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:901)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:374)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:338)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93)
	at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:171)
	at org.apache.spark.scheduler.Task.run(Task.scala:147)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$5(Executor.scala:647)
	at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:80)
	at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:77)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:99)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:650)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(…)
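
The trace shows the null reaching JTS directly: InputExtracter.add is handed the null geometry that ST_Union_Aggr.finish passed through to OverlayNGRobust.union. Until the aggregate skips null inputs itself, a possible workaround (a sketch against the same repro query, not a fix in Sedona) is to filter null geometries before aggregating:

spark.sql("""
select st_union_aggr(geom) from values (st_point(1,1)), (st_point(2,2)), (null) as t(geom)
where geom is not null
""").show()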
