CompileException: Unknown variable or type "org.apache.spark.sql.sedona_sql.UDT.GeometryUDT$$.MODULE$" with Sedona 1.8.1 on Databricks 17.3 LTS #2820

@ilyaavgul-tomtom

Problem

After upgrading from Databricks Runtime 16.4 to 17.3 LTS, Spark whole-stage codegen fails with:

org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 45, Column 66: 
Unknown variable or type "org.apache.spark.sql.sedona_sql.UDT.GeometryUDT$$.MODULE$"

The same cluster config, jars, and application code work without issues on DBR 16.4.

Setup

  • Sedona: 1.8.1 (sedona-spark-shaded-4.0_2.13-1.8.1.jar + geotools-wrapper-1.8.1-33.1.jar) installed via init script to /databricks/jars/
  • Scala: 2.13.15
  • Spark config:
spark.driver.maxResultSize 8g
spark.databricks.delta.preview.enabled true
spark.network.timeout 600s
spark.shuffle.io.retryWait 30s
spark.cleaner.referenceTracking.cleanCheckpoints true
spark.sql.shuffle.partitions auto

Reproduction

The error triggers when calling .map() on a DataFrame produced by Adapter.toDf() after a spatial join via the RDD API:

import org.apache.sedona.core.spatialOperator.{JoinQuery, SpatialPredicate}
import org.apache.sedona.sql.utils.Adapter
import org.locationtech.jts.geom.Geometry

val joinResult = JoinQuery.SpatialJoinQueryFlat(blueSpatialRdd, redSpatialRdd, true, SpatialPredicate.INTERSECTS)
val df = Adapter.toDf(joinResult, Seq("redId"), Seq("blueId"), spark)

// This triggers whole-stage codegen which fails on DBR 17.3 LTS
df.map(row => row.getAs[Geometry]("leftgeometry")).collect()

DBR 16.4: works
DBR 17.3 LTS: fails with Unknown variable or type "org.apache.spark.sql.sedona_sql.UDT.GeometryUDT$$.MODULE$"
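
One way to narrow this down is to check which of the class names involved can be loaded on the cluster. The following is a minimal diagnostic sketch, not a fix: `UdtClassCheck` and `isLoadable` are hypothetical names introduced here, and it assumes the Sedona jars are visible to the thread's context class loader. If `GeometryUDT` and `GeometryUDT$` resolve but `GeometryUDT$$` does not, the jar is probably on the classpath and the generated code is referencing a name that does not exist, rather than the jar being missing.

```scala
import scala.util.Try

// Hypothetical helper (not part of Sedona or Spark): reports which candidate
// class names can be loaded without initializing them.
object UdtClassCheck {
  // True if `name` resolves via the current thread's context class loader.
  def isLoadable(name: String): Boolean =
    Try(Class.forName(name, false, Thread.currentThread().getContextClassLoader)).isSuccess

  def main(args: Array[String]): Unit = {
    val base = "org.apache.spark.sql.sedona_sql.UDT.GeometryUDT"
    // base       : the UDT class itself
    // base + "$"  : the usual JVM name of a Scala companion object, if one exists
    // base + "$$" : the name that appears in the CompileException
    Seq(base, base + "$", base + "$$").foreach { name =>
      println(s"$name -> ${isLoadable(name)}")
    }
  }
}
```

In a notebook, the body of `main` can be run directly in a cell on the driver. Comparing the output on DBR 16.4 and DBR 17.3 LTS would show whether the classpath differs between the two runtimes.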