chore(python/sedonadb): Add geography integration test framework #756
Open
paleolimbot wants to merge 13 commits into apache:main
Pull request overview
This PR introduces a geography integration test framework for SedonaDB’s Python package, including a new BigQuery-backed engine (via ADBC) with a YAML-based query-result cache to avoid repeated slow BigQuery roundtrips.
Changes:
- Added a `BigQueryDBEngine` implementation plus a YAML-backed `ArrowSQLCache` for cached Arrow IPC results.
- Added new geography-focused integration test modules (accessors/measures/predicates/transformations/constructors) that run across SedonaDB/PostGIS/BigQuery where applicable.
- Added a checked-in BigQuery cache file and updated Python test extras to include `pyyaml`; removed the old `ST_GeogPoint` test from the generic functions suite.
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| python/sedonadb/python/sedonadb/testing.py | Adds BigQuery engine and ArrowSQLCache YAML cache support. |
| python/sedonadb/tests/geography/test_geog_accessors.py | New geography accessor tests (e.g., ST_Area) across engines. |
| python/sedonadb/tests/geography/test_geog_measures.py | New geography measure tests (e.g., ST_Distance) across engines. |
| python/sedonadb/tests/geography/test_geog_predicates.py | New geography predicate tests (e.g., ST_Intersects) across engines. |
| python/sedonadb/tests/geography/test_geog_transformations.py | New geography transformation tests (e.g., ST_Centroid) across engines. |
| python/sedonadb/tests/geography/test_constructors_parsers_formatters.py | New geography constructor/formatter tests (e.g., ST_GeogPoint, ST_AsBinary). |
| python/sedonadb/tests/geography/bigquery_cache.yml | Adds cached BigQuery query results to run tests without a live BigQuery connection. |
| python/sedonadb/tests/functions/test_functions.py | Removes the older test_st_geogpoint from the generic functions tests. |
| python/sedonadb/pyproject.toml | Adds pyyaml to the test optional dependency group. |
This PR adds a framework for geography integration tests. Specifically, it adds one test file for each category (the categories follow the BigQuery ones listed here: https://docs.cloud.google.com/bigquery/docs/reference/standard-sql/geography_functions ) and, critically, adds a `BigQuery` engine to test against. PostGIS only supports a subset of the functions we implement now, and BigQuery uses the same underlying library, so in theory its results should match ours more closely.
For BigQuery, the engine uses application default credentials (i.e., `gcloud login`) and caches the responses by query in a YAML file. This is a bit crude but is needed because a single BigQuery roundtrip is very slow. We may need something that scales better at some point, but this should get us through the basic suite of "is all of this plugged in and can we add regression tests".
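For readers who want a feel for the caching idea, here is a minimal sketch of a file-backed cache keyed by query text. It is not the `ArrowSQLCache` from this PR: the class name `QueryResultCache`, the `get_or_fetch` method, and the `slow_fetch` stand-in are all hypothetical, and it writes JSON from the standard library instead of YAML so that it runs without extra dependencies. The payload is treated as opaque bytes (in the PR these would be Arrow IPC results).

```python
import base64
import json
import tempfile
from pathlib import Path
from typing import Callable, Dict


class QueryResultCache:
    """Hypothetical file-backed cache mapping SQL text to result bytes."""

    def __init__(self, path: Path):
        self._path = Path(path)
        self._entries: Dict[str, str] = {}
        # Load previously cached results if the cache file already exists.
        if self._path.exists():
            self._entries = json.loads(self._path.read_text())

    def get_or_fetch(self, query: str, fetch: Callable[[str], bytes]) -> bytes:
        key = query.strip()
        if key not in self._entries:
            # Cache miss: run the slow roundtrip once and persist the result.
            payload = fetch(query)
            self._entries[key] = base64.b64encode(payload).decode("ascii")
            self._path.write_text(
                json.dumps(self._entries, indent=2, sort_keys=True)
            )
        return base64.b64decode(self._entries[key])


calls = []


def slow_fetch(query: str) -> bytes:
    # Stand-in for a remote roundtrip; records each time it is invoked.
    calls.append(query)
    return b"fake-arrow-ipc-bytes"


with tempfile.TemporaryDirectory() as tmp:
    cache_path = Path(tmp) / "cache.json"
    # The first lookup calls slow_fetch and writes the cache file.
    first = QueryResultCache(cache_path).get_or_fetch("SELECT 1", slow_fetch)
    # A fresh instance reads the file, so slow_fetch is not called again.
    second = QueryResultCache(cache_path).get_or_fetch("SELECT 1", slow_fetch)
```

Checking the cache file in means a test run without credentials only ever takes the cache-hit path. The tradeoff is that any change to the query text produces a new key and needs a live roundtrip to populate it.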