fix(sqllab): require dataset match for raw query access#10
Open
hbrooks wants to merge 16 commits into
Open
Conversation
In raise_for_access, the SQL Lab branch let catalog_access and schema_access short-circuit the per-table dataset-existence check. The chart-data branch right next door requires can_access_schema on a registered SqlaTable, so the SQL Lab path was strictly broader than the chart-data path for the same permission. Tighten the SQL Lab path so every referenced table must resolve to a Superset dataset the user holds datasource_access on (or owns). The table-only call sites (/table_metadata/, /select_star/, MetaDB) keep the historical catalog_access / schema_access fallthrough since they return schema metadata only, not row data. Two existing tests updated to mock query_datasources_by_name so they exercise the new dataset-match path; one regression test added. Backwards compatibility: deployments that used schema_access as a deliberate "full SQL Lab access to one schema" grant will start being denied. Mitigation: grant database_access, or register the relevant tables as Superset datasets. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…t_match)
Code review of the previous revision flagged that `query is not None`
was too broad as a discriminator: it caught chart-data on saved-query
datasources, dataset create/update from SQL, SQL Lab results/export,
and explore on saved queries, all of which were schema_access-by-design
flows. It also missed MetaDB, which uses the table-only call path but
actually returns row data.
Replace the inferred flag with an explicit kwarg
`force_dataset_match: bool = False` on `raise_for_access`. Default
(False) preserves the historical catalog_access / schema_access
fallthrough for chart-data, dataset CRUD, table_metadata, select_star,
explore, etc. The three call sites that actually execute or return raw
row data set it to True:
- superset/sqllab/validators.py (SQL Lab execute pre-flight)
- superset/models/sql_lab.py (Query.raise_for_access, used by
results.py / export.py /
streaming_export_command.py)
- superset/extensions/metadb.py (closes the same bypass on the
database+table path)
Tests:
- Reverts the broad updates to test_raise_for_access_query /
test_raise_for_access_sql so they exercise the default path again.
- Drops the vacuous regression test that never granted schema_access
in the first place.
- Adds three focused tests under force_dataset_match=True:
* allows when a dataset exists and datasource_access matches
* denies when only schema_access is held and no dataset exists
* (default path) still allows schema_access for non-dataset
tables, proving backwards compatibility.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Code review on the previous revision flagged two related ways an authenticated user could still defeat the new force_dataset_match gate. Both close the same authorization gap and are confirmed. 1. SqlaTable.query_datasources_by_name drops the schema filter when schema is None (see connectors/sqla/models.py:1938 `if schema:`). On a backend whose default_schema_for_query() returns None and a parser output that did not include a schema, the lookup matches SqlaTables across every schema. The engine then resolves the actual query against the wrong schema via search_path, while the dataset-match check passed against an unrelated row. Fix: under force_dataset_match=True, refuse tables whose qualified schema is None. Users must pick a schema in the SQL Lab schema picker or qualify the table explicitly. 2. sqlglot represents statements it cannot fully model as exp.Command. extract_tables_from_statement returns set() for an exp.Command with no parseable inner literal (e.g. CALL stored_proc() that builds dynamic SQL internally). The per-table check below then sees zero tables and the script proceeds, even though the procedure may read arbitrary tables at execution time. Fix: add SQLScript.has_unparseable_statement; under force_dataset_match=True, refuse any script that contains one. The default (chart-data / dataset CRUD / metadata previews) is unchanged. Two new regression tests cover both vectors. Existing tests for the non-strict path are untouched. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
… + cover unparseable check - security/manager.py: apply default schema in the table branch of raise_for_access so MetaDB's 2-part URIs (db.table) keep working under force_dataset_match; the strict-deny still fires when no default can be resolved. - tests/integration_tests/security_tests.py: pin the new behavior. - tests/unit_tests/sql/parse_tests.py: cover has_unparseable_statement to reach 100% sql/ coverage (parameterized: SELECT, CALL, mixed, KQL). - tests/integration_tests/sqllab_tests.py: update test_sql_json_schema_access to reflect that schema_access alone no longer grants raw SQL Lab access (the legacy fallthrough this PR closes). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…' into fix/sqllab-dataset-scoped-access
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Originally PR apache#40409 in apache/superset by @sha174n