Skip to content

fix(sqllab): require dataset match for raw query access#10

Open
hbrooks wants to merge 16 commits into
masterfrom
demo/pr-40409
Open

fix(sqllab): require dataset match for raw query access#10
hbrooks wants to merge 16 commits into
masterfrom
demo/pr-40409

Conversation

@hbrooks

@hbrooks hbrooks commented May 28, 2026

Copy link
Copy Markdown

Originally PR apache#40409 in apache/superset by @sha174n

sha174n and others added 16 commits May 24, 2026 20:50
In raise_for_access, the SQL Lab branch let catalog_access and
schema_access short-circuit the per-table dataset-existence check.
The chart-data branch right next door requires can_access_schema
on a registered SqlaTable, so the SQL Lab path was strictly broader
than the chart-data path for the same permission.

Tighten the SQL Lab path so every referenced table must resolve to
a Superset dataset the user holds datasource_access on (or owns).
The table-only call sites (/table_metadata/, /select_star/, MetaDB)
keep the historical catalog_access / schema_access fallthrough since
they return schema metadata only, not row data.

Two existing tests updated to mock query_datasources_by_name so they
exercise the new dataset-match path; one regression test added.

Backwards compatibility: deployments that used schema_access as a
deliberate "full SQL Lab access to one schema" grant will start
being denied. Mitigation: grant database_access, or register the
relevant tables as Superset datasets.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…t_match)

Code review of the previous revision flagged that `query is not None`
was too broad as a discriminator: it caught chart-data on saved-query
datasources, dataset create/update from SQL, SQL Lab results/export,
and explore on saved queries, all of which were schema_access-by-design
flows. It also missed MetaDB, which uses the table-only call path but
actually returns row data.

Replace the inferred flag with an explicit kwarg
`force_dataset_match: bool = False` on `raise_for_access`. Default
(False) preserves the historical catalog_access / schema_access
fallthrough for chart-data, dataset CRUD, table_metadata, select_star,
explore, etc. The three call sites that actually execute or return raw
row data set it to True:

  - superset/sqllab/validators.py   (SQL Lab execute pre-flight)
  - superset/models/sql_lab.py      (Query.raise_for_access, used by
                                     results.py / export.py /
                                     streaming_export_command.py)
  - superset/extensions/metadb.py   (closes the same bypass on the
                                     database+table path)

Tests:
  - Reverts the broad updates to test_raise_for_access_query /
    test_raise_for_access_sql so they exercise the default path again.
  - Drops the vacuous regression test that never granted schema_access
    in the first place.
  - Adds three focused tests under force_dataset_match=True:
      * allows when a dataset exists and datasource_access matches
      * denies when only schema_access is held and no dataset exists
      * (default path) still allows schema_access for non-dataset
        tables, proving backwards compatibility.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Code review on the previous revision flagged two related ways an
authenticated user could still defeat the new force_dataset_match
gate. Both close the same authorization gap and are confirmed.

1. SqlaTable.query_datasources_by_name drops the schema filter when
   schema is None (see connectors/sqla/models.py:1938 `if schema:`).
   On a backend whose default_schema_for_query() returns None and a
   parser output that did not include a schema, the lookup matches
   SqlaTables across every schema. The engine then resolves the
   actual query against the wrong schema via search_path, while the
   dataset-match check passed against an unrelated row.

   Fix: under force_dataset_match=True, refuse tables whose qualified
   schema is None. Users must pick a schema in the SQL Lab schema
   picker or qualify the table explicitly.

2. sqlglot represents statements it cannot fully model as exp.Command.
   extract_tables_from_statement returns set() for an exp.Command with
   no parseable inner literal (e.g. CALL stored_proc() that builds
   dynamic SQL internally). The per-table check below then sees zero
   tables and the script proceeds, even though the procedure may read
   arbitrary tables at execution time.

   Fix: add SQLScript.has_unparseable_statement; under
   force_dataset_match=True, refuse any script that contains one. The
   default (chart-data / dataset CRUD / metadata previews) is unchanged.

Two new regression tests cover both vectors. Existing tests for the
non-strict path are untouched.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
… + cover unparseable check

- security/manager.py: apply default schema in the table branch of
  raise_for_access so MetaDB's 2-part URIs (db.table) keep working
  under force_dataset_match; the strict-deny still fires when no
  default can be resolved.
- tests/integration_tests/security_tests.py: pin the new behavior.
- tests/unit_tests/sql/parse_tests.py: cover has_unparseable_statement
  to reach 100% sql/ coverage (parameterized: SELECT, CALL, mixed, KQL).
- tests/integration_tests/sqllab_tests.py: update test_sql_json_schema_access
  to reflect that schema_access alone no longer grants raw SQL Lab access
  (the legacy fallthrough this PR closes).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants