Skip to content

fix: return real bucket/table IDs on dev branches (prefix-based model)#464

Open
padak wants to merge 5 commits intokeboola:mainfrom
padak:fix/branch-prefix-stripping-table-bucket-ids
Open

fix: return real bucket/table IDs on dev branches (prefix-based model)#464
padak wants to merge 5 commits intokeboola:mainfrom
padak:fix/branch-prefix-stripping-table-bucket-ids

Conversation

@padak
Copy link
Copy Markdown
Member

@padak padak commented Apr 10, 2026

Summary

On projects using the legacy prefix-based branch model (non-branched-storage), get_buckets, get_table, and get_tables were normalizing bucket/table IDs by stripping the branch prefix (e.g. out.c-15865-test became out.c-test). These normalized IDs don't physically exist in the Storage API and break all downstream tool calls that reference them.

Root cause: _combine_buckets, _get_table, and _list_tables in tools/storage/tools.py were replacing the real id field with prod_id (the stripped version) when returning results for dev-branch-only buckets/tables.

Fix: Stop overwriting id with prod_id in the output. The prod_id field is preserved for internal deduplication logic (dict key in _list_tables for merging prod+dev tables of the same logical table), but the id field returned to the caller is now always the real, physical Storage API ID.

Changes

  • _combine_buckets (dev-only bucket case): removed 'id': dev_bucket.prod_id from model_copy update
  • _get_table: removed 'id': table.prod_id from model_copy update
  • _list_tables (dev bucket tables): removed 'id': table.prod_id from model_copy update
  • Updated 7 test expected values to use real (prefixed) IDs instead of stripped ones

Fixes keboola/keboola_agent_cli#113

Test plan

  • All 51 existing tests in tests/tools/storage/test_tools.py pass
  • test_get_bucket dev-only case now expects real prefixed ID
  • test_get_buckets dev branch case now expects real prefixed ID
  • test_get_table dev branch cases (3 parametrizations) now expect real prefixed IDs
  • test_get_tables dev branch case now expects real prefixed IDs for both emails and assets

@padak
Copy link
Copy Markdown
Member Author

padak commented Apr 10, 2026

Update: Branched Storage Support Added

Added a second commit that extends the fix to also handle the branched storage model (projects where dev branches don't use prefixed bucket/table IDs).

What changed

1. _get_table now handles branched storage correctly

The old logic had a bug for branched storage:

  1. _get_table('out.c-results.output') on a dev branch would find the table, see it has KBC.createdBy.branch.id metadata, and discard it ("not a prod table")
  2. Then try the prefixed fallback out.c-9999999-results.output — which doesn't exist in branched storage
  3. Result: table not found for a table that clearly exists

The fix mirrors how _find_buckets already works — when the first lookup returns a table whose branch_id matches client.branch_id, treat it as the dev table directly:

# Before (discards matching dev table):
if branch_id:
    prod_table = None  # gone forever

# After (saves matching dev table):
if branch_id:
    if branch_id == client.branch_id:
        dev_table = first_result  # branched storage: keep it

Bonus: this also eliminates a redundant API call when querying with a prefixed dev table ID (e.g., in.c-1246948-foo.emails) — previously it would call table_detail twice with the same ID.

2. Six new branched storage tests

Test What it verifies
test_get_bucket_branched_storage (x2) Dev-only bucket + prod bucket detail on branched storage branch
test_get_buckets_branched_storage Listing all buckets, correct counts
test_get_table_branched_storage (x2) Dev table with metadata (fails on main!) + prod table on branch
test_get_tables_branched_storage Listing tables in dev-only bucket

Verified: fails on main, passes on fix

$ pytest -k "branched_storage" (on main branch)
PASSED  test_get_bucket_branched_storage[out.c-results]
PASSED  test_get_bucket_branched_storage[in.c-data]
PASSED  test_get_buckets_branched_storage
FAILED  test_get_table_branched_storage[out.c-results.output]  ← the key one
PASSED  test_get_table_branched_storage[in.c-data.customers]
PASSED  test_get_tables_branched_storage

$ pytest (on fix branch)
57 passed ✓
1139 passed (full suite) ✓

@padak padak force-pushed the fix/branch-prefix-stripping-table-bucket-ids branch from c01bb85 to 85e7e19 Compare April 10, 2026 19:50
@padak
Copy link
Copy Markdown
Member Author

padak commented Apr 13, 2026

Updated integration tests per @Matovidlo's feedback:

Now tests both branch storage models:

  • [prefix-based] — project with storage-branches feature OFF (bucket IDs have c-{branch_id}- prefix)
  • [branched-storage] — project with storage-branches feature ON (bucket IDs identical to production)

Feature detection: Each test run calls tokens/verify and checks owner.features for storage-branches. If the project doesn't match the expected mode, the test is skipped with a clear message.

10 tests total (5 per mode), all passing:

test_get_bucket_by_id_returns_same_id[prefix-based]       PASSED
test_get_bucket_by_id_returns_same_id[branched-storage]    PASSED
test_list_all_buckets_includes_dev_bucket[prefix-based]    PASSED
test_list_all_buckets_includes_dev_bucket[branched-storage] PASSED
test_returned_id_is_api_callable[prefix-based]             PASSED
test_returned_id_is_api_callable[branched-storage]          PASSED
test_get_tables_returns_real_id[prefix-based]               PASSED
test_get_tables_returns_real_id[branched-storage]            PASSED
test_returned_table_id_is_api_callable[prefix-based]        PASSED
test_returned_table_id_is_api_callable[branched-storage]    PASSED

Env vars:

TEST_KBC_TOKEN_PREFIX=xxx    # project with branched storage OFF
TEST_KBC_TOKEN_BRANCHED=yyy  # project with branched storage ON
TEST_KBC_URL=https://connection.keboola.com  # optional, defaults to this

Tests are skipped automatically when env vars are not set.

@padak
Copy link
Copy Markdown
Member Author

padak commented Apr 13, 2026

I've addressed the feedback — moved tests to integtests/, adopted the INTEGTEST_STORAGE_TOKENS pool pattern, added feature detection via tokens/verify, and removed all skipping logic.

I just noticed the CLAUDE.md in the repo — should have read it earlier, apologies. I see the conventions now (Linear issue ID in branch/commits, version bump in every PR, tox for testing, PR template). Since I'm an external contributor without a Linear issue, some of these don't directly apply, but I understand the structure better now.

A few things I'm not sure how to handle from outside your team:

  • Version bump: should I bump pyproject.toml or do you prefer to do that on merge?
  • Linear ID: I don't have one for this fix — is there an existing ticket I should reference?
  • tox: I tested locally with pytest against two real projects (prefix-based + branched-storage, 10/10 pass). Happy to run tox if you point me to the right venv setup.

The core fix is 3 one-line changes in tools.py (stop overwriting id with prod_id). Verified against real projects in both branch models. Let me know what else you need to get this across the line.

Copy link
Copy Markdown
Contributor

@Matovidlo Matovidlo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

padak added 5 commits April 13, 2026 17:05
On projects using the legacy prefix-based branch model (non-branched-storage),
get_buckets and get_tables were normalizing IDs by stripping the branch prefix
(e.g. out.c-15865-test → out.c-test). These normalized IDs don't physically
exist and break downstream tool calls that reference them.

Keep prod_id for internal deduplication but return real Storage API IDs.

Fixes keboola/keboola_agent_cli#113
In the branched storage model, dev branch tables have the same IDs as
production and are distinguished only by KBC.createdBy.branch.id metadata.

The old _get_table logic discarded any table with branch metadata during
the initial lookup, then tried a prefixed ID that doesn't exist in
branched storage — resulting in "table not found" for valid tables.

Fix mirrors _find_buckets pattern: when the first lookup returns a table
whose branch_id matches client.branch_id, treat it as the dev table
directly instead of discarding it.

Also adds 6 branched storage tests covering get_bucket, get_buckets,
get_table, and get_tables for projects without prefix-based IDs.
Runs against a real Keboola project with branched storage OFF (prefix-based
model). Creates a dev branch, bucket, and table, then verifies that
get_buckets and get_tables return real physical IDs — not stripped prod_id
values that would break downstream API calls.

5 tests:
- get_bucket_by_id returns the same ID that was passed in
- list_all_buckets includes dev-branch bucket with real ID
- returned bucket ID works in subsequent API calls (bucket_detail)
- get_tables returns real table ID
- returned table ID works in subsequent API calls (table_detail)

Requires TEST_STORAGE_TOKEN and TEST_STORAGE_API_URL env vars.
Skipped automatically when env vars are not set.
Parametrized tests now run against two real projects:
- prefix-based (branched storage OFF): bucket IDs have c-{branch_id}- prefix
- branched-storage (ON): bucket IDs identical to production

Also verifies project features via token/verify to ensure the test
project matches the expected mode. Uses TEST_KBC_TOKEN_PREFIX and
TEST_KBC_TOKEN_BRANCHED env vars (skipped when not set).

10 tests total (5 per mode), all passing.
Addresses review feedback:
- Moved from tests/ to integtests/ (proper location for integration tests)
- Uses standard INTEGTEST_STORAGE_TOKENS pool instead of custom env vars
- Detects storage-branches feature via tokens/verify to classify project
- No skipping: tests run against whatever project the pool provides
- Removed standalone test_integration_branches.py from tests/

The test creates a dev branch + bucket + table, calls get_buckets/get_tables
via MCP tools, and verifies returned IDs are real physical IDs that work
in subsequent API calls. Both prefix-based and branched-storage models
are covered (depending on which projects are in the pool).
@padak padak force-pushed the fix/branch-prefix-stripping-table-bucket-ids branch from 2dfc305 to 5a7e60d Compare April 13, 2026 15:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants