Skip to content

[AMORO-4163][ams] Fix Version-N-already-exists race in newTableOperations for legacy mixed-iceberg#4185

Merged
czy006 merged 1 commit intoapache:masterfrom
lintingbin:fix/legacy-mixed-iceberg-commit-on-load
Apr 23, 2026
Merged

[AMORO-4163][ams] Fix Version-N-already-exists race in newTableOperations for legacy mixed-iceberg#4185
czy006 merged 1 commit intoapache:masterfrom
lintingbin:fix/legacy-mixed-iceberg-commit-on-load

Conversation

@lintingbin
Copy link
Copy Markdown
Contributor

@lintingbin lintingbin commented Apr 16, 2026

What changes were proposed in this pull request?

Follow-up fix for AMORO-4163. The original fix changed ops.current() to ops.refresh() inside InternalMixedIcebergHandler.newTableOperations() so that the returned TableMetadata reference satisfies Iceberg 1.7.x's reference-equality check in commit(). However, this introduced a new race condition:

DefaultTableManager.createTable() consists of two sequential steps:

  1. catalog.createTable() — saves the table to the database (the table is now visible to all threads)
  2. onTableCreated()triggerTableAdded()newTableOperations()ops.commit() — tries to write v2.metadata.json

The background tableExplorerScheduler runs exploreInternalCatalog() concurrently. If it scans the database between steps 1 and 2, it finds the new table (not yet in tableRuntimeMap) and also calls triggerTableAdded()newTableOperations()ops.commit(). Whichever thread wins writes v2.metadata.json first; the other then fails with:

CommitFailedException: Version 2 already exists: .../base/metadata/v2.metadata.json

Fix: catch CommitFailedException in the commit block, refresh to obtain the latest on-disk metadata, and proceed normally if a concurrent writer already committed the correct mixed-format properties. Re-throw only when the properties still differ after refresh, indicating a genuine conflict.

How was this patch tested?

The existing CompatibilityCatalogTests.testNewCatalogLoadHistorical in TestInternalMixedCatalogService covers this scenario — it was consistently failing in CI before this fix.

…ions for legacy mixed-iceberg

When AMS saves a new legacy mixed-iceberg table to the database, a background
table-explorer thread can concurrently pick up the same table and call
newTableOperations(), which now commits the mixed-format properties to the
Iceberg metadata (introduced by the AMORO-4163 fix).  If the background thread
wins the race and writes v2.metadata.json before the onTableCreated() path
attempts the same commit, HadoopTableOperations throws CommitFailedException:
"Version 2 already exists", aborting the createTableMeta RPC.

Fix: catch CommitFailedException in the commit block, refresh to obtain the
latest on-disk metadata, and proceed if a concurrent writer already committed
the correct properties.  Re-throw only when the properties still differ after
refresh, indicating a genuine commit conflict.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions github-actions Bot added the module:ams-server Ams server module label Apr 16, 2026
@lintingbin
Copy link
Copy Markdown
Contributor Author

Gentle nudge — this fix also unblocks CI on #4180.

The recent CI run on #4180 fails with the exact pattern described in this PR's body:

TestInternalMixedCatalogService$CompatibilityCatalogTests.testNewCatalogLoadHistorical(boolean)[2]
  → IllegalStateException: update table meta failed
    at BasicMixedCatalog$MixedTableBuilder.createTableMeta(BasicMixedCatalog.java:404)

And the cascading symptoms match too — once the commit race fails once, @After can't drop test_ns, so every subsequent nested class (TestTableCommit, TestTableOperation, TestDatabaseOperation) fails at .before:259 / :186 / :168 with Database test_ns already exists.

Same pattern showed up on a recent master hadoop3 run (#24488461765, 2026-04-16), confirming it's pre-existing flakiness rather than regressions introduced by the feature PRs that hit it.

Would be great to get this merged so downstream CI stabilizes.

@czy006 czy006 merged commit 08d49f4 into apache:master Apr 23, 2026
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

module:ams-server Ams server module

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants