Skip to content

[ams][test] Make RestCatalogService nested tests order-independent#4201

Open
lintingbin wants to merge 1 commit intoapache:masterfrom
lintingbin:fix-rest-catalog-test-isolation
Open

[ams][test] Make RestCatalogService nested tests order-independent#4201
lintingbin wants to merge 1 commit intoapache:masterfrom
lintingbin:fix-rest-catalog-test-isolation

Conversation

@lintingbin
Copy link
Copy Markdown
Contributor

What changes were proposed in this pull request?

amoro-ams/src/test/java/org/apache/amoro/server/RestCatalogServiceTestBase and its subclasses (TestInternalMixedCatalogService, TestInternalIcebergCatalogService) share a single static AmsEnvironment and a hard-coded database = \"test_ns\" / table = \"test_iceberg_tbl\" across every @Nested class. Whenever cleanup leaks (the existing @AfterEach swallows dropDatabase exceptions) or background catalog scans race the tests, sibling nested classes inherit dirty catalog state and assertions that hard-code the catalog being empty start failing.

This PR makes the tests order-independent without restructuring the suite:

  1. Unique database name per test instanceRestCatalogServiceTestBase.database is now \"test_ns_\" + COUNTER.incrementAndGet(). Each test method gets a fresh test instance (JUnit 5 default Lifecycle.PER_METHOD), so each test gets a fresh database name. Pollution between tests can no longer cause AlreadyExists on createDatabase or pre-populated catalog snapshots.
  2. Delta-based catalog/namespace assertions in TestInternalMixedCatalogService.TestDatabaseOperation.test and TestInternalIcebergCatalogService.NamespaceTests.testNamespaceOperations. The previous code asserted catalog.listDatabases().isEmpty() which only holds in a single-test-class-per-JVM execution.

Why are the changes needed?

These tests have been flaky on master for a while — for example, the Core/hadoop3 CI run on 2026-04-23 failed inside TestInternalIcebergCatalogService\$CatalogPropertiesTest's suite, and PR #4200's CI just hit the same family of failures, all surfacing the same root cause:

TestInternalMixedCatalogService\$TestDatabaseOperation.test:168
    expected: <true> but was: <false>          // listDatabases().isEmpty() failed because a previous test leaked test_ns

TestInternalMixedCatalogService\$TestTableOperation.before:186
    AlreadyExists Database: test_ns already exists  // previous @AfterEach swallowed dropDatabase failure

TestInternalMixedCatalogService\$TestTableCommit.testTableCommit:295
    NoSuchTableException: ...test_iceberg_tbl not exists  // sibling test's @AfterEach.dropTable raced into our just-created table

The shared test_ns / test_iceberg_tbl names are the load-bearing reason these races collide. With unique per-instance database names, every test gets its own namespace, so concurrent or leaked cleanup from a sibling test no longer touches it.

This is complementary to #4185, which fixed a Version-N-already-exists race in InternalMixedIcebergHandler.newTableOperations triggered by the same shared-state pattern.

How was this patch tested?

Ran the affected suites locally on Java 11:

./mvnw test -pl amoro-ams -am -Pspark-3.5,support-all-formats,openapi-sdk \\
  -Dtest='TestInternalMixedCatalogService,TestInternalIcebergCatalogService'
# Tests run: 16, Failures: 0, Errors: 0, Skipped: 0

./mvnw spotless:check checkstyle:check -pl amoro-ams -am is clean.

Does this PR introduce any user-facing change?

No — test-only change.

The integration tests under
amoro-ams/src/test/java/org/apache/amoro/server/TestInternal*CatalogService
share a static AmsEnvironment instance and a hard-coded "test_ns"
database / "test_iceberg_tbl" table name across every @nested class.
Whenever cleanup leaks (the existing @AfterEach swallows dropDatabase
exceptions) or background scans race the tests, sibling @nested
classes inherit dirty catalog state and assertions that hard-code the
catalog being empty start failing.

Concrete fixes:

* Each test instance now picks a fresh database name via a static
  AtomicLong counter ("test_ns_<n>"), so unrelated tests can never
  collide on the same namespace even if a previous run leaked state.
* TestInternalMixedCatalogService.TestDatabaseOperation.test now
  asserts the create / drop delta against the initial database and
  namespace counts instead of asserting the catalog is empty.
* TestInternalIcebergCatalogService.NamespaceTests.testNamespaceOperations
  has the same delta-based fix.

This addresses the same pre-existing flakiness seen on master CI
(see Core/hadoop3 run on 2026-04-23 failing in
TestInternalIcebergCatalogService$CatalogPropertiesTest's suite) and
the failures reproduced on PR apache#4200 CI in
TestInternalMixedCatalogService$TestDatabaseOperation /
TestTableOperation / TestTableCommit.
@github-actions github-actions Bot added the module:ams-server Ams server module label May 1, 2026
@lintingbin
Copy link
Copy Markdown
Contributor Author

Hi @zhoujinsong, would you mind taking a look when you have time? Thanks!

This is test-only and is meant to fix the flaky TestInternalMixedCatalogService / TestInternalIcebergCatalogService failures that have been intermittently breaking CI on master and on #4200. Complementary to your #4185 — same test suite, but a different (test-isolation) flavor of the same shared-state pattern.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

module:ams-server Ams server module

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant