Skip to content

[AMORO-4183][Improvement]: Refactor Optimizing Interface Move to amoro-common#4184

Open
czy006 wants to merge 1 commit intoapache:masterfrom
czy006:issues/amoro-4183
Open

[AMORO-4183][Improvement]: Refactor Optimizing Interface Move to amoro-common#4184
czy006 wants to merge 1 commit intoapache:masterfrom
czy006:issues/amoro-4183

Conversation

@czy006
Copy link
Copy Markdown
Contributor

@czy006 czy006 commented Apr 16, 2026

Why are the changes needed?

Close #4183.

Brief change log

  • refactor: extract getStructLikeCollections from TaskProperties to AbstractRewriteFilesExecutor
  • refactor: move optimizing SPI interfaces from amoro-format-iceberg to amoro-common
  • refactor: replace Iceberg DynConstructors with amoro-common DynConstructors in OptimizerExecutor

How was this patch tested?

  • Add some test cases that check the changes thoroughly including negative and positive cases if possible

  • Add screenshots for manual tests if appropriate

  • Run test locally before making a pull request

Documentation

  • Does this pull request introduce a new feature? ( no)
  • If yes, how is the feature documented? (ot documented)

@czy006 czy006 force-pushed the issues/amoro-4183 branch from 0484464 to 55698dc Compare April 16, 2026 06:30
@czy006 czy006 requested review from Aireed and xxubai April 16, 2026 07:51
Copy link
Copy Markdown
Contributor

@lintingbin lintingbin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the refactor — overall direction looks right. Moving the SPI interfaces into amoro-common is a sensible prerequisite for supporting self-optimizing across multiple formats (Paimon / Hudi / etc.), and since the package path is kept unchanged, all ~72 downstream call sites compile without modification. Risk of the move itself is low.

A few thoughts after going through the diff:

1. getStructLikeCollections landed in too narrow a scope (recommend addressing)

Previously TaskProperties#getStructLikeCollections was a public static helper any caller could use. It has been moved to AbstractRewriteFilesExecutor as protected static, which:

  • Narrows the visibility: only subclasses of AbstractRewriteFilesExecutor can call it. If another iceberg / mixed / hive-side component ever needs the same "build StructLikeCollections from task properties" behavior, it will have to duplicate this logic.
  • Is at the wrong abstraction level: this helper is a generic "task properties → StructLikeCollections" mapping, not something that conceptually belongs to "the base class for rewrite-file executors".

Suggestion: either make it public static on AbstractRewriteFilesExecutor, or — better — move it into a dedicated iceberg utility class (e.g. IcebergTaskUtils / IcebergStructLikeCollectionsFactory).

2. The extracted method itself is not covered by any new test

The two actual behavioral changes in this PR are (a) moving the interfaces and (b) extracting getStructLikeCollections. Neither TestOptimizingSpiInterfaces nor TestTaskPropertiesSplit asserts anything about getStructLikeCollections. Since the PR description claims "Add some test cases that check the changes thoroughly", please consider adding at minimum:

  • empty properties → enableSpillMap=false, maxInMemory=512MB (default), spillMapPath=null
  • EXTEND_DISK_STORAGE=true takes effect
  • custom MEMORY_STORAGE_SIZE and DISK_STORAGE_PATH are passed through

3. Duplicate / low-value constant assertions

  • TestOptimizingSpiInterfaces.testTaskPropertyConstants and TestTaskPropertiesSplit.testTaskPropertyConstants assert essentially the same constant literals.
  • Asserting constant literals has limited value: renaming a constant is a breaking change that callers detect at compile time, and tests that bind the literal can make future renames more painful rather than safer.

Suggestion: drop one of them, or drop both and keep only the behavioral tests (option / options / getProcessId null+empty semantics — those are worth testing).

4. PropertyUtil source silently changed (minor, style)

The old TaskProperties.getStructLikeCollections used org.apache.amoro.utils.PropertyUtil. After extraction, AbstractRewriteFilesExecutor picks up org.apache.iceberg.util.PropertyUtil from the file's existing imports. Behaviorally equivalent, but it runs counter to the PR's own direction — the DynConstructors change is de-iceberg-ifying, while this one silently adds one more usage of Iceberg's PropertyUtil. Consider sticking with org.apache.amoro.utils.PropertyUtil explicitly here for consistency.

5. TestDynConstructorsReplacement exception assertion is a bit fragile

testLoadNonExistentClassThrows asserts NoSuchMethodException.class. The exact exception type of DynConstructors.buildChecked() on failure is an implementation detail (class-not-found vs. constructor-not-found may differ). Loosening to Exception.class would be more robust against future refactoring of DynConstructors.

6. Drive-by nit (not introduced by this PR)

BaseOptimizingInput implements Serializable (via OptimizingInput extends Serializable) without declaring serialVersionUID. Since this PR adds serialization coverage, it would be a natural opportunity to add one.


The two items I'd most like to see addressed before merging are (1) widening the scope / relocation of getStructLikeCollections and (2) adding behavioral coverage for the extracted method. The rest are polish.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Improvement]: Refactor Optimizing Interface Move to amoro-common

2 participants