[AMORO-4183][Improvement]: Refactor Optimizing Interface Move to amoro-common#4184
[AMORO-4183][Improvement]: Refactor Optimizing Interface Move to amoro-common#4184czy006 wants to merge 1 commit intoapache:masterfrom
Conversation
0484464 to
55698dc
Compare
lintingbin
left a comment
There was a problem hiding this comment.
Thanks for the refactor — overall direction looks right. Moving the SPI interfaces into amoro-common is a sensible prerequisite for supporting self-optimizing across multiple formats (Paimon / Hudi / etc.), and since the package path is kept unchanged, all ~72 downstream call sites compile without modification. Risk of the move itself is low.
A few thoughts after going through the diff:
1. getStructLikeCollections landed in too narrow a scope (recommend addressing)
Previously TaskProperties#getStructLikeCollections was a public static helper any caller could use. It has been moved to AbstractRewriteFilesExecutor as protected static, which:
- Narrows the visibility: only subclasses of
AbstractRewriteFilesExecutorcan call it. If another iceberg / mixed / hive-side component ever needs the same "buildStructLikeCollectionsfrom task properties" behavior, it will have to duplicate this logic. - Is at the wrong abstraction level: this helper is a generic "task properties →
StructLikeCollections" mapping, not something that conceptually belongs to "the base class for rewrite-file executors".
Suggestion: either make it public static on AbstractRewriteFilesExecutor, or — better — move it into a dedicated iceberg utility class (e.g. IcebergTaskUtils / IcebergStructLikeCollectionsFactory).
2. The extracted method itself is not covered by any new test
The two actual behavioral changes in this PR are (a) moving the interfaces and (b) extracting getStructLikeCollections. Neither TestOptimizingSpiInterfaces nor TestTaskPropertiesSplit asserts anything about getStructLikeCollections. Since the PR description claims "Add some test cases that check the changes thoroughly", please consider adding at minimum:
- empty properties →
enableSpillMap=false,maxInMemory=512MB(default),spillMapPath=null EXTEND_DISK_STORAGE=truetakes effect- custom
MEMORY_STORAGE_SIZEandDISK_STORAGE_PATHare passed through
3. Duplicate / low-value constant assertions
TestOptimizingSpiInterfaces.testTaskPropertyConstantsandTestTaskPropertiesSplit.testTaskPropertyConstantsassert essentially the same constant literals.- Asserting constant literals has limited value: renaming a constant is a breaking change that callers detect at compile time, and tests that bind the literal can make future renames more painful rather than safer.
Suggestion: drop one of them, or drop both and keep only the behavioral tests (option / options / getProcessId null+empty semantics — those are worth testing).
4. PropertyUtil source silently changed (minor, style)
The old TaskProperties.getStructLikeCollections used org.apache.amoro.utils.PropertyUtil. After extraction, AbstractRewriteFilesExecutor picks up org.apache.iceberg.util.PropertyUtil from the file's existing imports. Behaviorally equivalent, but it runs counter to the PR's own direction — the DynConstructors change is de-iceberg-ifying, while this one silently adds one more usage of Iceberg's PropertyUtil. Consider sticking with org.apache.amoro.utils.PropertyUtil explicitly here for consistency.
5. TestDynConstructorsReplacement exception assertion is a bit fragile
testLoadNonExistentClassThrows asserts NoSuchMethodException.class. The exact exception type of DynConstructors.buildChecked() on failure is an implementation detail (class-not-found vs. constructor-not-found may differ). Loosening to Exception.class would be more robust against future refactoring of DynConstructors.
6. Drive-by nit (not introduced by this PR)
BaseOptimizingInput implements Serializable (via OptimizingInput extends Serializable) without declaring serialVersionUID. Since this PR adds serialization coverage, it would be a natural opportunity to add one.
The two items I'd most like to see addressed before merging are (1) widening the scope / relocation of getStructLikeCollections and (2) adding behavioral coverage for the extracted method. The rest are polish.
Why are the changes needed?
Close #4183.
Brief change log
How was this patch tested?
Add some test cases that check the changes thoroughly including negative and positive cases if possible
Add screenshots for manual tests if appropriate
Run test locally before making a pull request
Documentation