Description
Add workload-aware dynamic resource allocation to Amoro's optimizer, enabling automatic scale-up when optimization demand exceeds capacity and scale-down when optimizers are idle. Inspired by Spark's Dynamic Resource Allocation.
Use case/motivation
Currently, Amoro's optimizer only supports a static min-parallelism floor — it scales up to meet the minimum but never scales down, and it does not respond to actual workload.
1. Resource waste during low-load periods: Optimizer pods remain running even when there are no optimization tasks for hours or days, consuming cluster resources that could serve other workloads.
2. Manual intervention for load spikes: When a burst of table optimizations is triggered (e.g., after a large batch ingestion), users must manually adjust min-parallelism — which is slow, error-prone, and operationally painful.
Describe the solution
All scaling logic lives in AMS (DefaultOptimizingService.OptimizerGroupKeeper), making it container-agnostic. Container implementations (KubernetesOptimizerContainer, etc.) remain unchanged. Kubernetes is the primary target for testing and tuning.
┌────────────────────────────────────┐
│   DefaultOptimizingService (AMS)   │
│                                    │
│   OptimizerGroupKeeper             │
│    ├─ min-parallelism (existing)   │
│    ├─ demand-driven scale-up       │
│    ├─ idle-driven scale-down       │
│    └─ graceful drain               │
└──────────────────┬─────────────────┘
                   │
 requestResource() / releaseResource()
                   │
       ┌───────────┼─────────────────────┐
       │           │                     │
K8s Container ★  Flink Container  Spark Container
(primary target)  (unchanged)      (unchanged)
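To make the keeper's role concrete, here is a minimal sketch of the decision loop it might run. Everything except OptimizerGroupKeeper, requestResource()/releaseResource(), and the proposed property semantics is hypothetical scaffolding, not Amoro's actual API:

```java
import java.time.Duration;
import java.time.Instant;

/**
 * Sketch only: not Amoro's actual OptimizerGroupKeeper. It shows the shape
 * of the proposed decision loop, with the existing min-parallelism floor
 * kept intact and the new paths gated behind dynamic-allocation.enabled.
 */
public class KeeperLoopSketch {
  private final boolean dynamicAllocationEnabled;  // dynamic-allocation.enabled
  private final int minParallelism;                // existing floor
  private final int maxParallelism;                // proposed upper bound
  private final Duration scaleUpCooldown = Duration.ofSeconds(30);
  private final Duration scaleDownCooldown = Duration.ofMinutes(1);

  private int runningThreads;   // threads offered by live optimizers
  private int plannedTasks;     // PLANNED tasks waiting for a thread
  private Instant lastScaleUp = Instant.EPOCH;
  private Instant lastScaleDown = Instant.EPOCH;

  KeeperLoopSketch(boolean enabled, int min, int max) {
    this.dynamicAllocationEnabled = enabled;
    this.minParallelism = min;
    this.maxParallelism = max;
  }

  /** One periodic evaluation of the group's demand and idleness. */
  void tick(Instant now) {
    if (runningThreads < minParallelism) {
      requestResource(minParallelism - runningThreads);  // today's behavior
      return;
    }
    if (!dynamicAllocationEnabled) {
      return;                                            // disabled: nothing else changes
    }
    boolean backlogged = plannedTasks > runningThreads;  // immediate demand signal
    if (backlogged && runningThreads < maxParallelism
        && !now.isBefore(lastScaleUp.plus(scaleUpCooldown))) {
      requestResource(1);                                // demand-driven scale-up
      lastScaleUp = now;
    } else if (plannedTasks == 0 && runningThreads > minParallelism
        && !now.isBefore(lastScaleDown.plus(scaleDownCooldown))) {
      releaseResource(1);                                // idle-driven: drain one optimizer
      lastScaleDown = now;
    }
  }

  private void requestResource(int threads) { runningThreads += threads; }
  private void releaseResource(int threads) { runningThreads -= threads; }
}
```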
New configuration properties (ResourceGroup level, opt-in)
| Property | Default | Description |
| --- | --- | --- |
| dynamic-allocation.enabled | false | Opt-in flag. Existing behavior fully preserved when disabled. |
| max-parallelism | Integer.MAX_VALUE | Upper bound on total threads |
| dynamic-allocation.executor-idle-timeout | 5m | Remove an idle optimizer after this duration |
| dynamic-allocation.scheduler-backlog-timeout | 1m | Trigger scale-up after sustained demand |
| dynamic-allocation.sustained-backlog-timeout | 30s | Interval between subsequent scale-up rounds |
| dynamic-allocation.scale-up-cooldown | 30s | Minimum interval between scale-up actions |
| dynamic-allocation.scale-down-cooldown | 1m | Minimum interval between scale-down actions |
| dynamic-allocation.drain-timeout | 15m | Force-remove a draining optimizer after this duration |
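For illustration, a resource group opting in might carry properties like the following. The values are hypothetical and the snippet only shows the key/value shape, not any real Amoro configuration API:

```java
import java.util.Map;

class DynamicAllocationConfigExample {
  // Hypothetical example values; the keys are the properties proposed above.
  static final Map<String, String> GROUP_PROPERTIES = Map.of(
      "dynamic-allocation.enabled", "true",               // opt in for this group
      "max-parallelism", "32",                            // cap total threads
      "dynamic-allocation.executor-idle-timeout", "10m",  // tolerate longer idle gaps
      "dynamic-allocation.scale-up-cooldown", "30s",
      "dynamic-allocation.drain-timeout", "15m");
}
```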
Key behaviors
- Scale-up: Two-layer demand signal: immediate (PLANNED tasks exceed available threads → exponential ramp 1, 2, 4, 8...) and future (pending tables with low thread headroom → conservative pre-warm). Tracks in-flight pod registrations to avoid duplicate creation; a minimal ramp sketch follows this list.
- Scale-down: Event-driven per-optimizer idle tracking. Removes the longest-idle optimizer one at a time, respecting the min-parallelism floor.
- Graceful drain: Draining optimizers receive no new tasks (pollTask() returns null), but in-flight tasks complete naturally. The pod is deleted only after all tasks finish (or drain-timeout expires); see the drain sketch below.
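A sketch of the exponential ramp from the scale-up bullet, assuming a round counter that resets once the backlog clears; all names are illustrative:

```java
class ScaleUpRamp {
  /**
   * Illustrative only: how many optimizer threads to request in the next
   * scale-up round. `round` counts consecutive rounds while the backlog
   * persists (0, 1, 2, ...), giving the 1, 2, 4, 8... ramp; the result is
   * capped by outstanding demand and by max-parallelism headroom.
   */
  static int nextScaleUpCount(int round, int runningThreads,
                              int plannedTasks, int maxParallelism) {
    int ramp = 1 << Math.min(round, 30);                      // 1, 2, 4, 8, ... (overflow-safe)
    int demand = Math.max(plannedTasks - runningThreads, 0);  // immediate demand signal
    int headroom = Math.max(maxParallelism - runningThreads, 0);
    return Math.min(ramp, Math.min(demand, headroom));
  }
}
```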
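And a sketch of the drain gate from the graceful-drain bullet; only the pollTask()-returns-null behavior comes from the proposal, the rest is hypothetical scaffolding:

```java
class DrainGate {
  enum OptimizerState { ACTIVE, DRAINING }  // hypothetical per-optimizer state in AMS

  static class Task {}                      // stand-in for an optimizing task

  /** Draining optimizers are starved of new work; in-flight tasks finish naturally. */
  Task pollTask(String optimizerToken, OptimizerState state) {
    if (state == OptimizerState.DRAINING) {
      return null;                          // no new tasks for a draining optimizer
    }
    return takeNextPlannedTask(optimizerToken);
  }

  private Task takeNextPlannedTask(String token) {
    return null;                            // elided: dequeue from the PLANNED queue
  }
}
```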
Subtasks
- max-parallelism enforcement

A detailed design document is available for reference during implementation and PR review.