Fix: Correct expansion of segmented clustered systems #573
…ms, the effect totals now match correctly.
Root Cause
Segment values are per-segment TOTALS that were repeated N times when expanded to hourly resolution (where N = segment duration in timesteps). Summing these repeated values inflated totals by ~4x.
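A toy numeric illustration of the failure mode (hypothetical values, assuming a segment that spans 4 timesteps):
import numpy as np

segment_total = 100.0                    # per-segment TOTAL stored in the solution
expanded = np.repeat(segment_total, 4)   # naive expansion: [100, 100, 100, 100]
print(expanded.sum())                    # 400.0 -> ~4x inflation
divisor = 4                              # segment duration in timesteps
print((expanded / divisor).sum())        # 100.0 -> matches the segment total again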
Fix Applied
1. Added build_expansion_divisor() to Clustering class (flixopt/clustering/base.py:920-1027)
- For each original timestep, returns the segment duration (number of timesteps in that segment)
- Handles multi-dimensional cases (periods/scenarios) by accessing each clustering result's segment info
2. Modified expand() method (flixopt/transform_accessor.py:1850-1875)
- Added _is_segment_total_var() helper to identify which variables should be divided
- For segmented systems, divides segment total variables by the expansion divisor to get correct hourly rates
- Correctly excludes:
- Share factors (stored as EffectA|(temporal)->EffectB(temporal)) - these are rates, not totals
- Flow rates, on/off states, charge states - these are already rates
Test Results
- All 83 cluster/expand tests pass
- All 27 effect tests pass
- Debug script shows all ratios are 1.0000x for all effects (EffectA, EffectB, EffectC, EffectD) across all periods and scenarios
📝 Walkthrough
The PR adds segmented expansion functionality to the clustering system.
Sequence Diagram
sequenceDiagram
participant User
participant TransformAccessor
participant Clustering
participant FlowSystem
participant DataArray
User->>TransformAccessor: expand(segmented=True)
TransformAccessor->>FlowSystem: identify segment structure
TransformAccessor->>Clustering: build_expansion_divisor()
Clustering->>Clustering: combine segment_durations & segment_assignments
Clustering-->>TransformAccessor: expansion_divisor (DataArray)
TransformAccessor->>TransformAccessor: _build_segment_total_varnames()
TransformAccessor-->>TransformAccessor: segment_total_vars (set)
loop for each variable
TransformAccessor->>DataArray: expand_da(is_solution=True)
alt is segment_total_var
DataArray->>DataArray: divide by expansion_divisor
else
DataArray->>DataArray: standard expansion
end
DataArray-->>TransformAccessor: expanded_variable
end
TransformAccessor-->>User: expanded_solution
…lution:
Key Changes
1. build_expansion_divisor() in Clustering (base.py:920-1027)
- Returns the segment duration for each original timestep
- Handles per-period/scenario clustering differences
2. _is_segment_total_solution_var() in expand() (transform_accessor.py:1855-1880)
- Only matches solution variables that represent segment totals:
- {contributor}->{effect}(temporal) - effect contributions
- *|per_timestep - per-timestep totals
- Explicitly does NOT match rates/states: |flow_rate, |on, |charge_state
3. expand_da() with is_solution parameter (transform_accessor.py:1882-1915)
- is_solution=False (default): Never applies segment correction (for FlowSystem data)
- is_solution=True: Applies segment correction if pattern matches (for solution)
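A minimal sketch of this dispatch, using the names from the list above; the standard-expansion step is a hypothetical placeholder, not the actual implementation:
def expand_da(da, expansion_divisor, is_solution=False):
    """Expand a clustered DataArray back to original resolution (sketch)."""
    expanded = _repeat_within_segments(da)  # hypothetical standard expansion step
    if is_solution and _is_segment_total_solution_var(da.name):
        # Segment totals were repeated once per timestep; dividing by the
        # segment duration recovers correct hourly values.
        expanded = expanded / expansion_divisor
    return expanded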
Why This is Robust
┌───────────────────────────────────────┬─────────────────┬────────────────────┬───────────────────────────┐
│ Variable │ Location │ Pattern │ Divided? │
├───────────────────────────────────────┼─────────────────┼────────────────────┼───────────────────────────┤
│ EffectA|(temporal)->EffectB(temporal) │ FlowSystem DATA │ share factor │ ❌ No (is_solution=False) │
├───────────────────────────────────────┼─────────────────┼────────────────────┼───────────────────────────┤
│ Boiler(Q)->EffectA(temporal) │ SOLUTION │ contribution │ ✅ Yes │
├───────────────────────────────────────┼─────────────────┼────────────────────┼───────────────────────────┤
│ EffectA(temporal)->EffectB(temporal) │ SOLUTION │ contribution │ ✅ Yes │
├───────────────────────────────────────┼─────────────────┼────────────────────┼───────────────────────────┤
│ EffectA(temporal)|per_timestep │ SOLUTION │ per-timestep total │ ✅ Yes │
├───────────────────────────────────────┼─────────────────┼────────────────────┼───────────────────────────┤
│ Boiler(Q)|flow_rate │ SOLUTION │ rate │ ❌ No (no pattern match) │
├───────────────────────────────────────┼─────────────────┼────────────────────┼───────────────────────────┤
│ Storage|charge_state │ SOLUTION │ state │ ❌ No (no pattern match) │
└───────────────────────────────────────┴─────────────────┴────────────────────┴───────────────────────────┘
…System structure:
Key Implementation
_build_segment_total_varnames() (transform_accessor.py:1776-1819)
- Derives exact variable names from FlowSystem structure
- No pattern matching on arbitrary strings
- Covers all contributor types:
a. {effect}(temporal)|per_timestep - from fs.effects
b. {flow}->{effect}(temporal) - from fs.flows
c. {component}->{effect}(temporal) - from fs.components
d. {source}(temporal)->{target}(temporal) - from effect.share_from_temporal
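A hedged sketch of that derivation, assuming labels are plain strings on fs.effects / fs.flows / fs.components; the pairwise enumeration over all contributors × effects is a simplification, as the real method presumably only adds names for contributions that actually exist:
def _build_segment_total_varnames(fs) -> set[str]:
    """Collect exact solution variable names that hold segment totals (sketch)."""
    names: set[str] = set()
    for effect in fs.effects:
        names.add(f'{effect.label}(temporal)|per_timestep')            # (a)
        for source in effect.share_from_temporal:                      # (d)
            names.add(f'{source}(temporal)->{effect.label}(temporal)')
    for flow in fs.flows:                                              # (b)
        for effect in fs.effects:
            names.add(f'{flow.label}->{effect.label}(temporal)')
    for component in fs.components:                                    # (c)
        for effect in fs.effects:
            names.add(f'{component.label}->{effect.label}(temporal)')
    return names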
Why This is Robust
1. Derived from structure, not patterns: Variable names come from actual FlowSystem attributes
2. Clear separation: FlowSystem data is NEVER divided (only solution variables)
3. Explicit set lookup: var_name in segment_total_vars instead of pattern matching
4. Extensible: New contributor types just need to be added to _build_segment_total_varnames()
5. All tests pass: 83 cluster/expand tests + comprehensive debug script
Changes Made
1. Added VariableCategory enum (structure.py:64-77)
- STATE - For state variables like charge_state (interpolated within segments)
- SEGMENT_TOTAL - For segment totals like effect contributions (divided by expansion divisor)
- RATE - For rate variables like flow_rate (expanded as-is)
- BINARY - For binary variables like status (expanded as-is)
- OTHER - For uncategorized variables
2. Added variable_categories registry to FlowSystemModel (structure.py:214)
- Dictionary mapping variable names to their categories
3. Modified add_variables() method (structure.py:388-396)
- Added optional category parameter
- Automatically registers variables with their category
4. Updated variable creation calls:
- components.py: Storage variables (charge_state as STATE, netto_discharge as RATE)
- elements.py: Flow variables (flow_rate as RATE, status as BINARY)
- features.py: Effect contributions (per_timestep as SEGMENT_TOTAL, temporal shares as SEGMENT_TOTAL, startup/shutdown as BINARY)
5. Updated expand() method (transform_accessor.py:2074-2090)
- Uses variable_categories registry to identify segment totals and state variables
- Falls back to pattern matching for backwards compatibility with older FlowSystems
Benefits
- More robust categorization: Variables are categorized at creation time, not by pattern matching
- Extensible: New variable types can easily be added with proper category
- Backwards compatible: Old FlowSystems without categories still work via pattern matching fallback
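A sketch of the registration pattern, assuming a linopy-backed add_variables() that returns a variable exposing a name:
def add_variables(self, *args, category: VariableCategory | None = None, **kwargs):
    """Create variables and register their category (sketch)."""
    var = super().add_variables(*args, **kwargs)  # delegate to the linopy model
    if category is not None:
        self.variable_categories[var.name] = category
    return var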
New Categories (structure.py:45-103)
from enum import Enum, auto

class VariableCategory(Enum):
    # Member values illustrative (auto()); groupings as in structure.py
    # State variables
    CHARGE_STATE = auto()
    SOC_BOUNDARY = auto()
    # Rate/Power variables
    FLOW_RATE = auto()
    NETTO_DISCHARGE = auto()
    VIRTUAL_FLOW = auto()
    # Binary state
    STATUS = auto()
    INACTIVE = auto()
    # Binary events
    STARTUP = auto()
    SHUTDOWN = auto()
    # Effect variables
    PER_TIMESTEP = auto()
    SHARE = auto()
    TOTAL = auto()
    TOTAL_OVER_PERIODS = auto()
    # Investment
    SIZE = auto()
    INVESTED = auto()
    # Counting/Duration
    STARTUP_COUNT = auto()
    DURATION = auto()
    # Piecewise linearization
    INSIDE_PIECE = auto()
    LAMBDA0 = auto()
    LAMBDA1 = auto()
    ZERO_POINT = auto()
    # Other
    OTHER = auto()
Logical Groupings for Expansion
EXPAND_INTERPOLATE = {VariableCategory.CHARGE_STATE}  # Interpolate between boundaries
EXPAND_DIVIDE = {VariableCategory.PER_TIMESTEP, VariableCategory.SHARE}  # Divide by expansion factor
# Default: all other categories are repeated within each segment
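A sketch of how expand() could consume these groupings; the interpolation helper is a hypothetical placeholder:
def expand_variable(var_name, expanded, expansion_divisor, variable_categories):
    """Apply the category-specific expansion rule (sketch)."""
    category = variable_categories.get(var_name, VariableCategory.OTHER)
    if category in EXPAND_DIVIDE:
        return expanded / expansion_divisor            # segment totals -> hourly values
    if category in EXPAND_INTERPOLATE:
        return interpolate_within_segments(expanded)   # hypothetical helper
    return expanded  # default: value repeated within each segment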
Files Modified
┌───────────────────────┬─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ File │ Variables Updated │
├───────────────────────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ components.py │ charge_state, netto_discharge, SOC_boundary │
├───────────────────────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ elements.py │ flow_rate, status, virtual_supply, virtual_demand │
├───────────────────────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ features.py │ size, invested, inactive, startup, shutdown, startup_count, inside_piece, lambda0, lambda1, zero_point, total, per_timestep, shares │
├───────────────────────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ effects.py │ total, total_over_periods │
├───────────────────────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ modeling.py │ duration │
├───────────────────────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ transform_accessor.py │ Updated to use EXPAND_INTERPOLATE and EXPAND_DIVIDE groupings │
└───────────────────────┴─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
Test Results
- All 83 cluster/expand tests pass
- Variable categories correctly populated and grouped
Changes Made
1. Added combine_slices() utility to flixopt/clustering/base.py (lines 52-107)
- Simple function that stacks dict of {(dim_values): np.ndarray} into a DataArray
- Much cleaner than the previous reverse-concat pattern
2. Refactored 3 methods to use the new utility:
- Clustering.expand_data() - reduced from ~25 to ~12 lines
- Clustering.build_expansion_divisor() - reduced from ~35 to ~20 lines
- TransformAccessor._interpolate_charge_state_segmented() - reduced from ~43 to ~27 lines
3. Added 4 unit tests for combine_slices() in tests/test_cluster_reduce_expand.py
Results
┌───────────────────────────────────┬──────────┬────────────────────────┐
│ Metric │ Before │ After │
├───────────────────────────────────┼──────────┼────────────────────────┤
│ Complex reverse-concat blocks │ 3 │ 0 │
├───────────────────────────────────┼──────────┼────────────────────────┤
│ Lines of dimension iteration code │ ~100 │ ~60 │
├───────────────────────────────────┼──────────┼────────────────────────┤
│ Test coverage │ 83 tests │ 87 tests (all passing) │
└───────────────────────────────────┴──────────┴────────────────────────┘
The Pattern Change
Before (complex reverse-concat):
result_arrays = slices
for dim in reversed(extra_dims):
grouped = {}
for key, arr in result_arrays.items():
rest_key = key[:-1] if len(key) > 1 else ()
grouped.setdefault(rest_key, []).append(arr)
result_arrays = {k: xr.concat(v, dim=...) for k, v in grouped.items()}
result = list(result_arrays.values())[0].transpose('time', ...)
After (simple combine):
return combine_slices(slices, extra_dims, dim_coords, 'time', output_coord, attrs)
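For reference, a minimal self-contained version of what combine_slices() could look like under the signature shown above; a dtype-preserving sketch, not necessarily the actual implementation:
import numpy as np
import xarray as xr

def combine_slices(slices, extra_dims, dim_coords, time_dim, time_coord, attrs=None):
    """Stack a dict of {(dim_values...): 1-D np.ndarray} into one DataArray (sketch)."""
    first = next(iter(slices.values()))
    shape = [len(dim_coords[d]) for d in extra_dims] + [len(time_coord)]
    out = np.empty(shape, dtype=first.dtype)  # preserve dtype of the input slices
    for idx in np.ndindex(*shape[:-1]):
        key = tuple(dim_coords[d][i] for d, i in zip(extra_dims, idx))
        out[idx] = slices[key]
    coords = {d: list(dim_coords[d]) for d in extra_dims}
    coords[time_dim] = time_coord
    return xr.DataArray(out, dims=[*extra_dims, time_dim], coords=coords, attrs=attrs or {})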
1. Fully Vectorized expand_data()
Before (~65 lines with loops):
for combo in np.ndindex(*[len(v) for v in dim_coords.values()]):
selector = {...}
mapping = _select_dims(timestep_mapping, **selector).values
data_slice = _select_dims(aggregated, **selector)
slices[key] = _expand_slice(mapping, data_slice)
return combine_slices(slices, ...)
After (~25 lines, fully vectorized):
timestep_mapping = self.timestep_mapping # Already multi-dimensional!
cluster_indices = timestep_mapping // time_dim_size
time_indices = timestep_mapping % time_dim_size
expanded = aggregated.isel(cluster=cluster_indices, time=time_indices)
# xarray handles broadcasting across period/scenario automatically
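A toy demonstration of this vectorized lookup (synthetic values; 2 clusters × 3 timesteps expanded back to 6 original timesteps):
import numpy as np
import xarray as xr

aggregated = xr.DataArray(np.arange(6).reshape(2, 3), dims=('cluster', 'time'))
mapping = xr.DataArray(np.array([3, 4, 5, 0, 1, 2]), dims='original_time')
# Integer DataArray indexers trigger xarray's vectorized (pointwise) indexing.
expanded = aggregated.isel(cluster=mapping // 3, time=mapping % 3)
print(expanded.values)  # [3 4 5 0 1 2]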
2. build_expansion_divisor() and _interpolate_charge_state_segmented()
These still use combine_slices() because they need per-result segment data (segment_assignments, segment_durations) which isn't available as concatenated Clustering properties yet.
Current State
┌───────────────────────────────────────┬─────────────────┬─────────────────────────────────┐
│ Method │ Vectorized? │ Uses Clustering Properties │
├───────────────────────────────────────┼─────────────────┼─────────────────────────────────┤
│ expand_data() │ Yes │ timestep_mapping (fully) │
├───────────────────────────────────────┼─────────────────┼─────────────────────────────────┤
│ build_expansion_divisor() │ No (small loop) │ cluster_assignments (partially) │
├───────────────────────────────────────┼─────────────────┼─────────────────────────────────┤
│ _interpolate_charge_state_segmented() │ No (small loop) │ cluster_assignments (partially) │
└───────────────────────────────────────┴─────────────────┴─────────────────────────────────┘
1. _interpolate_charge_state_segmented() - Fully vectorized from ~110 lines to ~55 lines
- Uses clustering.timestep_mapping for indexing
- Uses clustering.results.segment_assignments, segment_durations, and position_within_segment
- Single xarray expression instead of triple-nested loops
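The single expression plausibly reduces to linear interpolation between segment-boundary charge states; a sketch under that assumption (the exact offset convention is assumed):
# position_within_segment: 0-based index of each original timestep in its segment
# segment_durations: length of that segment in timesteps
frac = position_within_segment / segment_durations            # fraction of segment elapsed
charge_state = start_state + (end_state - start_state) * frac # boundary values assumed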
Previously completed (from before context limit):
- Added segment_assignments multi-dimensional property to ClusteringResults
- Added segment_durations multi-dimensional property to ClusteringResults
- Added position_within_segment property to ClusteringResults
- Vectorized expand_data()
- Vectorized build_expansion_divisor()
Test results: All 130 tests pass (87 cluster/expand + 43 IO tests)
The combine_slices utility function is still available in clustering/base.py if needed in the future, but all the main dimension-handling methods now use xarray's vectorized advanced indexing instead of the loop-based slice-and-combine pattern.
Summary of Simplifications
1. expand_da() in transform_accessor.py
- Extracted duplicate "append extra timestep" logic into _append_final_state() helper
- Reduced from ~50 lines to ~25 lines
- Eliminated code duplication
2. _build_multi_dim_array() → _build_property_array() in clustering/base.py
- Replaced 6 conditional branches with unified np.ndindex() pattern
- Now handles both simple and multi-dimensional cases in one method
- Reduced from ~50 lines to ~25 lines
- Preserves dtype (fixed integer indexing bug)
3. Property boilerplate in ClusteringResults
- 5 properties (cluster_assignments, cluster_occurrences, cluster_centers, segment_assignments, segment_durations) now use the unified _build_property_array()
- Each property reduced from ~25 lines to ~8 lines
- Total: ~165 lines → ~85 lines
4. _build_timestep_mapping() in Clustering
- Simplified to single call using _build_property_array()
- Reduced from ~16 lines to ~9 lines
Total lines removed: ~150+ lines of duplicated/complex code
…as fixed:
Summary
The IO roundtrip bug was caused by representative_weights (a variable with only ('cluster',) dimension) being copied as-is during expansion, which caused the cluster dimension to incorrectly persist in the expanded dataset.
Fix applied in transform_accessor.py:2063-2065:
# Skip cluster-only vars (no time dim) - they don't make sense after expansion
if da.dims == ('cluster',):
continue
This skips variables that have only a cluster dimension (and no time dimension) during expansion, as these variables don't make sense after the clustering structure is removed.
Test results:
- All 87 tests in test_cluster_reduce_expand.py pass ✓
- All 43 tests in test_clustering_io.py pass ✓
- Manual IO roundtrip test passes ✓
- Tests with different segment counts (3, 6) pass ✓
- Tests with 2-hour timesteps pass ✓
# Skip vars with cluster dim but no time dim - they don't make sense after expansion
# (e.g., representative_weights with dims ('cluster',) or ('cluster', 'period'))
if 'cluster' in da.dims and 'time' not in da.dims:
continue
This correctly handles:
- ('cluster',) - simple cluster-only variables like cluster_weight
- ('cluster', 'period') - cluster variables with period dimension
- ('cluster', 'scenario') - cluster variables with scenario dimension
- ('cluster', 'period', 'scenario') - cluster variables with both
Variables with both cluster and time dimensions (like timestep_duration with dims ('cluster', 'time')) are correctly expanded since they contain time-series data that needs to be mapped back to original timesteps.
1. clustering/base.py - combine_slices() hardening (lines 52-118)
- Added validation for empty input: if not slices: raise ValueError("slices cannot be empty")
- Capture first array and preserve dtype: first = next(iter(slices.values())) → np.empty(shape, dtype=first.dtype)
- Clearer error on missing keys with try/except: raise KeyError(f"Missing slice for key {key} (extra_dims={extra_dims})")
2. flow_system.py - Variable categories cleanup and safe enum restoration
- Added self._variable_categories.clear() in _invalidate_model() (line 1692) to prevent stale categories from being reused
- Hardened VariableCategory restoration (lines 922-930) with try/except to handle unknown/renamed enum values gracefully with a warning instead of crashing
3. transform_accessor.py - Correct timestep_mapping decode for segmented systems (lines 1850-1857)
- For segmented systems, now uses clustering.n_segments instead of clustering.timesteps_per_cluster as the divisor
- This matches the encoding logic in expand_data() and build_expansion_divisor()
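A sketch of the matching encode/decode arithmetic; attribute names follow the description above, and the flat-index encoding is an assumption:
# Encoding packs (cluster, position) into one flat index per original timestep:
#   flat = cluster_index * divisor + position
divisor = clustering.n_segments if segmented else clustering.timesteps_per_cluster
cluster_indices = timestep_mapping // divisor
position_indices = timestep_mapping % divisor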
…+variable-categories
…sion+variable-categories # Conflicts: # flixopt/structure.py
modeling.py:200-242 - Added category: VariableCategory = None parameter and passed it to both add_variables calls.
Updated Callers
┌─────────────┬──────┬─────────────────────────┬────────────────────┐
│ File        │ Line │ Variable                │ Category           │
├─────────────┼──────┼─────────────────────────┼────────────────────┤
│ features.py │ 208  │ active_hours            │ TOTAL              │
├─────────────┼──────┼─────────────────────────┼────────────────────┤
│ elements.py │ 682  │ total_flow_hours        │ TOTAL              │
├─────────────┼──────┼─────────────────────────┼────────────────────┤
│ elements.py │ 709  │ flow_hours_over_periods │ TOTAL_OVER_PERIODS │
└─────────────┴──────┴─────────────────────────┴────────────────────┘
All expression tracking variables now properly register their categories for segment expansion handling. The pattern is consistent: callers specify the appropriate category based on what the tracked expression represents.
Added to flow_system.py
variable_categories property (line 1672):
@property
def variable_categories(self) -> dict[str, VariableCategory]:
    """Variable categories for filtering and segment expansion."""
    return self._variable_categories
get_variables_by_category() method (line 1681):
def get_variables_by_category(
    self, *categories: VariableCategory, from_solution: bool = True
) -> list[str]:
    """Get variable names matching any of the specified categories."""
Updated in statistics_accessor.py
┌───────────────┬──────────────────────────────────────────┬──────────────────────────────────────────────────┐
│ Property      │ Before                                   │ After                                            │
├───────────────┼──────────────────────────────────────────┼──────────────────────────────────────────────────┤
│ flow_rates    │ endswith('|flow_rate')                   │ get_variables_by_category(FLOW_RATE)             │
├───────────────┼──────────────────────────────────────────┼──────────────────────────────────────────────────┤
│ flow_sizes    │ endswith('|size') + flow_labels check    │ get_variables_by_category(SIZE) + flow_labels    │
├───────────────┼──────────────────────────────────────────┼──────────────────────────────────────────────────┤
│ storage_sizes │ endswith('|size') + storage_labels check │ get_variables_by_category(SIZE) + storage_labels │
├───────────────┼──────────────────────────────────────────┼──────────────────────────────────────────────────┤
│ charge_states │ endswith('|charge_state')                │ get_variables_by_category(CHARGE_STATE)          │
└───────────────┴──────────────────────────────────────────┴──────────────────────────────────────────────────┘
Benefits
1. Single source of truth - Categories defined once in VariableCategory enum
2. Easier maintenance - Adding new variable types only requires updating one place
3. Type safety - Using enum values instead of magic strings
4. Flexible filtering - Can filter by multiple categories: get_variables_by_category(SIZE, INVESTED)
5. Consistent naming - Uses rsplit('|', 1)[0] instead of replace('|suffix', '') for label extraction
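A minimal sketch of how the lookup could resolve names; the solution-membership check is an assumption:
def get_variables_by_category(self, *categories, from_solution=True):
    """Return variable names whose registered category matches (sketch)."""
    wanted = set(categories)
    names = [name for name, cat in self._variable_categories.items() if cat in wanted]
    if from_solution:
        names = [name for name in names if name in self.solution]  # assumed mapping
    return names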
1. New SIZE Sub-Categories (structure.py)
- Added FLOW_SIZE and STORAGE_SIZE to differentiate flow vs storage investments
- Kept SIZE for backward compatibility
2. InvestmentModel Updated (features.py)
- Added size_category parameter to InvestmentModel.__init__()
- Callers now specify the appropriate category
3. Variable Registrations Updated
- elements.py: FlowModel uses FLOW_SIZE
- components.py: StorageModel uses STORAGE_SIZE (2 locations)
4. Statistics Accessor Simplified (statistics_accessor.py)
- flow_sizes: Now uses get_variables_by_category(FLOW_SIZE) directly
- storage_sizes: Now uses get_variables_by_category(STORAGE_SIZE) directly
- No more filtering by element labels after getting SIZE variables
5. Backward-Compatible Fallback (flow_system.py)
- get_variables_by_category() handles old files:
- FLOW_SIZE → matches |size suffix + flow labels
- STORAGE_SIZE → matches |size suffix + storage labels
6. SOC Boundary Pattern Matching Replaced (transform_accessor.py)
- Changed from endswith('|SOC_boundary') to get_variables_by_category(SOC_BOUNDARY)
7. Effect Variables Verified
- PER_TIMESTEP ✓ (features.py:659)
- SHARE ✓ (features.py:700 for temporal shares)
- TOTAL / TOTAL_OVER_PERIODS ✓ (multiple locations)
8. Documentation Updated
- _build_segment_total_varnames() marked as backwards-compatibility fallback
Benefits
- Cleaner code: No more string manipulation to filter by element type
- Type safety: Using enum values instead of magic strings
- Single source of truth: Categories defined once, used everywhere
- Backward compatible: Old files still work via fallback logic
…+variable-categories # Conflicts: # flixopt/clustering/base.py # flixopt/components.py # flixopt/structure.py # flixopt/transform_accessor.py
* Summary: Variable Registry Implementation
* Summary: Fine-Grained Variable Categories
* Add IO for variable categories
* Added combine_slices() utility and refactored expand_data(), build_expansion_divisor(), and _interpolate_charge_state_segmented() to use it
* Fully vectorized expand_data() via timestep_mapping
* Fully vectorized _interpolate_charge_state_segmented()
* Simplified expand_da(), unified _build_property_array(), and reduced ClusteringResults property boilerplate
* Removed the unnecessary lookup and use segment_indices directly
* Fixed the IO roundtrip bug: skip cluster-only variables during expansion
* Skip vars with a cluster dim but no time dim during expansion
* Hardened combine_slices(), cleared stale variable categories on model invalidation, and fixed the timestep_mapping decode for segmented systems
* Added test_segmented_total_effects_match_solution to TestSegmentation class
* Added all remaining tsam.aggregate() parameters and missing type hint
* Updated expression_tracking_variable to accept a category parameter
* Added variable_categories property and get_variables_by_category() to flow_system.py
* Ensure backwards compatibility
* Added FLOW_SIZE and STORAGE_SIZE sub-categories and category-based statistics accessor lookups
* perf: Keep data in minimal form (no pre-broadcasting) (#575)
* Add collapse to IO
* Temp
* 1. Added dimension reduction functions in io.py:
- _reduce_constant_dims(): Reduces dimensions where all values are constant
- _expand_reduced_dims(): Restores original dimensions with correct order
2. Added reduce_constants parameter to FlowSystem.to_dataset():
- Default: False (opt-in)
- When True: Reduces constant dimensions for memory efficiency
3. Updated FlowSystem.from_dataset():
- Expands both collapsed arrays (from NetCDF) and reduced dimensions
4. Key fixes:
- Store original dimension order in attrs to preserve (cluster, time) vs (time, cluster) ordering
- Skip solution variables in reduction (they're prefixed with solution|)
The optimization is opt-in via to_dataset(reduce_constants=True). For file storage, save_dataset_to_netcdf still collapses constants to scalars by default.
* Set reduce_constants=True by default in to_dataset()
Benchmarks show 99%+ reduction in memory and file size for multi-period
models, with faster IO speeds. The optimization is now enabled by default.
* Feature/speedup io+internals (#576)
* Add safe isel wrappers for dimension-independent operations
- Add _scalar_safe_isel_drop() to modeling.py for selecting from
potentially reduced arrays (handles both scalar and array cases)
- Update Storage validation to use _scalar_safe_isel for
relative_minimum/maximum_charge_state access at time=0
- Update StorageModel._relative_charge_state_bounds to handle
reduced arrays properly with dimension checks
These wrappers prepare the codebase for future optimization where
_expand_reduced_dims() could be removed from from_dataset(),
keeping data compact throughout the system.
* Remove pre-broadcasting from DataConverter - data stays minimal
Major change: DataConverter.to_dataarray() now validates dimensions
instead of broadcasting to full target dimensions. This keeps data
compact (scalars stay scalar, 1D arrays stay 1D) and lets linopy
handle broadcasting at variable creation time.
Changes:
- core.py: Replace _broadcast_dataarray_to_target_specification with
_validate_dataarray_dims in to_dataarray() method
- components.py: Fix _relative_charge_state_bounds to handle scalar
inputs that have no time dimension (expand to timesteps_extra)
- conftest.py: Add assert_dims_compatible helper for tests
- test_*.py: Update dimension assertions to use subset checking
Memory impact (8760 timesteps × 20 periods):
- Before: 24.1 MB dataset size
- After: 137.1 KB dataset size
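A sketch of the validate-instead-of-broadcast idea; the function name comes from the commit message, the body is assumed:
def _validate_dataarray_dims(da, target_dims):
    """Accept any subset of the model dims instead of broadcasting (sketch)."""
    extra = set(da.dims) - set(target_dims)
    if extra:
        raise ValueError(f'Unexpected dimensions {extra}; allowed: {tuple(target_dims)}')
    return da  # data stays minimal; linopy broadcasts at variable creation time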
* Remove redundant dimension reduction - data is minimal from start
Since DataConverter no longer broadcasts, these optimizations are redundant:
- Remove _reduce_constant_dims() and _expand_reduced_dims() from io.py
- Remove reduce_constants parameter from to_dataset()
- Remove _expand_reduced_dims() call from from_dataset()
Also ensure Dataset always has model coordinates (time, period, scenario,
cluster) even if no data variable uses them. This is important for Dataset
validity and downstream operations.
Update benchmark_io.py to reflect the simplified API.
Memory footprint now:
- 24 timesteps: 568 bytes (23/24 vars are scalar)
- 168 timesteps: 2.8 KB
- 8760 timesteps × 20 periods: ~137 KB (vs 24 MB with old broadcasting)
* Removed the _collapse_constant_arrays / _expand_collapsed_arrays optimization from the codebase. Here's a summary of the changes:
Files modified:
1. flixopt/io.py:
- Removed COLLAPSED_VAR_PREFIX constant
- Removed _collapse_constant_arrays() function (~45 lines)
- Removed _expand_collapsed_arrays() function (~40 lines)
- Removed collapse_constants parameter from save_dataset_to_netcdf()
- Removed expansion logic from load_dataset_from_netcdf()
2. flixopt/flow_system.py:
- Removed import and call to _expand_collapsed_arrays in from_dataset()
3. benchmark_io.py:
- Simplified benchmark_netcdf() to only benchmark single save/load (no more comparison)
- Removed collapse_constants parameter from roundtrip function
Rationale: Since data is now kept in minimal form throughout the system (scalars stay scalars, 1D arrays stay 1D), there's no need for the extra collapsing/expanding layer when saving to NetCDF. The file sizes are naturally small because the data isn't expanded to full dimensions.
Test results:
- All 10 IO tests pass
- All 4 integration tests pass
- Benchmark runs successfully
* Summary of Fixes
1. _dataset_resample handling all-scalar data (transform_accessor.py)
When all data variables are scalars (no time dimension), the resample function now:
- Creates a dummy variable to resample the time coordinate
- Preserves all original scalar data variables
- Preserves all non-time coordinates (period, scenario, cluster)
2. _scalar_safe_reduce helper (modeling.py) - see the sketch after this list
Added a new helper function that safely performs aggregation operations (mean/sum/etc) over a dimension:
- Returns reduced data if the dimension exists
- Returns data unchanged if the dimension doesn't exist (scalar data)
3. Updated Storage intercluster linking (components.py)
Used _scalar_safe_reduce for:
- relative_loss_per_hour.mean('time')
- timestep_duration.mean('time')
4. Updated clustering expansion (transform_accessor.py)
Used _scalar_safe_reduce for relative_loss_per_hour.mean('time')
5. Fixed dimension order in expand_data (clustering/base.py)
When expanding clustered data back to original timesteps:
- Added transpose to ensure (cluster, time) dimension order before numpy indexing
- Fixes IndexError when dimensions were in different order
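A sketch of the _scalar_safe_reduce helper described in item 2 above; the body is assumed from that description:
def _scalar_safe_reduce(da, dim, op='mean'):
    """Aggregate over `dim` only if the array actually has it (sketch)."""
    if dim in getattr(da, 'dims', ()):
        return getattr(da, op)(dim)  # e.g. da.mean('time')
    return da  # scalar / dim-less data passes through unchanged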
* Added reduction on loading old datasets
* Summary of changes made:
1. flixopt/structure.py: Added dimension transpose in _ensure_coords() to ensure correct dimension order when data already has all dims but in wrong order
2. tests/test_io_conversion.py: Updated test_returns_same_object → test_returns_equivalent_dataset since convert_old_dataset now creates a new dataset when reducing constants
3. tests/deprecated/test_flow.py: Updated 6 assertions to expect minimal dimensions ('time',) instead of broadcasting to all model coords
4. tests/deprecated/test_component.py: Updated 2 assertions to expect minimal dimensions ('time',)
5. tests/deprecated/test_linear_converter.py: Updated 1 assertion to expect minimal dimensions ('time',)
The key change is that data now stays in minimal form - a 1D time-varying array stays as ('time',) dimensions rather than being pre-broadcast to ('time', 'scenario') or other full model dimensions. Broadcasting happens at the linopy interface layer in FlowSystemModel.add_variables() when needed.
Co-authored-by: Claude Opus 4.5 <[email protected]>
Summary
Fixes incorrect effect totals when expanding segmented clustered FlowSystems, and introduces a robust variable categorization system to handle different expansion behaviors.
Problem
When using cluster() with segmentation (SegmentConfig), each segment represents multiple hours with variable durations (e.g., 3h, 4h, 5h). During expand(), {effect}(temporal)|per_timestep gave ~4x the actual solution total.
Minimal Reproduction
Before fix: FAIL (computed=360000.00, expected=90000.00) - 4x inflation
After fix: PASS (computed=90000.00, expected=90000.00)
Solution
1. Variable Categorization System
Added a VariableCategory enum to classify optimization variables by semantic meaning and expansion behavior:
- State: CHARGE_STATE, SOC_BOUNDARY
- Rate/Power: FLOW_RATE, NETTO_DISCHARGE, VIRTUAL_FLOW
- Binary: STATUS, INACTIVE, STARTUP, SHUTDOWN
- Effect: PER_TIMESTEP, SHARE, TOTAL, TOTAL_OVER_PERIODS
- Investment: SIZE, FLOW_SIZE, STORAGE_SIZE, INVESTED
Variables are categorized at creation time via add_variables(category=...). Backwards compatibility is maintained via fallback pattern matching for older FlowSystems.
2. Expansion Divisor
- Clustering.build_expansion_divisor(): returns the segment duration for each original timestep, handling per-period/scenario clustering
- _build_segment_total_varnames(): derives exact variable names from the FlowSystem structure (not pattern matching)
3. Charge State Interpolation
For segmented systems, charge_state values at segment boundaries are interpolated within segments to show the actual charge trajectory.
4. Code Quality Improvements
- combine_slices() utility: combines {(dim_values): np.ndarray} dicts into DataArrays with proper dtype preservation
- expand_data(), build_expansion_divisor(), and _interpolate_charge_state_segmented() use xarray's vectorized advanced indexing
- get_variables_by_category() instead of string suffix matching
Testing
- test_segmented_total_effects_match_solution
- Unit tests for combine_slices()
Key Files Changed
- structure.py: VariableCategory enum and EXPAND_INTERPOLATE/EXPAND_DIVIDE groupings
- clustering/base.py: build_expansion_divisor(), combine_slices(), vectorized property builders
- transform_accessor.py: expand(), charge state interpolation
- flow_system.py: variable_categories property, get_variables_by_category()
- components.py, elements.py, features.py, effects.py: variable registrations by category
- statistics_accessor.py: category-based variable lookups
structure.pyVariableCategoryenum andEXPAND_INTERPOLATE/EXPAND_DIVIDEgroupingsclustering/base.pybuild_expansion_divisor(),combine_slices(), vectorized property builderstransform_accessor.pyexpand(), charge state interpolationflow_system.pyvariable_categoriesproperty,get_variables_by_category()components.py,elements.py,features.py,effects.pystatistics_accessor.py