Fix/segmentation expansion+variable categories #574
base: fix/segmentation-expansion
Changes Made
1. Added VariableCategory enum (structure.py:64-77)
- STATE - For state variables like charge_state (interpolated within segments)
- SEGMENT_TOTAL - For segment totals like effect contributions (divided by expansion divisor)
- RATE - For rate variables like flow_rate (expanded as-is)
- BINARY - For binary variables like status (expanded as-is)
- OTHER - For uncategorized variables
2. Added variable_categories registry to FlowSystemModel (structure.py:214)
- Dictionary mapping variable names to their categories
3. Modified add_variables() method (structure.py:388-396)
- Added an optional category parameter
- Automatically registers each variable under its category (see the sketch after this list)
4. Updated variable creation calls:
- components.py: Storage variables (charge_state as STATE, netto_discharge as RATE)
- elements.py: Flow variables (flow_rate as RATE, status as BINARY)
- features.py: Effect contributions (per_timestep as SEGMENT_TOTAL, temporal shares as SEGMENT_TOTAL, startup/shutdown as BINARY)
5. Updated expand() method (transform_accessor.py:2074-2090)
- Uses variable_categories registry to identify segment totals and state variables
- Falls back to pattern matching for backwards compatibility with older FlowSystems
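A minimal sketch of the registration flow from items 1-3 above (referenced in item 3); the enum string values and method bodies are illustrative, not the actual flixopt implementation:

```python
from enum import Enum


class VariableCategory(Enum):
    # String values are illustrative placeholders
    STATE = 'state'
    SEGMENT_TOTAL = 'segment_total'
    RATE = 'rate'
    BINARY = 'binary'
    OTHER = 'other'


class FlowSystemModel:
    def __init__(self) -> None:
        # Registry described in item 2: variable name -> category
        self.variable_categories: dict[str, VariableCategory] = {}

    def add_variables(self, name: str, category: VariableCategory | None = None) -> None:
        # ... create the underlying optimization variable here ...
        if category is not None:
            self.variable_categories[name] = category


model = FlowSystemModel()
model.add_variables('Storage|charge_state', category=VariableCategory.STATE)
model.add_variables('Boiler|flow_rate', category=VariableCategory.RATE)
print(model.variable_categories)
```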
Benefits
- More robust categorization: Variables are categorized at creation time, not by pattern matching
- Extensible: New variable types can easily be added with the proper category
- Backwards compatible: Old FlowSystems without categories still work via pattern matching fallback
New Categories (structure.py:45-103)
```python
class VariableCategory(Enum):
    # State variables
    CHARGE_STATE, SOC_BOUNDARY
    # Rate/Power variables
    FLOW_RATE, NETTO_DISCHARGE, VIRTUAL_FLOW
    # Binary state
    STATUS, INACTIVE
    # Binary events
    STARTUP, SHUTDOWN
    # Effect variables
    PER_TIMESTEP, SHARE, TOTAL, TOTAL_OVER_PERIODS
    # Investment
    SIZE, INVESTED
    # Counting/Duration
    STARTUP_COUNT, DURATION
    # Piecewise linearization
    INSIDE_PIECE, LAMBDA0, LAMBDA1, ZERO_POINT
    # Other
    OTHER
```
Logical Groupings for Expansion
```python
EXPAND_INTERPOLATE = {CHARGE_STATE}    # Interpolate between boundaries
EXPAND_DIVIDE = {PER_TIMESTEP, SHARE}  # Divide by expansion factor
# Default: repeat within segment
```
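A toy numeric example of what the default "repeat" rule and the EXPAND_DIVIDE rule mean per original timestep (segment layout and values are made up for illustration):

```python
import numpy as np

# One cluster with 2 segments, each representing 2 original timesteps
segment_of_timestep = np.array([0, 0, 1, 1])   # segment index per original timestep
segment_durations = np.array([2.0, 2.0])       # expansion divisor per segment

# Default rule (rates, binaries, ...): repeat the segment value as-is
flow_rate = np.array([5.0, 3.0])               # one value per segment
print(flow_rate[segment_of_timestep])          # [5. 5. 3. 3.]

# EXPAND_DIVIDE rule (per-timestep effect shares): spread the segment total evenly
cost_total = np.array([10.0, 6.0])             # one total per segment
divisor = segment_durations[segment_of_timestep]
print(cost_total[segment_of_timestep] / divisor)  # [5. 5. 3. 3.]
```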
Files Modified
| File | Variables Updated |
|---|---|
| components.py | charge_state, netto_discharge, SOC_boundary |
| elements.py | flow_rate, status, virtual_supply, virtual_demand |
| features.py | size, invested, inactive, startup, shutdown, startup_count, inside_piece, lambda0, lambda1, zero_point, total, per_timestep, shares |
| effects.py | total, total_over_periods |
| modeling.py | duration |
| transform_accessor.py | Updated to use EXPAND_INTERPOLATE and EXPAND_DIVIDE groupings |
Test Results
- All 83 cluster/expand tests pass
- Variable categories correctly populated and grouped
Changes Made
1. Added combine_slices() utility to flixopt/clustering/base.py (lines 52-107)
- Simple function that stacks a dict of {(dim_values): np.ndarray} slices into a DataArray
- Much cleaner than the previous reverse-concat pattern
2. Refactored 3 methods to use the new utility:
- Clustering.expand_data() - reduced from ~25 to ~12 lines
- Clustering.build_expansion_divisor() - reduced from ~35 to ~20 lines
- TransformAccessor._interpolate_charge_state_segmented() - reduced from ~43 to ~27 lines
3. Added 4 unit tests for combine_slices() in tests/test_cluster_reduce_expand.py
Results
| Metric | Before | After |
|---|---|---|
| Complex reverse-concat blocks | 3 | 0 |
| Lines of dimension iteration code | ~100 | ~60 |
| Test coverage | 83 tests | 87 tests (all passing) |
The Pattern Change
Before (complex reverse-concat):
```python
result_arrays = slices
for dim in reversed(extra_dims):
    grouped = {}
    for key, arr in result_arrays.items():
        rest_key = key[:-1] if len(key) > 1 else ()
        grouped.setdefault(rest_key, []).append(arr)
    result_arrays = {k: xr.concat(v, dim=...) for k, v in grouped.items()}
result = list(result_arrays.values())[0].transpose('time', ...)
```
After (simple combine):
```python
return combine_slices(slices, extra_dims, dim_coords, 'time', output_coord, attrs)
```
1. Fully Vectorized expand_data()
Before (~65 lines with loops):
```python
for combo in np.ndindex(*[len(v) for v in dim_coords.values()]):
    selector = {...}
    mapping = _select_dims(timestep_mapping, **selector).values
    data_slice = _select_dims(aggregated, **selector)
    slices[key] = _expand_slice(mapping, data_slice)
return combine_slices(slices, ...)
```
After (~25 lines, fully vectorized):
```python
timestep_mapping = self.timestep_mapping  # Already multi-dimensional!
cluster_indices = timestep_mapping // time_dim_size
time_indices = timestep_mapping % time_dim_size
expanded = aggregated.isel(cluster=cluster_indices, time=time_indices)
# xarray handles broadcasting across period/scenario automatically
```
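A small self-contained illustration of the same advanced-indexing idea on toy data (dimension names mirror the snippet above; sizes and values are made up):

```python
import numpy as np
import xarray as xr

# 2 typical clusters x 3 timesteps of aggregated data
aggregated = xr.DataArray(np.arange(6.0).reshape(2, 3), dims=('cluster', 'time'))

# Flat mapping of 6 original timesteps onto (cluster, time) positions,
# encoded as cluster * timesteps_per_cluster + time
timestep_mapping = xr.DataArray(np.array([0, 1, 2, 3, 4, 5]), dims='original_time')
time_dim_size = 3
cluster_indices = timestep_mapping // time_dim_size
time_indices = timestep_mapping % time_dim_size

# Vectorized gather: one isel call replaces the nested period/scenario loops
expanded = aggregated.isel(cluster=cluster_indices, time=time_indices)
print(expanded.dims)    # ('original_time',)
print(expanded.values)  # [0. 1. 2. 3. 4. 5.]
```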
2. build_expansion_divisor() and _interpolate_charge_state_segmented()
These still use combine_slices() because they need per-result segment data (segment_assignments, segment_durations) which isn't available as concatenated Clustering properties yet.
Current State
| Method | Vectorized? | Uses Clustering Properties |
|---|---|---|
| expand_data() | Yes | timestep_mapping (fully) |
| build_expansion_divisor() | No (small loop) | cluster_assignments (partially) |
| _interpolate_charge_state_segmented() | No (small loop) | cluster_assignments (partially) |
1. _interpolate_charge_state_segmented() - Fully vectorized from ~110 lines to ~55 lines
- Uses clustering.timestep_mapping for indexing
- Uses clustering.results.segment_assignments, segment_durations, and position_within_segment
- Single xarray expression instead of triple-nested loops
Previously completed (from before context limit):
- Added segment_assignments multi-dimensional property to ClusteringResults
- Added segment_durations multi-dimensional property to ClusteringResults
- Added position_within_segment property to ClusteringResults
- Vectorized expand_data()
- Vectorized build_expansion_divisor()
Test results: All 130 tests pass (87 cluster/expand + 43 IO tests)
The combine_slices utility function is still available in clustering/base.py if needed in the future, but all the main dimension-handling methods now use xarray's vectorized advanced indexing instead of the loop-based slice-and-combine pattern.
Summary of Simplifications
1. expand_da() in transform_accessor.py
- Extracted duplicate "append extra timestep" logic into an _append_final_state() helper
- Reduced from ~50 lines to ~25 lines
- Eliminated code duplication
2. _build_multi_dim_array() → _build_property_array() in clustering/base.py
- Replaced 6 conditional branches with a unified np.ndindex() pattern (see the sketch below)
- Now handles both simple and multi-dimensional cases in one method
- Reduced from ~50 lines to ~25 lines
- Preserves dtype (fixed integer indexing bug)
3. Property boilerplate in ClusteringResults
- 5 properties (cluster_assignments, cluster_occurrences, cluster_centers, segment_assignments, segment_durations) now use the unified _build_property_array()
- Each property reduced from ~25 lines to ~8 lines
- Total: ~165 lines → ~85 lines
4. _build_timestep_mapping() in Clustering
- Simplified to a single call using _build_property_array()
- Reduced from ~16 lines to ~9 lines
Total lines removed: ~150+ lines of duplicated/complex code
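A hedged sketch of the unified np.ndindex() pattern referenced in item 2; the real _build_property_array keys its per-combination results by dimension values rather than plain index tuples, so treat this only as an illustration of the shape and dtype handling:

```python
import numpy as np


def build_property_array(per_key: dict[tuple, np.ndarray], extra_dim_sizes: tuple[int, ...]) -> np.ndarray:
    """Stack per-combination 1D results into one array, preserving the input dtype."""
    first = next(iter(per_key.values()))
    out = np.empty((len(first), *extra_dim_sizes), dtype=first.dtype)
    # np.ndindex iterates over every combination of the extra dimensions,
    # which covers the 0-, 1- and 2-extra-dimension cases with the same loop.
    for combo in np.ndindex(*extra_dim_sizes):
        out[(slice(None),) + combo] = per_key[combo]
    return out


# 2 periods x 2 scenarios; integer cluster assignments stay integers
results = {
    (0, 0): np.array([0, 1, 1]), (0, 1): np.array([1, 1, 0]),
    (1, 0): np.array([0, 0, 1]), (1, 1): np.array([1, 0, 0]),
}
print(build_property_array(results, (2, 2)).dtype)  # integer dtype is preserved
```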
…as fixed:
Summary
The IO roundtrip bug was caused by representative_weights (a variable with only ('cluster',) dimension) being copied as-is during expansion, which caused the cluster dimension to incorrectly persist in the expanded dataset.
Fix applied in transform_accessor.py:2063-2065:
```python
# Skip cluster-only vars (no time dim) - they don't make sense after expansion
if da.dims == ('cluster',):
    continue
```
This skips variables that have only a cluster dimension (and no time dimension) during expansion, as these variables don't make sense after the clustering structure is removed.
Test results:
- All 87 tests in test_cluster_reduce_expand.py pass ✓
- All 43 tests in test_clustering_io.py pass ✓
- Manual IO roundtrip test passes ✓
- Tests with different segment counts (3, 6) pass ✓
- Tests with 2-hour timesteps pass ✓
The check was later broadened to cover cluster variables that carry extra dimensions but no time dimension:

```python
# Skip vars with cluster dim but no time dim - they don't make sense after expansion
# (e.g., representative_weights with dims ('cluster',) or ('cluster', 'period'))
if 'cluster' in da.dims and 'time' not in da.dims:
    continue
```
This correctly handles:
- ('cluster',) - simple cluster-only variables like cluster_weight
- ('cluster', 'period') - cluster variables with period dimension
- ('cluster', 'scenario') - cluster variables with scenario dimension
- ('cluster', 'period', 'scenario') - cluster variables with both
Variables with both cluster and time dimensions (like timestep_duration with dims ('cluster', 'time')) are correctly expanded since they contain time-series data that needs to be mapped back to original timesteps.
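A self-contained illustration of how this check classifies variables (toy arrays; the variable names are taken from the examples above):

```python
import numpy as np
import xarray as xr

variables = {
    'representative_weights': xr.DataArray(np.ones(3), dims=('cluster',)),
    'cluster_weight': xr.DataArray(np.ones((3, 2)), dims=('cluster', 'period')),
    'timestep_duration': xr.DataArray(np.ones((3, 4)), dims=('cluster', 'time')),
}

for name, da in variables.items():
    skip = 'cluster' in da.dims and 'time' not in da.dims
    print(name, '-> skipped' if skip else '-> expanded')
# representative_weights -> skipped
# cluster_weight -> skipped
# timestep_duration -> expanded
```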
📝 Walkthrough
Introduces a VariableCategory enum for semantic classification of optimization variables, propagates categories through variable creation across multiple model components, and implements vectorized expansion/interpolation logic for segmented systems. New clustering utilities (combine_slices) and properties (position_within_segment) support multi-dimensional coordinate handling.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant VarCreation as Variable Creation<br/>(add_variables)
    participant FlowSystem as FlowSystem
    participant VarCatRegistry as Variable Category<br/>Registry
    participant Serialization as Serialization<br/>(to_dataset)
    participant Expansion as Expansion Logic<br/>(transform_accessor)
    VarCreation->>FlowSystem: add_variables(name, category=VC.FLOW_RATE)
    FlowSystem->>VarCatRegistry: Store mapping {name: category}
    VarCatRegistry-->>FlowSystem: Registered
    FlowSystem->>Serialization: to_dataset()
    Serialization->>Serialization: Extract variable_categories dict
    Serialization-->>Serialization: Serialize to dataset attrs
    Note over Serialization: Dataset persisted with<br/>variable category metadata
    Expansion->>VarCatRegistry: Lookup category for variable
    VarCatRegistry-->>Expansion: Return category (e.g., EXPAND_DIVIDE)
    Expansion->>Expansion: Apply vectorized expansion<br/>based on category
    Expansion-->>Expansion: Generate expanded DataArray
```

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: 2 passed, 1 failed (warning).
@coderabbitai review

✅ Actions performed: Review triggered.
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In `@flixopt/clustering/base.py`:
- Around line 52-108: combine_slices currently can break dtypes and crash
silently on empty input and missing keys; fix by validating input non-empty at
the top (if not slices: raise ValueError("slices cannot be empty")), capture the
first array via first = next(iter(slices.values())) and use its dtype when
allocating the output buffer (data = np.empty(shape, dtype=first.dtype)), and
replace the raw dict lookup inside the loop with a try/except that raises a
clearer KeyError including the missing key and the extra_dims (e.g., try:
data[...] = slices[key] except KeyError: raise KeyError(f"Missing slice for key
{key} (extra_dims={extra_dims})")).
In `@flixopt/flow_system.py`:
- Around line 259-262: Clear the stale _variable_categories mapping when the
model is invalidated and when reset() is called: add
self._variable_categories.clear() (or reassign {}) inside _invalidate_model()
and reset() so segment expansion (transform.expand) can’t reuse old categories;
and harden enum restoration by wrapping VariableCategory(value) in a try/except
ValueError (or use a safe lookup) to map unknown/renamed values to a fallback
enum (e.g., VariableCategory.UNKNOWN or a default) instead of raising, applying
this safe restore wherever serialized values are converted back to
VariableCategory.
In `@flixopt/transform_accessor.py`:
- Around line 1844-1858: The timestep_mapping decode wrongly always uses
clustering.timesteps_per_cluster; for segmented systems use
clustering.n_segments instead. In the block that computes time_dim_size (around
where timestep_mapping, cluster_indices, and time_indices are computed), set
time_dim_size = clustering.n_segments when clustering.is_segmented and
clustering.n_segments is not None, otherwise use
clustering.timesteps_per_cluster so that subsequent indexing into
segment_assignments and position_within_segment uses the correct divisor
(matching the logic in expand_data() and build_expansion_divisor()).
🧹 Nitpick comments (3)
flixopt/structure.py (1)
1670-1698: Submodel.add_variables(..., category=...) wiring looks correct; tighten typing + consider overwrite semantics. Two small tweaks:
- Type hint as VariableCategory | None for clarity.
- Decide whether re-registering the same variable.name with a different category should warn (today it silently overwrites).

flixopt/clustering/base.py (2)

326-405: _build_property_array refactor is a solid direction; one ordering edge-case to confirm. This is much cleaner than hand-rolling multidim assembly per property, and preserving dtype via np.empty(..., dtype=...) is a good touch. Only thing to double-check: key construction assumes extra dims are ordered (period, scenario); if self._dim_names order ever differs, self._results[key] lookups can break. Also applies to: 523-558

422-474: position_within_segment implementation looks correct; consider output_dtypes for apply_ufunc. If this ever runs with dask-backed arrays, xr.apply_ufunc(..., vectorize=True) can benefit from explicit output_dtypes=[...] to avoid dtype inference surprises.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)
- flixopt/clustering/base.py
- flixopt/components.py
- flixopt/effects.py
- flixopt/elements.py
- flixopt/features.py
- flixopt/flow_system.py
- flixopt/modeling.py
- flixopt/structure.py
- flixopt/transform_accessor.py
- tests/test_cluster_reduce_expand.py

🧰 Additional context used
🧬 Code graph analysis (4)

flixopt/modeling.py (1)
- flixopt/structure.py (2): Submodel (1646-1844), VariableCategory (45-90)

flixopt/features.py (1)
- flixopt/structure.py (9): FlowSystemModel (149-413), Submodel (1646-1844), VariableCategory (45-90), add_variables (1670-1698), get_coords (358-393), get_coords (1748-1753), dims (293-295), label_full (1214-1215), label_full (1779-1780)

tests/test_cluster_reduce_expand.py (1)
- flixopt/clustering/base.py (7): combine_slices (52-107), dims (205-207), dims (685-687), values (292-294), values (1306-1313), coords (215-221), coords (690-700)

flixopt/transform_accessor.py (2)
- flixopt/clustering/base.py (19): timestep_mapping (802-808), segment_assignments (368-382), segment_assignments (841-849), results (1221-1232), segment_durations (385-404), segment_durations (852-860), position_within_segment (423-473), timesteps_per_cluster (309-311), timesteps_per_cluster (665-667), isel (244-268), is_segmented (752-758), build_expansion_divisor (924-969), items (284-286), items (1291-1298), values (292-294), values (1306-1313), dims (205-207), dims (685-687), expand_data (877-922)
- flixopt/flow_system.py (3): isel (2581-2607), is_segmented (2243-2249), dims (2137-2157)
🔇 Additional comments (24)
flixopt/structure.py (2)
158-164: FlowSystemModel.variable_categories is a good central registry — ensure it's kept consistent in IO paths. This is a clean "single source of truth" for expansion metadata; just ensure load/save roundtrips restore it (and don't silently drop it).

16-104: No changes needed — serialization is already handled correctly. The enum values are already serialization-friendly. The code explicitly converts VariableCategory enum members to their .value strings during JSON serialization (lines 751–754 in flow_system.py) and reconstructs them from strings on deserialization (lines 919–924). This is proper handling and requires no modification. Using frozenset for the expansion constants is a reasonable optional improvement for immutability (no actual mutation risk exists today), but it is not necessary for correctness or serialization.
Likely an incorrect or invalid review comment.
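The serialization round trip described in this comment can be sketched as follows (the enum string values and the variable label are illustrative; the actual code lives in flow_system.py):

```python
import json
from enum import Enum


class VariableCategory(Enum):
    FLOW_RATE = 'flow_rate'        # illustrative value
    CHARGE_STATE = 'charge_state'  # illustrative value


categories = {'Boiler(Q_th)|flow_rate': VariableCategory.FLOW_RATE}

# Serialize: store the .value strings in the persisted structure
payload = json.dumps({name: cat.value for name, cat in categories.items()})

# Deserialize: rebuild the enum members from their stored values
restored = {name: VariableCategory(value) for name, value in json.loads(payload).items()}
assert restored == categories
```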
flixopt/features.py (4)
14-15: Importing VariableCategory for tagging is straightforward.

67-81: Investment variables tagged sensibly (SIZE, INVESTED).

197-257: Status tracking tags (INACTIVE, STARTUP/SHUTDOWN, STARTUP_COUNT) align with the declared expansion groupings. One thing to double-check: coords=self.get_coords() (Line 224/230) includes 'time' in all configurations where startup/shutdown constraints are built.

403-425: Piecewise-related tagging (INSIDE_PIECE, LAMBDA0/1, ZERO_POINT) looks consistent. Also applies to: 515-520

flixopt/modeling.py (2)

7-9: Category import is fine.

296-307: Tagging the duration tracking variable as DURATION is consistent with the new registry. Please verify duration_per_step is always an xr.DataArray here (it's typed as scalar-or-DataArray, but sum(...) is used earlier), otherwise this path can still break irrespective of the new category.

flixopt/components.py (3)

20-21: Import update for tagging is fine.

935-949: Storage tagging matches intended expansion semantics; verify extra-timestep interpolation path. charge_state is both extra_timestep=True and CHARGE_STATE; make sure segmented expansion handles the last boundary timestep correctly (no off-by-one / dropped final point).

1344-1351: Tagging SOC_boundary as SOC_BOUNDARY is consistent and should help expansion logic.

flixopt/effects.py (2)

20-28: Import block update is OK.

383-409: Effect total variables are categorized appropriately (TOTAL, TOTAL_OVER_PERIODS). Just ensure expansion code doesn't try to "expand" these when there's no time/cluster dim (should be a no-op).

tests/test_cluster_reduce_expand.py (1)

1398-1478: Nice targeted coverage for combine_slices (dims/coords/attrs + DatetimeIndex). These tests should catch most regressions in dimension ordering, coordinate wiring, and attribute propagation.

flixopt/elements.py (4)

19-25: Importing VariableCategory here is appropriate for centralized variable tagging.

671-677: Flow rate variable categorization looks correct (FLOW_RATE).

730-736: Status variables are consistently tagged (STATUS) for both FlowModel and ComponentModel. Also applies to: 1043-1049

966-978: Virtual imbalance vars are correctly categorized as VIRTUAL_FLOW.

flixopt/clustering/base.py (1)

877-970: Vectorized expand_data / build_expansion_divisor is great — please sanity-check against pinned xarray. The advanced-indexing approach should scale better and eliminates nested loops over period/scenario. The cleanup (drop_vars(['cluster', 'time', ...]) + renaming original_time → time) is sensible, but behavior can differ subtly across xarray versions (coords retained vs. turned into non-dim coords). Also applies to: 1072-1081
flixopt/transform_accessor.py (5)
19-20: LGTM! Clean import of the new category constants used for registry-based expansion logic below.

1860-1877: Interpolation logic is mathematically sound. The interpolation factor calculation and linear interpolation between segment boundaries is correct:
- Uses mid-point positioning within segments (positions + 0.5)
- Handles single-timestep segments with fallback to 0.5
- Properly indexes start (seg_indices) and end (seg_indices + 1) boundary values
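A toy numeric walk-through of that factor (segment layout and boundary values are made up; the real code works on xarray objects):

```python
import numpy as np

segment_of_timestep = np.array([0, 0, 1, 1])    # 2 segments, 2 original timesteps each
positions = np.array([0, 1, 0, 1])              # position of each timestep within its segment
segment_durations = np.array([2.0, 2.0])
boundaries = np.array([0.0, 4.0, 6.0])          # charge_state at segment boundaries

durations = segment_durations[segment_of_timestep]
# Mid-point factor; single-timestep segments fall back to 0.5
factor = np.where(durations > 1, (positions + 0.5) / durations, 0.5)

start = boundaries[segment_of_timestep]          # seg_indices
end = boundaries[segment_of_timestep + 1]        # seg_indices + 1
print(start + (end - start) * factor)            # [1.  3.  4.5 5.5]
```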
2058-2067: Good defensive handling for cluster-only variables. The new skip logic at lines 2065-2066 correctly excludes variables that have a cluster dimension but no time dimension. These variables (e.g., representative_weights) are cluster-level metadata that don't make sense after expansion to original timesteps.

2031-2053: Clear control flow for variable expansion paths. The expand_da function correctly handles the three expansion paths:
- Segmented state variables: Interpolate within segments, then append final boundary value
- Segment total variables: Expand and divide by segment duration to get hourly rates
- Other variables (including non-segmented state): Standard expansion with final state append for state vars
The mutual exclusivity between state variables and segment totals ensures no variable gets both treatments.

2002-2029: Good backward-compatible registry approach with pattern-matching fallback. The implementation correctly:
- Attempts registry-based lookup first via variable_categories (lines 2006, 2014)
- Falls back to structural pattern matching for older FlowSystems without categories
- Properly extracts and appends final state values for state variables
The membership checks are valid: EXPAND_DIVIDE and EXPAND_INTERPOLATE are sets of VariableCategory values defined in structure.py (lines 96, 99), supporting the in operator used at lines 2006 and 2014.
```python
def combine_slices(
    slices: dict[tuple, np.ndarray],
    extra_dims: list[str],
    dim_coords: dict[str, list],
    output_dim: str,
    output_coord: Any,
    attrs: dict | None = None,
) -> xr.DataArray:
    """Combine {(dim_values): 1D_array} dict into a DataArray.

    This utility simplifies the common pattern of iterating over extra dimensions
    (like period, scenario), processing each slice, and combining results.

    Args:
        slices: Dict mapping dimension value tuples to 1D numpy arrays.
            Keys are tuples like ('period1', 'scenario1') matching extra_dims order.
        extra_dims: Dimension names in order (e.g., ['period', 'scenario']).
        dim_coords: Dict mapping dimension names to coordinate values.
        output_dim: Name of the output dimension (typically 'time').
        output_coord: Coordinate values for output dimension.
        attrs: Optional DataArray attributes.

    Returns:
        DataArray with dims [output_dim, *extra_dims].

    Example:
        >>> slices = {
        ...     ('P1', 'base'): np.array([1, 2, 3]),
        ...     ('P1', 'high'): np.array([4, 5, 6]),
        ...     ('P2', 'base'): np.array([7, 8, 9]),
        ...     ('P2', 'high'): np.array([10, 11, 12]),
        ... }
        >>> result = combine_slices(
        ...     slices,
        ...     extra_dims=['period', 'scenario'],
        ...     dim_coords={'period': ['P1', 'P2'], 'scenario': ['base', 'high']},
        ...     output_dim='time',
        ...     output_coord=[0, 1, 2],
        ... )
        >>> result.dims
        ('time', 'period', 'scenario')
    """
    n_output = len(next(iter(slices.values())))
    shape = [n_output] + [len(dim_coords[d]) for d in extra_dims]
    data = np.empty(shape)

    for combo in np.ndindex(*shape[1:]):
        key = tuple(dim_coords[d][i] for d, i in zip(extra_dims, combo, strict=True))
        data[(slice(None),) + combo] = slices[key]

    return xr.DataArray(
        data,
        dims=[output_dim] + extra_dims,
        coords={output_dim: output_coord, **dim_coords},
        attrs=attrs or {},
    )
```
combine_slices: preserve dtype and add input validation.
The function uses np.empty(shape) which defaults to float64, causing integer and boolean arrays to lose their dtype. Additionally, next(iter(slices.values())) will raise StopIteration if the dict is empty, and missing keys will surface with an unhelpful KeyError.
Suggested fixes:
- Preserve dtype: data = np.empty(shape, dtype=first.dtype) where first = next(iter(slices.values()))
- Validate non-empty input: if not slices: raise ValueError("slices cannot be empty")
- Enhance key lookup errors with context about which key was missing

Note: zip(..., strict=True) is valid — the project requires Python ≥3.11, and strict= has been supported since Python 3.10.
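Putting the three suggestions together, a self-contained sketch of the hardened function (the implementation actually merged in the PR may differ in detail):

```python
from typing import Any

import numpy as np
import xarray as xr


def combine_slices_hardened(
    slices: dict[tuple, np.ndarray],
    extra_dims: list[str],
    dim_coords: dict[str, list],
    output_dim: str,
    output_coord: Any,
    attrs: dict | None = None,
) -> xr.DataArray:
    if not slices:
        raise ValueError('slices cannot be empty')
    first = next(iter(slices.values()))
    shape = [len(first)] + [len(dim_coords[d]) for d in extra_dims]
    data = np.empty(shape, dtype=first.dtype)   # preserve int/bool dtypes
    for combo in np.ndindex(*shape[1:]):
        key = tuple(dim_coords[d][i] for d, i in zip(extra_dims, combo, strict=True))
        try:
            data[(slice(None),) + combo] = slices[key]
        except KeyError:
            raise KeyError(f'Missing slice for key {key} (extra_dims={extra_dims})') from None
    return xr.DataArray(
        data,
        dims=[output_dim] + extra_dims,
        coords={output_dim: output_coord, **dim_coords},
        attrs=attrs or {},
    )
```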
```python
# Variable categories for segment expansion handling
# Populated when model is built, used by transform.expand()
self._variable_categories: dict[str, VariableCategory] = {}
```
Clear _variable_categories on model invalidation and harden enum restore against unknown values.
The _variable_categories mapping is not cleared by _invalidate_model() or reset(), allowing stale categories to persist across structural changes and cause incorrect behavior in segment expansion. Additionally, VariableCategory(value) raises ValueError on unknown/renamed enum values, making deserialization brittle across enum evolution.
Hardening for enum restore with graceful fallback and cleanup:

```diff
-        if 'variable_categories' in reference_structure:
-            categories_dict = json.loads(reference_structure['variable_categories'])
-            # Convert string values back to VariableCategory enum
-            flow_system._variable_categories = {
-                name: VariableCategory(value) for name, value in categories_dict.items()
-            }
+        if 'variable_categories' in reference_structure:
+            categories_dict = json.loads(reference_structure['variable_categories'])
+            restored: dict[str, VariableCategory] = {}
+            for name, value in categories_dict.items():
+                try:
+                    restored[name] = VariableCategory(value)
+                except ValueError:
+                    logger.warning("Unknown VariableCategory value %r for variable %r; skipping.", value, name)
+            flow_system._variable_categories = restored
```

And in _invalidate_model() add:

```diff
         self._flow_carriers = None  # Invalidate flow-to-carrier mapping
+        self._variable_categories = {}  # Clear stale variable category mappings
```
+ self._variable_categories = {} # Clear stale variable category mappings🤖 Prompt for AI Agents
In `@flixopt/flow_system.py` around lines 259 - 262, Clear the stale
_variable_categories mapping when the model is invalidated and when reset() is
called: add self._variable_categories.clear() (or reassign {}) inside
_invalidate_model() and reset() so segment expansion (transform.expand) can’t
reuse old categories; and harden enum restoration by wrapping
VariableCategory(value) in a try/except ValueError (or use a safe lookup) to map
unknown/renamed values to a fallback enum (e.g., VariableCategory.UNKNOWN or a
default) instead of raising, applying this safe restore wherever serialized
values are converted back to VariableCategory.
```diff
+        # Get multi-dimensional properties from Clustering
+        timestep_mapping = clustering.timestep_mapping
+        segment_assignments = clustering.results.segment_assignments
+        segment_durations = clustering.results.segment_durations
+        position_within_segment = clustering.results.position_within_segment

-        # Get start and end boundary values
-        start_val = charge_state_data[typical_cluster_idx, seg_idx]
-        end_val = charge_state_data[typical_cluster_idx, seg_idx + 1]
+        # Decode timestep_mapping into cluster and time indices
+        time_dim_size = clustering.timesteps_per_cluster
+        cluster_indices = timestep_mapping // time_dim_size
+        time_indices = timestep_mapping % time_dim_size

-        # Linear interpolation
-        result[orig_cluster_idx * timesteps_per_cluster + t] = start_val + (end_val - start_val) * factor
+        # Get segment index and position for each original timestep
+        seg_indices = segment_assignments.isel(cluster=cluster_indices, time=time_indices)
+        positions = position_within_segment.isel(cluster=cluster_indices, time=time_indices)
+        durations = segment_durations.isel(cluster=cluster_indices, segment=seg_indices)
```
Fix timestep_mapping decoding for segmented systems at line 1851.
The code unconditionally uses clustering.timesteps_per_cluster to decode timestep_mapping, but for segmented systems the mapping is encoded with n_segments (as confirmed by expand_data() and build_expansion_divisor()). This produces incorrect indices.
Since this method is only called for segmented systems (line 2039: if is_state and clustering.is_segmented), the decoding will always fail. The decoded time_indices will be segment indices (0..n_segments-1) rather than within-cluster time indices (0..timesteps_per_cluster-1), causing a dimension mismatch when indexing segment_assignments and position_within_segment.
Change line 1851 to:
```python
if clustering.is_segmented and clustering.n_segments is not None:
    time_dim_size = clustering.n_segments
else:
    time_dim_size = clustering.timesteps_per_cluster
```

This aligns with the pattern used in expand_data() (lines 902-905) and build_expansion_divisor() (line 962).
1. clustering/base.py - combine_slices() hardening (lines 52-118)
- Added validation for empty input: if not slices: raise ValueError("slices cannot be empty")
- Capture first array and preserve dtype: first = next(iter(slices.values())) → np.empty(shape, dtype=first.dtype)
- Clearer error on missing keys with try/except: raise KeyError(f"Missing slice for key {key} (extra_dims={extra_dims})")
2. flow_system.py - Variable categories cleanup and safe enum restoration
- Added self._variable_categories.clear() in _invalidate_model() (line 1692) to prevent stale categories from being reused
- Hardened VariableCategory restoration (lines 922-930) with try/except to handle unknown/renamed enum values gracefully with a warning instead of crashing
3. transform_accessor.py - Correct timestep_mapping decode for segmented systems (lines 1850-1857)
- For segmented systems, now uses clustering.n_segments instead of clustering.timesteps_per_cluster as the divisor
- This matches the encoding logic in expand_data() and build_expansion_divisor()
…+variable-categories
…sion+variable-categories # Conflicts: # flixopt/structure.py
modeling.py:200-242 - Added category: VariableCategory = None parameter and passed it to both add_variables calls.

Updated Callers

| File | Line | Variable | Category |
|---|---|---|---|
| features.py | 208 | active_hours | TOTAL |
| elements.py | 682 | total_flow_hours | TOTAL |
| elements.py | 709 | flow_hours_over_periods | TOTAL_OVER_PERIODS |

All expression tracking variables now properly register their categories for segment expansion handling. The pattern is consistent: callers specify the appropriate category based on what the tracked expression represents.
variable_categories property (line 1672):

```python
@property
def variable_categories(self) -> dict[str, VariableCategory]:
    """Variable categories for filtering and segment expansion."""
    return self._variable_categories
```

get_variables_by_category() method (line 1681):

```python
def get_variables_by_category(
    self, *categories: VariableCategory, from_solution: bool = True
) -> list[str]:
    """Get variable names matching any of the specified categories."""
```

Updated in statistics_accessor.py

| Property | Before | After |
|---|---|---|
| flow_rates | endswith('\|flow_rate') | get_variables_by_category(FLOW_RATE) |
| flow_sizes | endswith('\|size') + flow_labels check | get_variables_by_category(SIZE) + flow_labels |
| storage_sizes | endswith('\|size') + storage_labels check | get_variables_by_category(SIZE) + storage_labels |
| charge_states | endswith('\|charge_state') | get_variables_by_category(CHARGE_STATE) |

Benefits
1. Single source of truth - Categories defined once in the VariableCategory enum
2. Easier maintenance - Adding new variable types only requires updating one place
3. Type safety - Using enum values instead of magic strings
4. Flexible filtering - Can filter by multiple categories: get_variables_by_category(SIZE, INVESTED)
5. Consistent naming - Uses rsplit('|', 1)[0] instead of replace('|suffix', '') for label extraction
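A free-standing sketch of the filtering logic behind get_variables_by_category() (the real method lives on FlowSystem and can also read from the solution; categories are shown as plain strings instead of VariableCategory members for brevity):

```python
def get_variables_by_category(registry: dict[str, str], *categories: str) -> list[str]:
    """Return the variable names whose registered category matches any requested one."""
    wanted = set(categories)
    return [name for name, category in registry.items() if category in wanted]


registry = {
    'Boiler(Q_th)|flow_rate': 'FLOW_RATE',
    'Battery|charge_state': 'CHARGE_STATE',
    'Battery|size': 'SIZE',
    'Boiler(Q_th)|size': 'SIZE',
}
print(get_variables_by_category(registry, 'SIZE'))
# ['Battery|size', 'Boiler(Q_th)|size']
print(get_variables_by_category(registry, 'FLOW_RATE', 'CHARGE_STATE'))
# ['Boiler(Q_th)|flow_rate', 'Battery|charge_state']
```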
1. New SIZE Sub-Categories (structure.py)
- Added FLOW_SIZE and STORAGE_SIZE to differentiate flow vs storage investments
- Kept SIZE for backward compatibility
2. InvestmentModel Updated (features.py)
- Added size_category parameter to InvestmentModel.__init__()
- Callers now specify the appropriate category
3. Variable Registrations Updated
- elements.py: FlowModel uses FLOW_SIZE
- components.py: StorageModel uses STORAGE_SIZE (2 locations)
4. Statistics Accessor Simplified (statistics_accessor.py)
- flow_sizes: Now uses get_variables_by_category(FLOW_SIZE) directly
- storage_sizes: Now uses get_variables_by_category(STORAGE_SIZE) directly
- No more filtering by element labels after getting SIZE variables
5. Backward-Compatible Fallback (flow_system.py)
- get_variables_by_category() handles old files (see the sketch after this list):
- FLOW_SIZE → matches the |size suffix + flow labels
- STORAGE_SIZE → matches the |size suffix + storage labels
6. SOC Boundary Pattern Matching Replaced (transform_accessor.py)
- Changed from endswith('|SOC_boundary') to get_variables_by_category(SOC_BOUNDARY)
7. Effect Variables Verified
- PER_TIMESTEP ✓ (features.py:659)
- SHARE ✓ (features.py:700 for temporal shares)
- TOTAL / TOTAL_OVER_PERIODS ✓ (multiple locations)
8. Documentation Updated
- _build_segment_total_varnames() marked as backwards-compatibility fallback
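A hedged sketch of the backward-compatible fallback from item 5 (referenced there); the |size suffix and the flow/storage label sets come from this summary, while the function name and example labels are illustrative:

```python
def fallback_size_variables(solution_names, element_labels, suffix='|size'):
    """Old files have no category registry, so fall back to suffix + element-label matching."""
    return [
        name for name in solution_names
        if name.endswith(suffix) and name.rsplit('|', 1)[0] in element_labels
    ]


solution = ['Boiler(Q_th)|size', 'Battery|size', 'Boiler(Q_th)|flow_rate']
flow_labels = {'Boiler(Q_th)'}
storage_labels = {'Battery'}

print(fallback_size_variables(solution, flow_labels))      # ['Boiler(Q_th)|size']
print(fallback_size_variables(solution, storage_labels))   # ['Battery|size']
```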
Benefits
- Cleaner code: No more string manipulation to filter by element type
- Type safety: Using enum values instead of magic strings
- Single source of truth: Categories defined once, used everywhere
- Backward compatible: Old files still work via fallback logic