
Conversation

@FBumann (Member) commented Jan 14, 2026

Summary

Fixes incorrect effect totals when expanding segmented clustered FlowSystems, and introduces a robust variable categorization system to handle different expansion behaviors.

Problem

When using cluster() with segmentation (SegmentConfig), each segment represents multiple hours with variable durations (e.g., 3h, 4h, 5h). During expand():

  • Before: Segment values (which are per-segment totals) were repeated N times when expanded to hourly resolution
  • Result: Summing {effect}(temporal)|per_timestep gave ~4x the actual solution total

Minimal Reproduction

import numpy as np
import pandas as pd
import flixopt as fx
from tsam import SegmentConfig

# 3 days hourly
timesteps = pd.date_range('2024-01-01', periods=72, freq='h')
fs = fx.FlowSystem(timesteps=timesteps)

# Single effect + minimal components
fs.add_elements(fx.Effect('Cost', '€', is_objective=True))
fs.add_elements(fx.Bus('Heat'))
fs.add_elements(fx.Source('Boiler', outputs=[fx.Flow('Q', bus='Heat', size=100, effects_per_flow_hour={'Cost': 50})]))
fs.add_elements(fx.Sink('Demand', inputs=[fx.Flow('Q', bus='Heat', size=50, fixed_relative_profile=np.tile([0.5, 1], 36))]))

# Cluster with segments -> solve -> expand
fs_clustered = fs.transform.cluster(n_clusters=2, cluster_duration='1D', segments=SegmentConfig(n_segments=4))
fs_clustered.optimize(fx.solvers.HighsSolver())
fs_expanded = fs_clustered.transform.expand()

# Validate
computed = fs_expanded.statistics.total_effects['Cost'].sum('contributor')
expected = fs_expanded.solution['Cost']
match = np.allclose(computed.values, expected.values, rtol=1e-5)
print(f'Cost: {"PASS" if match else "FAIL"} (computed={float(computed):.2f}, expected={float(expected):.2f})')

Before fix: FAIL (computed=360000.00, expected=90000.00) - 4x inflation
After fix: PASS (computed=90000.00, expected=90000.00)

Solution

1. Variable Categorization System

Added VariableCategory enum to classify optimization variables by semantic meaning and expansion behavior:

  • State (CHARGE_STATE, SOC_BOUNDARY): Interpolate between segment boundaries
  • Rate/Power (FLOW_RATE, NETTO_DISCHARGE, VIRTUAL_FLOW): Repeat within segments
  • Binary (STATUS, INACTIVE, STARTUP, SHUTDOWN): Repeat within segments
  • Effects (PER_TIMESTEP, SHARE, TOTAL, TOTAL_OVER_PERIODS): Divide by expansion factor
  • Investment (SIZE, FLOW_SIZE, STORAGE_SIZE, INVESTED): Non-temporal

Variables are categorized at creation time via add_variables(category=...). Backwards compatibility is maintained via fallback pattern matching for older FlowSystems.
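
For illustration, a minimal sketch of creation-time registration. TinyModel, the reduced enum, and the wiring are hypothetical stand-ins for the real FlowSystemModel.add_variables(); only the record-the-category-at-creation idea is shown:

from enum import Enum, auto

class VariableCategory(Enum):
    FLOW_RATE = auto()
    PER_TIMESTEP = auto()
    OTHER = auto()

class TinyModel:
    """Hypothetical stand-in for the model's variable registry."""

    def __init__(self) -> None:
        self.variable_categories: dict[str, VariableCategory] = {}

    def add_variables(self, name: str,
                      category: VariableCategory = VariableCategory.OTHER) -> None:
        # The real method also creates the linopy variable; here we only
        # record the category at creation time.
        self.variable_categories[name] = category

model = TinyModel()
model.add_variables('Boiler(Q)|flow_rate', category=VariableCategory.FLOW_RATE)
model.add_variables('Cost(temporal)|per_timestep', category=VariableCategory.PER_TIMESTEP)
print(model.variable_categories)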

2. Expansion Divisor

  • Clustering.build_expansion_divisor(): Returns the segment duration for each original timestep, handling per-period/scenario clustering
  • _build_segment_total_varnames(): Derives exact variable names from the FlowSystem structure (not pattern matching)
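
To sketch the divisor idea with made-up segment durations (plain NumPy, not the library API):

import numpy as np

# One hypothetical cluster of 12 hours split into segments of 3h, 4h, 5h.
segment_durations = np.array([3, 4, 5])
segment_of_timestep = np.repeat(np.arange(3), segment_durations)
expansion_divisor = segment_durations[segment_of_timestep]
print(expansion_divisor)  # [3 3 3 4 4 4 4 5 5 5 5 5]

# A per-segment total repeated hourly inflates the sum by each duration;
# dividing by the divisor restores correct hourly values.
repeated_totals = np.repeat([12.0, 8.0, 10.0], segment_durations)
print(repeated_totals.sum())                        # 118.0 (inflated)
print((repeated_totals / expansion_divisor).sum())  # 30.0 (correct)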

3. Charge State Interpolation

For segmented systems, charge_state values at segment boundaries are interpolated within segments to recover the actual charge trajectory.
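
A minimal sketch of that interpolation, assuming hourly resolution and invented boundary values:

import numpy as np

# Charge state known only at segment boundaries (hours 0, 3, 7, 12).
boundary_hours = np.array([0, 3, 7, 12])
boundary_charge = np.array([10.0, 25.0, 18.0, 30.0])

# Linear interpolation recovers an hourly charge trajectory.
hourly_charge = np.interp(np.arange(13), boundary_hours, boundary_charge)
print(hourly_charge)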

4. Code Quality Improvements

  • combine_slices() utility: Combines {(dim_values): np.ndarray} dicts into DataArrays with proper dtype preservation
  • Vectorized operations: expand_data(), build_expansion_divisor(), _interpolate_charge_state_segmented() use xarray's vectorized advanced indexing
  • Statistics accessor: Uses get_variables_by_category() instead of string suffix matching

Type of Change

  • Bug fix
  • New feature
  • Code refactoring

Testing

  • All 130+ tests pass (cluster/expand + IO tests)
  • Added test_segmented_total_effects_match_solution
  • Added 4 unit tests for combine_slices()
  • Validated with minimal reproduction script

Key Files Changed

  • structure.py: VariableCategory enum and EXPAND_INTERPOLATE/EXPAND_DIVIDE groupings
  • clustering/base.py: build_expansion_divisor(), combine_slices(), vectorized property builders
  • transform_accessor.py: Segment correction in expand(), charge state interpolation
  • flow_system.py: variable_categories property, get_variables_by_category()
  • components.py, elements.py, features.py, effects.py: Variable registrations with categories
  • statistics_accessor.py: Uses category-based variable lookup

…ms, the effect totals now match correctly.

  Root Cause

  Segment values are per-segment TOTALS that were repeated N times when expanded to hourly resolution (where N = segment duration in timesteps). Summing these repeated values inflated totals by ~4x.

  Fix Applied

  1. Added build_expansion_divisor() to Clustering class (flixopt/clustering/base.py:920-1027)
    - For each original timestep, returns the segment duration (number of timesteps in that segment)
    - Handles multi-dimensional cases (periods/scenarios) by accessing each clustering result's segment info
  2. Modified expand() method (flixopt/transform_accessor.py:1850-1875)
    - Added _is_segment_total_var() helper to identify which variables should be divided
    - For segmented systems, divides segment total variables by the expansion divisor to get correct hourly rates
    - Correctly excludes:
      - Share factors (stored as EffectA|(temporal)->EffectB(temporal)) - these are rates, not totals
      - Flow rates, on/off states, charge states - these are already rates

  Test Results

  - All 83 cluster/expand tests pass
  - All 27 effect tests pass
  - Debug script shows all ratios are 1.0000x for all effects (EffectA, EffectB, EffectC, EffectD) across all periods and scenarios
@coderabbitai bot (Contributor) commented Jan 14, 2026

📝 Walkthrough

The PR adds segmented expansion functionality to the clustering system. A new build_expansion_divisor() method computes divisors for converting expanded hourly totals back to per-hour rates. The transform_accessor is updated to identify segment-total variables and apply these divisors during solution expansion for segmented FlowSystems.

Changes

  • Core Segmented Clustering Support (flixopt/clustering/base.py): Added build_expansion_divisor() method that computes expansion divisors for segmented clustering by combining segment_durations and segment_assignments, supporting both simple and multi-dimensional (period/scenario) coordinate cases.
  • Segmented Expansion Integration (flixopt/transform_accessor.py): Added _build_segment_total_varnames() helper to identify segment-total variables from FlowSystem structure; updated expand() to calculate the expansion_divisor for segmented systems; modified expand_da() to accept an is_solution flag and apply segment-total correction; integrated divisor application into the solution expansion phase. Removed cluster count logging.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant TransformAccessor
    participant Clustering
    participant FlowSystem
    participant DataArray

    User->>TransformAccessor: expand(segmented=True)
    TransformAccessor->>FlowSystem: identify segment structure
    TransformAccessor->>Clustering: build_expansion_divisor()
    Clustering->>Clustering: combine segment_durations & segment_assignments
    Clustering-->>TransformAccessor: expansion_divisor (DataArray)
    TransformAccessor->>TransformAccessor: _build_segment_total_varnames()
    TransformAccessor-->>TransformAccessor: segment_total_vars (set)
    loop for each variable
        TransformAccessor->>DataArray: expand_da(is_solution=True)
        alt is segment_total_var
            DataArray->>DataArray: divide by expansion_divisor
        else
            DataArray->>DataArray: standard expansion
        end
        DataArray-->>TransformAccessor: expanded_variable
    end
    TransformAccessor-->>User: expanded_solution

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

A clustering scheme now knows the way,
To segment time through night and day.
With divisors built and variables named,
The expansion flows—precision reclaimed! 🐰✨

🚥 Pre-merge checks: ✅ 3 passed

  • Title check (✅ Passed): The title accurately summarizes the main change: fixing expansion of segmented clustered systems, which is the primary purpose of this PR.
  • Docstring Coverage (✅ Passed): Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%.
  • Description check (✅ Passed): The pull request description comprehensively covers all required template sections, including a detailed summary of changes, problem statement with minimal reproduction code, solution explanation, type of change, testing details, and key files changed.


…lution:

  Key Changes

  1. build_expansion_divisor() in Clustering (base.py:920-1027)
  - Returns the segment duration for each original timestep
  - Handles per-period/scenario clustering differences

  2. _is_segment_total_solution_var() in expand() (transform_accessor.py:1855-1880)
  - Only matches solution variables that represent segment totals:
    - {contributor}->{effect}(temporal) - effect contributions
    - *|per_timestep - per-timestep totals
  - Explicitly does NOT match rates/states: |flow_rate, |on, |charge_state

  3. expand_da() with is_solution parameter (transform_accessor.py:1882-1915)
  - is_solution=False (default): Never applies segment correction (for FlowSystem data)
  - is_solution=True: Applies segment correction if pattern matches (for solution)

  Why This is Robust
  ┌───────────────────────────────────────┬─────────────────┬────────────────────┬───────────────────────────┐
  │               Variable                │    Location     │      Pattern       │         Divided?          │
  ├───────────────────────────────────────┼─────────────────┼────────────────────┼───────────────────────────┤
  │ EffectA|(temporal)->EffectB(temporal) │ FlowSystem DATA │ share factor       │ ❌ No (is_solution=False) │
  ├───────────────────────────────────────┼─────────────────┼────────────────────┼───────────────────────────┤
  │ Boiler(Q)->EffectA(temporal)          │ SOLUTION        │ contribution       │ ✅ Yes                    │
  ├───────────────────────────────────────┼─────────────────┼────────────────────┼───────────────────────────┤
  │ EffectA(temporal)->EffectB(temporal)  │ SOLUTION        │ contribution       │ ✅ Yes                    │
  ├───────────────────────────────────────┼─────────────────┼────────────────────┼───────────────────────────┤
  │ EffectA(temporal)|per_timestep        │ SOLUTION        │ per-timestep total │ ✅ Yes                    │
  ├───────────────────────────────────────┼─────────────────┼────────────────────┼───────────────────────────┤
  │ Boiler(Q)|flow_rate                   │ SOLUTION        │ rate               │ ❌ No (no pattern match)  │
  ├───────────────────────────────────────┼─────────────────┼────────────────────┼───────────────────────────┤
  │ Storage|charge_state                  │ SOLUTION        │ state              │ ❌ No (no pattern match)  │
  └───────────────────────────────────────┴─────────────────┴────────────────────┴───────────────────────────┘
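  For illustration, a simplified sketch of the pattern matching this table describes. The helper name mirrors the PR, but this version is an assumption; the FlowSystem-data vs. solution split enforced by is_solution is not reproduced here:

  import re

  def is_segment_total_solution_var(name: str) -> bool:
      # Hypothetical simplified matcher; the real exclusions also rely on
      # the is_solution flag so FlowSystem data is never divided.
      if name.endswith('|per_timestep'):          # per-timestep effect totals
          return True
      if re.search(r'->\w+\(temporal\)$', name):  # effect contributions
          return True
      return False                                # rates/states fall through

  for var in ['Cost(temporal)|per_timestep', 'Boiler(Q)->Cost(temporal)',
              'Boiler(Q)|flow_rate', 'Storage|charge_state']:
      print(var, is_segment_total_solution_var(var))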
@FBumann FBumann force-pushed the fix/segmentation-expansion branch from a6070c2 to c19d3ed on January 14, 2026 15:29
…System structure:

  Key Implementation

  _build_segment_total_varnames() (transform_accessor.py:1776-1819)
  - Derives exact variable names from FlowSystem structure
  - No pattern matching on arbitrary strings
  - Covers all contributor types:
    a. {effect}(temporal)|per_timestep - from fs.effects
    b. {flow}->{effect}(temporal) - from fs.flows
    c. {component}->{effect}(temporal) - from fs.components
    d. {source}(temporal)->{target}(temporal) - from effect.share_from_temporal
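
  A simplified sketch of that derivation with hypothetical inputs; temporal shares (item d) are omitted for brevity:

  def build_segment_total_varnames(effects, flows, components):
      # Sketch with plain label lists instead of the real
      # fs.effects / fs.flows / fs.components objects.
      names = set()
      for effect in effects:
          names.add(f'{effect}(temporal)|per_timestep')
          for flow in flows:
              names.add(f'{flow}->{effect}(temporal)')
          for component in components:
              names.add(f'{component}->{effect}(temporal)')
      return names

  totals = build_segment_total_varnames(['Cost'], ['Boiler(Q)'], ['Boiler'])
  print('Boiler(Q)->Cost(temporal)' in totals)  # True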

  Why This is Robust

  1. Derived from structure, not patterns: Variable names come from actual FlowSystem attributes
  2. Clear separation: FlowSystem data is NEVER divided (only solution variables)
  3. Explicit set lookup: var_name in segment_total_vars instead of pattern matching
  4. Extensible: New contributor types just need to be added to _build_segment_total_varnames()
  5. All tests pass: 83 cluster/expand tests + comprehensive debug script
@FBumann FBumann changed the title from "Fix/segmentation expansion" to "Fix: Correct expansion of segmented clustered systems" on Jan 14, 2026
@FBumann (Member, Author) commented Jan 14, 2026

@coderabbitai review

@coderabbitai bot (Contributor) commented Jan 14, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

1 similar comment

@FBumann FBumann marked this pull request as ready for review on January 15, 2026 14:55
…sion+variable-categories

# Conflicts:
#	flixopt/structure.py
…+variable-categories

# Conflicts:
#	flixopt/clustering/base.py
#	flixopt/components.py
#	flixopt/structure.py
#	flixopt/transform_accessor.py
@FBumann FBumann changed the base branch from feature/tsam-v3+rework to dev on January 16, 2026 12:22
@FBumann FBumann changed the base branch from dev to feature/tsam-v3+rework on January 16, 2026 12:22
* Summary: Variable Registry Implementation

  Changes Made

  1. Added VariableCategory enum (structure.py:64-77)
    - STATE - For state variables like charge_state (interpolated within segments)
    - SEGMENT_TOTAL - For segment totals like effect contributions (divided by expansion divisor)
    - RATE - For rate variables like flow_rate (expanded as-is)
    - BINARY - For binary variables like status (expanded as-is)
    - OTHER - For uncategorized variables
  2. Added variable_categories registry to FlowSystemModel (structure.py:214)
    - Dictionary mapping variable names to their categories
  3. Modified add_variables() method (structure.py:388-396)
    - Added optional category parameter
    - Automatically registers variables with their category
  4. Updated variable creation calls:
    - components.py: Storage variables (charge_state as STATE, netto_discharge as RATE)
    - elements.py: Flow variables (flow_rate as RATE, status as BINARY)
    - features.py: Effect contributions (per_timestep as SEGMENT_TOTAL, temporal shares as SEGMENT_TOTAL, startup/shutdown as BINARY)
  5. Updated expand() method (transform_accessor.py:2074-2090)
    - Uses variable_categories registry to identify segment totals and state variables
    - Falls back to pattern matching for backwards compatibility with older FlowSystems

  Benefits

  - More robust categorization: Variables are categorized at creation time, not by pattern matching
  - Extensible: New variable types can easily be added with proper category
  - Backwards compatible: Old FlowSystems without categories still work via pattern matching fallback

* Summary: Fine-Grained Variable Categories

  New Categories (structure.py:45-103)

  class VariableCategory(Enum):
      # State variables
      CHARGE_STATE, SOC_BOUNDARY

      # Rate/Power variables
      FLOW_RATE, NETTO_DISCHARGE, VIRTUAL_FLOW

      # Binary state
      STATUS, INACTIVE

      # Binary events
      STARTUP, SHUTDOWN

      # Effect variables
      PER_TIMESTEP, SHARE, TOTAL, TOTAL_OVER_PERIODS

      # Investment
      SIZE, INVESTED

      # Counting/Duration
      STARTUP_COUNT, DURATION

      # Piecewise linearization
      INSIDE_PIECE, LAMBDA0, LAMBDA1, ZERO_POINT

      # Other
      OTHER

  Logical Groupings for Expansion

  EXPAND_INTERPOLATE = {CHARGE_STATE}      # Interpolate between boundaries
  EXPAND_DIVIDE = {PER_TIMESTEP, SHARE}    # Divide by expansion factor
  # Default: repeat within segment

  Files Modified
  ┌───────────────────────┬─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
  │         File          │                                                          Variables Updated                                                          │
  ├───────────────────────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
  │ components.py         │ charge_state, netto_discharge, SOC_boundary                                                                                         │
  ├───────────────────────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
  │ elements.py           │ flow_rate, status, virtual_supply, virtual_demand                                                                                   │
  ├───────────────────────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
  │ features.py           │ size, invested, inactive, startup, shutdown, startup_count, inside_piece, lambda0, lambda1, zero_point, total, per_timestep, shares │
  ├───────────────────────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
  │ effects.py            │ total, total_over_periods                                                                                                           │
  ├───────────────────────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
  │ modeling.py           │ duration                                                                                                                            │
  ├───────────────────────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
  │ transform_accessor.py │ Updated to use EXPAND_INTERPOLATE and EXPAND_DIVIDE groupings                                                                       │
  └───────────────────────┴─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
  Test Results

  - All 83 cluster/expand tests pass
  - Variable categories correctly populated and grouped

* Add IO for variable categories

* The refactoring is complete. Here's what was accomplished:

  Changes Made

  1. Added combine_slices() utility to flixopt/clustering/base.py (lines 52-107)
    - Simple function that stacks dict of {(dim_values): np.ndarray} into a DataArray
    - Much cleaner than the previous reverse-concat pattern
  2. Refactored 3 methods to use the new utility:
    - Clustering.expand_data() - reduced from ~25 to ~12 lines
    - Clustering.build_expansion_divisor() - reduced from ~35 to ~20 lines
    - TransformAccessor._interpolate_charge_state_segmented() - reduced from ~43 to ~27 lines
  3. Added 4 unit tests for combine_slices() in tests/test_cluster_reduce_expand.py

  Results
  ┌───────────────────────────────────┬──────────┬────────────────────────┐
  │              Metric               │  Before  │         After          │
  ├───────────────────────────────────┼──────────┼────────────────────────┤
  │ Complex reverse-concat blocks     │ 3        │ 0                      │
  ├───────────────────────────────────┼──────────┼────────────────────────┤
  │ Lines of dimension iteration code │ ~100     │ ~60                    │
  ├───────────────────────────────────┼──────────┼────────────────────────┤
  │ Test coverage                     │ 83 tests │ 87 tests (all passing) │
  └───────────────────────────────────┴──────────┴────────────────────────┘
  The Pattern Change

  Before (complex reverse-concat):
  result_arrays = slices
  for dim in reversed(extra_dims):
      grouped = {}
      for key, arr in result_arrays.items():
          rest_key = key[:-1] if len(key) > 1 else ()
          grouped.setdefault(rest_key, []).append(arr)
      result_arrays = {k: xr.concat(v, dim=...) for k, v in grouped.items()}
  result = list(result_arrays.values())[0].transpose('time', ...)

  After (simple combine):
  return combine_slices(slices, extra_dims, dim_coords, 'time', output_coord, attrs)
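
  For context, a minimal runnable sketch of what such a combine utility can look like. The signature is simplified relative to the real function; dtype preservation mirrors the hardening described in a later commit:

  import numpy as np
  import xarray as xr

  def combine_slices_sketch(slices, extra_dims, dim_coords, time_coord):
      # Stack a {(dim_values,): 1-D array} dict into one DataArray,
      # preserving the input dtype.
      shape = [len(time_coord)] + [len(dim_coords[d]) for d in extra_dims]
      first = next(iter(slices.values()))
      out = np.empty(shape, dtype=first.dtype)
      for combo in np.ndindex(*shape[1:]):
          key = tuple(dim_coords[d][i] for d, i in zip(extra_dims, combo))
          out[(slice(None), *combo)] = slices[key]
      coords = {'time': time_coord, **{d: dim_coords[d] for d in extra_dims}}
      return xr.DataArray(out, dims=['time'] + list(extra_dims), coords=coords)

  da = combine_slices_sketch(
      {('p1',): np.array([1.0, 2.0]), ('p2',): np.array([3.0, 4.0])},
      ['period'], {'period': ['p1', 'p2']}, [0, 1],
  )
  print(da.dims, da.sel(period='p2').values)  # ('time', 'period') [3. 4.]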

* Here's what we accomplished:

  1. Fully Vectorized expand_data()

  Before (~65 lines with loops):
  for combo in np.ndindex(*[len(v) for v in dim_coords.values()]):
      selector = {...}
      mapping = _select_dims(timestep_mapping, **selector).values
      data_slice = _select_dims(aggregated, **selector)
      slices[key] = _expand_slice(mapping, data_slice)
  return combine_slices(slices, ...)

  After (~25 lines, fully vectorized):
  timestep_mapping = self.timestep_mapping  # Already multi-dimensional!
  cluster_indices = timestep_mapping // time_dim_size
  time_indices = timestep_mapping % time_dim_size
  expanded = aggregated.isel(cluster=cluster_indices, time=time_indices)
  # xarray handles broadcasting across period/scenario automatically

  2. build_expansion_divisor() and _interpolate_charge_state_segmented()

  These still use combine_slices() because they need per-result segment data (segment_assignments, segment_durations) which isn't available as concatenated Clustering properties yet.

  Current State
  ┌───────────────────────────────────────┬─────────────────┬─────────────────────────────────┐
  │                Method                 │   Vectorized?   │   Uses Clustering Properties    │
  ├───────────────────────────────────────┼─────────────────┼─────────────────────────────────┤
  │ expand_data()                         │ Yes             │ timestep_mapping (fully)        │
  ├───────────────────────────────────────┼─────────────────┼─────────────────────────────────┤
  │ build_expansion_divisor()             │ No (small loop) │ cluster_assignments (partially) │
  ├───────────────────────────────────────┼─────────────────┼─────────────────────────────────┤
  │ _interpolate_charge_state_segmented() │ No (small loop) │ cluster_assignments (partially) │
  └───────────────────────────────────────┴─────────────────┴─────────────────────────────────┘
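  To make the expand_data() decode above concrete, a self-contained mini-example with made-up data:

  import numpy as np
  import xarray as xr

  # Made-up aggregated data: 2 clusters x 3 representative timesteps.
  aggregated = xr.DataArray(np.array([[10, 11, 12], [20, 21, 22]]),
                            dims=('cluster', 'time'))

  # Mapping for 6 original timesteps, encoded as cluster * T + t
  # (for segmented systems the divisor is n_segments instead of T).
  T = aggregated.sizes['time']
  mapping = xr.DataArray(np.arange(6), dims='original_time')

  # Vectorized pointwise indexing: no Python loop over timesteps.
  expanded = aggregated.isel(cluster=mapping // T, time=mapping % T)
  print(expanded.values)  # [10 11 12 20 21 22]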

* Completed:

  1. _interpolate_charge_state_segmented() - Fully vectorized from ~110 lines to ~55 lines
    - Uses clustering.timestep_mapping for indexing
    - Uses clustering.results.segment_assignments, segment_durations, and position_within_segment
    - Single xarray expression instead of triple-nested loops

  Previously completed (from before context limit):
  - Added segment_assignments multi-dimensional property to ClusteringResults
  - Added segment_durations multi-dimensional property to ClusteringResults
  - Added position_within_segment property to ClusteringResults
  - Vectorized expand_data()
  - Vectorized build_expansion_divisor()

  Test results: All 130 tests pass (87 cluster/expand + 43 IO tests)

  The combine_slices utility function is still available in clustering/base.py if needed in the future, but all the main dimension-handling methods now use xarray's vectorized advanced indexing instead of the loop-based slice-and-combine pattern.

* All simplifications complete! Here's a summary of what we cleaned up:

  Summary of Simplifications

  1. expand_da() in transform_accessor.py

  - Extracted duplicate "append extra timestep" logic into _append_final_state() helper
  - Reduced from ~50 lines to ~25 lines
  - Eliminated code duplication

  2. _build_multi_dim_array() → _build_property_array() in clustering/base.py

  - Replaced 6 conditional branches with unified np.ndindex() pattern
  - Now handles both simple and multi-dimensional cases in one method
  - Reduced from ~50 lines to ~25 lines
  - Preserves dtype (fixed integer indexing bug)

  3. Property boilerplate in ClusteringResults

  - 5 properties (cluster_assignments, cluster_occurrences, cluster_centers, segment_assignments, segment_durations) now use the unified _build_property_array()
  - Each property reduced from ~25 lines to ~8 lines
  - Total: ~165 lines → ~85 lines

  4. _build_timestep_mapping() in Clustering

  - Simplified to single call using _build_property_array()
  - Reduced from ~16 lines to ~9 lines

  Total lines removed: ~150+ lines of duplicated/complex code

* Removed the unnecessary lookup and used segment_indices directly

* The IO roundtrip fix is working correctly. Here's a summary of what was fixed:

  Summary

  The IO roundtrip bug was caused by representative_weights (a variable with only ('cluster',) dimension) being copied as-is during expansion, which caused the cluster dimension to incorrectly persist in the expanded dataset.

  Fix applied in transform_accessor.py:2063-2065:
  # Skip cluster-only vars (no time dim) - they don't make sense after expansion
  if da.dims == ('cluster',):
      continue

  This skips variables that have only a cluster dimension (and no time dimension) during expansion, as these variables don't make sense after the clustering structure is removed.

  Test results:
  - All 87 tests in test_cluster_reduce_expand.py pass ✓
  - All 43 tests in test_clustering_io.py pass ✓
  - Manual IO roundtrip test passes ✓
  - Tests with different segment counts (3, 6) pass ✓
  - Tests with 2-hour timesteps pass ✓

* Updated condition in transform_accessor.py:2063-2066:

  # Skip vars with cluster dim but no time dim - they don't make sense after expansion
  # (e.g., representative_weights with dims ('cluster',) or ('cluster', 'period'))
  if 'cluster' in da.dims and 'time' not in da.dims:
      continue

  This correctly handles:
  - ('cluster',) - simple cluster-only variables like cluster_weight
  - ('cluster', 'period') - cluster variables with period dimension
  - ('cluster', 'scenario') - cluster variables with scenario dimension
  - ('cluster', 'period', 'scenario') - cluster variables with both

  Variables with both cluster and time dimensions (like timestep_duration with dims ('cluster', 'time')) are correctly expanded since they contain time-series data that needs to be mapped back to original timesteps.

* Summary of Fixes

  1. clustering/base.py - combine_slices() hardening (lines 52-118)

  - Added validation for empty input: if not slices: raise ValueError("slices cannot be empty")
  - Capture first array and preserve dtype: first = next(iter(slices.values())) → np.empty(shape, dtype=first.dtype)
  - Clearer error on missing keys with try/except: raise KeyError(f"Missing slice for key {key} (extra_dims={extra_dims})")

  2. flow_system.py - Variable categories cleanup and safe enum restoration

  - Added self._variable_categories.clear() in _invalidate_model() (line 1692) to prevent stale categories from being reused
  - Hardened VariableCategory restoration (lines 922-930) with try/except to handle unknown/renamed enum values gracefully with a warning instead of crashing

  3. transform_accessor.py - Correct timestep_mapping decode for segmented systems (lines 1850-1857)

  - For segmented systems, now uses clustering.n_segments instead of clustering.timesteps_per_cluster as the divisor
  - This matches the encoding logic in expand_data() and build_expansion_divisor()

* Added test_segmented_total_effects_match_solution to TestSegmentation class

* Added all remaining tsam.aggregate() parameters and a missing type hint

* Updated expression_tracking_variable

  modeling.py:200-242 - Added category: VariableCategory = None parameter and passed it to both add_variables calls.

  Updated Callers
  ┌─────────────┬──────┬─────────────────────────┬────────────────────┐
  │    File     │ Line │        Variable         │      Category      │
  ├─────────────┼──────┼─────────────────────────┼────────────────────┤
  │ features.py │ 208  │ active_hours            │ TOTAL              │
  ├─────────────┼──────┼─────────────────────────┼────────────────────┤
  │ elements.py │ 682  │ total_flow_hours        │ TOTAL              │
  ├─────────────┼──────┼─────────────────────────┼────────────────────┤
  │ elements.py │ 709  │ flow_hours_over_periods │ TOTAL_OVER_PERIODS │
  └─────────────┴──────┴─────────────────────────┴────────────────────┘
  All expression tracking variables now properly register their categories for segment expansion handling. The pattern is consistent: callers specify the appropriate category based on what the tracked expression represents.

* Added to flow_system.py

  variable_categories property (line 1672):
  @property
  def variable_categories(self) -> dict[str, VariableCategory]:
      """Variable categories for filtering and segment expansion."""
      return self._variable_categories

  get_variables_by_category() method (line 1681):
  def get_variables_by_category(
      self, *categories: VariableCategory, from_solution: bool = True
  ) -> list[str]:
      """Get variable names matching any of the specified categories."""

  Updated in statistics_accessor.py
  ┌───────────────┬──────────────────────────────────────────┬──────────────────────────────────────────────────┐
  │   Property    │                  Before                  │                      After                       │
  ├───────────────┼──────────────────────────────────────────┼──────────────────────────────────────────────────┤
  │ flow_rates    │ endswith('|flow_rate')                   │ get_variables_by_category(FLOW_RATE)             │
  ├───────────────┼──────────────────────────────────────────┼──────────────────────────────────────────────────┤
  │ flow_sizes    │ endswith('|size') + flow_labels check    │ get_variables_by_category(SIZE) + flow_labels    │
  ├───────────────┼──────────────────────────────────────────┼──────────────────────────────────────────────────┤
  │ storage_sizes │ endswith('|size') + storage_labels check │ get_variables_by_category(SIZE) + storage_labels │
  ├───────────────┼──────────────────────────────────────────┼──────────────────────────────────────────────────┤
  │ charge_states │ endswith('|charge_state')                │ get_variables_by_category(CHARGE_STATE)          │
  └───────────────┴──────────────────────────────────────────┴──────────────────────────────────────────────────┘
  Benefits

  1. Single source of truth - Categories defined once in VariableCategory enum
  2. Easier maintenance - Adding new variable types only requires updating one place
  3. Type safety - Using enum values instead of magic strings
  4. Flexible filtering - Can filter by multiple categories: get_variables_by_category(SIZE, INVESTED)
  5. Consistent naming - Uses rsplit('|', 1)[0] instead of replace('|suffix', '') for label extraction

* Ensure backwards compatibility

* Summary of Changes

  1. New SIZE Sub-Categories (structure.py)

  - Added FLOW_SIZE and STORAGE_SIZE to differentiate flow vs storage investments
  - Kept SIZE for backward compatibility

  2. InvestmentModel Updated (features.py)

  - Added size_category parameter to InvestmentModel.__init__()
  - Callers now specify the appropriate category

  3. Variable Registrations Updated

  - elements.py: FlowModel uses FLOW_SIZE
  - components.py: StorageModel uses STORAGE_SIZE (2 locations)

  4. Statistics Accessor Simplified (statistics_accessor.py)

  - flow_sizes: Now uses get_variables_by_category(FLOW_SIZE) directly
  - storage_sizes: Now uses get_variables_by_category(STORAGE_SIZE) directly
  - No more filtering by element labels after getting SIZE variables

  5. Backward-Compatible Fallback (flow_system.py)

  - get_variables_by_category() handles old files:
    - FLOW_SIZE → matches |size suffix + flow labels
    - STORAGE_SIZE → matches |size suffix + storage labels

  6. SOC Boundary Pattern Matching Replaced (transform_accessor.py)

  - Changed from endswith('|SOC_boundary') to get_variables_by_category(SOC_BOUNDARY)

  7. Effect Variables Verified

  - PER_TIMESTEP ✓ (features.py:659)
  - SHARE ✓ (features.py:700 for temporal shares)
  - TOTAL / TOTAL_OVER_PERIODS ✓ (multiple locations)

  8. Documentation Updated

  - _build_segment_total_varnames() marked as backwards-compatibility fallback

  Benefits

  - Cleaner code: No more string manipulation to filter by element type
  - Type safety: Using enum values instead of magic strings
  - Single source of truth: Categories defined once, used everywhere
  - Backward compatible: Old files still work via fallback logic

* perf: Keep data in minimal form (no pre-broadcasting) (#575)

* Add collapse to IO

* Temp

* 1. Added dimension reduction functions in io.py:
    - _reduce_constant_dims(): Reduces dimensions where all values are constant
    - _expand_reduced_dims(): Restores original dimensions with correct order
  2. Added reduce_constants parameter to FlowSystem.to_dataset():
    - Default: False (opt-in)
    - When True: Reduces constant dimensions for memory efficiency
  3. Updated FlowSystem.from_dataset():
    - Expands both collapsed arrays (from NetCDF) and reduced dimensions
  4. Key fixes:
    - Store original dimension order in attrs to preserve (cluster, time) vs (time, cluster) ordering
    - Skip solution variables in reduction (they're prefixed with solution|)

  The optimization is opt-in via to_dataset(reduce_constants=True). For file storage, save_dataset_to_netcdf still collapses constants to scalars by default.

* Set reduce_constants=True by default in to_dataset()

Benchmarks show 99%+ reduction in memory and file size for multi-period
models, with faster IO speeds. The optimization is now enabled by default.

Co-Authored-By: Claude Opus 4.5 <[email protected]>

* Feature/speedup io+internals (#576)

* Add safe isel wrappers for dimension-independent operations

- Add _scalar_safe_isel_drop() to modeling.py for selecting from
  potentially reduced arrays (handles both scalar and array cases)
- Update Storage validation to use _scalar_safe_isel for
  relative_minimum/maximum_charge_state access at time=0
- Update StorageModel._relative_charge_state_bounds to handle
  reduced arrays properly with dimension checks

These wrappers prepare the codebase for future optimization where
_expand_reduced_dims() could be removed from from_dataset(),
keeping data compact throughout the system.

Co-Authored-By: Claude Opus 4.5 <[email protected]>

* Remove pre-broadcasting from DataConverter - data stays minimal

Major change: DataConverter.to_dataarray() now validates dimensions
instead of broadcasting to full target dimensions. This keeps data
compact (scalars stay scalar, 1D arrays stay 1D) and lets linopy
handle broadcasting at variable creation time.

Changes:
- core.py: Replace _broadcast_dataarray_to_target_specification with
  _validate_dataarray_dims in to_dataarray() method
- components.py: Fix _relative_charge_state_bounds to handle scalar
  inputs that have no time dimension (expand to timesteps_extra)
- conftest.py: Add assert_dims_compatible helper for tests
- test_*.py: Update dimension assertions to use subset checking

Memory impact (8760 timesteps × 20 periods):
- Before: 24.1 MB dataset size
- After: 137.1 KB dataset size

Co-Authored-By: Claude Opus 4.5 <[email protected]>

* Remove redundant dimension reduction - data is minimal from start

Since DataConverter no longer broadcasts, these optimizations are redundant:
- Remove _reduce_constant_dims() and _expand_reduced_dims() from io.py
- Remove reduce_constants parameter from to_dataset()
- Remove _expand_reduced_dims() call from from_dataset()

Also ensure Dataset always has model coordinates (time, period, scenario,
cluster) even if no data variable uses them. This is important for Dataset
validity and downstream operations.

Update benchmark_io.py to reflect the simplified API.

Memory footprint now:
- 24 timesteps: 568 bytes (23/24 vars are scalar)
- 168 timesteps: 2.8 KB
- 8760 timesteps × 20 periods: ~137 KB (vs 24 MB with old broadcasting)

Co-Authored-By: Claude Opus 4.5 <[email protected]>

* ⏺ Done. I've removed the _collapse_constant_arrays / _expand_collapsed_arrays optimization from the codebase. Here's a summary of the changes:

  Files modified:

  1. flixopt/io.py:
    - Removed COLLAPSED_VAR_PREFIX constant
    - Removed _collapse_constant_arrays() function (~45 lines)
    - Removed _expand_collapsed_arrays() function (~40 lines)
    - Removed collapse_constants parameter from save_dataset_to_netcdf()
    - Removed expansion logic from load_dataset_from_netcdf()
  2. flixopt/flow_system.py:
    - Removed import and call to _expand_collapsed_arrays in from_dataset()
  3. benchmark_io.py:
    - Simplified benchmark_netcdf() to only benchmark single save/load (no more comparison)
    - Removed collapse_constants parameter from roundtrip function

  Rationale: Since data is now kept in minimal form throughout the system (scalars stay scalars, 1D arrays stay 1D), there's no need for the extra collapsing/expanding layer when saving to NetCDF. The file sizes are naturally small because the data isn't expanded to full dimensions.

  Test results:
  - All 10 IO tests pass
  - All 4 integration tests pass
  - Benchmark runs successfully

* Summary of Fixes

  1. _dataset_resample handling all-scalar data (transform_accessor.py)

  When all data variables are scalars (no time dimension), the resample function now:
  - Creates a dummy variable to resample the time coordinate
  - Preserves all original scalar data variables
  - Preserves all non-time coordinates (period, scenario, cluster)

  2. _scalar_safe_reduce helper (modeling.py)

  Added a new helper function that safely performs aggregation operations (mean/sum/etc) over a dimension:
  - Returns reduced data if the dimension exists
  - Returns data unchanged if the dimension doesn't exist (scalar data)

  3. Updated Storage intercluster linking (components.py)

  Used _scalar_safe_reduce for:
  - relative_loss_per_hour.mean('time')
  - timestep_duration.mean('time')

  4. Updated clustering expansion (transform_accessor.py)

  Used _scalar_safe_reduce for relative_loss_per_hour.mean('time')

  5. Fixed dimension order in expand_data (clustering/base.py)

  When expanding clustered data back to original timesteps:
  - Added transpose to ensure (cluster, time) dimension order before numpy indexing
  - Fixes IndexError when dimensions were in different order
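
  A small illustration of why the transpose matters before flat positional indexing (made-up data):

  import numpy as np
  import xarray as xr

  # Clustered data may arrive as (time, cluster) or (cluster, time).
  da = xr.DataArray(np.arange(6).reshape(3, 2), dims=('time', 'cluster'))

  # Transposing to (cluster, time) first makes flat indices cluster * T + t
  # address the intended elements during positional numpy indexing.
  flat = da.transpose('cluster', 'time').values.reshape(-1)
  print(flat)  # [0 2 4 1 3 5]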

* Added reduction on loading old datasets

* Summary of changes made:

  1. flixopt/structure.py: Added dimension transpose in _ensure_coords() to ensure correct dimension order when data already has all dims but in wrong order
  2. tests/test_io_conversion.py: Updated test_returns_same_object → test_returns_equivalent_dataset since convert_old_dataset now creates a new dataset when reducing constants
  3. tests/deprecated/test_flow.py: Updated 6 assertions to expect minimal dimensions ('time',) instead of broadcasting to all model coords
  4. tests/deprecated/test_component.py: Updated 2 assertions to expect minimal dimensions ('time',)
  5. tests/deprecated/test_linear_converter.py: Updated 1 assertion to expect minimal dimensions ('time',)

  The key change is that data now stays in minimal form - a 1D time-varying array stays as ('time',) dimensions rather than being pre-broadcast to ('time', 'scenario') or other full model dimensions. Broadcasting happens at the linopy interface layer in FlowSystemModel.add_variables() when needed.
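
  A tiny illustration of the memory effect with plain xarray; the linopy layer itself is not shown:

  import numpy as np
  import xarray as xr

  # Stored in minimal form: varies only over time.
  profile = xr.DataArray(np.array([0.5, 1.0]), dims='time')

  # Broadcasting against further model dimensions happens only when needed.
  scenarios = xr.DataArray(np.ones(3), dims='scenario')
  full = profile * scenarios  # dims become ('time', 'scenario')
  print(profile.nbytes, full.nbytes)  # 16 vs 48 bytes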

---------

Co-authored-by: Claude Opus 4.5 <[email protected]>
@FBumann (Member, Author) commented Jan 16, 2026

@coderabbitai review

@FBumann FBumann merged commit 450739c into feature/tsam-v3+rework Jan 16, 2026
8 checks passed