-
Notifications
You must be signed in to change notification settings - Fork 72
Fix coordinate misalignment in expression merge #550
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
…ing logic - Mark defensive code paths with pragma: no cover (unreachable in practice)
|
Im not entirely sure if this affects performance, but its at least worth raising i thought |
…ion-based approach. For each non-helper dimension, it computes the union of coordinate values across all datasets. If any mismatch is found, all datasets are reindexed to the union coordinates with FILL_VALUE defaults. This handles both the reordered-coords case and the overlapping-subset case. test/test_linear_expression.py: Added test_merge_with_overlapping_coords that creates variables with ["alice", "bob"] and ["bob", "charlie"], merges them, and verifies correct alignment — bob gets both terms, alice and charlie get only their respective term with fill values for the missing one.
…es) and then patching up mismatches with reindexing, _check_coords_match now checks that actual coordinate values and order are identical. When they're not, override is False and xarray's join='outer' handles alignment correctly. The entire reindex block is gone.
|
this is a very sensible API change! let's pull that in. do you see anything missing here? |
|
I would like to reconsider if there is a vlid use case for this, or if its truly only a bug. |
|
@FabianHofmann
So the behaviour change is for such cases: import pandas as pd
import numpy as np
import linopy
m = linopy.Model()
hours = pd.date_range("2025-01-01", periods=8760, freq="h", name="time")
gen = m.add_variables(coords=[hours], name="gen")
# Suppose demand comes from an external source, sorted differently
demand = pd.Series(
data=np.linspace(0, 100, 8760),
index=hours.sort_values(ascending=False), # reversed!
name="demand",
)
# Before the PR fix, this silently misaligns:
expr = gen - demand
print(expr)If the indexes were not both containing equal keys, nothing changes (shift, subset, etc were always aligned correctly) |
|
yes, I agree. did you test if this now also applies to |
Im not sure. Should i verify the call graph? |
|
That would be awesome, but if not also fine. I guess I have to sit down next week and figure out a consistent convention for all linopy operations anyway |
|
@FabianHofmann I'll see if i find the time. Unfortunately i will be quite busy next week, except friday. So if you need someone to brainstorm with and you can wait until friday tell me ;) |
Summary
Fixes silent data corruption when adding expressions whose coordinates have the same values but in different order (e.g.
['costs', 'Penalty']vs['Penalty', 'costs']). Previously,merge()used positional concatenation viajoin='override', ignoring labels entirely.What changed
When
merge()detects that two datasets share the same coordinate values in a different order, it now reindexes to the first dataset's order before concatenation. This only triggers when the coordinate value sets are identical — different subsets or different sizes are unaffected.What is NOT affected
v.loc[:9] + v.loc[10:]— different value sets, override preservedBug reproduction script
Benchmark script (
dev-scripts/benchmark_merge_alignment.py)Checklist
doc/release_notes.rstis included