Bug description
TTSIM silently produces wrong results when operating on uint dtypes. Variables are often stored as uints (e.g., earnings, which cannot be negative by definition). Internally, however, we perform all sorts of operations on them, such as subtraction, whose results can be negative and are therefore invalid under a uint dtype.
I stumbled upon this issue when looking at the output of the following policy function:
```python
@policy_function()
def einnahmen_nach_abzug_werbungskosten_y(
    einnahmen__bruttolohn_y: float,
    werbungskosten_y: float,
) -> float:
    """Take gross wage and deduct Werbungskosten."""
    return max(einnahmen__bruttolohn_y - werbungskosten_y, 0.0)
```
With pyarrow dtypes, this subtraction would raise an `ArrowInvalid` error. Internally, however, we convert uint dtypes to NumPy uints, which silently wrap around in these cases instead of raising:
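```python
import numpy as np


def test_uint_bruttolohn_does_not_overflow():
    # 0 - 1230 in uint32 wraps around to 2**32 - 1230 instead of becoming negative.
    result = max(np.uint32(0) - np.uint32(1230), 0.0)
    assert result == 0.0  # fails: result is 4294966066 (= 2**32 - 1230)
```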
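If the policy function is evaluated on whole arrays rather than Python scalars, the problem is the same. The sketch below is an illustration, not TTSIM code: it assumes NumPy's `np.maximum` plays the role of the built-in `max`, and the input values are made up. NumPy does not check for overflow in integer array arithmetic, so the wrapped value is produced without any warning and then passes through the clipping step unchanged.

```python
import numpy as np

# Illustrative inputs: a gross wage of 0 and a deduction of 1230, both stored as uint32.
bruttolohn_y = np.array([0], dtype=np.uint32)
werbungskosten_y = np.array([1230], dtype=np.uint32)

# uint32 - uint32 stays uint32, so 0 - 1230 wraps around to 2**32 - 1230.
# Array arithmetic does not emit an overflow warning here.
diff = bruttolohn_y - werbungskosten_y

# Clipping at zero cannot repair the result, because the wrapped value is positive.
einnahmen = np.maximum(diff, 0.0)
```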
The following reproducer demonstrates that the bug still occurs when the input has dtype `uint32[pyarrow]`:
```python
import pandas as pd

# Assumes main, MainTarget, TTTargets and InputData are imported from the package under test.


def test_uint_bruttolohn_does_not_overflow():
    result = main(
        main_target=MainTarget.results.tree,
        tt_targets=TTTargets.qname([
            "einkommensteuer__einkünfte__aus_nichtselbstständiger_arbeit__einnahmen_nach_abzug_werbungskosten_y",
        ]),
        input_data=InputData.tree({
            "p_id": pd.Series([1]),
            "einnahmen": {
                # A gross wage of 0 stored as an unsigned pyarrow integer.
                "bruttolohn_y": pd.Series([0], dtype="uint32[pyarrow]"),
            },
        }),
        policy_date_str="2023-01-01",
    )
    einnahmen = result["einkommensteuer"]["einkünfte"]["aus_nichtselbstständiger_arbeit"][
        "einnahmen_nach_abzug_werbungskosten_y"
    ]
    assert float(einnahmen[0]) == 0.0
```
Proposed Solution
Internally convert uint dtypes to regular (signed) ints or floats before any computation takes place.
Related to #94