End-to-end support for coarser-than-daily count signals#794
End-to-end support for coarser-than-daily count signals#794cdc-mitzimorris wants to merge 53 commits intomainfrom
Conversation
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 38 out of 38 changed files in this pull request and generated 3 comments.
Comments suppressed due to low confidence (1)
pyrenew/datasets/synthetic_CA_126/true_parameters.json:3
- true_parameters.json still says it was generated from “synthetic 120-day CA data”, but this dataset is now 126 days long. Update the description string to match the new dataset so it doesn’t mislead users/tests that inspect metadata.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 41 out of 41 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
@dylanhmorris and @damonbayer - this is ready for review. PR Analysis:
|
| area | doc+ | doc- | code+ | code- | other+ | other- |
|---|---|---|---|---|---|---|
| pyrenew/ (sans datasets) | 644 | 90 | 487 | 67 | 113 | 13 |
| test/ | 713 | 22 | 1,943 | 94 | 473 | 2 |
pyrenew/ per file (sorted by code churn)
| file | doc+ | doc- | code+ | code- | other+ | other- |
|---|---|---|---|---|---|---|
observation/count_observations.py |
283 | 51 | 218 | 52 | 35 | 13 |
latent/temporal_processes.py |
114 | 0 | 85 | 1 | 23 | 0 |
observation/base.py |
89 | 12 | 76 | 10 | 18 | 0 |
model/multisignal_model.py |
46 | 15 | 39 | 3 | 7 | 0 |
model/pyrenew_builder.py |
20 | 5 | 37 | 0 | 7 | 0 |
time.py |
54 | 3 | 11 | 0 | 19 | 0 |
latent/subpopulation_infections.py |
10 | 2 | 10 | 0 | 0 | 0 |
latent/population_infections.py |
9 | 2 | 4 | 0 | 0 | 0 |
arrayutils.py |
15 | 0 | 3 | 0 | 4 | 0 |
observation/measurement_observations.py |
4 | 0 | 2 | 1 | 0 | 0 |
latent/__init__.py |
0 | 0 | 2 | 0 | 0 | 0 |
test/ per file (sorted by code churn)
| file | doc+ | doc- | code+ | code- | other+ | other- |
|---|---|---|---|---|---|---|
test_observation_counts.py |
63 | 0 | 482 | 8 | 87 | 0 |
test_pyrenew_builder.py |
67 | 1 | 312 | 10 | 45 | 0 |
test_observation_validation.py |
53 | 9 | 221 | 30 | 54 | 2 |
integration/test_population_infections_he_weekly.py |
165 | 0 | 223 | 0 | 70 | 0 |
integration/test_population_infections_he_weekly_rt.py |
128 | 0 | 193 | 0 | 62 | 0 |
test_temporal_processes.py |
22 | 0 | 153 | 1 | 33 | 0 |
conftest.py |
142 | 0 | 100 | 0 | 71 | 0 |
integration/conftest.py |
47 | 0 | 93 | 3 | 21 | 0 |
test_datagen_he_CA_126.py |
8 | 6 | 32 | 24 | 5 | 0 |
test_population_infections.py |
2 | 0 | 37 | 1 | 6 | 0 |
test_subpopulation_infections.py |
2 | 0 | 38 | 0 | 4 | 0 |
test_time.py |
8 | 0 | 32 | 0 | 9 | 0 |
integration/test_population_infections_he.py |
1 | 1 | 7 | 9 | 1 | 0 |
test_ar_process.py |
0 | 0 | 13 | 1 | 5 | 0 |
test_datasets_synthetic.py |
5 | 5 | 7 | 7 | 0 | 0 |
Notes on the split
- The code-vs-docstring split uses the Python AST of each file version to identify
Module/ClassDef/FunctionDefleading string-literal bodies; every+/-line is classified by its line number in the respective revision. Triple-quoted strings that aren't docstrings (e.g., multiline literals inside function bodies) count as code. - Code churn is dominated by
count_observations.py(the irregular/weekly aggregation landing for Handle temporal aggregation in observational data #789) andtemporal_processes.py.observation/base.pyandmultisignal_model.pycarry the dispatch plumbing forfirst_day_dow/obs_start_date.time.pyandarrayutils.pyare mostly docstring (new module-level prose around small helpers).
| between R(t) parametrization cadence and observation aggregation. | ||
| """ | ||
|
|
||
| step_size: int |
There was a problem hiding this comment.
Given docs above about default. Consider setting to 1 in the base class and then overriding in inherited classes when appropriate? Open to the more explicit approach as well though.
There was a problem hiding this comment.
agreed we can do this either way, but there are a few reasons to keep it as is:
- TemporalProcess is a typing.Protocol, which is meant to describe a structural shape, not to provide default values.
- Yes, the current code has 3 lines of boilerplate but they're in the right place - reading AR1 on its own tells you its step_size is 1. design precept: explicit > implicit (PEP 20).
- If someone wants to add another TemporalProcess, seeing that the existing subclasses declare
step_sizeexplicitly is a prompt to consider what behavoir applies; otherwise, they might overlook an inappropriate default.
| more volatile trajectories; smaller values produce smoother ones. | ||
| """ | ||
|
|
||
| step_size: int = 1 |
There was a problem hiding this comment.
See above about potential to avoid with inheritance.
High-level comment: do we definitely need to enforce all weekly quantities sharing the same week? I agree that in many cases a user will want this, but it's not a given. For example, you could imagine two weekly aggregate observables, one reported in MMWR epiweeks, the other in isoweeks. Similarly, while matching temporal process weeks to observation weeks makes sense to me as a default, I don't think we should enforce it. |
Overview
Adds support for count observations aggregated to a weekly grid while the renewal equation continues to be evaluated daily. Two design pieces:
End-to-end flow for a weekly signal
pyrenew.time.daily_to_weekly.Relationship to pyrenew-hew
The production pyrenew-hew model parameterizes$\mathcal{R}(t)$ weekly and aggregates daily predicted hospital admissions to a weekly grid. This PR brings the same capability into PyRenew while making parameter cadence a user choice rather than a fixed coupling:
StepwiseTemporalProcess(step_size=7, alignment="calendar_week", week_start_dow=...).Observation cadence and parameter cadence are independent design choices; the builder no longer enforces a pairing rule between them.
Reviewer guide
Review bottom-up through the dependency chain. Each unit's changes are self-contained.
1. Synthetic data refresh (120 → 126 days)
datagen_he_CA_126.py,synthetic_CA_126/*.csv,synthetic_data.py,test_datagen_he_CA_126.py,test_datasets_synthetic.py.true_parameters.jsonmatches the generating process.2. Observation base validators —
pyrenew/observation/base.py_validate_aggregation_params,_compute_period_offset,_validate_period_end_times._validate_shapes_matchreplaces_validate_obs_times_shape._validate_dow_effectnow uses the sharedrequire_shapehelper.(period_end_dow + 1 - first_day_dow) % 7and period-boundary alignment.3. Latent: temporal processes —
pyrenew/latent/temporal_processes.pyStepwiseTemporalProcess;step_sizeattr added to theTemporalProcessProtocol and to AR1 / DifferencedAR1 / RandomWalk."model_index"(default) starts blocks at model index 0;"calendar_week"aligns weekly blocks to a declaredweek_start_dow.pyrenew.time.weekly_to_daily; coarse trajectory recorded as{name_prefix}_coarse.first_day_dowthreaded through the Protocol so calendar-aligned wrappers can use the model-axis day-of-week; standard processes ignore it.4. Latent: shape contracts —
pyrenew/latent/{population,subpopulation}_infections.pysample()methods acceptfirst_day_dowand forward it to their temporal processes.pyrenew.arrayutils.require_shapehelper.5. Count observation path —
pyrenew/observation/count_observations.pyaggregation_period,reporting_schedule,period_end_dow._aggregatewrapspyrenew.time.daily_to_weekly.validate_data/samplefor regular vs. irregular schedules.SubpopulationCountscall sites updated;times→period_end_timesrename consistent.6. Measurement observations —
pyrenew/observation/measurement_observations.py7. Builder + model coherence —
pyrenew/model/{pyrenew_builder,multisignal_model}.pyMultiSignalModel.sample()acceptsfirst_day_dowand forwards it to the latent process._validate_coherenceenforces calendar-anchor and structural coherence:aggregation_period > 1must agree onperiod_end_dow.step_sizemust be a positive integer.week_start_dowconsistent with weeklyperiod_end_dow:period_end_dow == (week_start_dow + 6) % 7.first_day_dowrequired atvalidate_datawhen any obs hasaggregation_period > 1.8. Integration test —
test/integration/test_population_infections_he_weekly.pynumpyro.enable_x64()+set_host_device_count(4).9. Test fixtures + config —
test/conftest.py,pyproject.toml,_typos.tomlWrongShapeTemporalProcess,ConstantTemporalProcess,InvalidStepSizeTemporalProcess).integrationmarker added so-m "not integration"skips MCMC tests.10. Tutorial —
docs/tutorials/building_multisignal_models.qmdalignment="calendar_week",week_start_dow=6) with weekly observations (period_end_dow=5).Where to focus review attention
_compute_period_offsetand the period-boundary check in_validate_period_end_times— these govern correctness of weekly alignment.StepwiseTemporalProcesscalendar-week broadcasting —weekly_to_dailyis reused; sanity check withn_timepoints=17,first_day_dow=3,week_start_dow=6produces 3 coarse samples broadcast to[c0×3, c1×7, c2×7]._validate_coherence— each has pass + distinct-failure tests intest_pyrenew_builder.py.CountObservation._aggregateruns inside the numpyro-traced graph — likelihood scoring is at weekly scale, not post-hoc.Incidental fixes (not directly tied to #789)
Found while implementing aggregation; small enough that separating them would create churn.
_validate_index_arrayempty-array guard (observation/base.py) —jnp.any(indices < 0)on empty arrays returnedFalse; now returns early onsize == 0._validate_index_array/_validate_obs_densendim check — previously accepted non-1D arrays and relied on silent broadcasting; now rejects with a clear error.test_ar_process_asymptotics— the bound|long_ts[-1]| < 3 * noise_sdwas too tight (stationary SD strictly > innovation SD when autoregressive coefficients are non-zero) and latently flaky; replaced with closed-form stationary SD per order._validate_obs_times_shape→_validate_shapes_matchrename — prerequisite refactor; same shape-match logic needed for(obs, period_end_times)pairs._typos.toml: allowdows.pyproject.toml: registerintegrationpytest marker.