Skip to content

End-to-end support for coarser-than-daily count signals#794

Open
cdc-mitzimorris wants to merge 53 commits intomainfrom
mem_789_temporal_aggregation
Open

End-to-end support for coarser-than-daily count signals#794
cdc-mitzimorris wants to merge 53 commits intomainfrom
mem_789_temporal_aggregation

Conversation

@cdc-mitzimorris
Copy link
Copy Markdown
Collaborator

@cdc-mitzimorris cdc-mitzimorris commented Apr 21, 2026

Overview

Adds support for count observations aggregated to a weekly grid while the renewal equation continues to be evaluated daily. Two design pieces:

  1. Weekly observation likelihood path. Daily predicted counts are summed to the weekly observation grid before the likelihood is scored. Aggregation lives inside the numpyro-traced graph so the noise model and the likelihood operate on the same scale.
  2. Independent parameter cadence. The temporal process for $\mathcal{R}(t)$ may be parameterized daily or weekly (or any stepwise cadence) and is broadcast to the daily axis. The renewal equation and all observation transforms run daily either way.

End-to-end flow for a weekly signal

  • $\mathcal{R}(t)$ sampled by a temporal process at the chosen cadence and broadcast to the daily model axis if needed.
  • Daily renewal equation consumes daily $\mathcal{R}(t)$ and produces the daily infection trajectory.
  • Daily predicted counts computed via ascertainment + delay convolution.
  • Daily predicted counts summed to the weekly observation grid via pyrenew.time.daily_to_weekly.
  • Likelihood scored at weekly scale against weekly observations.

Relationship to pyrenew-hew

The production pyrenew-hew model parameterizes $\mathcal{R}(t)$ weekly and aggregates daily predicted hospital admissions to a weekly grid. This PR brings the same capability into PyRenew while making parameter cadence a user choice rather than a fixed coupling:

  • Match pyrenew-hew today: wrap the inner temporal process in StepwiseTemporalProcess(step_size=7, alignment="calendar_week", week_start_dow=...).
  • Daily parameterization for the same signals: drop the wrapper and use the inner process directly.
  • Mixed cadences: weekly hospital admissions and daily ED visits in one model regardless of the Rt parameter cadence.

Observation cadence and parameter cadence are independent design choices; the builder no longer enforces a pairing rule between them.

Reviewer guide

Review bottom-up through the dependency chain. Each unit's changes are self-contained.

1. Synthetic data refresh (120 → 126 days)

  • Files: datagen_he_CA_126.py, synthetic_CA_126/*.csv, synthetic_data.py, test_datagen_he_CA_126.py, test_datasets_synthetic.py.
  • Six extra days yields 18 full weeks. No API change.
  • Verify true_parameters.json matches the generating process.

2. Observation base validators — pyrenew/observation/base.py

  • New: _validate_aggregation_params, _compute_period_offset, _validate_period_end_times.
  • Generalized _validate_shapes_match replaces _validate_obs_times_shape.
  • _validate_dow_effect now uses the shared require_shape helper.
  • Focus: offset arithmetic (period_end_dow + 1 - first_day_dow) % 7 and period-boundary alignment.

3. Latent: temporal processes — pyrenew/latent/temporal_processes.py

  • New class StepwiseTemporalProcess; step_size attr added to the TemporalProcess Protocol and to AR1 / DifferencedAR1 / RandomWalk.
  • Alignment options: "model_index" (default) starts blocks at model index 0; "calendar_week" aligns weekly blocks to a declared week_start_dow.
  • Calendar-week broadcast delegates to pyrenew.time.weekly_to_daily; coarse trajectory recorded as {name_prefix}_coarse.
  • first_day_dow threaded through the Protocol so calendar-aligned wrappers can use the model-axis day-of-week; standard processes ignore it.

4. Latent: shape contracts — pyrenew/latent/{population,subpopulation}_infections.py

  • Both sample() methods accept first_day_dow and forward it to their temporal processes.
  • Output shape of every temporal process is validated via the shared pyrenew.arrayutils.require_shape helper.

5. Count observation path — pyrenew/observation/count_observations.py

  • New ctor params aggregation_period, reporting_schedule, period_end_dow.
  • New _aggregate wraps pyrenew.time.daily_to_weekly.
  • Branching in validate_data / sample for regular vs. irregular schedules.
  • Confirm all in-tree SubpopulationCounts call sites updated; timesperiod_end_times rename consistent.

6. Measurement observations — pyrenew/observation/measurement_observations.py

  • One-line refactor to use the renamed shape validator.

7. Builder + model coherence — pyrenew/model/{pyrenew_builder,multisignal_model}.py

  • MultiSignalModel.sample() accepts first_day_dow and forwards it to the latent process.
  • _validate_coherence enforces calendar-anchor and structural coherence:
    • All observations sharing an aggregation_period > 1 must agree on period_end_dow.
    • Every temporal-process step_size must be a positive integer.
    • Calendar-week-aligned temporal processes must have week_start_dow consistent with weekly period_end_dow: period_end_dow == (week_start_dow + 6) % 7.
  • first_day_dow required at validate_data when any obs has aggregation_period > 1.
  • The previous "step_size ≤ finest aggregation_period" rule was deliberately removed — see the in-body comment.

8. Integration test — test/integration/test_population_infections_he_weekly.py

  • End-to-end MCMC: weekly admissions + daily ED visits, calendar-week-aligned weekly Rt.
  • Prior-predictive shape check and posterior recovery for I0 + baseline R(t).
  • Uses numpyro.enable_x64() + set_host_device_count(4).

9. Test fixtures + config — test/conftest.py, pyproject.toml, _typos.toml

  • 196-line conftest promotes repeated setup to fixtures, including reusable temporal-process stubs (WrongShapeTemporalProcess, ConstantTemporalProcess, InvalidStepSizeTemporalProcess).
  • Pytest integration marker added so -m "not integration" skips MCMC tests.

10. Tutorial — docs/tutorials/building_multisignal_models.qmd

  • New section explaining the three independent choices: parameter cadence, model time axis, observation cadence.
  • Worked example pairing weekly Rt (alignment="calendar_week", week_start_dow=6) with weekly observations (period_end_dow=5).

Where to focus review attention

  • Offset math in _compute_period_offset and the period-boundary check in _validate_period_end_times — these govern correctness of weekly alignment.
  • StepwiseTemporalProcess calendar-week broadcasting — weekly_to_daily is reused; sanity check with n_timepoints=17, first_day_dow=3, week_start_dow=6 produces 3 coarse samples broadcast to [c0×3, c1×7, c2×7].
  • The three coherence rules in _validate_coherence — each has pass + distinct-failure tests in test_pyrenew_builder.py.
  • CountObservation._aggregate runs inside the numpyro-traced graph — likelihood scoring is at weekly scale, not post-hoc.

Incidental fixes (not directly tied to #789)

Found while implementing aggregation; small enough that separating them would create churn.

  • _validate_index_array empty-array guard (observation/base.py) — jnp.any(indices < 0) on empty arrays returned False; now returns early on size == 0.
  • _validate_index_array / _validate_obs_dense ndim check — previously accepted non-1D arrays and relied on silent broadcasting; now rejects with a clear error.
  • AR(p) stationary-SD bound in test_ar_process_asymptotics — the bound |long_ts[-1]| < 3 * noise_sd was too tight (stationary SD strictly > innovation SD when autoregressive coefficients are non-zero) and latently flaky; replaced with closed-form stationary SD per order.
  • _validate_obs_times_shape_validate_shapes_match rename — prerequisite refactor; same shape-match logic needed for (obs, period_end_times) pairs.
  • _typos.toml: allow dows. pyproject.toml: register integration pytest marker.

@cdc-mitzimorris cdc-mitzimorris marked this pull request as ready for review April 23, 2026 11:57
@cdc-mitzimorris cdc-mitzimorris requested a review from Copilot April 23, 2026 11:57
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 38 out of 38 changed files in this pull request and generated 3 comments.

Comments suppressed due to low confidence (1)

pyrenew/datasets/synthetic_CA_126/true_parameters.json:3

  • true_parameters.json still says it was generated from “synthetic 120-day CA data”, but this dataset is now 126 days long. Update the description string to match the new dataset so it doesn’t mislead users/tests that inspect metadata.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread docs/tutorials/building_multisignal_models.qmd Outdated
Comment thread pyrenew/observation/base.py
Comment thread pyrenew/latent/temporal_processes.py
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 41 out of 41 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread pyrenew/observation/count_observations.py Outdated
@cdc-mitzimorris
Copy link
Copy Markdown
Collaborator Author

cdc-mitzimorris commented Apr 23, 2026

@dylanhmorris and @damonbayer - this is ready for review.
there's a lot of changes to look at via the github interface - here's an analysis of how much of this is code changes

PR Analysis: mem_789_temporal_aggregation vs main

Excluding pyrenew/datasets/. "other" = blank lines and # comments.

Top-level totals

area doc+ doc- code+ code- other+ other-
pyrenew/ (sans datasets) 644 90 487 67 113 13
test/ 713 22 1,943 94 473 2

pyrenew/ per file (sorted by code churn)

file doc+ doc- code+ code- other+ other-
observation/count_observations.py 283 51 218 52 35 13
latent/temporal_processes.py 114 0 85 1 23 0
observation/base.py 89 12 76 10 18 0
model/multisignal_model.py 46 15 39 3 7 0
model/pyrenew_builder.py 20 5 37 0 7 0
time.py 54 3 11 0 19 0
latent/subpopulation_infections.py 10 2 10 0 0 0
latent/population_infections.py 9 2 4 0 0 0
arrayutils.py 15 0 3 0 4 0
observation/measurement_observations.py 4 0 2 1 0 0
latent/__init__.py 0 0 2 0 0 0

test/ per file (sorted by code churn)

file doc+ doc- code+ code- other+ other-
test_observation_counts.py 63 0 482 8 87 0
test_pyrenew_builder.py 67 1 312 10 45 0
test_observation_validation.py 53 9 221 30 54 2
integration/test_population_infections_he_weekly.py 165 0 223 0 70 0
integration/test_population_infections_he_weekly_rt.py 128 0 193 0 62 0
test_temporal_processes.py 22 0 153 1 33 0
conftest.py 142 0 100 0 71 0
integration/conftest.py 47 0 93 3 21 0
test_datagen_he_CA_126.py 8 6 32 24 5 0
test_population_infections.py 2 0 37 1 6 0
test_subpopulation_infections.py 2 0 38 0 4 0
test_time.py 8 0 32 0 9 0
integration/test_population_infections_he.py 1 1 7 9 1 0
test_ar_process.py 0 0 13 1 5 0
test_datasets_synthetic.py 5 5 7 7 0 0

Notes on the split

  • The code-vs-docstring split uses the Python AST of each file version to identify Module/ClassDef/FunctionDef leading string-literal bodies; every +/- line is classified by its line number in the respective revision. Triple-quoted strings that aren't docstrings (e.g., multiline literals inside function bodies) count as code.
  • Code churn is dominated by count_observations.py (the irregular/weekly aggregation landing for Handle temporal aggregation in observational data #789) and temporal_processes.py. observation/base.py and multisignal_model.py carry the dispatch plumbing for first_day_dow / obs_start_date. time.py and arrayutils.py are mostly docstring (new module-level prose around small helpers).

Comment thread docs/tutorials/building_multisignal_models.qmd Outdated
Comment thread docs/tutorials/building_multisignal_models.qmd Outdated
Comment thread docs/tutorials/building_multisignal_models.qmd Outdated
Comment thread docs/tutorials/building_multisignal_models.qmd Outdated
Comment thread docs/tutorials/building_multisignal_models.qmd
Comment thread pyrenew/latent/base.py Outdated
Comment thread pyrenew/latent/temporal_processes.py Outdated
Comment thread pyrenew/arrayutils.py
between R(t) parametrization cadence and observation aggregation.
"""

step_size: int
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given docs above about default. Consider setting to 1 in the base class and then overriding in inherited classes when appropriate? Open to the more explicit approach as well though.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@damonbayer thoughts?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed we can do this either way, but there are a few reasons to keep it as is:

  • TemporalProcess is a typing.Protocol, which is meant to describe a structural shape, not to provide default values.
  • Yes, the current code has 3 lines of boilerplate but they're in the right place - reading AR1 on its own tells you its step_size is 1. design precept: explicit > implicit (PEP 20).
  • If someone wants to add another TemporalProcess, seeing that the existing subclasses declare step_size explicitly is a prompt to consider what behavoir applies; otherwise, they might overlook an inappropriate default.

more volatile trajectories; smaller values produce smoother ones.
"""

step_size: int = 1
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above about potential to avoid with inheritance.

Comment thread pyrenew/latent/temporal_processes.py
Comment thread pyrenew/latent/temporal_processes.py
Comment thread pyrenew/model/pyrenew_builder.py Outdated
@dylanhmorris
Copy link
Copy Markdown
Collaborator

dylanhmorris commented Apr 24, 2026

All observations sharing an aggregation_period > 1 must agree on period_end_dow.
Every temporal-process step_size must be a positive integer.
Calendar-week-aligned temporal processes must have week_start_dow consistent with weekly period_end_dow: period_end_dow == (week_start_dow + 6) % 7.

High-level comment: do we definitely need to enforce all weekly quantities sharing the same week?

I agree that in many cases a user will want this, but it's not a given. For example, you could imagine two weekly aggregate observables, one reported in MMWR epiweeks, the other in isoweeks. Similarly, while matching temporal process weeks to observation weeks makes sense to me as a default, I don't think we should enforce it.
https://github.com/CDCgov/PyRenew/pull/794/changes#r3140086415

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants