End-to-end support for coarser-than-daily count signals
Replaces the original scope of issue #789, which addressed only the observation side. That scope leaves the model incoherent for weekly data (e.g., NHSN hospital admissions) because R(t) remains daily-parametrized while observations are weekly. This revision covers both sides in one PR.
Problem statement
Each observed count must correspond to exactly one estimated R(t) parameter, so that parameter count tracks signal information. When a signal is reported on a coarser grid than daily (e.g., weekly MMWR hospital admissions), independently estimating one R(t) per day produces 7x as many free parameters as informative observations and is unidentified. The latent infection trajectory must be driven by an R(t) that is piecewise-constant at the observation cadence.
Under a daily R(t) prior, weekly observations alone cannot distinguish between 7-tuples of R(t) values that sum to the same weekly total. The prior does the work of smoothing, and the posterior is sensitive to prior tightness rather than data. Matching the R(t) parametrization cadence to the observation cadence removes this pathology: information and parameters are in 1:1 correspondence.
Target use case
Weekly NHSN hospital admissions and daily NSSP ED visits at population level (future: county level) in a single multi-signal model.
The end-to-end flow for the weekly signal is:
- Weekly R(t) sampled from a temporal process (one parameter per observation week), broadcast to daily by repetition.
- Daily renewal equation consumes the daily R(t) and produces a daily infection trajectory.
- Daily predicted admissions computed via ascertainment and delay convolution.
- Daily predicted admissions summed to the weekly observation grid.
- Likelihood scored at weekly scale against weekly observations.
Aggregation must sit inside the likelihood path, not in downstream processing. It is part of the computational graph that numpyro traces for gradient-based inference, and the noise model needs predicted and observed values on the same aggregated scale.
The goal is to express this flow through the public PyRenew builder API with a clean separation of concerns, so that scale of parametrization, daily renewal mechanics, and aggregation of predictions are independent building blocks.
Implications for the three components
Observation processing (CountObservation): Daily predictions from the renewal equation must be summed to the observation cadence before entering the likelihood.
Latent R(t) parametrization (TemporalProcess): R(t) is sampled at the observation cadence (one value per reporting period) and broadcast to daily by repetition for the renewal equation. Cleanest implementation is a wrapper around any inner TemporalProcess, leaving PopulationInfections and SubpopulationInfections unchanged.
Infection trajectory (PopulationInfections / SubpopulationInfections): No change. Renewal is always computed daily. The input R(t) happens to be piecewise-constant over blocks; this is transparent to the renewal mechanics.
Coherence requirement
For a model to be coherent end-to-end at a given cadence P:
- R(t) parametrization cadence =
P (via a stepwise temporal process)
- Observation aggregation cadence =
P (via CountObservation(aggregation_period=P, ...))
- Both cadences share the same calendar anchoring (
period_end_dow + first_day_dow)
For a mixed-cadence multi-signal model (e.g., weekly NHSN + daily NSSP), each signal has its own aggregation_period because each CountObservation carries its own parameters. The R(t) parametrization is global (one TemporalProcess drives the latent process). In such models R(t) should be parametrized at the finest observation cadence (daily) so all signals remain identifiable.
API design
Observation side: new constructor arguments on CountObservation
Applied to both PopulationCounts and SubpopulationCounts:
aggregation_period: int = 1 # {1, 7}
reporting_schedule: str = "regular" # {"regular", "irregular"}
period_end_dow: int | None = None # required iff aggregation_period == 7
Constructor-time validation:
aggregation_period in {1, 7} (scope guard; extension to other periods is follow-up work)
period_end_dow in {0, ..., 6} when aggregation_period == 7; ignored when aggregation_period == 1
day_of_week_rv may not be combined with aggregation_period > 1 (within-period structure is destroyed by aggregation)
reporting_schedule in {"regular", "irregular"}
Sample-time behavior by (schedule, class):
| class |
schedule |
obs shape |
extra required kwargs |
PopulationCounts |
"regular" |
(n_periods,) dense + NaN |
first_day_dow when P > 1 |
PopulationCounts |
"irregular" |
(n_obs,) |
first_day_dow, period_end_times |
SubpopulationCounts |
"regular" |
(n_periods, n_observed_subpops) dense + NaN |
first_day_dow when P > 1, subpop_indices |
SubpopulationCounts |
"irregular" |
(n_obs,) |
first_day_dow, period_end_times, subpop_indices |
In all cases period_end_times are indices on the daily axis of each observed period's final day, converted internally to period indices via (period_end_times - offset - (P - 1)) // P. Likelihood path is dense-with-NaN + mask for regular and fancy-index on aggregated array for irregular, matching the existing daily-scale conventions on these classes.
Latent side: new StepwiseTemporalProcess
In pyrenew/latent/temporal_processes.py:
class StepwiseTemporalProcess(TemporalProcess):
"""
Parameterize a trajectory at a coarser cadence and broadcast to
the per-timepoint scale by repetition.
"""
def __init__(self, inner: TemporalProcess, step_size: int = 7):
self.inner = inner
self.step_size = step_size
def sample(self, n_timepoints, initial_value, name_prefix):
n_steps = (n_timepoints + self.step_size - 1) // self.step_size
coarse = self.inner.sample(
n_timepoints=n_steps,
initial_value=initial_value,
name_prefix=name_prefix,
)
return jnp.repeat(coarse, repeats=self.step_size, axis=0)[:n_timepoints]
Accepts any AR1, DifferencedAR1, RandomWalk, or other TemporalProcess concrete as the inner process. Weekly R(t): StepwiseTemporalProcess(inner=AR1(...), step_size=7). Daily R(t) (default): AR1(...) directly, unchanged.
Separation of concerns achieved
| concern |
lives in |
change |
| daily renewal equation |
PopulationInfections / SubpopulationInfections |
none |
| scale of R(t) parametrization |
TemporalProcess wrapper |
new StepwiseTemporalProcess |
| daily -> period aggregation of predictions |
CountObservation |
new parameters and sample-time logic |
API defaults and breaking change
Defaults are uniform across both subclasses: aggregation_period=1, reporting_schedule="regular", period_end_dow=None.
PopulationCounts(...) with no new arguments: unchanged behavior.
SubpopulationCounts(...) with no new arguments: new default is dense 2D daily (matches the county-level NSSP use case). Existing sparse-observation callers must now opt in with reporting_schedule="irregular". This is a breaking change for SubpopulationCounts, but the only affected call sites are unit and integration tests; there is no associated tutorial or production code.
Non-goals
MeasurementObservation does not receive these parameters in this PR because the immediate goal is to handle count observations, i.e. hospitalizations + ed visits. Tracked as a follow-up issue.
- Calendar anchoring of
StepwiseTemporalProcess R(t) blocks. The broadcast is "blocks of step_size timepoints starting at element 0 of the daily axis." Whether those blocks calendar-align to observation periods depends on first_day_dow. Document as user responsibility; a follow-up may add explicit anchoring.
- Aggregation periods other than
{1, 7}. period_end_dow is a weekly-specific anchor (a day-of-week index uniquely identifies a period boundary only when P == 7). Generalizing to P > 7 (biweekly, four-weekly) requires a different anchor representation (a reference period-end date) and is a follow-up.
- Tutorial coverage. Tracked as a separate follow-up once the API lands.
End-to-end support for coarser-than-daily count signals
Replaces the original scope of issue #789, which addressed only the observation side. That scope leaves the model incoherent for weekly data (e.g., NHSN hospital admissions) because R(t) remains daily-parametrized while observations are weekly. This revision covers both sides in one PR.
Problem statement
Each observed count must correspond to exactly one estimated R(t) parameter, so that parameter count tracks signal information. When a signal is reported on a coarser grid than daily (e.g., weekly MMWR hospital admissions), independently estimating one R(t) per day produces 7x as many free parameters as informative observations and is unidentified. The latent infection trajectory must be driven by an R(t) that is piecewise-constant at the observation cadence.
Under a daily R(t) prior, weekly observations alone cannot distinguish between 7-tuples of R(t) values that sum to the same weekly total. The prior does the work of smoothing, and the posterior is sensitive to prior tightness rather than data. Matching the R(t) parametrization cadence to the observation cadence removes this pathology: information and parameters are in 1:1 correspondence.
Target use case
Weekly NHSN hospital admissions and daily NSSP ED visits at population level (future: county level) in a single multi-signal model.
The end-to-end flow for the weekly signal is:
Aggregation must sit inside the likelihood path, not in downstream processing. It is part of the computational graph that numpyro traces for gradient-based inference, and the noise model needs predicted and observed values on the same aggregated scale.
The goal is to express this flow through the public PyRenew builder API with a clean separation of concerns, so that scale of parametrization, daily renewal mechanics, and aggregation of predictions are independent building blocks.
Implications for the three components
Observation processing (
CountObservation): Daily predictions from the renewal equation must be summed to the observation cadence before entering the likelihood.Latent R(t) parametrization (
TemporalProcess): R(t) is sampled at the observation cadence (one value per reporting period) and broadcast to daily by repetition for the renewal equation. Cleanest implementation is a wrapper around any innerTemporalProcess, leavingPopulationInfectionsandSubpopulationInfectionsunchanged.Infection trajectory (
PopulationInfections/SubpopulationInfections): No change. Renewal is always computed daily. The input R(t) happens to be piecewise-constant over blocks; this is transparent to the renewal mechanics.Coherence requirement
For a model to be coherent end-to-end at a given cadence
P:P(via a stepwise temporal process)P(viaCountObservation(aggregation_period=P, ...))period_end_dow+first_day_dow)For a mixed-cadence multi-signal model (e.g., weekly NHSN + daily NSSP), each signal has its own
aggregation_periodbecause eachCountObservationcarries its own parameters. The R(t) parametrization is global (oneTemporalProcessdrives the latent process). In such models R(t) should be parametrized at the finest observation cadence (daily) so all signals remain identifiable.API design
Observation side: new constructor arguments on
CountObservationApplied to both
PopulationCountsandSubpopulationCounts:Constructor-time validation:
aggregation_period in {1, 7}(scope guard; extension to other periods is follow-up work)period_end_dow in {0, ..., 6}whenaggregation_period == 7; ignored whenaggregation_period == 1day_of_week_rvmay not be combined withaggregation_period > 1(within-period structure is destroyed by aggregation)reporting_schedule in {"regular", "irregular"}Sample-time behavior by (schedule, class):
obsshapePopulationCounts"regular"(n_periods,)dense + NaNfirst_day_dowwhenP > 1PopulationCounts"irregular"(n_obs,)first_day_dow,period_end_timesSubpopulationCounts"regular"(n_periods, n_observed_subpops)dense + NaNfirst_day_dowwhenP > 1,subpop_indicesSubpopulationCounts"irregular"(n_obs,)first_day_dow,period_end_times,subpop_indicesIn all cases
period_end_timesare indices on the daily axis of each observed period's final day, converted internally to period indices via(period_end_times - offset - (P - 1)) // P. Likelihood path is dense-with-NaN + mask for regular and fancy-index on aggregated array for irregular, matching the existing daily-scale conventions on these classes.Latent side: new
StepwiseTemporalProcessIn
pyrenew/latent/temporal_processes.py:Accepts any
AR1,DifferencedAR1,RandomWalk, or otherTemporalProcessconcrete as the inner process. Weekly R(t):StepwiseTemporalProcess(inner=AR1(...), step_size=7). Daily R(t) (default):AR1(...)directly, unchanged.Separation of concerns achieved
PopulationInfections/SubpopulationInfectionsTemporalProcesswrapperStepwiseTemporalProcessCountObservationAPI defaults and breaking change
Defaults are uniform across both subclasses:
aggregation_period=1, reporting_schedule="regular", period_end_dow=None.PopulationCounts(...)with no new arguments: unchanged behavior.SubpopulationCounts(...)with no new arguments: new default is dense 2D daily (matches the county-level NSSP use case). Existing sparse-observation callers must now opt in withreporting_schedule="irregular". This is a breaking change forSubpopulationCounts, but the only affected call sites are unit and integration tests; there is no associated tutorial or production code.Non-goals
MeasurementObservationdoes not receive these parameters in this PR because the immediate goal is to handle count observations, i.e. hospitalizations + ed visits. Tracked as a follow-up issue.StepwiseTemporalProcessR(t) blocks. The broadcast is "blocks ofstep_sizetimepoints starting at element 0 of the daily axis." Whether those blocks calendar-align to observation periods depends onfirst_day_dow. Document as user responsibility; a follow-up may add explicit anchoring.{1, 7}.period_end_dowis a weekly-specific anchor (a day-of-week index uniquely identifies a period boundary only whenP == 7). Generalizing toP > 7(biweekly, four-weekly) requires a different anchor representation (a reference period-end date) and is a follow-up.