Skip to content

Handle temporal aggregation in observational data #789

@cdc-mitzimorris

Description

@cdc-mitzimorris

End-to-end support for coarser-than-daily count signals

Replaces the original scope of issue #789, which addressed only the observation side. That scope leaves the model incoherent for weekly data (e.g., NHSN hospital admissions) because R(t) remains daily-parametrized while observations are weekly. This revision covers both sides in one PR.

Problem statement

Each observed count must correspond to exactly one estimated R(t) parameter, so that parameter count tracks signal information. When a signal is reported on a coarser grid than daily (e.g., weekly MMWR hospital admissions), independently estimating one R(t) per day produces 7x as many free parameters as informative observations and is unidentified. The latent infection trajectory must be driven by an R(t) that is piecewise-constant at the observation cadence.

Under a daily R(t) prior, weekly observations alone cannot distinguish between 7-tuples of R(t) values that sum to the same weekly total. The prior does the work of smoothing, and the posterior is sensitive to prior tightness rather than data. Matching the R(t) parametrization cadence to the observation cadence removes this pathology: information and parameters are in 1:1 correspondence.

Target use case

Weekly NHSN hospital admissions and daily NSSP ED visits at population level (future: county level) in a single multi-signal model.
The end-to-end flow for the weekly signal is:

  • Weekly R(t) sampled from a temporal process (one parameter per observation week), broadcast to daily by repetition.
  • Daily renewal equation consumes the daily R(t) and produces a daily infection trajectory.
  • Daily predicted admissions computed via ascertainment and delay convolution.
  • Daily predicted admissions summed to the weekly observation grid.
  • Likelihood scored at weekly scale against weekly observations.

Aggregation must sit inside the likelihood path, not in downstream processing. It is part of the computational graph that numpyro traces for gradient-based inference, and the noise model needs predicted and observed values on the same aggregated scale.

The goal is to express this flow through the public PyRenew builder API with a clean separation of concerns, so that scale of parametrization, daily renewal mechanics, and aggregation of predictions are independent building blocks.

Implications for the three components

Observation processing (CountObservation): Daily predictions from the renewal equation must be summed to the observation cadence before entering the likelihood.

Latent R(t) parametrization (TemporalProcess): R(t) is sampled at the observation cadence (one value per reporting period) and broadcast to daily by repetition for the renewal equation. Cleanest implementation is a wrapper around any inner TemporalProcess, leaving PopulationInfections and SubpopulationInfections unchanged.

Infection trajectory (PopulationInfections / SubpopulationInfections): No change. Renewal is always computed daily. The input R(t) happens to be piecewise-constant over blocks; this is transparent to the renewal mechanics.

Coherence requirement

For a model to be coherent end-to-end at a given cadence P:

  • R(t) parametrization cadence = P (via a stepwise temporal process)
  • Observation aggregation cadence = P (via CountObservation(aggregation_period=P, ...))
  • Both cadences share the same calendar anchoring (period_end_dow + first_day_dow)

For a mixed-cadence multi-signal model (e.g., weekly NHSN + daily NSSP), each signal has its own aggregation_period because each CountObservation carries its own parameters. The R(t) parametrization is global (one TemporalProcess drives the latent process). In such models R(t) should be parametrized at the finest observation cadence (daily) so all signals remain identifiable.


API design

Observation side: new constructor arguments on CountObservation

Applied to both PopulationCounts and SubpopulationCounts:

aggregation_period: int = 1                     # {1, 7}
reporting_schedule: str = "regular"             # {"regular", "irregular"}
period_end_dow: int | None = None               # required iff aggregation_period == 7

Constructor-time validation:

  • aggregation_period in {1, 7} (scope guard; extension to other periods is follow-up work)
  • period_end_dow in {0, ..., 6} when aggregation_period == 7; ignored when aggregation_period == 1
  • day_of_week_rv may not be combined with aggregation_period > 1 (within-period structure is destroyed by aggregation)
  • reporting_schedule in {"regular", "irregular"}

Sample-time behavior by (schedule, class):

class schedule obs shape extra required kwargs
PopulationCounts "regular" (n_periods,) dense + NaN first_day_dow when P > 1
PopulationCounts "irregular" (n_obs,) first_day_dow, period_end_times
SubpopulationCounts "regular" (n_periods, n_observed_subpops) dense + NaN first_day_dow when P > 1, subpop_indices
SubpopulationCounts "irregular" (n_obs,) first_day_dow, period_end_times, subpop_indices

In all cases period_end_times are indices on the daily axis of each observed period's final day, converted internally to period indices via (period_end_times - offset - (P - 1)) // P. Likelihood path is dense-with-NaN + mask for regular and fancy-index on aggregated array for irregular, matching the existing daily-scale conventions on these classes.

Latent side: new StepwiseTemporalProcess

In pyrenew/latent/temporal_processes.py:

class StepwiseTemporalProcess(TemporalProcess):
    """
    Parameterize a trajectory at a coarser cadence and broadcast to
    the per-timepoint scale by repetition.
    """
    def __init__(self, inner: TemporalProcess, step_size: int = 7):
        self.inner = inner
        self.step_size = step_size

    def sample(self, n_timepoints, initial_value, name_prefix):
        n_steps = (n_timepoints + self.step_size - 1) // self.step_size
        coarse = self.inner.sample(
            n_timepoints=n_steps,
            initial_value=initial_value,
            name_prefix=name_prefix,
        )
        return jnp.repeat(coarse, repeats=self.step_size, axis=0)[:n_timepoints]

Accepts any AR1, DifferencedAR1, RandomWalk, or other TemporalProcess concrete as the inner process. Weekly R(t): StepwiseTemporalProcess(inner=AR1(...), step_size=7). Daily R(t) (default): AR1(...) directly, unchanged.

Separation of concerns achieved

concern lives in change
daily renewal equation PopulationInfections / SubpopulationInfections none
scale of R(t) parametrization TemporalProcess wrapper new StepwiseTemporalProcess
daily -> period aggregation of predictions CountObservation new parameters and sample-time logic

API defaults and breaking change

Defaults are uniform across both subclasses: aggregation_period=1, reporting_schedule="regular", period_end_dow=None.

  • PopulationCounts(...) with no new arguments: unchanged behavior.
  • SubpopulationCounts(...) with no new arguments: new default is dense 2D daily (matches the county-level NSSP use case). Existing sparse-observation callers must now opt in with reporting_schedule="irregular". This is a breaking change for SubpopulationCounts, but the only affected call sites are unit and integration tests; there is no associated tutorial or production code.

Non-goals

  • MeasurementObservation does not receive these parameters in this PR because the immediate goal is to handle count observations, i.e. hospitalizations + ed visits. Tracked as a follow-up issue.
  • Calendar anchoring of StepwiseTemporalProcess R(t) blocks. The broadcast is "blocks of step_size timepoints starting at element 0 of the daily axis." Whether those blocks calendar-align to observation periods depends on first_day_dow. Document as user responsibility; a follow-up may add explicit anchoring.
  • Aggregation periods other than {1, 7}. period_end_dow is a weekly-specific anchor (a day-of-week index uniquely identifies a period boundary only when P == 7). Generalizing to P > 7 (biweekly, four-weekly) requires a different anchor representation (a reference period-end date) and is a follow-up.
  • Tutorial coverage. Tracked as a separate follow-up once the API lands.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions