
Add per-session LRU visualization caching#76

Open
heaven-howard wants to merge 6 commits into FNLCR-DMAP:dev from Saran-Nag:plot-caching

heaven-howard commented Mar 31, 2026

Feature

Previously, plots had to be recomputed every time a user switched between features. Plots are now stored in a cache, so returning to a previously viewed visualization retrieves it instantly instead of re-rendering it.

How the cache works

A new VisualizationCache class (utils/cache_manager.py) is instantiated once per Shiny session and stored in shared['cache']. It is an LRU (Least Recently Used) cache backed by an OrderedDict, with a default capacity of 50 entries.

Each visualization is keyed by three things:

(dataset_version, viz_name, normalized_params)
  • dataset_version is an integer counter that increments every time the dataset changes (a new file is loaded or a subset is applied). All previous cache entries become unreachable automatically on a data change, and cache.invalidate() is called immediately to free memory.
  • viz_name is a string identifying the plot (e.g. 'boxplot_interactive', 'ripley_l').
  • normalized_params is a hashable, canonical representation of all the UI inputs that affect the plot output. The normalize_params() helper recursively converts dicts, lists, and sets into tuples so the key is stable. Multi-select inputs whose display order matters (features in the boxplot, target cell labels in nearest neighbor) are stored as plain tuples, preserving the user-selected order and therefore the app's previous behavior. Inputs where order is irrelevant (e.g. region label filters) are stored as sorted tuples, so different selection orders map to the same cache entry.
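The normalization rules above can be illustrated with a minimal sketch. This is not the PR's normalize_params(): it assumes dict keys are strings (sorted so insertion order never changes the key) and that set members are mutually comparable, and the input names in the usage part ('features', 'regions', 'log_scale') are hypothetical.

```python
from typing import Any, Hashable


def normalize_params(value: Any) -> Hashable:
    """Recursively turn dicts, lists and sets into hashable tuples (sketch only)."""
    if isinstance(value, dict):
        # Assumption: string keys, sorted so {'a':1,'b':2} == {'b':2,'a':1} as keys.
        return tuple(sorted((k, normalize_params(v)) for k, v in value.items()))
    if isinstance(value, (set, frozenset)):
        # Sets are unordered, so sort their normalized members for a stable key.
        return tuple(sorted(normalize_params(v) for v in value))
    if isinstance(value, (list, tuple)):
        # Sequences keep their order, since position can be meaningful.
        return tuple(normalize_params(v) for v in value)
    return value  # scalars (str, int, bool, None, ...) are already hashable


# Hypothetical boxplot inputs: feature order matters, region filter order does not.
selected_features = ["CD8", "CD4"]
region_filter = ["tumor", "stroma"]

params = {
    "features": tuple(selected_features),     # plain tuple: keeps the user's order
    "regions": tuple(sorted(region_filter)),  # sorted tuple: order-insensitive
    "log_scale": True,
}
key = (1, "boxplot_interactive", normalize_params(params))
```

Keeping ordering decisions at the call site (plain versus sorted tuple) lets the generic helper stay order-preserving for sequences while still deduplicating inputs whose order carries no meaning.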

In every visualization server function, the pattern is:

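```python
# Collect every UI input that affects the plot output.
params = {
    # ...all inputs that affect the plot...
}

def compute():
    # Expensive work happens here; it only runs on a cache miss.
    return fig, df

# Returns the cached (fig, df) on a hit; otherwise calls compute() and caches the result.
fig, df = cache.get_or_compute('viz_name', version, params, compute)
```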

compute is only called on a cache miss. On a hit, the previously computed (fig, df) pair is returned immediately and promoted to the most-recently-used position. When the cache exceeds 50 entries, the least recently used entry is evicted.
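A minimal sketch of an LRU cache with this get_or_compute contract, built on collections.OrderedDict, is shown below. It is not the PR's VisualizationCache: the max_size parameter name is an assumption, and for brevity it assumes params is a flat dict of hashable values, using tuple(sorted(params.items())) in place of normalize_params().

```python
from collections import OrderedDict
from typing import Any, Callable, Dict, Hashable


class VisualizationCache:
    """Per-session LRU cache sketch; not the PR's actual implementation."""

    def __init__(self, max_size: int = 50) -> None:
        self.max_size = max_size  # hypothetical name for the capacity setting
        self._store: "OrderedDict[Hashable, Any]" = OrderedDict()

    def get_or_compute(self, viz_name: str, version: int,
                       params: Dict[str, Hashable],
                       compute: Callable[[], Any]) -> Any:
        # Simplified stand-in for normalize_params(): flat dicts only.
        key = (version, viz_name, tuple(sorted(params.items())))
        if key in self._store:
            # Hit: promote to most recently used and return the stored result.
            self._store.move_to_end(key)
            return self._store[key]
        # Miss: run the expensive computation once and remember its result.
        result = compute()
        self._store[key] = result
        if len(self._store) > self.max_size:
            # Over capacity: evict the least recently used entry (the front).
            self._store.popitem(last=False)
        return result

    def invalidate(self) -> None:
        """Drop every entry, e.g. after the dataset changes."""
        self._store.clear()


# Usage: the second identical request is served without calling compute().
calls = []

def compute():
    calls.append("computed")  # records each expensive run
    return "fig", "df"

cache = VisualizationCache()
params = {"features": ("CD4", "CD8"), "log_scale": False}
fig, df = cache.get_or_compute("boxplot_interactive", 1, params, compute)
fig, df = cache.get_or_compute("boxplot_interactive", 1, params, compute)
# calls now holds a single entry: the second request was a cache hit
```

OrderedDict keeps this simple: move_to_end() records recency on a hit and popitem(last=False) removes the oldest-used entry, both in constant time.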

Cache invalidation

The cache is invalidated automatically whenever the underlying dataset changes. This covers two cases:

  1. Loading a new file — the user uploads a .h5ad or .pickle file on the data input tab
  2. Applying a subset — the user filters the dataset by annotation and label on the data input tab

Both actions update shared['adata_main'], which triggers the update_parts reactive effect in data_input_server.py. That effect increments dataset_version and calls cache.invalidate() to clear all entries immediately.

Because dataset_version is part of every cache key, any entries computed against the old dataset are unreachable even before invalidate() runs — the version bump alone is sufficient to treat them as stale. The explicit invalidate() call is an optimization that reclaims memory right away rather than waiting for LRU eviction to cycle them out.
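The claim that the version bump alone makes old entries unreachable can be checked with a small simulation. The on_dataset_changed() and cache_key() functions, and the use of a plain OrderedDict in place of VisualizationCache, are hypothetical simplifications; in the app this logic lives in the update_parts reactive effect, which calls cache.invalidate().

```python
from collections import OrderedDict

# Stand-in for per-session shared state; the PR stores a VisualizationCache here.
shared = {"dataset_version": 0, "cache": OrderedDict()}


def cache_key(viz_name, params):
    """Build a key for the current dataset version (params assumed hashable)."""
    return (shared["dataset_version"], viz_name, params)


def on_dataset_changed(invalidate: bool = True) -> None:
    """Hypothetical stand-in for the update_parts reactive effect."""
    shared["dataset_version"] += 1  # old keys can no longer match any lookup
    if invalidate:
        shared["cache"].clear()  # free memory now instead of waiting for eviction


shared["cache"][cache_key("ripley_l", ("radius", 50))] = "old figure"

# Bump the version only: the old entry is still stored but can never be hit.
on_dataset_changed(invalidate=False)
stale_hit = cache_key("ripley_l", ("radius", 50)) in shared["cache"]  # False
stored = len(shared["cache"])  # 1: the stale entry still occupies memory

# Bumping again with the explicit clear reclaims that memory right away.
on_dataset_changed(invalidate=True)
```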

Matplotlib figure serialization

Before caching, static matplotlib figures are serialized to PNG bytes with fig_to_png_bytes() (utils/plot_utils.py) and are reconstructed with png_bytes_to_figure() when served. This prevents layout mutations (margins, label sizes) from accumulating across repeated Shiny renders of the same figure object.
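One way such a PNG round trip could be written with standard matplotlib calls is sketched below. It is not the PR's utils/plot_utils.py: the dpi parameter and the choice to close the original figure after saving are assumptions. Because every render gets a fresh figure rebuilt from immutable bytes, nothing Shiny changes on the served figure can leak back into the cached copy.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend, as on a server
import matplotlib.image as mpimg
import matplotlib.pyplot as plt
from matplotlib.figure import Figure


def fig_to_png_bytes(fig: Figure, dpi: int = 100) -> bytes:
    """Render a figure to PNG bytes (sketch; dpi default is an assumption)."""
    buf = io.BytesIO()
    fig.savefig(buf, format="png", dpi=dpi)
    plt.close(fig)  # assumption: only the bytes are kept, so release the figure
    return buf.getvalue()


def png_bytes_to_figure(data: bytes, dpi: int = 100) -> Figure:
    """Rebuild a new figure that displays the cached PNG image."""
    img = mpimg.imread(io.BytesIO(data), format="png")
    height, width = img.shape[:2]
    # Size the figure so one image pixel maps to one output pixel at this dpi.
    fig = plt.figure(figsize=(width / dpi, height / dpi), dpi=dpi)
    ax = fig.add_axes([0, 0, 1, 1])  # axes fill the whole canvas
    ax.imshow(img)
    ax.axis("off")
    return fig
```

The trade-off is that the served figure is a raster image: it can no longer be re-laid out or scaled without resampling, which is acceptable here because the goal is to display the same plot again.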

Files changed

| File | Change |
|---|---|
| utils/cache_manager.py | New: VisualizationCache and normalize_params |
| utils/plot_utils.py | New helpers: fig_to_png_bytes, png_bytes_to_figure |
| server/data_input_server.py | Increments dataset_version and calls cache.invalidate() on data load |
| app.py | Initializes shared['cache'] and shared['dataset_version'] per session |
| All visualization server modules | Refactored to use the cache.get_or_compute pattern |

Testing

When a plot is requested again, the Docker logs show no new plot generation, confirming that it is served from the cache. The logs were otherwise stable, with no errors observed while generating plots. A large test .pickle file (5M cells) was generated for testing; on it, boxplots that take roughly 10 seconds to compute and render are re-rendered instantly on a cache hit. (Demo)

heaven-howard marked this pull request as ready for review March 31, 2026 19:52