
feat(decoders): add ajpegli benchmark support#38

Merged: ternaus merged 1 commit into main from codex/add-ajpegli-decoder on May 20, 2026

@ternaus (Owner) commented May 20, 2026

Adds the ajpegli decoder backend and dependency wiring, includes the five-platform ajpegli result matrix, refreshes README plots/tables, and updates paper asset validation for the expanded scope of 13 single-thread and 11 DataLoader decoders.

Summary by Sourcery

Add the ajpegli JPEG decoder to the benchmark suite and update tooling, benchmarks, and documentation to include it in the five-platform paper matrix.

New Features:

  • Introduce an AJpegliDecoder backend wired into the decoder entry-point registry and optional mainstream dependency group.
  • Allow GCP benchmark runs to target specific decoders via a new --libs option propagated through run.sh, VM metadata, and vm_startup.sh.

Enhancements:

  • Refresh benchmark outputs, README tables/metadata, and narrative guidance to cover ajpegli alongside existing decoders and clarify wording around strict/native JPEG decoders.
  • Extend paper asset generation and robustness reporting to treat ajpegli as a strict native decoder that skips one uncommon ImageNet JPEG, and derive the AMD worker-delta plot's x-axis limits from the data.
  • Document arXiv-specific paper asset generation and clarify GCP docs with generic bucket placeholders and single-decoder run examples.

Tests:

  • Update paper data validation expectations and decoder registry tests to account for ajpegli, and wire ajpegli into test skipping via package-to-module mapping.

Copilot AI review requested due to automatic review settings May 20, 2026 11:29
@sourcery-ai (Contributor) commented May 20, 2026

Reviewer's Guide

Adds an ajpegli decoder backend and wires it into the benchmark/plotting pipeline, enables selecting specific decoders (e.g., ajpegli) in GCP benchmark runs, and refreshes documentation, figures, and tests to reflect the expanded 13-decoder, 5-platform scope.

Sequence diagram for selecting ajpegli in GCP benchmark run

sequenceDiagram
    actor User
    participant run_sh as gcp_run_sh
    participant GCE as gcp_instances
    participant VM as vm_startup_sh
    participant CLI as imread_benchmark_run
    participant AJ as AJpegliDecoder
    participant Lib as ajpegli

    User->>run_sh: ./gcp/run.sh --libs ajpegli
    run_sh->>GCE: instances create ... metadata libs=ajpegli
    GCE-->>VM: start VM with metadata
    VM->>VM: LIBS=$(meta libs)
    VM->>CLI: imread-benchmark run --libs "$LIBS"
    CLI->>AJ: load entry point ajpegli
    loop for each image
        CLI->>AJ: decode(data)
        AJ->>Lib: imdecode(data, mode=RGB)
        Lib-->>AJ: np.ndarray
        AJ-->>CLI: np.ndarray
    end

File-Level Changes

Introduce ajpegli decoder backend and register it in the benchmark system.
  • Add AJpegliDecoder implementing BaseDecoder with in-memory and path-based RGB decode using ajpegli
  • Register ajpegli as an optional mainstream dependency and as an imread_benchmark.decoders entry point
  • Extend decoder registry/test expectations and package-to-module mapping to include ajpegli
  • Include ajpegli in ordered decoder lists used by tools
Files:
imread_benchmark/decoders/ajpegli_decoder.py
pyproject.toml
tests/conftest.py
tests/test_decoders.py
tools/_results.py
Wire ajpegli results into benchmark outputs and README tables for all five platforms.
  • Add ajpegli single-thread and DataLoader JSON result files for each platform output directory
  • Update README single-thread and DataLoader summary tables to include ajpegli rows and refresh metadata date
  • Document ajpegli in the decoder catalog and guidance text as a fast native decoder option
Files:
output/linux_AMD-EPYC-9B14/ajpegli_1t_results.json
output/linux_AMD-EPYC-9B14/ajpegli_dataloader_results.json
output/linux_AMD-EPYC-9B45/ajpegli_1t_results.json
output/linux_AMD-EPYC-9B45/ajpegli_dataloader_results.json
output/linux_INTEL(R)-XEON(R)-PLATINUM-8581C-CPU-@-2.30GHz/ajpegli_1t_results.json
output/linux_INTEL(R)-XEON(R)-PLATINUM-8581C-CPU-@-2.30GHz/ajpegli_dataloader_results.json
output/linux_Neoverse-N1/ajpegli_1t_results.json
output/linux_Neoverse-N1/ajpegli_dataloader_results.json
output/linux_Neoverse-V2/ajpegli_1t_results.json
output/linux_Neoverse-V2/ajpegli_dataloader_results.json
output/linux_AMD-EPYC-9B14/run_summary.json
output/linux_AMD-EPYC-9B45/run_summary.json
output/linux_INTEL(R)-XEON(R)-PLATINUM-8581C-CPU-@-2.30GHz/run_summary.json
output/linux_Neoverse-N1/run_summary.json
output/linux_Neoverse-V2/run_summary.json
README.md
Allow the GCP benchmark harness to run a subset of decoders via a new --libs parameter, and document ajpegli-focused runs.
  • Add a LIBS variable (default: all), parse the --libs flag in gcp/run.sh, and include libs in the instance metadata
  • Read the libs metadata in gcp/vm_startup.sh, echo it in the logs, and pass it through to imread-benchmark run instead of the hardcoded all
  • Update GCP docs to use YOUR_BUCKET placeholders, describe --libs usage, and show ajpegli-only run examples
  • Extend options table to document --libs flag
Files:
gcp/run.sh
gcp/vm_startup.sh
docs/gcp_benchmarks.md
Update paper asset generation, robustness accounting, and tests for the expanded decoder scope and ajpegli behavior.
  • Include ajpegli in EXPECTED_SKIP_DECODERS and ROBUSTNESS_DECODERS, and add robustness-table interpretation text for ajpegli
  • Adjust wording in the platform recommendation note and the plotting/statistics guidance to refer to strict native JPEG decoders in general instead of libjpeg-turbo only
  • Refine AMD worker-delta plot to precompute deltas, compute dynamic x-axis limits across platforms, and reuse them for both subplots
  • Add an arXiv-preprint-specific paper_assets invocation example and update test expectations for 13 single-thread and 11 DataLoader decoders and the corresponding row counts
Files:
tools/paper_assets.py
docs/plotting_and_statistics.md
docs/gcp_benchmarks.md
tests/test_paper_assets.py
Refresh dependency lockfile and ancillary metadata for the new configuration.
  • Update uv.lock to capture the new ajpegli dependency and any resolver changes
  • Ensure the docs describe cold-install cost generically across decoder libraries rather than with a hardcoded count of 11
Files:
uv.lock
docs/gcp_benchmarks.md

@sourcery-ai (Contributor) left a review

Hey, I've left some high-level feedback:

  • In plot_fig02_amd_worker_delta, finite_deltas can be empty if all library/platform combinations are missing or NaN, which will cause min()/max() to raise; consider guarding for the empty case and falling back to the previous static x-limits.
  • In AJpegliDecoder, you import ajpegli inside both decode and decode_path; consider importing once at module scope (or caching the import) to avoid repeated import overhead in tight loops.
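Python caches imported modules in `sys.modules`, so a function-level `import ajpegli` costs a cache lookup and a name binding per call rather than a real re-import; the suggestion can still be followed without giving up lazy loading. A minimal sketch, assuming the backend is importable as `ajpegli`; `load_backend` and `LazyBackendDecoder` are hypothetical names, not the PR's `AJpegliDecoder` or its `BaseDecoder`, and no ajpegli decode call is shown because its exact signature is not on this page.

```python
import functools
import importlib
from types import ModuleType


@functools.cache
def load_backend(module_name: str) -> ModuleType:
    """Import a decoder backend once and reuse the same module object.

    After the first call, the module comes straight from the functools cache,
    so tight decode loops skip the import statement entirely.
    """
    return importlib.import_module(module_name)


class LazyBackendDecoder:
    """Hypothetical stand-in for a decoder class; not the PR's AJpegliDecoder."""

    module_name: str = "ajpegli"  # assumption: the importable module name

    @property
    def backend(self) -> ModuleType:
        # Import is deferred until first use, then cached.
        return load_backend(self.module_name)
```

Keeping the import lazy, rather than at module scope, lets the decoder module be imported even when the optional dependency is missing, while the cache keeps per-call overhead small.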


@chatgpt-codex-connector (bot) left a review

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 862ee8a031


Comment thread gcp/run.sh
--boot-disk-size=60GB \
--boot-disk-type="$BOOT_DISK_TYPE" \
- --metadata="results-bucket=$RUN_GCS,imagenet-bucket=$IMAGENET_BUCKET,num-images=$NUM_IMAGES,num-runs=$NUM_RUNS,dl-runs=$DL_RUNS,workers=$WORKERS,repo-tarball=$RUN_GCS/repo.tar.gz,cache-bucket=$CACHE_BUCKET,force-rebuild=$FORCE_REBUILD,keep-on-failure=$([[ $KEEP_ON_FAILURE == true ]] && echo 1 || echo 0)" \
+ --metadata="results-bucket=$RUN_GCS,imagenet-bucket=$IMAGENET_BUCKET,libs=$LIBS,num-images=$NUM_IMAGES,num-runs=$NUM_RUNS,dl-runs=$DL_RUNS,workers=$WORKERS,repo-tarball=$RUN_GCS/repo.tar.gz,cache-bucket=$CACHE_BUCKET,force-rebuild=$FORCE_REBUILD,keep-on-failure=$([[ $KEEP_ON_FAILURE == true ]] && echo 1 || echo 0)" \

P1: Encode --libs metadata before passing comma-separated values

--libs is documented and parsed as a comma-separated list, but this value is interpolated directly into gcloud ... --metadata, whose syntax is KEY=VALUE,KEY=VALUE,.... When a user passes multiple decoders (for example --libs opencv,pillow), the comma inside $LIBS is treated as a metadata separator, so the second token is no longer part of libs and can make instance creation fail (or at best corrupt the metadata payload). This breaks the new multi-library selection path introduced by this commit.
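gcloud lets dictionary-style flags switch their item separator with a `^DELIM^` prefix (see `gcloud topic escaping`), e.g. `--metadata="^;^libs=$LIBS;num-images=$NUM_IMAGES;..."`, which keeps `libs=opencv,pillow` as one value. The Python sketch below shows why the plain comma join breaks and how picking an unused delimiter round-trips. `parse_metadata_flag` and `build_metadata_flag` are hypothetical helpers that only approximate gcloud's parsing; replacing commas in LIBS with another character and converting back in vm_startup.sh would be an alternative fix.

```python
# Delimiters tried in order; ":" is avoided because gs:// URLs contain it.
CANDIDATE_DELIMITERS = (";", "|", "~", "#")


def parse_metadata_flag(flag_value: str) -> dict[str, str]:
    """Approximate gcloud's KEY=VALUE dict parsing, including ^DELIM^ escaping."""
    delimiter = ","
    if flag_value.startswith("^"):
        end = flag_value.index("^", 1)
        delimiter, flag_value = flag_value[1:end], flag_value[end + 1:]
    parsed: dict[str, str] = {}
    for item in flag_value.split(delimiter):
        key, sep, value = item.partition("=")
        if not sep:
            # A value containing the delimiter was split into a bare fragment.
            raise ValueError(f"malformed metadata item: {item!r}")
        parsed[key] = value
    return parsed


def build_metadata_flag(metadata: dict[str, str]) -> str:
    """Join KEY=VALUE pairs, switching to ^DELIM^ syntax if any item has a comma."""
    items = [f"{key}={value}" for key, value in metadata.items()]
    if not any("," in item for item in items):
        return ",".join(items)
    for delimiter in CANDIDATE_DELIMITERS:
        if not any(delimiter in item for item in items):
            return f"^{delimiter}^" + delimiter.join(items)
    raise ValueError("no safe delimiter available for metadata values")
```

With `libs=opencv,pillow`, the naive comma join leaves a bare `pillow` fragment, while `build_metadata_flag` emits a `^;^`-prefixed string whose `libs` value survives intact.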


Copilot AI left a review

Pull request overview

Adds an ajpegli decoder backend to the benchmark suite and wires it into dependency management, result loading, paper-asset generation, and the GCP benchmarking workflow so ajpegli can be benchmarked and included in published tables/figures.

Changes:

  • Introduce AJpegliDecoder and register it via pyproject.toml entry points; add ajpegli to the mainstream optional dependency set (and lockfile).
  • Extend results/paper tooling to include ajpegli in library ordering/scope checks; refresh README tables and narrative to include ajpegli.
  • Add GCP --libs plumbing so runs can target a subset of decoders (e.g., only ajpegli) and commit new ajpegli result JSONs for the 5-platform matrix.

Reviewed changes

Copilot reviewed 27 out of 32 changed files in this pull request and generated 6 comments.

Per-file summary:
  • uv.lock: Adds the ajpegli locked dependency and wires it into the mainstream extra metadata.
  • pyproject.toml: Adds ajpegli to mainstream extras and registers the new decoder via entry points.
  • imread_benchmark/decoders/ajpegli_decoder.py: New decoder implementation using ajpegli.imread/ajpegli.imdecode.
  • tools/_results.py: Adds ajpegli to the canonical library ordering used by loaders/plots/README rendering.
  • tools/paper_assets.py: Updates paper-scope validation/robustness notes and makes Fig02 x-limits dynamic.
  • tests/test_decoders.py: Extends registry expectations to include ajpegli.
  • tests/conftest.py: Adds ajpegli distribution→module mapping for test skip behavior.
  • tests/test_paper_assets.py: Updates expected decoder counts/row counts for the expanded paper scope.
  • README.md: Updates benchmark tables/metadata date and documentation sections to include ajpegli.
  • gcp/run.sh: Adds --libs flag support and passes it via instance metadata.
  • gcp/vm_startup.sh: Reads libs metadata and passes it through to imread-benchmark run --libs.
  • docs/plotting_and_statistics.md: Adjusts wording to cover strict native decoders more generally (incl. ajpegli).
  • docs/gcp_benchmarks.md: Updates examples/placeholders and documents --libs for single-decoder runs.
  • output/linux_AMD-EPYC-9B14/run_summary.json: Updates run summary timestamp and recorded requested/run libs.
  • output/linux_AMD-EPYC-9B14/ajpegli_1t_results.json: Adds ajpegli 1-thread benchmark results for this platform.
  • output/linux_AMD-EPYC-9B14/ajpegli_dataloader_results.json: Adds ajpegli DataLoader benchmark results for this platform.
  • output/linux_AMD-EPYC-9B45/run_summary.json: Updates run summary timestamp and recorded requested/run libs.
  • output/linux_AMD-EPYC-9B45/ajpegli_1t_results.json: Adds ajpegli 1-thread benchmark results for this platform.
  • output/linux_AMD-EPYC-9B45/ajpegli_dataloader_results.json: Adds ajpegli DataLoader benchmark results for this platform.
  • output/linux_INTEL(R)-XEON(R)-PLATINUM-8581C-CPU-@-2.30GHz/run_summary.json: Updates run summary timestamp and recorded requested/run libs.
  • output/linux_INTEL(R)-XEON(R)-PLATINUM-8581C-CPU-@-2.30GHz/ajpegli_1t_results.json: Adds ajpegli 1-thread benchmark results for this platform.
  • output/linux_INTEL(R)-XEON(R)-PLATINUM-8581C-CPU-@-2.30GHz/ajpegli_dataloader_results.json: Adds ajpegli DataLoader benchmark results for this platform.
  • output/linux_Neoverse-V2/run_summary.json: Updates run summary timestamp and recorded requested/run libs.
  • output/linux_Neoverse-V2/ajpegli_1t_results.json: Adds ajpegli 1-thread benchmark results for this platform.
  • output/linux_Neoverse-V2/ajpegli_dataloader_results.json: Adds ajpegli DataLoader benchmark results for this platform.
  • output/linux_Neoverse-N1/run_summary.json: Updates run summary timestamp and recorded requested/run libs.
  • output/linux_Neoverse-N1/ajpegli_1t_results.json: Adds ajpegli 1-thread benchmark results for this platform.
  • output/linux_Neoverse-N1/ajpegli_dataloader_results.json: Adds ajpegli DataLoader benchmark results for this platform.


Comment thread tools/paper_assets.py
Comment on lines +692 to +695
finite_deltas = [d for deltas in delta_by_platform.values() for d in deltas if not pd.isna(d)]
x_min = min(-18.0, min(finite_deltas) - 5.0)
x_max = max(32.0, max(finite_deltas) + 8.0)

Comment on lines 14 to 20
"libs_requested": [
"imagecodecs",
"imageio",
"jpeg4py",
"kornia",
"opencv",
"pillow",
"pyvips",
"simplejpeg",
"skimage",
"tensorflow",
"torchvision",
"turbojpeg"
"ajpegli"
],
"libs_skipped_platform": [],
"libs_run": [
"imagecodecs",
"imageio",
"jpeg4py",
"kornia",
"opencv",
"pillow",
"pyvips",
"simplejpeg",
"skimage",
"tensorflow",
"torchvision",
"turbojpeg"
"ajpegli"
],
Comment on lines 14 to 20
"libs_requested": [
"imagecodecs",
"imageio",
"jpeg4py",
"kornia",
"opencv",
"pillow",
"pyvips",
"simplejpeg",
"skimage",
"tensorflow",
"torchvision",
"turbojpeg"
"ajpegli"
],
"libs_skipped_platform": [],
"libs_run": [
"imagecodecs",
"imageio",
"jpeg4py",
"kornia",
"opencv",
"pillow",
"pyvips",
"simplejpeg",
"skimage",
"tensorflow",
"torchvision",
"turbojpeg"
"ajpegli"
],
Comment on lines 14 to 20
"libs_requested": [
"imagecodecs",
"imageio",
"jpeg4py",
"kornia",
"opencv",
"pillow",
"pyvips",
"simplejpeg",
"skimage",
"tensorflow",
"torchvision",
"turbojpeg"
"ajpegli"
],
"libs_skipped_platform": [],
"libs_run": [
"imagecodecs",
"imageio",
"jpeg4py",
"kornia",
"opencv",
"pillow",
"pyvips",
"simplejpeg",
"skimage",
"tensorflow",
"torchvision",
"turbojpeg"
"ajpegli"
],
Comment on lines 14 to 20
"libs_requested": [
"imagecodecs",
"imageio",
"jpeg4py",
"kornia",
"opencv",
"pillow",
"pyvips",
"simplejpeg",
"skimage",
"tensorflow",
"torchvision",
"turbojpeg"
"ajpegli"
],
"libs_skipped_platform": [],
"libs_run": [
"imagecodecs",
"imageio",
"jpeg4py",
"kornia",
"opencv",
"pillow",
"pyvips",
"simplejpeg",
"skimage",
"tensorflow",
"torchvision",
"turbojpeg"
"ajpegli"
],
Comment on lines 14 to 20
"libs_requested": [
"imagecodecs",
"imageio",
"jpeg4py",
"kornia",
"opencv",
"pillow",
"pyvips",
"simplejpeg",
"skimage",
"tensorflow",
"torchvision",
"turbojpeg"
"ajpegli"
],
"libs_skipped_platform": [],
"libs_run": [
"imagecodecs",
"imageio",
"jpeg4py",
"kornia",
"opencv",
"pillow",
"pyvips",
"simplejpeg",
"skimage",
"tensorflow",
"torchvision",
"turbojpeg"
"ajpegli"
],
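As captured in these threads, each `libs_requested`/`libs_run` snippet has no comma between `"turbojpeg"` and `"ajpegli"`, which is not valid JSON; the scrape may have dropped the diff's +/- lines, so the merged files may differ. A small Python check, sketched under that caveat: it shows the quoted form failing to parse and a comma-restored list carrying the 13 single-thread decoders named in this PR. `is_valid_json` and `check_libs` are hypothetical helpers.

```python
import json

# Reconstructed from the quoted hunk, with the comma after "turbojpeg" restored.
SNIPPET_FIXED = """
{
  "libs_requested": ["imagecodecs", "imageio", "jpeg4py", "kornia", "opencv",
                     "pillow", "pyvips", "simplejpeg", "skimage", "tensorflow",
                     "torchvision", "turbojpeg", "ajpegli"],
  "libs_skipped_platform": []
}
"""
# The tail of the list exactly as quoted: no comma between the last two names.
SNIPPET_AS_QUOTED = '{"libs_run": ["turbojpeg"\n "ajpegli"]}'


def is_valid_json(text: str) -> bool:
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


def check_libs(text: str, key: str, expected_count: int) -> list[str]:
    """Parse a run summary and require exactly expected_count unique names."""
    libs = json.loads(text)[key]
    if len(libs) != expected_count or len(set(libs)) != len(libs):
        raise ValueError(f"{key}: expected {expected_count} unique names, got {libs}")
    return libs
```

Pointed at each `output/<platform>/run_summary.json`, a check like `check_libs` would flag a missing comma or a dropped decoder before paper assets are regenerated.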
@ternaus ternaus merged commit f869db8 into main May 20, 2026
10 checks passed
@ternaus ternaus deleted the codex/add-ajpegli-decoder branch May 20, 2026 11:45