Skip to content

Codex/update arxiv preprint citation#35

Merged
ternaus merged 2 commits into
mainfrom
codex/update-arxiv-preprint-citation
May 12, 2026
Merged

Codex/update arxiv preprint citation#35
ternaus merged 2 commits into
mainfrom
codex/update-arxiv-preprint-citation

Conversation

@ternaus
Copy link
Copy Markdown
Owner

@ternaus ternaus commented May 12, 2026

Summary by Sourcery

Add a new LaTeX platform recommendation table for zero-skip DataLoader choices, ensure generated plots use embeddable fonts, and update the README to reference the new arXiv preprint and citation.

New Features:

  • Generate a per-platform zero-skip DataLoader recommendation LaTeX table, including integration into the table generation workflow.

Enhancements:

  • Configure matplotlib PDF/PS font types for better compatibility with LaTeX and vector outputs.

Documentation:

  • Update README with the current preprint link and BibTeX citation, and clarify citation guidance.

Tests:

  • Add coverage to verify the platform recommendation table contents, formatting, and decoder inclusion/exclusion rules.

Copilot AI review requested due to automatic review settings May 12, 2026 07:21
@sourcery-ai
Copy link
Copy Markdown
Contributor

sourcery-ai Bot commented May 12, 2026

Reviewer's Guide

Adds a new LaTeX platform recommendation table generator (including escaping helpers and deterministic selection logic), configures matplotlib font types for vector output, and updates the README citation to the new arXiv preprint, with tests covering the new table output expectations.

File-Level Changes

Change Details Files
Configure matplotlib to use TrueType fonts in generated PDF/PS assets.
  • Import the top-level matplotlib module alongside pyplot, seaborn, and Patch in the plotting helper.
  • Set rcParams pdf.fonttype and ps.fonttype to 42 after successful imports to ensure text is embedded as TrueType fonts.
tools/paper_assets.py
Add LaTeX-safe string escaping and a new platform recommendation LaTeX table generator for zero-skip DataLoader choices.
  • Introduce a _latex_escape helper that replaces LaTeX-reserved characters with escaped equivalents.
  • Add helpers to format per-platform DataLoader choices and build table rows using paper-scoped, peak-throughput data and filtered zero-skip decoders.
  • Generate a LaTeX tabularx table with caption, label, and notes describing per-platform zero-skip recommendations, and write it to a .tex file.
tools/paper_assets.py
Integrate the new platform recommendation table into the table generation pipeline and documentation.
  • Call generate_platform_recommendation_table from generate_tables to produce table07_platform_recommendations.tex alongside existing tables.
  • Update the generated tables README section to list the new .tex table output.
tools/paper_assets.py
Add tests to validate the new platform recommendation table content and structure.
  • Import generate_platform_recommendation_table into the test module.
  • Verify that the generated LaTeX file has one data row per platform, each with exactly three img/s entries, and excludes skip decoders, pyvips, and tensorflow.
  • Assert that a set of expected formatted choices (decoder, throughput, workers) appear in the generated table for regression coverage.
tests/test_paper_assets.py
Update README to reference the new arXiv preprint and citation details.
  • Add a direct preprint link near the top of the README in the results description.
  • Replace the old BibTeX entry with a new one for the updated preprint, including new title, year, identifier, primary class, and URL.
README.md

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

Copy link
Copy Markdown
Contributor

@sourcery-ai sourcery-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 1 issue, and left some high level feedback:

  • The test_platform_recommendation_table_has_three_zero_skip_choices_per_platform test hard-codes exact throughput values and worker counts, which will make it very brittle to any change in source data; consider asserting structural properties (e.g., counts, ordering, presence/absence of libraries) rather than exact numeric strings.
  • The LaTeX table assembly in generate_platform_recommendation_table is largely built from inline raw string fragments; consider factoring the common LaTeX boilerplate or format patterns into small helpers/constants to reduce duplication and make future layout changes easier.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The `test_platform_recommendation_table_has_three_zero_skip_choices_per_platform` test hard-codes exact throughput values and worker counts, which will make it very brittle to any change in source data; consider asserting structural properties (e.g., counts, ordering, presence/absence of libraries) rather than exact numeric strings.
- The LaTeX table assembly in `generate_platform_recommendation_table` is largely built from inline raw string fragments; consider factoring the common LaTeX boilerplate or format patterns into small helpers/constants to reduce duplication and make future layout changes easier.

## Individual Comments

### Comment 1
<location path="tests/test_paper_assets.py" line_range="141-150" />
<code_context>
         assert expected_prefix in matching_line
+
+
+def test_platform_recommendation_table_has_three_zero_skip_choices_per_platform(tmp_path: Path) -> None:
+    root = _repo_output()
+    dest = tmp_path / "table07_platform_recommendations.tex"
+    generate_platform_recommendation_table(load_dataloader_results(root), dest)
+
+    text = dest.read_text(encoding="utf-8")
+    data_rows = [line for line in text.splitlines() if "GCP \\texttt" in line]
+    assert len(data_rows) == len(PAPER_PLATFORMS)
+    assert all(row.count("img/s") == 3 for row in data_rows)
+
+    for decoder in EXPECTED_SKIP_DECODERS:
+        assert f"\\texttt{{{decoder}}}:" not in text
+    assert r"\texttt{pyvips}:" not in text
+    assert r"\texttt{tensorflow}:" not in text
+
+    expected_choices = [
</code_context>
<issue_to_address>
**issue (testing):** Missing negative-path test for the case where a platform has fewer than three zero-skip DataLoader choices.

The new test only exercises the happy path with real benchmark data and doesn’t verify the error branch in `_platform_recommendation_rows` that raises `ValueError` when a platform has fewer than three zero-skip choices.

Please add a small synthetic test that builds a minimal `dl` where one platform has only 0–2 zero-skip choices (or temporarily patches `EXPECTED_DATALOADER_DECODERS` / `PAPER_PLATFORMS`), and asserts that `_platform_recommendation_rows` or `generate_platform_recommendation_table` raises `ValueError` with the expected message. This will lock in the failure mode and protect future refactors.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment on lines +141 to +150
def test_platform_recommendation_table_has_three_zero_skip_choices_per_platform(tmp_path: Path) -> None:
root = _repo_output()
dest = tmp_path / "table07_platform_recommendations.tex"
generate_platform_recommendation_table(load_dataloader_results(root), dest)

text = dest.read_text(encoding="utf-8")
data_rows = [line for line in text.splitlines() if "GCP \\texttt" in line]
assert len(data_rows) == len(PAPER_PLATFORMS)
assert all(row.count("img/s") == 3 for row in data_rows)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (testing): Missing negative-path test for the case where a platform has fewer than three zero-skip DataLoader choices.

The new test only exercises the happy path with real benchmark data and doesn’t verify the error branch in _platform_recommendation_rows that raises ValueError when a platform has fewer than three zero-skip choices.

Please add a small synthetic test that builds a minimal dl where one platform has only 0–2 zero-skip choices (or temporarily patches EXPECTED_DATALOADER_DECODERS / PAPER_PLATFORMS), and asserts that _platform_recommendation_rows or generate_platform_recommendation_table raises ValueError with the expected message. This will lock in the failure mode and protect future refactors.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Updates the repository’s publication-facing assets to reflect the new arXiv preprint, and extends the paper asset generator to emit an additional per-platform “starting points” LaTeX table (with a validating test).

Changes:

  • Add a new LaTeX table generator for per-platform zero-skip DataLoader recommendations, including LaTeX escaping utilities.
  • Add a regression test asserting three zero-skip choices per platform and expected selections from the committed output/ dataset.
  • Update README with the new preprint link and refreshed BibTeX citation; set Matplotlib font embedding defaults for PDF/PS outputs.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

File Description
tools/paper_assets.py Adds LaTeX escaping + a new platform recommendation table generator; sets Matplotlib rcParams for font embedding.
tests/test_paper_assets.py Adds a test validating the generated platform recommendation LaTeX table contents.
README.md Updates the preprint link and BibTeX citation to the new arXiv preprint.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@ternaus ternaus merged commit 0cd6d69 into main May 12, 2026
10 checks passed
@ternaus ternaus deleted the codex/update-arxiv-preprint-citation branch May 12, 2026 07:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants