Skip to content

PR 6/7: Benchmarking infrastructure for scipy vs Pyomo#11

Merged
bernalde merged 15 commits into
pr/pyomo-optimizersfrom
pr/benchmarks
May 31, 2026
Merged

PR 6/7: Benchmarking infrastructure for scipy vs Pyomo#11
bernalde merged 15 commits into
pr/pyomo-optimizersfrom
pr/benchmarks

Conversation

@bernalde

@bernalde bernalde commented Mar 3, 2026

Copy link
Copy Markdown
Member

Summary

Systematic comparison framework for scipy vs Pyomo optimizers. CLI tool (grid_cli.py) generates parameter sweeps across product resistance, heat transfer coefficients, and operating conditions. Compares objective values, solve times, and constraint satisfaction. Includes Jupyter notebook for visualization and analysis.

PR 6 of 7 in the Pyomo integration series.

Changes

  • benchmarks/adapters.py — Scipy/Pyomo adapter layer
  • benchmarks/grid_cli.py — CLI for running benchmark grids
  • benchmarks/scenarios.py — Predefined test scenarios
  • benchmarks/schema.py — Result schema definition
  • benchmarks/validate.py — Result validation
  • benchmarks/grid_analysis.ipynb — Analysis notebook
  • benchmarks/README.md — Benchmark documentation
  • benchmarks/results/ — Reference results and .gitignore

Usage

# Run default benchmark
python benchmarks/grid_cli.py

# Run specific scenario
python benchmarks/grid_cli.py --scenario sucrose_5pct

PR Chain

# PR Status
0 Sync upstream (#5) Merged
1 CI/CD for Pyomo (#6) Open
2 Pyomo dependencies (#7) Open
3 Utils & single-step (#8) Open
4 Multi-period model (#9) Open
5 Optimizer functions (#10) Open
6 Benchmarks (this PR)
7 Docs & examples Pending

Testing

No new pytest tests — benchmarks are standalone scripts. Results directory is gitignored.

@review-notebook-app

Copy link
Copy Markdown

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@bernalde bernalde force-pushed the pr/pyomo-optimizers branch 2 times, most recently from 48a565d to bd79668 Compare March 3, 2026 01:09
@bernalde bernalde force-pushed the pr/benchmarks branch 3 times, most recently from 76a65e5 to fa20d81 Compare March 3, 2026 18:07
@bernalde bernalde force-pushed the pr/pyomo-optimizers branch from 4367219 to f5b43a7 Compare March 3, 2026 18:07
@bernalde bernalde force-pushed the pr/pyomo-optimizers branch from f5b43a7 to ad70a69 Compare April 1, 2026 20:56
@bernalde bernalde force-pushed the pr/pyomo-optimizers branch from ad70a69 to f2ae241 Compare April 1, 2026 21:36
@bernalde bernalde force-pushed the pr/benchmarks branch 2 times, most recently from df516f5 to d2c8ede Compare April 2, 2026 22:19
@bernalde bernalde force-pushed the pr/pyomo-optimizers branch from f2ae241 to c20f8aa Compare April 2, 2026 22:19
@bernalde bernalde force-pushed the pr/pyomo-optimizers branch from c20f8aa to 37d3c04 Compare May 7, 2026 21:42
@bernalde bernalde force-pushed the pr/pyomo-optimizers branch from 37d3c04 to b2fd8fc Compare May 7, 2026 22:27
@bernalde bernalde force-pushed the pr/benchmarks branch 2 times, most recently from 457cd46 to 885a54d Compare May 7, 2026 22:39

@bernalde bernalde left a comment

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checked the PR #7 follow-up comments against this stacked PR. One copyright-header issue carries forward.

Comment thread benchmarks/adapters.py
Comment thread benchmarks/validate.py
Comment thread benchmarks/adapters.py Outdated
@bernalde bernalde force-pushed the pr/pyomo-optimizers branch from e68f7be to d363ddd Compare May 8, 2026 04:10
Comment thread benchmarks/adapters.py Outdated

@bernalde bernalde left a comment

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review summary:

Blocking issues:

  • Documented CLI invocation fails in script mode.
  • Benchmark metrics serialize dryness_target_met as a JSON string instead of a boolean.
  • Tracked reference result files include full trajectory data and add about 27 MB of generated data, contrary to the stated policy.
  • The analysis notebook defaults to a missing, gitignored input file.

Nonblocking issues:

  • README documents Make shortcuts that do not exist.
  • The CLI eagerly requires optional Pyomo dependencies even for help/scipy-only use.
  • hash.inputs is documented but not emitted by generated records.

Questions: None.

Tests run and outcomes:

  • python benchmarks/grid_cli.py --help failed with ModuleNotFoundError: No module named 'benchmarks'.
  • python -m benchmarks.grid_cli --help passed.
  • python -m benchmarks.grid_cli generate --task Tsh --scenario baseline --vary product.A1=16 --vary ht.KC=2.75e-4 --methods scipy --out /tmp/lyopronto-pr11-scipy-smoke.jsonl --force passed and confirmed the serialized metric type issue.
  • python -m ruff check benchmarks passed.
  • python -m pytest tests/ -n auto -v -m "not notebook and not pyomo" --ignore=tests/test_pyomo_models passed: 237 passed in 577.15s. An initial run from a hyphenated temporary worktree failed during collection because this repo has a top-level __init__.py with a relative import; rerunning from /tmp/LyoPRONTO_pr11_review avoided that path artifact.

I would not merge this until the blocking issues above are addressed.

Comment thread benchmarks/grid_cli.py Outdated
Comment thread benchmarks/validate.py Outdated
Comment thread benchmarks/results/README.md Outdated
Comment thread benchmarks/README.md
Comment thread benchmarks/README.md Outdated
Comment thread benchmarks/adapters.py Outdated
Comment thread benchmarks/schema.py
@bernalde

bernalde commented May 8, 2026

Copy link
Copy Markdown
Member Author

ReviewNB bot comment: Not addressed; this was an informational notebook-diff link and did not request a repository change.

@bernalde bernalde left a comment

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review summary:

Blocking issues:

  • The updated PR fixes the earlier review comments, but the new benchmark test module breaks the GitHub doctests workflow during collection because benchmarks is not importable in the installed CI environment.

Nonblocking issues: None.

Questions: None.

Tests run and outcomes:

  • python -m ruff check benchmarks tests/test_benchmarks.py passed.
  • python -m pytest tests/test_benchmarks.py -v passed: 4 passed.
  • python -c 'import json; json.load(open("benchmarks/grid_analysis.ipynb")); print("notebook json ok")' passed.
  • python benchmarks/grid_cli.py --help passed.
  • python -m benchmarks.grid_cli --help passed.
  • python benchmarks/grid_cli.py generate --task Tsh --scenario baseline --vary product.A1=16 --vary ht.KC=2.75e-4 --methods scipy --out /tmp/lyopronto-pr11-review-scipy.jsonl --force passed and produced native JSON booleans, hash.inputs, no trajectory payload, and product_temp_ok=true.
  • python benchmarks/grid_cli.py generate --task Tsh --scenario baseline --vary product.A1=16 --vary ht.KC=2.75e-4 --methods fd --n-elements 4 --out /tmp/lyopronto-pr11-review-fd.jsonl --force passed and produced native JSON booleans, hash.inputs, no trajectory payload, and product_temp_ok=true.
  • python -m pytest tests/ -n auto -v -m "not notebook and not pyomo" --ignore=tests/test_pyomo_models passed: 241 passed.
  • python -m pytest tests/ -n auto -v -m "notebook" --ignore=tests/test_pyomo_models --cov=lyopronto --cov-report=xml:/tmp/pr11-notebook-coverage.xml --cov-report=term-missing could not complete locally because this sandbox disallows Jupyter kernel socket creation (PermissionError: [Errno 1] Operation not permitted). The GitHub doctests check on head 070c68276191fbb6670ab97754f9bcf09e2338ef failed in the same workflow before notebook execution due to ModuleNotFoundError: No module named 'benchmarks' while collecting tests/test_benchmarks.py.

The PR should not be merged as-is. I would not merge this until the blocking issue above is addressed.

I attempted to submit this as REQUEST_CHANGES, but GitHub rejected that state because this account is the PR author, so I am submitting the review as COMMENT.

Comment thread tests/test_benchmarks.py Outdated
@bernalde

bernalde commented May 8, 2026

Copy link
Copy Markdown
Member Author

Pushed commits:

  • 828fd78 (Package benchmark tooling for CI)
  • Earlier review-fix commit already on this branch: 070c682 (Address benchmark review feedback)

Main changes made:

  • Added benchmarks/__init__.py so benchmark modules are an importable package.
  • Updated [tool.setuptools.packages.find] to include benchmarks*, matching the CI installed-package environment.
  • Verified the built wheel contains benchmarks/__init__.py, benchmarks/grid_cli.py, and benchmarks/adapters.py.

Tests run:

  • python -m ruff check benchmarks tests/test_benchmarks.py: passed.
  • python -m pytest tests/test_benchmarks.py -v: passed, 4 passed.
  • python -m pytest tests/ -n auto -v -m "notebook" --ignore=tests/test_pyomo_models --collect-only: passed collection, 2 selected and 241 deselected.
  • python -c 'from setuptools import find_packages; print(find_packages(include=["lyopronto*", "benchmarks*"]))': confirmed benchmarks is discovered.
  • python -m pip wheel . --no-deps --no-build-isolation -w /tmp/pr11-wheel-check: passed.
  • Wheel inspection confirmed benchmark package files are included.
  • python -m pytest tests/ -n auto -v -m "not notebook and not pyomo" --ignore=tests/test_pyomo_models: passed, 241 passed.

Comments intentionally not addressed:

  • The ReviewNB bot comment is informational and does not request a repository change.
  • Older unresolved review threads were already addressed by 070c682; no additional code change was needed for those threads in this pass.

Remaining risks or follow-up:

  • Full local notebook execution could not be completed in this sandbox because Jupyter kernel startup needs socket creation and fails with PermissionError: [Errno 1] Operation not permitted. The relevant CI collection failure was reproduced through GitHub logs and covered locally with notebook-marker collection plus wheel/package verification. The GitHub doctests rerun for 828fd78 is pending.

bernalde added 13 commits May 30, 2026 18:10
This PR adds the benchmarking infrastructure for comparing scipy and Pyomo solvers:

benchmarks/ directory contents:
- adapters.py: Adapter classes for scipy and Pyomo solvers
- grid_cli.py: CLI tool for running grid comparison benchmarks
- scenarios.py: Test scenario definitions (standard, high resistance, etc.)
- schema.py: Result data schema for consistent output
- validate.py: Validation utilities for comparing results
- grid_analysis.ipynb: Jupyter notebook for analyzing benchmark results
- README.md: Documentation for running benchmarks

benchmarks/results/:
- .gitignore: Ignore large result files
- README.md: Documentation for result files
- archive/: Historical benchmark results
- baseline_*.jsonl: Baseline comparison results

Usage:
  python -m benchmarks.grid_cli --scenario baseline_Tsh --grid 3x3
  python -m benchmarks.grid_cli --scenario baseline_Pch --grid 5x5 --solver pyomo

This enables systematic comparison of scipy and Pyomo optimizer performance
across different scenarios and parameter grids.
…chmarks

# Conflicts:
#	lyopronto/pyomo_models/optimizers.py

@bernalde bernalde left a comment

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Blocking issues: None.
  2. Nonblocking issues:
  • The tracked benchmark summary JSONL files are stale relative to the current metrics schema.
  • The analysis notebook reports 100% success without checking record-level failure flags.
  1. Questions: None.
  2. Tests run and outcomes:
  • Fetched base pr/pyomo-optimizers and head pr/benchmarks; merge-base diff matched GitHub totals: 87 files, 4,813 additions, 275 deletions.
  • python -m ruff check benchmarks tests/test_benchmarks.py: passed.
  • PYTHONPATH=/tmp/LyoPRONTO_pr11_review pytest tests/test_benchmarks.py -v: 17 passed.
  • PYTHONPATH=/tmp/LyoPRONTO_pr11_review pytest tests/ -n auto -v -m "not notebook and not pyomo" --ignore=tests/test_pyomo_models: 257 passed.
  • PYTHONPATH=/tmp/LyoPRONTO_pr11_review pytest tests/ -n auto -v -m "notebook" --ignore=tests/test_pyomo_models: 2 passed after rerunning outside the sandbox socket restriction.
  • PYTHONPATH=/tmp/LyoPRONTO_pr11_review pytest tests/test_pyomo_models/test_optimizer_framework.py -v: 20 passed.
  • CLI help checks and a one-case SciPy grid_cli.py generate smoke run: passed.
  1. Merge-readiness: No blocking issues found. Since I am authenticated as bernalde, the PR author, I am submitting this as COMMENT; any formal approval must come from another maintainer.

@@ -0,0 +1,27 @@
{"timestamp":"2026-05-08T12:42:03.725801Z","environment":{"python":"3.13.5","platform":"Linux-6.6.87.2-microsoft-standard-WSL2-x86_64-with-glibc2.39","numpy":"2.4.2","pyomo":"6.9.5"},"version":2,"task":"Tsh","scenario":"baseline","params":{"product.A1":16.0,"ht.KC":0.000275},"grid":{"param1":{"path":"product.A1","value":16.0},"param2":{"path":"ht.KC","value":0.000275}},"scipy":{"success":true,"wall_time_s":13.709681862965226,"objective_time_hr":12.193457634926368,"solver":{"status":"n/a","termination_condition":"n/a"},"metrics":{"n_points":1221,"final_percent_dried":100.0,"max_Tbot":-24.99999999999993,"monotonic_dried":true,"tsh_bounds_ok":true,"pch_positive":true,"flux_nonnegative":true,"dryness_target_met":true,"product_temp_ok":true,"product_critical_temp":-25.0}},"pyomo":null,"failed":false,"hash.inputs":"272cfbac930314f1","hash.record":"d24b9be03fc7c3f6"}

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nonblocking: These tracked reference summaries look stale relative to the current compute_residuals() schema. A fresh one-case CLI run now emits dryness_target_percent, final_dryness_shortfall_percent, max_product_temp_violation_C, and the ramp metric keys, but both checked-in summary files omit them. Because the README and notebook use these as reference data, please regenerate both summaries with the current CLI or update the docs/tests to define them as legacy samples.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in d11a211. Regenerated both tracked summary JSONL files with the current CLI schema and added test_tracked_reference_summaries_use_current_metric_schema to prevent the reference files from drifting again.

Comment thread benchmarks/grid_analysis.ipynb Outdated
"print(\n",
" f\"- Pyomo methods are {df['fd_speedup'].mean():.1f}\u00d7 faster on average (simultaneous vs sequential)\"\n",
")\n",
"print(f\"- All {len(grid_recs)} runs converged successfully (100% success rate)\")\n",

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nonblocking: This prints 100% success unconditionally, even though grid records now carry failed, scipy.success, and pyomo.success. If a benchmark file contains solver failures or post-solve validation failures, the notebook will still report all runs converged. Please compute the success rate from the record flags and list failed parameter combinations.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in d11a211. Added benchmarks.analysis success-summary helpers, updated the notebook to compute the success rate from record flags and list failed parameter combinations, and covered the behavior with test_success_summary_uses_record_failure_flags.

@bernalde

Copy link
Copy Markdown
Member Author
  1. Commits pushed:
  • d11a211 (Address benchmark review comments)
  1. Main changes made:
  • Added benchmarks.analysis helpers to compute record-level success from failed, scipy.success, and pyomo.success.
  • Updated benchmarks/grid_analysis.ipynb to report computed success rate and list failed parameter combinations instead of assuming 100% success.
  • Regenerated baseline_Tsh_3x3_summary.jsonl and baseline_Pch_3x3_summary.jsonl with the current CLI schema.
  • Added tests covering current reference-summary metric keys and success-summary behavior.
  1. Tests run and results:
  • PYTHONPATH=/tmp/LyoPRONTO_pr11_fix pytest tests/test_benchmarks.py::test_tracked_reference_summaries_use_current_metric_schema tests/test_benchmarks.py::test_success_summary_uses_record_failure_flags -q: failed before the fix, then passed after the fix.
  • python -m ruff check benchmarks tests/test_benchmarks.py: passed.
  • python -m ruff format --check benchmarks/analysis.py tests/test_benchmarks.py benchmarks/grid_analysis.ipynb: passed.
  • python - <<'PY' ... json.loads(...): notebook JSON parse passed.
  • PYTHONPATH=/tmp/LyoPRONTO_pr11_fix pytest tests/test_benchmarks.py -v: 19 passed.
  • PYTHONPATH=/tmp/LyoPRONTO_pr11_fix MPLCONFIGDIR=/tmp/mpl-pr11-fix pytest tests/ -n auto -v -m "not notebook and not pyomo" --ignore=tests/test_pyomo_models: 259 passed.
  • PYTHONPATH=/tmp/LyoPRONTO_pr11_fix MPLCONFIGDIR=/tmp/mpl-pr11-fix pytest tests/ -n auto -v -m "notebook" --ignore=tests/test_pyomo_models: failed in the sandbox with PermissionError: [Errno 1] Operation not permitted during Jupyter socket creation; rerun outside the sandbox passed, 2 passed.
  • PYTHONPATH=/tmp/LyoPRONTO_pr11_fix MPLCONFIGDIR=/tmp/mpl-pr11-fix pytest tests/test_pyomo_models/test_optimizer_framework.py -v: 20 passed, 1 warning.
  • python -m mypy benchmarks/analysis.py: passed, with the installed mypy warning that python_version = 3.8 is no longer supported by that mypy version.
  1. Comments intentionally not addressed:
  • None.
  1. Remaining risks or follow-up items:
  • None identified.

@bernalde bernalde merged commit 5751964 into pr/pyomo-optimizers May 31, 2026
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant