Compare paper OCP against LyoPRONTO baselines#35
Conversation
bernalde
left a comment
There was a problem hiding this comment.
Blocking issues: none.
Nonblocking issues: none.
Questions: none.
Tests run and outcomes:
- Verified documented benchmark ranges against
benchmarks/results/baseline_Tsh_3x3_summary.jsonlandbenchmarks/results/baseline_Pch_3x3_summary.jsonlwithjq; the documented ranges match the committed data. python -m ruff check tests/test_benchmarks.pypassed.python -m ruff format --check tests/test_benchmarks.pypassed.python -m pytest tests/test_benchmarks.py::test_paper_ocp_notes_cover_issue30_comparison -qpassed, 1 test.python -m pytest tests/test_benchmarks.py -qpassed, 18 tests.python -m pytest tests/ -m "not notebook and not pyomo and not slow" -qpassed, 247 tests.python -m pytest tests/ -n auto -v -m "not notebook and not pyomo" --ignore=tests/test_pyomo_models -qpassed, 262 tests.
I did not run the full Pyomo or notebook suites locally because this PR changes benchmark notes and a lightweight documentation guard test; the CI-style non-Pyomo suite passed, and the stacked PR's visible GitHub checks (Deploy docs, doctests) are green.
Merge recommendation: merge as-is.
1788e7f to
d34032d
Compare
9e0629f to
8db5985
Compare
…comparison # Conflicts: # tests/test_benchmarks.py
… fix/issue-30-baseline-comparison
…' into fix/issue-31-policy-ocp-adapter
…icy-ocp-adapter # Conflicts: # docs/PAPER_OCP_VALIDATION.md # lyopronto/pyomo_models/README.md
|
Carrying forward the review of #34, which was merged into Tests at the time, on a clean checkout of the #34 head: Blocking: none. Nonblocking:
Questions:
|
bernalde
left a comment
There was a problem hiding this comment.
Review as maintainer. Note: I am the PR author, so GitHub only permits a COMMENT here; the formal approval or change request must come from another maintainer. There are no blocking issues.
Scope verified: diff against merge-base 5f2c3f7 matches the PR totals (2 files, +61 / -0). Base branch is fix/issue-29-problem-2-ocp (stacked PR), head is fix/issue-30-baseline-comparison. The change is documentation-only (a Paper-vs-LyoPRONTO baseline comparison section in docs/PAPER_OCP_VALIDATION.md) plus one docs-regression test in tests/test_benchmarks.py.
Claims verified by running, not just reading. The doc quotes drying-time ranges attributed to the committed baseline summaries; I parsed those files (benchmarks/results/baseline_Tsh_3x3_summary.jsonl and baseline_Pch_3x3_summary.jsonl, both tracked, pre-existing in the base, not part of this diff) and every quoted figure matches exactly:
- Tsh SciPy 12.19-14.47 h, mean 13.33 h - matches.
- Tsh Pyomo FD 12.40-14.76 h, mean 13.58 h - matches.
- Tsh Pyomo collocation 12.00-14.23 h, mean 13.11 h - matches.
- Pch SciPy 18.12-23.67 h, FD 18.15-23.71 h, collocation 17.90-23.37 h - all match.
The documented grid (product.A1=16,18,20; ht.KC=2.75e-4,3.3e-4,4.0e-4; 9 combos) matches the data, and the documented grid_cli.py generate invocation uses only flags that benchmarks/grid_cli.py actually supports (--task, --scenario, --vary, --methods, --n-elements, --n-collocation, --out). The -25 degC product limit, 98.9% dryness target, 243 K (Problem 1) and 240 K (Problem 2) limits, and the 273 K Problem 1 shelf max all match the data and config.
Issue intent (#30): the PR links Closes #30. This is an analysis/documentation issue and the link is appropriate - all three acceptance criteria are met: (1) benchmark notes include a clear comparison table; (2) the recommendation is explicit (keep the paper model validation-only and adapt policy constraints into LyoPRONTO); (3) adapter requirements are converted into an implementation issue (#31, already in progress as #36). The four work items (define comparable scenarios, run scipy/Pyomo where comparable, report drying time/constraints/controls/temperature, and separate formulation vs transcription differences) are all addressed. Closes #30 will correctly auto-close on merge.
Tests I ran on a clean checkout of the PR head:
- PYTHONPATH=. pytest tests/test_benchmarks.py::test_paper_ocp_notes_cover_issue30_comparison: 1 passed.
- PYTHONPATH=. pytest tests/test_benchmarks.py: 20 passed (the PR description's 18 predates tests added since).
- ruff check and ruff format --check on tests/test_benchmarks.py: clean.
The PYTHONPATH=. requirement noted in the PR description reproduces: without it, collection fails with ModuleNotFoundError: No module named 'benchmarks'. That is a pre-existing repo-layout quirk, not introduced here.
-
Blocking issues: none.
-
Nonblocking issues:
- The new regression test asserts only that literal strings (the full table header and exact numeric ranges such as "12.19-14.47 h") are present in the doc (inline comment). It does not validate those numbers against the JSONL summaries, so if the baselines are regenerated and the figures shift, the doc and data can silently drift while the test still passes. Consider parsing the committed summaries and asserting the documented ranges against them, so the test guards correctness rather than mere presence.
-
Questions: none.
-
Tests run and outcomes: targeted docs test (1 passed), tests/test_benchmarks.py (20 passed), ruff clean. All green; no genuine failures.
-
Merge-readiness: Ready on the merits. Documentation-only change with all quantitative claims independently verified against the committed data, acceptance criteria met, and a passing guard test. The single nonblocking item does not gate merge. It targets fix/issue-29-problem-2-ocp, so it should land there as that stack reaches main. A second maintainer must provide the formal approval since I authored this PR.
| in notes | ||
| ) | ||
| assert "benchmarks/results/baseline_Tsh_3x3_summary.jsonl" in notes | ||
| assert "12.19-14.47 h" in notes |
There was a problem hiding this comment.
Nonblocking: this test guards only that the literal strings (table header, exact ranges like "12.19-14.47 h") remain present in the doc; it does not check them against benchmarks/results/baseline_Tsh_3x3_summary.jsonl. I verified all quoted ranges match the data today, but if the baselines are regenerated the doc and data could drift while this test stays green. Consider parsing the committed summaries (objective_time_hr grouped by scipy / pyomo discretization.method) and asserting min/max/mean against the documented figures, so the test validates correctness rather than presence.
There was a problem hiding this comment.
Addressed in 7aabde6. The test now computes the documented ranges from the tracked benchmark summaries and checks the markdown values against those computed stats. Relevant test: tests/test_benchmarks.py::test_paper_ocp_notes_cover_issue30_comparison.
|
Commits pushed:
Main changes made:
Tests run and results:
Comments intentionally not addressed:
Remaining risks or follow-up items:
|
|
Commits pushed:
Main changes made:
Tests run and results:
Comments intentionally not addressed:
Remaining risks or follow-up items:
|
bernalde
left a comment
There was a problem hiding this comment.
Review as maintainer. Note: I am the PR author, so GitHub only permits a COMMENT here; the formal approval must come from another maintainer. I found no blocking, nonblocking, or question-level issues introduced by this PR.
Scope verified: fetched fix/issue-29-problem-2-ocp and fix/issue-30-baseline-comparison; local diff against merge-base 5f2c3f7 matches GitHub PR totals exactly: 7 files changed, +253 / -20. Current PR head reviewed: 36250bb1b81a6d73fcf4dfd28a6aa544a2e5dc1b.
Issue intent (#30): the Closes #30 link is appropriate. The PR defines comparable scenarios, reports drying time / constraints / controls / temperature behavior, separates paper-vs-LyoPRONTO formulation differences from transcription effects, gives an explicit recommendation to keep the paper model validation-only while adapting policy constraints into LyoPRONTO, and points adapter work to #31.
-
Blocking issues: none.
-
Nonblocking issues: none.
-
Questions: none.
-
Tests run and outcomes:
PYTHONPATH=. pytest tests/test_benchmarks.py::test_paper_ocp_notes_cover_issue30_comparison -q: passed, 1 test.MPLCONFIGDIR=/tmp/matplotlib-cache PYTHONPATH=. pytest tests/test_pyomo_models/test_init.py tests/test_pyomo_models/test_paper_ocp.py -m "not slow" -q: passed, 26 tests.MPLCONFIGDIR=/tmp/matplotlib-cache PYTHONPATH=. pytest tests/test_pyomo_models/test_paper_ocp.py::test_problem2_nz20_solve_keeps_velocity_feasible_and_classifies_policy -q: passed, 1 test.PYTHONPATH=. pytest tests/ -n auto -v -m "not notebook and not pyomo" --ignore=tests/test_pyomo_models -q: passed, 266 tests.MPLCONFIGDIR=/tmp/matplotlib-cache PYTHONPATH=. pytest tests/test_pyomo_models -q: passed, 203 passed / 4 skipped / 1 xfailed.python -m ruff check docs/PAPER_OCP_VALIDATION.md lyopronto/pyomo_models/README.md lyopronto/pyomo_models/__init__.py lyopronto/pyomo_models/paper_ocp.py tests/test_benchmarks.py tests/test_pyomo_models/test_init.py tests/test_pyomo_models/test_paper_ocp.py: passed.python -m ruff format --check lyopronto/pyomo_models/__init__.py lyopronto/pyomo_models/paper_ocp.py tests/test_benchmarks.py tests/test_pyomo_models/test_init.py tests/test_pyomo_models/test_paper_ocp.py: passed.python -m mypy lyopronto: passed; mypy emitted the existing warning that configuredpython_version = 3.8is unsupported by this mypy version.
- Merge-readiness: ready on the merits. Because I authored the PR, this COMMENT is not a formal approval; another maintainer should provide the approving review.
bernalde
left a comment
There was a problem hiding this comment.
Re-review of the updated head 36250bb (the PR grew from 2 files / +61 to 7 files / +253 -20 since my prior review). Note: I am the PR author, so GitHub only permits a COMMENT here; the formal approval or change request must come from another maintainer. There are no blocking issues.
Scope verified: diff against merge-base 5f2c3f7 matches the PR totals (7 files, +253 / -20). Base is fix/issue-29-problem-2-ocp, head is fix/issue-30-baseline-comparison.
The two new commits address prior review feedback well:
- "Validate paper OCP notes against benchmark data" rewrites test_paper_ocp_notes_cover_issue30_comparison to parse the committed JSONL summaries and assert the documented table cells/ranges against computed min/max/mean, instead of matching hardcoded literals. This resolves the doc/data drift nonblocking point from my previous review.
- "Address Problem 2 review follow-ups" renames initialize_paper_problem1_from_trajectory to a problem-agnostic initialize_paper_problem_from_trajectory with a backward-compatible alias (both exported, both in test_init's required set), adds a clarifying comment on the velocity-constraint skip, and adds a refined n_z=20 slow Problem 2 test that asserts velocity stays below the active threshold during the Policy 1 max-heat segment. This resolves the initializer-naming nonblocking and the Policy 1 vs Policy 3 disambiguation question carried from the #34 review.
Verified by running, not just reading:
- The data-driven benchmarks test passes, so the documented Tsh ranges (SciPy 12.19-14.47 mean 13.33; FD 12.40-14.76 mean 13.58; colloc 12.00-14.23 mean 13.11) and Pch ranges match the success-filtered objective_time_hr in the committed summaries.
- The new slow test test_problem2_nz20_solve_keeps_velocity_feasible_and_classifies_policy actually solves (IPOPT via SolverFactory fallback; idaes absent), 0.72s call, and passes including the assertion that interface velocity < limit - 2e-9 across the Policy 1 segment. The PR description's "tests not run" note flags that slow/Pyomo tests were not run locally, so I ran them: all 5 paper_ocp slow tests pass.
- The rename has no stale references elsewhere in the repo (grep across py/md/ipynb outside the module and tests is clean).
- Docs updated consistently: the First-Pass Tolerances section now says "slow tests" and documents the n_z=20 confirmation; the skip rationale is clarified ("only at the initial collocation point, not the first finite element"); the n_z=20 item is removed from Known Limitations now that it is implemented.
Tests I ran on a clean checkout of the head:
- PYTHONPATH=. pytest tests/test_benchmarks.py: 20 passed.
- PYTHONPATH=. pytest tests/test_pyomo_models/test_paper_ocp.py tests/test_pyomo_models/test_init.py -m "not slow": 26 passed.
- PYTHONPATH=. pytest tests/test_pyomo_models/test_paper_ocp.py -m slow: 5 passed (incl. the new n_z=20 solve).
- ruff check and ruff format --check on the changed module and tests: clean.
Issue intent (#30): Closes #30 remains appropriate. The three acceptance criteria (comparison table, explicit recommendation, adapter requirements converted to #31) are met, and the comparison numbers are now machine-checked against the data.
-
Blocking issues: none.
-
Nonblocking issues:
- Scope/description gap (inline comment on the rename). The PR description and Closes #30 describe only the #30 baseline-comparison docs, but the head also carries #29 Problem 2 code changes: a public function rename with a backward-compatible alias, a constraint comment, and two new tests including a slow n_z=20 solve. A public-API rename should be called out so the squash-merge message and any changelog are accurate. Fix: extend the PR description to list the Problem 2 follow-ups and the rename/alias, or split them into a #29 follow-up PR.
- Test-helper robustness (inline comment in tests/test_benchmarks.py): _objective_time_stats computes min/max/mean over success-filtered runs; if a future baseline regeneration yields zero successful runs for a method, min([])/max([]) raise a bare ValueError rather than a clear assertion. A small guard would give a clearer failure.
-
Questions: none. The disambiguation question from the prior review is now answered by the n_z=20 test.
-
Tests run and outcomes: benchmarks (20 passed), paper_ocp + init fast (26 passed), paper_ocp slow (5 passed incl. new n_z=20), ruff clean. All green; no genuine failures.
-
Merge-readiness: Ready on the merits. Prior feedback is addressed, new behavior is tested and verified, and the documented numbers are machine-checked against the committed data. The two nonblocking items do not gate merge. It targets fix/issue-29-problem-2-ocp, so it should land there as the stack reaches main. A second maintainer must provide the formal approval since I authored this PR.
|
|
||
|
|
||
| def initialize_paper_problem1_from_trajectory( | ||
| def initialize_paper_problem_from_trajectory( |
There was a problem hiding this comment.
Nonblocking (scope/description): this is a public-API rename (initialize_paper_problem1_from_trajectory -> initialize_paper_problem_from_trajectory, with a backward-compatible alias below, both exported in init). The rename and the accompanying Problem 2 test additions are not mentioned in the PR description, which describes only the #30 baseline-comparison docs. Please list these Problem 2 (#29) follow-ups in the description, or split them into a separate #29 follow-up PR, so the squash-merge commit and any changelog accurately reflect the public-API change.
|
|
||
| return { | ||
| method: { | ||
| "min": min(method_values), |
There was a problem hiding this comment.
Nonblocking (robustness): min(method_values) / max(method_values) raise a bare ValueError if a method ends up with zero successful runs after a future baseline regeneration. Consider asserting each method has at least one successful run first (e.g. assert method_values, f"no successful {method} runs in {summary_path}") so the failure is self-explanatory rather than an empty-sequence error.
There was a problem hiding this comment.
Addressed in 46aacd8. _objective_time_stats now asserts each method has at least one successful run before min/max/mean, and test_objective_time_stats_reports_empty_method covers the failure message.
Add experimental LyoPRONTO policy OCP adapter
|
Commits pushed:
Main changes made:
Tests run and results:
Comments intentionally not addressed:
Remaining risks or follow-up items:
|
… fix/issue-30-baseline-comparison # Conflicts: # docs/PAPER_OCP_VALIDATION.md
Summary
initialize_paper_problem_from_trajectoryis the problem-neutral initializer,initialize_paper_problem1_from_trajectoryremains a backward-compatible alias, exports/docs use the neutral name, and tests cover the Problem 2 interface-velocity constraint skip plus refinedn_z=20classification behavior.Tests run
MPLCONFIGDIR=/tmp/matplotlib-cache PYTHONPATH=. pytest tests/test_benchmarks.py::test_objective_time_stats_reports_empty_method tests/test_benchmarks.py::test_paper_ocp_notes_cover_issue30_comparison -q- passed, 2 tests.MPLCONFIGDIR=/tmp/matplotlib-cache PYTHONPATH=. pytest tests/test_benchmarks.py -q- passed, 21 tests.MPLCONFIGDIR=/tmp/matplotlib-cache PYTHONPATH=. pytest tests/ -m "not notebook and not pyomo and not slow" -q- passed, 252 tests.python -m ruff check tests/test_benchmarks.py- passed.python -m ruff format --check tests/test_benchmarks.py- passed, 1 file already formatted.python -m mypy lyopronto- passed; mypy emitted the existing warning thatpython_version = 3.8is no longer supported by this mypy version.git diff --check- passed.Notes about tests not run
n_z=20slow Problem 2 validation.PYTHONPATH=.failed during collection in this shell withModuleNotFoundError: No module named 'benchmarks'; the successful commands above set the repository root onPYTHONPATH.Closes #30