
Add standalone script to analyze savings and lambda dependency #28

Merged
rbx merged 8 commits into FairRootGroup:master from enlorenz:analyze
Mar 3, 2026

Conversation

@enlorenz
Contributor

@enlorenz enlorenz commented Mar 3, 2026

No description provided.

@coderabbitai
Contributor

coderabbitai bot commented Mar 3, 2026

Caution

Review failed

Pull request was closed or merged during review

📝 Walkthrough

Adds a new lambda-sweep analysis tool (analyze_lambda_occupancy.py) that runs train.py in evaluation mode across lambda values and aggregates per-episode occupancy/savings metrics. Also extends per-episode core usage instrumentation (baseline.py, environment.py, metrics_tracker.py), enhances train.py evaluation logging with occupancy in cores/nodes, and improves workload log analysis with quantile-based burst suggestions.

Changes

Cohort / File(s) Summary
Lambda Sweep Analysis Tool
analyze_lambda_occupancy.py
New module (≈745 LOC) implementing LambdaRunStats, adaptive lambda scheduling, train.py orchestration, stdout regex parsing for episode/wait metrics, safe stats/derived metrics, CSV/JSON output, and multi-panel plotting with optional polynomial fits.
Workload Statistics Analysis
data/workload_statistics/analyze_workload_logs.py, data/workload_statistics/workload_logs.txt
Added quantile-based burst boundary parsing/suggestion utilities, cores normalization, new CLI flags (--burst-event-quantile, --burst-range-quantiles), expanded output fields for suggested small/heavy burst ranges and burst detection metadata, and regenerated workload log outputs.
Core Metrics Instrumentation
src/baseline.py, src/environment.py, src/metrics_tracker.py, train.py
Baseline step now computes and returns num_used_nodes and num_used_cores (return expanded from 4-tuple to 6-tuple); environment and metrics tracker record per-episode episode_used_cores, episode_baseline_used_cores, and episode_baseline_used_nodes; train.py logs agent/baseline occupancy in cores and nodes.
Tests / Visualization
test/test_inspect_workloadgen.py
Conditional logarithmic y-scale added for duration and cores histograms when heavy-burst probability > 0.0; minor test header update.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Analyzer as analyze_lambda_occupancy.py
    participant Trainer as train.py
    participant Env as environment.py
    participant Baseline as baseline.py
    participant Metrics as metrics_tracker.py

    User->>Analyzer: start lambda sweep CLI
    Analyzer->>Analyzer: build lambda schedule (explicit or adaptive)
    loop for each lambda
        Analyzer->>Trainer: spawn train.py --eval --lambda=X
        Trainer->>Env: run episode loop
        Env->>Baseline: baseline_step(...)
        Baseline-->>Env: return (baseline_cost, baseline_cost_off, baseline_next_empty_slot, next_job_id, baseline_num_used_nodes, baseline_num_used_cores)
        Env->>Metrics: append episode_used_cores / baseline variants
        Metrics-->>Env: updated per-episode lists
        Trainer->>Trainer: emit evaluation stdout with occupancy/savings/wait lines
        Trainer-->>Analyzer: stdout captured
        Analyzer->>Analyzer: parse episode metrics + wait summary, compute LambdaRunStats
    end
    Analyzer->>Analyzer: aggregate, fit trends, write CSV/JSON and plot
    Analyzer->>User: produce summary files and visualizations
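The spawn-and-capture loop in the diagram above could look roughly like this. The flag names (--eval, --lambda) and the minimal LambdaRunStats shape are assumptions for illustration; the real script builds a longer command line and parses the captured stdout afterwards:

```python
import subprocess
import sys
from dataclasses import dataclass, field

@dataclass
class LambdaRunStats:
    lam: int
    stdout_lines: list[str] = field(default_factory=list)

def build_command(train_script: str, lam: int) -> list[str]:
    # Illustrative flags only; consult train.py --help for the real interface.
    return [sys.executable, train_script, "--eval", f"--lambda={lam}"]

def run_sweep(lambdas: list[int], train_script: str = "train.py") -> dict[int, LambdaRunStats]:
    """Run one evaluation per lambda and keep its stdout for later parsing."""
    results: dict[int, LambdaRunStats] = {}
    for lam in lambdas:
        completed = subprocess.run(
            build_command(train_script, lam),
            capture_output=True, text=True, check=False,
            timeout=3600,  # guard against a stalled train.py, per the review note below
        )
        results[lam] = LambdaRunStats(lam, completed.stdout.splitlines())
    return results
```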
sequenceDiagram
    participant Trainer as train.py
    participant Env as environment.py
    participant Baseline as baseline.py
    participant Metrics as metrics_tracker.py

    Trainer->>Env: begin episode
    Env->>Metrics: reset_episode_metrics()
    Metrics-->>Env: initialized episode_used_cores and baseline lists

    loop per step
        Env->>Baseline: baseline_step(...)
        Baseline-->>Env: return baseline_num_used_nodes, baseline_num_used_cores (part of 6-tuple)
        Env->>Metrics: append episode_used_cores, episode_baseline_used_nodes, episode_baseline_used_cores
    end

    Trainer->>Metrics: query episode aggregates
    Metrics-->>Trainer: provide per-episode core/node usage
    Trainer->>Trainer: compute/log occupancy in cores and nodes
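The reset/append flow in this diagram boils down to per-episode lists that are cleared at episode start and extended during the episode. A small sketch, using field names from the diagram (record_step is a hypothetical helper for illustration; in the real code environment.py appends to the tracker's lists, and the actual MetricsTracker holds many more fields):

```python
class MetricsTracker:
    """Tracks agent and baseline core/node usage within one episode (sketch)."""

    def __init__(self) -> None:
        self.reset_episode_metrics()

    def reset_episode_metrics(self) -> None:
        # Cleared by environment.py at the start of every episode.
        self.episode_used_cores: list[int] = []
        self.episode_baseline_used_nodes: list[int] = []
        self.episode_baseline_used_cores: list[int] = []

    def record_step(self, used_cores: int, baseline_nodes: int, baseline_cores: int) -> None:
        # Values come from the expanded baseline_step return tuple.
        self.episode_used_cores.append(used_cores)
        self.episode_baseline_used_nodes.append(baseline_nodes)
        self.episode_baseline_used_cores.append(baseline_cores)
```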

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

  • Docstring Coverage (⚠️ Warning): docstring coverage is 8.11%, below the required 80.00% threshold. Resolution: write docstrings for the functions missing them.
  • Description check (❓ Inconclusive): no pull request description was provided, so relatedness to the changeset could not be evaluated. Resolution: add a description explaining the purpose of the new analyze_lambda_occupancy.py script and how the supporting changes relate to the overall objective.
✅ Passed checks (1 passed)
  • Title check (✅ Passed): the title accurately describes the main change, adding a standalone script (analyze_lambda_occupancy.py) to analyze savings and lambda dependency.


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/baseline.py (2)

54-55: ⚠️ Potential issue | 🟡 Minor

Docstring is outdated.

The docstring states the function returns a 4-tuple, but it now returns 6 values including num_used_nodes and num_used_cores.

📝 Proposed fix
     Returns:
-        Tuple of (baseline_cost, baseline_cost_off, updated baseline_next_empty_slot, updated next_job_id)
+        Tuple of (baseline_cost, baseline_cost_off, updated baseline_next_empty_slot, updated next_job_id, num_used_nodes, num_used_cores)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/baseline.py` around lines 54 - 55, The docstring in the baseline
calculation function is outdated: it still documents a 4-tuple return but the
function now returns six values (baseline_cost, baseline_cost_off,
baseline_next_empty_slot, next_job_id, num_used_nodes, num_used_cores). Update
the Returns section of the docstring in src/baseline.py where the function
(e.g., the baseline calculation function) is defined to list all six returned
values in the correct order and briefly describe each (including num_used_nodes
and num_used_cores) so the docstring matches the actual return tuple.

33-33: ⚠️ Potential issue | 🟡 Minor

Return type annotation is inconsistent with actual return.

The function now returns a 6-tuple (baseline_cost, baseline_cost_off, baseline_next_empty_slot, next_job_id, num_used_nodes, num_used_cores) but the type hint still shows tuple[float, float, int, int].

🔧 Proposed fix
-) -> tuple[float, float, int, int]:
+) -> tuple[float, float, int, int, int, int]:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/baseline.py` at line 33, The return type annotation is out of sync: the
function returns six values (baseline_cost, baseline_cost_off,
baseline_next_empty_slot, next_job_id, num_used_nodes, num_used_cores) but the
signature currently declares tuple[float, float, int, int]; update the
function's return type annotation to reflect six elements (e.g., tuple[float,
float, int, int, int, int]) in the function definition in src/baseline.py so
callers and type checkers see the correct shape.
🧹 Nitpick comments (7)
analyze_lambda_occupancy.py (3)

374-374: Dynamically attached method relies on implementation detail.

evaluate.cache_keys is attached at line 714, but select_lambda_schedule calls it at line 374. This coupling is fragile—if someone calls select_lambda_schedule with a different callable, it will fail with AttributeError.

Consider either:

  1. Passing the cache explicitly as a parameter
  2. Documenting the expected interface in the function signature using a Protocol
♻️ Alternative: pass cache explicitly
 def select_lambda_schedule(
     args: argparse.Namespace,
     evaluate: Callable[[int], LambdaRunStats],
+    cache: dict[int, LambdaRunStats],
 ) -> list[int]:

Then replace evaluate.cache_keys() with cache.keys() throughout the function.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@analyze_lambda_occupancy.py` at line 374, select_lambda_schedule currently
depends on a dynamically attached method evaluate.cache_keys which is fragile;
change the function to accept the cache explicitly (e.g., add a parameter named
cache) and replace all uses of evaluate.cache_keys() with cache.keys(), or
alternatively declare an explicit Protocol type describing a keys() method and
annotate the cache parameter (or evaluate) in select_lambda_schedule's signature
so callers must provide a compatible object; update all call sites to pass the
cache object and adjust any imports/typing accordingly.

498-498: Add strict=True to zip() to catch length mismatches.

If x, y, and point_colors have different lengths, silent truncation could cause subtle bugs. Adding strict=True will raise an error if the iterables have different lengths.

♻️ Proposed fix
-        for i, (xi, yi, c) in enumerate(zip(x, y, point_colors)):
+        for i, (xi, yi, c) in enumerate(zip(x, y, point_colors, strict=True)):
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@analyze_lambda_occupancy.py` at line 498, The loop over x, y, and
point_colors silently truncates if lengths differ; update the enumerate(zip(x,
y, point_colors)) call in analyze_lambda_occupancy.py (the loop that
destructures into xi, yi, c) to use strict=True (i.e., zip(x, y, point_colors,
strict=True)) so a ValueError is raised on length mismatch and the issue
surfaces immediately.

268-283: Subprocess invocation is safe but consider adding timeout.

The command is built from controlled argparse inputs, so injection risk is low. However, subprocess.run without a timeout could hang indefinitely if train.py stalls.

⏱️ Proposed fix: add timeout
     completed = subprocess.run(
         command,
         cwd=str(project_root),
         capture_output=True,
         text=True,
         check=False,
+        timeout=3600,  # 1 hour timeout per lambda evaluation
     )

You'll need to handle subprocess.TimeoutExpired in the caller.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@analyze_lambda_occupancy.py` around lines 268 - 283, The subprocess.run call
that executes `command` (producing `completed`) needs a timeout to avoid hangs:
add a sensible timeout argument to the `subprocess.run(...)` invocation and
catch `subprocess.TimeoutExpired` around that call; on timeout, include context
(the `lambda_value`, the command, and a snippet via `os_tail`) and raise or
propagate a `RuntimeError` similar to the existing non-zero exit handling so the
caller (which already expects errors) can handle it; keep existing behavior for
printing (`args.echo_train_output`) and returncode handling intact.
src/metrics_tracker.py (1)

82-88: Inconsistent type annotations.

Type annotations were removed from some episode lists (lines 82-88) while other lists in the same method retain them (lines 90-95). This creates inconsistency. Consider either adding type annotations to all new fields or documenting why they were intentionally omitted.

♻️ Proposed fix for consistency
         # Time series data for plotting (episode)
-        self.episode_on_nodes = []
-        self.episode_used_nodes = []
-        self.episode_used_cores = []
-        self.episode_baseline_used_nodes = []
-        self.episode_baseline_used_cores = []
-        self.episode_job_queue_sizes = []
-        self.episode_price_stats = []
+        self.episode_on_nodes: list[int] = []
+        self.episode_used_nodes: list[int] = []
+        self.episode_used_cores: list[int] = []
+        self.episode_baseline_used_nodes: list[int] = []
+        self.episode_baseline_used_cores: list[int] = []
+        self.episode_job_queue_sizes: list[int] = []
+        self.episode_price_stats: list[float] = []
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/metrics_tracker.py` around lines 82 - 88, The new episode list attributes
(episode_on_nodes, episode_used_nodes, episode_used_cores,
episode_baseline_used_nodes, episode_baseline_used_cores,
episode_job_queue_sizes, episode_price_stats) lack type annotations and should
be annotated consistently with the other episode fields in the class (e.g., the
annotated lists declared on lines 90-95); update each of these attributes in the
MetricsTracker class (or the enclosing init method) to include the same list
type hints used elsewhere (matching types like List[int], List[float], or
List[List[int]] as appropriate) so all episode fields use consistent type
annotations.
train.py (1)

227-229: Consider extracting occupancy calculations for readability.

The inline expressions are correct but dense. Extracting them into local variables before the print statement would improve readability.

✨ Optional refactor for readability
+            agent_occ_cores = env.metrics.episode_used_cores[-1] * 100 / (CORES_PER_NODE * MAX_NODES) if env.metrics.episode_used_cores else 0
+            baseline_occ_cores = env.metrics.episode_baseline_used_cores[-1] * 100 / (CORES_PER_NODE * MAX_NODES) if env.metrics.episode_baseline_used_cores else 0
+            agent_occ_nodes = env.metrics.episode_used_nodes[-1] * 100 / MAX_NODES if env.metrics.episode_used_nodes else 0
+            baseline_occ_nodes = env.metrics.episode_baseline_used_nodes[-1] * 100 / MAX_NODES if env.metrics.episode_baseline_used_nodes else 0
             print(f"  Episode {episode + 1}: "
                 ...
                 f"MaxQueue={env.metrics.max_queue_size_reached}, "
-                f"Agent Occupancy (Cores)={env.metrics.episode_used_cores[-1]*100/(CORES_PER_NODE*MAX_NODES)if env.metrics.episode_used_cores else 0 :.2f}%, Baseline Occupancy (Cores)={env.metrics.episode_baseline_used_cores[-1]*100/(CORES_PER_NODE*MAX_NODES) if env.metrics.episode_baseline_used_cores else 0 :.2f}% "
-                f"Agent Occupancy (Nodes)={env.metrics.episode_used_nodes[-1]*100/MAX_NODES if env.metrics.episode_used_nodes else 0 :.2f}%, Baseline Occupancy (Nodes)={env.metrics.episode_baseline_used_nodes[-1]*100/MAX_NODES if env.metrics.episode_baseline_used_nodes else 0 :.2f}% " )
+                f"Agent Occupancy (Cores)={agent_occ_cores:.2f}%, Baseline Occupancy (Cores)={baseline_occ_cores:.2f}% "
+                f"Agent Occupancy (Nodes)={agent_occ_nodes:.2f}%, Baseline Occupancy (Nodes)={baseline_occ_nodes:.2f}%")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@train.py` around lines 227 - 229, Extract the dense inline occupancy
calculations into clearly named local variables before the print call: read
max_queue = env.metrics.max_queue_size_reached, compute agent_cores_pct and
baseline_cores_pct using env.metrics.episode_used_cores and
env.metrics.episode_baseline_used_cores with the existing empty-list guards and
CORES_PER_NODE and MAX_NODES, and compute agent_nodes_pct and baseline_nodes_pct
from env.metrics.episode_used_nodes and env.metrics.episode_baseline_used_nodes
with the existing guards and MAX_NODES; then use those variables in the f-string
instead of the long inline expressions to improve readability in train.py where
the print/log is emitted.
data/workload_statistics/analyze_workload_logs.py (2)

591-596: Validate --burst-event-quantile at parse time.

Right now invalid values are silently clamped downstream. Rejecting out-of-range input (<0 or >1) in argparse avoids hidden misconfiguration.

🔧 Suggested parser hardening
+def parse_quantile(raw: str) -> float:
+    try:
+        q = float(raw)
+    except ValueError as exc:
+        raise argparse.ArgumentTypeError(f"Invalid quantile '{raw}'") from exc
+    if not (0.0 <= q <= 1.0):
+        raise argparse.ArgumentTypeError("Quantile must be in [0,1]")
+    return q
+
 ...
-    parser.add_argument(
-        "--burst-event-quantile",
-        type=float,
+    parser.add_argument(
+        "--burst-event-quantile",
+        type=parse_quantile,
         default=0.75,
         help="Quantile of positive baseline-adjusted hourly counts used to isolate burst hours (default: 0.75).",
     )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@data/workload_statistics/analyze_workload_logs.py` around lines 591 - 596,
The --burst-event-quantile parser entry currently allows out-of-range floats;
validate it at parse time by adding an argparse-level check: implement and use a
small validator (e.g., validate_quantile) as the type or call a post-parse check
for the argument from parser.add_argument("--burst-event-quantile", ...),
ensuring values are between 0.0 and 1.0 and raising argparse.ArgumentTypeError
(or exiting with parser.error) for invalid input so bad configs are rejected
rather than silently clamped downstream.

306-319: Narrow the exception scope in row parsing.

Catching Exception here can hide coding errors and skew skipped_rows silently. Limit this to expected parse/data errors.

🧭 Suggested exception narrowing
-        except Exception:
+        except (KeyError, ValueError, TypeError):
             skipped += 1
             return
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@data/workload_statistics/analyze_workload_logs.py` around lines 306 - 319,
The broad except Exception in the consume function should be narrowed to only
expected parse/data errors to avoid hiding bugs; replace the generic except with
a specific catch for parsing and missing-key issues (e.g., except (KeyError,
ValueError, TypeError): skipped += 1; return) while letting other exceptions
propagate so real coding errors surface; reference the consume function and the
calls/operations parse_submit_hour, parse_duration_hours, float(...) on
row[submit_col], row[duration_col], row[nodes_col], row[cores_col], and the
cores_is_total/node_count branch when making the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@analyze_lambda_occupancy.py`:
- Line 717: The list comprehension using an ambiguous loop variable `l`
(stats_ordered = [cache[l] for l in sorted(selected_lambdas)]) should use a
clearer name; change `l` to a descriptive identifier such as `lambda_name` or
`lambda_id` and update the comprehension to stats_ordered = [cache[lambda_name]
for lambda_name in sorted(selected_lambdas)]; also scan the surrounding scope
for any other uses of `l` related to selected_lambdas and rename them
consistently to avoid confusion with the digit `1`.


ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 22e32ef and 334d04d.

📒 Files selected for processing (8)
  • analyze_lambda_occupancy.py
  • data/workload_statistics/analyze_workload_logs.py
  • data/workload_statistics/workload_logs.txt
  • src/baseline.py
  • src/environment.py
  • src/metrics_tracker.py
  • test/test_inspect_workloadgen.py
  • train.py

Comment on lines 683 to +701
"suggested_flags: "
f"--wg-burst-small-prob {stats['suggested_wg_burst_small_prob']:.4f} "
f"--wg-burst-heavy-prob {stats['suggested_wg_burst_heavy_prob']:.4f}"
f"--wg-burst-heavy-prob {stats['suggested_wg_burst_heavy_prob']:.4f} "
f"--wg-burst-small-jobs-min {fmt_suggested_int(stats['suggested_wg_burst_small_jobs_min'])} "
f"--wg-burst-small-jobs-max {fmt_suggested_int(stats['suggested_wg_burst_small_jobs_max'])} "
f"--wg-burst-small-duration-min {fmt_suggested_int(stats['suggested_wg_burst_small_duration_min'])} "
f"--wg-burst-small-duration-max {fmt_suggested_int(stats['suggested_wg_burst_small_duration_max'])} "
f"--wg-burst-small-nodes-min {fmt_suggested_int(stats['suggested_wg_burst_small_nodes_min'])} "
f"--wg-burst-small-nodes-max {fmt_suggested_int(stats['suggested_wg_burst_small_nodes_max'])} "
f"--wg-burst-small-cores-min {fmt_suggested_int(stats['suggested_wg_burst_small_cores_min'])} "
f"--wg-burst-small-cores-max {fmt_suggested_int(stats['suggested_wg_burst_small_cores_max'])} "
f"--wg-burst-heavy-jobs-min {fmt_suggested_int(stats['suggested_wg_burst_heavy_jobs_min'])} "
f"--wg-burst-heavy-jobs-max {fmt_suggested_int(stats['suggested_wg_burst_heavy_jobs_max'])} "
f"--wg-burst-heavy-duration-min {fmt_suggested_int(stats['suggested_wg_burst_heavy_duration_min'])} "
f"--wg-burst-heavy-duration-max {fmt_suggested_int(stats['suggested_wg_burst_heavy_duration_max'])} "
f"--wg-burst-heavy-nodes-min {fmt_suggested_int(stats['suggested_wg_burst_heavy_nodes_min'])} "
f"--wg-burst-heavy-nodes-max {fmt_suggested_int(stats['suggested_wg_burst_heavy_nodes_max'])} "
f"--wg-burst-heavy-cores-min {fmt_suggested_int(stats['suggested_wg_burst_heavy_cores_min'])} "
f"--wg-burst-heavy-cores-max {fmt_suggested_int(stats['suggested_wg_burst_heavy_cores_max'])}"
Contributor

⚠️ Potential issue | 🟠 Major

Do not emit nan in suggested CLI flags.

Lines 686-701 can print flags like --wg-burst-heavy-jobs-min nan. Copy-pasting this into training commands will break integer flag parsing. Omit unset integer flags (or substitute defaults) when values are non-finite.

💡 Suggested fix (omit non-finite integer flags)
+def _maybe_int_flag(name: str, value: float) -> str:
+    if not math.isfinite(value):
+        return ""
+    return f"{name} {fmt_suggested_int(value)}"
+
 ...
-        print(
-            "suggested_flags: "
-            f"--wg-burst-small-prob {stats['suggested_wg_burst_small_prob']:.4f} "
-            f"--wg-burst-heavy-prob {stats['suggested_wg_burst_heavy_prob']:.4f} "
-            f"--wg-burst-small-jobs-min {fmt_suggested_int(stats['suggested_wg_burst_small_jobs_min'])} "
-            ...
-            f"--wg-burst-heavy-cores-max {fmt_suggested_int(stats['suggested_wg_burst_heavy_cores_max'])}"
-        )
+        flag_parts = [
+            f"--wg-burst-small-prob {stats['suggested_wg_burst_small_prob']:.4f}",
+            f"--wg-burst-heavy-prob {stats['suggested_wg_burst_heavy_prob']:.4f}",
+            _maybe_int_flag("--wg-burst-small-jobs-min", stats["suggested_wg_burst_small_jobs_min"]),
+            _maybe_int_flag("--wg-burst-small-jobs-max", stats["suggested_wg_burst_small_jobs_max"]),
+            _maybe_int_flag("--wg-burst-small-duration-min", stats["suggested_wg_burst_small_duration_min"]),
+            _maybe_int_flag("--wg-burst-small-duration-max", stats["suggested_wg_burst_small_duration_max"]),
+            _maybe_int_flag("--wg-burst-small-nodes-min", stats["suggested_wg_burst_small_nodes_min"]),
+            _maybe_int_flag("--wg-burst-small-nodes-max", stats["suggested_wg_burst_small_nodes_max"]),
+            _maybe_int_flag("--wg-burst-small-cores-min", stats["suggested_wg_burst_small_cores_min"]),
+            _maybe_int_flag("--wg-burst-small-cores-max", stats["suggested_wg_burst_small_cores_max"]),
+            _maybe_int_flag("--wg-burst-heavy-jobs-min", stats["suggested_wg_burst_heavy_jobs_min"]),
+            _maybe_int_flag("--wg-burst-heavy-jobs-max", stats["suggested_wg_burst_heavy_jobs_max"]),
+            _maybe_int_flag("--wg-burst-heavy-duration-min", stats["suggested_wg_burst_heavy_duration_min"]),
+            _maybe_int_flag("--wg-burst-heavy-duration-max", stats["suggested_wg_burst_heavy_duration_max"]),
+            _maybe_int_flag("--wg-burst-heavy-nodes-min", stats["suggested_wg_burst_heavy_nodes_min"]),
+            _maybe_int_flag("--wg-burst-heavy-nodes-max", stats["suggested_wg_burst_heavy_nodes_max"]),
+            _maybe_int_flag("--wg-burst-heavy-cores-min", stats["suggested_wg_burst_heavy_cores_min"]),
+            _maybe_int_flag("--wg-burst-heavy-cores-max", stats["suggested_wg_burst_heavy_cores_max"]),
+        ]
+        print("suggested_flags: " + " ".join(p for p in flag_parts if p))
Based on learnings: Train.py must support workload generator burst parameters, so emitted suggested flags should remain CLI-valid.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@data/workload_statistics/analyze_workload_logs.py` around lines 683 - 701,
The generated suggested_flags block emits "nan" for unset integer stats; update
the code that constructs the f-string (the block using fmt_suggested_int and
stats[...] keys like suggested_wg_burst_small_jobs_min,
suggested_wg_burst_heavy_jobs_min, etc.) to omit any integer flag whose value is
not finite: check each numeric value with math.isfinite (or numpy.isfinite)
before appending its "--wg-..." fragment and only call fmt_suggested_int when
finite; for probabilities you can keep the current formatting but for all
"*_min", "*_max", "*_nodes*", "*_cores*" fields skip adding the flag when the
stat is non-finite (alternatively substitute a safe integer default), ensuring
the produced suggested_flags string is CLI-valid.

@enlorenz enlorenz marked this pull request as draft March 3, 2026 09:44
@enlorenz enlorenz marked this pull request as ready for review March 3, 2026 10:15
Enis Lorenz and others added 8 commits March 3, 2026 12:43
…aluation mode as capacity/occupancy of cluster.
…endlines

- add `analyze_lambda_occupancy.py` to run `train.py` in evaluation mode across Poisson lambdas
- default run matches the provided training/evaluation flags (12-month evaluation, model/session/workload defaults)
- parse per-episode `Agent Occupancy (Nodes)` and cumulative savings from `train.py` output
- convert cumulative savings to per-episode savings and compute mean/std per lambda
- implement adaptive lambda selection to cover occupancy range with minimal points, with `--lambdas` override
- fit 3rd-order polynomial curves for:
  - lambda vs occupancy
  - occupancy vs savings
  - occupancy vs savings_off
- export artifacts: `summary.csv`, `summary.json`, `trendlines.png`, plus per-lambda logs
- add optional `--echo-train-output` and headless plotting support via matplotlib `Agg`
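The cumulative-to-per-episode conversion and the 3rd-order trend fits listed above can be sketched with numpy. Function names here are illustrative, not the script's:

```python
import numpy as np

def per_episode_savings(cumulative: list[float]) -> np.ndarray:
    """Differencing a cumulative savings series yields per-episode savings."""
    # Prepend 0 so the first episode's saving equals its cumulative value.
    return np.diff(np.asarray(cumulative, dtype=float), prepend=0.0)

def fit_trend(x: np.ndarray, y: np.ndarray, degree: int = 3) -> np.poly1d:
    """Least-squares polynomial fit, as used for the lambda/occupancy curves."""
    return np.poly1d(np.polyfit(x, y, degree))
```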

Add: extra plots, showing completion rate vs lambda and effective savings
Add: Fit now toggled through parser
Co-authored-by: Alexey Rybalchenko <alexryba@gmail.com>
Co-authored-by: Alexey Rybalchenko <alexryba@gmail.com>
Co-authored-by: Alexey Rybalchenko <alexryba@gmail.com>
Add: Fit now toggled through parser
@rbx rbx merged commit 3b7adec into FairRootGroup:master Mar 3, 2026
3 of 4 checks passed
@enlorenz enlorenz deleted the analyze branch March 3, 2026 12:16