
Add standalone script to analyze savings and lambda dependency #28

Merged
rbx merged 8 commits into FairRootGroup:master from enlorenz:analyze
Mar 3, 2026

Conversation

@enlorenz
Contributor

@enlorenz enlorenz commented Mar 3, 2026

No description provided.

@coderabbitai
Contributor

coderabbitai bot commented Mar 3, 2026

Caution

Review failed

Pull request was closed or merged during review

📝 Walkthrough

Adds a new lambda-sweep analysis tool (analyze_lambda_occupancy.py) that runs train.py in evaluation mode across lambda values and aggregates per-episode occupancy/savings metrics. Also extends per-episode core usage instrumentation (baseline.py, environment.py, metrics_tracker.py), enhances train.py evaluation logging with occupancy in cores/nodes, and improves workload log analysis with quantile-based burst suggestions.

Changes

Cohort / File(s) Summary
Lambda Sweep Analysis Tool
analyze_lambda_occupancy.py
New module (≈745 LOC) implementing LambdaRunStats, adaptive lambda scheduling, train.py orchestration, stdout regex parsing for episode/wait metrics, safe stats/derived metrics, CSV/JSON output, and multi-panel plotting with optional polynomial fits.
Workload Statistics Analysis
data/workload_statistics/analyze_workload_logs.py, data/workload_statistics/workload_logs.txt
Added quantile-based burst boundary parsing/suggestion utilities, cores normalization, new CLI flags (--burst-event-quantile, --burst-range-quantiles), expanded output fields for suggested small/heavy burst ranges and burst detection metadata, and regenerated workload log outputs.
Core Metrics Instrumentation
src/baseline.py, src/environment.py, src/metrics_tracker.py, train.py
Baseline step now computes and returns num_used_nodes and num_used_cores (return expanded from 4-tuple to 6-tuple); environment and metrics tracker record per-episode episode_used_cores, episode_baseline_used_cores, and episode_baseline_used_nodes; train.py logs agent/baseline occupancy in cores and nodes.
Tests / Visualization
test/test_inspect_workloadgen.py
Conditional logarithmic y-scale added for duration and cores histograms when heavy-burst probability > 0.0; minor test header update.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Analyzer as analyze_lambda_occupancy.py
    participant Trainer as train.py
    participant Env as environment.py
    participant Baseline as baseline.py
    participant Metrics as metrics_tracker.py

    User->>Analyzer: start lambda sweep CLI
    Analyzer->>Analyzer: build lambda schedule (explicit or adaptive)
    loop for each lambda
        Analyzer->>Trainer: spawn train.py --eval --lambda=X
        Trainer->>Env: run episode loop
        Env->>Baseline: baseline_step(...)
        Baseline-->>Env: return (baseline_cost, baseline_cost_off, baseline_next_empty_slot, next_job_id, baseline_num_used_nodes, baseline_num_used_cores)
        Env->>Metrics: append episode_used_cores / baseline variants
        Metrics-->>Env: updated per-episode lists
        Trainer->>Trainer: emit evaluation stdout with occupancy/savings/wait lines
        Trainer-->>Analyzer: stdout captured
        Analyzer->>Analyzer: parse episode metrics + wait summary, compute LambdaRunStats
    end
    Analyzer->>Analyzer: aggregate, fit trends, write CSV/JSON and plot
    Analyzer->>User: produce summary files and visualizations
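The spawn-and-capture loop in the diagram above could look roughly like this. The flag names (--eval, --lambda) and the minimal LambdaRunStats shape are assumptions for illustration; the real script builds a longer command line and parses the captured stdout afterwards:

```python
import subprocess
import sys
from dataclasses import dataclass, field

@dataclass
class LambdaRunStats:
    lam: int
    stdout_lines: list[str] = field(default_factory=list)

def build_command(train_script: str, lam: int) -> list[str]:
    # Illustrative flags only; consult train.py --help for the real interface.
    return [sys.executable, train_script, "--eval", f"--lambda={lam}"]

def run_sweep(lambdas: list[int], train_script: str = "train.py") -> dict[int, LambdaRunStats]:
    """Run one evaluation per lambda and keep its stdout for later parsing."""
    results: dict[int, LambdaRunStats] = {}
    for lam in lambdas:
        completed = subprocess.run(
            build_command(train_script, lam),
            capture_output=True, text=True, check=False,
            timeout=3600,  # guard against a stalled train.py, per the review note below
        )
        results[lam] = LambdaRunStats(lam, completed.stdout.splitlines())
    return results
```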
sequenceDiagram
    participant Trainer as train.py
    participant Env as environment.py
    participant Baseline as baseline.py
    participant Metrics as metrics_tracker.py

    Trainer->>Env: begin episode
    Env->>Metrics: reset_episode_metrics()
    Metrics-->>Env: initialized episode_used_cores and baseline lists

    loop per step
        Env->>Baseline: baseline_step(...)
        Baseline-->>Env: return baseline_num_used_nodes, baseline_num_used_cores (part of 6-tuple)
        Env->>Metrics: append episode_used_cores, episode_baseline_used_nodes, episode_baseline_used_cores
    end

    Trainer->>Metrics: query episode aggregates
    Metrics-->>Trainer: provide per-episode core/node usage
    Trainer->>Trainer: compute/log occupancy in cores and nodes
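The reset/append flow in this diagram boils down to per-episode lists that are cleared at episode start and extended during the episode. A small sketch, using field names from the diagram (record_step is a hypothetical helper for illustration; in the real code environment.py appends to the tracker's lists, and the actual MetricsTracker holds many more fields):

```python
class MetricsTracker:
    """Tracks agent and baseline core/node usage within one episode (sketch)."""

    def __init__(self) -> None:
        self.reset_episode_metrics()

    def reset_episode_metrics(self) -> None:
        # Cleared by environment.py at the start of every episode.
        self.episode_used_cores: list[int] = []
        self.episode_baseline_used_nodes: list[int] = []
        self.episode_baseline_used_cores: list[int] = []

    def record_step(self, used_cores: int, baseline_nodes: int, baseline_cores: int) -> None:
        # Values come from the expanded baseline_step return tuple.
        self.episode_used_cores.append(used_cores)
        self.episode_baseline_used_nodes.append(baseline_nodes)
        self.episode_baseline_used_cores.append(baseline_cores)
```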

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

  • Docstring Coverage (⚠️ Warning): docstring coverage is 8.11%, below the required 80.00% threshold. Resolution: write docstrings for the functions missing them.
  • Description check (❓ Inconclusive): no pull request description was provided, so relatedness to the changeset could not be evaluated. Resolution: add a description explaining the purpose of the new analyze_lambda_occupancy.py script and how the supporting changes relate to the overall objective.
✅ Passed checks (1 passed)
  • Title check (✅ Passed): the title accurately describes the main change, adding a standalone script (analyze_lambda_occupancy.py) to analyze savings and lambda dependency.


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/baseline.py (2)

54-55: ⚠️ Potential issue | 🟡 Minor

Docstring is outdated.

The docstring states the function returns a 4-tuple, but it now returns 6 values including num_used_nodes and num_used_cores.

📝 Proposed fix
     Returns:
-        Tuple of (baseline_cost, baseline_cost_off, updated baseline_next_empty_slot, updated next_job_id)
+        Tuple of (baseline_cost, baseline_cost_off, updated baseline_next_empty_slot, updated next_job_id, num_used_nodes, num_used_cores)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/baseline.py` around lines 54 - 55, The docstring in the baseline
calculation function is outdated: it still documents a 4-tuple return but the
function now returns six values (baseline_cost, baseline_cost_off,
baseline_next_empty_slot, next_job_id, num_used_nodes, num_used_cores). Update
the Returns section of the docstring in src/baseline.py where the function
(e.g., the baseline calculation function) is defined to list all six returned
values in the correct order and briefly describe each (including num_used_nodes
and num_used_cores) so the docstring matches the actual return tuple.

33-33: ⚠️ Potential issue | 🟡 Minor

Return type annotation is inconsistent with actual return.

The function now returns a 6-tuple (baseline_cost, baseline_cost_off, baseline_next_empty_slot, next_job_id, num_used_nodes, num_used_cores) but the type hint still shows tuple[float, float, int, int].

🔧 Proposed fix
-) -> tuple[float, float, int, int]:
+) -> tuple[float, float, int, int, int, int]:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/baseline.py` at line 33, The return type annotation is out of sync: the
function returns six values (baseline_cost, baseline_cost_off,
baseline_next_empty_slot, next_job_id, num_used_nodes, num_used_cores) but the
signature currently declares tuple[float, float, int, int]; update the
function's return type annotation to reflect six elements (e.g., tuple[float,
float, int, int, int, int]) in the function definition in src/baseline.py so
callers and type checkers see the correct shape.
🧹 Nitpick comments (7)
analyze_lambda_occupancy.py (3)

374-374: Dynamically attached method relies on implementation detail.

evaluate.cache_keys is attached at line 714, but select_lambda_schedule calls it at line 374. This coupling is fragile—if someone calls select_lambda_schedule with a different callable, it will fail with AttributeError.

Consider either:

  1. Passing the cache explicitly as a parameter
  2. Documenting the expected interface in the function signature using a Protocol
♻️ Alternative: pass cache explicitly
 def select_lambda_schedule(
     args: argparse.Namespace,
     evaluate: Callable[[int], LambdaRunStats],
+    cache: dict[int, LambdaRunStats],
 ) -> list[int]:

Then replace evaluate.cache_keys() with cache.keys() throughout the function.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@analyze_lambda_occupancy.py` at line 374, select_lambda_schedule currently
depends on a dynamically attached method evaluate.cache_keys which is fragile;
change the function to accept the cache explicitly (e.g., add a parameter named
cache) and replace all uses of evaluate.cache_keys() with cache.keys(), or
alternatively declare an explicit Protocol type describing a keys() method and
annotate the cache parameter (or evaluate) in select_lambda_schedule's signature
so callers must provide a compatible object; update all call sites to pass the
cache object and adjust any imports/typing accordingly.

498-498: Add strict=True to zip() to catch length mismatches.

If x, y, and point_colors have different lengths, silent truncation could cause subtle bugs. Adding strict=True will raise an error if the iterables have different lengths.

♻️ Proposed fix
-        for i, (xi, yi, c) in enumerate(zip(x, y, point_colors)):
+        for i, (xi, yi, c) in enumerate(zip(x, y, point_colors, strict=True)):
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@analyze_lambda_occupancy.py` at line 498, The loop over x, y, and
point_colors silently truncates if lengths differ; update the enumerate(zip(x,
y, point_colors)) call in analyze_lambda_occupancy.py (the loop that
destructures into xi, yi, c) to use strict=True (i.e., zip(x, y, point_colors,
strict=True)) so a ValueError is raised on length mismatch and the issue
surfaces immediately.

268-283: Subprocess invocation is safe but consider adding timeout.

The command is built from controlled argparse inputs, so injection risk is low. However, subprocess.run without a timeout could hang indefinitely if train.py stalls.

⏱️ Proposed fix: add timeout
     completed = subprocess.run(
         command,
         cwd=str(project_root),
         capture_output=True,
         text=True,
         check=False,
+        timeout=3600,  # 1 hour timeout per lambda evaluation
     )

You'll need to handle subprocess.TimeoutExpired in the caller.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@analyze_lambda_occupancy.py` around lines 268 - 283, The subprocess.run call
that executes `command` (producing `completed`) needs a timeout to avoid hangs:
add a sensible timeout argument to the `subprocess.run(...)` invocation and
catch `subprocess.TimeoutExpired` around that call; on timeout, include context
(the `lambda_value`, the command, and a snippet via `os_tail`) and raise or
propagate a `RuntimeError` similar to the existing non-zero exit handling so the
caller (which already expects errors) can handle it; keep existing behavior for
printing (`args.echo_train_output`) and returncode handling intact.
src/metrics_tracker.py (1)

82-88: Inconsistent type annotations.

Type annotations were removed from some episode lists (lines 82-88) while other lists in the same method retain them (lines 90-95). This creates inconsistency. Consider either adding type annotations to all new fields or documenting why they were intentionally omitted.

♻️ Proposed fix for consistency
         # Time series data for plotting (episode)
-        self.episode_on_nodes = []
-        self.episode_used_nodes = []
-        self.episode_used_cores = []
-        self.episode_baseline_used_nodes = []
-        self.episode_baseline_used_cores = []
-        self.episode_job_queue_sizes = []
-        self.episode_price_stats = []
+        self.episode_on_nodes: list[int] = []
+        self.episode_used_nodes: list[int] = []
+        self.episode_used_cores: list[int] = []
+        self.episode_baseline_used_nodes: list[int] = []
+        self.episode_baseline_used_cores: list[int] = []
+        self.episode_job_queue_sizes: list[int] = []
+        self.episode_price_stats: list[float] = []
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/metrics_tracker.py` around lines 82 - 88, The new episode list attributes
(episode_on_nodes, episode_used_nodes, episode_used_cores,
episode_baseline_used_nodes, episode_baseline_used_cores,
episode_job_queue_sizes, episode_price_stats) lack type annotations and should
be annotated consistently with the other episode fields in the class (e.g., the
annotated lists declared on lines 90-95); update each of these attributes in the
MetricsTracker class (or the enclosing init method) to include the same list
type hints used elsewhere (matching types like List[int], List[float], or
List[List[int]] as appropriate) so all episode fields use consistent type
annotations.
train.py (1)

227-229: Consider extracting occupancy calculations for readability.

The inline expressions are correct but dense. Extracting them into local variables before the print statement would improve readability.

✨ Optional refactor for readability
+            agent_occ_cores = env.metrics.episode_used_cores[-1] * 100 / (CORES_PER_NODE * MAX_NODES) if env.metrics.episode_used_cores else 0
+            baseline_occ_cores = env.metrics.episode_baseline_used_cores[-1] * 100 / (CORES_PER_NODE * MAX_NODES) if env.metrics.episode_baseline_used_cores else 0
+            agent_occ_nodes = env.metrics.episode_used_nodes[-1] * 100 / MAX_NODES if env.metrics.episode_used_nodes else 0
+            baseline_occ_nodes = env.metrics.episode_baseline_used_nodes[-1] * 100 / MAX_NODES if env.metrics.episode_baseline_used_nodes else 0
             print(f"  Episode {episode + 1}: "
                 ...
                 f"MaxQueue={env.metrics.max_queue_size_reached}, "
-                f"Agent Occupancy (Cores)={env.metrics.episode_used_cores[-1]*100/(CORES_PER_NODE*MAX_NODES)if env.metrics.episode_used_cores else 0 :.2f}%, Baseline Occupancy (Cores)={env.metrics.episode_baseline_used_cores[-1]*100/(CORES_PER_NODE*MAX_NODES) if env.metrics.episode_baseline_used_cores else 0 :.2f}% "
-                f"Agent Occupancy (Nodes)={env.metrics.episode_used_nodes[-1]*100/MAX_NODES if env.metrics.episode_used_nodes else 0 :.2f}%, Baseline Occupancy (Nodes)={env.metrics.episode_baseline_used_nodes[-1]*100/MAX_NODES if env.metrics.episode_baseline_used_nodes else 0 :.2f}% " )
+                f"Agent Occupancy (Cores)={agent_occ_cores:.2f}%, Baseline Occupancy (Cores)={baseline_occ_cores:.2f}% "
+                f"Agent Occupancy (Nodes)={agent_occ_nodes:.2f}%, Baseline Occupancy (Nodes)={baseline_occ_nodes:.2f}%")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@train.py` around lines 227 - 229, Extract the dense inline occupancy
calculations into clearly named local variables before the print call: read
max_queue = env.metrics.max_queue_size_reached, compute agent_cores_pct and
baseline_cores_pct using env.metrics.episode_used_cores and
env.metrics.episode_baseline_used_cores with the existing empty-list guards and
CORES_PER_NODE and MAX_NODES, and compute agent_nodes_pct and baseline_nodes_pct
from env.metrics.episode_used_nodes and env.metrics.episode_baseline_used_nodes
with the existing guards and MAX_NODES; then use those variables in the f-string
instead of the long inline expressions to improve readability in train.py where
the print/log is emitted.
data/workload_statistics/analyze_workload_logs.py (2)

591-596: Validate --burst-event-quantile at parse time.

Right now invalid values are silently clamped downstream. Rejecting out-of-range input (<0 or >1) in argparse avoids hidden misconfiguration.

🔧 Suggested parser hardening
+def parse_quantile(raw: str) -> float:
+    try:
+        q = float(raw)
+    except ValueError as exc:
+        raise argparse.ArgumentTypeError(f"Invalid quantile '{raw}'") from exc
+    if not (0.0 <= q <= 1.0):
+        raise argparse.ArgumentTypeError("Quantile must be in [0,1]")
+    return q
+
 ...
-    parser.add_argument(
-        "--burst-event-quantile",
-        type=float,
+    parser.add_argument(
+        "--burst-event-quantile",
+        type=parse_quantile,
         default=0.75,
         help="Quantile of positive baseline-adjusted hourly counts used to isolate burst hours (default: 0.75).",
     )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@data/workload_statistics/analyze_workload_logs.py` around lines 591 - 596,
The --burst-event-quantile parser entry currently allows out-of-range floats;
validate it at parse time by adding an argparse-level check: implement and use a
small validator (e.g., validate_quantile) as the type or call a post-parse check
for the argument from parser.add_argument("--burst-event-quantile", ...),
ensuring values are between 0.0 and 1.0 and raising argparse.ArgumentTypeError
(or exiting with parser.error) for invalid input so bad configs are rejected
rather than silently clamped downstream.

306-319: Narrow the exception scope in row parsing.

Catching Exception here can hide coding errors and skew skipped_rows silently. Limit this to expected parse/data errors.

🧭 Suggested exception narrowing
-        except Exception:
+        except (KeyError, ValueError, TypeError):
             skipped += 1
             return
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@data/workload_statistics/analyze_workload_logs.py` around lines 306 - 319,
The broad except Exception in the consume function should be narrowed to only
expected parse/data errors to avoid hiding bugs; replace the generic except with
a specific catch for parsing and missing-key issues (e.g., except (KeyError,
ValueError, TypeError): skipped += 1; return) while letting other exceptions
propagate so real coding errors surface; reference the consume function and the
calls/operations parse_submit_hour, parse_duration_hours, float(...) on
row[submit_col], row[duration_col], row[nodes_col], row[cores_col], and the
cores_is_total/node_count branch when making the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@analyze_lambda_occupancy.py`:
- Line 717: The list comprehension using an ambiguous loop variable `l`
(stats_ordered = [cache[l] for l in sorted(selected_lambdas)]) should use a
clearer name; change `l` to a descriptive identifier such as `lambda_name` or
`lambda_id` and update the comprehension to stats_ordered = [cache[lambda_name]
for lambda_name in sorted(selected_lambdas)]; also scan the surrounding scope
for any other uses of `l` related to selected_lambdas and rename them
consistently to avoid confusion with the digit `1`.


ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 22e32ef and 334d04d.

📒 Files selected for processing (8)
  • analyze_lambda_occupancy.py
  • data/workload_statistics/analyze_workload_logs.py
  • data/workload_statistics/workload_logs.txt
  • src/baseline.py
  • src/environment.py
  • src/metrics_tracker.py
  • test/test_inspect_workloadgen.py
  • train.py

Comment on lines 683 to +701
"suggested_flags: "
f"--wg-burst-small-prob {stats['suggested_wg_burst_small_prob']:.4f} "
f"--wg-burst-heavy-prob {stats['suggested_wg_burst_heavy_prob']:.4f}"
f"--wg-burst-heavy-prob {stats['suggested_wg_burst_heavy_prob']:.4f} "
f"--wg-burst-small-jobs-min {fmt_suggested_int(stats['suggested_wg_burst_small_jobs_min'])} "
f"--wg-burst-small-jobs-max {fmt_suggested_int(stats['suggested_wg_burst_small_jobs_max'])} "
f"--wg-burst-small-duration-min {fmt_suggested_int(stats['suggested_wg_burst_small_duration_min'])} "
f"--wg-burst-small-duration-max {fmt_suggested_int(stats['suggested_wg_burst_small_duration_max'])} "
f"--wg-burst-small-nodes-min {fmt_suggested_int(stats['suggested_wg_burst_small_nodes_min'])} "
f"--wg-burst-small-nodes-max {fmt_suggested_int(stats['suggested_wg_burst_small_nodes_max'])} "
f"--wg-burst-small-cores-min {fmt_suggested_int(stats['suggested_wg_burst_small_cores_min'])} "
f"--wg-burst-small-cores-max {fmt_suggested_int(stats['suggested_wg_burst_small_cores_max'])} "
f"--wg-burst-heavy-jobs-min {fmt_suggested_int(stats['suggested_wg_burst_heavy_jobs_min'])} "
f"--wg-burst-heavy-jobs-max {fmt_suggested_int(stats['suggested_wg_burst_heavy_jobs_max'])} "
f"--wg-burst-heavy-duration-min {fmt_suggested_int(stats['suggested_wg_burst_heavy_duration_min'])} "
f"--wg-burst-heavy-duration-max {fmt_suggested_int(stats['suggested_wg_burst_heavy_duration_max'])} "
f"--wg-burst-heavy-nodes-min {fmt_suggested_int(stats['suggested_wg_burst_heavy_nodes_min'])} "
f"--wg-burst-heavy-nodes-max {fmt_suggested_int(stats['suggested_wg_burst_heavy_nodes_max'])} "
f"--wg-burst-heavy-cores-min {fmt_suggested_int(stats['suggested_wg_burst_heavy_cores_min'])} "
f"--wg-burst-heavy-cores-max {fmt_suggested_int(stats['suggested_wg_burst_heavy_cores_max'])}"
Contributor

⚠️ Potential issue | 🟠 Major

Do not emit nan in suggested CLI flags.

Lines 686-701 can print flags like --wg-burst-heavy-jobs-min nan. Copy-pasting this into training commands will break integer flag parsing. Omit unset integer flags (or substitute defaults) when values are non-finite.

💡 Suggested fix (omit non-finite integer flags)
+def _maybe_int_flag(name: str, value: float) -> str:
+    if not math.isfinite(value):
+        return ""
+    return f"{name} {fmt_suggested_int(value)}"
+
 ...
-        print(
-            "suggested_flags: "
-            f"--wg-burst-small-prob {stats['suggested_wg_burst_small_prob']:.4f} "
-            f"--wg-burst-heavy-prob {stats['suggested_wg_burst_heavy_prob']:.4f} "
-            f"--wg-burst-small-jobs-min {fmt_suggested_int(stats['suggested_wg_burst_small_jobs_min'])} "
-            ...
-            f"--wg-burst-heavy-cores-max {fmt_suggested_int(stats['suggested_wg_burst_heavy_cores_max'])}"
-        )
+        flag_parts = [
+            f"--wg-burst-small-prob {stats['suggested_wg_burst_small_prob']:.4f}",
+            f"--wg-burst-heavy-prob {stats['suggested_wg_burst_heavy_prob']:.4f}",
+            _maybe_int_flag("--wg-burst-small-jobs-min", stats["suggested_wg_burst_small_jobs_min"]),
+            _maybe_int_flag("--wg-burst-small-jobs-max", stats["suggested_wg_burst_small_jobs_max"]),
+            _maybe_int_flag("--wg-burst-small-duration-min", stats["suggested_wg_burst_small_duration_min"]),
+            _maybe_int_flag("--wg-burst-small-duration-max", stats["suggested_wg_burst_small_duration_max"]),
+            _maybe_int_flag("--wg-burst-small-nodes-min", stats["suggested_wg_burst_small_nodes_min"]),
+            _maybe_int_flag("--wg-burst-small-nodes-max", stats["suggested_wg_burst_small_nodes_max"]),
+            _maybe_int_flag("--wg-burst-small-cores-min", stats["suggested_wg_burst_small_cores_min"]),
+            _maybe_int_flag("--wg-burst-small-cores-max", stats["suggested_wg_burst_small_cores_max"]),
+            _maybe_int_flag("--wg-burst-heavy-jobs-min", stats["suggested_wg_burst_heavy_jobs_min"]),
+            _maybe_int_flag("--wg-burst-heavy-jobs-max", stats["suggested_wg_burst_heavy_jobs_max"]),
+            _maybe_int_flag("--wg-burst-heavy-duration-min", stats["suggested_wg_burst_heavy_duration_min"]),
+            _maybe_int_flag("--wg-burst-heavy-duration-max", stats["suggested_wg_burst_heavy_duration_max"]),
+            _maybe_int_flag("--wg-burst-heavy-nodes-min", stats["suggested_wg_burst_heavy_nodes_min"]),
+            _maybe_int_flag("--wg-burst-heavy-nodes-max", stats["suggested_wg_burst_heavy_nodes_max"]),
+            _maybe_int_flag("--wg-burst-heavy-cores-min", stats["suggested_wg_burst_heavy_cores_min"]),
+            _maybe_int_flag("--wg-burst-heavy-cores-max", stats["suggested_wg_burst_heavy_cores_max"]),
+        ]
+        print("suggested_flags: " + " ".join(p for p in flag_parts if p))
Based on learnings: Train.py must support workload generator burst parameters, so emitted suggested flags should remain CLI-valid.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@data/workload_statistics/analyze_workload_logs.py` around lines 683 - 701,
The generated suggested_flags block emits "nan" for unset integer stats; update
the code that constructs the f-string (the block using fmt_suggested_int and
stats[...] keys like suggested_wg_burst_small_jobs_min,
suggested_wg_burst_heavy_jobs_min, etc.) to omit any integer flag whose value is
not finite: check each numeric value with math.isfinite (or numpy.isfinite)
before appending its "--wg-..." fragment and only call fmt_suggested_int when
finite; for probabilities you can keep the current formatting but for all
"*_min", "*_max", "*_nodes*", "*_cores*" fields skip adding the flag when the
stat is non-finite (alternatively substitute a safe integer default), ensuring
the produced suggested_flags string is CLI-valid.

@enlorenz enlorenz marked this pull request as draft March 3, 2026 09:44
@enlorenz enlorenz marked this pull request as ready for review March 3, 2026 10:15
Enis Lorenz and others added 8 commits March 3, 2026 12:43
…aluation mode as capacity/occupancy of cluster.
…endlines

- add `analyze_lambda_occupancy.py` to run `train.py` in evaluation mode across Poisson lambdas
- default run matches the provided training/evaluation flags (12-month evaluation, model/session/workload defaults)
- parse per-episode `Agent Occupancy (Nodes)` and cumulative savings from `train.py` output
- convert cumulative savings to per-episode savings and compute mean/std per lambda
- implement adaptive lambda selection to cover occupancy range with minimal points, with `--lambdas` override
- fit 3rd-order polynomial curves for:
  - lambda vs occupancy
  - occupancy vs savings
  - occupancy vs savings_off
- export artifacts: `summary.csv`, `summary.json`, `trendlines.png`, plus per-lambda logs
- add optional `--echo-train-output` and headless plotting support via matplotlib `Agg`
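The cumulative-to-per-episode conversion and the 3rd-order trend fits listed above can be sketched with numpy. Function names here are illustrative, not the script's:

```python
import numpy as np

def per_episode_savings(cumulative: list[float]) -> np.ndarray:
    """Differencing a cumulative savings series yields per-episode savings."""
    # Prepend 0 so the first episode's saving equals its cumulative value.
    return np.diff(np.asarray(cumulative, dtype=float), prepend=0.0)

def fit_trend(x: np.ndarray, y: np.ndarray, degree: int = 3) -> np.poly1d:
    """Least-squares polynomial fit, as used for the lambda/occupancy curves."""
    return np.poly1d(np.polyfit(x, y, degree))
```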

Add: extra plots, showing completion rate vs lambda and effective savings
Add: Fit now toggled through parser
Co-authored-by: Alexey Rybalchenko <alexryba@gmail.com>
Co-authored-by: Alexey Rybalchenko <alexryba@gmail.com>
Co-authored-by: Alexey Rybalchenko <alexryba@gmail.com>
Add: Fit now toggled through parser
@rbx rbx merged commit 3b7adec into FairRootGroup:master Mar 3, 2026
3 of 4 checks passed
@enlorenz enlorenz deleted the analyze branch March 3, 2026 12:16