feat: remote stage in#468
Conversation
|
Warning Review limit reached
More reviews will be available in 23 minutes and 11 seconds. Learn how PR review limits work. Your organization has used up its prepaid credits, and credit purchases are no longer available. Enable the review add-on in the billing tab to keep reviews running — you're only billed for reviews past your plan's rate limits ($0.25/file). ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based credits. 🚦 How do rate limits work?CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan refill rate. For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, the refill rate gradually slows as usage increases. The highest same-day bursts are limited more strictly. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (3)
WalkthroughThe PR adds automatic node-local stage-in support to the SLURM executor plugin. Two new utility functions ( ChangesAutomatic Node-Local Stage-In Feature
CI Black Debug Step
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
Good, only the dependency on the jobstep executor is not working - needs a release, first. Now:
|
|
@johanneskoester & @fbartusch first, the corresponding jobstep PR needs merging for the tests to be succeeding (or be refined as nessary). |
There was a problem hiding this comment.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tests/test_stagein.py (1)
114-124:⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift
test_stagein_sshassertions are too weak to validate SSH behavior.Lines 121-123 only verify output/log presence and a broad substring. That can pass without proving SSH stage-in happened or that the expected “SSH unavailable” error path is handled. Please assert explicit stage-in mechanism evidence (or explicit failure messaging) in
ssh.log.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/test_stagein.py` around lines 114 - 124, The assertions in test_stagein_ssh only verify that files exist and check for a broad substring in size_file, but do not validate the actual SSH stage-in behavior or proper error handling. Replace or supplement the weak assertions (lines checking size_file.exists(), ssh_log.exists(), and the "data/big" substring check) with stronger assertions that explicitly verify SSH behavior evidence or expected error messaging in ssh.log. Specifically, read the contents of ssh.log and assert for specific SSH mechanism details or expected failure messages that prove either successful SSH stage-in occurred or that SSH unavailability was properly handled.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.github/workflows/ci.yml:
- Around line 27-31: The "Debug black environment" step runs black with the
--check flag, which causes it to exit with a non-zero code when formatting
issues are found, blocking subsequent steps from executing. To make this debug
step non-blocking while preserving diagnostic output, add `continue-on-error:
true` as a property of the step (at the same indentation level as the name and
run keys), or alternatively remove the --check flag from the black command and
keep only the diagnostic outputs (version and diff). This allows the subsequent
"Check formatting" step to always execute and report the official formatting
check result.
In `@snakemake_executor_plugin_slurm/__init__.py`:
- Around line 697-715: The get_file_size function call in the job input file
validation loop is incorrect. The function expects a JobExecutorInterface
parameter but is being passed inp.path instead, which will cause a runtime
error. Replace the get_file_size(inp.path) call with the correct invocation that
passes the job object. Additionally, update the warning message to reflect the
correct unit of measurement returned by get_file_size (GB) instead of bytes,
since the function returns rounded GB values not byte counts.
In `@tests/testcases/stagein_ssh/Snakefile`:
- Around line 17-19: The default value assigned to size_bytes (approximately 2
GiB) is below the 4 GB threshold needed to trigger the scp stage-in path for
SSH. According to the documentation, sbcast handles files ≤ 4 GB while scp
handles files > 4 GB. Increase the default value in the os.environ.get call for
"STAGEIN_TEST_BIG_BYTES" from 2 * 1024 * 1024 * 1024 + 1 to a value larger than
4 GB (such as 4 * 1024 * 1024 * 1024 + 1 or greater) to ensure the test
exercises the scp path as intended.
---
Outside diff comments:
In `@tests/test_stagein.py`:
- Around line 114-124: The assertions in test_stagein_ssh only verify that files
exist and check for a broad substring in size_file, but do not validate the
actual SSH stage-in behavior or proper error handling. Replace or supplement the
weak assertions (lines checking size_file.exists(), ssh_log.exists(), and the
"data/big" substring check) with stronger assertions that explicitly verify SSH
behavior evidence or expected error messaging in ssh.log. Specifically, read the
contents of ssh.log and assert for specific SSH mechanism details or expected
failure messages that prove either successful SSH stage-in occurred or that SSH
unavailability was properly handled.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 4481e155-74c3-40cd-b083-4df76bbc9da7
📒 Files selected for processing (8)
.github/workflows/ci.ymldocs/further.mdsnakemake_executor_plugin_slurm/__init__.pysnakemake_executor_plugin_slurm/utils.pytests/test_node_local_prefix.pytests/test_stagein.pytests/testcases/stagein_sbcast/Snakefiletests/testcases/stagein_ssh/Snakefile
| for job in jobs: | ||
| for inp in job.input: | ||
| if isinstance( | ||
| inp.flags.get(STORE_KEY), AccessPattern | ||
| ) and inp.flags[STORE_KEY] in { | ||
| AccessPattern.RANDOM, | ||
| AccessPattern.MIXED, | ||
| }: | ||
| size = get_file_size(inp.path) | ||
| if size is not None and size > 100: | ||
| self.logger.warning( | ||
| f"Job '{job.name}' has input file '{inp.path}' with " | ||
| f"random or mixed access pattern and size {size} " | ||
| "bytes. Snakemake will attempt to stage in this file " | ||
| "to the node local directory specified by " | ||
| f"{self.workflow.executor_settings.node_local_prefix}. " | ||
| "However, for files of this size the process might be " | ||
| "slow." | ||
| ) |
There was a problem hiding this comment.
Fix get_file_size call contract; current code can crash before submission.
Line 705 passes inp.path into get_file_size, but that helper expects a JobExecutorInterface and iterates job.input_files. This will raise at runtime on RANDOM/MIXED inputs and break job submission. Also, the log text says “bytes” although the helper returns rounded GB.
Proposed fix
- for job in jobs:
- for inp in job.input:
- if isinstance(
- inp.flags.get(STORE_KEY), AccessPattern
- ) and inp.flags[STORE_KEY] in {
- AccessPattern.RANDOM,
- AccessPattern.MIXED,
- }:
- size = get_file_size(inp.path)
- if size is not None and size > 100:
- self.logger.warning(
- f"Job '{job.name}' has input file '{inp.path}' with "
- f"random or mixed access pattern and size {size} "
- "bytes. Snakemake will attempt to stage in this file "
- "to the node local directory specified by "
- f"{self.workflow.executor_settings.node_local_prefix}. "
- "However, for files of this size the process might be "
- "slow."
- )
+ for job in jobs:
+ has_random_or_mixed = any(
+ isinstance(inp.flags.get(STORE_KEY), AccessPattern)
+ and inp.flags[STORE_KEY] in {AccessPattern.RANDOM, AccessPattern.MIXED}
+ for inp in job.input
+ )
+ if has_random_or_mixed:
+ size_gb = get_file_size(job)
+ if size_gb > 100:
+ self.logger.warning(
+ f"Job '{job.name}' has RANDOM/MIXED inputs totaling "
+ f"about {size_gb} GB. Snakemake will attempt node-local "
+ f"stage-in via {self.workflow.executor_settings.node_local_prefix}, "
+ "which may be slow for inputs of this size."
+ )🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@snakemake_executor_plugin_slurm/__init__.py` around lines 697 - 715, The
get_file_size function call in the job input file validation loop is incorrect.
The function expects a JobExecutorInterface parameter but is being passed
inp.path instead, which will cause a runtime error. Replace the
get_file_size(inp.path) call with the correct invocation that passes the job
object. Additionally, update the warning message to reflect the correct unit of
measurement returned by get_file_size (GB) instead of bytes, since the function
returns rounded GB values not byte counts.
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
…-executor-plugin-slurm into feat/remote_stage_in
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
snakemake_executor_plugin_slurm/__init__.py (1)
698-716:⚠️ Potential issue | 🔴 Critical | ⚡ Quick winCritical: Variable shadowing, redundant loop, and API mismatch break stage-in detection.
The current code has multiple issues:
Variable shadowing: The outer
for inp in job.input:(line 698) is immediately shadowed by the innerany(...for inp in job.input)(line 703), making the outer loop'sinpvalue unpredictable.Redundant outer loop:
has_random_or_mixedis computed identically on every iteration since theany()always iterates over all inputs.Wrong variable after loop: At line 706,
inpreferences the last input from the outer loop, not necessarily one with random/mixed access pattern.API mismatch:
get_file_size(inp.path)will fail at runtime—the function expects aJobExecutorInterface, not a path (seeutils.pycontract).Unit error: Warning text says "bytes" but
get_file_sizereturns rounded GB.Proposed fix
if ( not self.workflow.executor_settings.suppress_auto_stagein and self.workflow.executor_settings.node_local_prefix ): for job in jobs: - for inp in job.input: - has_random_or_mixed = any( - isinstance(inp.flags.get(STORE_KEY), AccessPattern) - and inp.flags[STORE_KEY] - in {AccessPattern.RANDOM, AccessPattern.MIXED} - for inp in job.input - ) - if has_random_or_mixed: - size = get_file_size(inp.path) - if size is not None and size > 100: - self.logger.warning( - f"Job '{job.name}' has input file '{inp.path}' with " - f"random or mixed access pattern and size {size} " - "bytes. Snakemake will attempt to stage in this file " - "to the node local directory specified by " - f"{self.workflow.executor_settings.node_local_prefix}. " - "However, for files of this size the process might be " - "slow." - ) + has_random_or_mixed = any( + isinstance(inp.flags.get(STORE_KEY), AccessPattern) + and inp.flags[STORE_KEY] in {AccessPattern.RANDOM, AccessPattern.MIXED} + for inp in job.input + ) + if has_random_or_mixed: + size_gb = get_file_size(job) + if size_gb is not None and size_gb > 100: + self.logger.warning( + f"Job '{job.name}' has RANDOM/MIXED inputs totaling " + f"about {size_gb} GB. Snakemake will attempt node-local " + f"stage-in via {self.workflow.executor_settings.node_local_prefix}, " + "which may be slow for inputs of this size." + )🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@snakemake_executor_plugin_slurm/__init__.py` around lines 698 - 716, Remove the outer redundant for loop that starts at line 698 and move the has_random_or_mixed check outside to compute it once. Then create a separate loop after the check to iterate only through inputs that have random or mixed access patterns, checking the file size for each matching input. Fix the get_file_size() function call to pass the correct parameters according to its signature in utils.py (it expects a JobExecutorInterface or similar, not just a path). Finally, update the warning message to use the correct unit label that get_file_size actually returns (likely GB instead of bytes) to match the actual return value.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@snakemake_executor_plugin_slurm/__init__.py`:
- Around line 698-716: Remove the outer redundant for loop that starts at line
698 and move the has_random_or_mixed check outside to compute it once. Then
create a separate loop after the check to iterate only through inputs that have
random or mixed access patterns, checking the file size for each matching input.
Fix the get_file_size() function call to pass the correct parameters according
to its signature in utils.py (it expects a JobExecutorInterface or similar, not
just a path). Finally, update the warning message to use the correct unit label
that get_file_size actually returns (likely GB instead of bytes) to match the
actual return value.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: def6cc08-111f-431f-81c7-b6a7e2b33dc8
📒 Files selected for processing (1)
snakemake_executor_plugin_slurm/__init__.py
This PR will add the option for multi-node stage in on SLURM clusters - will only work with a corresponding update of jobstep plugin.
The ultimate goal is to enable pooling of shared memory tasks (e.g. all mapping rules of a workflow within one SLURM job, possibly spanning multiple nodes). This is useful, because:
However, together with the corresponding PR for the jobstep plugin (snakemake/snakemake-executor-plugin-slurm-jobstep#48) this has the additional charm that there is no dependency for the fs storage plugin.
Furthermore, pooling of smp tasks is hardly beneficial if there is no automatic stage-in process as this will inevitably lead to random I/O (which already is an issue as some users do not throttle the submission rate, jobs reading shared reference file and then simple cause random access and impede file system performance).
After this PR, the plugin will use
sbcastfor files <= 4G for multi node stage-in andscpotherwise. The latter relies onsshbeing possible across nodes on a cluster (either by auth based logins on own jobs or otherwise). Currently, there is no fallback tosbcastif files are too big andsshis not working.Summary by CodeRabbit
node-local-prefixandsuppress-auto-stagein.