uv run torchrun resolves to user-site torchrun and bypasses the venv #68

## Summary

`uv run torchrun ...` resolves to `~/.local/bin/torchrun` when a prior `pip install --user torch` exists, and the spawned worker processes inherit the system Python from that binary's shebang rather than the project's `.venv` interpreter. The workers cannot import `kempnerforge`, so every rank crashes with `ModuleNotFoundError` before the training loop starts.

## Repro

Environment:
- Fresh clone with `uv sync` completed; `uv run python -c "import kempnerforge, torch"` succeeds.
- `~/.local/bin/torchrun` present from an earlier `pip install --user` (common on shared HPC accounts).
- `which torchrun` → `~/.local/bin/torchrun`.

Command:

```
uv run torchrun --standalone --nproc_per_node=4 scripts/train.py \
  configs/train/hf_wikitext.toml [overrides...]
```

Result on every rank:

```
File "/n/home10/<user>/.local/bin/torchrun", line 8, in <module>
...
ModuleNotFoundError: No module named 'kempnerforge'
exitcode: 1 (pid: ...) of binary: /n/sw/Miniforge3-25.3.1-0/bin/python3.12
```

The launched binary is the Miniforge base Python, not `.venv/bin/python3`, even though `uv run` was used.

## Root cause

`~/.local/bin/torchrun` is resolved ahead of `.venv/bin/torchrun` under `uv run`, and its shebang targets the system Python (here, the Miniforge base interpreter). The launcher therefore runs outside the venv, and the workers it spawns inherit the launcher's interpreter, so they bypass the venv as well. The venv itself is healthy; only the launcher resolution is wrong.
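
To confirm this diagnosis on another account, a small script can report which launcher wins the `PATH` lookup and which interpreter its shebang names. The following is a minimal sketch, not part of the repository: `read_shebang` and `describe_launcher` are hypothetical helper names, and it assumes the launcher is a text script with a shebang line, as pip-generated console scripts normally are. Run it with `uv run python` so the lookup sees the same environment as the failing command.

```python
from __future__ import annotations

import shutil
import sys
from pathlib import Path


def read_shebang(script: Path) -> str | None:
    """Return the interpreter named on a script's first line, or None if there is no shebang."""
    with open(script, "rb") as fh:
        first = fh.readline().decode("utf-8", errors="replace").strip()
    return first[2:].strip() if first.startswith("#!") else None


def describe_launcher(name: str = "torchrun", path: str | None = None) -> dict:
    """Report which launcher a PATH lookup finds and which interpreter it would run under."""
    found = shutil.which(name, path=path)  # path=None means "use the current PATH"
    shebang = read_shebang(Path(found)) if found else None
    return {"launcher": found, "shebang": shebang, "current_python": sys.executable}


if __name__ == "__main__":
    for key, value in describe_launcher().items():
        print(f"{key}: {value}")
```

If `shebang` and `current_python` name different interpreters, the launcher is running outside the environment that `uv run python` uses, which matches the failure described above.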

## Workaround

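Replace `uv run torchrun` with `uv run python -m torch.distributed.run`. The semantics are equivalent, since `torchrun` is the console-script entry point for `torch.distributed.run`, but invoking the module through `python` forces the venv's interpreter end-to-end (both launcher and workers):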

```
uv run python -m torch.distributed.run --standalone --nproc_per_node=4 \
  scripts/train.py configs/train/hf_wikitext.toml [overrides...]
```
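
The same idea applies when the launch is driven from Python rather than from a shell: building the command around `sys.executable` pins the launcher to whichever interpreter is already running. This is a hedged sketch, with `build_launch_cmd` as a hypothetical helper name; the flags and paths are the ones used in the command above.

```python
from __future__ import annotations

import shlex
import sys


def build_launch_cmd(
    script: str,
    *script_args: str,
    nproc_per_node: int = 4,
    python: str = sys.executable,
) -> list[str]:
    """Build a launch command that runs torch.distributed.run under a chosen interpreter."""
    return [
        python,
        "-m",
        "torch.distributed.run",
        "--standalone",
        f"--nproc_per_node={nproc_per_node}",
        script,
        *script_args,
    ]


if __name__ == "__main__":
    cmd = build_launch_cmd("scripts/train.py", "configs/train/hf_wikitext.toml")
    # Only print the command here; pass it to subprocess.run(cmd, check=True) to launch.
    print(shlex.join(cmd))
```

Because no `PATH` lookup for `torchrun` takes place, a stale `~/.local/bin/torchrun` cannot be picked up by this route.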

## Affected files

- `scripts/slurm/singlenode.sh:67` — uses `uv run torchrun`; will hit this failure for any user with `~/.local/bin/torchrun` ahead of the venv.
- `docs/getting-started/quickstart.md` — multi-GPU step in the quickstart shows `uv run torchrun`.
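
The two files above may not be the only places that use the launcher. A rough way to audit the rest of the tree is to scan for the literal `uv run torchrun` and preview the substitution. The sketch below assumes the invocation appears verbatim on one line; `rewrite_launcher` is a hypothetical helper, and the script only reports matches without modifying any file.

```python
from __future__ import annotations

import re
import sys
from pathlib import Path

# Matches the launcher invocation exactly as it is written in this issue.
LAUNCHER = re.compile(r"\buv run torchrun\b")
REPLACEMENT = "uv run python -m torch.distributed.run"


def rewrite_launcher(text: str) -> tuple[str, int]:
    """Return the text with the launcher swapped, plus the number of substitutions made."""
    return LAUNCHER.subn(REPLACEMENT, text)


if __name__ == "__main__":
    # Report-only: print how many occurrences each given file contains.
    for arg in sys.argv[1:]:
        _, count = rewrite_launcher(Path(arg).read_text())
        if count:
            print(f"{arg}: {count} occurrence(s) of 'uv run torchrun'")
```

For example, `python audit_launcher.py scripts/slurm/singlenode.sh docs/getting-started/quickstart.md` would list the two files named above, where `audit_launcher.py` is an assumed filename for this sketch.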