English | 한국어
vid_to_sub recursively discovers video files and writes subtitle or transcript files next to the source video or into a dedicated output directory. The runtime defaults prefer a GPU-capable backend when the local environment exposes one, and otherwise fall back to CPU transcription through ffmpeg + whisper.cpp. Subtitles can optionally be translated through an OpenAI-compatible API, with an optional post-editing agent pass for cleanup and correction.
Use the Browse tab to add folders or individual files, search under the current root, and choose an output directory before starting a run.
The Setup tab checks for ffmpeg, whisper-cli, GGML models, and optional Python backends, then exposes install/build actions in one place.
The Transcribe tab controls backend, model, device, execution mode, output formats, translation, and advanced overrides.
- Recursive discovery of common video formats such as `mp4`, `mkv`, `mov`, `avi`, `webm`, and `ts`.
- Runtime defaults that prefer `faster-whisper` on detected CUDA, `whisperX` or `openai-whisper` on supported Torch devices, and otherwise fall back to `whisper.cpp` on CPU.
- Optional backends: `faster-whisper`, `openai-whisper`, and `whisperX`.
- Output formats: `srt`, `vtt`, `txt`, `tsv`, `json`, or `all`.
- Optional translation with an OpenAI-compatible chat-completions API while preserving the original subtitle timing boundaries.
- Optional post-editing agent pass with separate model/API settings and an `auto` mode that prefers web lookup when available, then falls back to contextual polishing.
- A 7-tab Textual TUI: Browse, Setup, Transcribe, History, Settings, Agent, and Logs.
- SQLite-backed saved settings and job history.
- Optional distributed execution through SSH resource profiles.
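The backend preference order above can be sketched as a small selection function. This is an illustrative sketch of the documented defaults only; the helper names (`installed`, `has_cuda`, `has_torch_device`) are assumptions, not the project's actual API.

```python
def pick_backend(installed, has_cuda, has_torch_device):
    """Return (backend, device) following the documented preference order.

    `installed` is the set of optional backend packages available locally;
    the boolean flags stand in for real device detection.
    """
    # 1) CUDA hosts prefer faster-whisper when the package is installed.
    if "faster-whisper" in installed and has_cuda:
        return ("faster-whisper", "cuda")
    # 2) Supported Torch devices prefer whisperX, then openai-whisper.
    for name in ("whisperX", "openai-whisper"):
        if name in installed and has_torch_device:
            return (name, "torch")
    # 3) Otherwise fall back to CPU transcription via ffmpeg + whisper.cpp.
    return ("whisper-cpp", "cpu")
```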
- `python vid_to_sub.py ...`: CLI entrypoint for recursive discovery and batch transcription.
- `python tui.py`: Textual TUI entrypoint. On first run it creates a project-local `.venv` and installs missing requirement groups before relaunching.
- `python init_checker.py`: Bootstrap helper that prepares the managed virtual environment.
- Python 3.9+
- `ffmpeg` on `PATH`
- For the default backend, `whisper-cli` from `whisper.cpp`
- For the default model, `ggml-large-v3.bin`
Base packages:

```sh
pip install -r requirements.txt
```

Optional backend packages:

```sh
pip install -r requirements-faster-whisper.txt
pip install -r requirements-whisper.txt
pip install -r requirements-whisperx.txt
```

The declared Textual support floor is 0.80.x. CI verifies both `textual==0.80.1` and the latest version allowed by `requirements.txt`.
```sh
python vid_to_sub.py /path/to/videos
```

By default this:

- scans directories recursively,
- uses model `large-v3`,
- automatically picks the best locally available backend/device, preferring CUDA `faster-whisper`, then Torch-backed `whisperX` or `openai-whisper`, and otherwise CPU `whisper-cpp`,
- when transcription stays on CPU, divides the available CPU threads evenly across `--workers`,
- writes `movie.srt` beside each source file.

Use `--backend`, `--device`, or `--backend-threads` when you want to override the runtime defaults explicitly.
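The CPU thread split described above amounts to an even division with a floor of one thread per worker. A minimal sketch, assuming the division is a simple integer split (the function name is hypothetical):

```python
import os

def threads_per_worker(workers: int) -> int:
    # Divide the machine's CPU threads evenly across --workers,
    # guaranteeing each worker at least one thread.
    total = os.cpu_count() or 1
    return max(1, total // max(1, workers))
```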
Set the translation endpoint first:

```sh
export VID_TO_SUB_TRANSLATION_BASE_URL=https://your-host/v1
export VID_TO_SUB_TRANSLATION_API_KEY=your_api_key
export VID_TO_SUB_TRANSLATION_MODEL=your_model
```

Then run:

```sh
python vid_to_sub.py /path/to/videos --translate-to ko
```

To split first-pass translation and a second-pass correction agent, enable post-processing:

```sh
python vid_to_sub.py /path/to/videos --translate-to ko --postprocess-translation
```

You can force a strategy with `--postprocess-mode auto|web_lookup|context_polish`. In `auto`, the second pass is prompted to use lyric/reference lookup when the serving agent supports web search or MCP tools, and otherwise silently falls back to contextual cleanup.
This writes both the original transcription and the translated file, for example:

```
movie.srt
movie.ko.srt
```

Only the subtitle text changes; start and end timestamps are copied from the original segments.
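The timing-preserving merge can be pictured as copying each segment's boundaries and swapping in the translated text. A minimal sketch, assuming segments are dicts with `start`, `end`, and `text` keys (the function name is illustrative, not the project's API):

```python
def merge_translation(segments, translated_texts):
    """Keep start/end from the original segments; replace only the text."""
    if len(segments) != len(translated_texts):
        raise ValueError("translation must return one line per segment")
    return [
        {"start": seg["start"], "end": seg["end"], "text": new_text}
        for seg, new_text in zip(segments, translated_texts)
    ]
```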
```sh
python tui.py
```

Recommended flow inside the TUI:

- Browse: add source folders or files, optionally set `Output dir`, and decide whether `No recurse` or `Skip existing` should be enabled.
- Setup: run dependency detection, install Python backend packages, build `whisper.cpp`, or download a GGML model.
- Transcribe: choose backend, model, device, output formats, translation target, and execution mode.
- Start with `Ctrl+R`, preview with `Ctrl+D`, stop with `Ctrl+K`.
- Review previous jobs in History, persist defaults in Settings, and use Agent when you want reviewable guidance or a proposed action plan.

Useful TUI shortcuts:

- `Ctrl+R` run
- `Ctrl+D` dry run
- `Ctrl+K` kill
- `Ctrl+S` save settings
- `1` to `7` switch tabs
- `Ctrl+Q` quit
Write output files into a separate folder:

```sh
python vid_to_sub.py /path/to/videos -o /path/to/output
```

Disable recursive scan:

```sh
python vid_to_sub.py /path/to/videos --no-recurse
```

Skip videos that already have a primary output:

```sh
python vid_to_sub.py /path/to/videos --skip-existing
```

Preview the queue without running transcription:

```sh
python vid_to_sub.py /path/to/videos --dry-run
```

Write multiple formats:

```sh
python vid_to_sub.py /path/to/videos --format srt --format json
```

Use an explicit whisper.cpp model path:

```sh
python vid_to_sub.py /path/to/videos \
  --backend whisper-cpp \
  --model large-v3 \
  --whisper-cpp-model-path /models/ggml-large-v3.bin
```

List built-in model identifiers:

```sh
python vid_to_sub.py --list-models
```

Run transcription only and save a stage artifact for later translation:

```sh
python vid_to_sub.py /path/to/videos --stage1-only --translate-to ko
```

This writes `movie.srt` and a sidecar `movie.stage1.json` next to each source file.
If the artifact already records `target_lang`, you can replay stage 2 later without passing `--translate-to` again, or override the saved target with `--translate-to <lang>`. Replay verifies the current source file against the artifact `source_fingerprint`; use `--force-translate` only when you intentionally want to bypass suspicious/mismatch checks.
When a translation API becomes available, replay stage 2 on the artifact without re-transcribing:
```sh
python vid_to_sub.py --translate-from-artifact /path/to/movie.stage1.json
```

To force re-translation even when the artifact already records a completed translation pass:

```sh
python vid_to_sub.py --translate-from-artifact /path/to/movie.stage1.json --overwrite-translation
```

- `VID_TO_SUB_WHISPER_CPP_BIN`: Override the `whisper-cli` executable path.
- `VID_TO_SUB_WHISPER_CPP_MODEL`: Override the GGML model path.
If `VID_TO_SUB_WHISPER_CPP_MODEL` is not set, the project searches common model directories such as `./models`, `~/.cache/whisper`, `~/models`, `/models`, and `/opt/models`.
- `VID_TO_SUB_TRANSLATION_BASE_URL`: Accepts either an API root such as `https://host/v1` or the full `/chat/completions` endpoint.
- `VID_TO_SUB_TRANSLATION_API_KEY`: Bearer token for the translation service.
- `VID_TO_SUB_TRANSLATION_MODEL`: Model name used for translation.
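Accepting either form of the base URL comes down to normalizing it to the chat-completions endpoint. A minimal sketch of that normalization, assuming a hypothetical helper name:

```python
def chat_completions_url(base_url: str) -> str:
    # Accept either an API root ("https://host/v1") or the full
    # "/chat/completions" endpoint and return the full endpoint.
    base = base_url.rstrip("/")
    if base.endswith("/chat/completions"):
        return base
    return base + "/chat/completions"
```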
- `VID_TO_SUB_POSTPROCESS_BASE_URL`: Optional dedicated endpoint for the subtitle post-editing agent. When blank, the translation base URL is reused.
- `VID_TO_SUB_POSTPROCESS_API_KEY`: Optional dedicated Bearer token for post-editing. When blank, the translation API key is reused.
- `VID_TO_SUB_POSTPROCESS_MODEL`: Optional dedicated model for post-editing. When blank, the translation model is reused.
- `VID_TO_SUB_AGENT_BASE_URL`
- `VID_TO_SUB_AGENT_API_KEY`
- `VID_TO_SUB_AGENT_MODEL`
When these are blank in the TUI, the Agent tab falls back to the Translation API settings.
- `VID_TO_SUB_DB_PATH`: Override the SQLite database path for saved settings, SSH profiles, and job history.
Default database location policy:
- New installs use `$XDG_STATE_HOME/vid_to_sub/vid_to_sub.db` when `XDG_STATE_HOME` is set.
- Otherwise they use `~/.local/state/vid_to_sub/vid_to_sub.db`.
- If an existing legacy database is already present at the project root as `./vid_to_sub.db`, that path is kept for backward compatibility.
The parent directory is created automatically when the configured database path points to a nested location.
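The location policy above can be sketched as a resolution function. This is a hedged illustration of the documented precedence, not the project's actual code; the function name is an assumption.

```python
import os
from pathlib import Path

def resolve_db_path() -> Path:
    # Explicit override wins.
    env = os.environ.get("VID_TO_SUB_DB_PATH")
    if env:
        return Path(env)
    # A legacy project-root database is kept for backward compatibility.
    legacy = Path("./vid_to_sub.db")
    if legacy.exists():
        return legacy
    # Otherwise follow the XDG state-home convention.
    state = os.environ.get("XDG_STATE_HOME") or os.path.expanduser("~/.local/state")
    return Path(state) / "vid_to_sub" / "vid_to_sub.db"
```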
The TUI supports a distributed mode backed by SSH resource profiles. Use Settings -> SSH Connections as the primary source of remote executors. The legacy Settings -> Remote Resources JSON remains available as a fallback/import path, and duplicate names prefer the saved SSH connection entry.

Switch Execution -> Mode to `distributed` in the Transcribe tab before starting a run. When distributed stage-1 finishes on remote hosts, the TUI tries to fetch their `.stage1.json` artifacts back onto the local filesystem and then launches the existing local stage-2 follow-up automatically. If that fetch/remap step fails, review the saved SSH connection and `path_map`, or replay translation manually with `--translate-from-artifact` after copying the artifact locally.
Distributed mode scope: remote execution covers stage-1 (transcription) only. Stage-2 (translation/post-processing) always runs locally on the machine that started the TUI, immediately after all stage-1 artifacts have been fetched. If you see no automatic stage-2 launch, check the log for fetch/remap errors.
Example profile JSON:

```json
[
  {
    "name": "gpu-box",
    "ssh_target": "user@gpu-host",
    "remote_workdir": "/srv/vid_to_sub",
    "slots": 2,
    "path_map": {
      "/mnt/media": "/srv/media"
    },
    "env": {
      "VID_TO_SUB_WHISPER_CPP_MODEL": "/models/ggml-large-v3.bin"
    }
  }
]
```

Field behavior:

- `slots` controls how much work is assigned to that host.
- `path_map` rewrites local path prefixes before the remote command runs.
- `env` injects per-remote environment overrides.
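The `path_map` prefix rewrite can be sketched as follows. This is an illustrative sketch, not the project's implementation; longest-prefix matching is an assumption chosen so that nested mappings win over broader ones.

```python
def remap_path(local_path: str, path_map: dict) -> str:
    """Rewrite a local path prefix to its remote equivalent."""
    # Try longer prefixes first so "/mnt/media/archive" beats "/mnt/media".
    for prefix in sorted(path_map, key=len, reverse=True):
        if local_path == prefix or local_path.startswith(prefix.rstrip("/") + "/"):
            return path_map[prefix] + local_path[len(prefix):]
    return local_path  # no mapping applies; pass through unchanged
```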
Default transcription output:

```
movie.srt
movie.vtt
movie.txt
movie.tsv
movie.json
```

Translated output with `--translate-to ko`:

```
movie.ko.srt
movie.ko.vtt
movie.ko.txt
movie.ko.tsv
movie.ko.json
```
Use the default whisper.cpp backend when you want the simplest CPU-only flow with minimal moving parts.
Use `--translate-to <lang>` when you already trust the generated subtitle segmentation and only want the text replaced without changing timing.
Use `tui.py` when you want setup assistance, persistent settings, queue visibility, history, or distributed execution from one terminal UI.
- The CLI and TUI share the same runtime backend/device detection, so GPU-capable hosts preselect a matching backend when the optional package is installed.
- `whisperX` diarization needs `--hf-token`; without it, the run continues without diarization.
- Primary outputs are considered existing by filename and format only, so `--skip-existing` checks for files such as `movie.srt` in the target output directory.
- The Settings tab can export the current non-secret configuration into `.env`; API keys stay session/environment-only and are omitted from export.



