yaku

A frameless desktop overlay that transcribes live system audio and translates it in real time — designed for watching foreign-language streams, videos, or calls without subtitles.

Audio is captured from the system's default output (loopback), transcribed by whisper.cpp, translated by a locally-running Ollama model, and displayed as streaming subtitles in an always-on-top overlay window.

Features

Continuous capture — audio is captured without gaps; transcription and translation run while the next chunk is already being buffered
Streaming transcription — partial whisper segments appear word-by-word as the model decodes
Streaming translation — Ollama tokens stream into the overlay in real time
GPU acceleration — Vulkan (default) or CUDA; the active backend is shown at startup
Always on top — stays visible over fullscreen windows
Inline language selector — change source/target language from the status bar; saved automatically

Screenshot

The overlay displays source text (Japanese) in the top panel, live translation (English) below. Drag the header to reposition; use controls to toggle source view, access settings, or stop transcription.

Requirements

Runtime

Dependency	Purpose
Ollama	Local LLM for translation
WebKit2GTK 4.1	Web renderer (part of GNOME/KDE stack, usually pre-installed)
Vulkan loader	GPU acceleration
PipeWire or PulseAudio	System audio capture

The AppImage bundles most dependencies. WebKit2GTK and GStreamer must come from the system.

Whisper models

Models are stored in ~/.cache/yaku/models/ and are not bundled.

Model	Size	Notes
`ggml-tiny.bin`	75 MB	Fastest, lowest accuracy
`ggml-base.bin`	142 MB	Good for quick testing
`ggml-small.bin`	466 MB	Balanced
`ggml-medium.bin`	1.5 GB	Good accuracy
`ggml-large-v3-turbo.bin`	874 MB	Fast and accurate (recommended)
`ggml-large-v3.bin`	3.1 GB	Best accuracy

Installation

AppImage (recommended)

Download the AppImage from the Releases page.

Make it executable and run:

chmod +x yaku-*.AppImage
./yaku-*.AppImage

On first launch the app shows a model selection screen — choose and download a model directly from the UI.

From source

See Developer setup below.

First-time setup

1 — Choose a whisper model

On first launch the app shows a model selection screen. Pick a size and click Download — the model is fetched from HuggingFace and saved to ~/.cache/yaku/models/ automatically. If you already have a .bin file, click Use an existing model file to point directly to it.

2 — Pull an Ollama translation model

TranslateGemma is purpose-built for translation and is recommended:

ollama pull translategemma:4b    # 3.3 GB — fast, good quality
ollama pull translategemma:12b   # 8 GB   — higher quality

General-purpose models also work (llama3.2, qwen2.5:7b, etc.) but produce lower translation quality.

Usage

Overlay controls

Control	Action
`SRC`	Toggle source-language panel (resizes window)
`⚙`	Open settings
`✕`	Close
`▶ Start` / `■ Stop`	Start or stop the pipeline

Drag the header bar to move the window.

Status bar

Shows the pipeline state (Listening / Transcribing / Translating) and the active language pair. Click either language code to change it immediately — changes are saved automatically.

Settings

Setting	Description
Source / target language	Language pair for transcription and translation
Ollama model	Model to use for translation
Ollama URL	Defaults to `http://localhost:11434/api/generate`

Config is saved to the platform config directory (~/.config/yaku/ on Linux, ~/Library/Application Support/yaku/ on macOS, %AppData%\yaku\ on Windows).

Developer setup

Prerequisites

Debian / Ubuntu:

sudo apt-get install cmake glslc libvulkan-dev libgomp-dev g++ pkg-config \
                     libgtk-3-dev libwebkit2gtk-4.1-dev golang-go

Fedora / RHEL:

sudo dnf install cmake glslc vulkan-loader-devel libgomp-devel gcc-c++ pkgconf \
                 gtk3-devel webkit2gtk4.1-devel golang

Install Task:

go install github.com/go-task/task/v3/cmd/task@latest

The Wails CLI is installed automatically the first time you run task or task dev.

Build

git clone --recurse-submodules https://github.com/opdude/yaku
cd yaku
task              # compiles whisper.cpp on first run (~2 min), then builds the app

Output lands in ./bin/.

Task targets

Command	Description
`task`	Build `bin/yaku` (Vulkan GPU)
`task dev`	Wails dev mode with live frontend reload
`GPU=cuda task`	Build with CUDA acceleration
`task appimage`	Build a self-contained AppImage
`task test`	Run unit tests
`task clean`	Remove build output
`task clean-all`	Remove build output and compiled whisper libs

Architecture

audio.Capturer.Stream()              platform audio capture
        │
        ▼  500 ms increments, rolling 10 s window
  silence gate (VAD)
        │
        ▼  when ≥ ChunkSeconds of speech detected
  transcribe.Transcriber.Transcribe()    whisper.cpp via CGO
        ├─ segment callbacks → live source text
        └─ final text
              │
              ▼
  translate.Stream()                 Ollama NDJSON streaming
        └─ token callbacks → overlay events

Capture, transcription, and translation are pipelined: while Ollama translates chunk N, the audio buffer is already filling with chunk N+1.

Project layout

cmd/yaku/       Wails entry point, Go↔JS bridge (App struct)
  frontend/                HTML/CSS/JS overlay UI
internal/
  audio/                   Capturer interface, VAD, PCM conversion; platform implementations
  config/                  Config struct, YAML persistence
  transcribe/              whisper.cpp CGO wrapper, GPU backend detection
  translate/               Ollama streaming, context-aware prompt construction
third_party/whisper.cpp/   git submodule, compiled by `task whisper`
build/                     AppImage scripts and whisper build output

Release procedure

Releases are created automatically when a v* tag is pushed:

git tag v1.2.3
git push origin v1.2.3

The release workflow builds a Vulkan AppImage via GoReleaser and uploads it to the GitHub release.

Name		Name	Last commit message	Last commit date
Latest commit History 9 Commits
.github		.github
build		build
cmd/yaku		cmd/yaku
docs		docs
internal		internal
third_party		third_party
.gitignore		.gitignore
.gitmodules		.gitmodules
.goreleaser.yaml		.goreleaser.yaml
CLAUDE.md		CLAUDE.md
README.md		README.md
Taskfile.yml		Taskfile.yml
go.mod		go.mod
go.sum		go.sum

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

yaku

Features

Screenshot

Requirements

Runtime

Whisper models

Installation

AppImage (recommended)

From source

First-time setup

1 — Choose a whisper model

2 — Pull an Ollama translation model

Usage

Overlay controls

Status bar

Settings

Developer setup

Prerequisites

Build

Task targets

Architecture

Project layout

Release procedure

About

Uh oh!

Releases 2

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

yaku

Features

Screenshot

Requirements

Runtime

Whisper models

Installation

AppImage (recommended)

From source

First-time setup

1 — Choose a whisper model

2 — Pull an Ollama translation model

Usage

Overlay controls

Status bar

Settings

Developer setup

Prerequisites

Build

Task targets

Architecture

Project layout

Release procedure

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases 2

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages