GitHub - omachala/diction: iOS keyboard that transcribes speech to text using Whisper

Speech-to-text keyboard for iOS.
On-device, self-hosted, or cloud. Your choice.

What is Diction

An iOS keyboard for speech-to-text. Switch to it in any app, tap the mic, speak, and the text is inserted. No QWERTY — dictation only.

Three ways to transcribe:

On-Device — runs directly on your iPhone. No network, no server, completely offline. Free.
Self-Hosted — run your own transcription server. Audio stays on your network. Free, forever.
Cloud — zero setup, fast transcription via Diction's hosted API. Free trial, then $3.99/mo.

The app is pure Swift with zero third-party SDKs — no analytics, no tracking, no telemetry. The self-hosting infrastructure (a Go gateway and Docker Compose setup) is open source and lives in this repo.

Getting Started

1. Install the keyboard

Download Diction from the App Store
Settings → General → Keyboard → Keyboards → Add New Keyboard → Diction
Enable Allow Full Access (iOS requires it before a keyboard extension can use the network; see Privacy)

2. Pick a mode

On-Device — open the app, select On-Device mode, download a model, done. Works offline, audio never leaves your phone.

Cloud — select Cloud mode, pick a model. Works immediately. Free trial included.

Self-Hosted — run a transcription server on your own machine:

git clone https://github.com/omachala/diction.git
cd diction
docker compose up -d gateway whisper-small

Then in the app: Settings → Self-Hosted → set the endpoint to http://<your-server-ip>:9000.

3. Dictate

Open any app, tap 🌐 to switch to Diction, tap the mic, speak. Text appears.

Models

Diction is model-agnostic. The app, the gateway, and the Docker setup all let you choose which model to run. Different models have different trade-offs between speed, accuracy, size, and language support.

On-device models

Downloaded and run directly on your iPhone. No network needed.

Model	Size	Best for
Whisper Tiny	~75 MB	Fastest, any iPhone
Whisper Base	~142 MB	Recommended — good balance of speed and accuracy
Whisper Small	~466 MB	Better accuracy, newer iPhones
Whisper Large Turbo	~1.6 GB	Best on-device accuracy, latest iPhones

Server models (self-hosted & cloud)

Run on your server or via Diction Cloud. Generally faster and more accurate than on-device models, depending on the model you pick and the server hardware.

Service	Model	Port	RAM	Latency (CPU)	Best for
`whisper-tiny`	Whisper Tiny	9001	~350 MB	~1-2s	Low-power devices
`whisper-small`	Whisper Small	9002	~800 MB	~3-4s	Best starting point
`whisper-medium`	Whisper Medium	9003	~1.8 GB	~8-12s	Accents, background noise
`whisper-large`	Whisper Large V3	9004	~3.5 GB	~20-30s	Maximum Whisper accuracy
`whisper-distil-large`	Distil Whisper Large V3	9005	~2 GB	~4-6s	Near-best quality, English only
`parakeet`	NVIDIA Parakeet TDT 0.6B	9006	~2 GB	~1-2s	Best speed + accuracy, 25 European languages

Start any combination:

docker compose up -d gateway whisper-small
docker compose up -d gateway whisper-small parakeet
docker compose up -d gateway whisper-small whisper-medium

Models download on first start and are cached, so later starts skip the download.
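When combining several models on one machine, it can help to add up the approximate RAM figures from the table before starting them. The following is a minimal sketch, not part of the repo: the helper names are hypothetical, the numbers are copied from the table above (GB treated as 1000 MB), and the gateway's own footprint is ignored.

```python
# Approximate RAM per service in MB, copied from the server models table.
SERVICE_RAM_MB = {
    "whisper-tiny": 350,
    "whisper-small": 800,
    "whisper-medium": 1800,
    "whisper-large": 3500,
    "whisper-distil-large": 2000,
    "parakeet": 2000,
}


def total_ram_mb(services):
    """Sum the approximate RAM needed by the chosen services."""
    return sum(SERVICE_RAM_MB[name] for name in services)


def fits(services, budget_mb):
    """Return True if the chosen services fit within budget_mb."""
    return total_ram_mb(services) <= budget_mb


def compose_command(services):
    """Build the docker compose command that starts the gateway plus services."""
    return "docker compose up -d gateway " + " ".join(services)


if __name__ == "__main__":
    chosen = ["whisper-small", "parakeet"]
    if fits(chosen, budget_mb=4000):
        print(compose_command(chosen))
```

Treat the result as a rough guide only: the table values are estimates, and actual usage depends on the audio being transcribed.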

You can also point Diction at anything else: whisper.cpp, OpenAI's API, a fine-tuned model, or any future model. If it has a /v1/audio/transcriptions endpoint, Diction works with it.

Gateway

The gateway is a lightweight Go service (~15 MB Docker image) that sits in front of your model backends:

Model routing — one URL, multiple models. Switch from the app without reconfiguring your server.
WebSocket streaming — audio streams to the server during recording. Transcription starts the moment you stop — no upload wait.
Format conversion — automatically converts audio to the format each backend needs.
Health monitoring — checks each backend every 30s. GET /v1/models shows which are online.

The gateway is optional: you can point the app directly at a single model backend. If you run multiple models, though, the gateway lets you switch between them from the app.

API

OpenAI-compatible transcription API:

# Health check
curl http://localhost:9000/health

# List available models with health status
curl http://localhost:9000/v1/models

# Transcribe audio
curl -X POST http://localhost:9000/v1/audio/transcriptions \
  -F file=@recording.wav \
  -F model=small
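The same transcription request can be sent from a script. Below is a minimal sketch using only the Python standard library. It assumes the gateway is reachable at http://localhost:9000 and that the response body is JSON, as with OpenAI's default response format; `build_multipart` and `transcribe` are illustrative names, not part of Diction.

```python
import json
import urllib.request
import uuid


def build_multipart(fields, file_field, filename, file_bytes):
    """Encode text fields plus one file as a multipart/form-data body."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            (f"--{boundary}\r\n"
             f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
             f"{value}\r\n").encode()
        )
    parts.append(
        (f"--{boundary}\r\n"
         f'Content-Disposition: form-data; name="{file_field}"; '
         f'filename="{filename}"\r\n'
         "Content-Type: application/octet-stream\r\n\r\n").encode()
        + file_bytes
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe(path, model="small", base_url="http://localhost:9000"):
    """POST an audio file to /v1/audio/transcriptions and return the parsed JSON."""
    with open(path, "rb") as f:
        audio = f.read()
    body, content_type = build_multipart({"model": model}, "file", "recording.wav", audio)
    request = urllib.request.Request(
        base_url + "/v1/audio/transcriptions",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())  # assumption: the body is JSON
```

Calling `transcribe("recording.wav")` mirrors the curl command above; pass a different `model` value to route the request to another backend.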

WebSocket streaming:

WS /v1/audio/stream?model=small&language=en

1. Client sends binary frames: raw PCM audio (16-bit LE, mono, 16kHz)
2. Client sends text frame: {"action":"done"}
3. Server replies: {"text":"transcribed text"}
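The three steps above can be sketched as a frame generator that any WebSocket client library could send in order. This is an illustration under assumptions, not the app's implementation: the 100 ms chunk size, the plain `ws://` scheme, and the function names are all hypothetical. Only the PCM format, the query parameters, and the `{"action":"done"}` message come from the protocol description.

```python
import json

SAMPLE_RATE_HZ = 16_000   # 16 kHz, per the protocol description
BYTES_PER_SAMPLE = 2      # 16-bit little-endian samples
CHANNELS = 1              # mono


def stream_url(host, model="small", language="en"):
    """Build the streaming URL; ws:// (no TLS) is an assumption for local use."""
    return f"ws://{host}/v1/audio/stream?model={model}&language={language}"


def frames(pcm, chunk_ms=100):
    """Yield ("binary", bytes) audio frames, then one ("text", str) done frame."""
    chunk_bytes = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS * chunk_ms // 1000
    for start in range(0, len(pcm), chunk_bytes):
        yield ("binary", pcm[start:start + chunk_bytes])
    # Step 2: tell the server that the recording has ended.
    yield ("text", json.dumps({"action": "done"}, separators=(",", ":")))


if __name__ == "__main__":
    one_second_of_silence = bytes(SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS)
    print(stream_url("localhost:9000"))
    for kind, payload in frames(one_second_of_silence):
        print(kind, len(payload))
```

After sending the last frame, the client reads one message from the socket and parses it as JSON to get the `text` field.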

Configuration

Variable	Default	Description
`GATEWAY_PORT`	`8080`	Port the gateway listens on (mapped to 9000 in Docker Compose)
`DEFAULT_MODEL`	`small`	Model used when no `model` field is specified
`MAX_BODY_SIZE`	`10485760`	Max upload size in bytes (10 MB)
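To get a feel for what the default `MAX_BODY_SIZE` allows, the sketch below estimates the longest uncompressed WAV recording that fits in one upload. It assumes the limit applies to the whole request body, uses the canonical 44-byte WAV header, and ignores the small multipart overhead; the function name is hypothetical.

```python
MAX_BODY_SIZE = 10_485_760  # default from the table above, in bytes
WAV_HEADER_BYTES = 44       # canonical PCM WAV header; real files may carry extra chunks


def max_wav_seconds(max_body=MAX_BODY_SIZE, sample_rate=16_000,
                    bytes_per_sample=2, channels=1):
    """Estimate how many seconds of uncompressed PCM audio fit in one upload."""
    bytes_per_second = sample_rate * bytes_per_sample * channels
    return (max_body - WAV_HEADER_BYTES) / bytes_per_second


if __name__ == "__main__":
    print(f"16 kHz mono:     {max_wav_seconds():.0f} s")
    print(f"44.1 kHz stereo: {max_wav_seconds(sample_rate=44_100, channels=2):.0f} s")
```

At 16 kHz, 16-bit mono this works out to a little over 327 seconds, or roughly five and a half minutes. Higher sample rates or stereo recordings reach the limit much sooner, so raise `MAX_BODY_SIZE` if you upload long files.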

Remote Access

Your phone needs to reach the server. On the same Wi-Fi, use the local IP directly. For access from anywhere:

Cloudflare Tunnel (recommended) — free, outbound-only. No port forwarding, no public IP needed.

cloudflared tunnel create diction
cloudflared tunnel route dns diction whisper.yourdomain.com
cloudflared tunnel run --url http://localhost:9000 diction

Tailscale — free WireGuard mesh VPN. Install on server + iPhone, get a stable 100.x.y.z IP.

Reverse proxy — put the gateway behind Caddy for HTTPS:

whisper.yourdomain.com {
    reverse_proxy localhost:9000
}

WebSocket streaming works through Caddy out of the box.

Other options: ngrok (instant public URL), WireGuard (self-managed VPN), port forwarding with DDNS.

GPU Support

For faster inference, use the CUDA variant of the Whisper image:

whisper-small:
  image: fedirz/faster-whisper-server:latest-cuda
  deploy:
    resources:
      reservations:
        devices:
          - driver: nvidia
            count: 1
            capabilities: [gpu]

Requires an NVIDIA GPU and the NVIDIA Container Toolkit.

How is Diction different?

	Diction	Wispr Flow	Apple Dictation
Price	Free (on-device & self-hosted); $3.99/mo (cloud)	$15/month	Free
On-device transcription	✅	❌	✅
Self-hosted option	✅	❌	❌
Choose your model	✅	❌	❌
Open source	✅ Gateway + server	❌	❌
WebSocket streaming	✅	❌	N/A
Third-party SDKs in app	None	Unknown	N/A

Diction is pure transcription — what you say is what you get. No AI rewriting, no "smart" corrections.

Privacy

This is a keyboard extension, so we take privacy seriously:

On-device: Audio never leaves your phone.
Self-hosted: Audio goes only to your server. Full stop.
Cloud: Audio is processed and immediately discarded. Not stored, not used for training.
No analytics, no tracking, no telemetry. Zero third-party SDKs in the app.
Full Access is required by iOS for network access — the keyboard needs to reach the transcription endpoint. There is no QWERTY keyboard to log, no clipboard access.

Read the full Privacy Policy.

Troubleshooting

App

Diction keyboard doesn't appear — Settings → General → Keyboard → Keyboards → Add New Keyboard → Diction. Make sure Allow Full Access is enabled.

No transcription / timeout — check that your endpoint URL is correct and reachable from your phone. In Self-Hosted mode, your phone must be on the same network as your server (or use remote access).

Transcription is slow — try a smaller model or enable Stream Audio in settings. Streaming uploads audio during recording so transcription starts the moment you stop.

Self-hosting

Model takes a long time on first start — normal. Weights download on first launch (~500 MB for Small, ~3 GB for Large V3). Cached in a Docker volume — subsequent starts are instant.

Health check failing — models need 1-2 minutes to load. Check logs: docker compose logs -f whisper-small

Out of memory — run fewer models or pick a smaller one. One model is all you need.
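For the health-check case above, a script can simply wait until the server answers rather than failing on the first attempt. The following is a minimal sketch, assuming that `GET /health` on the gateway returns HTTP 200 once it is ready; whether that endpoint also reflects model loading is not stated here, so check `/v1/models` as well. The function names and the 24 × 5 s schedule (about two minutes) are hypothetical.

```python
import time
import urllib.error
import urllib.request


def http_ok(url, timeout=5):
    """Return True if a GET request to url answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        return False


def wait_until_healthy(url="http://localhost:9000/health", probe=http_ok,
                       attempts=24, delay_s=5.0, sleep=time.sleep):
    """Poll until probe(url) succeeds; return the attempt count, or None on timeout."""
    for attempt in range(1, attempts + 1):
        if probe(url):
            return attempt
        if attempt < attempts:
            sleep(delay_s)  # back off before the next probe
    return None


if __name__ == "__main__":
    used = wait_until_healthy()
    print("ready" if used else "still not healthy; check docker compose logs")
```

The `probe` and `sleep` parameters are injectable so the loop can be exercised without a running server.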

Updating:

docker compose pull
docker compose up -d

Report a bug

Open an issue with:

Which mode you're using (On-Device, Self-Hosted, or Cloud)
Your model and language settings
Steps to reproduce
For self-hosting: Docker version, OS, and logs (docker compose logs)

Requirements

iOS 16.0+ (iPhone)
For self-hosting: any machine that can run Docker (the gateway uses ~15 MB RAM)

Contributing

Contributions to the gateway, Docker setup, and documentation are welcome. See CONTRIBUTING.md.

License

MIT — see LICENSE.
