June 19, 2026 · 7 min read · how-to · whisper · offline · privacy · voice

Run Whisper Locally for Dictation

Run Whisper locally for dictation: no cloud, no API key, no per-minute cost. whisper.cpp DIY setup or bundled app - with model size and WER comparison.

By Typilot Team

Running Whisper locally for dictation means your voice is transcribed entirely on your own hardware - no audio is sent to a cloud server, there is no API key, and there is no per-minute charge. OpenAI's Whisper model (and whisper.cpp, its C/C++ port with over 46,900 GitHub stars) installs in a few commands, and its largest model, large-v3, reaches around 2.7% word error rate on clean English audio - comparable to mainstream cloud services such as Google Speech-to-Text.

Here is how to get local Whisper dictation working, from the raw terminal path to an integrated desktop app.

Why run Whisper locally for dictation

The reason to switch to local Whisper is architectural. Cloud dictation services - Google Docs Voice Typing, Wispr Flow, Otter - encode your audio on your device and send it to the vendor's servers on every request. Your voice may be retained for 30 to 90 days depending on the service and plan tier.

Local Whisper inverts this entirely: the model runs in RAM on your machine, audio stays on your device during processing, and nothing is transmitted. If you disconnect from the internet mid-session, a local setup keeps working without interruption.

Three practical consequences:

Privacy. Sensitive recordings - medical consultations, legal depositions, NDA calls - stay off third-party servers by design.
Cost. Cloud services charge per minute or per seat. Local inference costs nothing per request after the one-time model download. Bulk transcription of an hour of audio costs the same as a five-second phrase.
Offline. Air-gapped environments, clinic networks, and airplane mode all work after the initial download.

The DIY path: whisper.cpp in the terminal

whisper.cpp is a lightweight C/C++ port of Whisper that runs without a full Python stack. On Apple Silicon it uses Apple's Metal GPU API; on NVIDIA hardware it uses CUDA; on CPU-only machines it runs on its own SIMD-optimised kernels (AVX on x86, NEON on ARM), with BLAS as an optional accelerator.

Install and build

On macOS with Apple Silicon (Metal acceleration):

git clone https://github.com/ggml-org/whisper.cpp
cd whisper.cpp
# Metal is enabled by default on Apple Silicon in recent releases
cmake -B build
cmake --build build -j --config Release

On Windows with CUDA:

git clone https://github.com/ggml-org/whisper.cpp
cd whisper.cpp
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release

On Linux (CPU-only):

git clone https://github.com/ggml-org/whisper.cpp
cd whisper.cpp
make -j

Download a model

bash ./models/download-ggml-model.sh base.en

Replace base.en with small.en, medium.en, or large-v3 for higher accuracy (see the model table below). The .en suffix selects the English-only variant, which exists for tiny, base, small, and medium (the large models are multilingual only) and is roughly 10-15% faster than the multilingual version on English audio.

Quantised variants are also available - INT8 and 4-bit quantisation let you fit large-v3 into roughly 1.5-4 GB, making 8 GB laptops viable for the highest-accuracy model.
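The size range quoted for quantised models can be sanity-checked with simple arithmetic. The sketch below is an estimate under stated assumptions, not a measurement: it takes the roughly 1.55 billion parameters OpenAI publishes for the large models and the bits-per-weight of common ggml block formats (q8_0, q5_0, q4_0), and it counts weights only.

```python
# Rough weight-only size estimate for a quantised Whisper large model.
# Assumption: ~1.55 billion parameters (OpenAI's published figure for "large").
# Assumption: ggml block formats pack 32 weights plus a 16-bit scale per block,
# which yields the bits-per-weight values below.

LARGE_PARAMS = 1_550_000_000

BITS_PER_WEIGHT = {
    "f16": 16.0,
    "q8_0": 8.5,  # 32 x 8-bit weights + 16-bit scale
    "q5_0": 5.5,  # 32 x 5-bit weights + 16-bit scale
    "q4_0": 4.5,  # 32 x 4-bit weights + 16-bit scale
}


def weight_size_gb(params: int, fmt: str) -> float:
    """Approximate size of the weights alone, in decimal gigabytes."""
    return params * BITS_PER_WEIGHT[fmt] / 8 / 1e9


for fmt in BITS_PER_WEIGHT:
    print(f"{fmt:>5}: ~{weight_size_gb(LARGE_PARAMS, fmt):.2f} GB")
```

Real files come out somewhat different because not every tensor is quantised, and runtime memory adds working buffers on top of the weights, which is consistent with the wider 1.5-4 GB range quoted above.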

Transcribe an audio file

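./build/bin/whisper-cli -m models/ggml-base.en.bin -f recording.wav

Recent whisper.cpp releases build the command-line tool as whisper-cli under build/bin (typically build\bin\Release on Windows with the CMake steps above); older checkouts produced a ./main binary in the repository root. The CLI expects 16-bit WAV input at 16 kHz mono, so convert other formats first, for example with ffmpeg. For live microphone dictation (continuous input piped in real time), whisper.cpp ships streaming examples in the repository - that is where a desktop application adds the most value by handling the mic capture and chunking loop for you.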
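A quick pre-flight check can save a confusing error from the CLI. The following is a minimal sketch using Python's standard wave module; the function name check_whisper_wav is made up for illustration, and the 16-bit requirement is an assumption based on the whisper.cpp README rather than something the tool reports in this form.

```python
import wave


def check_whisper_wav(path: str) -> list[str]:
    """Return a list of problems; an empty list means the file looks usable."""
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != 16_000:
            problems.append(f"sample rate is {wav.getframerate()} Hz, expected 16000")
        if wav.getnchannels() != 1:
            problems.append(f"{wav.getnchannels()} channels, expected mono")
        if wav.getsampwidth() != 2:  # sample width is reported in bytes
            problems.append(f"{8 * wav.getsampwidth()}-bit samples, expected 16-bit")
    return problems
```

Calling check_whisper_wav("recording.wav") on a 44.1 kHz stereo file returns two messages, one for the sample rate and one for the channel count. A file in that state can be converted with ffmpeg -i input.mp3 -ar 16000 -ac 1 -c:a pcm_s16le output.wav.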

Choosing a Whisper model

Whisper ships in five sizes. Accuracy improves with size; speed drops:

Whisper model word error rates from tiny to large-v3 - small is the practical default for most laptops

Model	Disk	RAM	WER (clean English)	Speed on M2 (multiple of real-time)
tiny	~75 MB	~1 GB	~7%	10x
base	~150 MB	~1 GB	~5%	7x
small	~490 MB	~2 GB	~3.4%	5x
medium	~1.5 GB	~5 GB	~2.9%	3x
large-v3	~3 GB	~10 GB	~2.7%	1.5x

WER figures are from the LibriSpeech test-clean benchmark on clean English speech. Real-world dictation with background noise, accents, or technical vocabulary typically runs 8-12% regardless of model size - that is the gap between lab conditions and a real office or home environment, and it applies equally to cloud services.
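To make the metric concrete: word error rate is the word-level edit distance between the reference transcript and the model output, divided by the number of reference words. Here is a minimal sketch you could run against your own dictation samples; it only lowercases and splits on whitespace, whereas benchmark pipelines usually apply heavier text normalisation first, so the numbers are not directly comparable to published figures.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the ref words seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


reference = "please send the report to the legal team by friday"
hypothesis = "please send the report to legal team by fryday"
print(f"WER: {word_error_rate(reference, hypothesis):.1%}")  # WER: 20.0%
```

The example has one dropped word and one misspelt word against a ten-word reference, hence 20%. A 3% WER means roughly three such errors per hundred words dictated.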

For most laptops: small (490 MB, ~3.4% WER) runs in real time on CPU-only hardware made after 2020. Step up to medium if you have 5 GB of free RAM and want better accuracy on accented or technical speech. large-v3-turbo (released October 2024) offers large-class accuracy at close to medium speed and is worth trying if disk space is not a constraint.
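The selection rule in this section can be written down as a small helper. This is a sketch under stated assumptions: the figures are copied from the table above, the speed column applies to an M2 only, and pick_model and transcribe_minutes are illustrative names rather than part of any Whisper tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WhisperModel:
    name: str
    ram_gb: float    # approximate RAM, from the table above
    wer_pct: float   # LibriSpeech test-clean WER, from the table above
    speed_x: float   # multiple of real-time on an M2, from the table above


MODELS = [
    WhisperModel("tiny", 1, 7.0, 10),
    WhisperModel("base", 1, 5.0, 7),
    WhisperModel("small", 2, 3.4, 5),
    WhisperModel("medium", 5, 2.9, 3),
    WhisperModel("large-v3", 10, 2.7, 1.5),
]


def pick_model(free_ram_gb: float, min_speed_x: float = 1.0) -> WhisperModel:
    """Most accurate model that fits in RAM and keeps up with speech."""
    candidates = [m for m in MODELS
                  if m.ram_gb <= free_ram_gb and m.speed_x >= min_speed_x]
    if not candidates:
        raise ValueError("no model fits these constraints")
    return min(candidates, key=lambda m: m.wer_pct)


def transcribe_minutes(audio_minutes: float, model: WhisperModel) -> float:
    """Estimated processing time for a recording of the given length."""
    return audio_minutes / model.speed_x


choice = pick_model(free_ram_gb=6)
print(choice.name, f"{transcribe_minutes(60, choice):.0f} min per hour of audio")
```

With 6 GB free this picks medium and estimates about 20 minutes of processing per hour of audio. Raising min_speed_x is a simple way to leave headroom on slower hardware, where the M2 figures will be optimistic.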

The integrated path: Typilot bundles Whisper

The whisper.cpp CLI transcribes files efficiently. For live dictation into any running application - your email client, Slack, VS Code, a terminal - you need a layer on top that handles microphone capture, voice activity detection, and text injection at the cursor in whatever app is active.

Typilot bundles a local Whisper runtime (no separate install required) and downloads models with one click from its Voice settings tab. The entire pipeline runs on-device:

Dictation pipeline: mic capture to VAD to local Whisper to text injection in any running app

Activation works in three modes to suit different dictation styles:

Hold - hold a key (default: Fn) to record, release to commit. Best for short phrases between other tasks.
Toggle with VAD - press once to start; voice activity detection ends the recording automatically on silence. Best for longer passages without touching a key.
Toggle manual - press to start, press again to stop. Best when you need precise control over exactly what is transcribed.
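The three modes differ only in which events start and stop the recorder, which makes them easy to reason about as a small state function. The sketch below is a hypothetical model of that behaviour, not Typilot's implementation; the Mode and Event names are invented for illustration, and ignoring a second key press in the VAD mode is an assumption the text does not confirm.

```python
from enum import Enum


class Mode(Enum):
    HOLD = "hold"
    TOGGLE_VAD = "toggle_vad"
    TOGGLE_MANUAL = "toggle_manual"


class Event(Enum):
    KEY_DOWN = "key_down"
    KEY_UP = "key_up"
    SILENCE = "silence"  # emitted by voice activity detection


def next_recording_state(mode: Mode, recording: bool, event: Event) -> bool:
    """Return whether the recorder should be running after this event."""
    if mode is Mode.HOLD:
        if event is Event.KEY_DOWN:
            return True
        if event is Event.KEY_UP:
            return False
        return recording
    if mode is Mode.TOGGLE_VAD:
        if event is Event.KEY_DOWN and not recording:
            return True
        if event is Event.SILENCE:
            return False
        return recording  # assumption: extra key presses are ignored
    # TOGGLE_MANUAL: each key press flips the state; silence is ignored.
    if event is Event.KEY_DOWN:
        return not recording
    return recording
```

In hold mode a pause in speech never ends the recording, only the key release does; in the VAD mode the opposite is true, which is why it suits longer passages.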

Voice activity detection trims silence before audio reaches Whisper, which helps avoid the hallucinated words that Whisper can produce on silent segments.
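To illustrate the idea, here is a deliberately simple energy-threshold trimmer. It is a stand-in for a real VAD, not the detector Typilot or whisper.cpp uses: production detectors are usually trained models, and the 30 ms frame size and 0.01 threshold here are arbitrary illustrative values.

```python
import math


def trim_silence(samples: list[float], sample_rate: int = 16_000,
                 frame_ms: int = 30, threshold: float = 0.01) -> list[float]:
    """Drop leading and trailing frames whose RMS energy is below threshold.

    samples are floats in [-1.0, 1.0]; frame_ms and threshold are illustrative.
    """
    frame_len = sample_rate * frame_ms // 1000
    frames = [samples[i:i + frame_len] for i in range(0, len(samples), frame_len)]

    def is_speech(frame: list[float]) -> bool:
        rms = math.sqrt(sum(s * s for s in frame) / len(frame))
        return rms >= threshold

    voiced = [i for i, frame in enumerate(frames) if is_speech(frame)]
    if not voiced:
        return []  # nothing but silence: send nothing to the recogniser
    first, last = voiced[0], voiced[-1]
    return [s for frame in frames[first:last + 1] for s in frame]
```

Given one second of silence, half a second of signal, and another second of silence, the function returns roughly the half second in the middle, rounded out to frame boundaries. An all-silent buffer returns an empty list, so the recogniser is never asked to transcribe silence.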

On top of dictation, you can route any transcribed phrase to a local Ollama model for instant editing. Say fix: and Typilot rewrites selected text; sum: summarises it; gen: generates new text from a voice prompt. The full 27-command reference is at /docs/commands. No text or audio leaves the device at any stage.
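Prefix commands like these amount to splitting the transcript at the first colon and checking the head against a known list. The sketch below is hypothetical and says nothing about how Typilot actually parses commands: it assumes the transcript arrives as text with a literal prefix such as "fix:", and it knows only the three prefixes named above rather than the full command set.

```python
from typing import Optional

COMMANDS = {"fix", "sum", "gen"}  # only the three prefixes named in the text


def split_command(transcript: str) -> tuple[Optional[str], str]:
    """Split 'fix: tighten this' into ('fix', 'tighten this').

    Returns (None, transcript) when no known prefix is present, so plain
    dictation passes through untouched.
    """
    head, sep, rest = transcript.partition(":")
    name = head.strip().lower()
    if sep and name in COMMANDS:
        return name, rest.strip()
    return None, transcript
```

A phrase such as "Note: call the clinic" contains a colon but no known command, so it is returned unchanged as ordinary dictation.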

If you already run whisper.cpp manually, Typilot provides the dictation layer you would otherwise have to build yourself: mic capture, VAD, hotkey activation, model management, and text injection into any app, all in one UI.

Local Whisper vs faster alternatives

Whisper is not the only local speech model in 2026. For English-only workloads, NVIDIA's Parakeet family is worth knowing about:

Model	WER (English)	Speed	Languages	Offline
Whisper large-v3	~2.7%	1.5x real-time (M2)	99+	Yes
Whisper small	~3.4%	5x real-time (M2)	99+	Yes
Parakeet TDT 0.6B	~6.3%	~10x real-time (M2)	English only	Yes

Parakeet is significantly faster on English - roughly 10x real-time versus 1.5x for Whisper large-v3 on the same M2 - but is English-only and carries higher WER on clean audio. Whisper remains the right choice if you need multilingual support, non-English dictation, or the broadest hardware compatibility across Mac, Windows, and Linux.

For meeting transcription specifically, the guide on transcribing meetings locally covers how to combine local Whisper with speaker diarization for a full offline meeting workflow.

The short version

Running Whisper locally for dictation keeps your audio on your machine, costs nothing per request, and works offline after the model download. The DIY route - whisper.cpp built from source - handles file transcription in minutes. For live dictation into any running application, a desktop tool adds the microphone, VAD, and text injection layer that whisper.cpp does not include out of the box.

Typilot ships a 3-day free trial with local Whisper bundled - no Python environment, no separate whisper.cpp build. The security page documents exactly what stays on your device. If you want to understand how speaker separation works alongside Whisper transcription, the article on how speaker diarization works covers the local pipeline in full.

Common questions.

How do I run Whisper locally for dictation?

Install whisper.cpp (a C/C++ port of OpenAI Whisper with over 46,900 GitHub stars), download a model such as small or base, and pipe microphone audio through it. For live dictation into any running application, a desktop tool like Typilot bundles the Whisper runtime and handles mic capture, voice activity detection, and text injection at the cursor - no separate CLI setup required.

Which Whisper model should I use for local dictation?

The small model (490 MB, ~3.4% word error rate) is the right starting point for most laptops: it runs in real time on CPU-only hardware made after 2020 and is accurate enough for everyday dictation. Step up to medium (~2.9% WER, ~1.5 GB) if you need higher accuracy on accented or technical speech. Large-v3 reaches ~2.7% WER but needs around 10 GB of RAM.

Does local Whisper dictation work offline?

Yes. Once the Whisper model is downloaded, local dictation works with no internet connection. Audio is transcribed entirely in RAM by the local model and never transmitted anywhere. Air-gapped networks, clinic environments, and airplane mode all work after the initial one-time model download.

Is local Whisper as accurate as cloud dictation?

Whisper large-v3 reaches ~2.7% word error rate on clean English audio (LibriSpeech test-clean), which is comparable to mainstream cloud services. On real-world audio with background noise or accents, both local and cloud tools typically run 8-12% WER. The small model (~3.4% WER, 490 MB) is already competitive with cloud dictation on clean audio and runs in real time on most modern laptops.
