Whisper Large v3 Turbo vs Parakeet
Parakeet TDT 0.6B v3 beats Whisper large-v3-turbo on English WER (~1.9% vs ~3.0%) and is ~10x faster, but covers only 25 European languages. Whisper turbo wins on language breadth (99+ languages) and memory (fits in 6 GB of RAM).
Parakeet TDT 0.6B v3 beats Whisper large-v3-turbo on English accuracy - around 1.9% word error rate versus roughly 3.0% on LibriSpeech clean - and runs about 10 times faster for English audio. The trade-off is narrow language coverage (25 European languages; no Chinese, Japanese, Korean, Arabic, or Hindi) and a higher memory floor (around 16 GB of unified memory). Whisper large-v3-turbo covers 99+ languages, fits in 6 GB of RAM, and is 6-8x faster than full Whisper large-v3 - the practical winner for multilingual work or lower-memory hardware.
Here is how the two models compare across every dimension that matters for local speech recognition.
How they are built
OpenAI's Whisper is an encoder-decoder transformer trained on 680,000 hours of weakly supervised multilingual audio. Whisper large-v3-turbo (released October 2024) keeps the full large-v3 encoder but prunes the decoder from 32 layers to 4, cutting the parameter count from 1.55 billion to 809 million. The result is 6-8x faster transcription with less than 0.5 percentage points of extra word error rate compared to the full model.
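As a rough sanity check on those figures, the short sketch below works out what the pruning implies per decoder layer. It uses only the numbers quoted above. The assumption that every removed parameter sat in one of the 28 dropped decoder layers is a simplification made for this example, and the variable names are illustrative.

```python
# Figures quoted in the paragraph above.
LARGE_V3_PARAMS = 1_550_000_000
TURBO_PARAMS = 809_000_000
DECODER_LAYERS_BEFORE = 32
DECODER_LAYERS_AFTER = 4

removed_params = LARGE_V3_PARAMS - TURBO_PARAMS
removed_layers = DECODER_LAYERS_BEFORE - DECODER_LAYERS_AFTER

# Assumption: all removed parameters belonged to the dropped decoder layers.
params_per_decoder_layer = removed_params / removed_layers
reduction_fraction = removed_params / LARGE_V3_PARAMS

print(f"Removed parameters: {removed_params / 1e6:.0f}M")
print(f"Implied size per decoder layer: {params_per_decoder_layer / 1e6:.1f}M")
print(f"Overall reduction: {reduction_fraction:.1%}")
```

Under that assumption each dropped decoder layer accounts for roughly 26.5 million parameters, and the turbo model is a little under half the size of full large-v3.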
NVIDIA's Parakeet TDT 0.6B v3 is a different architecture: a FastConformer encoder paired with a Token-and-Duration Transducer (TDT) decoder, trained on NVIDIA's Granary dataset (670,000+ hours of audio across 25 European languages). A transducer decoder emits tokens in step with the encoder's audio frames and can emit a blank where a frame contains no speech, rather than generating free-running text for a whole audio window. This sidesteps the hallucination-on-silence issue that affects Whisper's autoregressive decoder - Whisper can generate filler words on silent audio segments unless a voice activity detection (VAD) layer trims silence before it reaches the model.
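To make the idea of trimming silence concrete, here is a minimal energy-threshold sketch in plain Python. It is a toy stand-in for a real voice activity detection layer, not what any particular Whisper runtime ships. The function names, the 16 kHz sample rate, the 30 ms frame size and the threshold value are all assumptions chosen for illustration.

```python
import math
from typing import List, Sequence


def frame_rms(frame: Sequence[float]) -> float:
    """Root-mean-square energy of one frame of samples in [-1.0, 1.0]."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def drop_silent_frames(
    samples: Sequence[float],
    frame_size: int = 480,        # 30 ms at an assumed 16 kHz sample rate
    rms_threshold: float = 0.01,  # hypothetical value; tune per microphone
) -> List[float]:
    """Keep only the frames whose RMS energy exceeds the threshold."""
    kept: List[float] = []
    for start in range(0, len(samples), frame_size):
        frame = samples[start:start + frame_size]
        if frame_rms(frame) > rms_threshold:
            kept.extend(frame)
    return kept


# Two frames of silence, two frames of a 440 Hz tone, two frames of silence.
silence = [0.0] * 960
tone = [0.5 * math.sin(2 * math.pi * 440 * n / 16000) for n in range(960)]
trimmed = drop_silent_frames(silence + tone + silence)
print(len(trimmed))
```

A plain energy gate like this will also pass loud non-speech noise, which is one reason production pipelines tend to use a trained VAD model instead.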
Accuracy on English
On LibriSpeech clean English, Parakeet TDT 0.6B v3 achieves around 1.9% word error rate according to NVIDIA's benchmarks, compared to roughly 2.7% for Whisper large-v3 and approximately 3.0% for large-v3-turbo.
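For readers unfamiliar with the metric, word error rate is the word-level edit distance between the reference transcript and the model output, divided by the number of reference words. The sketch below is a minimal implementation under simplifying assumptions: it only lowercases and splits on whitespace, whereas published benchmarks apply fuller text normalisation, so it will not reproduce the figures above. The function name is illustrative.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref: List[str] = reference.lower().split()
    hyp: List[str] = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# One dropped word out of six reference words.
print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

On this scale, a 1.9% WER means roughly two word-level errors per hundred reference words.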
On the Hugging Face Open ASR Leaderboard - which averages across multiple datasets and languages - Parakeet TDT 0.6B v3 posts 6.32% WER against Whisper large-v3's 7.44%. That leaderboard average is broader than English alone: it penalises Whisper on lower-resource languages where its training data is thin, while Parakeet's 25-language training keeps it competitive.
For real-world dictation with background noise, accents, or technical vocabulary, both models typically land in the 8-12% WER range regardless of benchmark score. That gap between lab conditions and a real office exists equally for cloud services.
Speed
Whisper large-v3-turbo is approximately 6-8x faster than full Whisper large-v3. Parakeet TDT 0.6B v3 is roughly 10 times faster than Whisper large-v3-turbo for English audio - on NVIDIA GPU hardware it can process 60 minutes of audio in roughly one second.
| Model | Parameters | RAM needed | Speed vs large-v3 |
|---|---|---|---|
| Whisper large-v3 | 1.55B | ~10 GB | 1x (baseline) |
| Whisper large-v3-turbo | 809M | ~6 GB | ~6-8x faster |
| Parakeet TDT 0.6B v3 | 600M | ~16 GB unified | ~60-80x faster (English, GPU) |
Speed figures are hardware-dependent. On NVIDIA GPUs with CUDA, Parakeet's throughput advantage is most pronounced. On Apple Silicon, Whisper large-v3-turbo runs via whisper.cpp with Metal acceleration; Parakeet v3 is available on Apple Silicon but requires around 16 GB of unified memory to run comfortably.
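The relative figures in the table chain together, and the "60 minutes in roughly one second" claim can be restated as a multiple of real time. The sketch below shows that arithmetic using only the numbers quoted in this section; the function names are illustrative.

```python
import math


def realtime_multiple(audio_seconds: float, processing_seconds: float) -> float:
    """Seconds of audio transcribed per second of compute."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds


def chained_speedup(*factors: float) -> float:
    """Multiply relative speed-ups, e.g. A vs B and B vs C gives A vs C."""
    return math.prod(factors)


# 60 minutes of audio in about one second of GPU time.
print(realtime_multiple(60 * 60, 1.0))

# Turbo is ~6-8x faster than large-v3; Parakeet is ~10x faster than turbo.
print(chained_speedup(6, 10), chained_speedup(8, 10))
```

That is where the ~60-80x range in the table comes from: it is the product of the two quoted ratios, not a separately measured number.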
Language coverage
This is where the two models diverge most sharply.
Whisper large-v3-turbo retains the full large-v3 encoder and its 99+ language representations unchanged - you lose no language coverage going from large-v3 to the turbo variant.
Parakeet TDT 0.6B v3 supports 25 European languages: English, Spanish, French, Russian, German, Italian, Polish, Ukrainian, Romanian, Dutch, Hungarian, Greek, Swedish, Czech, Bulgarian, Portuguese, Slovak, Croatian, Danish, Finnish, Lithuanian, Slovenian, Latvian, Estonian, and Maltese. It does not support Chinese, Japanese, Korean, Arabic, or Hindi.
| | Whisper large-v3-turbo | Parakeet TDT 0.6B v3 |
|---|---|---|
| Total languages | 99+ | 25 European |
| European languages | Yes | Yes |
| Chinese / Japanese / Korean | Yes | No |
| Arabic / Hindi | Yes | No |
| Language auto-detection | Yes | Yes (among supported languages) |
If your audio includes any language outside Parakeet's list of 25, Whisper is the only option between the two.
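If you route audio between the two models in code, the coverage rule reduces to a set-membership check. The sketch below assumes languages are identified by ISO 639-1 codes, mapped by hand from the 25 languages listed above. The constant and function names are hypothetical and not part of either model's API.

```python
from typing import Iterable

# ISO 639-1 codes for the 25 languages listed above (mapping is an assumption
# made for this example; neither model exposes this constant).
PARAKEET_V3_LANGUAGES = frozenset({
    "en", "es", "fr", "ru", "de", "it", "pl", "uk", "ro", "nl",
    "hu", "el", "sv", "cs", "bg", "pt", "sk", "hr", "da", "fi",
    "lt", "sl", "lv", "et", "mt",
})


def parakeet_covers(language_codes: Iterable[str]) -> bool:
    """True only if every requested language is in Parakeet v3's list."""
    codes = [code.lower() for code in language_codes]
    return bool(codes) and all(code in PARAKEET_V3_LANGUAGES for code in codes)


print(parakeet_covers(["en", "de"]))  # both are on the list
print(parakeet_covers(["en", "ja"]))  # Japanese is not
```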
Hardware requirements
Parakeet TDT 0.6B v3 has fewer parameters than Whisper large-v3-turbo (600M vs 809M) but needs more memory in practice. NVIDIA's reference configurations require around 16 GB of unified memory on Apple Silicon or an NVIDIA GPU with 8+ GB VRAM. Quantised variants reduce that floor, but Parakeet is not yet as widely quantised as Whisper.
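A quick estimate shows why parameter count alone does not predict the memory floor. The sketch below computes the size of the weights only, assuming 2 bytes per parameter (16-bit floats); that precision is an assumption for the example, not a statement about how either model is distributed. Activations, runtime buffers and framework overhead are deliberately left out, and the function name is illustrative.

```python
# Bytes per parameter for a few common precisions.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_footprint_gb(param_count: int, dtype: str = "fp16") -> float:
    """Size of the model weights alone, in decimal gigabytes."""
    return param_count * BYTES_PER_PARAM[dtype] / 1e9


# Parameter counts as quoted in this article.
models = {
    "Whisper large-v3-turbo": 809_000_000,
    "Parakeet TDT 0.6B v3": 600_000_000,
}

for name, params in models.items():
    print(f"{name}: {weight_footprint_gb(params):.2f} GB of fp16 weights")
```

At that precision both sets of weights come in under 2 GB, so the gap between the ~6 GB and ~16 GB figures quoted here has to come from everything around the weights rather than from the weights themselves.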
Whisper large-v3-turbo fits in approximately 6 GB of RAM. That means it runs on the base-tier MacBook Air (8 GB unified memory) where Parakeet v3 does not.
On Windows and Linux with a recent NVIDIA GPU, both models are viable - Parakeet's speed advantage is most evident there.
Which model to choose
Choose Parakeet TDT 0.6B v3 when:
- Your audio is in English or one of the other 24 European languages it supports
- You have 16+ GB of unified memory or an NVIDIA GPU with 8+ GB VRAM
- Throughput matters - bulk transcription of large archives or real-time streaming pipelines
- You want to avoid Whisper's hallucination-on-silence behaviour without relying on a VAD layer
Choose Whisper large-v3-turbo when:
- You need Chinese, Japanese, Korean, Arabic, Hindi, or any other language outside Parakeet's 25
- Your hardware has 6-8 GB of RAM (base MacBook Air, Windows laptops)
- You want broad ecosystem support - whisper.cpp, faster-whisper, and Insanely Fast Whisper all support it, with pre-built binaries for Mac, Windows, and Linux
- You need code-switching between languages in the same recording
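The two checklists above can be collapsed into a small decision function. This is only a sketch of the article's rules of thumb, with the thresholds taken from the text; the `Workload` fields and the returned labels are hypothetical names made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    all_languages_in_parakeet_list: bool
    unified_memory_gb: float = 0.0   # Apple Silicon unified memory
    nvidia_vram_gb: float = 0.0      # VRAM on an NVIDIA GPU, if any
    needs_code_switching: bool = False


def choose_model(workload: Workload) -> str:
    """Apply the rules of thumb from the checklists above."""
    enough_memory = (
        workload.unified_memory_gb >= 16 or workload.nvidia_vram_gb >= 8
    )
    if (
        workload.all_languages_in_parakeet_list
        and enough_memory
        and not workload.needs_code_switching
    ):
        return "parakeet-tdt-0.6b-v3"
    # Multilingual audio, code-switching, or a smaller machine.
    return "whisper-large-v3-turbo"


print(choose_model(Workload(True, unified_memory_gb=16)))
print(choose_model(Workload(True, unified_memory_gb=8)))
```

Throughput and ecosystem support are left out because they are preferences rather than hard constraints; Whisper is the fallback whenever any hard requirement for Parakeet is not met.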
Running either model locally for dictation
Both models run entirely on your device - no audio is sent to a server at any point, and transcription costs nothing per request after the one-time model download.
The raw models handle file transcription. For live dictation that injects text at the cursor in any running application - email, VS Code, Slack, a terminal - you also need a layer handling microphone capture, voice activity detection, and text injection. The DIY whisper.cpp path covers that in full; an integrated desktop app handles it without the manual setup.
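For the file-transcription case, a minimal sketch of loading each model from Python might look like the following. It assumes the Hugging Face `transformers` package (plus ffmpeg for decoding audio files) for Whisper and the NVIDIA NeMo toolkit for Parakeet, and that the models are published under the repository IDs shown. The wrapper function names are made up for this example, imports are deferred so the file loads even if only one toolkit is installed, and the exact return types may differ between library versions.

```python
# Assumed Hugging Face repository IDs for the two models.
WHISPER_MODEL_ID = "openai/whisper-large-v3-turbo"
PARAKEET_MODEL_ID = "nvidia/parakeet-tdt-0.6b-v3"


def transcribe_with_whisper(audio_path: str) -> str:
    """Transcribe one file with Whisper via the transformers ASR pipeline."""
    from transformers import pipeline  # deferred: heavy optional dependency

    asr = pipeline("automatic-speech-recognition", model=WHISPER_MODEL_ID)
    # return_timestamps=True lets the pipeline handle audio longer than 30 s.
    result = asr(audio_path, return_timestamps=True)
    return result["text"]


def transcribe_with_parakeet(audio_path: str) -> str:
    """Transcribe one file with Parakeet via NVIDIA NeMo."""
    import nemo.collections.asr as nemo_asr  # deferred: heavy optional dependency

    model = nemo_asr.models.ASRModel.from_pretrained(model_name=PARAKEET_MODEL_ID)
    outputs = model.transcribe([audio_path])
    # Assumption: recent NeMo releases return objects with a .text attribute.
    return outputs[0].text
```

Both functions download the model weights on first use and then run locally. Neither covers microphone capture, VAD, or text injection.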
Typilot bundles a local Whisper runtime with VAD, hotkey activation (hold, toggle-VAD, or toggle-manual), and text injection across any running application. The full pipeline stays on your device - audio is never uploaded. For a broader comparison of local speech apps, offline speech-to-text without internet covers the competitive landscape.
If you are on Apple Silicon with 16+ GB of unified memory and your work is primarily in English or European languages, Parakeet TDT 0.6B v3 is the more accurate and faster model. For multilingual audio, 8 GB machines, or the widest tool support across Mac, Windows, and Linux, Whisper large-v3-turbo is the practical default.
The short version
Parakeet TDT 0.6B v3 leads on English accuracy (~1.9% WER versus ~3.0% for Whisper large-v3-turbo on LibriSpeech clean) and speed, but it covers 25 European languages only and needs ~16 GB of memory. Whisper large-v3-turbo delivers 6-8x faster transcription than full Whisper large-v3, covers 99+ languages, and runs on 6 GB of RAM - the right default when language breadth or lower memory matters.
Both run entirely on your device. Typilot ships a 3-day free trial with a local Whisper runtime bundled - no Python environment or whisper.cpp build required. The security page documents exactly what stays on your machine.
Common questions
Is Parakeet more accurate than Whisper large-v3-turbo?
On English, yes. Parakeet TDT 0.6B v3 achieves around 1.9% word error rate on LibriSpeech clean versus roughly 3.0% for Whisper large-v3-turbo, and it processes English audio about 10 times faster. However, Parakeet v3 covers 25 European languages only - it does not support Chinese, Japanese, Korean, Arabic, or Hindi. Whisper large-v3-turbo covers 99+ languages and runs on 6 GB of RAM.
What is the difference between Whisper large-v3 and large-v3-turbo?
Whisper large-v3-turbo (released October 2024) is a pruned version of large-v3 with the decoder trimmed from 32 layers to 4, reducing parameters from 1.55 billion to 809 million. The result is 6-8x faster transcription with less than 0.5 percentage points of extra word error rate - roughly 3.0% versus 2.7% WER on LibriSpeech clean English. Language support is identical to the full model: 99+ languages.
How much memory does Parakeet TDT 0.6B v3 need?
Parakeet TDT 0.6B v3 runs best with around 16 GB of unified memory on Apple Silicon, or an NVIDIA GPU with 8+ GB VRAM. Quantised variants lower that floor. Whisper large-v3-turbo fits in approximately 6 GB of RAM, making it usable on the base 8 GB MacBook Air where Parakeet v3 does not fit comfortably.
Can Parakeet and Whisper both run locally without sending audio to a server?
Yes. Both models run entirely on-device - no audio is transmitted at any point. Whisper has wide ecosystem support via whisper.cpp, faster-whisper, and desktop dictation apps on Mac, Windows, and Linux. Parakeet v3 is available via NVIDIA NeMo and Hugging Face and runs on NVIDIA GPUs and Apple Silicon with 16+ GB of unified memory.