How to Run a Local AI Assistant with Ollama (Step by Step)
Run a local AI assistant with Ollama: install it, pull a model, connect a desktop app - private, offline, free per request. Step-by-step, with model picks.
To run a local AI assistant, you install Ollama to host language models on your own machine, download a model with one command, and point a desktop app like Typilot at it - so every AI request is processed locally, with nothing sent to the cloud. The whole setup takes about ten minutes and costs nothing per request afterward.
Here is the full walkthrough, plus how to pick the right model for your hardware.
What is Ollama, and why local?
Ollama is a free, open-source runtime that downloads and runs open large language models - Llama, Mistral, Qwen, and others - directly on your computer. It exposes them on a local endpoint (http://localhost:11434) that other apps can call. No account, no API key, no usage meter.
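To confirm the endpoint is up, another program can ask it which models are already downloaded. The following is a minimal sketch, assuming Ollama's `/api/tags` route returns a JSON object with a `models` list whose entries carry a `name` field; the helper names are hypothetical and only the Python standard library is used.

```python
import json
import urllib.request
from typing import List

OLLAMA_URL = "http://localhost:11434"  # default local endpoint named in the article


def parse_model_names(tags_payload: dict) -> List[str]:
    """Extract model names from an /api/tags response body (assumed shape)."""
    return [m["name"] for m in tags_payload.get("models", [])]


def list_local_models(base_url: str = OLLAMA_URL, timeout: float = 5.0) -> List[str]:
    """Ask the local Ollama server which models are already on disk."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return parse_model_names(json.load(resp))


if __name__ == "__main__":
    try:
        print(list_local_models())
    except (OSError, ValueError) as exc:  # connection refused, timeout, bad JSON
        print(f"Ollama does not seem to be reachable: {exc}")
```

An empty list means the server is running but no model has been pulled yet.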
Running models this way changes three things compared to a cloud assistant:
- Privacy. Your prompts never leave the machine. No remote server keeps a log of what you asked.
- Cost. Local inference has a marginal cost of zero - no per-token billing, no monthly seat.
- Offline. Once a model is downloaded, it works with no internet at all.
Step by step
1. Install Ollama
Download Ollama for macOS or Windows from ollama.com and run the installer; on Linux, the download page provides a one-line install script instead. Once installed, Ollama runs quietly in the background and starts automatically.
2. Pull a model
Open a terminal and download a model:
ollama pull llama3.1
The first pull downloads several gigabytes; after that the model is cached locally. You can test it immediately:
ollama run llama3.1
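The same test can be run from code, which is roughly what a desktop app does behind the scenes. This is a hedged sketch, assuming Ollama's `/api/generate` route accepts a JSON body with `model`, `prompt`, and `stream` fields and returns the text under a `response` key; the function names are illustrative and not part of any official client.

```python
import json
import urllib.request


def build_generate_request(prompt: str, model: str = "llama3.1",
                           base_url: str = "http://localhost:11434") -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate route."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "llama3.1", timeout: float = 120.0) -> str:
    """Send the prompt to the local model and return the generated text."""
    req = build_generate_request(prompt, model)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        # Assumed response shape: {"response": "<generated text>", ...}
        return json.load(resp)["response"]
```

With Ollama running and the model pulled, `generate("Say hello in five words.")` should return a short string. The generous timeout allows for the first call, which has to load the model into memory.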
3. Connect your assistant
A runtime alone is not an assistant - you need something that puts the model on your keyboard. Typilot connects to Ollama automatically at http://localhost:11434, so once Ollama is running, Typilot can use any model you have pulled. Point it at a different endpoint in settings if you run Ollama on another machine.
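When Ollama runs on another machine, the endpoint setting has to resolve to a full base URL. The helper below is a hypothetical sketch of that normalization, not Typilot's actual settings logic: it fills in the `http` scheme and the default port 11434 when they are missing. A remote Ollama instance typically also has to be configured to listen on a non-loopback address (the `OLLAMA_HOST` environment variable on the server, per Ollama's documentation) before another machine can reach it.

```python
from urllib.parse import urlsplit

DEFAULT_PORT = 11434  # Ollama's default port, as used in the article


def normalize_endpoint(value: str) -> str:
    """Turn 'host', 'host:port', or a full URL into 'scheme://host:port'.

    Illustrative only: IPv6 literals and URL paths are not handled.
    """
    value = value.strip()
    if "://" not in value:
        value = "http://" + value  # assume plain HTTP when no scheme is given
    parts = urlsplit(value)
    host = parts.hostname or "localhost"
    port = parts.port or DEFAULT_PORT
    return f"{parts.scheme}://{host}:{port}"


if __name__ == "__main__":
    for raw in ("192.168.1.20", "myhost:8080", "http://localhost:11434/"):
        print(raw, "->", normalize_endpoint(raw))
```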
4. Run your first command
With Typilot, type a command prefix in any text field - gen: a polite follow-up email, fix: this stack trace, sum: this article - and the local model writes the result straight into the field. No copy-paste into a chat window. The full setup guide lives in the Ollama setup docs.
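To make the prefix idea concrete, here is one way such a command could be split and turned into a prompt for the model. This is purely an illustration: the prompt templates and function names are invented for the example, and Typilot's real parsing and prompts are not documented here.

```python
from typing import Optional, Tuple

# Hypothetical prompt templates keyed by the prefixes mentioned in the article.
PROMPT_TEMPLATES = {
    "gen": "Write the following: {text}",
    "fix": "Find and fix the problem in the following: {text}",
    "sum": "Summarize the following: {text}",
}


def parse_command(field_text: str) -> Optional[Tuple[str, str]]:
    """Split 'prefix: rest' into (prefix, rest); return None for ordinary text."""
    prefix, sep, rest = field_text.partition(":")
    prefix = prefix.strip().lower()
    if not sep or prefix not in PROMPT_TEMPLATES:
        return None
    return prefix, rest.strip()


def to_prompt(field_text: str) -> Optional[str]:
    """Build the prompt that would be sent to the local model, if any."""
    parsed = parse_command(field_text)
    if parsed is None:
        return None
    prefix, text = parsed
    return PROMPT_TEMPLATES[prefix].format(text=text)


if __name__ == "__main__":
    print(to_prompt("gen: a polite follow-up email"))
    print(to_prompt("Note: buy milk"))  # unknown prefix, treated as plain text
```

Returning `None` for unknown prefixes keeps ordinary text containing a colon from being sent to the model by accident.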
Picking a model for your hardware
The right model depends on your RAM and what you do most. A rough guide:
| Model | Size | Good for | Needs |
|---|---|---|---|
| Phi-3 Mini | ~2.2 GB | Fast autocomplete, low-RAM laptops | 8 GB RAM |
| Mistral 7B | ~4.1 GB | General writing, quick responses | 8-16 GB RAM |
| Llama 3.1 8B | ~4.7 GB | Balanced everyday assistant | 16 GB RAM |
| Qwen 2.5 Coder 7B | ~4.4 GB | Code generation and explanation | 16 GB RAM |
Start with a 7B-8B model on 16 GB of RAM - it is the sweet spot between response speed and output quality. Drop to Phi-3 Mini on an older laptop with 8 GB; step up to larger models if you have a GPU with enough memory.
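The rough guide above can be written down as a small lookup. This sketch simply encodes the table's RAM thresholds; the model tags (`phi3:mini`, `llama3.1:8b`, `qwen2.5-coder:7b`) are assumed to match the names in the Ollama library and should be checked there before pulling.

```python
from typing import Optional


def suggest_model(ram_gb: float, coding: bool = False) -> Optional[str]:
    """Suggest a model tag from installed RAM, following the table's rough guide.

    The tags are assumptions about Ollama library names, not verified values.
    """
    if ram_gb >= 16:
        # 16 GB and up: the 7B-8B models the article recommends starting with.
        return "qwen2.5-coder:7b" if coding else "llama3.1:8b"
    if ram_gb >= 8:
        # 8 GB laptops: the table lists Phi-3 Mini (Mistral 7B is borderline).
        return "phi3:mini"
    # Below 8 GB the table offers no recommendation.
    return None


if __name__ == "__main__":
    for ram in (4, 8, 16, 32):
        print(f"{ram} GB ->", suggest_model(ram))
```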
A tool that lets you bind a different model to each task gets the best of both: a small fast model for autocomplete, a larger one for harder rewrites.
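A per-task binding can be as simple as a dictionary with a fallback. The sketch below is a hypothetical illustration, assuming the task names shown and assuming installed models are reported by exact tag; it is not how any particular app stores its settings.

```python
from typing import Iterable, Optional

# Hypothetical bindings: a small model for autocomplete, a larger one for rewrites.
TASK_MODELS = {
    "autocomplete": "phi3:mini",
    "rewrite": "llama3.1:8b",
}
DEFAULT_MODEL = "llama3.1:8b"


def model_for_task(task: str, installed: Iterable[str]) -> Optional[str]:
    """Pick the model bound to a task, falling back to the default if it is missing."""
    available = set(installed)
    preferred = TASK_MODELS.get(task, DEFAULT_MODEL)
    if preferred in available:
        return preferred
    # Preferred model has not been pulled; use the default when it is available.
    return DEFAULT_MODEL if DEFAULT_MODEL in available else None


if __name__ == "__main__":
    pulled = ["phi3:mini", "llama3.1:8b"]
    print(model_for_task("autocomplete", pulled))
    print(model_for_task("rewrite", pulled))
```

Falling back to a single default keeps every task working even when only one model has been downloaded.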
The short version
Install Ollama, run ollama pull llama3.1, and connect a desktop assistant that calls localhost:11434 - that is a complete local AI setup, private and offline, with no subscription. Typilot ships the assistant half on macOS, Windows, and Linux with a 3-day free trial; the Ollama setup guide covers the details, and the security page documents exactly what stays on your machine.
Common questions
How do I run a local LLM?
Install Ollama (free, open source), download a model with a command like "ollama pull llama3.1", and connect a desktop app that calls Ollama at http://localhost:11434. The model then runs entirely on your own machine.
Is Ollama free?
Yes. Ollama is free and open source, with no account or API key required. The models it runs are open-weight, and because everything runs locally there are no per-token or per-month fees.
How much RAM do I need to run a local AI model?
A 7B-8B model (Llama 3.1, Mistral) runs comfortably on 16 GB of RAM and is usable on 8 GB. Smaller models like Phi-3 Mini run on 8 GB laptops; larger models benefit from a GPU.
Can a local AI assistant work offline?
Yes. Once the model is downloaded through Ollama, the assistant runs fully offline - no internet connection is needed for inference.