Your first local LLM in 10 minutes with Ollama

You don’t need a data-centre GPU to run a useful model at home. You need Ollama and about ten minutes. This is the path I point everyone to first.

1. Install Ollama

One installer, every platform. Grab it from ollama.com. On macOS and Windows it’s a normal app; on Linux it’s a one-line script. Done.

2. Pull and run a model

ollama run llama3.2

That’s it. The first run downloads the weights; after that it’s instant and fully offline. You’re now chatting with a model that lives entirely on your machine.

3. Pick a size your hardware can handle

This is where most people get frustrated. A rough rule of thumb:

Your RAM / VRAM	Comfortable model size
8 GB	1–3B parameters
16 GB	7–8B parameters
24 GB+	13–14B, or quantized larger
48 GB+	30B+ quantized

If a model swaps to disk, it’ll crawl. When in doubt, go one size smaller: a fast 8B beats a 70B that takes a minute per sentence.

Quantization is your friend. A 4-bit (Q4) quant of a bigger model often beats a full-precision smaller one, and fits in far less memory.

4. Talk to it from your own code

Ollama exposes an OpenAI-compatible endpoint on localhost:11434, so most tooling “just works” by pointing at it:

curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Explain gridfinity in one sentence."
}'

Where to go next

Try a code-focused model for editor autocomplete.
Add a small embedding model and you’ve got the start of local RAG.
Watch your temperatures. Sustained local inference is a real workout for a laptop.

Next guide: giving your local model your own documents to read, without anything touching the cloud.