Your first local LLM in 10 minutes with Ollama

No cloud, no API key, no data leaving your machine. A genuinely beginner-proof path to running a capable model locally — plus how to know which size your hardware can actually handle.

Min hardware: 8 GB RAM (16 GB+ for the good stuff)
Read time: 10 min
Stack: Ollama, llama.cpp

You don’t need a data-centre GPU to run a useful model at home. You need Ollama and about ten minutes. This is the path I point everyone to first.

1. Install Ollama

Ollama covers every major platform. Grab it from ollama.com. On macOS and Windows it’s a normal app installer; on Linux it’s a one-line install script. Done.

2. Pull and run a model

ollama run llama3.2

That’s it. The first run downloads the weights; after that the model loads from your local disk and runs fully offline. You’re now chatting with a model that lives entirely on your machine.
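If you want to confirm from a script that the download finished, one option is to ask the local server which models it holds. The sketch below assumes Ollama is serving on its default port and uses its /api/tags listing endpoint; the helper names are illustrative, not part of Ollama.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # default port; change if you moved it


def model_names(tags: dict) -> list[str]:
    """Extract model names (e.g. 'llama3.2:latest') from an /api/tags reply."""
    return [m["name"] for m in tags.get("models", [])]


def has_model(tags: dict, wanted: str) -> bool:
    """True if `wanted` matches a local model, with or without its ':tag' suffix."""
    for name in model_names(tags):
        if name == wanted or name.split(":", 1)[0] == wanted:
            return True
    return False


def fetch_tags(base_url: str = OLLAMA_URL) -> dict:
    """Ask the running Ollama server for its list of downloaded models."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return json.load(resp)
```

With the server running, `has_model(fetch_tags(), "llama3.2")` should come back `True` once step 2 has finished.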

3. Pick a size your hardware can handle

This is where most people get frustrated. A rough rule of thumb:

Your RAM / VRAM    Comfortable model size
8 GB               1–3B parameters
16 GB              7–8B parameters
24 GB+             13–14B, or quantized larger
48 GB+             30B+ quantized

If a model swaps to disk, it’ll crawl. When in doubt, go one size smaller — a fast 8B beats a 70B that takes a minute per sentence.
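The rule of thumb above can be written down as a lookup, which is handy if you script your model choice. This sketch simply encodes the table; the function name is my own, and real headroom depends on what else is using your memory.

```python
# Rule-of-thumb tiers from the table: (minimum memory in GB, comfortable size).
TIERS = [
    (48, "30B+ quantized"),
    (24, "13-14B, or quantized larger"),
    (16, "7-8B parameters"),
    (8, "1-3B parameters"),
]


def comfortable_model_size(memory_gb: float) -> str:
    """Return the largest tier whose memory threshold is met."""
    for minimum_gb, size in TIERS:
        if memory_gb >= minimum_gb:
            return size
    return "below the 8 GB minimum for this guide"
```

For example, `comfortable_model_size(32)` lands in the 24 GB+ tier, not the 48 GB+ one.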

Quantization is your friend. A 4-bit (Q4) quant of a bigger model often beats a full-precision smaller one, and its weights need roughly a quarter of the memory the same model would take at 16-bit precision.
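To see why, it helps to do the arithmetic: weights take roughly parameters × bits per weight ÷ 8 bytes. The sketch below computes only that figure. It ignores the context cache and runtime overhead, and real quantized files run somewhat larger because they store scaling data alongside the weights, so treat the numbers as lower bounds.

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


for params in (8, 14, 70):
    fp16 = weights_gb(params, 16)
    q4 = weights_gb(params, 4)
    print(f"{params}B: ~{fp16:.0f} GB at 16-bit, ~{q4:.0f} GB at 4-bit")
```

By this estimate a 14B model at 4-bit (about 7 GB of weights) fits where an 8B model at 16-bit (about 16 GB) does not.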

4. Talk to it from your own code

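Ollama serves an HTTP API on localhost:11434. Its native endpoints live under /api, and there is also an OpenAI-compatible API under /v1, so most tooling “just works” by pointing at it. A native request looks like this (the reply streams back as one JSON object per line unless you send "stream": false):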

curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Explain gridfinity in one sentence."
}'
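Here is the same request from Python, using only the standard library. It is a sketch that assumes the server is on its default port and that llama3.2 has already been pulled. It sets "stream": false so the reply arrives as a single JSON object with the text in its response field. The function names are illustrative.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # default address; adjust if yours differs


def build_request(prompt: str, model: str = "llama3.2",
                  base_url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build a POST to Ollama's native /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "llama3.2") -> str:
    """Send the prompt and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_request(prompt, model), timeout=300) as resp:
        return json.load(resp)["response"]


def join_stream(lines) -> str:
    """Concatenate the text of a streamed reply (one JSON object per line)."""
    return "".join(
        json.loads(line).get("response", "") for line in lines if line.strip()
    )
```

With the server running, `print(generate("Explain gridfinity in one sentence."))` prints the model's answer. `join_stream` is only needed if you leave streaming on and collect the lines yourself.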

Where to go next

  • Try a code-focused model for editor autocomplete.
  • Add a small embedding model and you’ve got the start of local RAG.
  • Watch your temperatures — sustained local inference is a real workout for a laptop.
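As a taste of the second bullet, retrieval is mostly "embed the text, then rank by similarity". The sketch below shows only the ranking half, using plain Python lists. Where the vectors come from is left open (an embedding model served by Ollama is one option), and the function names are illustrative.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], doc_vecs: dict[str, list[float]],
          k: int = 3) -> list[str]:
    """Return the ids of the k documents most similar to the query vector."""
    ranked = sorted(
        doc_vecs,
        key=lambda doc_id: cosine(query_vec, doc_vecs[doc_id]),
        reverse=True,
    )
    return ranked[:k]
```

The text behind the top-ranked ids is what you would paste into the prompt as context.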

Next guide: giving your local model your own documents to read, without anything touching the cloud.