Most AI assistants rely on cloud APIs — every message you type goes to OpenAI, Anthropic, or Google's servers. For some use cases that's fine. But for privacy-conscious users, offline setups, or anyone who wants zero recurring costs, a fully local AI stack is the answer.
OpenClaw + Ollama delivers exactly that. Ollama runs large language models locally on your machine. OpenClaw connects to it as the AI backend. The result: a personal AI assistant that never touches the internet for inference, costs nothing to run (beyond your hardware), and keeps every conversation on your own disk.
What You'll Need
- A machine with at least 8GB RAM (16GB recommended)
- Linux, macOS, or Windows (WSL2)
- OpenClaw installed
- Ollama installed
- 30 minutes
Step 1: Install Ollama
If you haven't already, install Ollama on your machine. The quickest way on Linux:
curl -fsSL https://ollama.ai/install.sh | sh
For macOS, download the installer from ollama.ai. For Windows, use WSL2 and the Linux install command.
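Once installed, it's worth sanity-checking the binary and the background server before moving on. These are standard Ollama CLI commands:

```shell
# Confirm the CLI is on your PATH; prints a version string
ollama --version

# The installer usually starts the Ollama server automatically.
# If it isn't running, start it in a separate terminal:
ollama serve
```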
Step 2: Pull a Model
Ollama supports dozens of models. For a balance of speed and capability with OpenClaw, start with Llama 3.2 (3B) or Mistral 7B:
ollama pull llama3.2:3b
For better results (at the cost of more RAM), use:
ollama pull mistral
ollama pull llama3.2:latest
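After a pull completes, you can confirm what's available locally and inspect a model before wiring it up:

```shell
# List downloaded models with their size and modification time
ollama list

# Show a model's parameters, template, and license
ollama show llama3.2:3b
```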
Step 3: Configure OpenClaw for Ollama
OpenClaw's config file lets you set Ollama as the AI provider. Edit your config file (usually at ~/.openclaw/config.yaml):
agents:
  defaults:
    model: ollama/llama3.2:3b
    provider: openai                      # OpenClaw uses OpenAI-compatible API format
    apiBase: http://127.0.0.1:11434/v1
    apiKey: ollama                        # Ollama doesn't require a real key
Ollama exposes an OpenAI-compatible API at http://127.0.0.1:11434/v1, so OpenClaw can treat it like any other OpenAI-style provider, just pointed at your local machine.
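You can verify the endpoint directly before touching OpenClaw. This assumes Ollama is running and llama3.2:3b has been pulled:

```shell
# Send a chat completion request in OpenAI format to the local Ollama server
curl -s http://127.0.0.1:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "llama3.2:3b",
        "messages": [{"role": "user", "content": "Reply with one short sentence."}]
      }'
```

If this returns a JSON response with a `choices` array, the local API is working and any failure from OpenClaw is a config issue rather than an Ollama issue.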
Step 4: Restart and Test
Restart the OpenClaw gateway:
openclaw gateway restart
Then send a message to your assistant. If Ollama is running and the model is loaded, you'll get responses from your local AI, with no internet connection required.
Why Go Local
- Privacy: No data ever leaves your machine
- Cost: Zero API fees — one-time hardware cost only
- Offline: Works without internet access
- No rate limits: Unlimited conversations
- Custom models: Run Llama, Mistral, Phi, Gemma, or anything Ollama supports
When to Use Local vs Cloud
Local models are great for: private conversations, sensitive data analysis, offline environments, cost-sensitive setups, and prototyping.
Cloud models (Claude, GPT-4) are better for: complex reasoning, long-form content generation, detailed coding assistance, and tasks that benefit from a larger model's capability.
You can configure OpenClaw to use both — switch between your local model and a cloud provider depending on the task. Set Ollama as the default and Claude/GPT-4 as a fallback for complex queries.
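As a sketch of what a dual setup could look like, assuming OpenClaw supports named agent blocks alongside defaults (everything beyond the defaults block below is hypothetical, including the agent name and keys):

```yaml
agents:
  defaults:          # everyday traffic stays local
    model: ollama/llama3.2:3b
    provider: openai
    apiBase: http://127.0.0.1:11434/v1
    apiKey: ollama
  research:          # hypothetical agent routed to a cloud model for hard tasks
    model: claude-sonnet
    provider: anthropic
    apiKey: ${ANTHROPIC_API_KEY}
```

Only the defaults block matches the format shown earlier; check OpenClaw's configuration docs for the exact keys for multi-provider setups.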
Performance Notes
- 3B parameter models run well on 8GB RAM machines and respond in 1-3 seconds
- 7B models need 12-16GB for comfortable use
- 13B+ parameter models are better suited for machines with 24GB+ RAM or a GPU
- For best performance, use a machine with CUDA or Metal GPU acceleration
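To check whether a loaded model is actually using your GPU, Ollama reports this at runtime:

```shell
# Show currently loaded models, their memory footprint,
# and whether they are running on CPU or GPU
ollama ps
```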
The beauty of this setup is that it scales with your hardware. Start with a tiny 3B model and upgrade as your infrastructure grows. OpenClaw stays the same — just change the model name in config.
Ready to get started with OpenClaw? Install OpenClaw →