
Creating AI agents often leads developers to default to OpenAI's services due to their simplicity - just generate an API key, add it as an environment variable, and you're ready to go. While GPT models offer excellent performance, this approach comes with drawbacks: ongoing API costs and limited model selection. What if you could build powerful autonomous AI agents that run entirely on your local machine without any usage fees? This tutorial will show you exactly how to implement local AI agents using the CrewAI framework and Ollama.
Why Use Local AI Models for Your Agents?
Before diving into implementation, let's understand the advantages of local AI deployment. Using Ollama with CrewAI provides several benefits for your AI agent development workflow:
- Zero API costs - run models as much as you want without usage fees
- Complete privacy - your data never leaves your machine
- Access to diverse open-source models beyond GPT
- No internet dependency after initial model download
- Full control over model selection and configuration
- Ability to experiment with different model sizes based on your hardware
The tradeoff is computational requirements - larger models demand more powerful hardware. For development and testing, smaller models like Llama-3 8B or Phi-3 Mini provide a good balance between performance and resource usage.
Setting Up Ollama for Local AI Model Hosting
Ollama is an open-source tool that simplifies running large language models locally. It handles downloading and setting up models, and serves them through a consistent API. Here's how to get started:
- Visit ollama.com and download the application for your operating system
- Install and launch Ollama - you'll see a small llama icon in your system tray/menu bar indicating the server is running (you can also verify this from the terminal, as shown after this list)
- Browse available models on the Ollama website or through the application
- Choose a model that matches your hardware capabilities (smaller models for average machines, larger ones for powerful systems)
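If you want to confirm the server is up before continuing, Ollama listens on port 11434 by default; hitting the root endpoint from a terminal should return a short confirmation message such as "Ollama is running":
curl http://localhost:11434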

For this tutorial, we'll use the Phi-3 Mini model with 3.8 billion parameters, which offers a good balance between performance and resource requirements. More powerful computers can experiment with the 14 billion parameter Phi-3 Medium version for potentially better results.
Downloading Your Model with Ollama CLI
Once Ollama is running, you'll need to download your chosen model. Open your terminal and use the Ollama CLI command:
ollama pull phi3:latest
Or for a specific model version:
ollama pull phi3:3.8b
The download may take several minutes depending on your internet connection and the model size. Once completed, the model will be available locally for use with your AI agents.
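To double-check that the model is available locally, list your installed models:
ollama list
The model you pulled (phi3:3.8b in our case) should appear in the output along with its tag and size.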
Integrating Ollama with CrewAI Framework
Now let's integrate our local Ollama model with the CrewAI framework. We'll point LangChain's ChatOpenAI class at Ollama's OpenAI-compatible API endpoint, so CrewAI can use the local server as a drop-in replacement for OpenAI.
from langchain.chat_models.openai import ChatOpenAI
import os
# Set a dummy OpenAI API key (required but not actually used)
os.environ["OPENAI_API_KEY"] = "sk-dummy-key-111"
# Configure the LLM to use our local Ollama model
llm = ChatOpenAI(
    model="phi3:3.8b",  # The model we downloaded with Ollama
    base_url="http://localhost:11434/v1",  # Ollama's local API endpoint
)

The key aspects of this configuration are:
- Using LangChain's ChatOpenAI class for compatibility
- Setting the model parameter to match your downloaded Ollama model
- Configuring the base_url to point to your local Ollama server (a quick connectivity check is shown after this list)
- Including a dummy OpenAI API key (required by the library but not actually used)
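Before creating an agent, it's worth confirming that this connection works. The snippet below is a minimal sanity check, assuming a LangChain version recent enough to expose the Runnable invoke() method and that Ollama is running with phi3:3.8b already pulled:
# Optional sanity check: send one prompt directly to the local Ollama server
# (assumes your LangChain version supports .invoke())
response = llm.invoke("Reply with one short sentence confirming you are online.")
print(response.content)
If this prints a short reply, the same llm object is ready to be handed to a CrewAI agent.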
Creating a CrewAI Agent with Local LLM
Now we can create our CrewAI agent that uses the local Ollama model instead of OpenAI's APIs. Here's a complete example of setting up an information agent:
from crewai import Agent, Crew, Task
from langchain.chat_models.openai import ChatOpenAI
import os
# Set dummy OpenAI API key
os.environ["OPENAI_API_KEY"] = "sk-dummy-key-111"
# Configure local LLM
llm = ChatOpenAI(
    model="phi3:3.8b",
    base_url="http://localhost:11434/v1",
)
# Create an information agent
information_agent = Agent(
    role="Marine Biology Expert",
    goal="Provide detailed information about marine creatures",
    backstory="You are a renowned marine biologist with extensive knowledge of all sea creatures. You've spent decades studying marine life and can provide comprehensive information about any aquatic species.",
    llm=llm  # Inject our local LLM
)
# Create a task for the agent
research_task = Task(
    description="Research and provide detailed information about the box jellyfish, including its habitat, characteristics, and dangers.",
    expected_output="A structured summary covering habitat, characteristics, and dangers.",  # newer CrewAI versions require expected_output
    agent=information_agent
)
# Create a crew with our agent
crew = Crew(
    agents=[information_agent],
    tasks=[research_task],
    verbose=True
)
# Execute the crew
result = crew.kickoff()
print(result)
The critical difference from a standard OpenAI implementation is the llm=llm parameter passed to the Agent constructor. This tells CrewAI to use our local Ollama model instead of making API calls to OpenAI.
Testing Your Local AI Agent
When you run the code, your agent will process the task using the local Ollama model. Depending on your hardware, this may take longer than cloud-based APIs, but you'll get results without any usage costs.
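Assuming you saved the example above as local_agent.py (the filename is arbitrary), run it from the environment where CrewAI and LangChain are installed:
python local_agent.py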

In our example, the agent successfully provided detailed information about box jellyfish, including a summary and bullet points about its characteristics, habitat, and dangers - all processed locally without any API calls or costs.
Performance Considerations for Local AI Agents
When working with local AI models, performance is an important consideration. Here are some tips to optimize your experience:
- Match model size to your hardware - smaller models (3-7B parameters) for average machines, larger models (13B+) for high-end systems
- Expect longer processing times compared to cloud APIs - our test took about 2.5 minutes on a mid-range machine
- For development and testing, consider using smaller models or OpenAI's APIs, then switch to local models for production
- Close resource-intensive applications while running local models
- Consider quantized versions of models (like 4-bit or 8-bit), which use less memory at the cost of a slight quality reduction (see the example after this list)
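Ollama publishes quantized variants of most models as separate tags, and the exact tag names vary per model, so check the Tags page for your model on ollama.com. The command below is illustrative only:
ollama pull phi3:3.8b-mini-4k-instruct-q4_0  # illustrative tag - confirm the exact name on the model's page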
Expanding Your Local AI Agent Capabilities
Once you've mastered the basics of local AI agents with CrewAI and Ollama, you can expand your implementation in several ways:
- Create multiple agents using different models - use smaller models for simpler tasks and larger models for complex reasoning (a minimal sketch follows this list)
- Implement agent orchestration where multiple specialized agents collaborate on complex tasks
- Explore other local AI frameworks like LM Studio or Text Generation WebUI for different model options
- Add tools and APIs to your agents for real-world capabilities like web searching, data analysis, or system automation
- Containerize your solution with Docker for easier deployment across different environments
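As a starting point for the first two ideas, the sketch below pairs a larger local model for research with a smaller one for summarization. It is a minimal illustration rather than a production setup: it assumes you have pulled both phi3:3.8b and the 14 billion parameter phi3:14b tag with Ollama, and it reuses the dummy-key pattern from earlier.
from crewai import Agent, Crew, Task
from langchain.chat_models.openai import ChatOpenAI
import os

os.environ["OPENAI_API_KEY"] = "sk-dummy-key-111"

# Two local models: a larger one for reasoning-heavy work, a smaller one for summarizing.
# Both tags are assumed to be pulled already (ollama pull phi3:14b / ollama pull phi3:3.8b).
deep_llm = ChatOpenAI(model="phi3:14b", base_url="http://localhost:11434/v1")
fast_llm = ChatOpenAI(model="phi3:3.8b", base_url="http://localhost:11434/v1")

researcher = Agent(
    role="Researcher",
    goal="Gather detailed facts about a topic",
    backstory="A meticulous researcher who compiles thorough notes on any subject.",
    llm=deep_llm
)
summarizer = Agent(
    role="Summarizer",
    goal="Condense research notes into a short briefing",
    backstory="An editor who distills long material into a handful of key points.",
    llm=fast_llm
)

research = Task(
    description="Collect key facts about the box jellyfish: habitat, characteristics, dangers.",
    expected_output="A list of factual notes.",
    agent=researcher
)
summary = Task(
    description="Summarize the research notes into five bullet points.",
    expected_output="Five concise bullet points.",
    agent=summarizer,
    context=[research]  # hand the researcher's output to the summarizer
)

crew = Crew(agents=[researcher, summarizer], tasks=[research, summary], verbose=True)
print(crew.kickoff())
Because the crew runs its tasks in order and the summary task lists the research task as context, the smaller model only has to compress what the larger model already produced.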
Conclusion
By combining CrewAI with Ollama, you can create powerful, autonomous AI agents that run entirely on your local machine. This approach eliminates API costs, enhances privacy, and gives you complete control over your AI models. While local deployment comes with hardware requirements and potential performance tradeoffs, the benefits of self-hosted AI solutions make it an attractive option for many applications.
Whether you're building AI agents for personal projects, business automation, content creation, or local SEO optimization, this local deployment approach provides flexibility and cost-effectiveness that cloud-based solutions can't match. As open-source AI models continue to improve, the gap between proprietary and self-hosted solutions will only narrow, making local AI agent orchestration an increasingly viable option for developers and businesses alike.