

Overview

CrewAI integrates with multiple LLM providers through provider-native SDKs, giving you the flexibility to choose the right model for your specific use case. This guide will help you understand how to configure and use different LLM providers in your CrewAI projects.

When to Use Advanced LLM Configuration

  • You need strict control of latency, cost, and output format.
  • You need model routing by task type.
  • You need reproducible, policy-sensitive behavior in production.

When Not to Over-Configure

  • You are in early prototyping with one simple task path.
  • You do not yet need structured outputs or model routing.

What are LLMs?

Large Language Models (LLMs) are the core intelligence behind CrewAI agents. They enable agents to understand context, make decisions, and generate human-like responses. Here’s what you need to know:

LLM Basics

Large Language Models are AI systems trained on vast amounts of text data. They power the intelligence of your CrewAI agents, enabling them to understand and generate human-like text.

Context Window

The context window determines how much text an LLM can process at once. Larger windows (e.g., 128K tokens) allow for more context but may be more expensive and slower.

Temperature

Temperature (0.0 to 1.0) controls response randomness. Lower values (e.g., 0.2) produce more focused, deterministic outputs, while higher values (e.g., 0.8) increase creativity and variability.
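For example, the same model can be configured for focused or creative output by varying only the temperature (model ID is illustrative):
from crewai import LLM

factual_llm = LLM(model="gpt-4o-mini", temperature=0.2)   # focused, repeatable answers
creative_llm = LLM(model="gpt-4o-mini", temperature=0.8)  # more varied, exploratory output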

Provider Selection

Each LLM provider (e.g., OpenAI, Anthropic, Google) offers different models with varying capabilities, pricing, and features. Choose based on your needs for accuracy, speed, and cost.

Setting up your LLM

You can specify the model in several places in your CrewAI code. Whichever model you choose, you also need to provide configuration (such as an API key) for its provider; see the provider configuration examples section for your provider.
The simplest way to get started is to set the model in your environment, either directly, through an .env file, or in your app code. If you used crewai create to bootstrap your project, it is already set.
.env
MODEL=model-id  # e.g. gpt-4o, gemini-2.0-flash, claude-3-sonnet-...

# Be sure to set your API keys here too. See the Provider
# section below.
Never commit API keys to version control. Use environment files (.env) or your system’s secret management.
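Alternatively, you can set the model in app code rather than through the MODEL variable; a minimal sketch (model ID is illustrative):
from crewai import LLM

# Equivalent to MODEL=gpt-4o-mini in .env; the API key is still read
# from the environment (e.g. OPENAI_API_KEY)
llm = LLM(model="gpt-4o-mini")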

Production LLM Patterns

The basics above show how to configure one model. In real systems, you usually combine several LLM patterns for cost, quality, and reliability.

Pattern 1: Route models by agent role

Use faster/cheaper models for extraction and heavier models for synthesis or critical decisions.
Code
from crewai import Agent, Crew, Process, Task

researcher = Agent(
    role="Researcher",
    goal="Collect factual inputs quickly",
    backstory="Fast information-gathering specialist",
    llm="openai/gpt-4o-mini",
)

reviewer = Agent(
    role="Reviewer",
    goal="Validate claims and produce final answer",
    backstory="Careful editor focused on correctness",
    llm="provider/model-id",
)

crew = Crew(
    agents=[researcher, reviewer],
    tasks=[
        Task(
            description="Find the latest policy changes and list the key points",
            expected_output="Bullet list of validated policy changes",
            agent=researcher,
        ),
        Task(
            description="Review findings and produce a final executive summary",
            expected_output="Concise, decision-ready summary",
            agent=reviewer,
        ),
    ],
    process=Process.sequential,
)

Pattern 2: Set reliability defaults once

Configure retry, timeout, and deterministic sampling in one reusable LLM object.
Code
from crewai import LLM

reliable_llm = LLM(
    model="openai/gpt-4o-mini",
    temperature=0.1,
    timeout=45,
    max_retries=3,
    max_tokens=1200,
    seed=7,
)
Use this for extraction, classification, and policy-sensitive tasks where variance should be low.

Pattern 3: Use structured outputs for machine-readable responses

For downstream automation, force JSON-shaped outputs rather than free-form prose.
Code
from crewai import LLM

json_llm = LLM(
    model="openai/gpt-4o",
    response_format={"type": "json"},
    temperature=0.0,
)
This reduces parser fragility in pipelines that feed APIs, databases, or workflow routers.

Pattern 4: Use OpenAI Responses API for multi-turn reasoning flows

When you need built-in tools, response chaining, or reasoning-model workflows, enable the Responses API explicitly.
Code
from crewai import LLM

reasoning_llm = LLM(
    model="openai/o4-mini",
    api="responses",
    auto_chain=True,
    store=True,
    reasoning_effort="medium",
)
This is especially useful in long-running assistants where you want conversation continuity and controllable reasoning depth.

Provider Configuration

For concept-level usage, keep provider setup minimal and explicit:
  1. Set provider credentials via environment variables.
  2. Pin model IDs explicitly in code or YAML.
  3. Set reliability defaults (timeout, max_retries, low temperature) for production.
A minimal sketch combining these steps is shown below; see the provider-specific documentation pages for deeper provider setup and runtime decisions.
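The following sketch puts the three steps together (provider, model ID, and values are illustrative):
import os
from crewai import LLM

# 1. Credentials come from environment variables, never from source control
assert os.getenv("OPENAI_API_KEY"), "OPENAI_API_KEY is not set"

# 2. Pin the model ID explicitly; 3. set reliability defaults for production
llm = LLM(
    model="openai/gpt-4o-mini",
    temperature=0.1,
    timeout=60,
    max_retries=3,
)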

Streaming Responses

CrewAI supports streaming responses from LLMs, allowing your application to receive and process outputs in real-time as they’re generated.
Enable streaming by setting the stream parameter to True when initializing your LLM:
from crewai import LLM

# Create an LLM with streaming enabled
llm = LLM(
    model="openai/gpt-4o",
    stream=True  # Enable streaming
)
When streaming is enabled, responses are delivered in chunks as they’re generated, creating a more responsive user experience.

Async LLM Calls

CrewAI supports asynchronous LLM calls for improved performance and concurrency in your AI workflows. Async calls allow you to run multiple LLM requests concurrently without blocking, making them ideal for high-throughput applications and parallel agent operations.
Use the acall method for asynchronous LLM requests:
import asyncio
from crewai import LLM

async def main():
    llm = LLM(model="openai/gpt-4o")

    # Single async call
    response = await llm.acall("What is the capital of France?")
    print(response)

asyncio.run(main())
The acall method supports all the same parameters as the synchronous call method, including messages, tools, and callbacks.
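Because acall is a coroutine, several requests can run concurrently with standard asyncio tooling; a minimal sketch (prompts are illustrative):
import asyncio
from crewai import LLM

async def main():
    llm = LLM(model="openai/gpt-4o")
    prompts = [
        "What is the capital of France?",
        "What is the capital of Japan?",
    ]
    # Issue both requests concurrently instead of awaiting them one at a time
    responses = await asyncio.gather(*(llm.acall(p) for p in prompts))
    for prompt, response in zip(prompts, responses):
        print(prompt, "->", response)

asyncio.run(main())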

Structured LLM Calls

CrewAI supports structured responses from LLM calls by allowing you to define a response_format using a Pydantic model. This enables the framework to automatically parse and validate the output, making it easier to integrate the response into your application without manual post-processing. For example, you can define a Pydantic model to represent the expected response structure and pass it as the response_format when instantiating the LLM. The model will then be used to convert the LLM output into a structured Python object.
Code
from crewai import LLM
from pydantic import BaseModel

class Dog(BaseModel):
    name: str
    age: int
    breed: str


llm = LLM(model="gpt-4o", response_format=Dog)

response = llm.call(
    "Analyze the following messages and return the name, age, and breed. "
    "Meet Kona! She is 3 years old and is a black german shepherd."
)
print(response)

# Output:
# Dog(name='Kona', age=3, breed='black german shepherd')

Advanced Features and Optimization

Learn how to get the most out of your LLM configuration.
CrewAI includes smart context management features:
from crewai import LLM

# CrewAI automatically handles:
# 1. Token counting and tracking
# 2. Content summarization when needed
# 3. Task splitting for large contexts

llm = LLM(
    model="gpt-4",
    max_tokens=4000,  # Limit response length
)
Best practices for context management:
  1. Choose models with appropriate context windows
  2. Pre-process long inputs when possible
  3. Use chunking for large documents (see the sketch after this list)
  4. Monitor token usage to optimize costs
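A minimal chunking sketch in plain Python (no CrewAI-specific chunking API is assumed); it splits a long document on paragraph boundaries and summarizes each piece separately to stay within the context window:
from crewai import LLM

def chunk_text(text: str, max_chars: int = 8000) -> list[str]:
    """Split text into roughly max_chars-sized pieces on paragraph boundaries."""
    chunks, current = [], ""
    for paragraph in text.split("\n\n"):
        if current and len(current) + len(paragraph) > max_chars:
            chunks.append(current)
            current = ""
        current += paragraph + "\n\n"
    if current:
        chunks.append(current)
    return chunks

long_document = "..."  # replace with your own long input text
llm = LLM(model="openai/gpt-4o-mini", max_tokens=1000)

# Summarize each chunk separately instead of sending the whole document at once
summaries = [
    llm.call(f"Summarize this section:\n\n{chunk}")
    for chunk in chunk_text(long_document)
]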

Token Usage Optimization

Choose the right context window for your task:
  • Small tasks (up to 4K tokens): Standard models
  • Medium tasks (between 4K-32K): Enhanced models
  • Large tasks (over 32K): Large context models
# Configure model with appropriate settings
llm = LLM(
    model="openai/gpt-4-turbo-preview",
    temperature=0.7,    # Adjust based on task
    max_tokens=4096,    # Set based on output needs
    timeout=300        # Longer timeout for complex tasks
)
  • Lower temperature (0.1 to 0.3) for factual responses
  • Higher temperature (0.7 to 0.9) for creative tasks

Best Practices

  1. Monitor token usage
  2. Implement rate limiting
  3. Use caching when possible
  4. Set appropriate max_tokens limits
Remember to regularly monitor your token usage and adjust your configuration as needed to optimize costs and performance.
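One way to apply rate limiting and token limits is per agent; a hedged sketch using the max_rpm agent setting together with an explicit max_tokens cap (values are illustrative):
from crewai import Agent, LLM

rate_limited_agent = Agent(
    role="Researcher",
    goal="Collect factual inputs quickly",
    backstory="Fast information-gathering specialist",
    llm=LLM(model="openai/gpt-4o-mini", max_tokens=1500),  # cap response length
    max_rpm=10,  # at most 10 requests per minute for this agent
)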
CrewAI internally uses native SDKs for LLM calls, which lets you drop additional parameters that your provider or use case does not need. This can help simplify your code and reduce the complexity of your LLM configuration. For example, if you don’t need to send the stop parameter, you can simply omit it from your LLM call:
from crewai import LLM
import os

os.environ["OPENAI_API_KEY"] = "<api-key>"

o3_llm = LLM(
    model="o3",
    drop_params=True,
    additional_drop_params=["stop"]
)
CrewAI provides message interceptors for several providers, allowing you to hook into request/response cycles at the transport layer. Supported providers:
  • ✅ OpenAI
  • ✅ Anthropic
Basic Usage:
import httpx
from crewai import LLM
from crewai.llms.hooks import BaseInterceptor

class CustomInterceptor(BaseInterceptor[httpx.Request, httpx.Response]):
    """Custom interceptor to modify requests and responses."""

    def on_outbound(self, request: httpx.Request) -> httpx.Request:
        """Print request before sending to the LLM provider."""
        print(request)
        return request

    def on_inbound(self, response: httpx.Response) -> httpx.Response:
        """Process response after receiving from the LLM provider."""
        print(f"Status: {response.status_code}")
        print(f"Response time: {response.elapsed}")
        return response

# Use the interceptor with an LLM
llm = LLM(
    model="openai/gpt-4o",
    interceptor=CustomInterceptor(),
)
Important Notes:
  • Both methods must return the received object (or an object of the same type).
  • Modifying received objects may result in unexpected behavior or application crashes.
  • Not all providers support interceptors; check the supported providers list above.
Interceptors operate at the transport layer. This is particularly useful for:
  • Message transformation and filtering
  • Debugging API interactions

Common Issues and Solutions

Most authentication issues can be resolved by checking API key format and environment variable names.
# OpenAI
OPENAI_API_KEY=sk-...

# Anthropic
ANTHROPIC_API_KEY=sk-ant-...
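A quick sanity check before running a crew can catch missing or misnamed keys early (the prefix checks are illustrative, based on the formats above):
import os

def check_key(name: str, prefix: str) -> None:
    """Fail fast if a required key is missing or does not look like the expected format."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"{name} is not set")
    if not value.startswith(prefix):
        print(f"Warning: {name} does not start with '{prefix}' - double-check the key")

check_key("OPENAI_API_KEY", "sk-")
check_key("ANTHROPIC_API_KEY", "sk-ant-")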