Overview
CrewAI integrates with multiple LLM providers through each provider's native SDK, giving you the flexibility to choose the right model for your specific use case. This guide will help you understand how to configure and use different LLM providers in your CrewAI projects.

When to Use Advanced LLM Configuration
- You need strict control of latency, cost, and output format.
- You need model routing by task type.
- You need reproducible, policy-sensitive behavior in production.
When Not to Over-Configure
- You are in early prototyping with one simple task path.
- You do not yet need structured outputs or model routing.
What are LLMs?
Large Language Models (LLMs) are the core intelligence behind CrewAI agents. They enable agents to understand context, make decisions, and generate human-like responses. Here’s what you need to know:

LLM Basics
Large Language Models are AI systems trained on vast amounts of text data. They power the intelligence of your CrewAI agents, enabling them to understand and generate human-like text.
Context Window
The context window determines how much text an LLM can process at once. Larger windows (e.g., 128K tokens) allow for more context but may be more expensive and slower.
Temperature
Temperature (0.0 to 1.0) controls response randomness. Lower values (e.g., 0.2) produce more focused, deterministic outputs, while higher values (e.g., 0.8) increase creativity and variability.
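For example, a focused configuration might look like this (a sketch, assuming an OpenAI model):

```python
from crewai import LLM

# Low temperature for focused, near-deterministic outputs
llm = LLM(model="gpt-4o", temperature=0.2)
```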
Provider Selection
Each LLM provider (e.g., OpenAI, Anthropic, Google) offers different models with varying capabilities, pricing, and features. Choose based on your needs for accuracy, speed, and cost.
Setting up your LLM
There are different places in CrewAI code where you can specify the model to use. Once you specify the model you are using, you will need to provide the configuration (like an API key) for each of the model providers you use. See the provider configuration examples section for your provider. There are three ways to set the model:

- 1. Environment Variables
- 2. YAML Configuration
- 3. Direct Code

Environment variables are the simplest way to get started. Set the model in your environment directly, through an .env file, or in your app code. If you used crewai create to bootstrap your project, it will be set already.
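A minimal .env sketch, assuming OpenAI (the MODEL and OPENAI_API_KEY variable names follow the standard CrewAI project template; substitute your provider's equivalents):

```
MODEL=gpt-4o-mini
OPENAI_API_KEY=your-api-key-here
```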
Production LLM Patterns
The basics above show how to configure one model. In real systems, you usually combine several LLM patterns for cost, quality, and reliability.

Pattern 1: Route models by agent role
Use faster/cheaper models for extraction and heavier models for synthesis or critical decisions.
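A minimal sketch of this pattern, assuming OpenAI model IDs (swap in whichever providers you actually use):

```python
from crewai import Agent, LLM

# Cheaper, faster model for mechanical extraction work
extractor = Agent(
    role="Data Extractor",
    goal="Pull key facts out of raw documents",
    backstory="A fast, methodical analyst.",
    llm=LLM(model="gpt-4o-mini", temperature=0),
)

# Heavier model for synthesis and critical decisions
synthesizer = Agent(
    role="Research Synthesizer",
    goal="Combine extracted facts into a final recommendation",
    backstory="A careful senior analyst.",
    llm=LLM(model="gpt-4o", temperature=0.2),
)
```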
Pattern 2: Set reliability defaults once
Configure retry, timeout, and deterministic sampling in one reusable LLM object.
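A sketch using the reliability parameters named in this guide (the model ID is an assumption):

```python
from crewai import LLM

# One reusable LLM object carrying production reliability defaults
reliable_llm = LLM(
    model="gpt-4o",
    temperature=0,    # deterministic sampling
    timeout=60,       # seconds before a request is abandoned
    max_retries=2,    # retry transient provider failures
)
```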
Pattern 3: Use structured outputs for machine-readable responses
For downstream automation, force JSON-shaped outputs rather than free-form prose.
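A short sketch using the response_format mechanism described under Structured LLM Calls below (model ID assumed):

```python
from pydantic import BaseModel
from crewai import LLM

class TicketTriage(BaseModel):
    category: str
    priority: str

# response_format constrains output to the TicketTriage schema
triage_llm = LLM(model="gpt-4o", response_format=TicketTriage)
```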
Pattern 4: Use OpenAI Responses API for multi-turn reasoning flows
When you need built-in tools, response chaining, or reasoning-model workflows, enable the Responses API explicitly.
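An illustrative sketch only: the use_responses_api flag below is a hypothetical stand-in, not a documented parameter; check the parameter contract reference (/en/ai/llms/reference) for the actual switch:

```python
from crewai import LLM

# Hypothetical sketch: `use_responses_api` is an assumed, illustrative
# parameter name, not the documented flag. Verify against the reference.
reasoning_llm = LLM(
    model="openai/o3-mini",
    use_responses_api=True,  # assumed parameter name
)
```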
Provider Configuration
For concept-level usage, keep provider setup minimal and explicit:

- Set provider credentials via environment variables.
- Pin model IDs explicitly in code or YAML.
- Set reliability defaults (timeout, max_retries, low temperature) for production.

For deeper coverage, see:
- Connections and provider setup: /en/learn/llm-connections
- Custom provider integration: /en/learn/custom-llm
- Production routing and reliability patterns: /en/ai/llms/patterns
- Parameter contract reference: /en/ai/llms/reference
Streaming Responses
CrewAI supports streaming responses from LLMs, allowing your application to receive and process outputs in real time as they’re generated.

- Basic Setup
- Event Handling
- Agent & Task Tracking
Enable streaming by setting the stream parameter to True when initializing your LLM. When streaming is enabled, responses are delivered in chunks as they’re generated, creating a more responsive user experience.
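A minimal sketch (model ID assumed):

```python
from crewai import LLM

# Deliver output in chunks as it is generated
llm = LLM(model="gpt-4o", stream=True)
```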
Async LLM Calls
CrewAI supports asynchronous LLM calls for improved performance and concurrency in your AI workflows. Async calls allow you to run multiple LLM requests concurrently without blocking, making them ideal for high-throughput applications and parallel agent operations.

- Basic Usage
- With Streaming
Use the acall method for asynchronous LLM requests. The acall method supports all the same parameters as the synchronous call method, including messages, tools, and callbacks.
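A minimal sketch that runs two requests concurrently (model ID assumed; acall takes the same inputs as call, as noted above):

```python
import asyncio
from crewai import LLM

llm = LLM(model="gpt-4o")

async def main() -> None:
    # Fire both requests concurrently instead of awaiting them one by one
    summaries = await asyncio.gather(
        llm.acall("Summarize the Q1 report in one sentence."),
        llm.acall("Summarize the Q2 report in one sentence."),
    )
    print(summaries)

asyncio.run(main())
```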
Structured LLM Calls
CrewAI supports structured responses from LLM calls by allowing you to define a response_format using a Pydantic model. This enables the framework to automatically parse and validate the output, making it easier to integrate the response into your application without manual post-processing.
For example, you can define a Pydantic model to represent the expected response structure and pass it as the response_format when instantiating the LLM. The model will then be used to convert the LLM output into a structured Python object.
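A sketch of this flow, assuming an OpenAI model:

```python
from pydantic import BaseModel
from crewai import LLM

class Dog(BaseModel):
    name: str
    age: int
    breed: str

# The LLM output is parsed and validated against the Dog model
llm = LLM(model="gpt-4o", response_format=Dog)
response = llm.call(
    "Meet Kona! She is 3 years old and is a black german shepherd."
)
print(response)  # structured data matching the Dog schema
```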
Advanced Features and Optimization
Learn how to get the most out of your LLM configuration:

Context Window Management
CrewAI includes smart context management features that help agents stay within a model's context window (see the sketch after the list below).
Best practices for context management:
- Choose models with appropriate context windows
- Pre-process long inputs when possible
- Use chunking for large documents
- Monitor token usage to optimize costs
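One concrete feature here is the agent-level respect_context_window flag from the CrewAI Agent API (a sketch; defaults and behavior are worth verifying against the current reference):

```python
from crewai import Agent, LLM

# respect_context_window asks CrewAI to automatically manage inputs that
# would otherwise exceed the model's context window
researcher = Agent(
    role="Researcher",
    goal="Digest long source documents",
    backstory="Reads everything, summarizes well.",
    llm=LLM(model="gpt-4o"),
    respect_context_window=True,
)
```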
Performance Optimization
Token Usage Optimization
Choose the right context window for your task:
- Small tasks (up to 4K tokens): Standard models
- Medium tasks (4K to 32K tokens): Enhanced models
- Large tasks (over 32K tokens): Large context models
Remember to regularly monitor your token usage and adjust your configuration as needed to optimize costs and performance.
Drop Additional Parameters
CrewAI internally uses native SDKs for LLM calls, which allows you to drop additional parameters that are not needed for your specific use case. This can help simplify your code and reduce the complexity of your LLM configuration.
For example, if you don’t need to send the stop parameter, you can simply omit it from your LLM call, as in the sketch below.
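A minimal sketch (model ID assumed):

```python
from crewai import LLM

# No stop parameter configured here, so none is sent to the provider
llm = LLM(model="gpt-4o", temperature=0.2)
```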
Transport Interceptors
CrewAI provides message interceptors for several providers, allowing you to hook into request/response cycles at the transport layer.

Supported Providers:

- ✅ OpenAI
- ✅ Anthropic

Important Notes:

- Both methods must return the received object or type of object.
- Modifying received objects may result in unexpected behavior or application crashes.
- Not all providers support interceptors; check the supported providers list above.
Interceptors operate at the transport layer. This is particularly useful for:
- Message transformation and filtering
- Debugging API interactions
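The sketch below is illustrative only: the class and method names (LoggingInterceptor, on_request, on_response) are hypothetical stand-ins for the real hooks, which are defined per provider:

```python
# Hypothetical sketch: names below are assumptions, not the documented
# CrewAI interceptor API; consult the provider pages for the real hooks.
class LoggingInterceptor:
    def on_request(self, request):
        # Inspect or log the outgoing request, then return it unchanged
        print("outgoing:", request)
        return request

    def on_response(self, response):
        # Inspect or log the incoming response; always return the object
        print("incoming:", response)
        return response
```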
Common Issues and Solutions
- Authentication: confirm the provider API key environment variable is set and named correctly.
- Model Names: pin exact model IDs; a typo in the model string is a common cause of provider errors.
- Context Length: chunk or pre-process long inputs, or switch to a larger-context model.
