Documentation Index
Fetch the complete documentation index at: https://crewai-lorenze-imp-docs-improvements.mintlify.app/llms.txt
Use this file to discover all available pages before exploring further.
Overview
Testing is a crucial part of the development process, and it is essential to ensure that your crew is performing as expected. With crewAI, you can easily test your crew and evaluate its performance using the built-in testing capabilities.When to Use Testing
- Before promoting a crew to production.
- After changing prompts, tools, or model configurations.
- When benchmarking quality/cost/latency tradeoffs.
When Not to Rely on Testing Alone
- For safety-critical deployments without human review gates.
- When test datasets are too small or unrepresentative.
Using the Testing Feature
Use the CLI commandcrewai test to run repeated crew executions and compare outputs across iterations. The parameters are n_iterations and model, which are optional and default to 2 and gpt-4o-mini.
crewai test command, the crew will be executed for the specified number of iterations, and the performance metrics will be displayed at the end of the run.
A table of scores at the end will show the performance of the crew in terms of the following metrics:
| Tasks/Crew/Agents | Run 1 | Run 2 | Avg. Total | Agents | Additional Info |
|---|---|---|---|---|---|
| Task 1 | 9.0 | 9.5 | 9.2 | Professional Insights | |
| Researcher | |||||
| Task 2 | 9.0 | 10.0 | 9.5 | Company Profile Investigator | |
| Task 3 | 9.0 | 9.0 | 9.0 | Automation Insights | |
| Specialist | |||||
| Task 4 | 9.0 | 9.0 | 9.0 | Final Report Compiler | Automation Insights Specialist |
| Crew | 9.00 | 9.38 | 9.2 | ||
| Execution Time (s) | 126 | 145 | 135 |
Common Failure Modes
Scores fluctuate too much between runs
- Cause: high sampling randomness or unstable prompts.
- Fix: lower temperature and tighten output constraints.
Good test scores but poor production quality
- Cause: test prompts do not match real workload.
- Fix: build a representative test set from real production inputs.
