## Overview

This guide shows a practical pattern for managing LLM chat history with Flow state:

- Keep recent turns in a sliding window
- Summarize older turns into a compact running summary
- Persist state automatically with `@persist()`
- Keep optional long-term recall using Flow memory
## Why this pattern works

Naively appending every message to the prompt causes token bloat and unstable behavior over long sessions. A better approach is:

- Keep only the most recent turns in `state.messages`
- Move older turns into `state.running_summary`
- Build prompts from `running_summary` plus the recent messages (see the sketch below)
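The core idea fits in a few lines of plain Python. A minimal sketch, independent of CrewAI (`build_prompt` and its parameters are illustrative names, not part of any API):

```python
from typing import Dict, List

Message = Dict[str, str]


def build_prompt(running_summary: str, messages: List[Message], max_recent: int) -> List[Message]:
    """Assemble a prompt from a compact summary plus a sliding window of recent turns."""
    system = {
        "role": "system",
        "content": f"Conversation summary so far:\n{running_summary or '(none)'}",
    }
    # Only the newest turns ride along verbatim; older ones live in the summary.
    return [system, *messages[-max_recent:]]
```

The Flow below implements exactly this split, plus the summarization step that keeps `running_summary` up to date.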
## Prerequisites

- CrewAI installed and configured
- API key configured for your model provider
- Basic familiarity with Flow decorators (`@start`, `@listen`)
## Step 1: Define typed chat state

```python
from typing import Dict, List

from pydantic import BaseModel, Field


class ChatSessionState(BaseModel):
    session_id: str = "demo-session"
    running_summary: str = ""
    messages: List[Dict[str, str]] = Field(default_factory=list)
    max_recent_messages: int = 8
    last_user_message: str = ""
    assistant_reply: str = ""
    turn_count: int = 0
```
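As a quick sanity check of the state model (illustrative only; the field names mirror the `inputs` dict passed to `kickoff` in Step 3):

```python
state = ChatSessionState()
assert state.messages == [] and state.turn_count == 0  # Pydantic defaults apply

state = ChatSessionState(session_id="customer-42", last_user_message="Hi!")
print(state.max_recent_messages)  # 8 — the sliding-window size used in Step 2
```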
## Step 2: Build the Flow

```python
from crewai.flow.flow import Flow, start, listen
from crewai.flow.persistence import persist
from litellm import completion


@persist()  # saves flow state automatically between runs
class ChatHistoryFlow(Flow[ChatSessionState]):
    model = "gpt-4o-mini"

    @start()
    def capture_user_message(self):
        # Record the incoming user turn and advance the turn counter.
        self.state.last_user_message = self.state.last_user_message.strip()
        self.state.messages.append(
            {"role": "user", "content": self.state.last_user_message}
        )
        self.state.turn_count += 1
        return self.state.last_user_message

    @listen(capture_user_message)
    def compact_old_history(self, _):
        # Nothing to do while the history still fits the sliding window.
        if len(self.state.messages) <= self.state.max_recent_messages:
            return "no_compaction"

        # Split the history: older turns overflow into the summary,
        # the most recent turns stay in state verbatim.
        overflow = self.state.messages[: -self.state.max_recent_messages]
        self.state.messages = self.state.messages[-self.state.max_recent_messages :]

        overflow_text = "\n".join(
            f"{m['role']}: {m['content']}" for m in overflow
        )
        summary_prompt = [
            {
                "role": "system",
                "content": (
                    "Summarize old chat turns into short bullet points. "
                    "Preserve facts, constraints, and decisions."
                ),
            },
            {
                "role": "user",
                "content": (
                    f"Existing summary:\n{self.state.running_summary or '(empty)'}\n\n"
                    f"New old turns:\n{overflow_text}"
                ),
            },
        ]
        # Fold the overflowed turns into the running summary.
        summary_response = completion(model=self.model, messages=summary_prompt)
        self.state.running_summary = summary_response["choices"][0]["message"]["content"]
        return "compacted"

    @listen(compact_old_history)
    def generate_reply(self, _):
        # Prompt = compact summary (system) + verbatim recent turns.
        system_context = (
            "You are a helpful assistant.\n"
            f"Conversation summary so far:\n{self.state.running_summary or '(none)'}"
        )
        response = completion(
            model=self.model,
            messages=[{"role": "system", "content": system_context}, *self.state.messages],
        )
        answer = response["choices"][0]["message"]["content"]
        self.state.assistant_reply = answer
        self.state.messages.append({"role": "assistant", "content": answer})

        # Optional: store key turns in long-term memory for later recall
        self.remember(
            f"Session {self.state.session_id} turn {self.state.turn_count}: "
            f"user={self.state.last_user_message} assistant={answer}",
            scope=f"/chat/{self.state.session_id}",
        )
        return answer
```
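To see exactly what `compact_old_history` summarizes versus keeps, here is the same slicing in isolation with toy data (a standalone sketch):

```python
messages = [{"role": "user", "content": f"turn {i}"} for i in range(10)]
max_recent = 8

overflow = messages[:-max_recent]   # turns 0-1: folded into the summary
recent = messages[-max_recent:]     # turns 2-9: kept verbatim in state
print(len(overflow), len(recent))   # 2 8
```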
## Step 3: Run it

```python
flow = ChatHistoryFlow()

first = flow.kickoff(
    inputs={
        "session_id": "customer-42",
        "last_user_message": "I need help choosing a pricing plan for a 10-person team.",
    }
)
print("Assistant:", first)

# Same flow instance: messages, summary, and turn count carry over.
second = flow.kickoff(
    inputs={
        "last_user_message": "We also need SSO and audit logs. What do you recommend now?",
    }
)
print("Assistant:", second)

print("Turns:", flow.state.turn_count)
print("Recent messages:", len(flow.state.messages))
```
## Expected output (shape)

```text
Assistant: ...initial recommendation...
Assistant: ...updated recommendation with SSO and audit-log requirements...
Turns: 2
Recent messages: 4
```
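Because the class is decorated with `@persist()`, state survives across processes, not just across `kickoff` calls on one instance. Assuming your CrewAI version keys persisted state by the auto-generated state `id` (check the persistence docs for your release), a later process could resume the same session along these lines (sketch):

```python
# First process: capture the persisted state id (e.g., log it or store it).
saved_id = flow.state.id

# Later, in a fresh process: pass the saved id to restore the prior state.
resumed = ChatHistoryFlow()
third = resumed.kickoff(
    inputs={
        "id": saved_id,
        "last_user_message": "Can you recap what we decided so far?",
    }
)
print("Assistant:", third)
```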
## Troubleshooting

- If replies ignore earlier context: increase `max_recent_messages` and make sure `running_summary` is included in the system context.
- If prompts become too large: lower `max_recent_messages` and summarize more aggressively, or switch to a token budget (see the sketch below).
- If sessions collide: provide a stable `session_id` and isolate memory scope with `/chat/{session_id}`.
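For the prompt-size case, the count-based window can be swapped for a rough token budget. A sketch using a crude characters-per-token heuristic (`estimate_tokens`, `split_by_budget`, and the budget value are illustrative, not CrewAI APIs):

```python
from typing import Dict, List, Tuple

TOKEN_BUDGET = 2000  # illustrative budget for the recent window


def estimate_tokens(text: str) -> int:
    # Very rough heuristic: ~4 characters per token for English text.
    return max(1, len(text) // 4)


def split_by_budget(
    messages: List[Dict[str, str]], budget: int = TOKEN_BUDGET
) -> Tuple[List[Dict[str, str]], List[Dict[str, str]]]:
    """Keep the newest turns that fit the budget; everything older overflows."""
    recent, used = [], 0
    for m in reversed(messages):
        cost = estimate_tokens(m["content"])
        if used + cost > budget and recent:
            break
        recent.insert(0, m)
        used += cost
    overflow = messages[: len(messages) - len(recent)]
    return overflow, recent
```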
## Next steps
- Add tool calls for account lookup or product catalog retrieval
- Route to human review for high-risk decisions
- Add structured output to capture recommendations in machine-readable JSON