
Overview

This guide shows a practical pattern for managing LLM chat history with Flow state:
  • Keep recent turns in a sliding window
  • Summarize older turns into a compact running summary
  • Persist state automatically with @persist()
  • Keep optional long-term recall using Flow memory

Why this pattern works

Naively appending every message to the prompt causes token bloat, rising latency and cost, and unstable behavior over long sessions. A better approach is:
  1. Keep only the most recent turns in state.messages
  2. Move older turns into state.running_summary
  3. Build prompts from running_summary + recent messages
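The three steps above can be sketched without any framework. This is a minimal illustration of the pattern, not CrewAI code; the names `compact_history` and `build_prompt` are hypothetical, and a trivial deterministic function stands in for the LLM summarization call:

```python
def compact_history(messages, max_recent, summarize):
    """Split messages into a summarized overflow and a recent window."""
    if len(messages) <= max_recent:
        return "", messages
    overflow, recent = messages[:-max_recent], messages[-max_recent:]
    return summarize(overflow), recent


def build_prompt(running_summary, recent_messages):
    """Prepend the running summary as system context, then the recent turns."""
    system = (
        "You are a helpful assistant.\n"
        f"Conversation summary so far:\n{running_summary or '(none)'}"
    )
    return [{"role": "system", "content": system}, *recent_messages]


# A deterministic stand-in for the LLM summarizer, for demonstration only.
naive_summarize = lambda turns: "; ".join(m["content"] for m in turns)

msgs = [{"role": "user", "content": f"turn {i}"} for i in range(6)]
summary, recent = compact_history(msgs, max_recent=4, summarize=naive_summarize)
prompt = build_prompt(summary, recent)
```

However the summarizer is implemented, the invariant is the same: the prompt always contains exactly one system message plus at most `max_recent` turns.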

Prerequisites

  1. CrewAI installed and configured
  2. API key configured for your model provider
  3. Basic familiarity with Flow decorators (@start, @listen)

Step 1: Define typed chat state

Code
from typing import Dict, List
from pydantic import BaseModel, Field


class ChatSessionState(BaseModel):
    session_id: str = "demo-session"
    running_summary: str = ""
    messages: List[Dict[str, str]] = Field(default_factory=list)
    max_recent_messages: int = 8
    last_user_message: str = ""
    assistant_reply: str = ""
    turn_count: int = 0
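Because the state is a Pydantic model, field defaults apply automatically and the whole session serializes to a plain dict, which is what @persist() relies on. A quick sanity check (assumes Pydantic v2's model_dump; the session id used here is illustrative):

```python
from typing import Dict, List
from pydantic import BaseModel, Field


class ChatSessionState(BaseModel):
    session_id: str = "demo-session"
    running_summary: str = ""
    messages: List[Dict[str, str]] = Field(default_factory=list)
    max_recent_messages: int = 8
    last_user_message: str = ""
    assistant_reply: str = ""
    turn_count: int = 0


state = ChatSessionState(session_id="customer-42")
state.messages.append({"role": "user", "content": "hi"})
snapshot = state.model_dump()  # JSON-serializable dict, ready for persistence
```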

Step 2: Build the Flow

Code
from crewai.flow.flow import Flow, start, listen
from crewai.flow.persistence import persist
from litellm import completion


@persist()
class ChatHistoryFlow(Flow[ChatSessionState]):
    model = "gpt-4o-mini"

    @start()
    def capture_user_message(self):
        self.state.last_user_message = self.state.last_user_message.strip()
        self.state.messages.append(
            {"role": "user", "content": self.state.last_user_message}
        )
        self.state.turn_count += 1
        return self.state.last_user_message

    @listen(capture_user_message)
    def compact_old_history(self, _):
        if len(self.state.messages) <= self.state.max_recent_messages:
            return "no_compaction"

        overflow = self.state.messages[:-self.state.max_recent_messages]
        self.state.messages = self.state.messages[-self.state.max_recent_messages :]
        overflow_text = "\n".join(
            f"{m['role']}: {m['content']}" for m in overflow
        )

        summary_prompt = [
            {
                "role": "system",
                "content": "Summarize old chat turns into short bullet points. Preserve facts, constraints, and decisions.",
            },
            {
                "role": "user",
                "content": (
                    f"Existing summary:\n{self.state.running_summary or '(empty)'}\n\n"
                    f"New old turns:\n{overflow_text}"
                ),
            },
        ]
        summary_response = completion(model=self.model, messages=summary_prompt)
        self.state.running_summary = summary_response["choices"][0]["message"]["content"]
        return "compacted"

    @listen(compact_old_history)
    def generate_reply(self, _):
        system_context = (
            "You are a helpful assistant.\n"
            f"Conversation summary so far:\n{self.state.running_summary or '(none)'}"
        )

        response = completion(
            model=self.model,
            messages=[{"role": "system", "content": system_context}, *self.state.messages],
        )
        answer = response["choices"][0]["message"]["content"]

        self.state.assistant_reply = answer
        self.state.messages.append({"role": "assistant", "content": answer})

        # Optional: store key turns in long-term memory for later recall
        self.remember(
            f"Session {self.state.session_id} turn {self.state.turn_count}: "
            f"user={self.state.last_user_message} assistant={answer}",
            scope=f"/chat/{self.state.session_id}",
        )
        return answer
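The key idea in compact_old_history is that the summary is folded incrementally: each compaction pass feeds the existing summary plus the newly overflowed turns back to the model. The merging behavior can be sketched in isolation; `fold_summary` is a hypothetical stand-in for the LLM call and merely concatenates bullets where a real model would compress:

```python
def fold_summary(existing, overflow_text):
    """Merge newly overflowed turns into the running summary.

    A real implementation would call an LLM here; this placeholder
    just appends one bullet per overflowed line.
    """
    bullets = [f"- {line}" for line in overflow_text.splitlines()]
    return "\n".join(([existing] if existing else []) + bullets)


summary = ""
summary = fold_summary(summary, "user: I need a pricing plan")
summary = fold_summary(summary, "assistant: Team plan fits 10 seats")
```

The state only ever holds one compact summary string, no matter how many turns have overflowed, so prompt size stays bounded.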

Step 3: Run it

Code
flow = ChatHistoryFlow()

first = flow.kickoff(
    inputs={
        "session_id": "customer-42",
        "last_user_message": "I need help choosing a pricing plan for a 10-person team.",
    }
)
print("Assistant:", first)

second = flow.kickoff(
    inputs={
        "last_user_message": "We also need SSO and audit logs. What do you recommend now?",
    }
)
print("Assistant:", second)
print("Turns:", flow.state.turn_count)
print("Recent messages:", len(flow.state.messages))

Expected output (shape)

Output
Assistant: ...initial recommendation...
Assistant: ...updated recommendation with SSO and audit-log requirements...
Turns: 2
Recent messages: 4

Troubleshooting

  • If replies ignore earlier context: increase max_recent_messages and ensure running_summary is included in the system context.
  • If prompts become too large: lower max_recent_messages and summarize more aggressively.
  • If sessions collide: provide a stable session_id and isolate memory scope with /chat/{session_id}.
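When tuning max_recent_messages, a rough size estimate can help. The sketch below uses a crude 4-characters-per-token heuristic rather than a real tokenizer (for accurate counts, use your provider's tokenizer); the function names are illustrative:

```python
def approx_tokens(text):
    # Crude heuristic: ~4 characters per token for English text.
    return len(text) // 4


def prompt_budget(summary, messages):
    """Estimate total prompt tokens from the summary plus recent turns."""
    total = approx_tokens(summary)
    total += sum(approx_tokens(m["content"]) for m in messages)
    return total


msgs = [{"role": "user", "content": "x" * 400}] * 6
used = prompt_budget("s" * 200, msgs)
```

If the estimate approaches your model's context window, lower max_recent_messages or summarize more aggressively.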

Next steps

  • Add tool calls for account lookup or product catalog retrieval
  • Route to human review for high-risk decisions
  • Add structured output to capture recommendations in machine-readable JSON