Skip to content

faltastic/context-manager-demo

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 
 
 
 
 
 
 

Repository files navigation

Post-Booking Customer Service Assistant (Demo)

This is a demonstration of an LLM application architecture that combines a "Head-Tail + Rolling Summary Memory" context strategy with Context-Isolated Subagents.

The system addresses the "Lost in the Middle" problem and mitigates context window bloat by keeping a small, active conversation window (The Tail) and periodically condensing older conversation turns into a flat memory text file (The Middle).


App Structure

  context_manager/
  ├── app/
  │   ├── main.py                  # FastAPI application setup
  │   ├── api/
  │   │   ├── chat.py              # Chat endpoint, handles handoffs & memory
  │   │   └── business.py          # Upload endpoints (/upload-docs)
  │   ├── agents/
  │   │   ├── orchestrator.py      # Main router agent
  │   │   ├── faq_agent.py         # Subagent for business rules
  │   │   └── accommodation.py     # Subagent for special requests
  │   ├── memory/
  │   │   ├── context_manager.py   # Assembles Head, Middle (retrieved), and Tail
  │   │   ├── memory_updater.py    # Rolling memory compression as background task
  │   │   ├── vector_store.py      # Vector DB interface (Chroma/Qdrant)
  │   │   └── sql_db.py            # Relational DB for booking state/Head context
  │   └── schemas/
  │       └── models.py            # Pydantic models for API validation
  ├── requirements.txt             # Python libraries
  └── .env                         # API keys 

Core Architectural Pillars

1. The Head-Tail + Rolling Summary Strategy

Instead of running expensive vector database queries or generating text embeddings on every single message turn, the context window is constructed dynamically:

  • The Head (Fixed/System Prompt): Contains instructions, the user's booking ID, and current relational state.
  • The Middle (Rolling Summary File): A static text file (local_memory/{session_id}_memory.txt) updated every $N$ turns containing key compressed facts.
  • The Tail (Unsummarized Messages): The last $N$ raw, uncompressed turns.

Why This Design is Highly Efficient:

  • Extremely Low Latency: For 9 out of 10 messages, the system reads a plain text file. No database latency or embeddings generation blocks the chat loop.
  • "Breathing" Context Window: Token consumption resembles a sawtooth wave. It grows slightly with each turn, then drops back to near-zero as soon as the threshold is hit and messages are compressed.
  • Cheaper Models for Maintenance: The memory compaction background task uses a fast, low-cost model (like gpt-4o-mini), leaving the smarter model (like gpt-4o) free to handle complex routing.

2. Context-Isolated Subagents

Instead of passing the entire conversation history down to subagents:

  1. The Orchestrator evaluates the query and returns a concise, single-sentence task_summary.
  2. The Context Manager intercepts the handoff and constructs a pristine, isolated prompt tailored specifically to the target subagent.
  3. The Subagents receive only the specific data or system parameters required for their specific actions, keeping confusion and latency to an absolute minimum.

Getting Started

This codebase acts as a structural prototype with mock integrations and stubs for standard database and vector store calls.

Installation

Install the required packages:

pip install -r requirements.txt

Run the FastAPI Server

Launch the development server from the post_booking_agent folder:

uvicorn app.main:app --reload

The server will start up on http://127.0.0.1:8000. You can explore the automated interactive documentation via Swagger at http://127.0.0.1:8000/docs.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages