Post-Booking Customer Service Assistant (Demo)

This is a demonstration of an LLM application architecture that combines a "Head-Tail + Rolling Summary Memory" context strategy with Context-Isolated Subagents.

The system addresses the "Lost in the Middle" problem and mitigates context window bloat by keeping a small, active conversation window (The Tail) and periodically condensing older conversation turns into a flat memory text file (The Middle).

App Structure

  context_manager/
  ├── app/
  │   ├── main.py                  # FastAPI application setup
  │   ├── api/
  │   │   ├── chat.py              # Chat endpoint, handles handoffs & memory
  │   │   └── business.py          # Upload endpoints (/upload-docs)
  │   ├── agents/
  │   │   ├── orchestrator.py      # Main router agent
  │   │   ├── faq_agent.py         # Subagent for business rules
  │   │   └── accommodation.py     # Subagent for special requests
  │   ├── memory/
  │   │   ├── context_manager.py   # Assembles Head, Middle (retrieved), and Tail
  │   │   ├── memory_updater.py    # Rolling memory compression as background task
  │   │   ├── vector_store.py      # Vector DB interface (Chroma/Qdrant)
  │   │   └── sql_db.py            # Relational DB for booking state/Head context
  │   └── schemas/
  │       └── models.py            # Pydantic models for API validation
  ├── requirements.txt             # Python libraries
  └── .env                         # API keys

Core Architectural Pillars

1. The Head-Tail + Rolling Summary Strategy

Instead of running expensive vector database queries or generating text embeddings on every single message turn, the context window is constructed dynamically:

The Head (Fixed/System Prompt): Contains instructions, the user's booking ID, and current relational state.
The Middle (Rolling Summary File): A static text file (local_memory/{session_id}_memory.txt) updated every $N$ turns containing key compressed facts.
The Tail (Unsummarized Messages): The last $N$ raw, uncompressed turns.

Why This Design is Highly Efficient:

Extremely Low Latency: For 9 out of 10 messages, the system reads a plain text file. No database latency or embeddings generation blocks the chat loop.
"Breathing" Context Window: Token consumption resembles a sawtooth wave. It grows slightly with each turn, then drops back to near-zero as soon as the threshold is hit and messages are compressed.
Cheaper Models for Maintenance: The memory compaction background task uses a fast, low-cost model (like gpt-4o-mini), leaving the smarter model (like gpt-4o) free to handle complex routing.

2. Context-Isolated Subagents

Instead of passing the entire conversation history down to subagents:

The Orchestrator evaluates the query and returns a concise, single-sentence task_summary.
The Context Manager intercepts the handoff and constructs a pristine, isolated prompt tailored specifically to the target subagent.
The Subagents receive only the specific data or system parameters required for their specific actions, keeping confusion and latency to an absolute minimum.

Getting Started

This codebase acts as a structural prototype with mock integrations and stubs for standard database and vector store calls.

Installation

Install the required packages:

pip install -r requirements.txt

Run the FastAPI Server

Launch the development server from the post_booking_agent folder:

uvicorn app.main:app --reload

The server will start up on http://127.0.0.1:8000. You can explore the automated interactive documentation via Swagger at http://127.0.0.1:8000/docs.

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
app		app
.env		.env
README.md		README.md
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Post-Booking Customer Service Assistant (Demo)

App Structure

Core Architectural Pillars

1. The Head-Tail + Rolling Summary Strategy

Why This Design is Highly Efficient:

2. Context-Isolated Subagents

Getting Started

Installation

Run the FastAPI Server

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Post-Booking Customer Service Assistant (Demo)

App Structure

Core Architectural Pillars

1. The Head-Tail + Rolling Summary Strategy

Why This Design is Highly Efficient:

2. Context-Isolated Subagents

Getting Started

Installation

Run the FastAPI Server

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages