Skip to content

[GH-1102] Fix dangling episode_ids#1151

Merged
o-love merged 2 commits into
mainfrom
fix/episode-delete-semantic-refs
Mar 3, 2026
Merged

[GH-1102] Fix dangling episode_ids#1151
o-love merged 2 commits into
mainfrom
fix/episode-delete-semantic-refs

Conversation

@o-love

@o-love o-love commented Feb 27, 2026

Copy link
Copy Markdown
Contributor

Purpose of the change

Fix dangling episode_id references in semantic storage that cause background ingestion crashes. When sessions are deleted, the episode history and citation references in semantic storage were not being cleaned up, leaving orphaned episode_ids that caused ValueError failures during subsequent ingestion runs.

Description

The root cause was that delete_session removed episodes from the episode store but left behind their references in semantic storage (history entries and feature citations). When the ingestion process later tried to retrieve messages for these now-missing episode IDs, it crashed.

This PR makes the following changes:

Core fix (memmachine.py):

  • Session deletion now fetches episode IDs before deleting episodes, then calls a new _cleanup_semantic_history method to remove history and citations for those IDs.
  • Replaces the bulk delete_episode_messages call with targeted get_episode_messages + delete_episodes so we capture the IDs first.
  • Removes the delete_all_project_messages call from _delete_semantic_memory since cleanup is now episode-ID-driven.

Storage layer (neo4j_semantic_storage.py, sqlalchemy_pgvector_semantic.py):

  • delete_history now also removes citation references from features, not just history entries.
  • Neo4j uses a single Cypher query with DETACH DELETE for history and citation pruning.
  • PGVector deletes from the citation_association_table before removing history rows.

Removed dead code:

  • delete_history_set removed from the storage base, both storage backends, in-memory test storage, and mock objects.
  • delete_messages and delete_all_project_messages/delete_all_org_messages removed from SemanticMemory and SemanticSessionManager since they are superseded by the episode-ID-based cleanup.

Tests:

  • New test_semantic_history_cleanup.py unit test verifying delete_history removes both history and citations.
  • New test_memmachine_delete_session.py integration test verifying end-to-end session deletion cleans up semantic references.
  • Removed tests for the deleted delete_history_set method.
  • Updated conftest.py to remove calls to deleted methods.

Fixes/Closes

Fixes #1102

Type of change

  • Bug fix (non-breaking change which fixes an issue)

How Has This Been Tested?

  • Unit Test
  • Integration Test

New tests added:

  • tests/memmachine/semantic_memory/test_semantic_history_cleanup.py - verifies delete_history removes history entries and prunes citations from features.
  • tests/memmachine/main/test_memmachine_delete_session.py - end-to-end integration test that creates a session with episodes, adds citations, deletes the session, and verifies all semantic references are cleaned up.

Checklist

  • I have signed the commit(s) within this pull request
  • My code follows the style guidelines of this project (See STYLE_GUIDE.md)
  • I have performed a self-review of my own code
  • I have commented my code
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added unit tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • Any dependent changes have been merged and published in downstream modules
  • I have checked my code and corrected any misspellings

Screenshots/Gifs

N/A

Further comments

  • The delete_history_set method and delete_all_project_messages/delete_all_org_messages methods were removed as they are no longer needed. Cleanup is now driven by specific episode IDs rather than set-level bulk operations, which is more precise and avoids the original bug.
  • The _cleanup_semantic_history method in memmachine.py gracefully handles ResourceNotReadyError by logging and skipping, so session deletion does not fail if the semantic service is temporarily unavailable.

@o-love o-love force-pushed the fix/episode-delete-semantic-refs branch from fe1829f to dc4891c Compare February 28, 2026 00:01
@o-love o-love marked this pull request as ready for review February 28, 2026 00:08

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes background ingestion crashes by ensuring session deletion also cleans up semantic-history references (including feature citations) tied to deleted episodes.

Changes:

  • Add episode-ID-driven semantic cleanup during delete_session to prevent dangling episode_id references.
  • Update Neo4j + PGVector delete_history to prune both history entries and feature citations.
  • Remove deprecated set-based deletion APIs and update/add tests for the new cleanup behavior.

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
tests/memmachine/semantic_memory/test_semantic_history_cleanup.py Adds a unit test ensuring delete_history removes history + citations
tests/memmachine/semantic_memory/storage/test_semantic_storage.py Removes tests for deleted delete_history_set API
tests/memmachine/semantic_memory/storage/in_memory_semantic_storage.py Updates in-memory test storage delete_history to also prune citations
tests/memmachine/semantic_memory/mock_semantic_memory_objects.py Removes mock method for deleted delete_history_set
tests/memmachine/main/test_memmachine_delete_session.py Adds integration test validating session deletion cleans semantic references
tests/memmachine/main/conftest.py Updates test setup/teardown to stop calling removed project-message deletion
src/memmachine/semantic_memory/storage/storage_base.py Removes delete_history_set from storage interface; clarifies delete_history intent
src/memmachine/semantic_memory/storage/sqlalchemy_pgvector_semantic.py Ensures delete_history also deletes citation associations
src/memmachine/semantic_memory/storage/neo4j_semantic_storage.py Implements combined Cypher cleanup for history + citations
src/memmachine/semantic_memory/semantic_session_manager.py Removes set-based bulk message deletion APIs
src/memmachine/semantic_memory/semantic_memory.py Removes delete_messages and stops calling deleted set-based history deletion
src/memmachine/main/memmachine.py Adds _cleanup_semantic_history and runs it during session deletion

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread tests/memmachine/semantic_memory/test_semantic_history_cleanup.py
Comment thread tests/memmachine/main/test_memmachine_delete_session.py
Comment thread tests/memmachine/main/test_memmachine_delete_session.py
Comment thread src/memmachine/main/memmachine.py
Comment thread src/memmachine/main/memmachine.py Outdated
Comment on lines +356 to +362
episodes = await episode_store.get_episode_messages(
filter_expr=session_filter,
)
episode_ids = [episode.uid for episode in episodes]
if episode_ids:
await episode_store.delete_episodes(episode_ids)
await self._cleanup_semantic_history(episode_ids)

Copilot AI Feb 28, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fetches full episode messages just to extract IDs, which could be expensive for large sessions. Consider adding/using an episode-store API that returns only episode IDs (or supports server-side deletion that returns deleted IDs), to avoid loading message payloads during session deletion.

Copilot uses AI. Check for mistakes.
Comment thread src/memmachine/semantic_memory/semantic_memory.py
@o-love o-love self-assigned this Feb 28, 2026

@sscargal sscargal left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR appears correct and safe for small-to-moderate deletes. However, for very large deletes (1,000+), I can see we're going to run into trouble.

  • get_episode_messages loads everything into memory at once (no pagination). The call to get_episode_messages uses no page_size or page_num, so for 10,000 episodes, the entire result set is materialized as a Python list of Episode objects in application memory before any deletion occurs.
  • delete_episodes(episode_ids) sends all 10,000 IDs in a single query
    The old delete_episode_messages used a filter expression (a WHERE clause). The new approach collects all IDs and passes them to delete_episodes(episode_ids). For 10,000 items, this translates to a single SQL DELETE ... WHERE id IN (?, ?, ..., ?) with 10,000 bind parameters — or a single Cypher/graph equivalent.
  • delete_history in Neo4j sends all 10,000 IDs in a single Cypher query. For 10,000 IDs, the second MATCH (f:Feature) WHERE any(...) is a full table scan of all Feature nodes, filtering each one in memory within Neo4j. This becomes a very expensive operation as the Feature graph grows. This is also a single non-batched transaction — if it times out, nothing is deleted.
  • delete_history in PGVector opens two sequential DELETE statements in one session. Both DELETE ... WHERE history_id IN (...) statements carry all 10,000 IDs. Wrapping them in a single transaction ensures atomicity, but the transaction will hold locks on both the citation_association_table and set_ingested_history tables until commit. At 10,000 rows, this is a long-held write lock that could block other concurrent DB operations.

As noted in the concerns above, there is a silent risk of connection pool exhaustion. See Issue #1111 for an example of DELETE. We've seen similar in the SEARCH operations.

@o-love o-love force-pushed the fix/episode-delete-semantic-refs branch from fcdd099 to 8c7f43f Compare March 3, 2026 02:12
Reversing the order ensures that if the process crashes mid-batch,
the episode record still exists in the store and the cleanup can be
retried, rather than leaving orphaned semantic history with no
corresponding episode to recover from.
@o-love o-love force-pushed the fix/episode-delete-semantic-refs branch from 01c18a6 to a6c9d09 Compare March 3, 2026 02:38
@o-love o-love merged commit 55958b6 into main Mar 3, 2026
48 checks passed
@o-love o-love deleted the fix/episode-delete-semantic-refs branch March 3, 2026 03:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: Background semantic ingestion crashes/restarts when semantic storage references missing episode_ids

4 participants