[GH-1102] Fix dangling episode_ids#1151
Conversation
fe1829f to
dc4891c
Compare
There was a problem hiding this comment.
Pull request overview
Fixes background ingestion crashes by ensuring session deletion also cleans up semantic-history references (including feature citations) tied to deleted episodes.
Changes:
- Add episode-ID-driven semantic cleanup during
delete_sessionto prevent danglingepisode_idreferences. - Update Neo4j + PGVector
delete_historyto prune both history entries and feature citations. - Remove deprecated set-based deletion APIs and update/add tests for the new cleanup behavior.
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/memmachine/semantic_memory/test_semantic_history_cleanup.py | Adds a unit test ensuring delete_history removes history + citations |
| tests/memmachine/semantic_memory/storage/test_semantic_storage.py | Removes tests for deleted delete_history_set API |
| tests/memmachine/semantic_memory/storage/in_memory_semantic_storage.py | Updates in-memory test storage delete_history to also prune citations |
| tests/memmachine/semantic_memory/mock_semantic_memory_objects.py | Removes mock method for deleted delete_history_set |
| tests/memmachine/main/test_memmachine_delete_session.py | Adds integration test validating session deletion cleans semantic references |
| tests/memmachine/main/conftest.py | Updates test setup/teardown to stop calling removed project-message deletion |
| src/memmachine/semantic_memory/storage/storage_base.py | Removes delete_history_set from storage interface; clarifies delete_history intent |
| src/memmachine/semantic_memory/storage/sqlalchemy_pgvector_semantic.py | Ensures delete_history also deletes citation associations |
| src/memmachine/semantic_memory/storage/neo4j_semantic_storage.py | Implements combined Cypher cleanup for history + citations |
| src/memmachine/semantic_memory/semantic_session_manager.py | Removes set-based bulk message deletion APIs |
| src/memmachine/semantic_memory/semantic_memory.py | Removes delete_messages and stops calling deleted set-based history deletion |
| src/memmachine/main/memmachine.py | Adds _cleanup_semantic_history and runs it during session deletion |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| episodes = await episode_store.get_episode_messages( | ||
| filter_expr=session_filter, | ||
| ) | ||
| episode_ids = [episode.uid for episode in episodes] | ||
| if episode_ids: | ||
| await episode_store.delete_episodes(episode_ids) | ||
| await self._cleanup_semantic_history(episode_ids) |
There was a problem hiding this comment.
This fetches full episode messages just to extract IDs, which could be expensive for large sessions. Consider adding/using an episode-store API that returns only episode IDs (or supports server-side deletion that returns deleted IDs), to avoid loading message payloads during session deletion.
sscargal
left a comment
There was a problem hiding this comment.
The PR appears correct and safe for small-to-moderate deletes. However, for very large deletes (1,000+), I can see we're going to run into trouble.
get_episode_messagesloads everything into memory at once (no pagination). The call to get_episode_messages uses no page_size or page_num, so for 10,000 episodes, the entire result set is materialized as a Python list of Episode objects in application memory before any deletion occurs.delete_episodes(episode_ids)sends all 10,000 IDs in a single query
The old delete_episode_messages used a filter expression (a WHERE clause). The new approach collects all IDs and passes them to delete_episodes(episode_ids). For 10,000 items, this translates to a single SQL DELETE ... WHERE id IN (?, ?, ..., ?) with 10,000 bind parameters — or a single Cypher/graph equivalent.delete_historyin Neo4j sends all 10,000 IDs in a single Cypher query. For 10,000 IDs, the second MATCH (f:Feature) WHERE any(...) is a full table scan of all Feature nodes, filtering each one in memory within Neo4j. This becomes a very expensive operation as the Feature graph grows. This is also a single non-batched transaction — if it times out, nothing is deleted.delete_historyin PGVector opens two sequential DELETE statements in one session. BothDELETE ... WHEREhistory_id IN (...) statements carry all 10,000 IDs. Wrapping them in a single transaction ensures atomicity, but the transaction will hold locks on both the citation_association_table and set_ingested_history tables until commit. At 10,000 rows, this is a long-held write lock that could block other concurrent DB operations.
As noted in the concerns above, there is a silent risk of connection pool exhaustion. See Issue #1111 for an example of DELETE. We've seen similar in the SEARCH operations.
fcdd099 to
8c7f43f
Compare
Reversing the order ensures that if the process crashes mid-batch, the episode record still exists in the store and the cleanup can be retried, rather than leaving orphaned semantic history with no corresponding episode to recover from.
01c18a6 to
a6c9d09
Compare
Purpose of the change
Fix dangling
episode_idreferences in semantic storage that cause background ingestion crashes. When sessions are deleted, the episode history and citation references in semantic storage were not being cleaned up, leaving orphanedepisode_ids that causedValueErrorfailures during subsequent ingestion runs.Description
The root cause was that
delete_sessionremoved episodes from the episode store but left behind their references in semantic storage (history entries and feature citations). When the ingestion process later tried to retrieve messages for these now-missing episode IDs, it crashed.This PR makes the following changes:
Core fix (
memmachine.py):_cleanup_semantic_historymethod to remove history and citations for those IDs.delete_episode_messagescall with targetedget_episode_messages+delete_episodesso we capture the IDs first.delete_all_project_messagescall from_delete_semantic_memorysince cleanup is now episode-ID-driven.Storage layer (
neo4j_semantic_storage.py,sqlalchemy_pgvector_semantic.py):delete_historynow also removes citation references from features, not just history entries.DETACH DELETEfor history and citation pruning.citation_association_tablebefore removing history rows.Removed dead code:
delete_history_setremoved from the storage base, both storage backends, in-memory test storage, and mock objects.delete_messagesanddelete_all_project_messages/delete_all_org_messagesremoved fromSemanticMemoryandSemanticSessionManagersince they are superseded by the episode-ID-based cleanup.Tests:
test_semantic_history_cleanup.pyunit test verifyingdelete_historyremoves both history and citations.test_memmachine_delete_session.pyintegration test verifying end-to-end session deletion cleans up semantic references.delete_history_setmethod.conftest.pyto remove calls to deleted methods.Fixes/Closes
Fixes #1102
Type of change
How Has This Been Tested?
New tests added:
tests/memmachine/semantic_memory/test_semantic_history_cleanup.py- verifiesdelete_historyremoves history entries and prunes citations from features.tests/memmachine/main/test_memmachine_delete_session.py- end-to-end integration test that creates a session with episodes, adds citations, deletes the session, and verifies all semantic references are cleaned up.Checklist
Screenshots/Gifs
N/A
Further comments
delete_history_setmethod anddelete_all_project_messages/delete_all_org_messagesmethods were removed as they are no longer needed. Cleanup is now driven by specific episode IDs rather than set-level bulk operations, which is more precise and avoids the original bug._cleanup_semantic_historymethod inmemmachine.pygracefully handlesResourceNotReadyErrorby logging and skipping, so session deletion does not fail if the semantic service is temporarily unavailable.