add retry logic for LLM judge and semantic memory consolidation#1361
Conversation
Both evaluate_llm_judge and _deduplicate_features now retry once on JSON parse failures (missing/malformed 'label' key or None response) instead of crashing or silently dropping results. Consolidation also gains context-length error handling matching the feature-update path. Refs: MemMachine#1332 Signed-off-by: Steve Scargall <37674041+sscargal@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
This PR adds defensive retry handling around structured-output failures for two LLM-driven paths: the evaluation-time LLM judge and server-side semantic feature consolidation, aiming to reduce transient JSON/validation failures from corrupting results or skipping consolidation.
Changes:
- Add a 2-attempt retry loop to
evaluate_llm_judge()when the parsed JSON is missing/invalid for thelabelfield, defaulting toWRONGafter retries. - Refactor semantic consolidation to route through a new
_try_consolidate()helper that retries whenllm_consolidate_features()returnsNone, and explicitly skips context-length errors. - Add server tests covering consolidation retry success, retry exhaustion, and context-length error handling.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
evaluation/retrieval_agent/llm_judge.py |
Adds retry + logging when judge output is missing/invalid, defaults to WRONG after retries. |
packages/server/src/memmachine_server/semantic_memory/semantic_ingestion.py |
Introduces _try_consolidate() and retry handling for consolidation, plus context-length error skip behavior. |
packages/server/server_tests/memmachine_server/semantic_memory/test_semantic_ingestion.py |
Adds tests for consolidation retry behavior and context-length error handling. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Signed-off-by: Steve Scargall <37674041+sscargal@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- retry consolidation on parse/validation exceptions, not only None
- raise context-length as a sentinel so debug_fail_loudly does not trip
on the graceful-skip path, matching the feature-update path
- tighten llm judge label validation: require CORRECT/WRONG string
- correct consolidation log messages to mention consolidation + set_id
|
The latest commit addresses the CoPilot comments - 4 of 5 valid, all addressed:
Tests added (all passing):
Verification:
|
|
Thank you for adding the retry logic! |
Purpose of the change
Both
evaluate_llm_judgeand_deduplicate_featuresnow retry once on JSON parse failures (missing/malformed 'label' key or None response) instead of crashing or silently dropping results. Consolidation also gains context-length error handling, matching the feature-update path.Description
Changes made:
evaluation/retrieval_agent/llm_judge.pypackages/server/src/memmachine_server/semantic_memory/semantic_ingestion.pyevaluation/retrieval_agent/test_llm_judge.py(new file)packages/server/server_tests/memmachine_server/semantic_memory/test_semantic_ingestion.pyFixes/Closes
Fixes #1332
Type of change
How Has This Been Tested?
Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration.
Test Results: [Attach logs, screenshots, or relevant output]
New LLM judge tests (fast, no dependencies):
uv run pytest evaluation/retrieval_agent/test_llm_judge.py -v
Semantic ingestion tests (includes the 3 new consolidation tests):
uv run --frozen --all-extras pytest packages/server/server_tests/memmachine_server/semantic_memory/test_semantic_ingestion.py -v
Filter to just the new tests if you want a quicker check:
uv run --frozen --all-extras pytest packages/server/server_tests/memmachine_server/semantic_memory/test_semantic_ingestion.py -v -k "retry or exhaust or context_length_error"
Full server suite (validates nothing was broken):
uv run --frozen --all-extras pytest packages/server/server_tests -q
Lint/format (mirrors CI):
uv run ruff check evaluation/retrieval_agent/ packages/server/src/memmachine_server/semantic_memory/
uv run ruff format --check evaluation/retrieval_agent/ packages/server/src/memmachine_server/semantic_memory/
Checklist
Maintainer Checklist
Screenshots/Gifs
N/A
Further comments
None