Skip to content

Retrieval agent implementation and benchmark updates#1103

Merged
Tianyang-Zhang merged 4 commits into
mainfrom
retrieval_agent
Mar 3, 2026
Merged

Retrieval agent implementation and benchmark updates#1103
Tianyang-Zhang merged 4 commits into
mainfrom
retrieval_agent

Conversation

@Tianyang-Zhang

@Tianyang-Zhang Tianyang-Zhang commented Feb 11, 2026

Copy link
Copy Markdown
Contributor

Purpose of the change

Adding an option to involve the retrieval agent during memory search to improve the memory retrieval accuracy.

Description

  1. Implement three agents to help resolve complex multi-hop queries.
  2. Add retrieval_agent configuration.
  3. Refactor evaluation logic and add more result details. Add WikiMultihop and HotpotQA benchmarks.
  4. Simplify the benchmark cmdlines.
  5. Add corresponding unit tests and update existing unit tests.
  6. Update Python SDK and TypeScript SDK.
  7. Update stale documentation.

Type of change

[Please delete options that are not relevant.]

  • New feature (non-breaking change which adds functionality)
  • Documentation update

How Has This Been Tested?

Test via benchmark scripts and unit tests

[Please delete options that are not relevant.]

  • Unit Test
  • End-to-end Test
  • Test Script (please provide)

Test Results: [Attach logs, screenshots, or relevant output]
As shown in the Sample output part under evaluation/README.md.

Checklist

[Please delete options that are not relevant.]

  • I have signed the commit(s) within this pull request
  • My code follows the style guidelines of this project (See STYLE_GUIDE.md)
  • I have performed a self-review of my own code
  • I have commented my code
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added unit tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I have checked my code and corrected any misspellings

Maintainer Checklist

  • Confirmed all checks passed
  • Contributor has signed the commit(s)
  • Reviewed the code
  • Run, Tested, and Verified the change(s) work as expected

Further comments

Future improvements:

  1. more agent types for different question types.
  2. configurable llm model reasoning effort.
  3. configurable time cost and token cost threshold on the retrieval agent.
  4. self-adapt search limit/top-k

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds an “agent_mode” option to route long-term memory search through a new retrieval-agent orchestration (multi-hop, split-query, tool-selection), updates benchmarks/scripts, and propagates config + SDK changes to support an LLM model for the retrieval agent.

Changes:

  • Add retrieval-agent framework (ToolSelect / ChainOfQuery / SplitQuery / MemMachine retriever) and wire it into LongTermMemory.search_scored(..., agent_mode=...).
  • Extend configs + SDKs (Python + TS) to support agent_mode in search and add llm_model to long-term memory config/resources.
  • Add/refresh evaluation scripts + docs for new benchmark workflows (LoCoMo, WikiMultiHop, HotpotQA) and simplify CLI usage.

Reviewed changes

Copilot reviewed 69 out of 89 changed files in this pull request and generated 12 comments.

Show a summary per file
File Description
tests/memmachine/server/api_v2/test_spec.py Asserts default agent_mode behavior in request spec parsing.
tests/memmachine/server/api_v2/test_router.py Verifies router passes agent_mode through to service layer.
tests/memmachine/retrieval_agent/test_retrieval_agent.py New unit tests covering retrieval-agent behaviors and reranking.
tests/memmachine/rest_client/test_memory.py Adds Python REST client tests for agent_mode payload field.
tests/memmachine/main/test_memmachine_mock.py Updates mocked config defaults to include llm_model for LTM.
tests/memmachine/common/session_manager/test_session_manager.py Adds llm_model to episodic memory test configuration.
tests/memmachine/common/configuration/test_configuration.py Validates merge/update behavior for new llm_model field.
src/memmachine/server/api_v2/service.py Passes agent_mode through to MemMachine.query_search(); adds logging.
src/memmachine/retrieval_agent/common/agent_api.py Introduces retrieval-agent base API, policies, and rerank helper.
src/memmachine/retrieval_agent/agents/tool_select_agent.py Adds LLM-based tool routing agent.
src/memmachine/retrieval_agent/agents/split_query_agent.py Adds query splitting agent for multi-entity single-hop queries.
src/memmachine/retrieval_agent/agents/memmachine_retriever.py Adds direct MemMachine memory retriever agent wrapper.
src/memmachine/retrieval_agent/agents/coq_agent.py Adds chain-of-query rewriting + sufficiency-check agent.
src/memmachine/retrieval_agent/agents/init.py Exposes retrieval-agent classes in package exports.
src/memmachine/rest_client/memory.py Adds agent_mode parameter to Python SDK Memory.search().
src/memmachine/rest_client/README.md Documents agent_mode usage in Python REST client README.
src/memmachine/main/memmachine.py Propagates agent_mode through episodic search; validates llm_model.
src/memmachine/episodic_memory/long_term_memory/service_locator.py Loads llm_model resource for long-term memory params.
src/memmachine/episodic_memory/long_term_memory/long_term_memory.py Implements agent-mode branching and initializes retrieval agent.
src/memmachine/episodic_memory/episodic_memory.py Plumbs agent_mode through episodic query path; adds logging.
src/memmachine/common/language_model/openai_responses_language_model.py Adds reasoning_effort and token-usage-returning API method.
src/memmachine/common/filter/filter_parser.py Marks FilterExpr as @runtime_checkable.
src/memmachine/common/errors.py Adds language-model config errors for missing/default LLM model.
src/memmachine/common/configuration/language_model_conf.py Adds contains_language_model() helper.
src/memmachine/common/configuration/episodic_config.py Adds llm_model to long-term memory config/partial config.
src/memmachine/common/configuration/init.py Adds default/check helpers for long-term-memory llm_model.
src/memmachine/common/api/spec.py Adds agent_mode field to SearchMemoriesSpec.
src/memmachine/common/api/doc.py Documents agent_mode and adds examples.
src/memmachine-ts/rest_client/tests/memmachine-memory.spec.ts Adds TS client test asserting agent_mode in search payload.
src/memmachine-ts/rest_client/src/memory/memmachine-memory.types.ts Adds agent_mode option to TS SearchMemoriesOptions.
src/memmachine-ts/rest_client/src/memory/memmachine-memory.ts Includes agent_mode field in TS search request payload.
src/memmachine-ts/rest_client/README.md Documents TS usage example for agent_mode.
sample_configs/episodic_memory_config.gpu.sample Adds llm_model to long-term memory sample config.
sample_configs/episodic_memory_config.cpu.sample Adds llm_model to long-term memory sample config.
memmachine-compose.sh Updates config generation to rewrite LTM llm_model value.
evaluation/utils/memmachine_helper_restapiv2.py Adds agent_mode to evaluation REST helper payload.
evaluation/utils/agent_utils.py New helper utilities for retrieval-agent benchmark workflows.
evaluation/retrieval_agent/wikimultihop_search.py Adds WikiMultiHop search benchmark script (agent/memmachine/llm modes).
evaluation/retrieval_agent/wikimultihop_ingest.py Adds WikiMultiHop ingestion script.
evaluation/retrieval_agent/run_test.sh New unified runner for locomo/wikimultihop/hotpotqa ingest/search.
evaluation/retrieval_agent/locomo_search.py Adds LoCoMo search benchmark script using new agent utils.
evaluation/retrieval_agent/locomo_ingest.py Adds LoCoMo ingestion script using new agent utils.
evaluation/retrieval_agent/locomo_delete.py Simplifies LoCoMo delete script signature/typing.
evaluation/retrieval_agent/llm_judge.py Updates judging model + parsing; simplifies IO paths.
evaluation/retrieval_agent/hotpotQA_test.py Adds HotpotQA ingest/search benchmark script.
evaluation/retrieval_agent/generate_scores.py Adds consolidated scoring report for retrieval-agent benchmark outputs.
evaluation/retrieval_agent/evaluate.py Updates evaluation pipeline to new output schema fields.
evaluation/locomo/episodic_memory/locomo_ingest.py Removes older legacy ingestion script.
evaluation/locomo/episodic_memory/locomo_config.yaml Removes legacy YAML config stub.
evaluation/locomo/episodic_memory/README.md Removes legacy locomo episodic_memory README.
evaluation/locomo/episodic_agent/run_experiments.py Removes legacy episodic agent runner.
evaluation/locomo/episodic_agent/restapiv2_locomo_search_agent.py Removes legacy restapiv2 search agent implementation.
evaluation/locomo/episodic_agent/memmachine_locomo.py Removes legacy MemMachine locomo runner.
evaluation/locomo/episodic_agent/locomo_agent.py Removes legacy OpenAI Agents-sdk locomo agent code.
evaluation/locomo/episodic_agent/generate_scores.py Removes legacy scoring script.
evaluation/locomo/episodic_agent/init.py Removes legacy package marker.
evaluation/locomo/episodic_agent/README.md Removes legacy episodic agent README.
evaluation/episodic_memory/restapiv2_locomo_search.py Adds CLI flag to enable agent_mode in legacy episodic workflow.
evaluation/episodic_memory/README.md New README for legacy episodic workflow (with agent-mode option).
evaluation/README.md Replaces top-level evaluation guide with retrieval-agent workflow docs.
docs/open_source/configuration.mdx Documents llm_model under long-term memory config.
docs/install_guide/integrate/GPTStore.mdx Documents agent_mode in schema snippet for GPT Store integration.
docs/getting_started/benchmarks.mdx Points users to updated evaluation guide; expands benchmark mentions.
docs/examples/rest.mdx Documents agent_mode in REST search example.
docs/api_reference/ts-rest/interfaces/SearchMemoryResult.mdx Updates TS REST docs for renamed interfaces/options and adds agent_mode.
docs/api_reference/ts-rest/classes/MemMachineMemory.mdx Updates TS class docs for new option type names and response structure.
docs/api_reference/python/memory_api.mdx Updates Python SDK docs for expanded search() signature + agent_mode.
USAGE.md Documents REST + Python SDK defaults including agent_mode.
README.md Adds bibtex reference section relevant to agent prompts.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/memmachine/retrieval_agent/common/agent_api.py Outdated
Comment thread src/memmachine/retrieval_agent/common/agent_api.py Outdated
Comment thread src/memmachine/retrieval_agent/agents/tool_select_agent.py Outdated
Comment thread src/memmachine/retrieval_agent/agents/tool_select_agent.py Outdated
Comment thread src/memmachine/retrieval_agent/agents/split_query_agent.py Outdated
Comment thread src/memmachine/episodic_memory/long_term_memory/long_term_memory.py Outdated
Comment thread src/memmachine/retrieval_agent/agents/memmachine_retriever.py Outdated
Comment thread src/memmachine/server/api_v2/service.py Outdated
Comment thread evaluation/utils/memmachine_helper_restapiv2.py Outdated
Comment thread src/memmachine/retrieval_agent/agents/tool_select_agent.py Outdated
@Tianyang-Zhang Tianyang-Zhang force-pushed the retrieval_agent branch 3 times, most recently from af8e56a to 53a2d98 Compare February 12, 2026 21:40
@sscargal sscargal requested a review from Copilot February 20, 2026 18:49

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 87 out of 105 changed files in this pull request and generated no new comments.

Comments suppressed due to low confidence (6)

evaluation/utils/agent_utils.py:1

  • Corrected spelling of 'matrics' to 'metrics'.
    evaluation/retrieval_agent/wikimultihop_ingest.py:1
  • Corrected spelling of 'ketword' to 'keyword'.
    evaluation/retrieval_agent/hotpotQA_test.py:1
  • Corrected spelling of 'needel' to 'needle'.
    src/memmachine/retrieval_agent/agents/tool_select_agent.py:1
  • Corrected spelling of 'intput' to 'input'.
    src/memmachine/main/memmachine.py:1
  • Corrected spelling of 'Faield' to 'Failed'.
    src/memmachine/common/language_model/language_model.py:1
  • Corrected spelling of 'intput' to 'input'.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@sscargal sscargal left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested some changes.

Comment thread docs/examples/rest.mdx Outdated
Comment thread docs/images/retrieval-agent-workflow.svg Outdated
Comment thread docs/open_source/configuration.mdx
Comment thread docs/tools/generate_openapi.py
Comment thread docs/tools/generate_unified_openapi.py
Comment thread evaluation/retrieval_agent/llm_judge.py Outdated
Comment thread evaluation/retrieval_agent/llm_judge.py Outdated
Comment thread evaluation/retrieval_agent/llm_judge.py Outdated
Comment thread evaluation/retrieval_agent/run_test.sh Outdated
Comment thread evaluation/README.md Outdated
@Tianyang-Zhang Tianyang-Zhang force-pushed the retrieval_agent branch 2 times, most recently from ed1fd66 to 6a21bbc Compare February 25, 2026 00:08
@Tianyang-Zhang Tianyang-Zhang force-pushed the retrieval_agent branch 5 times, most recently from 2ea643c to 322e8a6 Compare February 27, 2026 01:22

@malatewang malatewang left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went through the code and also reviewed the PR with codex. There were not major problems. As we discussed offline, the agent should work on top of semantic and episodic memory. It can be addressed by furture PR

Comment thread src/memmachine/retrieval_agent/agents/split_query_agent.py Outdated
Comment thread docs/core_concepts/retrieval_agent_architecture.mdx Outdated
Comment thread src/memmachine/server/api_v2/service.py Outdated
self, query: QueryParam, episodes: list[Episode]
) -> list[Episode]:
if query.limit <= 0:
return sorted(episodes, key=lambda x: x.timestamp)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should apply the top-k here

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case query.limit <= 0 indicates the 'k' is 0/none, therefore no top-k is applied.

@Tianyang-Zhang

Copy link
Copy Markdown
Contributor Author

Refactored to only use top-level EpisodicMemory API in retrieval agent, and updated corresponding tests, docs. Please take a look, thanks! @malatewang @sscargal

@SarahScargall SarahScargall left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi! This LGTM. I may keep the content the same, but change the wording in retrieval_agent_architecture.mdx to match the rest of the site's cohesive language style at some future point (not part of this PR).

Also - due to changes that were in flight at the same time as this PR, there are 4 conflicts in our /docs that must be resolved before we can merge. Please resolve those. Once you do, you should be able to squash and merge.

@sscargal sscargal added the priority: blocker Prevents progress on development or release. Must be resolved before continuing. label Mar 3, 2026
@Tianyang-Zhang Tianyang-Zhang force-pushed the retrieval_agent branch 3 times, most recently from ec840da to f9a5a64 Compare March 3, 2026 01:55
@sscargal sscargal requested a review from Copilot March 3, 2026 17:47

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 95 out of 117 changed files in this pull request and generated 7 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +25 to +32
COMBINED_SUFFICIENCY_AND_REWRITE_PROMPT = """You are a meticulous expert in retrieval-augmented question answering evaluation and query rewriting.

Task: Given (1) an original user query, (2) rewritten queries already tried, and (3) retrieved documents, you must decide whether the documents are sufficient to answer the orig
inal query directly, completely, and explicitly. If insufficient, generate the NEXT BEST rewritten subquery to retrieve the missing evidence. If sufficient, set the rewritten qu
ery to the original query.

Hard constraints:
- Use ONLY the provided retrieved documents for sufficiency judgment. Do NOT use external knowledge, browsing, assumptions, or plausibility.

Copilot AI Mar 3, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

COMBINED_SUFFICIENCY_AND_REWRITE_PROMPT also contains mid-word hard wraps (e.g., orig\ninal, qu\nery). These look accidental and can materially affect the chain-of-query rewrite/sufficiency behavior. Suggest reflowing the prompt to remove mid-word newlines so instructions are unambiguous.

Copilot uses AI. Check for mistakes.
Comment on lines +46 to +48
if isinstance(result.index, int):
result["type"] = result.index.map(lambda x: categories[x - 1])

Copilot AI Mar 3, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if isinstance(result.index, int): will never be true because result.index is a pandas Index object, so the type column mapping won’t run. If you want to map numeric category IDs to names, consider using pd.api.types.is_integer_dtype(result.index) (or attempt result.index.astype(int) in a try/except) before applying the mapping.

Copilot uses AI. Check for mistakes.
Comment on lines +95 to +105
async def _do_rerank(
self, query: QueryParam, episodes: list[Episode]
) -> list[Episode]:
if query.limit <= 0:
return sorted(episodes, key=lambda x: x.created_at)

if len(episodes) <= query.limit or self._reranker is None:
if len(episodes) == 0:
return episodes
return sorted(episodes, key=lambda x: x.created_at)

Copilot AI Mar 3, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_do_rerank ignores query.limit when self._reranker is None (or when len(episodes) <= query.limit), returning all episodes sorted by created_at. This can violate the caller’s top_k/limit contract and potentially return a much larger payload than expected. Consider always truncating to query.limit when it’s > 0 (even without a reranker), and only then applying the final chronological sort if that ordering is required.

Copilot uses AI. Check for mistakes.
Comment on lines +19 to +46
TOOL_SELECT_PROMPT = """You are a tool router. Your task is to select exactly ONE tool name from the provided list that best fits the user query. Do not call any tools. Use only the text in {query}; d
o not assume external context or missing metadata. You may interpret dependency only from explicit linguistic structure (e.g., "use X to find Y", possessive/relationship chains)
.

GOAL
- Choose exactly one of: {coq}, {split_query}, {memory_retrieval}
- Output NONE only when the query type cannot be determined from {query} (e.g., empty/invalid/malformed).

MECHANISM (what to do, then how to do it)

1) Validate input
- If {query} is empty, whitespace-only, null-like, non-linguistic garbage, or otherwise not classifiable as a query/request -> output NONE.

2) Classify the query type from {query} using ONLY the operational criteria below (pick exactly one):

A) MULTI-HOP (dependency chain; later step depends on earlier result)
Choose MULTI-HOP if the query requires two or more dependent steps where you must first determine X, then use X to determine Y (or more).
- Explicit dependency signals: "then", "after", "using that", "based on that result", "from there", "which of those", "once you find", "given the answer to", "trace/derive".
- Relationship-chain dependency: queries that require resolving an intermediate entity/attribute to proceed (e.g., "X's spouse's company", "author of the book that inspired the
film").
- Comparison/timeline becomes MULTI-HOP ONLY if it requires derived attributes that must be found first (e.g., "Which is older, the CEO of A or the founder of B?" requires findi
ng CEO/founder first).
- Tie-breaker: If any explicit dependency chain exists, classify as MULTI-HOP even if multiple entities/keywords appear.

B) SINGLE-HOP WITH MULTIPLE ENTITIES/KEYWORDS (independent sub-queries; no dependency)
Choose this when the query contains multiple entities/subjects that can be answered via separate independent lookups and then combined, without needing any earlier result to for
m the later lookup.
- Signals: "A and B", "A, B, and C", "for each of", "separately", "list ... for these items".

Copilot AI Mar 3, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TOOL_SELECT_PROMPT appears to have accidental hard line breaks mid-word (e.g., ...{query}; d\no not..., findi\nng, for\nm). This will degrade the routing prompt quality and can lead to incorrect tool selection. Suggest reformatting the prompt to remove mid-word newlines/stray characters and keep sentences intact.

Copilot uses AI. Check for mistakes.
Comment on lines +165 to +175
selected_tool = None
for tool in self._children_tools:
if tool.agent_name in rsp:
selected_tool = tool
break
if selected_tool is None:
logger.warning("Tool %s not found", rsp)
if selected_tool is None and self._default_tool is not None:
logger.warning("No tool selected, using default tool")
return self._default_tool, input_token, output_token
return selected_tool, input_token, output_token

Copilot AI Mar 3, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tool selection currently checks if tool.agent_name in rsp, which is a substring match across the full model output. If the model ever returns extra text (e.g., repeats multiple tool names, includes examples, etc.), this can select the wrong tool nondeterministically. Consider strict parsing (e.g., take the first non-empty line, strip it, and require an exact match against known tool names; treat anything else as NONE/fallback).

Copilot uses AI. Check for mistakes.
Comment on lines +21 to +61
SPLIT_QUERY_PROMPT = """You are a search expert. Transform the input query into either:
- multiple single-hop sub-queries (2-6 lines), or
- the original query unchanged (1 line),
following the rules below.

Query
{query}

Mechanism (what to do, then how to do it)

1) Decide whether to split (default: do NOT split)
- Do NOT split if the answer can be retrieved from a single page/infobox/database record/field set for the same entity and timeframe (i.e., one co-located source would likely co
ntain it).
- Split ONLY if you must retrieve >=2 distinct facts that are not co-located OR are for different entities and/or different timepoints/locations/contexts.
- Tie-breaker: when unsure, prefer NOT splitting.

2) Special cases that limit splitting
- Multi-constraint single-entity queries (e.g., "A's birth date and birthplace"):
- Keep as ONE query if those attributes are typically co-located in the same reference entry for A.
- Split only if the attributes are typically separate lookups (or clearly not co-located).
- List-style questions ("all/each/every ... and their ..."):
- Keep as-is unless the query explicitly names 2-6 specific entities/subjects that can each become one sub-query.
- Do not split if it would likely require more than 6 lines.

3) If splitting: produce single-hop fact-lookups only
- Each sub-query must be directly answerable by one fact lookup (one field/value) with no derived operations.
- Explicit ban: do NOT use derived/operation wording in sub-queries, including (non-exhaustive) "compare," "difference," "between," "rate," "top," "average," "change," "increase
," "decrease," "percent," "rank," "versus," "more/less than."
- Rewrite derived intents into pure fact retrievals (e.g., "between X and Y" -> ask for X and ask for Y; "compare A and B in 2023" -> ask A in 2023 and ask B in 2023).

4) Preserve intent and attach constraints correctly
- Keep the same entities/aliases and the same constraints (timeframe, location, context, units) from the original query.
- For mixed structures, maintain left-to-right entity/subject order AND attach any paired constraints to every relevant sub-query (e.g., "in 2023," "in Paris," "during WWII" app
ears on each line it applies to).
- Do not add assumptions or extra constraints.

5) Handling common structures
- Conjunctions ("A and B"): one sub-query per entity/subject for the same attribute and constraints.
- Multi-entity multi-attribute: split by entity first in left-to-right order; within each entity, include only the minimal single-hop attribute per line needed to cover the orig
inal query (while respecting the 2-6 line cap).
- Relational questions ("A's relationship to B"):

Copilot AI Mar 3, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SPLIT_QUERY_PROMPT contains mid-word line breaks (e.g., co\nntain, increase\n ,, app\nears) which look unintentional. These artifacts can confuse the model and reduce split quality. Suggest removing the hard wraps so the prompt text is contiguous and readable.

Copilot uses AI. Check for mistakes.
Comment on lines +208 to +212
# Rerank base on all queries concatenated
param = query.model_copy()
if len(sub_queries) > 1:
param.query += "\n".join(sub_queries)
final_episodes = await self._do_rerank(param, result)

Copilot AI Mar 3, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When building the final rerank query, param.query += "\n".join(sub_queries) appends without a leading newline, so the original query and the first sub-query can run together (e.g., "orig?sub1?\nsub2?"). Consider replacing the query with a clearly delimited combined string (or add an explicit leading newline) so reranking uses the intended text.

Copilot uses AI. Check for mistakes.

@sscargal sscargal left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@Tianyang-Zhang Tianyang-Zhang force-pushed the retrieval_agent branch 7 times, most recently from cd47711 to 3d65e07 Compare March 3, 2026 21:39
@edwinyyyu

Copy link
Copy Markdown
Contributor

To be included in the memmachine_server package, file organization needs to be updated.

@Tianyang-Zhang Tianyang-Zhang merged commit a326e4b into main Mar 3, 2026
60 of 61 checks passed
@Tianyang-Zhang Tianyang-Zhang deleted the retrieval_agent branch March 3, 2026 23:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

priority: blocker Prevents progress on development or release. Must be resolved before continuing.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants