Index uid and timestamp properties#1000
Conversation
Signed-off-by: Edwin Yu <edwinyyyu@gmail.com>
5725b17 to
290c468
Compare
There was a problem hiding this comment.
Pull request overview
This PR updates Neo4j graph-store initialization to ensure frequently queried episodic properties can be range-indexed (notably for cursor-ordered episode lookups that order by ("timestamp", "uid")).
Changes:
- Pass
range_index_hierarchies=[["uid"], ["timestamp", "uid"]]intoNeo4jVectorGraphStoreParamsduring Neo4j driver/store construction.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| "range_index_hierarchies": [["uid"], ["timestamp", "uid"]], | ||
| } |
There was a problem hiding this comment.
range_index_hierarchies is documented (in Neo4jVectorGraphStoreParams) as being applied to all node labels and relationship types. If the intent is specifically to speed up episodic queries, consider making this configurable (e.g., add a range_index_hierarchies field to Neo4jConf) or otherwise scoping it, to avoid creating potentially many additional indexes in deployments with many collections/relations.
| "range_index_hierarchies": [["uid"], ["timestamp", "uid"]], | |
| } | |
| } | |
| range_index_hierarchies = getattr(conf, "range_index_hierarchies", None) | |
| if range_index_hierarchies is not None: | |
| params_kwargs["range_index_hierarchies"] = range_index_hierarchies |
| params_kwargs: dict[str, Any] = { | ||
| "driver": driver, | ||
| "force_exact_similarity_search": conf.force_exact_similarity_search, | ||
| "range_index_hierarchies": [["uid"], ["timestamp", "uid"]], | ||
| } |
There was a problem hiding this comment.
This change updates Neo4j graph store parameters but there is no unit test asserting that Neo4jVectorGraphStoreParams is constructed with the expected range_index_hierarchies. Since this behavior lives in DatabaseManager.async_get_neo4j_driver, please add/extend a test (similar to the existing kwargs-forwarding tests) to catch regressions.
| params_kwargs: dict[str, Any] = { | ||
| "driver": driver, | ||
| "force_exact_similarity_search": conf.force_exact_similarity_search, | ||
| "range_index_hierarchies": [["uid"], ["timestamp", "uid"]], | ||
| } |
There was a problem hiding this comment.
NebulaGraphVectorGraphStoreParams also supports range_index_hierarchies, and search_directional_nodes in episodic memory orders by ("timestamp", "uid"). If the performance issue exists for NebulaGraph too, consider passing the same range_index_hierarchies when constructing the Nebula graph store params for consistency; otherwise, it would help to clarify in the PR description that this indexing improvement is Neo4j-only.
sscargal
left a comment
There was a problem hiding this comment.
LGTM. We can consider CoPilot's comments in the future if they are required.
Purpose of the change
Properties that are used often are not being indexed.
Description
Index Episode uid and timestamp. Node/Edge uid was already indexed.
Type of change
How Has This Been Tested?
Checklist
Maintainer Checklist