Add LongMemEval evaluation scripts #1216

Merged
sscargal merged 1 commit into MemMachine:main from edwinyyyu:lme on Mar 13, 2026

Conversation

@edwinyyyu edwinyyyu commented Mar 12, 2026

Contributor

Purpose of the change

Add scripts for reproducible results.

Description

Use longmemeval_s_cleaned.json for evaluation.
Run the scripts in order: ingest, then search, then evaluate.
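
The ordering matters because each stage depends on the one before it. A minimal sketch of running three stages in sequence and stopping at the first failure is shown here. The helper `run_stages` and the placeholder commands are hypothetical and are not the scripts added in this PR; substitute the real script invocations from the README.

```python
import subprocess
import sys


def run_stages(stages):
    """Run (name, argv) stages in order and return the names that completed."""
    completed = []
    for name, argv in stages:
        # check=True raises CalledProcessError on a non-zero exit code,
        # so a later stage never runs after an earlier one has failed.
        subprocess.run(argv, check=True)
        completed.append(name)
    return completed


def placeholder(stage):
    """Hypothetical stand-in command; replace with the real script call."""
    return [sys.executable, "-c", f"print('{stage} done')"]


# Stage order taken from the PR description: ingest, then search, then evaluate.
STAGES = [(stage, placeholder(stage)) for stage in ("ingest", "search", "evaluate")]

if __name__ == "__main__":
    print(run_stages(STAGES))
```

Passing the commands in as data keeps the ordering logic separate from the script names, which are not stated in this conversation.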

Scores are typically at least 93% for the whole pipeline.

0.958 result (0.9638 unweighted average of category scores): https://memverge.atlassian.net/wiki/external/ZTRhN2YyOWE2ZTFmNDBhOWI4NWVkMmU5MjQwYjIyOGI
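
The two figures differ because the overall score weights every question equally, while the unweighted average weights every category equally regardless of how many questions it contains. A small sketch with made-up data illustrates the distinction. The function `summarize`, the category names, and the toy results are assumptions for illustration and are not taken from the evaluation scripts or the linked report.

```python
from collections import defaultdict


def summarize(results):
    """Compute overall accuracy and the unweighted mean of category accuracies.

    results: iterable of (category, is_correct) pairs, one per question.
    """
    counts = defaultdict(lambda: [0, 0])  # category -> [correct, total]
    for category, is_correct in results:
        counts[category][0] += int(is_correct)
        counts[category][1] += 1

    category_scores = {c: correct / total for c, (correct, total) in counts.items()}

    # Overall accuracy: every question counts once.
    overall = sum(c for c, _ in counts.values()) / sum(t for _, t in counts.values())
    # Unweighted average: every category counts once, whatever its size.
    unweighted = sum(category_scores.values()) / len(category_scores)
    return overall, unweighted, category_scores


# Toy data (hypothetical): cat_a has 4 questions, cat_b has 1.
toy = [("cat_a", True), ("cat_a", True), ("cat_a", True), ("cat_a", False),
       ("cat_b", True)]

if __name__ == "__main__":
    overall, unweighted, per_category = summarize(toy)
    print(overall, unweighted, per_category)
```

In the toy data the small category scores higher than the large one, so the unweighted average exceeds the overall accuracy.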

@edwinyyyu added the poc label (Proof-of-concept implementation for a solution, feature, idea, etc.) on Mar 12, 2026

@sscargal sscargal left a comment

Contributor

Can we add a README.md that explains what this is and how to use it, and that provides example output? Thanks. The scripts are fine; they just need some user instructions.

Signed-off-by: Edwin Yu <edwinyyyu@gmail.com>
@edwinyyyu

Contributor Author

Updated README.md and added a script to view scores more easily.

@edwinyyyu edwinyyyu marked this pull request as ready for review March 13, 2026 18:40
@edwinyyyu edwinyyyu requested a review from tomw-mv March 13, 2026 18:41

@sscargal sscargal left a comment

Contributor

LGTM, thanks!

@sscargal sscargal added this to the v0.3.2 milestone Mar 13, 2026
@sscargal sscargal merged commit 3cd65ed into MemMachine:main Mar 13, 2026
43 of 44 checks passed
Labels

poc Proof-of-concept implementation for a solution, feature, idea, etc.

3 participants