The correct number of test set problems #40

Description

@SparkJiao

Hi, appreciate your valuable contribution!

I'm evaluating my own model, which I added with a newly registered prompt template. During evaluation, I found a mismatch in the number of test questions.

Here is my evaluation script:

python -m lcb_runner.runner.main --model deepseek-coder-v1.5-instruct-7b-r2c  \
  --scenario codegeneration \
  --local_model_path ../experiments/deepseek-coder-v1.5-ins.7b.r2c.sft_ps_test_case.iter2.dpo.H100.dp8.v1.0.s42/checkpoint-2400/ \
  --release_version "release_v2" --not_fast --n 1 --evaluate --stop "<|EOT|>" --max_tokens 4096 --temperature 0.0

While running the model, I noticed that the tqdm progress bar shows 400 questions, but as far as I can tell there should be 450 questions between 2023-09-01 and 2024-09-01. In addition, after I run

python -m lcb_runner.evaluation.compute_scores --eval_all_file output/DeepSeekR2C/Scenario.codegeneration_1_0.0_eval_all.json --start_date 2023-09-01 --end_date 2024-09-01

I get the following output:

238
Pass@1 =  0.24369747899159663
Easy Pass@1 =  0.5529411764705883
Medium Pass@1 =  0.11224489795918367
Hard Pass@1 =  0.0
Pass@5 =  1.0
Easy Pass@5 =  1.0
Medium Pass@5 =  1.0
Hard Pass@5 =  1.0
Pass@10 =  1.0
Easy Pass@10 =  1.0
Medium Pass@10 =  1.0
Hard Pass@10 =  1.0
Pass@25 =  1.0
Easy Pass@25 =  1.0
Medium Pass@25 =  1.0
Hard Pass@25 =  1.0
Pass@50 =  1.0
Easy Pass@50 =  1.0
Medium Pass@50 =  1.0
Hard Pass@50 =  1.0
Pass@100 =  1.0
Easy Pass@100 =  1.0
Medium Pass@100 =  1.0
Hard Pass@100 =  1.0
Pass@150 =  1.0
Easy Pass@150 =  1.0
Medium Pass@150 =  1.0
Hard Pass@150 =  1.0
Pass@200 =  1.0
Easy Pass@200 =  1.0
Medium Pass@200 =  1.0
Hard Pass@200 =  1.0
Pass@1: 0.24369747899159663
Easy Pass@1: 0.5529411764705883
Medium Pass@1: 0.11224489795918367
Hard Pass@1: 0.0
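
A side note on the Pass@5 through Pass@200 values in this output: I ran with --n 1, so only Pass@1 is meaningful. My guess (I have not checked the lcb_runner code, so this is an assumption) is that the script uses the usual unbiased pass@k estimator, which returns 1.0 whenever k exceeds the number of incorrect samples. A minimal sketch of that estimator, where the function name pass_at_k is my own:

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n samples, c of which are correct.

    Assumption: this is the common Codex-style estimator, not necessarily
    the exact code used by lcb_runner.
    """
    # With fewer than k incorrect samples, every size-k draw contains a
    # correct one, so the estimate is defined as 1.0.
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# With n = 1, any k > 1 hits the early return regardless of correctness.
print(pass_at_k(1, 0, 1))  # failed problem at k = 1
print(pass_at_k(1, 0, 5))  # same failed problem at k = 5
```

If that is what happens here, the 1.0 values are an artifact of n = 1 and not a separate problem.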

It seems there are only 238 rows of results. Could you explain why? Did I make a mistake somewhere?
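
To narrow this down, I could count how many evaluated rows fall inside the date window myself. The sketch below is only illustrative: I am assuming each row carries an ISO-formatted date under a key like contest_date and that both window boundaries are inclusive, neither of which I have verified against the actual eval_all file. The helper name count_in_window and the sample rows are made up.

```python
from datetime import datetime


def count_in_window(rows, start, end, date_key="contest_date"):
    """Count rows whose date lies in [start, end], boundaries inclusive.

    Assumptions: `rows` is a list of dicts, and `date_key` (hypothetical)
    holds an ISO-formatted date string.
    """
    start_dt = datetime.fromisoformat(start)
    end_dt = datetime.fromisoformat(end)
    return sum(
        start_dt <= datetime.fromisoformat(row[date_key]) <= end_dt
        for row in rows
    )


# Made-up rows, only to show the filtering behaviour.
sample_rows = [
    {"contest_date": "2023-05-10T00:00:00"},  # before the window
    {"contest_date": "2023-09-01T00:00:00"},  # on the start boundary
    {"contest_date": "2024-03-02T00:00:00"},  # inside the window
    {"contest_date": "2024-09-15T00:00:00"},  # after the window
]
in_window = count_in_window(sample_rows, "2023-09-01", "2024-09-01")
print(in_window)
```

Comparing such a count against the 400 shown by tqdm and the 238 printed by compute_scores would show whether the gap comes from the date filter or from the dataset release itself.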

BTW, here is my registered model, following the README:

    LanguageModel(
        "deepseek-coder-v1.5-instruct-7b-r2c",
        "DeepSeekR2C",
        LMStyle.DeepSeekR2C,
        datetime(2023, 1, 1),
        link="https://huggingface.co/chitanda/deepseek-coder-v1.5-instruct-7b-r2c",
    )

Thank you very much for your help!

Labels: documentation