The correct number of test set problems #40

Hi, I appreciate your valuable contribution!

I'm evaluating my own model, which I added as a new model with a newly registered prompt template. While running the evaluation, I found a mismatch in the number of test questions.

Here is my evaluation script:
```
python -m lcb_runner.runner.main --model deepseek-coder-v1.5-instruct-7b-r2c  \
  --scenario codegeneration \
  --local_model_path ../experiments/deepseek-coder-v1.5-ins.7b.r2c.sft_ps_test_case.iter2.dpo.H100.dp8.v1.0.s42/checkpoint-2400/ \
  --release_version "release_v2" --not_fast --n 1 --evaluate --stop "<|EOT|>" --max_tokens 4096 --temperature 0.0
```

While the model was running, the tqdm progress bar showed `400` questions, but by my count there should be 450 questions between `2023-09-01` and `2024-09-01`. In addition, after running
```
python -m lcb_runner.evaluation.compute_scores --eval_all_file output/DeepSeekR2C/Scenario.codegeneration_1_0.0_eval_all.json --start_date 2023-09-01 --end_date 2024-09-01
```

I get the following output:
```
238
Pass@1 =  0.24369747899159663
Easy Pass@1 =  0.5529411764705883
Medium Pass@1 =  0.11224489795918367
Hard Pass@1 =  0.0
Pass@5 =  1.0
Easy Pass@5 =  1.0
Medium Pass@5 =  1.0
Hard Pass@5 =  1.0
Pass@10 =  1.0
Easy Pass@10 =  1.0
Medium Pass@10 =  1.0
Hard Pass@10 =  1.0
Pass@25 =  1.0
Easy Pass@25 =  1.0
Medium Pass@25 =  1.0
Hard Pass@25 =  1.0
Pass@50 =  1.0
Easy Pass@50 =  1.0
Medium Pass@50 =  1.0
Hard Pass@50 =  1.0
Pass@100 =  1.0
Easy Pass@100 =  1.0
Medium Pass@100 =  1.0
Hard Pass@100 =  1.0
Pass@150 =  1.0
Easy Pass@150 =  1.0
Medium Pass@150 =  1.0
Hard Pass@150 =  1.0
Pass@200 =  1.0
Easy Pass@200 =  1.0
Medium Pass@200 =  1.0
Hard Pass@200 =  1.0
Pass@1: 0.24369747899159663
Easy Pass@1: 0.5529411764705883
Medium Pass@1: 0.11224489795918367
Hard Pass@1: 0.0
```
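The Pass@k values for k ≥ 5 are all `1.0`, even for Hard problems where Pass@1 is `0.0`. This is consistent with the scorer using the standard unbiased pass@k estimator from the Codex/HumanEval paper (an assumption; I have not checked the scorer's code), which returns `1.0` whenever `n - c < k`. With `--n 1`, that condition holds for every k > 1, so only Pass@1 is meaningful here. A minimal sketch of that estimator:

```python
import math


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator as described in the Codex paper (Chen et al., 2021).

    n: number of samples generated per problem
    c: number of those samples that pass all tests
    k: the k in pass@k
    """
    if n - c < k:
        # Either every size-k subset contains a passing sample, or k > n.
        # In the k > n case (e.g. n=1, k=5) the estimator degenerates to 1.0.
        return 1.0
    # 1 - P(a random size-k subset contains no passing sample)
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)


# With one sample per problem, only k=1 carries information:
print(pass_at_k(1, 0, 1))  # failing sample -> 0.0
print(pass_at_k(1, 0, 5))  # k > n -> degenerate 1.0
```

Evaluating with a larger `--n` (and a non-zero temperature) would be needed for the higher-k numbers to mean anything.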

It seems there are only 238 rows of results. Could you explain this a little? Is there a mistake on my side?
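To narrow down where the rows go missing, one could count how many entries of the `eval_all` file fall inside the date window. The sketch below is hypothetical: it assumes the file is a top-level JSON list of per-problem records, each with a `contest_date` ISO timestamp and a `difficulty` field, and it treats the window as half-open `[start, end)`. None of these details are confirmed against the actual `compute_scores` implementation.

```python
import json
from collections import Counter
from datetime import datetime


def load_records(path):
    """Load the eval_all JSON file (assumed to be a list of dicts)."""
    with open(path) as f:
        return json.load(f)


def count_in_window(records, start, end, date_key="contest_date"):
    """Count records whose date lies in [start, end).

    `date_key` and the "difficulty" field are assumptions about the schema.
    Returns (total count, Counter of difficulties).
    """
    start_dt = datetime.fromisoformat(start)
    end_dt = datetime.fromisoformat(end)
    kept = [
        r for r in records
        if start_dt <= datetime.fromisoformat(r[date_key]) < end_dt
    ]
    by_difficulty = Counter(r.get("difficulty", "unknown") for r in kept)
    return len(kept), by_difficulty


# Hypothetical usage:
# records = load_records("output/DeepSeekR2C/Scenario.codegeneration_1_0.0_eval_all.json")
# print(len(records), count_in_window(records, "2023-09-01", "2024-09-01"))
```

Comparing the total number of records with the in-window count would show whether the 238 comes from the date filter or from fewer problems being present in the file in the first place.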

BTW, here is my registered model, following the README:
```
    LanguageModel(
        "deepseek-coder-v1.5-instruct-7b-r2c",
        "DeepSeekR2C",
        LMStyle.DeepSeekR2C,
        datetime(2023, 1, 1),
        link="https://huggingface.co/chitanda/deepseek-coder-v1.5-instruct-7b-r2c",
    )
```

Thank you very much for your help!

