ByteAsk/C-CppBench

C-CppBench

A catalogue of the public benchmarks used to evaluate large language models on C and C++, compiled from a manual review of papers, leaderboards, datasets, and benchmark repositories (96 benchmark entries, current as of April 2026). Coverage spans the C/C++ tasks that appear in the literature: kernel crash repair, real-world bug repair, vulnerability detection and localization, performance optimization, test generation, decompilation, codebase comprehension, and C-to-Rust migration.

Each entry records the benchmark's evaluation target, primary source, and the specific C/C++ behavior it measures. Where a benchmark reports model results, the catalogue notes the C/C++ portion rather than the multilingual aggregate, since headline scores are often dominated by other languages. Entries are grouped by task family and by the role each benchmark plays (direct measurement or weak proxy), with unmeasured areas marked as absent, rather than by visibility or citation count.
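To illustrate why the C/C++ portion is recorded separately, the sketch below computes an aggregate pass rate and a C/C++-only pass rate from per-language counts. The counts, the language keys, and the helper `pass_rate` are hypothetical and chosen only to show the arithmetic; they are not results from any benchmark in this catalogue.

```python
# Hypothetical per-language results: (tasks solved, tasks attempted).
# These numbers are invented for illustration only.
results = {
    "python": (240, 300),
    "javascript": (150, 200),
    "java": (130, 200),
    "c": (12, 60),
    "cpp": (18, 90),
}


def pass_rate(rows):
    """Return solved / attempted over an iterable of (solved, attempted) pairs."""
    rows = list(rows)
    solved = sum(s for s, _ in rows)
    attempted = sum(a for _, a in rows)
    return solved / attempted if attempted else 0.0


# Headline number a multilingual benchmark would typically report.
aggregate = pass_rate(results.values())

# The slice this catalogue records instead: C and C++ tasks only.
c_cpp = pass_rate(results[lang] for lang in ("c", "cpp"))

print(f"aggregate:  {aggregate:.1%}")
print(f"C/C++ only: {c_cpp:.1%}")
```

With these made-up counts the aggregate comes out near 65% while the C/C++ slice is 20%, because the C and C++ tasks are a small share of the total.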

The aim is to separate evidence that directly measures C/C++ behavior from broader multilingual results, and to mark where the field has strong benchmarks, weak proxies, or no measurement at all. The gaps are recorded as deliberately as the benchmarks: several task areas that matter for production C/C++ have no public benchmark of any kind.
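One way to picture the role labelling is as a small record per entry plus a coverage check per task family. The sketch below is an assumption about how such a check could look, not the catalogue's own tooling: the `Entry` class, the `coverage` function, the role strings, and the "Placeholder family" name are hypothetical, and the role assigned to each row is an illustrative reading rather than a published label.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    name: str
    focus: str   # evaluation target
    source: str  # primary source
    signal: str  # the C/C++ behavior measured
    role: str    # "direct" or "weak_proxy" (assumed label values)


# Field text is copied from two catalogue rows; the role values are an
# illustrative reading, not labels published by the catalogue.
entries = [
    Entry("Defects4C", "Bug repair", "Paper / Leaderboard",
          "Executable real C/C++ bugs with test-based repair validation.",
          "direct"),
    Entry("SWE-bench Multilingual", "Multilingual issue resolution",
          "Paper / Leaderboard",
          "Includes C/C++ but headline scores are not C/C++ isolated.",
          "weak_proxy"),
]

# Hypothetical list of task families to audit; the last one stands in
# for an area with no public benchmark.
families = ["Bug repair", "Multilingual issue resolution", "Placeholder family"]


def coverage(entries, families):
    """Map each family to its strongest role, or "absent" if it has no entry."""
    rank = {"absent": 0, "weak_proxy": 1, "direct": 2}
    best = {family: "absent" for family in families}
    for entry in entries:
        if entry.focus in best and rank[entry.role] > rank[best[entry.focus]]:
            best[entry.focus] = entry.role
    return best


print(coverage(entries, families))
```

A family that maps to "absent" is a gap in the sense used above: nothing in the list measures it, directly or by proxy.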

| Benchmark | Evaluation Focus | Source | C/C++ Signal |
| --- | --- | --- | --- |
| kGym / kBenchSyz | Kernel crash repair | Paper / GitHub | Real Linux kernel crashes with compile-and-reproduce validation. |
| SEC-bench Pro | Security analysis and exploit generation | Paper / GitHub / Leaderboard | Autonomous vulnerability discovery on large C++ engines. |
| SEC-bench | Vulnerability repair and PoC generation | Paper / GitHub / Leaderboard | Real C/C++ CVE tasks with containerized validation. |
| PrimeVul | Vulnerability detection | Paper / GitHub | Paired vulnerable and patched C/C++ functions reduce shortcut scoring. |
| SecVulEval | Vulnerability localization | Paper | Statement-level C/C++ vulnerability localization. |
| VulDetectBench | Vulnerability detection and localization | Paper / GitHub | Multi-level C/C++ security reasoning from detection to trigger lines. |
| VulBench | Vulnerability detection | Paper / GitHub | Cleaned C/C++ vulnerability benchmark across CTF and real-world data. |
| Defects4C | Bug repair | Paper / Leaderboard | Executable real C/C++ bugs with test-based repair validation. |
| Multi-SWE-bench C/C++ | Issue resolution | Paper / GitHub / Leaderboard | Real C and C++ GitHub issues with fail-to-pass tests. |
| VulnLoc / ExtractFix | CVE repair | VulnLoc / ExtractFix / GitHub | Small C memory-safety repair datasets with localization support. |
| LinuxFLBench | Fault localization | Paper / GitHub | Linux-kernel file localization from bug reports. |
| ARVO | Memory vulnerability repair | Paper / GitHub | Reproducible OSS-Fuzz C/C++ memory-bug repair substrate. |
| BugsCpp | Bug repair substrate | Paper / GitHub | Real C/C++ bugs packaged with build, test, and coverage tooling. |
| ManyBugs | APR substrate | Paper / GitHub | Legacy real C defect benchmark. |
| DBGBench | Debugging and diagnosis | Paper / GitHub | C regression bugs with human fault-location and diagnosis labels. |
| Vul4C | Vulnerability repair | Paper / GitHub | Real C vulnerability-repair instances with exploits and patches. |
| AutoCBI | Compiler-bug isolation | Paper | C/C++ compiler-bug isolation setting. |
| CMind | Fault localization and reasoning | Paper / GitHub | C/C++ program-behavior reasoning benchmark. |
| xLoc | Cross-file fault localization | Paper | Interprocedural C/C++ fault localization. |
| SecLLMHolmes | Security reasoning | Paper / GitHub | C vulnerability tasks for detection robustness and reasoning checks. |
| Big-Vul | Vulnerability corpus | Data | Large C/C++ CWE-labeled function corpus. |
| Devign | Vulnerability corpus | Paper / Data | C vulnerability corpus from FFmpeg and QEMU. |
| ReVeal | Vulnerability corpus | Paper / GitHub | Real-world C/C++ vulnerability data from Chromium and Debian. |
| DiverseVul | Vulnerability corpus | Paper / GitHub | Deduplicated C/C++ vulnerability corpus across many CWEs. |
| MegaVul | Vulnerability corpus | Paper / GitHub | Large C/C++ vulnerability corpus with richer context. |
| ReposVul | Repository-level vulnerability corpus | Paper / GitHub | Repository-context C/C++ vulnerability data. |
| VulEval | Interprocedural vulnerability evaluation | Paper | C/C++ vulnerability evaluation with interprocedural context. |
| ICVul | Context-aware vulnerability corpus | Paper / GitHub | C/C++ vulnerability data with contextual features. |
| CVEfixes | CVE fix corpus | Paper / GitHub | CVE-to-fix-commit corpus with C/C++ coverage. |
| CrossVul | Cross-language vulnerability corpus | Paper | Includes a C/C++ vulnerability subset. |
| D2A | Static-analysis vulnerability corpus | Paper / GitHub | Differential-analysis C/C++ vulnerability data. |
| Draper VDISC | Static-analysis vulnerability corpus | Paper | Large weakly labeled C/C++ security corpus. |
| PatchDB | Security patch corpus | Paper / GitHub | Security-patch data with C/C++ coverage. |
| SecretPatch | Silent security patch corpus | Paper / GitHub | Silent security-patch mining with C/C++ coverage. |
| SVulD | Vulnerability corpus | Paper / GitHub | Semantic C/C++ vulnerability-detection dataset. |
| SVEN | Secure code generation | Paper / GitHub | Controlled secure-vs-insecure C/C++ generation tasks. |
| VulnPatchPairs | Vulnerable-patched pairs | Paper / GitHub | C/C++ function pairs for robustness testing. |
| PairVul | Vulnerable-patched pairs | Paper / GitHub | Paired C/C++ vulnerable and fixed samples. |
| VulnLLMEval | Vulnerability detection and localization | Paper | C/Linux-kernel LLM vulnerability evaluation framework. |
| VulTrigger / InterPVD | Trigger-path detection | Paper / GitHub | Interprocedural vulnerability-trigger path detection for C/C++. |
| VulDeePecker | Vulnerability detector lineage | Paper / GitHub | Classic slice-based C/C++ vulnerability benchmark lineage. |
| SySeVR | Vulnerability detector lineage | Paper / GitHub | Classic semantic-vector C/C++ vulnerability benchmark lineage. |
| VulnBench | Vulnerability evaluation harness | Paper | C/C++ vulnerability evaluation harness. |
| CleanVul | Cleaned vulnerability corpus | Paper / GitHub | LLM-cleaned vulnerability data for label-noise reduction. |
| CRUST-Bench | C-to-Rust migration | Paper / GitHub / Leaderboard | Whole C repositories translated to safe Rust and checked by tests. |
| RustRepoTrans | C-to-Rust migration | Paper / GitHub | Repository-level C-to-Rust translation with cross-file dependencies. |
| SWE-bench-Live MultiLang C/C++ | Issue resolution | Paper / GitHub / Leaderboard | Fresh C/C++ issue-resolution split in a contamination-resistant suite. |
| SemOpt | Semantic optimization | Paper | Static-rule-guided optimization of real C/C++ code. |
| ParEval | Parallel code generation | Paper / GitHub | C++, CUDA, HIP, OpenMP, MPI, and Kokkos generation tasks. |
| PerfCodeBench | Performance optimization | Paper | Hardware-aware C/C++ optimization against expert references. |
| SecRepoBench | Secure repository completion | Paper / GitHub / Leaderboard | Secure code completion across C/C++ repositories and CWEs. |
| C2SaferRust | C-to-Rust safety migration | Paper | LLM rewrite stage for reducing unsafe Rust after C2Rust. |
| SACTOR | C-to-Rust migration | Paper | Static-analysis-assisted C-to-Rust translation system. |
| SafeTrans | C-to-safe-Rust migration | Paper / GitHub | C-to-Rust translation with explicit safety constraints. |
| EvoC2Rust | C-to-Rust migration | Paper / GitHub | Iterative C-to-Rust translation method. |
| RustAssure | C-to-Rust assurance | Paper / GitHub | C-to-Rust translation with verification-oriented checks. |
| ENCRUST | C-to-Rust migration | Paper | C-to-Rust translation system. |
| RepoTransBench | Repository translation | Paper / GitHub | Repository-level translation benchmark with C/C++ coverage. |
| ORNL HPC eval | HPC code generation | Paper | C++ kernels across OpenMP, CUDA, HIP, and Kokkos. |
| MPCO | Code optimization | Paper / GitHub | C++ optimization method and evaluation setup. |
| CITYWALK | Unit-test generation | Paper | C++ unit-test generation with pointers, templates, virtuals, and mocks. |
| LLM4Decompile / Decompile-Bench | Decompilation | Paper / GitHub / Data | Binary-to-C recovery measured by compile and execution behavior. |
| RepoQA | Codebase comprehension | Paper / GitHub / Leaderboard | Function retrieval from repository context with a C++ slice. |
| SAFIM | Fill-in-the-middle | Paper / GitHub / Leaderboard | Syntax-aware C++ masked-code completion with execution checks. |
| CPP-UT-Bench | Unit-test generation | Paper / Data | C++ unit-test generation dataset from real projects. |
| CodeInverter | Decompilation | Paper / GitHub | C decompilation using control-flow and memory mappings. |
| Idioms / Realtype | Decompilation metadata recovery | Paper / GitHub | Variable-name and real-type recovery for decompiled code. |
| ExeBench | Executable C corpus | Paper / GitHub / Data | Large executable C function corpus for binary/source tasks. |
| DecLLM | Decompilation | Paper | LLM-based C decompilation with recompilation checks. |
| SK2Decompile | Decompilation | Paper / GitHub | Skeleton-aware C/C++ decompilation. |
| EffiBench-X | Efficiency evaluation | Paper / GitHub / Leaderboard | C++ correctness and runtime/memory efficiency scoring. |
| LiveCodeBench Pro | Competitive programming | Paper / Leaderboard | Contamination-controlled programming tasks with C++ submissions. |
| LiveCodeBench-Pro-CPP | Competitive programming | Paper / Leaderboard | C++ compile-and-run path for LiveCodeBench Pro. |
| LiveCodeBench-Pro-Testcase | Testcase generation | Paper / Leaderboard | Test-payload component for LiveCodeBench Pro. |
| SWE-bench Multilingual | Multilingual issue resolution | Paper / Leaderboard | Includes C/C++ but headline scores are not C/C++ isolated. |
| SWE-bench++ / Auto-SWE-Bench | Multilingual issue resolution | Paper / GitHub / Data | Aggregate or combined reporting only in captured notes. |
| SWE-rebench V2 | Multilingual issue resolution | Paper / GitHub / Data | C/C++ subset not isolated in captured reporting. |
| LiveCVEBench | Multilingual CVE repair | Paper / GitHub / Leaderboard | Headline repair rates are multilingual. |
| CVE-Bench | Multilingual CVE repair | Paper | Four-language CVE benchmark without clean C/C++-only result captured. |
| RepoDebug | Multilingual debugging | Paper / GitHub | Synthetic repository bugs across languages. |
| GSO | Software optimization | Paper / GitHub / Leaderboard | Python-ecosystem optimization benchmark; no C/C++ result. |
| CodeTransOcean | Multilingual code translation | Paper / GitHub / Data | Includes C/C++ but not isolated in captured reporting. |
| REEF | Multilingual vulnerability corpus | Paper / GitHub | Real-world vulnerabilities and fixes across several languages. |
| McEval | Multilingual code generation | Paper / GitHub / Leaderboard | Per-language coverage exists, but C/C++ cells are not captured here. |
| MdEval | Multilingual debugging | Paper / GitHub / Data | Headline debugging scores are aggregate. |
| CRUXEval-X | Multilingual execution reasoning | Paper / GitHub / Leaderboard | C/C++ coverage exists, but no isolated SOTA captured here. |
| LiveCodeBench | Multilingual coding benchmark | Paper / GitHub / Leaderboard | Cross-language benchmark; headline scores are aggregate. |
| CodeElo | Competitive programming Elo | Paper / GitHub / Leaderboard | C++-centric submissions, but no distinct C/C++ sub-score. |
| Long Code Arena | Repository comprehension | Paper / GitHub / Leaderboard | Nearby benchmark context; current tasks are not C/C++. |
| CodeReviewer | Code review | Paper / GitHub | Multilingual code-review corpus. |
| CodeReviewQA | Code-review comprehension | Paper / GitHub / Data | Pull-request QA across languages. |
| AACR-Bench | Automated code review | Paper / GitHub / Data | Multilingual automated-code-review benchmark. |
| Sphinx | Code review and static analysis | Paper | Multilingual review and static-analysis evaluation. |
| CommitChronicle | Commit-message generation | Paper / GitHub / Data | Multilingual commit-message corpus. |
| MCMD | Commit-message generation | Paper / GitHub | Multilingual commit-message generation dataset. |
| CommitPack / CommitPackFT | Commit corpus | Paper / GitHub / Data | Large multilingual commit corpus for instruction tuning. |

License And Access

Copyright (c) 2026 ByteAsk. Released under the MIT License.

For data access, reproducibility materials, or replication details, contact research@byteask.ai.
