A catalogue of the public benchmarks used to evaluate large language models on C and C++, compiled from a manual review of papers, leaderboards, datasets, and benchmark repositories (96 benchmark entries, current as of April 2026). Coverage spans the C/C++ tasks that appear in the literature: kernel crash repair, real-world bug repair, vulnerability detection and localization, performance optimization, test generation, decompilation, codebase comprehension, and C-to-Rust migration.
Each entry records the benchmark's evaluation target, primary source, and the specific C/C++ behavior it measures. Where a benchmark reports model results, the catalogue notes the C/C++ portion rather than the multilingual aggregate, since headline scores are often dominated by other languages. Entries are grouped by task family and by the role each benchmark plays (direct measurement, weak proxy, or absent) rather than by visibility or citation count.
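To make the aggregate-versus-slice distinction concrete, here is a minimal Python sketch that scores one set of results two ways: as a single multilingual resolve rate and as a C/C++-only rate. The per-instance results, the language tags, and the `split_scores` helper are hypothetical and are not drawn from any benchmark in this catalogue.

```python
from collections import defaultdict

# Hypothetical per-instance results as (language, resolved?) pairs. Not real benchmark data.
results = [
    ("python", True), ("python", True), ("python", True), ("python", False),
    ("java", True), ("java", False),
    ("c", False), ("c", False),
    ("cpp", True), ("cpp", False),
]

# Languages counted toward the C/C++ slice (an assumption for this sketch).
C_FAMILY = {"c", "cpp"}


def resolve_rate(rows):
    """Fraction of rows marked resolved; 0.0 for an empty slice."""
    return sum(ok for _, ok in rows) / len(rows) if rows else 0.0


def split_scores(rows):
    """Return the multilingual aggregate, the C/C++ slice, and per-language rates."""
    by_lang = defaultdict(list)
    for lang, ok in rows:
        by_lang[lang].append((lang, ok))
    c_rows = [row for row in rows if row[0] in C_FAMILY]
    return {
        "aggregate": resolve_rate(rows),
        "c_cpp": resolve_rate(c_rows),
        "per_language": {lang: resolve_rate(rs) for lang, rs in by_lang.items()},
    }


scores = split_scores(results)
print(f"aggregate={scores['aggregate']:.2f} c_cpp={scores['c_cpp']:.2f}")
```

With these toy numbers the script prints `aggregate=0.50 c_cpp=0.25`: the headline rate is double the C/C++ rate because the Python instances supply most of the resolved cases.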
The aim is to separate evidence that directly measures C/C++ behavior from broader multilingual results, and to mark where the field has strong benchmarks, weak proxies, or no measurement at all. The gaps are recorded as deliberately as the benchmarks: several task areas that matter for production C/C++ have no public benchmark of any kind.
| Benchmark | Evaluation Focus | Source | C/C++ Signal |
|---|---|---|---|
| kGym / kBenchSyz | Kernel crash repair | Paper / GitHub | Real Linux kernel crashes with compile-and-reproduce validation. |
| SEC-bench Pro | Security analysis and exploit generation | Paper / GitHub / Leaderboard | Autonomous vulnerability discovery on large C++ engines. |
| SEC-bench | Vulnerability repair and PoC generation | Paper / GitHub / Leaderboard | Real C/C++ CVE tasks with containerized validation. |
| PrimeVul | Vulnerability detection | Paper / GitHub | Paired vulnerable and patched C/C++ functions reduce shortcut scoring. |
| SecVulEval | Vulnerability localization | Paper | Statement-level C/C++ vulnerability localization. |
| VulDetectBench | Vulnerability detection and localization | Paper / GitHub | Multi-level C/C++ security reasoning from detection to trigger lines. |
| VulBench | Vulnerability detection | Paper / GitHub | Cleaned C/C++ vulnerability benchmark across CTF and real-world data. |
| Defects4C | Bug repair | Paper / Leaderboard | Executable real C/C++ bugs with test-based repair validation. |
| Multi-SWE-bench C/C++ | Issue resolution | Paper / GitHub / Leaderboard | Real C and C++ GitHub issues with fail-to-pass tests. |
| VulnLoc / ExtractFix | CVE repair | VulnLoc / ExtractFix / GitHub | Small C memory-safety repair datasets with localization support. |
| LinuxFLBench | Fault localization | Paper / GitHub | Linux-kernel file localization from bug reports. |
| ARVO | Memory vulnerability repair | Paper / GitHub | Reproducible OSS-Fuzz C/C++ memory-bug repair substrate. |
| BugsCpp | Bug repair substrate | Paper / GitHub | Real C/C++ bugs packaged with build, test, and coverage tooling. |
| ManyBugs | APR substrate | Paper / GitHub | Legacy real C defect benchmark. |
| DBGBench | Debugging and diagnosis | Paper / GitHub | C regression bugs with human fault-location and diagnosis labels. |
| Vul4C | Vulnerability repair | Paper / GitHub | Real C vulnerability-repair instances with exploits and patches. |
| AutoCBI | Compiler-bug isolation | Paper | C/C++ compiler-bug isolation setting. |
| CMind | Fault localization and reasoning | Paper / GitHub | C/C++ program-behavior reasoning benchmark. |
| xLoc | Cross-file fault localization | Paper | Interprocedural C/C++ fault localization. |
| SecLLMHolmes | Security reasoning | Paper / GitHub | C vulnerability tasks for detection robustness and reasoning checks. |
| Big-Vul | Vulnerability corpus | Data | Large C/C++ CWE-labeled function corpus. |
| Devign | Vulnerability corpus | Paper / Data | C vulnerability corpus from FFmpeg and QEMU. |
| ReVeal | Vulnerability corpus | Paper / GitHub | Real-world C/C++ vulnerability data from Chromium and Debian. |
| DiverseVul | Vulnerability corpus | Paper / GitHub | Deduplicated C/C++ vulnerability corpus across many CWEs. |
| MegaVul | Vulnerability corpus | Paper / GitHub | Large C/C++ vulnerability corpus with richer context. |
| ReposVul | Repository-level vulnerability corpus | Paper / GitHub | Repository-context C/C++ vulnerability data. |
| VulEval | Interprocedural vulnerability evaluation | Paper | C/C++ vulnerability evaluation with interprocedural context. |
| ICVul | Context-aware vulnerability corpus | Paper / GitHub | C/C++ vulnerability data with contextual features. |
| CVEfixes | CVE fix corpus | Paper / GitHub | CVE-to-fix-commit corpus with C/C++ coverage. |
| CrossVul | Cross-language vulnerability corpus | Paper | Includes a C/C++ vulnerability subset. |
| D2A | Static-analysis vulnerability corpus | Paper / GitHub | Differential-analysis C/C++ vulnerability data. |
| Draper VDISC | Static-analysis vulnerability corpus | Paper | Large weakly labeled C/C++ security corpus. |
| PatchDB | Security patch corpus | Paper / GitHub | Security-patch data with C/C++ coverage. |
| SecretPatch | Silent security patch corpus | Paper / GitHub | Silent security-patch mining with C/C++ coverage. |
| SVulD | Vulnerability corpus | Paper / GitHub | Semantic C/C++ vulnerability-detection dataset. |
| SVEN | Secure code generation | Paper / GitHub | Controlled secure-vs-insecure C/C++ generation tasks. |
| VulnPatchPairs | Vulnerable-patched pairs | Paper / GitHub | C/C++ function pairs for robustness testing. |
| PairVul | Vulnerable-patched pairs | Paper / GitHub | Paired C/C++ vulnerable and fixed samples. |
| VulnLLMEval | Vulnerability detection and localization | Paper | C/Linux-kernel LLM vulnerability evaluation framework. |
| VulTrigger / InterPVD | Trigger-path detection | Paper / GitHub | Interprocedural vulnerability-trigger path detection for C/C++. |
| VulDeePecker | Vulnerability detector lineage | Paper / GitHub | Classic slice-based C/C++ vulnerability benchmark lineage. |
| SySeVR | Vulnerability detector lineage | Paper / GitHub | Classic semantic-vector C/C++ vulnerability benchmark lineage. |
| VulnBench | Vulnerability evaluation harness | Paper | C/C++ vulnerability evaluation harness. |
| CleanVul | Cleaned vulnerability corpus | Paper / GitHub | LLM-cleaned vulnerability data for label-noise reduction. |
| CRUST-Bench | C-to-Rust migration | Paper / GitHub / Leaderboard | Whole C repositories translated to safe Rust and checked by tests. |
| RustRepoTrans | C-to-Rust migration | Paper / GitHub | Repository-level C-to-Rust translation with cross-file dependencies. |
| SWE-bench-Live MultiLang C/C++ | Issue resolution | Paper / GitHub / Leaderboard | Fresh C/C++ issue-resolution split in a contamination-resistant suite. |
| SemOpt | Semantic optimization | Paper | Static-rule-guided optimization of real C/C++ code. |
| ParEval | Parallel code generation | Paper / GitHub | C++, CUDA, HIP, OpenMP, MPI, and Kokkos generation tasks. |
| PerfCodeBench | Performance optimization | Paper | Hardware-aware C/C++ optimization against expert references. |
| SecRepoBench | Secure repository completion | Paper / GitHub / Leaderboard | Secure code completion across C/C++ repositories and CWEs. |
| C2SaferRust | C-to-Rust safety migration | Paper | LLM rewrite stage for reducing unsafe Rust after C2Rust. |
| SACTOR | C-to-Rust migration | Paper | Static-analysis-assisted C-to-Rust translation system. |
| SafeTrans | C-to-safe-Rust migration | Paper / GitHub | C-to-Rust translation with explicit safety constraints. |
| EvoC2Rust | C-to-Rust migration | Paper / GitHub | Iterative C-to-Rust translation method. |
| RustAssure | C-to-Rust assurance | Paper / GitHub | C-to-Rust translation with verification-oriented checks. |
| ENCRUST | C-to-Rust migration | Paper | C-to-Rust translation system. |
| RepoTransBench | Repository translation | Paper / GitHub | Repository-level translation benchmark with C/C++ coverage. |
| ORNL HPC eval | HPC code generation | Paper | C++ kernels across OpenMP, CUDA, HIP, and Kokkos. |
| MPCO | Code optimization | Paper / GitHub | C++ optimization method and evaluation setup. |
| CITYWALK | Unit-test generation | Paper | C++ unit-test generation with pointers, templates, virtuals, and mocks. |
| LLM4Decompile / Decompile-Bench | Decompilation | Paper / GitHub / Data | Binary-to-C recovery measured by compile and execution behavior. |
| RepoQA | Codebase comprehension | Paper / GitHub / Leaderboard | Function retrieval from repository context with a C++ slice. |
| SAFIM | Fill-in-the-middle | Paper / GitHub / Leaderboard | Syntax-aware C++ masked-code completion with execution checks. |
| CPP-UT-Bench | Unit-test generation | Paper / Data | C++ unit-test generation dataset from real projects. |
| CodeInverter | Decompilation | Paper / GitHub | C decompilation using control-flow and memory mappings. |
| Idioms / Realtype | Decompilation metadata recovery | Paper / GitHub | Variable-name and real-type recovery for decompiled code. |
| ExeBench | Executable C corpus | Paper / GitHub / Data | Large executable C function corpus for binary/source tasks. |
| DecLLM | Decompilation | Paper | LLM-based C decompilation with recompilation checks. |
| SK2Decompile | Decompilation | Paper / GitHub | Skeleton-aware C/C++ decompilation. |
| EffiBench-X | Efficiency evaluation | Paper / GitHub / Leaderboard | C++ correctness and runtime/memory efficiency scoring. |
| LiveCodeBench Pro | Competitive programming | Paper / Leaderboard | Contamination-controlled programming tasks with C++ submissions. |
| LiveCodeBench-Pro-CPP | Competitive programming | Paper / Leaderboard | C++ compile-and-run path for LiveCodeBench Pro. |
| LiveCodeBench-Pro-Testcase | Testcase generation | Paper / Leaderboard | Test-payload component for LiveCodeBench Pro. |
| SWE-bench Multilingual | Multilingual issue resolution | Paper / Leaderboard | Includes C/C++, but headline scores are not isolated by language. |
| SWE-bench++ / Auto-SWE-Bench | Multilingual issue resolution | Paper / GitHub / Data | Only aggregate or combined results found in this review; no C/C++-isolated score. |
| SWE-rebench V2 | Multilingual issue resolution | Paper / GitHub / Data | C/C++ subset not isolated in the reporting reviewed here. |
| LiveCVEBench | Multilingual CVE repair | Paper / GitHub / Leaderboard | Headline repair rates are multilingual. |
| CVE-Bench | Multilingual CVE repair | Paper | Four-language CVE benchmark; no clean C/C++-only result found in this review. |
| RepoDebug | Multilingual debugging | Paper / GitHub | Synthetic repository bugs across languages. |
| GSO | Software optimization | Paper / GitHub / Leaderboard | Python-ecosystem optimization benchmark; no C/C++ result. |
| CodeTransOcean | Multilingual code translation | Paper / GitHub / Data | Includes C/C++, but C/C++ results are not isolated in the reporting reviewed here. |
| REEF | Multilingual vulnerability corpus | Paper / GitHub | Real-world vulnerabilities and fixes across several languages. |
| McEval | Multilingual code generation | Paper / GitHub / Leaderboard | Per-language coverage exists, but its C/C++ cells are not recorded in this catalogue. |
| MdEval | Multilingual debugging | Paper / GitHub / Data | Headline debugging scores are aggregate. |
| CRUXEval-X | Multilingual execution reasoning | Paper / GitHub / Leaderboard | C/C++ coverage exists, but no isolated C/C++ state-of-the-art result is recorded here. |
| LiveCodeBench | Multilingual coding benchmark | Paper / GitHub / Leaderboard | Cross-language benchmark; headline scores are aggregate. |
| CodeElo | Competitive programming Elo | Paper / GitHub / Leaderboard | C++-centric submissions, but no distinct C/C++ sub-score. |
| Long Code Arena | Repository comprehension | Paper / GitHub / Leaderboard | Adjacent repository-level benchmark included for context; current tasks are not C/C++. |
| CodeReviewer | Code review | Paper / GitHub | Multilingual code-review corpus. |
| CodeReviewQA | Code-review comprehension | Paper / GitHub / Data | Pull-request QA across languages. |
| AACR-Bench | Automated code review | Paper / GitHub / Data | Multilingual automated-code-review benchmark. |
| Sphinx | Code review and static analysis | Paper | Multilingual review and static-analysis evaluation. |
| CommitChronicle | Commit-message generation | Paper / GitHub / Data | Multilingual commit-message corpus. |
| MCMD | Commit-message generation | Paper / GitHub | Multilingual commit-message generation dataset. |
| CommitPack / CommitPackFT | Commit corpus | Paper / GitHub / Data | Large multilingual commit corpus for instruction tuning. |
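
PrimeVul, VulnPatchPairs, and PairVul appear in this catalogue because pairing a vulnerable function with its patched version removes an easy shortcut. The minimal Python sketch below, assuming binary per-function predictions (1 = vulnerable, 0 = benign), contrasts per-function accuracy with a pair-level breakdown. The bucket names and helper functions are hypothetical and are not taken from any of these benchmarks' harnesses; each benchmark's own metric definitions take precedence.

```python
from collections import Counter
from typing import List, Tuple

# One pair = (prediction for the vulnerable function, prediction for its patched version),
# where 1 means "vulnerable" and 0 means "benign". Hypothetical encoding for this sketch.
Pair = Tuple[int, int]
BUCKETS = ("correct", "both_vulnerable", "both_benign", "reversed")


def classify_pair(pred_vulnerable: int, pred_patched: int) -> str:
    """Bucket one pair by how the model labeled both sides."""
    if pred_vulnerable == 1 and pred_patched == 0:
        return "correct"          # separates the bug from its fix
    if pred_vulnerable == 1 and pred_patched == 1:
        return "both_vulnerable"  # flags both sides, e.g. keying on shared code
    if pred_vulnerable == 0 and pred_patched == 0:
        return "both_benign"      # misses the bug entirely
    return "reversed"             # flags the fix and clears the bug


def pairwise_report(pairs: List[Pair]) -> dict:
    """Share of pairs in each bucket; only 'correct' reflects a real detection."""
    counts = Counter(classify_pair(v, p) for v, p in pairs)
    total = len(pairs)
    return {b: (counts[b] / total if total else 0.0) for b in BUCKETS}


def function_level_accuracy(pairs: List[Pair]) -> float:
    """Per-function accuracy, treating each side of a pair as an independent sample."""
    hits = sum((v == 1) + (p == 0) for v, p in pairs)
    return hits / (2 * len(pairs)) if pairs else 0.0


# Hypothetical predictions, not real benchmark output.
always_vulnerable = [(1, 1)] * 4
mixed = [(1, 0), (1, 1), (1, 1), (0, 0)]

for name, preds in (("always_vulnerable", always_vulnerable), ("mixed", mixed)):
    print(name, function_level_accuracy(preds), pairwise_report(preds)["correct"])
```

A model that labels every function vulnerable reaches 0.5 per-function accuracy on balanced pairs but never lands a pair in the `correct` bucket, which is the shortcut that paired evaluation is designed to expose.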
Copyright (c) 2026 ByteAsk. Released under the MIT License.
For data access, reproducibility materials, or replication details, contact research@byteask.ai.
