C-CppBench

A catalogue of the public benchmarks used to evaluate large language models on C and C++, compiled from a manual review of papers, leaderboards, datasets, and benchmark repositories (96 benchmark entries, current as of April 2026). Coverage spans the C/C++ tasks that appear in the literature: kernel crash repair, real-world bug repair, vulnerability detection and localization, performance optimization, test generation, decompilation, codebase comprehension, and C-to-Rust migration.

Each entry records the benchmark's evaluation target, primary source, and the specific C/C++ behavior it measures. Where a benchmark reports model results, the catalogue notes the C/C++ portion rather than the multilingual aggregate, since headline scores are often dominated by other languages. Entries are grouped by task family and by the role each benchmark plays (direct measurement, weak proxy, or absent) rather than by visibility or citation count.

The aim is to separate evidence that directly measures C/C++ behavior from broader multilingual results, and to mark where the field has strong benchmarks, weak proxies, or no measurement at all. The gaps are recorded as deliberately as the benchmarks: several task areas that matter for production C/C++ have no public benchmark of any kind.
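The entry layout and grouping described above can be sketched in a few lines of Python. This is only an illustration, not the catalogue's own tooling: the `Entry` and `Role` names, the `task_family` field, and the role assigned to each sample row are assumptions made for the example, and the catalogue table itself does not carry a role column.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    """The three roles named in the text; the enum itself is hypothetical."""
    DIRECT = "direct measurement"
    WEAK_PROXY = "weak proxy"
    ABSENT = "absent"


@dataclass(frozen=True)
class Entry:
    benchmark: str
    evaluation_focus: str
    source: str
    signal: str          # the "C/C++ Signal" column
    task_family: str     # hypothetical grouping label, not a catalogue column
    role: Role           # hypothetical label, assigned by the reader


def group_entries(entries):
    """Group benchmark names by (task family, role)."""
    groups = defaultdict(list)
    for entry in entries:
        groups[(entry.task_family, entry.role)].append(entry.benchmark)
    return dict(groups)


# Two rows copied from the table; family and role are illustrative guesses.
SAMPLE = [
    Entry("kGym / kBenchSyz", "Kernel crash repair", "Paper / GitHub",
          "Real Linux kernel crashes with compile-and-reproduce validation.",
          task_family="kernel crash repair", role=Role.DIRECT),
    Entry("SWE-bench Multilingual", "Multilingual issue resolution",
          "Paper / Leaderboard",
          "Includes C/C++ but headline scores are not C/C++ isolated.",
          task_family="issue resolution", role=Role.WEAK_PROXY),
]

grouped = group_entries(SAMPLE)
```

Keying the groups on the pair (task family, role) keeps a weak proxy from being counted alongside a direct measurement in the same family, which is the separation the catalogue aims for.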

Benchmark	Evaluation Focus	Source	C/C++ Signal
kGym / kBenchSyz	Kernel crash repair	Paper / GitHub	Real Linux kernel crashes with compile-and-reproduce validation.
SEC-bench Pro	Security analysis and exploit generation	Paper / GitHub / Leaderboard	Autonomous vulnerability discovery on large C++ engines.
SEC-bench	Vulnerability repair and PoC generation	Paper / GitHub / Leaderboard	Real C/C++ CVE tasks with containerized validation.
PrimeVul	Vulnerability detection	Paper / GitHub	Paired vulnerable and patched C/C++ functions reduce shortcut scoring.
SecVulEval	Vulnerability localization	Paper	Statement-level C/C++ vulnerability localization.
VulDetectBench	Vulnerability detection and localization	Paper / GitHub	Multi-level C/C++ security reasoning from detection to trigger lines.
VulBench	Vulnerability detection	Paper / GitHub	Cleaned C/C++ vulnerability benchmark across CTF and real-world data.
Defects4C	Bug repair	Paper / Leaderboard	Executable real C/C++ bugs with test-based repair validation.
Multi-SWE-bench C/C++	Issue resolution	Paper / GitHub / Leaderboard	Real C and C++ GitHub issues with fail-to-pass tests.
VulnLoc / ExtractFix	CVE repair	VulnLoc / ExtractFix / GitHub	Small C memory-safety repair datasets with localization support.
LinuxFLBench	Fault localization	Paper / GitHub	Linux-kernel file localization from bug reports.
ARVO	Memory vulnerability repair	Paper / GitHub	Reproducible OSS-Fuzz C/C++ memory-bug repair substrate.
BugsCpp	Bug repair substrate	Paper / GitHub	Real C/C++ bugs packaged with build, test, and coverage tooling.
ManyBugs	APR substrate	Paper / GitHub	Legacy real C defect benchmark.
DBGBench	Debugging and diagnosis	Paper / GitHub	C regression bugs with human fault-location and diagnosis labels.
Vul4C	Vulnerability repair	Paper / GitHub	Real C vulnerability-repair instances with exploits and patches.
AutoCBI	Compiler-bug isolation	Paper	C/C++ compiler-bug isolation setting.
CMind	Fault localization and reasoning	Paper / GitHub	C/C++ program-behavior reasoning benchmark.
xLoc	Cross-file fault localization	Paper	Interprocedural C/C++ fault localization.
SecLLMHolmes	Security reasoning	Paper / GitHub	C vulnerability tasks for detection robustness and reasoning checks.
Big-Vul	Vulnerability corpus	Data	Large C/C++ CWE-labeled function corpus.
Devign	Vulnerability corpus	Paper / Data	C vulnerability corpus from FFmpeg and QEMU.
ReVeal	Vulnerability corpus	Paper / GitHub	Real-world C/C++ vulnerability data from Chromium and Debian.
DiverseVul	Vulnerability corpus	Paper / GitHub	Deduplicated C/C++ vulnerability corpus across many CWEs.
MegaVul	Vulnerability corpus	Paper / GitHub	Large C/C++ vulnerability corpus with richer context.
ReposVul	Repository-level vulnerability corpus	Paper / GitHub	Repository-context C/C++ vulnerability data.
VulEval	Interprocedural vulnerability evaluation	Paper	C/C++ vulnerability evaluation with interprocedural context.
ICVul	Context-aware vulnerability corpus	Paper / GitHub	C/C++ vulnerability data with contextual features.
CVEfixes	CVE fix corpus	Paper / GitHub	CVE-to-fix-commit corpus with C/C++ coverage.
CrossVul	Cross-language vulnerability corpus	Paper	Includes a C/C++ vulnerability subset.
D2A	Static-analysis vulnerability corpus	Paper / GitHub	Differential-analysis C/C++ vulnerability data.
Draper VDISC	Static-analysis vulnerability corpus	Paper	Large weakly labeled C/C++ security corpus.
PatchDB	Security patch corpus	Paper / GitHub	Security-patch data with C/C++ coverage.
SecretPatch	Silent security patch corpus	Paper / GitHub	Silent security-patch mining with C/C++ coverage.
SVulD	Vulnerability corpus	Paper / GitHub	Semantic C/C++ vulnerability-detection dataset.
SVEN	Secure code generation	Paper / GitHub	Controlled secure-vs-insecure C/C++ generation tasks.
VulnPatchPairs	Vulnerable-patched pairs	Paper / GitHub	C/C++ function pairs for robustness testing.
PairVul	Vulnerable-patched pairs	Paper / GitHub	Paired C/C++ vulnerable and fixed samples.
VulnLLMEval	Vulnerability detection and localization	Paper	C/Linux-kernel LLM vulnerability evaluation framework.
VulTrigger / InterPVD	Trigger-path detection	Paper / GitHub	Interprocedural vulnerability-trigger path detection for C/C++.
VulDeePecker	Vulnerability detector lineage	Paper / GitHub	Classic slice-based C/C++ vulnerability benchmark lineage.
SySeVR	Vulnerability detector lineage	Paper / GitHub	Classic semantic-vector C/C++ vulnerability benchmark lineage.
VulnBench	Vulnerability evaluation harness	Paper	C/C++ vulnerability evaluation harness.
CleanVul	Cleaned vulnerability corpus	Paper / GitHub	LLM-cleaned vulnerability data for label-noise reduction.
CRUST-Bench	C-to-Rust migration	Paper / GitHub / Leaderboard	Whole C repositories translated to safe Rust and checked by tests.
RustRepoTrans	C-to-Rust migration	Paper / GitHub	Repository-level C-to-Rust translation with cross-file dependencies.
SWE-bench-Live MultiLang C/C++	Issue resolution	Paper / GitHub / Leaderboard	Fresh C/C++ issue-resolution split in a contamination-resistant suite.
SemOpt	Semantic optimization	Paper	Static-rule-guided optimization of real C/C++ code.
ParEval	Parallel code generation	Paper / GitHub	C++, CUDA, HIP, OpenMP, MPI, and Kokkos generation tasks.
PerfCodeBench	Performance optimization	Paper	Hardware-aware C/C++ optimization against expert references.
SecRepoBench	Secure repository completion	Paper / GitHub / Leaderboard	Secure code completion across C/C++ repositories and CWEs.
C2SaferRust	C-to-Rust safety migration	Paper	LLM rewrite stage for reducing unsafe Rust after C2Rust.
SACTOR	C-to-Rust migration	Paper	Static-analysis-assisted C-to-Rust translation system.
SafeTrans	C-to-safe-Rust migration	Paper / GitHub	C-to-Rust translation with explicit safety constraints.
EvoC2Rust	C-to-Rust migration	Paper / GitHub	Iterative C-to-Rust translation method.
RustAssure	C-to-Rust assurance	Paper / GitHub	C-to-Rust translation with verification-oriented checks.
ENCRUST	C-to-Rust migration	Paper	C-to-Rust translation system.
RepoTransBench	Repository translation	Paper / GitHub	Repository-level translation benchmark with C/C++ coverage.
ORNL HPC eval	HPC code generation	Paper	C++ kernels across OpenMP, CUDA, HIP, and Kokkos.
MPCO	Code optimization	Paper / GitHub	C++ optimization method and evaluation setup.
CITYWALK	Unit-test generation	Paper	C++ unit-test generation with pointers, templates, virtuals, and mocks.
LLM4Decompile / Decompile-Bench	Decompilation	Paper / GitHub / Data	Binary-to-C recovery measured by compile and execution behavior.
RepoQA	Codebase comprehension	Paper / GitHub / Leaderboard	Function retrieval from repository context with a C++ slice.
SAFIM	Fill-in-the-middle	Paper / GitHub / Leaderboard	Syntax-aware C++ masked-code completion with execution checks.
CPP-UT-Bench	Unit-test generation	Paper / Data	C++ unit-test generation dataset from real projects.
CodeInverter	Decompilation	Paper / GitHub	C decompilation using control-flow and memory mappings.
Idioms / Realtype	Decompilation metadata recovery	Paper / GitHub	Variable-name and real-type recovery for decompiled code.
ExeBench	Executable C corpus	Paper / GitHub / Data	Large executable C function corpus for binary/source tasks.
DecLLM	Decompilation	Paper	LLM-based C decompilation with recompilation checks.
SK2Decompile	Decompilation	Paper / GitHub	Skeleton-aware C/C++ decompilation.
EffiBench-X	Efficiency evaluation	Paper / GitHub / Leaderboard	C++ correctness and runtime/memory efficiency scoring.
LiveCodeBench Pro	Competitive programming	Paper / Leaderboard	Contamination-controlled programming tasks with C++ submissions.
LiveCodeBench-Pro-CPP	Competitive programming	Paper / Leaderboard	C++ compile-and-run path for LiveCodeBench Pro.
LiveCodeBench-Pro-Testcase	Testcase generation	Paper / Leaderboard	Test-payload component for LiveCodeBench Pro.
SWE-bench Multilingual	Multilingual issue resolution	Paper / Leaderboard	Includes C/C++ but headline scores are not C/C++ isolated.
SWE-bench++ / Auto-SWE-Bench	Multilingual issue resolution	Paper / GitHub / Data	Only aggregate or combined results captured; no C/C++-only figure.
SWE-rebench V2	Multilingual issue resolution	Paper / GitHub / Data	C/C++ subset not isolated in captured reporting.
LiveCVEBench	Multilingual CVE repair	Paper / GitHub / Leaderboard	Headline repair rates are multilingual.
CVE-Bench	Multilingual CVE repair	Paper	Four-language CVE benchmark; no clean C/C++-only result captured.
RepoDebug	Multilingual debugging	Paper / GitHub	Synthetic repository bugs across languages.
GSO	Software optimization	Paper / GitHub / Leaderboard	Python-ecosystem optimization benchmark; no C/C++ result.
CodeTransOcean	Multilingual code translation	Paper / GitHub / Data	Includes C/C++ but not isolated in captured reporting.
REEF	Multilingual vulnerability corpus	Paper / GitHub	Real-world vulnerabilities and fixes across several languages.
McEval	Multilingual code generation	Paper / GitHub / Leaderboard	Per-language coverage exists, but C/C++ cells are not captured here.
MdEval	Multilingual debugging	Paper / GitHub / Data	Headline debugging scores are aggregate.
CRUXEval-X	Multilingual execution reasoning	Paper / GitHub / Leaderboard	C/C++ coverage exists, but no isolated SOTA captured here.
LiveCodeBench	Multilingual coding benchmark	Paper / GitHub / Leaderboard	Cross-language benchmark; headline scores are aggregate.
CodeElo	Competitive programming Elo	Paper / GitHub / Leaderboard	C++-centric submissions, but no distinct C/C++ sub-score.
Long Code Arena	Repository comprehension	Paper / GitHub / Leaderboard	Listed for context; current tasks do not target C/C++.
CodeReviewer	Code review	Paper / GitHub	Multilingual code-review corpus.
CodeReviewQA	Code-review comprehension	Paper / GitHub / Data	Pull-request QA across languages.
AACR-Bench	Automated code review	Paper / GitHub / Data	Multilingual automated-code-review benchmark.
Sphinx	Code review and static analysis	Paper	Multilingual review and static-analysis evaluation.
CommitChronicle	Commit-message generation	Paper / GitHub / Data	Multilingual commit-message corpus.
MCMD	Commit-message generation	Paper / GitHub	Multilingual commit-message generation dataset.
CommitPack / CommitPackFT	Commit corpus	Paper / GitHub / Data	Large multilingual commit corpus for instruction tuning.
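Because the table above is tab-separated with a single header row, it can be loaded with Python's standard `csv` module. The sketch below assumes the table has been saved as plain text with tabs intact; `TABLE_SAMPLE`, `load_rows`, and `count_by_focus` are hypothetical names, and only three rows are embedded to keep the example short.

```python
import csv
import io
from collections import Counter

# Header plus three rows copied from the table; a real run would read the full file.
TABLE_SAMPLE = (
    "Benchmark\tEvaluation Focus\tSource\tC/C++ Signal\n"
    "kGym / kBenchSyz\tKernel crash repair\tPaper / GitHub\t"
    "Real Linux kernel crashes with compile-and-reproduce validation.\n"
    "PrimeVul\tVulnerability detection\tPaper / GitHub\t"
    "Paired vulnerable and patched C/C++ functions reduce shortcut scoring.\n"
    "VulBench\tVulnerability detection\tPaper / GitHub\t"
    "Cleaned C/C++ vulnerability benchmark across CTF and real-world data.\n"
)


def load_rows(text):
    """Parse the tab-separated table into a list of dicts keyed by column header."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    return list(reader)


def count_by_focus(rows):
    """Count how many benchmarks share each 'Evaluation Focus' value."""
    return Counter(row["Evaluation Focus"] for row in rows)


rows = load_rows(TABLE_SAMPLE)
focus_counts = count_by_focus(rows)
# The Source column lists artifacts separated by " / ".
first_sources = rows[0]["Source"].split(" / ")
```

Counting by `Evaluation Focus` gives a quick view of which task areas are crowded and which have a single entry, though the labels in that column are free text and may need normalizing before any grouping by task family.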

License And Access

For data access, reproducibility materials, or replication details, contact research@byteask.ai.
