The MLIR-first decompiler engine behind HexCore — LLVM IR to readable pseudo-C, honest about what it cannot recover.
decompiler · MLIR · LLVM IR · pseudo-C · Remill · reverse engineering · Win64 · type recovery · control-flow structuring · SSA
Helix is the decompiler engine inside HexCore. It takes lifted LLVM IR (from a patched Remill fork), lowers it through three custom MLIR dialects, and emits readable pseudo-C with calling-convention recovery, stack reconstruction, variable naming, type propagation, and control-flow structuring.
Helix is MLIR-first — it does not fork Remill, it wraps it:
```text
LLVM IR -> RemillToHelixLow -> HelixLow -> HelixMid -> HelixHigh -> C-AST -> pseudo-C
```
The whole pipeline runs natively in C++23. It ships into the HexCore IDE as a pre-built N-API .node (via a Rust/napi-rs bridge), and also runs standalone through the helix_tool CLI for engine-direct work.
The guiding principle is fidelity over polish: correct C, or an honestly flagged approximation when the lift cannot be trusted — never a clean-looking lie. See The Honesty Layer.
Status: v0.9.2, the stable cut. The honesty layer required to leave nightly is complete (D1, D2, D4, and #30; see below). The architectural v1.0 track (MemEffects-based DCE, universal op semantics, full structured-CFG recovery) runs alongside it and does not gate this cut. See CHANGELOG.
- Lowers Remill-generated LLVM IR through a 3-tier MLIR dialect pipeline (HelixLow → HelixMid → HelixHigh)
- Recovers stack layout, parameters, locals, and variable intent (Win64 / SysV / cdecl auto-detection)
- Reconstructs direct, vtable, and recursive calls; gates fabricated call names against the function table
- Propagates types across function boundaries to a fixed point
- Simplifies flags and comparisons into readable conditions
- Reverses compiler optimizations (magic division, strength reduction)
- Structures control flow — if / else / while / switch — before emission, with irreducible-loop handling
- Resolves recovered code addresses to symbols instead of leaking them as bare data
- Scores each function with an honest confidence that tracks correctness, not just surface plausibility
- Emits pseudo-C through a C-AST layer and the standalone `helix_tool` CLI
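As a rough illustration of what calling-convention recovery has to model, the sketch below maps an integer argument's position to where each ABI places it at function entry: Win64 uses RCX, RDX, R8, R9 and then the stack (above 32 bytes of shadow space), System V AMD64 uses RDI, RSI, RDX, RCX, R8, R9, and 32-bit cdecl passes everything on the stack. The `Abi` enum, `ArgLocation` struct, and `argLocation` helper are hypothetical names for illustration, not Helix's API, and the sketch ignores floating-point and aggregate arguments.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical ABI tag; Helix's real auto-detection logic is not shown here.
enum class Abi { Win64, SysV, Cdecl32 };

// Where an integer/pointer argument lives at function entry.
struct ArgLocation {
    bool inRegister;
    std::string reg;        // register name when inRegister is true
    std::int64_t stackOff;  // offset from the entry stack pointer otherwise
};

// Maps a zero-based integer argument index to its location at function entry.
// Floating-point, vector, and aggregate arguments are not modeled.
ArgLocation argLocation(Abi abi, int index) {
    assert(index >= 0);
    static const char* const kWin64[] = {"rcx", "rdx", "r8", "r9"};
    static const char* const kSysV[] = {"rdi", "rsi", "rdx", "rcx", "r8", "r9"};
    switch (abi) {
    case Abi::Win64:
        if (index < 4) return {true, kWin64[index], 0};
        // 8-byte return address, then four 8-byte home (shadow) slots.
        return {false, "", 8 + 8 * static_cast<std::int64_t>(index)};
    case Abi::SysV:
        if (index < 6) return {true, kSysV[index], 0};
        // Stack arguments start just above the return address.
        return {false, "", 8 + 8 * static_cast<std::int64_t>(index - 6)};
    case Abi::Cdecl32:
        // Every argument is on the stack, above the 4-byte return address.
        return {false, "", 4 + 4 * static_cast<std::int64_t>(index)};
    }
    return {false, "", -1};  // not reached for valid enum values
}
```

Read in reverse, a table like this is what lets a recovery pass classify an entry-time read of `rcx`, or of `[rsp+0x28]` on Win64, as a parameter rather than a local.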
Helix lowers through three MLIR dialects, then builds a C abstract syntax tree and prints it.
| Tier | Dialect | Responsibility |
|---|---|---|
| 1 | HelixLow | Machine-level semantics — reg.read/reg.write, mem.read/mem.write, flags, raw control flow. Direct lowering of Remill IR. |
| 2 | HelixMid | ISA-agnostic typed SSA — registers become typed variable slots, flags become comparisons, REP MOVS/STOS become memcpy/memset. |
| 3 | HelixHigh | C-level — var.decl with storage class, structured control flow, typed expressions. |
| — | C-AST | HelixHigh → C abstract syntax tree → printed pseudo-C. Owns the honesty gates and the optimizer passes. |
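The HelixMid step where flags become comparisons can be pictured as a table from x86 conditional jumps (following a `cmp a, b`) to C operators. The sketch below is a simplified, hypothetical version, not Helix's implementation: it covers only the common `jcc` mnemonics, assumes the flags come straight from a `cmp`, and renders the condition as a string instead of building HelixMid ops. The `Condition`, `conditionForJcc`, and `renderCondition` names are illustrative.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <unordered_map>

// A comparison recovered from a cmp + jcc pair.
struct Condition {
    std::string op;   // C operator, e.g. "<"
    bool isUnsigned;  // true for the below/above (carry-flag) family
};

// Maps a conditional-jump mnemonic that follows `cmp a, b` to the C comparison
// `a op b` it tests. Returns std::nullopt for flag tests this sketch does not
// model (sign, overflow, parity).
std::optional<Condition> conditionForJcc(const std::string& mnemonic) {
    static const std::unordered_map<std::string, Condition> kTable = {
        {"je", {"==", false}},  {"jz", {"==", false}},
        {"jne", {"!=", false}}, {"jnz", {"!=", false}},
        {"jl", {"<", false}},   {"jge", {">=", false}},
        {"jle", {"<=", false}}, {"jg", {">", false}},
        {"jb", {"<", true}},    {"jae", {">=", true}},
        {"jbe", {"<=", true}},  {"ja", {">", true}},
    };
    auto it = kTable.find(mnemonic);
    if (it == kTable.end()) return std::nullopt;
    return it->second;
}

// Renders the recovered condition as pseudo-C, casting operands for unsigned tests.
std::string renderCondition(const std::string& a, const std::string& b,
                            const Condition& c) {
    assert(!a.empty() && !b.empty());
    if (c.isUnsigned)
        return "(unsigned)" + a + " " + c.op + " (unsigned)" + b;
    return a + " " + c.op + " " + b;
}
```

Keeping signedness in the recovered condition matters because `jl` and `jb` test different flags after the same `cmp`; dropping the distinction would print a plausible but wrong comparison.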
Key passes:

- RemillToHelixLow: Remill IR lowering and per-instruction address tracking
- RecoverStackLayout: Win64 stack parameter and local reconstruction
- RecoverCallingConvention: ABI argument materialization
- PropagateTypes / InterProceduralTypePropagation: intra- and cross-function type inference to a fixed point
- StructureControlFlow: CFG structuring with irreducible-loop handling
- RecoverVariables / EliminateDeadCode: register-noise reduction and dead-code elimination
- HelixLowToMid: machine-level to ISA-agnostic typed SSA
- RecoverMagicDivision: reverses `(x * magic) >> shift` back to `x / divisor`
- DevirtualizeIndirectCalls: vtable dataflow analysis
- HelixMidToHigh: typed SSA to C source-level representation
- C-AST optimizer: dead-store elimination, copy propagation, compound-assignment folding, struct-field recovery, semantic naming, confidence scoring
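The `(x * magic) >> shift` rewrite can be illustrated for its simplest case: unsigned 32-bit division where the magic constant fits in 32 bits. The sketch below is a hypothetical stand-alone function, not the RecoverMagicDivision pass itself. It inverts `magic = ceil(2^shift / d)` and then checks a standard error bound, so it only reports a divisor when `x / d` is exact for every 32-bit `x`. Signed division and the 33-bit "add" fixup sequence are out of scope.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Recovers d from the unsigned 32-bit pattern q = (uint64_t(x) * magic) >> shift,
// where shift is the total shift (32 + post-shift) and magic fits in 32 bits.
// Returns std::nullopt when the pair is not an exact division for all 32-bit x.
std::optional<std::uint32_t> recoverUnsignedDivisor(std::uint32_t magic, unsigned shift) {
    assert(shift >= 32 && shift <= 63);
    if (magic == 0) return std::nullopt;
    const std::uint64_t twoN = std::uint64_t{1} << shift;
    // Compilers choose magic = ceil(2^shift / d), so d = ceil(2^shift / magic).
    const std::uint64_t d = (twoN + magic - 1) / magic;
    if (d < 2 || d > UINT32_MAX) return std::nullopt;
    // Error term e = magic * d - 2^shift. Writing x = q*d + r, the rewrite is
    // exact for every x < 2^32 when 0 <= e and e * (2^32 - 1) < 2^shift.
    const std::uint64_t product = std::uint64_t{magic} * d;
    if (product < twoN) return std::nullopt;
    const std::uint64_t e = product - twoN;  // e < magic, so the next line cannot overflow
    if (e * std::uint64_t{UINT32_MAX} >= twoN) return std::nullopt;
    return static_cast<std::uint32_t>(d);
}
```

For example, the `0xAAAAAAAB` / shift-33 pair that compilers commonly emit for an unsigned `x / 3` maps back to 3, and `0xCCCCCCCD` with shifts 34 and 35 maps back to 5 and 10. Rejecting a pair that fails the bound, instead of rounding to the nearest divisor, avoids printing a division that would be wrong for some inputs.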
The Rust workspace provides shared types, the FlatBuffers transport, and the napi-rs bridge that exposes the engine to the HexCore IDE.
See ARCHITECTURE.md for the full design.
Most decompilers optimize for clean-looking output. Helix optimizes for trustworthy output: when a lift is ambiguous, Helix says so in-band instead of fabricating something plausible. The blocker to a stable cut is a small set of honesty guarantees:
| Class | Guarantee |
|---|---|
| D1 | A recovered code address never leaks as a bare data constant (var = 0x401050;). It resolves to a label or an honest code-pointer cast. |
| D2 | A call target not in the function table is emitted as an honest indirect call (*(code *)0xADDR)(...), never a fabricated sub_XXX symbol. |
| D3 | Unreachable code is removed and marked at its location, never silently shown as live. |
| D4 | Confidence tracks honesty. A function is hard-capped only when a genuine defect survives into the emitted output — a leaked code address, an out-of-table call, an uninitialized return, an irreducible no-return — and the cap names the located reason. It cannot self-report as plausible while hiding one. |
| #30 | No silent high-confidence stub for an address that did not actually lift. |
These live in the C-AST layer as one authoritative function/block-address registry plus located honest markers — a focused honesty pass, not a rewrite. v0.9.2 ships D1, D2, D4, and #30; D3 (general unreachable-code removal) is maturing on the structured-CFG track and does not gate the cut.
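A minimal sketch of how one address registry can drive both D1 and D2, assuming hypothetical names (`AddressRegistry`, `renderCall`, `renderConstant`) and a plain map from entry address to symbol name; Helix's actual registry is not shown in this README. A call target found in the function table prints by name, and anything else becomes the honest indirect-call form rather than a made-up `sub_XXX`. A constant that matches a known function or block address prints as a label or code-pointer cast instead of a bare integer.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <map>
#include <set>
#include <string>
#include <utility>

// Hypothetical single source of truth for recovered code addresses.
class AddressRegistry {
public:
    void addFunction(std::uint64_t addr, std::string name) {
        assert(!name.empty());
        functions_[addr] = std::move(name);
    }
    void addBlock(std::uint64_t addr) { blocks_.insert(addr); }

    // D2: only targets in the function table get a name.
    std::string renderCall(std::uint64_t target, const std::string& args) const {
        if (auto it = functions_.find(target); it != functions_.end())
            return it->second + "(" + args + ")";
        return "(*(code *)" + hex(target) + ")(" + args + ")";
    }

    // D1: a constant that is a known code address never prints as bare data.
    std::string renderConstant(std::uint64_t value) const {
        if (auto it = functions_.find(value); it != functions_.end())
            return it->second;
        if (blocks_.count(value) != 0)
            return "(code *)" + hex(value);
        return hex(value);
    }

private:
    static std::string hex(std::uint64_t v) {
        char buf[32];
        std::snprintf(buf, sizeof buf, "0x%llX", static_cast<unsigned long long>(v));
        return buf;
    }
    std::map<std::uint64_t, std::string> functions_;
    std::set<std::uint64_t> blocks_;
};
```

Routing every address lookup through one object means the call printer and the constant printer cannot disagree about what counts as code.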
A concrete example — a function that looked clean but returns an uninitialized value, before and after the D4 gate:
```diff
- // Confidence: 92.0% (High)
+ // Confidence: 50.0% (Low)
+ // Issues: ... damning honesty defect (uninitialized return value 'result') - confidence capped at 50%
```

The body has zero gotos and reads cleanly, so a surface-plausibility score rated it High, but it returns a `result` that is never assigned. Helix caps it at Low and names the exact located reason, re-derived from the final emitted code so that the flag always points at a real defect in the output.
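The D4 rule can be sketched as a cap applied after scoring: the surface score stands unless a damning defect survives into the emitted output, in which case the score is clamped to 50% and the located reason is reported. The 50% cap comes from the example above; the `Defect` and `ConfidenceReport` types, the `scoreFunction` name, and the band thresholds (80 and 60) are assumptions for illustration, not Helix's documented values.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// A defect found by re-scanning the final emitted pseudo-C (hypothetical shape).
struct Defect {
    std::string reason;  // e.g. "uninitialized return value 'result'"
    bool damning;        // leaked code address, out-of-table call, ...
};

struct ConfidenceReport {
    double percent;
    std::string band;
    std::vector<std::string> issues;
};

// Assumed thresholds; the real bands are not documented here.
std::string bandFor(double percent) {
    if (percent >= 80.0) return "High";
    if (percent >= 60.0) return "Medium";
    return "Low";
}

// Caps the surface-plausibility score when a damning defect survives emission.
ConfidenceReport scoreFunction(double surfaceScore, const std::vector<Defect>& defects) {
    assert(surfaceScore >= 0.0 && surfaceScore <= 100.0);
    constexpr double kDamningCap = 50.0;
    ConfidenceReport report{surfaceScore, "", {}};
    for (const Defect& d : defects) {
        if (!d.damning) continue;  // minor issues do not trigger the cap
        report.percent = std::min(report.percent, kDamningCap);
        report.issues.push_back("damning honesty defect (" + d.reason +
                                ") - confidence capped at 50%");
    }
    report.band = bandFor(report.percent);
    return report;
}
```

Feeding the cap only with defects re-derived from the emitted code, rather than from earlier analysis results, keeps a flag from pointing at something the optimizer already removed.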
- A C++23 toolchain (MSVC 2022 / clang-cl on Windows)
- LLVM + MLIR 18 (an MLIR-enabled LLVM build)
- CMake 3.20+ and Ninja
- Rust stable (for the napi-rs bridge)
- Node.js 18+ (only for the N-API `.node` build)
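```sh
cmake -S engine -B engine/build -G Ninja \
  -DCMAKE_BUILD_TYPE=Release \
  -DLLVM_DIR=/path/to/llvm/lib/cmake/llvm \
  -DMLIR_DIR=/path/to/llvm/lib/cmake/mlir
cmake --build engine/build --config Release
```

Ninja is a single-config generator, so the build type is set at configure time with `CMAKE_BUILD_TYPE`; `--config` only takes effect with multi-config generators. This produces `engine/build/helix_tool.exe` (the engine-direct CLI) and the `helix_engine` static library.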
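To build and run the test suite:

```sh
cmake -S engine -B engine/build-tests -G Ninja -DCMAKE_BUILD_TYPE=Release \
  -DLLVM_DIR=/path/to/llvm/lib/cmake/llvm \
  -DMLIR_DIR=/path/to/llvm/lib/cmake/mlir
cmake --build engine/build-tests --config Release
./engine/build-tests/test/helix_tests.exe
```

To check the Rust workspace and the napi-rs bridge:

```sh
LLVM_DIR=/path/to/llvm/lib/cmake/llvm cargo check -p helix-core -p hexcore-helix
```

`helix_tool` runs the full pipeline on a single `.ll` file and prints pseudo-C. It is the fastest way to validate a change without the IDE.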
```sh
./engine/build/helix_tool.exe --use-cast-layer path/to/function.ll
```

- `--use-cast-layer` routes through the C-AST layer (the default emission path; the honesty gates and the optimizer run here).
- Always feed a fresh `.ll` lifted by the current Remill; a stale dump invalidates the result.
```text
HexCore-Helix/
|- engine/           C++23 decompiler engine, MLIR dialects, C-AST layer, helix_tool CLI, tests
|- crates/           Rust workspace (helix-core) and the hexcore-helix napi-rs bridge
|- schemas/          FlatBuffers schemas (engine <-> IDE transport)
|- signatures/       Optional address / name databases
|- tests/            Real decompilation fixtures and reports
|- ARCHITECTURE.md   Architectural overview
|- CHANGELOG.md      Release notes
`- roadmap.md        Near-term roadmap
```
Helix is one of HexCore's native engines. Inside the IDE, the decompilation pipeline runs:

```text
machine code -> Pathfinder CFG hints -> Remill lift -> LLVM IR -> Helix (this repo) -> pseudo-C
```
End users receive Helix as a pre-built .node — no compilation needed. This repository is where the engine itself is developed, gated, and benchmarked against a real-world corpus (Linux kernel modules, large Win64 game binaries, obfuscated malware, and CTF VMs).
Apache License 2.0. See LICENSE for details.
HexCore Helix — a decompiler that tells you the truth.
