AkashaCorporation/HexCore-Helix

HexCore Helix

HexCore Helix Decompiler

The MLIR-first decompiler engine behind HexCore — LLVM IR to readable pseudo-C, honest about what it cannot recover.

Overview · What It Does · Architecture · Honesty Layer · Build · CLI · Layout · License

decompiler · MLIR · LLVM IR · pseudo-C · Remill · reverse engineering · Win64 · type recovery · control-flow structuring · SSA


Overview

Helix is the decompiler engine inside HexCore. It takes lifted LLVM IR (from a patched Remill fork), lowers it through three custom MLIR dialects, and emits readable pseudo-C with calling-convention recovery, stack reconstruction, variable naming, type propagation, and control-flow structuring.

Helix is MLIR-first — it does not fork Remill, it wraps it:

LLVM IR  ->  RemillToHelixLow  ->  HelixLow  ->  HelixMid  ->  HelixHigh  ->  C-AST  ->  pseudo-C

The whole pipeline runs natively in C++23. It ships into the HexCore IDE as a pre-built N-API .node (via a Rust/napi-rs bridge), and also runs standalone through the helix_tool CLI for engine-direct work.

The guiding principle is fidelity over polish: correct C, or an honestly flagged approximation when the lift cannot be trusted — never a clean-looking lie. See The Honesty Layer.

Status: v0.9.2, the stable cut. The honesty layer required to leave nightly is complete (D1, D2, D4, #30; see The Honesty Layer). The architectural v1.0 track (MemEffects-based DCE, universal op semantics, full structured-CFG recovery) runs alongside and does not gate this cut. See CHANGELOG.


What Helix Does

  • Lowers Remill-generated LLVM IR through a 3-tier MLIR dialect pipeline (HelixLow → HelixMid → HelixHigh)
  • Recovers stack layout, parameters, locals, and variable intent (Win64 / SysV / cdecl auto-detection)
  • Reconstructs direct, vtable, and recursive calls; gates fabricated call names against the function table
  • Propagates types across function boundaries to a fixed point
  • Simplifies flags and comparisons into readable conditions
  • Reverses compiler optimizations (magic division, strength reduction)
  • Structures control flow — if / else / while / switch — before emission, with irreducible-loop handling
  • Resolves recovered code addresses to symbols instead of leaking them as bare data
  • Scores each function with an honest confidence that tracks correctness, not just surface plausibility
  • Emits pseudo-C through a C-AST layer and the standalone helix_tool
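
To make "propagates types across function boundaries to a fixed point" concrete, here is a minimal sketch of a fixed-point sweep over a toy type lattice. It is illustrative only and is not Helix's implementation: the Type lattice, CallEdge, and propagateToFixedPoint are hypothetical names, and a real lattice would be far richer than three totally ordered points.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical three-point lattice: Unknown < Int < Pointer.
// (An assumption for illustration only.)
enum class Type { Unknown = 0, Int = 1, Pointer = 2 };

// Join keeps the more specific of two facts.
inline Type join(Type a, Type b) {
    return static_cast<int>(a) >= static_cast<int>(b) ? a : b;
}

// A call edge says: the value in slot `from` flows into slot `to`
// (for example a caller argument into a callee parameter).
struct CallEdge {
    std::size_t from;
    std::size_t to;
};

// Re-apply every edge until no slot changes, i.e. a fixed point is reached.
// Returns the number of sweeps performed, including the final idle one.
inline int propagateToFixedPoint(std::vector<Type>& slots,
                                 const std::vector<CallEdge>& edges) {
    int sweeps = 0;
    bool changed = true;
    while (changed) {
        changed = false;
        ++sweeps;
        for (const CallEdge& e : edges) {
            assert(e.from < slots.size() && e.to < slots.size());
            const Type merged = join(slots[e.to], slots[e.from]);
            if (merged != slots[e.to]) {
                slots[e.to] = merged;
                changed = true;
            }
        }
    }
    return sweeps;
}
```

Because join only ever moves a slot upward in a finite lattice, the loop is guaranteed to terminate. A pointer fact in slot 0 reaches slot 2 through the chain 0 -> 1 -> 2 regardless of the order in which the edges are listed; only the number of sweeps changes.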

Architecture

Helix lowers through three MLIR dialects, then builds a C abstract syntax tree and prints it.

| Tier | Dialect | Responsibility |
|------|---------|----------------|
| 1 | HelixLow | Machine-level semantics: reg.read/reg.write, mem.read/mem.write, flags, raw control flow. Direct lowering of Remill IR. |
| 2 | HelixMid | ISA-agnostic typed SSA: registers become typed variable slots, flags become comparisons, REP MOVS/STOS become memcpy/memset. |
| 3 | HelixHigh | C-level: var.decl with storage class, structured control flow, typed expressions. |
| | C-AST | HelixHigh → C abstract syntax tree → printed pseudo-C. Owns the honesty gates and the optimizer passes. |
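
The "flags become comparisons" step between tiers 1 and 2 rests on a standard x86 fact: after cmp a, b, the jl condition (SF != OF) is exactly the signed test a < b, and the jb condition (CF) is the unsigned one. The sketch below models that equivalence in plain C++. The names Flags, cmp32, jlTaken, jbTaken, and checkAgainstSigned are hypothetical and chosen for illustration; none of this is taken from the Helix source.

```cpp
#include <cassert>
#include <cstdint>

// Flags produced by a 32-bit x86 `cmp a, b` (which computes a - b).
struct Flags {
    bool zf;  // result is zero
    bool sf;  // sign bit of the result
    bool cf;  // unsigned borrow
    bool of;  // signed overflow
};

inline Flags cmp32(std::uint32_t a, std::uint32_t b) {
    const std::uint32_t r = a - b;  // wraps modulo 2^32
    Flags f;
    f.zf = (r == 0);
    f.sf = (r >> 31) != 0;
    f.cf = a < b;
    // Overflow: operands differ in sign and the result's sign differs from a's.
    f.of = (((a ^ b) & (a ^ r)) >> 31) != 0;
    return f;
}

// Machine-level view: the condition tested by `jl` after `cmp a, b`.
inline bool jlTaken(const Flags& f) { return f.sf != f.of; }

// Machine-level view: the condition tested by `jb` after `cmp a, b`.
inline bool jbTaken(const Flags& f) { return f.cf; }

// Ties the flag view to the comparison a decompiler would print instead.
inline void checkAgainstSigned(std::int32_t a, std::int32_t b) {
    const Flags f = cmp32(static_cast<std::uint32_t>(a),
                          static_cast<std::uint32_t>(b));
    assert(jlTaken(f) == (a < b));
    assert(f.zf == (a == b));
}
```

Under this model, a flag-level branch such as "jump if SF != OF" can be printed as if (a < b), which is the kind of readable condition the middle tier is described as producing.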

Pass pipeline (selected)

  1. RemillToHelixLow — Remill IR lowering and per-instruction address tracking
  2. RecoverStackLayout — Win64 stack parameter and local reconstruction
  3. RecoverCallingConvention — ABI argument materialization
  4. PropagateTypes / InterProceduralTypePropagation — intra- and cross-function type inference to fixed point
  5. StructureControlFlow — CFG structuring with irreducible-loop handling
  6. RecoverVariables / EliminateDeadCode — register-noise reduction and dead-code elimination
  7. HelixLowToMid — machine-level → ISA-agnostic typed SSA
  8. RecoverMagicDivision — reverses (x * magic) >> shift back to x / divisor
  9. DevirtualizeIndirectCalls — vtable dataflow analysis
  10. HelixMidToHigh — typed SSA → C source-level representation
  11. C-AST optimizer — dead-store elimination, copy propagation, compound-assignment folding, struct-field recovery, semantic naming, confidence scoring
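
Step 8 in the list above reverses a well-known compiler trick: an unsigned divide by a constant is emitted as a widening multiply and a shift. The sketch below shows the arithmetic for the unsigned 32-bit case only. It assumes the common round-up encoding magic = ceil(2^shift / d) and is not the RecoverMagicDivision pass itself; mulShift, candidateDivisor, and matchesDivision are hypothetical helper names.

```cpp
#include <cassert>
#include <cstdint>

// What the compiler emitted: an unsigned 32-bit divide rewritten as a
// widening multiply followed by a right shift.
inline std::uint32_t mulShift(std::uint32_t x, std::uint32_t magic, unsigned shift) {
    assert(shift < 64);
    return static_cast<std::uint32_t>((static_cast<std::uint64_t>(x) * magic) >> shift);
}

// Candidate divisor for a (magic, shift) pair, assuming the round-up
// encoding magic = ceil(2^shift / d). Returns 0 when the inputs fall
// outside the range this sketch handles.
inline std::uint32_t candidateDivisor(std::uint32_t magic, unsigned shift) {
    if (magic == 0 || shift < 32 || shift > 63) return 0;
    const std::uint64_t pow2 = std::uint64_t{1} << shift;
    return static_cast<std::uint32_t>((pow2 + magic - 1) / magic);  // ceiling division
}

// Spot-check the candidate on boundary-heavy samples before trusting it.
// Sampling can reject a wrong divisor but does not prove a right one.
inline bool matchesDivision(std::uint32_t magic, unsigned shift, std::uint32_t d) {
    if (d == 0) return false;
    const std::uint32_t samples[] = {0u, 1u, d - 1, d, d + 1, 0x7FFFFFFFu,
                                     0x80000000u, 0xFFFFFFFEu, 0xFFFFFFFFu};
    for (std::uint32_t x : samples) {
        if (mulShift(x, magic, shift) != x / d) return false;
    }
    return true;
}
```

For example, the pair (0xAAAAAAAB, 33) yields the candidate divisor 3 and (0xCCCCCCCD, 35) yields 10, so (x * 0xCCCCCCCD) >> 35 can be printed as x / 10. Signed division and other encodings need extra fix-up terms that this sketch leaves out.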

The Rust workspace provides shared types, the FlatBuffers transport, and the napi-rs bridge that exposes the engine to the HexCore IDE.

See ARCHITECTURE.md for the full design.


The Honesty Layer

Most decompilers optimize for clean-looking output. Helix optimizes for trustworthy output: when a lift is ambiguous, Helix says so in-band instead of fabricating something plausible. The stable cut is gated on a small set of honesty guarantees:

| Class | Guarantee |
|-------|-----------|
| D1 | A recovered code address never leaks as a bare data constant (var = 0x401050;). It resolves to a label or an honest code-pointer cast. |
| D2 | A call target not in the function table is emitted as an honest indirect call (*(code *)0xADDR)(...), never a fabricated sub_XXX symbol. |
| D3 | Unreachable code is removed and located-marked, never silently shown as live. |
| D4 | Confidence tracks honesty. A function is hard-capped only when a genuine defect survives into the emitted output (a leaked code address, an out-of-table call, an uninitialized return, an irreducible no-return), and the cap names the located reason. It cannot self-report as plausible while hiding one. |
| #30 | No silent high-confidence stub for an address that did not actually lift. |

These live in the C-AST layer as one authoritative function/block-address registry plus located honest markers — a focused honesty pass, not a rewrite. v0.9.2 ships D1, D2, D4, and #30; D3 (general unreachable-code removal) is maturing on the structured-CFG track and does not gate the cut.
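
The D2 guarantee comes down to a single lookup against the registry before a callee name is printed. The following is a minimal sketch of that gate under stated assumptions: FunctionTable and renderCallee are hypothetical names, the registry is modelled as a plain std::map, and the real C-AST layer is certainly more involved.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <map>
#include <string>

// Hypothetical stand-in for the function/block-address registry: maps the
// entry address of every function that actually lifted to its name.
using FunctionTable = std::map<std::uint64_t, std::string>;

// Render the callee of a call. A target found in the table gets its
// recorded name; anything else becomes an explicit indirect call through a
// code-pointer cast, so no sub_XXX symbol is invented.
inline std::string renderCallee(const FunctionTable& table, std::uint64_t target) {
    const auto it = table.find(target);
    if (it != table.end()) {
        return it->second;
    }
    char buf[40];
    const int n = std::snprintf(buf, sizeof buf, "(*(code *)0x%llX)",
                                static_cast<unsigned long long>(target));
    assert(n > 0 && static_cast<std::size_t>(n) < sizeof buf);
    return std::string(buf);
}
```

With a table that contains only 0x401000, a call to 0x401000 prints its recorded name, while a call to 0x401050 prints (*(code *)0x401050) followed by the argument list. The reader can then see at a glance which targets were verified against the table.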

A concrete example — a function that looked clean but returns an uninitialized value, before and after the D4 gate:

- // Confidence: 92.0% (High)
+ // Confidence: 50.0% (Low)
+ // Issues: ... damning honesty defect (uninitialized return value 'result') - confidence capped at 50%

The body has zero gotos and reads cleanly, so a surface-plausibility score rated it High — but it returns a result that is never assigned. Helix caps it to Low and names the exact located reason, re-derived from the final emitted code so the flag always points at a real defect in the output.
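
The capping rule in this example can be sketched in a few lines. This is an assumption-laden illustration, not the Helix scorer: Defect, Verdict, scoreFunction, and kHonestyCap are hypothetical names, and only the 50% cap and the wording of the issue text come from the example above.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// A located defect found in the final emitted code (hypothetical shape).
struct Defect {
    std::string reason;  // e.g. "uninitialized return value 'result'"
};

struct Verdict {
    double confidence;                // percent, 0..100
    std::vector<std::string> issues;  // one entry per located defect
};

// Cap value taken from the example above; everything else is an assumption.
constexpr double kHonestyCap = 50.0;

// Start from the surface-plausibility score and hard-cap it only when at
// least one genuine defect survived into the output, naming each reason.
inline Verdict scoreFunction(double surfaceScore, const std::vector<Defect>& defects) {
    assert(surfaceScore >= 0.0 && surfaceScore <= 100.0);
    Verdict v{surfaceScore, {}};
    if (!defects.empty()) {
        v.confidence = std::min(surfaceScore, kHonestyCap);
        for (const Defect& d : defects) {
            v.issues.push_back(d.reason + " - confidence capped at 50%");
        }
    }
    return v;
}
```

A function with no defects keeps its surface score. The cap only ever lowers a score, so a function that already scored below 50% is not raised to the cap.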


Build

Prerequisites

  • A C++23 toolchain (MSVC 2022 / clang-cl on Windows)
  • LLVM + MLIR 18 (an MLIR-enabled LLVM build)
  • CMake 3.20+ and Ninja
  • Rust stable (for the napi-rs bridge)
  • Node.js 18+ (only for the N-API .node build)

Build the engine + CLI

cmake -S engine -B engine/build -G Ninja \
  -DLLVM_DIR=/path/to/llvm/lib/cmake/llvm \
  -DMLIR_DIR=/path/to/llvm/lib/cmake/mlir
cmake --build engine/build --config Release

This produces engine/build/helix_tool.exe (the engine-direct CLI) and the helix_engine static library.

Build and run the tests

cmake -S engine -B engine/build-tests -G Ninja -DCMAKE_BUILD_TYPE=Release
cmake --build engine/build-tests --config Release
./engine/build-tests/test/helix_tests.exe

Check the Rust bridge

LLVM_DIR=/path/to/llvm/lib/cmake/llvm cargo check -p helix-core -p hexcore-helix

Engine-direct CLI

helix_tool runs the full pipeline on a single .ll and prints pseudo-C — the fastest way to validate a change without the IDE.

./engine/build/helix_tool.exe --use-cast-layer path/to/function.ll
  • --use-cast-layer routes through the C-AST layer (the default emission path; honesty gates and the optimizer run here).
  • Always feed a fresh .ll lifted by the current Remill — a stale dump invalidates the result.

Repository Layout

HexCore-Helix/
|- engine/                 C++23 decompiler engine, MLIR dialects, C-AST layer, helix_tool CLI, tests
|- crates/                 Rust workspace (helix-core) and the hexcore-helix napi-rs bridge
|- schemas/                FlatBuffers schemas (engine <-> IDE transport)
|- signatures/             Optional address / name databases
|- tests/                  Real decompilation fixtures and reports
|- ARCHITECTURE.md         Architectural overview
|- CHANGELOG.md            Release notes
`- roadmap.md              Near-term roadmap

Relationship to HexCore

Helix is one of HexCore's native engines. Inside the IDE the decompilation pipeline runs:

machine code  ->  Pathfinder CFG hints  ->  Remill lift  ->  LLVM IR  ->  Helix (this repo)  ->  pseudo-C

End users receive Helix as a pre-built .node — no compilation needed. This repository is where the engine itself is developed, gated, and benchmarked against a real-world corpus (Linux kernel modules, large Win64 game binaries, obfuscated malware, and CTF VMs).


License

Apache License 2.0. See LICENSE for details.


HexCore Helix — a decompiler that tells you the truth.
