Helios
A compute runtime that measures each workload at run time, then routes it to the right backend: scalar CPU, threaded SIMD, or CUDA. I ran it on real SuiteSparse matrices and SNAP graphs, and checked correctness against reference results before making any speed claim.