6 builds  ·  2023 / 2026

Projects through interests

Six builds across AI inference, compute optimization, 6G networking, distributed systems, and space. Each one starts from a real bottleneck and gets measured against a real baseline.

systems-first · production-style · benchmarked
01 / 06 Compute Optimization · CUDA

Helios

A compute runtime that measures each workload at run time, then routes it to the right backend: scalar CPU, threaded SIMD, or CUDA. I ran it on real SuiteSparse matrices and SNAP graphs, and checked correctness against reference results before making any speed claim.
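As a rough illustration of the scalar path and the correctness check described above, here is a minimal Python sketch of CSR sparse matrix-vector multiply verified against a dense reference, plus a simple effective-bandwidth estimate. The function names and the byte-counting model are assumptions for illustration, not Helios's actual C++/CUDA code.

```python
def csr_spmv(indptr, indices, data, x):
    """Compute y = A @ x for a matrix A stored in CSR form (scalar path)."""
    y = [0.0] * (len(indptr) - 1)
    for row in range(len(y)):
        acc = 0.0
        for k in range(indptr[row], indptr[row + 1]):
            acc += data[k] * x[indices[k]]
        y[row] = acc
    return y


def dense_matvec(dense, x):
    """Dense reference, used only to check the CSR result."""
    return [float(sum(a * b for a, b in zip(row, x))) for row in dense]


def spmv_bytes_moved(n_rows, nnz, value_bytes=8, index_bytes=4):
    """Rough traffic estimate for one CSR SpMV pass (assumed model).

    Counts each stored value, column index, and row pointer once,
    one gather from x per nonzero, and one write per row of y.
    """
    return (nnz * (value_bytes + index_bytes)
            + (n_rows + 1) * index_bytes
            + nnz * value_bytes
            + n_rows * value_bytes)


def effective_bandwidth_gbs(bytes_moved, seconds):
    """Effective bandwidth in GB/s for a measured kernel time."""
    return bytes_moved / seconds / 1e9


# Small 3x3 example: check correctness before looking at any timing.
dense = [[4, 0, 1], [0, 3, 0], [2, 0, 5]]
indptr, indices, data = [0, 2, 3, 5], [0, 2, 1, 0, 2], [4, 1, 3, 2, 5]
x = [1, 2, 3]
assert csr_spmv(indptr, indices, data, x) == dense_matvec(dense, x)
```

The check runs before any timing, which mirrors the ordering the description insists on: a fast kernel that returns the wrong vector is not a speedup.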

SpMV speedup: 1.52× scalar → SIMD
Effective bandwidth: 13.5 GB/s
Datasets: real SuiteSparse + SNAP
C++ · CUDA · SIMD · NVBench · SpMV
View on GitHub ↗
helios · bench
Category: Compute Runtime
Stack: C++ · CUDA · NVBench
Year: 2026
02 / 06 Distributed Systems · LLM Infra

NimbusMesh-X

Serve an LLM across many GPUs and the workers keep redoing the same prefill work, wasting memory. NimbusMesh-X is a shared KV-cache layer that lets nodes reuse each other's cache, so time-to-first-token holds up when memory gets tight.
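To make the cache-aware routing idea concrete, here is a minimal sketch that sends a request to the node already holding the longest prefix of its KV blocks, breaking ties by load. The data layout (`cached`, `load`), the block identifiers, and the function names are hypothetical and not taken from the project.

```python
def shared_prefix_blocks(request_blocks, cached_blocks):
    """Count the leading request blocks that a node already has cached."""
    hits = 0
    for block in request_blocks:
        if block not in cached_blocks:
            break
        hits += 1
    return hits


def pick_node(request_blocks, nodes):
    """Pick the node with the longest cached prefix; tie-break on lower load.

    `nodes` maps a node name to {"cached": set of block ids, "load": int}.
    This scoring rule is an assumption made for the sketch.
    """
    def score(name):
        info = nodes[name]
        hits = shared_prefix_blocks(request_blocks, info["cached"])
        return (-hits, info["load"], name)

    return min(nodes, key=score)


nodes = {
    "a": {"cached": {"b0", "b1"}, "load": 3},
    "b": {"cached": {"b0"}, "load": 0},
    "c": {"cached": set(), "load": 0},
}
print(pick_node(["b0", "b1", "b2"], nodes))  # node "a" holds two prefix blocks
```

A longer cached prefix means less prefill to recompute, which is the mechanism by which routing of this kind can protect time-to-first-token.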

Design: shared KV across nodes
Routing: cache-aware
Goal: lower TTFT under pressure
Python · gRPC · vLLM · KV Cache · Distributed
View on GitHub ↗
nimbusmesh · routing
Category: Distributed Systems
Stack: Python · gRPC · vLLM
Year: 2025
03 / 06 AI Inference Optimization

EigenKache

When long context has to fit a hard memory budget, most KV caches just drop old tokens and lose the evidence. EigenKache compresses the cold middle into attention-conditioned landmarks instead, so the model keeps long-range recall. I benchmarked it against window and H2O-style eviction under the same budget.
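One plausible reading of "attention-conditioned landmarks" is to keep the first and last tokens intact and replace each chunk of the cold middle with an attention-weighted average of its keys and values. The sketch below shows that reading only. The weighting scheme, the parameters `sink`, `recent`, and `chunk`, and the `attn_mass` input are assumptions, not EigenKache's actual method.

```python
def weighted_mean(vectors, weights):
    """Weighted average of equal-length vectors; weights must sum to > 0."""
    total = sum(weights)
    dim = len(vectors[0])
    return [sum(w * v[d] for v, w in zip(vectors, weights)) / total
            for d in range(dim)]


def compress_middle(keys, values, attn_mass, sink=1, recent=2, chunk=2):
    """Keep `sink` leading and `recent` trailing tokens; merge the rest.

    Each chunk of middle tokens becomes one landmark whose key and value
    are averages weighted by `attn_mass` (assumed: accumulated attention
    each token has received so far, all positive).
    """
    n = len(keys)
    if n <= sink + recent:
        return list(keys), list(values)
    out_k, out_v = list(keys[:sink]), list(values[:sink])
    for start in range(sink, n - recent, chunk):
        stop = min(start + chunk, n - recent)
        weights = attn_mass[start:stop]
        out_k.append(weighted_mean(keys[start:stop], weights))
        out_v.append(weighted_mean(values[start:stop], weights))
    out_k.extend(keys[n - recent:])
    out_v.extend(values[n - recent:])
    return out_k, out_v
```

Unlike window or eviction policies, nothing in the middle is dropped outright here: every token still contributes to some landmark, in proportion to how much attention it has drawn.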

Recall fidelity @ 8×: 0.91 cos vs full
vs window / H2O: 0.04 / 0.60
Coverage: 14 tests passing
Python · PyTorch · KV Cache · Compression · Long Context
View on GitHub ↗
eigenkache · explainer
Category: AI Inference
Stack: Python · PyTorch
Year: 2026
04 / 06 6G / AI-Native Networking

Q-AIRAN-SLICE

5G and 6G networks split limited spectrum across users with very different needs. I framed that allocation as a QUBO and solved it with quantum-inspired local search, against greedy and linear-programming baselines, under a shared latency model that includes radio, transport, queueing, and inference.
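For readers unfamiliar with the formulation, a QUBO assigns an energy x^T Q x to a binary vector x, and local search tries to lower that energy by flipping bits. The sketch below is plain one-bit-flip descent on a toy two-user instance. The matrix values are invented, and the project's actual quantum-inspired search and latency model are not reproduced here.

```python
def qubo_energy(Q, x):
    """Energy x^T Q x for a binary vector x."""
    n = len(x)
    return sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(n))


def local_search(Q, x0, max_sweeps=100):
    """Greedy one-bit-flip descent: keep any flip that lowers the energy."""
    x = list(x0)
    best = qubo_energy(Q, x)
    for _ in range(max_sweeps):
        improved = False
        for i in range(len(x)):
            x[i] ^= 1                      # try flipping bit i
            energy = qubo_energy(Q, x)
            if energy < best:
                best = energy
                improved = True
            else:
                x[i] ^= 1                  # revert the flip
        if not improved:
            break                          # local minimum reached
    return x, best


# Toy instance (made-up numbers): two users compete for one resource block.
# Diagonal terms reward serving a user; the off-diagonal term penalizes
# granting the same block to both.
Q = [[-3, 4],
     [0, -2]]
solution, energy = local_search(Q, [0, 0])
print(solution, energy)  # [1, 0] -3
```

Plain descent can stall in a local minimum, which is one reason to compare it against greedy and linear-programming baselines under the same latency model.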

Solver: QUBO local search
Latency model: radio + transport + inference
Baselines: greedy, linear program
Python · QUBO · 6G · O-RAN · Quantum-Inspired
View on GitHub ↗
q-airan · slice solver
Category: 6G Networking
Stack: Python · QUBO
Year: 2025
05 / 06 Real-Time ML · Edge AI

NeuroStreamRT

EEG screening models get scored offline, but real deployment means streaming two-second windows at 256 Hz under a sub-100 ms budget. This project benchmarks accuracy and per-window latency together across architectures on the OpenNeuro ds004504 dataset (88 subjects, leave-one-subject-out).
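A two-second window at 256 Hz is 512 samples, and each window has to be scored inside the deadline. The sketch below shows the shape of a per-window latency harness under those numbers. The non-overlapping hop, the stand-in `model` callable, and the function names are assumptions, not the project's harness.

```python
import statistics
import time

SAMPLE_RATE_HZ = 256
WINDOW_SECONDS = 2
WINDOW_SAMPLES = SAMPLE_RATE_HZ * WINDOW_SECONDS  # 512 samples per window


def windows(signal, size=WINDOW_SAMPLES, hop=WINDOW_SAMPLES):
    """Yield fixed-size windows; hop == size means no overlap (assumed)."""
    for start in range(0, len(signal) - size + 1, hop):
        yield signal[start:start + size]


def time_per_window(model, signal):
    """Run `model` on each window and record wall-clock latency in ms."""
    latencies_ms = []
    for window in windows(signal):
        t0 = time.perf_counter()
        model(window)
        latencies_ms.append((time.perf_counter() - t0) * 1e3)
    return latencies_ms


def summarize(latencies_ms, budget_ms=100.0):
    """Median latency, worst case, and number of windows over budget."""
    return {
        "p50_ms": statistics.median(latencies_ms),
        "max_ms": max(latencies_ms),
        "deadline_misses": sum(1 for t in latencies_ms if t >= budget_ms),
    }


# Stand-in model: mean of the window. A real run would call an
# exported network here instead.
signal = [0.0] * (WINDOW_SAMPLES * 3 + 100)
stats = summarize(time_per_window(lambda w: sum(w) / len(w), signal))
```

Reporting the median alongside the worst case and the miss count matters because a deadline is violated by the slowest window, not the typical one.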

Stream latency P50: 0.045 ms
Stream → batch: 3.0× speedup
Deadline budget: < 100 ms / window
PyTorch · ONNX · Real-Time · Edge AI · Benchmark
View on GitHub ↗
neurostream · latency harness
Category: Real-Time ML
Stack: PyTorch · ONNX
Year: 2026
06 / 06 Networking · Space Systems

SpaceLinkBench

TCP assumes a link that behaves consistently. Space links don't. This models a non-ergodic channel with Gilbert-Elliott bursts, runs TCP over it, and detects when the link's behavior shifts so the transport can react instead of stalling.
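The Gilbert-Elliott model is a two-state Markov chain: a good state with low loss and a bad state with high loss, which produces losses in bursts rather than independently. A minimal sketch follows. The parameter names and the seeded generator are illustrative choices, and the TCP layer and shift detection are not shown.

```python
import random


def gilbert_elliott(n_packets, p_gb, p_bg, loss_good, loss_bad, seed=0):
    """Simulate per-packet loss on a two-state burst channel.

    p_gb: probability of moving good -> bad after a packet
    p_bg: probability of moving bad -> good after a packet
    Returns a list of booleans, True where the packet was lost.
    """
    rng = random.Random(seed)
    bad = False  # start in the good state (assumed)
    lost = []
    for _ in range(n_packets):
        loss_prob = loss_bad if bad else loss_good
        lost.append(rng.random() < loss_prob)
        # State transition after each packet.
        if bad:
            if rng.random() < p_bg:
                bad = False
        elif rng.random() < p_gb:
            bad = True
    return lost


def stationary_bad_fraction(p_gb, p_bg):
    """Long-run fraction of time the chain spends in the bad state."""
    return p_gb / (p_gb + p_bg)


trace = gilbert_elliott(1000, p_gb=0.02, p_bg=0.2, loss_good=0.001, loss_bad=0.5)
loss_rate = sum(trace) / len(trace)
```

Running the same parameters with different seeds and comparing throughput per run is one way to see the run-to-run spread: when bad states are long relative to a run, a single run does not average them out.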

Run-to-run spread: 3.1× (p5 0.15 → p95 0.46 Mbps)
Verdict: strongly non-ergodic
Coverage: 33 tests passing
Python · TCP · Networking · Simulation · Benchmark
View on GitHub ↗
spacelink · tcp sim
Category: Space Systems
Stack: Python · TCP
Year: 2026
// what drives the work
01
AI Inference Optimization
KV cache constraints, attention approximation, speculative decoding. The gap between research throughput and production latency is the problem.
02
6G and AI-Native Networks
Intelligent RAN slicing, URLLC, O-RAN disaggregation. Networks that adapt to inference workloads, not the other way around.
03
Distributed Systems
Disaggregated architectures, fault-tolerant pipelines, latency under contention. Systems that degrade gracefully, not catastrophically.
04
Compute Optimization
Memory bandwidth vs compute tradeoffs, CUDA kernel design, operator fusion. Every cycle is a constraint worth reasoning about.
05
Robotics and Edge AI
Real-time perception, sensor fusion, ROS2 pipelines. AI that works when the network doesn't exist and latency is physical.
06
Quantum Computing
QUBO, QAOA, quantum-inspired optimization. Already in Q-AIRAN-SLICE; next step is a real circuit implementation.
GitHub Activity · aryanputta view profile ↗