6 builds · 2023 / 2026

Projects through interests

Six builds across AI inference, compute optimization, 6G networking, distributed systems, and space. Each one starts from a real bottleneck and gets measured against a real baseline.

systems-first · production-style · benchmarked

01 / 06 Compute Optimization · CUDA

Helios

A compute runtime that measures each workload at run time, then routes it to the right backend: scalar CPU, threaded SIMD, or CUDA. I ran it on real SuiteSparse matrices and SNAP graphs, and checked correctness against reference results before making any speed claim.

SpMV speedup: 1.52× scalar → SIMD

Effective bandwidth: 13.5 GB/s

Datasets: real SuiteSparse + SNAP

C++ · CUDA · SIMD · NVBench · SpMV

helios · bench

Category: Compute Runtime

Stack: C++ · CUDA · NVBench

Year: 2026
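
The measure-then-route idea behind Helios can be sketched in a few lines. This is a minimal illustration in Python, not the project's C++/CUDA code: `spmv_scalar`, `spmv_rowwise`, `matches`, `pick_backend`, and the tolerance are hypothetical names and values chosen for the example, and the two toy backends stand in for the scalar, SIMD, and CUDA paths. Each backend is checked against a reference result before it is timed, so an incorrect backend can never win on speed.

```python
import time
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# CSR matrix as (row_ptr, col_idx, values); an assumed layout for this sketch.
CSR = Tuple[List[int], List[int], List[float]]
Backend = Callable[[CSR, Sequence[float]], List[float]]


def spmv_scalar(a: CSR, x: Sequence[float]) -> List[float]:
    """Reference sparse matrix-vector product, one multiply-add at a time."""
    row_ptr, col_idx, values = a
    y = []
    for row in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y


def spmv_rowwise(a: CSR, x: Sequence[float]) -> List[float]:
    """Same product written per row; stands in for a second backend."""
    row_ptr, col_idx, values = a
    return [
        sum(values[k] * x[col_idx[k]] for k in range(row_ptr[r], row_ptr[r + 1]))
        for r in range(len(row_ptr) - 1)
    ]


def matches(got: Sequence[float], want: Sequence[float], tol: float = 1e-9) -> bool:
    """Relative-tolerance comparison against the reference result."""
    return len(got) == len(want) and all(
        abs(g - w) <= tol * max(1.0, abs(w)) for g, w in zip(got, want)
    )


def pick_backend(
    backends: Dict[str, Backend],
    a: CSR,
    x: Sequence[float],
    reference: Sequence[float],
    repeats: int = 3,
) -> str:
    """Return the name of the fastest backend that reproduces the reference."""
    best_name: Optional[str] = None
    best_time = float("inf")
    for name, fn in backends.items():
        if not matches(fn(a, x), reference):
            continue  # correctness gate: wrong answers are never timed
        elapsed = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            fn(a, x)
            elapsed = min(elapsed, time.perf_counter() - start)
        if elapsed < best_time:
            best_name, best_time = name, elapsed
    if best_name is None:
        raise RuntimeError("no backend reproduced the reference result")
    return best_name
```

Calling `pick_backend({"scalar": spmv_scalar, "rowwise": spmv_rowwise}, a, x, spmv_scalar(a, x))` returns whichever name measured fastest on that input, which is why the choice is made per workload rather than fixed in advance.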

02 / 06 Distributed Systems · LLM Infra

NimbusMesh-X

Serve an LLM across many GPUs and the workers keep redoing the same prefill work, wasting memory. NimbusMesh-X is a shared KV-cache layer that lets nodes reuse each other's cache, so time-to-first-token holds up when memory gets tight.

Design: shared KV across nodes

Routing: cache-aware

Goal: lower TTFT under pressure

Python · gRPC · vLLM · KV Cache · Distributed

nimbusmesh · routing

Category: Distributed Systems

Stack: Python · gRPC · vLLM

Year: 2025
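
Cache-aware routing, as named in the NimbusMesh-X summary, can be illustrated with a small sketch. Everything here is an assumption made for the example: the block size, the `Worker` class, and the `route` function are hypothetical, prefixes are stored as plain tuples rather than hashes, and the sketch covers only the routing decision, not cross-node cache sharing. A request goes to the worker that already holds the longest cached prefix of its prompt, with the lighter load breaking ties.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Set, Tuple

BLOCK = 4  # tokens per cache block; an assumed value for this sketch

PrefixKey = Tuple[int, ...]


def block_keys(tokens: Sequence[int], block: int = BLOCK) -> List[PrefixKey]:
    """One key per full block; key i identifies the prefix tokens[:(i + 1) * block]."""
    return [tuple(tokens[:end]) for end in range(block, len(tokens) + 1, block)]


@dataclass
class Worker:
    name: str
    load: int = 0
    cached: Set[PrefixKey] = field(default_factory=set)

    def cached_blocks(self, tokens: Sequence[int]) -> int:
        """Count leading blocks of this prompt that are already cached here."""
        count = 0
        for key in block_keys(tokens):
            if key not in self.cached:
                break
            count += 1
        return count


def route(workers: List[Worker], tokens: Sequence[int]) -> Worker:
    """Pick the worker with the longest cached prefix, then the lowest load."""
    best = max(workers, key=lambda w: (w.cached_blocks(tokens), -w.load))
    best.cached.update(block_keys(tokens))  # the prefill populates its cache
    best.load += 1
    return best
```

Two prompts that share their first block land on the same worker, so the second one skips that part of the prefill; an unrelated prompt falls through to the least-loaded worker.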

03 / 06 AI Inference Optimization

EigenKache

When long context has to fit a hard memory budget, most KV caches just drop old tokens and lose the evidence. EigenKache compresses the cold middle into attention-conditioned landmarks instead, so the model keeps long-range recall. I benchmarked it against window and H2O-style eviction under the same budget.

Recall fidelity @ 8×: 0.91 cos vs full

vs window / H2O: 0.04 / 0.60

Coverage: 14 tests passing

Python · PyTorch · KV Cache · Compression · Long Context

eigenkache · explainer

Category: AI Inference

Stack: Python · PyTorch

Year: 2026
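
To make "compress the cold middle into landmarks" concrete, here is a toy sketch in plain Python. It is not EigenKache's algorithm: the grouping rule, the use of a single probe query to weight tokens, and the names `compress_middle`, `attend`, and `cosine` are all assumptions for illustration. The first `head` and last `tail` tokens are kept, each group of middle tokens is merged into one attention-weighted key/value pair, and fidelity is measured as the cosine similarity between attention outputs over the full and compressed caches.

```python
import math
from typing import List, Sequence, Tuple

Vec = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return dot(a, b) / (na * nb) if na > 0 and nb > 0 else 0.0


def attention_weights(query: Vec, keys: Sequence[Vec]) -> List[float]:
    """Softmax over scaled dot-product scores."""
    scale = math.sqrt(len(query))
    scores = [dot(query, k) / scale for k in keys]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_mean(vectors: Sequence[Vec], weights: Sequence[float]) -> Vec:
    total = sum(weights)
    dim = len(vectors[0])
    return [sum(w * v[d] for w, v in zip(weights, vectors)) / total for d in range(dim)]


def attend(query: Vec, keys: Sequence[Vec], values: Sequence[Vec]) -> Vec:
    """Attention output for one query over a KV cache."""
    return weighted_mean(values, attention_weights(query, keys))


def compress_middle(
    query: Vec, keys: List[Vec], values: List[Vec], head: int, tail: int, group: int
) -> Tuple[List[Vec], List[Vec]]:
    """Keep head and tail tokens; merge each middle group into one landmark."""
    n = len(keys)
    if head + tail > n or group < 1:
        raise ValueError("head + tail must fit in the cache and group must be >= 1")
    out_k, out_v = keys[:head], values[:head]
    for start in range(head, n - tail, group):
        end = min(start + group, n - tail)
        w = attention_weights(query, keys[start:end])  # weights within the group
        out_k.append(weighted_mean(keys[start:end], w))
        out_v.append(weighted_mean(values[start:end], w))
    return out_k + keys[n - tail:], out_v + values[n - tail:]
```

A window-eviction baseline under the same budget would simply drop the middle tokens; comparing `cosine(attend(q, keys, values), attend(q, *compressed))` for both approaches gives the kind of fidelity number reported above.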

04 / 06 6G / AI-Native Networking

Q-AIRAN-SLICE

5G and 6G networks split limited spectrum across users with very different needs. I framed that allocation as a QUBO (quadratic unconstrained binary optimization) and solved it with quantum-inspired local search, comparing against greedy and linear-programming baselines under a shared latency model that includes radio, transport, queueing, and inference.

Solver: QUBO local search

Latency model: radio + transport + inference

Baselines: greedy, linear program

Python · QUBO · 6G · O-RAN · Quantum-Inspired

q-airan · slice solver

Category: 6G Networking

Stack: Python · QUBO

Year: 2025
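
A QUBO formulation and a local-search solver can be shown on a toy problem. This sketch is an assumption-laden illustration, not the Q-AIRAN-SLICE solver: it encodes "pick exactly one resource block for one user" with a one-hot penalty, and it uses plain 1-flip descent with random restarts in place of whatever quantum-inspired search the project implements. The names `one_hot_qubo`, `energy`, and `local_search`, the costs, and the penalty weight are all hypothetical. Expanding P·(Σx − 1)² with x² = x gives a diagonal term of cost − P and an off-diagonal term of 2P, with the constant dropped.

```python
import random
from typing import Dict, List, Sequence, Tuple

# Upper-triangular QUBO coefficients keyed by (i, j) with i <= j.
Qubo = Dict[Tuple[int, int], float]


def one_hot_qubo(costs: Sequence[float], penalty: float) -> Qubo:
    """Minimise cost subject to 'exactly one variable set', as a penalty term."""
    q: Qubo = {}
    n = len(costs)
    for i in range(n):
        q[(i, i)] = costs[i] - penalty
        for j in range(i + 1, n):
            q[(i, j)] = 2.0 * penalty
    return q


def energy(q: Qubo, x: Sequence[int]) -> float:
    """Objective value x^T Q x for a binary assignment x."""
    return sum(c * x[i] * x[j] for (i, j), c in q.items())


def local_search(q: Qubo, n: int, restarts: int = 20, seed: int = 0) -> Tuple[List[int], float]:
    """1-flip descent from random starts; returns the best local minimum found."""
    rng = random.Random(seed)
    best_x: List[int] = []
    best_e = float("inf")
    for _ in range(restarts):
        x = [rng.randint(0, 1) for _ in range(n)]
        e = energy(q, x)
        improved = True
        while improved:
            improved = False
            for i in range(n):
                x[i] ^= 1  # try flipping bit i
                e_new = energy(q, x)
                if e_new < e:
                    e, improved = e_new, True
                else:
                    x[i] ^= 1  # no gain: undo the flip
        if e < best_e:
            best_x, best_e = x[:], e
    return best_x, best_e
```

With costs `[3, 1, 2]` and penalty `10`, the assignment `[0, 1, 0]` has energy `-9`, the lowest of the one-hot states. Every one-hot state is a local minimum under single flips, which is why the sketch needs restarts and why real solvers use richer moves.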

05 / 06 Real-Time ML · Edge AI

NeuroStreamRT

EEG screening models are scored offline, but real deployment means streaming two-second windows at 256 Hz under a sub-100 ms budget. NeuroStreamRT benchmarks accuracy and per-window latency together across architectures on the OpenNeuro ds004504 dataset (88 subjects, leave-one-subject-out).

Stream latency P50: 0.045 ms

Stream → batch: 3.0× speedup

Deadline budget: < 100 ms / window

PyTorch · ONNX · Real-Time · Edge AI · Benchmark

neurostream · latency harness

Category: Real-Time ML

Stack: PyTorch · ONNX

Year: 2026
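
A per-window latency harness of the kind described for NeuroStreamRT might look like the sketch below. Two-second windows at 256 Hz are 512 samples each, and the 100 ms deadline comes from the text above; the one-second hop, the nearest-rank percentile, and the names `windows`, `measure_latencies_ms`, and `percentile` are assumptions for this example, and `model` is any callable standing in for a real network.

```python
import math
import time
from collections import deque
from typing import Callable, Deque, Iterable, Iterator, List, Sequence

SAMPLE_RATE_HZ = 256
WINDOW_SAMPLES = 2 * SAMPLE_RATE_HZ  # two-second window = 512 samples
HOP_SAMPLES = SAMPLE_RATE_HZ         # assumed: emit a new window every second
DEADLINE_MS = 100.0


def windows(
    stream: Iterable[float], size: int = WINDOW_SAMPLES, hop: int = HOP_SAMPLES
) -> Iterator[List[float]]:
    """Yield overlapping windows as samples arrive, one sample at a time."""
    buf: Deque[float] = deque(maxlen=size)
    since_emit = 0
    for sample in stream:
        buf.append(sample)
        since_emit += 1
        if len(buf) == size and since_emit >= hop:
            since_emit = 0
            yield list(buf)


def measure_latencies_ms(
    model: Callable[[List[float]], object], stream: Iterable[float]
) -> List[float]:
    """Time the model on each streamed window; returns milliseconds per window."""
    latencies = []
    for window in windows(stream):
        start = time.perf_counter()
        model(window)
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


def percentile(values: Sequence[float], p: float) -> float:
    """Nearest-rank percentile, e.g. p=50 for the median latency."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100.0 * len(ordered)))
    return ordered[rank - 1]


def meets_deadline(latencies_ms: Sequence[float], p: float = 99.0) -> bool:
    """True when the chosen tail percentile stays under the per-window budget."""
    return percentile(latencies_ms, p) < DEADLINE_MS
```

Reporting a tail percentile next to P50 matters for a deadline budget, because a model with a fast median can still miss the deadline on its slowest windows.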

06 / 06 Networking · Space Systems

SpaceLinkBench

TCP assumes a link that behaves consistently. Space links don't. SpaceLinkBench models a non-ergodic channel with Gilbert-Elliott bursts, runs TCP over it, and detects when the link's behavior shifts so the transport can react instead of stalling.

Run-to-run spread: 3.1× (p5 0.15 → p95 0.46 Mbps)

Verdict: strongly non-ergodic

Coverage: 33 tests passing

Python · TCP · Networking · Simulation · Benchmark

spacelink · tcp sim

Category: Space Systems

Stack: Python · TCP

Year: 2026
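
The Gilbert-Elliott model mentioned above is a two-state Markov chain: the link is either in a good state or a bad state, each with its own loss probability, and it switches state with fixed per-packet probabilities. The sketch below is a minimal version under assumed parameters; the function names, the defaults, and the choice to sample loss before the state transition are illustrative and not taken from SpaceLinkBench.

```python
import random
from typing import List


def gilbert_elliott(
    n: int,
    p_good_to_bad: float,
    p_bad_to_good: float,
    loss_good: float = 0.0,
    loss_bad: float = 1.0,
    seed: int = 0,
) -> List[bool]:
    """Return a per-packet loss trace (True = lost), starting in the good state."""
    rng = random.Random(seed)
    bad = False
    lost = []
    for _ in range(n):
        # Loss is drawn from the current state's loss probability.
        lost.append(rng.random() < (loss_bad if bad else loss_good))
        # Then the channel may switch state before the next packet.
        if rng.random() < (p_bad_to_good if bad else p_good_to_bad):
            bad = not bad
    return lost


def stationary_bad_fraction(p_good_to_bad: float, p_bad_to_good: float) -> float:
    """Long-run fraction of time spent in the bad state."""
    return p_good_to_bad / (p_good_to_bad + p_bad_to_good)


def loss_rate(trace: List[bool]) -> float:
    return sum(trace) / len(trace) if trace else 0.0
```

Short traces generated with different seeds can show very different loss rates even with identical parameters, because a single long burst dominates a short run. That run-to-run variation is what the spread figure above quantifies for throughput.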

// what drives the work

AI Inference Optimization

KV cache constraints, attention approximation, speculative decoding. The gap between research throughput and production latency is the problem.

6G and AI-Native Networks

Intelligent RAN slicing, URLLC, O-RAN disaggregation. Networks that adapt to inference workloads, not the other way around.

Distributed Systems

Disaggregated architectures, fault-tolerant pipelines, latency under contention. Systems that degrade gracefully, not catastrophically.

Compute Optimization

Memory bandwidth vs compute tradeoffs, CUDA kernel design, operator fusion. Every cycle is a constraint worth reasoning about.

Robotics and Edge AI

Real-time perception, sensor fusion, ROS2 pipelines. AI that works when the network doesn't exist and latency is physical.

Quantum Computing

QUBO, QAOA, quantum-inspired optimization. Already in Q-AIRAN-SLICE; next step is a real circuit implementation.

GitHub Activity · aryanputta
