aryanputta/README.md


// work

transformer-servex
Production KV cache optimization · MoE routing · IO-aware attention for long-context LLMs

cuda-netopt
ML-driven TCP/UDP packet scheduling · DQN network routing · CUDA queue scoring

AeroMimic
Behavior cloning from expert pilots · real-time MAV autonomy · onboard inference stack

aerosurrogate-control-stack
CFD surrogate modeling · constrained optimization · robustness replacing FEM solvers



// research

satellite telemetry anomaly detection
100K telemetry readings · 5 NASA/ESA fault modes · recurrence-plot CV · 0.91 F1 on Kepler-class wheel oscillation
PDF · repo
bell labs ml impact analysis
71-paper corpus · semantic clustering · co-authorship networks · Gradient Boosting AUC 0.674 · SHAP attribution
PDF · repo



// open source

17 merged pull requests across NVIDIA · Hugging Face · IBM · LinkedIn · Microsoft · others, covering inference, CUDA, and ML-systems internals


NVIDIA/cuda-python#2087
FIPS-safe hashes for program cache keys
NVIDIA/cuda-quantum#4688
nvqpp: discriminate measured-register bool iteration
huggingface/accelerate#4054
Aggregate profiler memory example
Dao-AILab/flash-attention#2622
weights_only=True across all torch.load sites
ai-dynamo/dynamo#10281
HTTP 415 for unsupported image formats
linkedin/Liger-Kernel#1157
Guard save_for_backward on grad_bias in fused linear CE



// stack



// metrics


Pinned

  1. KVCacheX Public

    Memory-aware LLM inference optimizer for KV cache compression, eviction, and scheduling.

    Python

  2. adaptive-compute-runtime Public

    Adaptive C++/CUDA runtime that profiles workloads at submission time and dynamically routes them to CPU, GPU, or batched execution based on arithmetic intensity and transfer cost.

    C++

  3. Helios Public

    Hardware-aware compute runtime in C++ and CUDA for real sparse, dense, and graph workloads.

    C++

  4. IBM/aiu-trace-analyzer Public

    A tool to post-process json trace files for IBM-AIU performance analysis. It enhances the traces with additional statistics extracted from the trace data itself and (optionally) by combining it wit…

    Python · 13 stars · 7 forks