ServerCrash358/README.md

 


About Me

name: Shubhang S
role: Backend & DevOps Engineer
focus:
  - Production ML / LLM infrastructure (RAG, multi-agent systems)
  - Cloud-native infrastructure, GitOps & DevOps automation
  - Distributed systems & backend scalability
  - Observability & self-healing infra
developing:
  - advanced ML
  - DevOps
  - backend
  - cloud

I build production-grade ML & backend systems — from RAG and multi-agent LLM infrastructure down to the cloud-native platforms and GitOps pipelines that keep them running reliably.

  • Designing production ML systems — retrieval pipelines, rerankers, and reliability/safety layers for agentic LLM workloads
  • Shipping Kubernetes, GitOps, and IaC workflows with full observability (Prometheus + Grafana)
  • Strong focus on performance, reliability, and self-healing infrastructure
  • Hands-on with scalable backend architectures and automated, verifiable infra pipelines


Tech Stack

Languages

Backend & Data


Frontend

DevOps & Cloud



AI / ML


MLOps & Observability


Systems · Hardware · Blockchain


Featured Projects

Mnemosyne

A transactional safety layer for multi-agent LLM systems.

  • Deterministic replay & prefix rollback
  • Cryptographic provenance for agent actions
  • Verifier-gated consensus before commit

Python Multi-Agent LLM Reliability
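
To make the ideas in the bullets above concrete, here is a minimal sketch of a hash-chained action log with prefix rollback. It is an illustration only, not the project's actual implementation: the `ActionLog` class, its method names, and the `GENESIS` constant are hypothetical, and SHA-256 over canonical JSON is an assumed choice of provenance scheme.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder parent hash for the first entry


def _digest(agent, action, prev):
    """Hash one entry together with its parent hash (canonical JSON, SHA-256)."""
    body = {"agent": agent, "action": action, "prev": prev}
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


class ActionLog:
    """Hypothetical append-only log: each entry commits to the one before it."""

    def __init__(self):
        self.entries = []

    def append(self, agent, action):
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        entry = {"agent": agent, "action": action, "prev": prev,
                 "hash": _digest(agent, action, prev)}
        self.entries.append(entry)
        return entry["hash"]

    def verify(self):
        """Recompute the chain from the start; any edited entry breaks it."""
        prev = GENESIS
        for entry in self.entries:
            expected = _digest(entry["agent"], entry["action"], prev)
            if entry["prev"] != prev or entry["hash"] != expected:
                return False
            prev = entry["hash"]
        return True

    def rollback_to(self, length):
        """Prefix rollback: keep only the first `length` entries."""
        if not 0 <= length <= len(self.entries):
            raise ValueError("length out of range")
        self.entries = self.entries[:length]
```

Because every hash depends only on the entry and its parent, replaying the same sequence of actions produces the same chain, and truncating to a prefix leaves a log that still verifies.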

NimbusX

Schedules container workloads across compute providers to minimize cost.

  • Blockchain escrow for trustless provider settlement
  • Priority-based workload scheduling
  • Real-time pricing oracles for cost-aware placement

Go Blockchain Scheduling Cloud Cost
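
The scheduling bullets above can be sketched as a priority queue feeding a cheapest-fit placement step. This is a simplified illustration in Python for brevity (the project itself is tagged Go); the `Provider` fields, the `schedule` function, and the static prices standing in for a pricing oracle are all assumptions, and the escrow step is left out.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Provider:
    name: str
    price_per_cpu_hour: float  # assumed to come from a pricing oracle
    free_cpus: int


def schedule(workloads, providers):
    """Place workloads in priority order on the cheapest provider that fits.

    workloads: list of (priority, name, cpus); a lower number runs first.
    Returns {workload name: provider name, or None if nothing fits}.
    """
    # The submission index breaks ties so equal priorities stay in order.
    queue = [(prio, idx, name, cpus)
             for idx, (prio, name, cpus) in enumerate(workloads)]
    heapq.heapify(queue)

    placements = {}
    while queue:
        _, _, name, cpus = heapq.heappop(queue)
        candidates = [p for p in providers if p.free_cpus >= cpus]
        if not candidates:
            placements[name] = None  # no provider has enough capacity
            continue
        cheapest = min(candidates, key=lambda p: p.price_per_cpu_hour)
        cheapest.free_cpus -= cpus
        placements[name] = cheapest.name
    return placements
```

Higher-priority workloads claim the cheapest capacity first, so lower-priority ones spill over to pricier providers or stay unplaced.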

Lumina-RAG

Answers questions grounded in your own documents, built to run in prod.

  • Async FastAPI + PostgreSQL/pgvector (HNSW ANN) + Redis cache
  • Two-stage retrieval: vector search then cross-encoder rerank
  • Containerised, deployed to Kubernetes via GitOps, fully observable
  • Automated eval pipeline (MLflow + Prefect) & Terraform on AWS

FastAPI pgvector RAG Kubernetes Terraform
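
As a rough illustration of the two-stage retrieval bullet above, the sketch below runs a cheap vector similarity pass and then reorders the shortlist with a costlier scorer. It is not the project's code: brute-force cosine similarity stands in for the pgvector HNSW index, the word-overlap `overlap_score` function is a toy stand-in for a cross-encoder, and every name here is hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def overlap_score(query, text):
    """Toy reranker: fraction of query words that appear in the text."""
    words = set(query.lower().split())
    return len(words & set(text.lower().split())) / len(words) if words else 0.0


def retrieve(query_vec, query_text, docs, rerank_score, k_candidates=3, k_final=2):
    """docs is a list of (text, vector) pairs."""
    # Stage 1: shortlist by vector similarity (stand-in for an ANN index).
    candidates = sorted(docs, key=lambda d: cosine(query_vec, d[1]),
                        reverse=True)[:k_candidates]
    # Stage 2: rerank only the shortlist with the more expensive scorer.
    reranked = sorted(candidates, key=lambda d: rerank_score(query_text, d[0]),
                      reverse=True)
    return [text for text, _ in reranked[:k_final]]
```

The split keeps the expensive scorer off the full corpus: it only ever sees `k_candidates` documents per query.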

CertOps

Simulates, verifies, and certifies infra fixes before they hit prod.

  • Detects CrashLoopBackOff and system anomalies
  • Verifies remediation actions before execution
  • Improves debugging workflows in containerized systems

Kubernetes Automation DevOps
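
The detect-then-verify flow in the bullets above might look roughly like the sketch below. It assumes a pod object shaped like the JSON from `kubectl get pod -o json` (`status.containerStatuses[].state.waiting.reason`); the function names and the `simulate`/`execute` callbacks are hypothetical and are not taken from the project.

```python
def crash_looping_containers(pod):
    """Return names of containers whose waiting reason is CrashLoopBackOff."""
    names = []
    for status in pod.get("status", {}).get("containerStatuses", []):
        waiting = status.get("state", {}).get("waiting") or {}
        if waiting.get("reason") == "CrashLoopBackOff":
            names.append(status["name"])
    return names


def gated_remediation(pod, simulate, execute):
    """Run `execute` only after `simulate` approves the proposed fix.

    simulate(targets) -> bool is an assumed dry-run check;
    execute(targets) is an assumed callback that applies the fix.
    """
    targets = crash_looping_containers(pod)
    if not targets:
        return "healthy"
    if not simulate(targets):
        return "rejected"  # verification failed, nothing is executed
    execute(targets)
    return "executed"
```

Keeping the gate in one function means no remediation path can reach `execute` without first passing the simulation check.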

ThreatFind

Real-time network threat hunting combining vision + retrieval.

  • Vision Transformers (ViT) for anomaly detection
  • RAG pipeline powered by a Llama-based LLM
  • Scalable, security-focused ML inference

Python ViT RAG Security

Fake Reddit Sentiment Analyzer

6-class transformer-based NLP classifier.

  • Transformer architecture for multi-class sentiment
  • End-to-end training & evaluation workflow

PyTorch Transformers NLP


Current Focus

| Area | What I'm Exploring |
| --- | --- |
| Production ML / LLM Infra | RAG systems, rerankers, multi-agent LLM reliability & safety |
| Advanced ML | Transformers, ViT, retrieval & evaluation pipelines |
| DevOps & GitOps | Autonomous remediation, Kubernetes automation at scale |
| Cloud-Native Backend | Async APIs, distributed systems, AWS + Terraform IaC |
| Observability | Metrics, tracing & self-healing infrastructure |

Connect With Me

"Build systems that heal themselves."

Pinned

  1. Mnemosyne (Public)

    A transactional safety layer for multi-agent LLM systems: deterministic replay, cryptographic provenance, verifier-gated consensus, and prefix rollback.

    Python 4

  2. Lumina-RAG (Public)

    A production RAG API that answers questions grounded in your own documents, built the way it would run in production: async FastAPI, PostgreSQL + pgvector (HNSW ANN), Redis cache, two-stage retriev…

    Python

  3. CertOps (Public)

    CertOps is an experimental autonomous DevOps system that simulates, verifies, and certifies infrastructure remediation actions before executing them in production.

    Python 2

  4. NimbusX (Public)

    A decentralized cloud cost optimization protocol that schedules container workloads across compute providers using blockchain escrow, priority-based scheduling, and real-time pricing oracles.

    Go 1

  5. ThreatFind (Public)

    ThreatFind is a real-time autonomous network threat hunting system that combines Vision Transformers (ViT) for anomaly detection with a Retrieval-Augmented Generation (RAG) pipeline powered by Llam…

    Python 2

  6. optimal-racing-line (Public)

    Python 1