## 🚀 About Me
I build production-grade AI systems for complex regulated environments — where correctness, reliability, security, and explainability matter more than benchmarks.
My work sits at the intersection of:
- 📄 Document Intelligence (OCR + layout + page understanding + field extraction)
- 🧠 Multimodal AI (LLMs / SLMs / VLMs) for real healthcare workflows
- ⚙️ AI Platform Engineering (on-prem, cloud, gov cloud; microservices; CI/CD)
- ⚡ GPU + inference systems (CUDA, NVIDIA drivers, custom builds, profiling)
I’m passionate about Responsible AI in healthcare — making care faster and easier while staying audit-friendly, verifiable, and compliant.
**Featured project:** a production platform that processes large medical claim PDFs end-to-end:
- OCR + document splitting
- Page classification (DistilBERT + rules) to categorize page types
- Layout detection (YOLO) to locate page regions (titles, summary boxes, checkboxes, etc.)
- Region cropping to improve OCR/VLM accuracy and reduce compute
- Field extraction (beneficiary details, NPI/MBI, HCPCS/CPT, ICD codes, claim lines, etc.)
- Decision support system using RAG, answering nurse review questions with evidence + traceability
- Human-in-the-loop UI: nurses can verify exactly where an answer came from
- Built a full end-to-end service operations module (frontend + backend) rapidly using AI-assisted development (Cursor/Claude)
- Implemented an OCR service using Azure Document Intelligence with custom-trained models for prior auth cover sheets
- Deployed services using Azure Web Apps
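The stages above can be sketched as a simple pipeline. This is an illustrative toy, not the real system: every class, function, and rule here (`Page`, `classify_page`, the keyword heuristics) is a hypothetical stand-in for the DistilBERT/YOLO/extraction components described above.

```python
from dataclasses import dataclass, field

@dataclass
class Page:
    text: str
    page_type: str = "unknown"
    fields: dict = field(default_factory=dict)

def classify_page(page: Page) -> Page:
    # Stand-in for the DistilBERT + rules classifier: a keyword rule.
    if "HCPCS" in page.text or "CPT" in page.text:
        page.page_type = "claim_lines"
    elif "beneficiary" in page.text.lower():
        page.page_type = "cover_sheet"
    return page

def extract_fields(page: Page) -> Page:
    # Stand-in for layout detection + region cropping + field extraction.
    if page.page_type == "claim_lines":
        page.fields["codes"] = [t for t in page.text.split()
                                if t.isdigit() and len(t) == 5]
    return page

def process_document(pages: list[str]) -> list[Page]:
    # OCR + document splitting are assumed to have already produced
    # per-page text; each page then flows through classify -> extract.
    return [extract_fields(classify_page(Page(text=t))) for t in pages]
```

The real system swaps each stand-in for a model-backed service, but the classify-then-extract flow per page is the same shape.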
✅ Production Document Intelligence
- OCR pipelines, layout detection, page classification, field extraction
- Handling messy documents: handwriting, checkboxes, signatures, diagrams
✅ LLM/SLM/VLM Systems
- Model selection & benchmarking (LLaMA, Mistral, Phi-3, BioBERT, BioMedBERT, Bio-Mistral)
- RAG with citations + evidence grounding
- Guardrails using LangChain (safe inputs/outputs for regulated environments)
- Fine-tuning + CPU-first optimization via 4-bit quantization (C++/BitNet-style)
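As a minimal sketch of the idea behind 4-bit quantization: map float weights to 16 integer levels (-8..7) with a single per-tensor scale, then reconstruct. Production kernels (llama.cpp, BitNet-style C++) use per-block scales and packed storage; this toy only shows the round-trip.

```python
def quantize_4bit(weights: list[float]) -> tuple[list[int], float]:
    # One symmetric scale for the whole tensor; guard against all-zero input.
    scale = max(abs(w) for w in weights) / 7 or 1.0
    # Clamp to the signed 4-bit range [-8, 7].
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize_4bit(q: list[int], scale: float) -> list[float]:
    return [v * scale for v in q]
```

The round-trip error per weight is bounded by half the scale, which is why smaller per-block scales (rather than one per tensor) recover most of the accuracy in practice.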
✅ GPU + Inference Engineering
- Bare-metal GPU bring-up (drivers, CUDA toolkit, NVCC, kernels)
- H100 / L40S / A40 / RTX 4090 / RTX 5090 / DGX environments
- Inference runtimes: PyTorch, ONNX Runtime, llama.cpp
- Custom builds aligned to CUDA versions + GPU compute capability
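Aligning custom builds to GPU compute capability boils down to emitting the right `-gencode` flags. A small helper sketch, using NVIDIA's published compute capabilities for a few of the GPUs above (the helper itself and its name are illustrative, not a real tool):

```python
# Published compute capabilities for some of the GPUs listed above.
COMPUTE_CAPABILITY = {
    "A40": "8.6",       # Ampere
    "RTX 4090": "8.9",  # Ada Lovelace
    "L40S": "8.9",      # Ada Lovelace
    "H100": "9.0",      # Hopper
}

def gencode_flags(gpus: list[str]) -> list[str]:
    # One -gencode entry per distinct architecture, plus PTX for the
    # newest one so the binary can JIT-forward to future GPUs.
    archs = sorted({COMPUTE_CAPABILITY[g].replace(".", "") for g in gpus},
                   key=int)
    flags = [f"-gencode=arch=compute_{a},code=sm_{a}" for a in archs]
    flags.append(f"-gencode=arch=compute_{archs[-1]},code=compute_{archs[-1]}")
    return flags
```

Passing these flags to `nvcc` produces fat binaries with native SASS for each listed architecture and a PTX fallback for newer hardware.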
✅ Platform Engineering & Deployment
- Microservices architecture
- On-prem deployment on RHEL-based systems
- RPM packaging + systemd services for “install → services up”
- Containers + Kubernetes + ACR
- CI/CD (Azure DevOps) for RPMs + artifacts + deployments
- Experience across on-prem + cloud + gov cloud
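The "install → services up" pattern works by having the RPM drop a systemd unit and enable it in a `%post` scriptlet. A hypothetical unit sketch (service name, user, and paths are placeholders, not the real deployment):

```ini
# /usr/lib/systemd/system/docintel-api.service  (hypothetical example)
[Unit]
Description=Document intelligence API (example)
After=network-online.target
Wants=network-online.target

[Service]
Type=simple
User=docintel
ExecStart=/usr/bin/uvicorn app.main:app --host 127.0.0.1 --port 8000
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
```

With `Restart=on-failure`, a crashed service comes back without operator intervention, which is what makes a single `dnf install` sufficient to bring the stack up.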
✅ Reliability, Observability, Security
- Performance monitoring stack: Telegraf + InfluxDB + Chronograf
- System/service/GPU metrics and bottleneck analysis
- Production tuning: Uvicorn workers, CPU/I/O concurrency, DB pooling
- Security: HTTPS everywhere, cert-based auth, mTLS between services + to DB
✅ Leadership
- Led 15 interns (Label Studio + YOLO layout detection pipeline)
- Mentored 3 associates (install/test/validation, architecture onboarding)
- Acted as product-owner proxy / client-facing technical lead when needed
📝 Writing & portfolio:
- Medium: https://medium.com/@abhishekgoud1212
- Portfolio: https://abhi0323.github.io/Abhishek-Chandragiri-Portfolio/
If you're building:
- AI platforms in regulated environments
- Document intelligence at scale
- Evidence-grounded RAG systems
- GPU-backed inference services
Let’s connect: https://www.linkedin.com/in/abhishek-chandragiri/

