Infrastructure & AI Systems

SAHIL KHUNDIYA

Engineering distributed systems, intelligent retrieval pipelines, and production-grade AI architectures. Optimizing high-throughput backends and scalable AI infrastructure.


Observability

System Impact Metrics

Live Monitoring Active

Region: Asia-South1

Throughput Optimization

P99 Latency: Validated | Production Mode

Query Latency Reduction


Algorithmic Mastery


Production Systems


Kafka Event Velocity



RAG Retrieval Accuracy


Trajectory

Engineering Evolution

The progression from building interfaces to architecting high-scale intelligent infrastructure.

Monolithic Origins

Foundational Backend Engineering & REST APIs.

Java, Spring, PostgreSQL

Architecture Complexity

Validated Phase 01

Distributed Transition

Scalable Microservices & System Integration.

Microservices, Docker, Hibernate


Validated Phase 02

Event-Driven Velocity

High-Throughput Asynchronous Pipelines.

Kafka, JMS, Event Streams


Validated Phase 03

Intelligent Retrieval

Production RAG & Vector Infrastructure.

Vector DBs, Semantic Search


Validated Phase 04

Knowledge Orchestration

Graph-Augmented Intelligent Systems.

Knowledge Graphs, Neo4j, LLM Routing


Validated Phase 05

Elite AI Infrastructure

Architecting Scalable Hybrid AI Platforms.

Distributed AI, System Reliability


Validated Phase 06

Deep Dive

Hybrid AI Infrastructure

Engineering a high-fidelity retrieval system for production AI agents.

Case Study: Retrieval 2.0 | System: Distributed RAG

Optimizing Context Fusion

To move beyond simple RAG, we built a **Tri-Search Engine** that executes parallel queries across a **Knowledge Graph**, a **Vector Index**, and **Full Text Search**. The key engineering challenge was "Context Fusion": merging the results from these disparate sources into a single ranked set of documents that minimizes LLM hallucination while maximizing recall.
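The parallel fan-out can be sketched roughly as follows. This is a minimal illustration, not the production engine: the three `search_*` functions are hypothetical stand-ins that return hard-coded document IDs, and a thread pool is just one assumed way to run the backends concurrently.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for the three backends; each returns doc IDs, best match first.
def search_graph(query: str) -> list[str]:
    return ["doc-7", "doc-2"]

def search_vector(query: str) -> list[str]:
    return ["doc-2", "doc-9", "doc-7"]

def search_fulltext(query: str) -> list[str]:
    return ["doc-9", "doc-2"]

BACKENDS = {
    "graph": search_graph,
    "vector": search_vector,
    "fulltext": search_fulltext,
}

def tri_search(query: str) -> dict[str, list[str]]:
    """Run every backend concurrently and collect one ranked list per backend."""
    with ThreadPoolExecutor(max_workers=len(BACKENDS)) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in BACKENDS.items()}
        results: dict[str, list[str]] = {}
        for name, future in futures.items():
            try:
                results[name] = future.result()
            except Exception:
                # A failing backend costs some recall instead of failing the whole query.
                results[name] = []
        return results
```

The per-backend ranked lists are what a fusion step would then merge into one ranking.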

Routing Engine

Custom intent classification layer using lightweight BERT models to route queries between exact search and semantic retrieval.
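A rough sketch of the routing step, under stated assumptions: the classifier is pluggable, and the regex heuristic below is only a stand-in for the lightweight BERT model so the example stays self-contained. The labels `exact` and `semantic` and the path names are hypothetical.

```python
import re
from typing import Callable

# Stand-in for the BERT intent classifier: any callable mapping a query to a label.
# This regex heuristic is an assumption for illustration, not the deployed model.
IDENTIFIER = re.compile(r"\b[A-Z]{2,}-\d+\b|\b\d{4,}\b|\"[^\"]+\"")

def heuristic_intent(query: str) -> str:
    """Label queries with ticket-style IDs, long numbers, or quoted phrases as 'exact'."""
    return "exact" if IDENTIFIER.search(query) else "semantic"

# Hypothetical names for the two retrieval paths.
ROUTES = {"exact": "full_text_search", "semantic": "vector_search"}

def route(query: str, classify: Callable[[str], str] = heuristic_intent) -> str:
    """Return the retrieval path for a query; unknown labels fall back to semantic search."""
    return ROUTES.get(classify(query), "vector_search")
```

Swapping in a real model would only mean passing a different `classify` callable; the routing table stays the same.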

Context Fusion

Reciprocal Rank Fusion (RRF) implementation that merges the vector and keyword search results by rank position, so their incomparable raw scores never need to be normalized against each other.
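For reference, a minimal sketch of Reciprocal Rank Fusion. Each document scores the sum of `1 / (k + rank)` over every list it appears in. The constant `k = 60` is the value commonly used in the literature, and is an assumption here rather than the setting used in this system; the function name and sample IDs are illustrative.

```python
from collections import defaultdict

def reciprocal_rank_fusion(
    rankings: list[list[str]], k: int = 60
) -> list[tuple[str, float]]:
    """Fuse ranked lists of doc IDs; only rank positions are used, never raw scores."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID for a deterministic order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

vector_hits = ["a", "b", "c"]   # hypothetical vector-search ranking
keyword_hits = ["b", "d"]       # hypothetical keyword-search ranking
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
```

Here `b` ranks first because it appears in both lists, even though it tops only one of them.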

Status: Deployed to Production

Engineering Logic

Why PostgreSQL FTS?

Lower infrastructure overhead. By using PostgreSQL's native inverted (GIN) indexes alongside our vector store, we reduced network-hop latency by 15 ms.
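As an illustration of what this looks like in practice, here is a sketch of a GIN-backed full-text query. The `documents(id, body)` schema and the index name are hypothetical; `to_tsvector`, `plainto_tsquery`, `ts_rank`, and the `@@` match operator are standard PostgreSQL. The helper only builds the statement and parameters, so it runs without a database.

```python
# Hypothetical schema: documents(id, body). The SQL functions are standard PostgreSQL.
CREATE_INDEX_SQL = """
CREATE INDEX IF NOT EXISTS documents_body_fts
    ON documents USING GIN (to_tsvector('english', body));
"""

SEARCH_SQL = """
SELECT id,
       ts_rank(to_tsvector('english', body), plainto_tsquery('english', %(q)s)) AS score
FROM documents
WHERE to_tsvector('english', body) @@ plainto_tsquery('english', %(q)s)
ORDER BY score DESC
LIMIT %(limit)s;
"""

def build_fts_query(text: str, limit: int = 10) -> tuple[str, dict]:
    """Return (sql, params) for a DB-API driver that accepts pyformat placeholders."""
    if limit < 1:
        raise ValueError("limit must be positive")
    return SEARCH_SQL, {"q": text, "limit": limit}
```

With a driver such as psycopg, the pair can be passed straight to `cursor.execute(sql, params)`. The `WHERE` expression matches the indexed expression, which is what lets the planner use the GIN index.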

Dual Retrieval Pipeline

Vector search captures "concepts", while FTS captures "specific identifiers". Combining the two closed the 20% recall gap we faced with pure vector search.

Outcomes

40%

Precision Increase

-20%

Hallucination Rate

~85ms

P95 Retrieval Latency

Stack Overview

PostgreSQL

Vector: ChromaDB

Stream: Kafka

Lang: Python/Java

Evidence

Production Artifacts

Real-world system designs and performance optimizations.

Distributed Event Topology

Production-grade Kafka cluster architecture designed for 100k+ events/sec.

Query Plan Optimization

PostgreSQL P99 latency reduction through advanced indexing and query plan analysis.

Infrastructure

Live System Simulation

The lifecycle of an intelligent query through a distributed retrieval architecture.

Stages: User Query, LLM Router, Vector Index, Knowledge Graph, Context Fusion, Production LLM

System Status

Kafka Streams: ACTIVE

RAG Pipeline: ACTIVE

LLM Router: IDLE

Philosophy

Beyond the CRUD API.

My approach to engineering is centered on **reliability at scale**. I believe that any system that doesn't account for network failures, database lock contention, and traffic spikes isn't production-ready.

I prioritize **Clean Architecture** and **SOLID principles** not as dogmas, but as practical tools to reduce the cost of change in complex distributed systems.

"Simple is hard, but scalable is impossible without simplicity."

Fault Isolation

Implementing bulkhead patterns and circuit breakers to ensure service failures stay contained.
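A minimal circuit breaker sketch to make the idea concrete. The thresholds, class names, and injectable clock are assumptions for illustration; a production system would more likely rely on an established resilience library than hand-rolled code.

```python
import time

class CircuitOpenError(RuntimeError):
    """Raised when the breaker rejects a call without invoking the dependency."""

class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_after_s=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self.clock = clock          # injectable so the cool-down can be tested
        self.failures = 0
        self.opened_at = None       # None means the circuit is closed

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after_s:
                raise CircuitOpenError("circuit open; failing fast")
            # Cool-down elapsed: half-open, so let one trial call through.
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()   # open, or re-open after a failed trial
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

While the circuit is open, callers fail immediately instead of queuing on a dependency that is already struggling, which is what keeps the failure contained.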

Event-Driven Systems

Leveraging asynchronous message brokers for decoupled architectures and reliable state propagation.

Observability-First

Designing systems with distributed tracing and structured logging for P99 latency monitoring.
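As a small illustration, P99 can be computed from raw latency samples and emitted as a structured log line. This sketch uses the nearest-rank percentile definition; the field names and the `trace_id` value are hypothetical, and a real deployment would normally get percentiles from its metrics backend rather than from application code.

```python
import json
import math

def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples <= it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

def latency_log_line(service: str, trace_id: str, samples_ms: list[float]) -> str:
    """Render one JSON log record that a log pipeline can parse without regexes."""
    return json.dumps(
        {
            "service": service,
            "trace_id": trace_id,
            "count": len(samples_ms),
            "p50_ms": percentile(samples_ms, 50),
            "p99_ms": percentile(samples_ms, 99),
        },
        sort_keys=True,
    )
```

Logging P99 next to P50 makes tail latency visible even when the median looks healthy.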

Distributed State

Managing data consistency across microservices using Sagas and Outbox patterns.
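The Outbox half of this can be sketched with an in-memory SQLite database. The `orders` and `outbox` tables, the topic name, and the `publish` callback are all hypothetical; in a real system the relay would publish to a broker such as Kafka instead of calling a local function.

```python
import json
import sqlite3

def init_schema(conn: sqlite3.Connection) -> None:
    conn.executescript(
        """
        CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT NOT NULL);
        CREATE TABLE outbox (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            topic TEXT NOT NULL,
            payload TEXT NOT NULL,
            published INTEGER NOT NULL DEFAULT 0
        );
        """
    )

def place_order(conn: sqlite3.Connection, order_id: int) -> None:
    # The state change and its event commit atomically in one local transaction.
    with conn:
        conn.execute("INSERT INTO orders (id, status) VALUES (?, 'PLACED')", (order_id,))
        conn.execute(
            "INSERT INTO outbox (topic, payload) VALUES (?, ?)",
            ("orders.placed", json.dumps({"order_id": order_id})),
        )

def relay_outbox(conn: sqlite3.Connection, publish) -> int:
    """Publish pending events in insertion order; return how many were sent."""
    rows = conn.execute(
        "SELECT id, topic, payload FROM outbox WHERE published = 0 ORDER BY id"
    ).fetchall()
    for row_id, topic, payload in rows:
        publish(topic, payload)  # at-least-once: a crash here re-sends on the next pass
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    return len(rows)
```

Because delivery is at-least-once, consumers of these events need to be idempotent.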

System Reliability Whiteboard

Architecting for Resilience

Backoff

Retry Strategy

Timeout

Fault Isolation

Bulkhead

Resource Segregation

Circuit

Stateful Resilience
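The backoff and retry entries on the whiteboard can be sketched as follows. This is an illustrative example using the "full jitter" variant of exponential backoff; the base delay, cap, and attempt count are assumed values, and the `sleep` and `rng` parameters are injectable only to keep the sketch testable.

```python
import random
import time

def backoff_delays(attempts, base_s=0.1, cap_s=5.0, rng=random.random):
    """Yield one delay per retry: rng() scaled by min(cap_s, base_s * 2**n)."""
    for n in range(attempts):
        yield rng() * min(cap_s, base_s * (2 ** n))

def retry(fn, attempts=4, sleep=time.sleep, **backoff_kwargs):
    """Call fn until it succeeds; back off between tries and re-raise the last error."""
    delays = backoff_delays(attempts - 1, **backoff_kwargs)
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(next(delays))
```

Randomizing each delay spreads retries from many clients over time, so a recovering service is not hit by synchronized retry waves.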

Arsenal

Technology Stack

Java (Language)

Spring Boot (Framework)

Kafka (Streaming)

PostgreSQL (Database)

Microservices (Architecture)

RAG Systems

Knowledge Graphs (Data Structure)

Vector DB (AI Infrastructure)

Docker (DevOps)

Hibernate (ORM)

Trajectory

Engineering Experience

Amantya Technologies

Trainee Engineer – Software Development

2023 - Present

Developing scalable backend microservices and intelligent systems.

Engineered scalable backend microservices using Spring Boot and Hibernate/JPA.
Implemented high-throughput async processing with Kafka and JMS.
Optimized PostgreSQL queries for production-grade performance.
Contributed to HLD/LLD for distributed system modules.

Samsung SDS

Developer Intern

2022

Contributed to enterprise backend modules and system integration.

Developed core backend modules using Java/Spring framework.
Optimized complex SQL queries for data-intensive operations.
Assisted in integrating various system modules into the main pipeline.

Coding Ninjas

Teaching Assistant

2021

Mentored students in Data Structures and Algorithms.

Solved 1,100+ DSA problems across various platforms.
Assisted students in debugging complex algorithmic problems.
Conducted doubt-clearing sessions for 500+ students.

Roadmap

Currently Exploring

The future of my engineering focus, moving towards deep infrastructure and autonomous agentic systems.

In Progress

Distributed AI Systems

Scaling LLM inference and training across distributed clusters.

Researching

Graph Retrieval (GRAG)

Combining graph traversal with vector search for deep semantic context.

Core Focus

Reliability Engineering

Advanced observability and chaos engineering in backend systems.

Planning

Event-Driven AI

Real-time agentic workflows triggered by streaming data events.

Consistency

Engineering Pulse

2000+ Commits in 2023


Backend Engineering Focus
