Generative AI Engineer with 2+ years of hands-on experience designing, developing, and deploying production-grade GenAI solutions using Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) pipelines. Builds scalable AI-powered backend services with strong expertise in Python, FastAPI, Google Cloud Platform (Vertex AI), vector databases, and agentic workflows. Proven track record of delivering enterprise-grade AI applications with a focus on performance optimization, observability, and security.
- LLM Integration: Claude, Gemini, OpenAI API
- RAG Pipelines: Embedding models, document chunking, indexing, retrieval optimization
- AI Frameworks: LangChain, LlamaIndex, Hugging Face Transformers, CrewAI, Agno
- Agentic Systems: Multi-agent workflows, tool integration, autonomous reasoning
- Prompt Engineering: Chain-of-thought, few-shot learning, output optimization
- Backend Development: Python, FastAPI, Django REST Framework, Flask
- APIs & Integration: REST APIs, Webhooks, OAuth2, JWT, Microservices
- Cloud Platforms: Google Cloud (Vertex AI, Cloud Run, Cloud Storage), AWS
- Databases: PostgreSQL, MySQL, MongoDB, Cassandra, Redis
- Vector Databases: FAISS, Chroma, Pinecone, OpenSearch
- Observability: Logging, monitoring, evaluation metrics (RAGAS)
- Performance: Latency optimization, caching, batch processing, cost reduction
- Security: Enterprise security, data governance, HIPAA compliance, IAM
- Deployment: Docker, Kubernetes, CI/CD, Cloud Run, serverless
Nuvae.ai | Aug 2025 – Feb 2026 | Remote
- Designed and deployed production-grade GenAI solutions using GPT-4, Claude, and Gemini for healthcare automation workflows
- Built end-to-end RAG pipelines with advanced embedding models, document chunking strategies, and optimized retrieval techniques
- Deployed scalable applications on GCP using Vertex AI, Cloud Run, and Cloud Storage, handling 50,000+ daily requests at sub-200ms latency
- Integrated vector databases (FAISS, Chroma) with hybrid retrieval, improving search accuracy by 35% and query performance by 50% (retrieval pattern sketched below)
- Developed agent-based workflows, tool integrations, and multi-step reasoning systems with LangChain and LlamaIndex
- Established ML evaluation metrics (RAGAS) for retrieval accuracy, response quality, and hallucination detection
- Applied prompt engineering techniques including chain-of-thought reasoning and systematic output optimization
- Optimized cloud costs by 40% through efficient model selection, caching strategies, and batch processing
- Implemented enterprise security and data governance with IAM policies, encryption, and privacy-compliant data handling
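A minimal sketch of the hybrid retrieval pattern referenced above (dense FAISS similarity blended with keyword overlap). The embed() function, toy document set, and blending weight are illustrative placeholders, not the production embedding model or corpus.

```python
# Hybrid retrieval sketch: dense FAISS similarity blended with keyword overlap.
# embed() is a placeholder for the real embedding model; docs is a toy corpus.
import numpy as np
import faiss

def embed(texts: list[str], dim: int = 384) -> np.ndarray:
    # Placeholder embeddings (random but deterministic per input); a real
    # pipeline would call an embedding model here.
    rng = np.random.default_rng(abs(hash(tuple(texts))) % (2**32))
    return rng.random((len(texts), dim), dtype=np.float32)

docs = ["prior authorization workflow", "patient eligibility check", "claims processing"]
doc_vecs = embed(docs)
faiss.normalize_L2(doc_vecs)                      # cosine similarity via inner product

index = faiss.IndexFlatIP(doc_vecs.shape[1])
index.add(doc_vecs)

def hybrid_search(query: str, k: int = 3, alpha: float = 0.7) -> list[tuple[str, float]]:
    """Blend dense (semantic) and sparse (keyword-overlap) scores."""
    q_vec = embed([query])
    faiss.normalize_L2(q_vec)
    dense_scores, ids = index.search(q_vec, k)

    q_terms = set(query.lower().split())
    results = []
    for score, i in zip(dense_scores[0], ids[0]):
        d_terms = set(docs[i].lower().split())
        keyword_score = len(q_terms & d_terms) / max(len(q_terms), 1)
        results.append((docs[i], alpha * float(score) + (1 - alpha) * keyword_score))
    return sorted(results, key=lambda r: r[1], reverse=True)

print(hybrid_search("eligibility workflow"))
```

Normalizing vectors and using an inner-product index makes the dense score a cosine similarity, so it combines cleanly with the 0–1 keyword-overlap score.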
AOT Technologies | Feb 2024 – Aug 2025 | Thiruvananthapuram, India
- Developed RAG pipelines for document intelligence platforms with embedding models, chunking algorithms, and semantic retrieval
- Built LLM-based applications using OpenAI and Gemini APIs serving 10,000+ daily users in production
- Integrated vector databases (FAISS, Chroma) with efficient indexing and retrieval for large-scale document collections
- Created scalable GenAI APIs using FastAPI with authentication, rate limiting, and comprehensive error handling (service pattern sketched below)
- Implemented prompt engineering strategies reducing hallucinations by 40% and improving response consistency
- Applied performance optimization including model quantization, caching, and async processing
- Established ML observability with logging, monitoring, and alerting ensuring high availability
- Worked with GCP services including Cloud Storage and Cloud Functions for serverless processing
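An illustrative outline of the FastAPI service pattern described above: API-key checks, a naive in-memory rate limit, and error handling around a stubbed LLM call. llm_answer() and the key store are stand-ins for the real model call and secret management, not the production code.

```python
# Illustrative FastAPI service shape: API-key auth, naive in-memory rate limit,
# and error handling around a stubbed LLM call.
import time
from collections import defaultdict

from fastapi import Depends, FastAPI, Header, HTTPException
from pydantic import BaseModel

app = FastAPI()
RATE_LIMIT, WINDOW_S = 30, 60            # 30 requests per rolling minute per key
VALID_KEYS = {"demo-key"}                # stand-in for a real secret store
_hits: dict[str, list[float]] = defaultdict(list)

class AskRequest(BaseModel):
    question: str

def check_key(x_api_key: str = Header(...)) -> str:
    if x_api_key not in VALID_KEYS:
        raise HTTPException(status_code=401, detail="invalid API key")
    now = time.monotonic()
    _hits[x_api_key] = [t for t in _hits[x_api_key] if now - t < WINDOW_S]
    if len(_hits[x_api_key]) >= RATE_LIMIT:
        raise HTTPException(status_code=429, detail="rate limit exceeded")
    _hits[x_api_key].append(now)
    return x_api_key

def llm_answer(question: str) -> str:
    # Placeholder for the actual LLM / RAG call.
    return f"stub answer for: {question}"

@app.post("/ask")
def ask(req: AskRequest, _key: str = Depends(check_key)) -> dict:
    try:
        return {"answer": llm_answer(req.question)}
    except Exception as exc:             # surface provider failures as a 502
        raise HTTPException(status_code=502, detail=str(exc)) from exc
```

In production the in-memory counter would be replaced by a shared store such as Redis so the limit holds across instances.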
Tech: Python, LangChain, Vertex AI, Cloud Run, FAISS, FastAPI
- Architected comprehensive RAG pipeline on GCP with Vertex AI for LLM serving and Cloud Storage for document management
- Implemented advanced chunking strategies (recursive, semantic) and embedding models (text-embedding-ada-002, Vertex AI embeddings), with the chunking approach sketched below
- Built vector database integration with FAISS and Chroma using hybrid retrieval (semantic + keyword matching)
- Developed scalable API deployment on Cloud Run with auto-scaling, load balancing, and IAM-based security
- Applied prompt engineering with context optimization and evaluation using RAGAS metrics
- Integrated LangChain for agent orchestration, tool calling, and multi-step reasoning with memory management
- Achieved 45% cost reduction through batch processing, caching strategies, and optimized model selection
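A simplified, from-scratch version of the recursive chunking strategy named above, written only to illustrate the idea: split on the coarsest separator first and recurse into pieces that are still too large. It is not the production splitter, and the sample text is invented.

```python
# Simplified recursive chunker: coarse separators first, recurse on oversized pieces.
def recursive_chunk(text: str, max_chars: int = 500,
                    separators: tuple[str, ...] = ("\n\n", "\n", ". ", " ")) -> list[str]:
    if len(text) <= max_chars:
        return [text] if text.strip() else []
    for sep in separators:
        parts = text.split(sep)
        if len(parts) == 1:
            continue                      # separator not present; try a finer one
        chunks: list[str] = []
        buf = ""
        for part in parts:
            candidate = f"{buf}{sep}{part}" if buf else part
            if len(candidate) <= max_chars:
                buf = candidate
            else:
                if buf:
                    chunks.append(buf)
                if len(part) > max_chars:
                    chunks.extend(recursive_chunk(part, max_chars, separators))
                    buf = ""
                else:
                    buf = part
        if buf:
            chunks.append(buf)
        return chunks
    # No separator helped: hard-split as a last resort.
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

sample = "Intro paragraph.\n\n" + "Eligibility rules apply to every claim. " * 30
for i, chunk in enumerate(recursive_chunk(sample)):
    print(i, len(chunk))
```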
Tech: LangChain, Vertex AI, Cloud Functions, Pinecone, FastAPI
- Developed GenAI application using Vertex AI for model deployment and Cloud Functions for event-driven processing
- Built RAG system with Pinecone vector database and advanced retrieval techniques for medical data
- Created LangChain-based agents with tool integration enabling autonomous decision-making workflows (agent loop sketched below)
- Implemented performance optimization achieving sub-500ms response times
- Established HIPAA-compliant data governance ensuring privacy and secure handling of healthcare information
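A bare-bones tool-calling loop showing the agent pattern described above. fake_llm() and check_eligibility() are hypothetical stand-ins for the real model and tools; the project itself used LangChain's agent tooling rather than this hand-rolled loop.

```python
# Bare-bones tool-calling loop: the model either requests a tool or answers.
import json

def check_eligibility(member_id: str) -> dict:
    # Placeholder tool: a real version would query an eligibility service.
    return {"member_id": member_id, "eligible": True, "plan": "PPO"}

TOOLS = {"check_eligibility": check_eligibility}

def fake_llm(messages: list[dict]) -> dict:
    # Stand-in for an LLM that either requests a tool call or answers directly.
    if not any(m["role"] == "tool" for m in messages):
        return {"tool": "check_eligibility", "args": {"member_id": "M123"}}
    return {"answer": "Member M123 is eligible under a PPO plan."}

def run_agent(question: str, max_steps: int = 5) -> str:
    messages = [{"role": "user", "content": question}]
    for _ in range(max_steps):
        decision = fake_llm(messages)
        if "answer" in decision:
            return decision["answer"]
        result = TOOLS[decision["tool"]](**decision["args"])   # execute the requested tool
        messages.append({"role": "tool", "content": json.dumps(result)})
    return "Step limit reached without a final answer."

print(run_agent("Is member M123 eligible for coverage?"))
```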
Tech: Python, FastAPI, LLMs, REST APIs
- Developed backend service orchestrating multi-step eligibility and pre-auth workflows using AI-assisted decision logic
- Integrated multiple LLM providers with proper authentication, rate limiting, and fallback strategies (fallback pattern sketched below)
- Focused on correctness, validation, and observability in production healthcare environments
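A sketch of the multi-provider fallback pattern mentioned above: try providers in order of preference, retry with exponential backoff, then fall back. The provider call functions are placeholders rather than real SDK clients.

```python
# Multi-provider fallback sketch: retry each provider with backoff, then fall back.
import time

class ProviderError(Exception):
    pass

def call_primary(prompt: str) -> str:
    raise ProviderError("primary provider unavailable")   # simulate an outage

def call_secondary(prompt: str) -> str:
    return f"secondary-provider answer for: {prompt}"

PROVIDERS = [call_primary, call_secondary]                 # ordered by preference

def generate_with_fallback(prompt: str, retries_per_provider: int = 2) -> str:
    last_error = None
    for call in PROVIDERS:
        for attempt in range(retries_per_provider):
            try:
                return call(prompt)
            except ProviderError as exc:
                last_error = exc
                time.sleep(0.1 * (2 ** attempt))           # exponential backoff before retry
    raise RuntimeError(f"all providers failed: {last_error}")

print(generate_with_fallback("Summarize the pre-auth requirements."))
```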
Amity University | 2024 – 2027
B.Sc. – Data Analysis
Central Polytechnic College | 2021 – 2024
Diploma – Computer Engineering
- LangGraph for Agentic Workflows – Advanced AI agent development with state management
- Practical Multi-Agent Systems (CrewAI) – Building collaborative AI workflows
- Develop GenAI Apps with Gemini – Google Cloud Vertex AI and Gemini integration
- Google Cloud Platform – Vertex AI, Cloud Run, Cloud Storage, IAM
- Data Science with Python – ML evaluation and performance optimization
Building GenAI solutions? Let's collaborate on RAG pipelines, agentic workflows, or LLM applications.


