## About the Role

We are looking for a Senior Machine Learning Engineer specializing in Generative AI to join our AI/ML platform team. You will architect and ship production-grade ML systems, from model training and fine-tuning to real-time inference at scale. This is a high-impact, hands-on engineering role in which you will own the full lifecycle of the GenAI systems that power our products.

## What You Will Do

- **ML Pipeline Engineering**: Design and implement end-to-end ML pipelines using PyTorch, TensorFlow, and JAX. Build robust data ingestion, feature engineering, model training, evaluation, and deployment workflows on Kubernetes (EKS/GKE) with GPU/TPU orchestration.
- **LLM Systems & Fine-Tuning**: Fine-tune foundation models (LLaMA, Mistral, Gemma, GPT-class) using parameter-efficient fine-tuning (PEFT) methods such as LoRA and QLoRA, as well as full-parameter tuning. Implement RLHF/DPO alignment pipelines. Deploy models with vLLM, TGI, or TensorRT-LLM for low-latency serving.
- **Agentic AI & RAG Architectures**: Build multi-agent orchestration frameworks using LangChain, LangGraph, CrewAI, or custom agent loops. Design retrieval-augmented generation (RAG) pipelines with vector stores (Pinecone, Weaviate, pgvector, FAISS) and hybrid search.
- **Inference Infrastructure**: Optimize model serving for throughput and latency through quantization (GPTQ, AWQ, GGUF), speculative decoding, KV-cache optimization, and continuous batching. Deploy on AWS SageMaker, Vertex AI, or self-managed GPU clusters.
- **Evaluation & Observability**: Build evaluation frameworks for LLM outputs, including automated metrics (BLEU, ROUGE, BERTScore), LLM-as-judge pipelines, and human evaluation workflows. Implement production monitoring with guardrails, toxicity detection, and hallucination tracking.
- **Data Engineering for ML**: Design data pipelines for training-data curation, deduplication, and quality filtering at terabyte scale using Apache Spark, Ray, or Dask. Implement data versioning with DVC or lakeFS.
- **Technical Leadership**: Lead cross-functional projects spanning ML, backend, and infrastructure teams. Define technical roadmaps, conduct design reviews, mentor engineers, and drive architectural decisions.

## Minimum Qualifications

- B.Tech/M.Tech/MS in Computer Science, Machine Learning, or a related field, or equivalent practical experience
- 5+ years of hands-on ML engineering in production environments
- 5+ years of software engineering experience shipping production systems at scale
- 3+ years leading ML infrastructure work: model deployment (NVIDIA Triton Inference Server, TorchServe, vLLM), evaluation pipelines, and distributed training (DeepSpeed, FSDP, Megatron-LM)
- Strong proficiency in Python, PyTorch, and at least one cloud platform (AWS, GCP, or Azure)
- Experience with containerization (Docker, Kubernetes) and MLOps and CI/CD tooling for ML (MLflow, Kubeflow, Airflow)

## Preferred Qualifications

- M.Tech/PhD in ML, NLP, or Computer Vision
- 3+ years in a technical lead role setting direction for ML teams
- Production experience with GenAI: LLMs, multimodal (vision-language) models, diffusion models, and speech models
- Experience with model optimization: quantization, distillation, pruning, and mixture-of-experts (MoE) architectures
- Contributions to open-source ML projects or published research (e.g., NeurIPS, ICML, ACL, EMNLP)
- Experience building developer-facing AI APIs and SDKs

## Tech Stack

Python, PyTorch, TensorFlow, JAX, CUDA, Triton, vLLM, TensorRT-LLM, LangChain, LangGraph, FAISS, Pinecone, pgvector, PostgreSQL, Redis, Kafka, Apache Spark, Ray, Docker, Kubernetes, Terraform, AWS (SageMaker, Bedrock, EC2 P4/P5), GCP (Vertex AI, TPU), MLflow, Weights & Biases, Prometheus, Grafana