Saransh Surana

Data Science - ML - AI

Saransh • Data Science • AI

I ship AI systems that actually get used.

From messy data to deployed models – I care less about leaderboard scores and more about shipped systems that move metrics for real teams.

Python • LLMs & RAG • MLOps • Experimentation • Evaluation

What I like working on

  • Turning vague ideas into concrete experiments.
  • Building RAG + LLM systems that don't hallucinate.
  • Making dashboards and pipelines boringly reliable.

Where I've been

Ford (Data Science), Stony Brook (MS Data Science), and a handful of startups and research teams where I shipped models instead of just slide decks.

Experience

  • Built RAG-based structured extraction with JSON Schema guardrails, lifting parse success from 78% to 96% and cutting hallucinations by 25%.
  • Ran 50+ MLflow experiments across models, prompts, chunking, and retrieval to systematically tune extraction quality.
  • Shipped a FastAPI + Celery + Redis + PostgreSQL pipeline that processes 1.2k–2k records/hour with under 2% timeouts and 35% fewer repeat LLM calls, plus a React + TypeScript directory UI with search and filters.
Python • RAG • Gemini Flash • MLflow • FastAPI • Celery • Redis • PostgreSQL • React.js • TypeScript
  • Built Python pipelines to clean and preprocess unstructured data from web pages, PDFs, and other raw formats, version-controlled with Git for reproducibility and collaboration.
PySpark • NumPy • Pandas
  • Built scalable ETL pipelines in BigQuery and GCP to support end-to-end ML workflows for anomaly detection on manufacturing sensor streams.
  • Trained Isolation Forest and One-Class SVM models that achieved 78% recall and 73% precision for early fault detection.
  • Turned anomaly insights into fixes by explaining model behavior and findings to both engineering and leadership teams.
Python • scikit-learn • Pandas • matplotlib • GCP • Anomaly Detection
  • Deployed XGBoost demand-forecasting models on GCP with FastAPI and Docker, reducing forecast error by 18% across 30+ SKUs.
  • Designed Spark / Hive / Kafka pipelines and SQL-based ETL that cut deployment time by 40% and supported 1K+ events per day.
  • Lifted marketing ROI by 30% through clustering 10K+ customers into high-value segments and validating promotions with A/B tests (+7% conversion).
XGBoost • FastAPI • Docker • BigQuery • GCP • A/B Testing • Kafka • Spark • Hive • scikit-learn • Clustering • Marketing

Projects

Skills

Python
SQL
Java
R
Bash
C/C++
NoSQL
OCaml

Writing