Saransh Surana

Data Science - ML - AI

I ship AI systems that actually get used.

From messy data to deployed models – I care less about leaderboard scores and more about shipped systems that move metrics for real teams.

Python • LLMs & RAG • MLOps • Experimentation • Evaluation

Ford (Data Science), Stony Brook (MS Data Science), and a handful of startups and research teams where I shipped models instead of just slide decks.

Built RAG-based structured extraction with JSON Schema guardrails, lifting parse success from 78% to 96% and cutting hallucinations by 25%.
Ran 50+ MLflow experiments across models, prompts, chunking, and retrieval to systematically tune extraction quality.
Shipped a FastAPI + Celery + Redis + PostgreSQL pipeline that processes 1.2k-2k records/hour with a timeout rate under 2% and 35% fewer repeat LLM calls.
Stored model outputs in PostgreSQL and performed slice analysis by county and service tag to identify error patterns, prioritize fixes, and support downstream product and operational decisions.
Built production dbt models with incremental loads, snapshots, Jinja macros, tests, and documentation to monitor schema compliance and data quality across large datasets.
Partnered with business stakeholders to ship a React and TypeScript review interface for sampling and triage, and communicated quality trends and model behavior clearly to support feature planning and decision-making.

Python • RAG • Gemini Flash • MLflow • FastAPI • Celery • Redis • PostgreSQL • React.js • TypeScript
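
As an illustration of the schema-guardrail idea above, here is a minimal sketch, not the production code. It assumes a simplified stand-in for JSON Schema (required fields and types only) and a hypothetical `call_model` callable; the field names `name`, `county`, and `service_tags` are placeholders chosen for the example.

```python
import json

# Hypothetical schema subset: required field names mapped to expected Python types.
# A real setup would more likely use full JSON Schema with a validator library.
REQUIRED_FIELDS = {"name": str, "county": str, "service_tags": list}


def validate_record(raw_text):
    """Parse model output and return (record, errors); record is None on failure."""
    try:
        record = json.loads(raw_text)
    except json.JSONDecodeError as exc:
        return None, [f"invalid JSON: {exc.msg}"]
    if not isinstance(record, dict):
        return None, ["top-level value must be an object"]
    errors = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            errors.append(f"wrong type for {field}")
    return (record if not errors else None), errors


def extract_with_retries(call_model, prompt, max_attempts=3):
    """Call the model until its output validates or attempts run out."""
    errors = []
    for _ in range(max_attempts):
        # Feed validation errors back so the next attempt can correct them.
        feedback = f"\nFix these problems: {errors}" if errors else ""
        record, errors = validate_record(call_model(prompt + feedback))
        if record is not None:
            return record
    return None  # caller decides how to log or triage the failed record
```

Rejecting output that fails validation, rather than storing it, is what makes parse success measurable: every record either conforms to the schema or is counted as a failure.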

Built Python pipelines to clean and preprocess unstructured data from web pages, PDFs, and other raw formats, version-controlled with Git for reproducibility and collaboration.

PySpark • NumPy • Pandas

Built scalable ETL pipelines in BigQuery and GCP to support end-to-end ML workflows for anomaly detection on manufacturing sensor streams.
Trained Isolation Forest and One-Class SVM models that achieved 78% recall and 73% precision for early fault detection.
Turned anomaly insights into fixes by explaining model behavior and findings to both engineering and leadership teams.

Python • scikit-learn • Pandas • matplotlib • GCP • Anomaly Detection
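
For readers less familiar with the recall and precision figures above, this short sketch shows how the two metrics are computed from labeled predictions. It assumes binary labels where 1 marks an anomaly; the function name is illustrative and it is not taken from the project code.

```python
def precision_recall(y_true, y_pred):
    """Return (precision, recall) for binary labels where 1 marks an anomaly."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # faults correctly flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed faults
    # Guard against division by zero when nothing is flagged or nothing is faulty.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall
```

Recall answers "what share of real faults did the model catch?", while precision answers "what share of alerts were real faults?". For early fault detection the two usually trade off against each other.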

Deployed XGBoost demand-forecasting models on GCP with FastAPI and Docker, reducing forecast error by 18% across 30+ SKUs.
Designed Spark / Hive / Kafka pipelines and SQL-based ETL that cut deployment time by 40% and supported 1K+ events per day.
Lifted marketing ROI by 30% through clustering 10K+ customers into high-value segments and validating promotions with A/B tests (+7% conversion).

XGBoost • FastAPI • Docker • BigQuery • GCP • A/B Testing • Kafka • Spark • Hive • scikit-learn • Clustering • Marketing