Saransh Surana

Data Science · ML · AI

About me

I'm a Data Scientist & AI Engineer with expertise in machine learning, deep learning, and large-scale data systems. My work focuses on building scalable, end-to-end ML solutions that solve real-world problems, from data preprocessing to deployment.

I enjoy working at the intersection of AI research and practical applications, turning complex data into insights and intelligent systems. I aim to contribute to cutting-edge AI innovation—LLMs, generative AI, optimization-driven ML—while driving measurable business impact.

Interests

AI · ML · Deep Learning · Data Engineering · Statistics

Education

M.S. Data Science — Stony Brook · B.E. ECE — Andhra University

Experience

  • Scraped and structured 50,000+ housing and social service records across multiple counties into machine-readable datasets.
  • Automated web data extraction with Playwright, using asynchronous concurrent scraping to reduce collection time by ~70%.
  • Designed a deduplication framework to merge duplicate organizations while preserving unique attributes, cutting redundancy by ~35%.
  • Implemented a prompt-based relevance filter that raised precision on housing-related record classification to over 60%.
  • Delivered a final cleaned dataset for NGO partners, enabling more accurate housing service mapping and supporting advocacy for individuals with serious illness.
Python · Playwright · Gemini Pro · Pandas
  • Built Python pipelines to clean and preprocess unstructured data from web pages, PDFs, and other raw formats, version-controlled with Git for reproducibility and collaboration.
PySpark · NumPy · Pandas
  • Built scalable ETL pipelines in BigQuery and SQL on GCP to support end-to-end ML workflows for anomaly detection.
  • Improved early fault detection by flagging spikes and irregular patterns in manufacturing time-series data with Z-score thresholds, and visualized the flagged points for analysis.
  • Trained unsupervised models (Isolation Forest, One-Class SVM) to detect anomalies in manufacturing sensor data achieving 78% recall and 73% precision, supporting early fault detection.
  • Explained model results to technical and non-technical teams and consulted data science experts, supporting fault resolution and alignment.
Python · scikit-learn · Pandas · matplotlib · GCP · Anomaly Detection
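
The Z-score thresholding mentioned above can be illustrated with a minimal sketch. This is not the production code: the function name `zscore_anomalies`, the sample readings, and the threshold value are all assumptions chosen for illustration, and it uses only the Python standard library.

```python
from statistics import mean, pstdev


def zscore_anomalies(values, threshold=3.0):
    """Return the indices whose absolute Z-score exceeds the threshold.

    Hypothetical helper for illustration; the threshold is an assumed default.
    """
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # constant series: nothing deviates from the mean
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


# Made-up sensor readings with one obvious spike at index 6.
readings = [10.1, 9.9, 10.0, 10.2, 9.8, 10.0, 25.0, 10.1, 9.9, 10.0]
print(zscore_anomalies(readings, threshold=2.5))  # prints [6]
```

On a short series a single spike inflates the standard deviation, so its own Z-score stays just under 3 here; that is why the example uses a lower threshold of 2.5.
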
  • Developed real-time demand forecasting and inventory optimization by deploying XGBoost models on GCP with FastAPI and Docker, reducing forecast error by 18% across 30+ SKUs.
  • Designed end-to-end ML data pipelines for unstructured data and SQL-based ETL workflows using Spark, Hive, and Kafka, cutting deployment time by 40% and supporting analysis of 1K+ events daily.
  • Drove 30% marketing ROI uplift by applying clustering on 10K+ customer profiles, enabling business teams to target high-value segments effectively.
  • Conducted A/B testing on promotional strategies and new product placements across multiple regions, identifying winning variants that increased sales conversion by 7%.
XGBoost · FastAPI · Docker · BigQuery · GCP · A/B Testing · Kafka · Spark · Hive · scikit-learn · Clustering · Marketing

Projects

Skills

Python
SQL
Java
R
Bash
C/C++
NoSQL
OCaml

Open Source & Writing

Open Source Contributions

Writing & Publications

Leave a Message