Harsh Gupta

Data & AI Engineer @ Indiana University Bloomington

Building scalable data systems and applied AI products.

3+ years of experience across Data & AI engineering, data analysis and ML.
Ex-Deloitte · Ex-Samsung Research Intern

Actively seeking full-time roles in Data Science & Engineering, Analytics, Machine Learning, or AI.

About Me

I'm Harsh Gupta, with over 3 years of experience in AI and Data roles across diverse industries including education, research, retail, and insurance. I’ve contributed to projects at Deloitte and Samsung Research, working on everything from data engineering and analysis to computer vision and intelligent automation.

Currently, I'm pursuing a Master’s in Data Science at Indiana University Bloomington while working part-time as a Data & AI Engineer. I completed my undergraduate studies at Manipal Institute of Technology with a major in Electronics & Communication Engineering and a minor in Data Science.

What drives me is building systems that don’t just work, but learn, adapt, and scale. I love automating repetitive tasks, designing LLM-powered assistants, and creating tools that support better decision-making. Whether it’s optimizing retail pricing, enhancing educational platforms, or accelerating developer workflows, I enjoy applying AI to create meaningful impact.
Let’s connect — I’d love to hear what you’re working on!

What I Do: The Full Stack Flow

From ingestion to intelligence: building systems end to end.

Layer 0

Ingestion & ETL

Designing reliable pipelines to ingest, clean, and unify data from multiple structured and unstructured sources.

Layer 1

Processing & Analytics

Transforming raw data into meaningful features, metrics, and insights that drive downstream modeling.

Layer 2

Modeling & Intelligence

Training machine learning models to predict, classify, and forecast real-world outcomes.

Layer 3

GenAI & Agentic Systems

Building LLM-powered applications, RAG pipelines, and agents that reason, act, and automate workflows.

Skills

Programming Foundations

Python · SQL · R · TypeScript

LLM & AI Systems

RAG · Hybrid Search (BM25 + Vector) · Embeddings · Cross-Encoder Reranking · Context-Aware Chunking · LangChain · LangGraph · Semantic Kernel · Azure AI Search · MCP · Ollama · Hugging Face · ElevenLabs

Data Engineering & Platforms

Apache Spark · PySpark · ETL / ELT Pipelines · Apache Airflow · Azure Data Factory · AWS Glue · Data Modeling (Fact / Dim) · Data Validation & Logging · Feature Engineering · MongoDB · Snowflake

Machine Learning & MLOps

Model Deployment · Docker · CI / CD · Regression · Clustering · Time Series Forecasting · Recommender Systems · Computer Vision · Deep Learning · TensorFlow · PyTorch · A/B Testing · Hypothesis Testing · Statistical Inference

Cloud, Storage & Analytics

AWS (S3, EC2, Redshift) · Azure (Azure ML, ADF, AI Search) · GCP · Postgres · Amazon Aurora · OracleDB · Hadoop · Tableau · Power BI

Backend & APIs

FastAPI · Flask · REST APIs · Git / GitHub

Education

Indiana University Bloomington, USA

MS in Data Science (Aug 2024 – May 2026)

GPA: 4/4

Manipal Institute of Technology, India

B.Tech in Electronics & Communication Engineering

Minor in Data Science (Jul 2017 – May 2021)

Experience

Indiana University Bloomington
Oct 2024 – Present

Part-Time AI & Data Engineer Azure · Python · SQL · Tableau · LLM

  • Developed time-series forecasting models for dining footfall prediction, reducing average forecast error from roughly 2,000 to roughly 500 swipes/day.
  • Built a data ingestion and feature engineering pipeline using Azure Data Factory, achieving a 21% increase in prediction accuracy.
  • Boosted pipeline throughput using distributed Azure compute, achieving a residual error of 5%.
  • Enabled ~$2K–$3K daily operational cost optimization through improved food preparation and labor scheduling decisions.

Indiana University Campus Auxiliaries
Jun 2025 – Aug 2025

AI & Data Engineer Intern Azure AI Search · Azure ML · Python · FastAPI · LangChain

  • Designed and built a scalable document ingestion and indexing pipeline to onboard 12,000+ Confluence pages into Azure AI Search, automating extraction, chunking, embedding, and index updates.
  • Enhanced RAG response accuracy from 76% to 93% by implementing metadata filtering & small-to-large context expansion, improving document retrieval and user confidence.
  • Deployed a knowledge retrieval system enabling 250+ staff members to reduce document lookup time from 20–25 minutes to under 10 minutes.
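
The metadata filtering and small-to-large context expansion mentioned above can be sketched roughly as follows. This is an illustrative toy, not the deployed pipeline: the `Chunk` fields, the `space` metadata key, the word-overlap `score` function, and the sample data are all assumptions, and a real system would use embedding similarity from a search index instead of word overlap.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    doc_id: str    # parent document the chunk was cut from
    position: int  # index of the chunk within its parent document
    text: str
    space: str     # hypothetical metadata field, e.g. a wiki space key


def score(query: str, chunk: Chunk) -> float:
    """Toy relevance score: fraction of query words present in the chunk."""
    q_words = set(query.lower().split())
    c_words = set(chunk.text.lower().split())
    return len(q_words & c_words) / len(q_words) if q_words else 0.0


def retrieve(query: str, chunks: list[Chunk], space: str, window: int = 1) -> str:
    # 1. Metadata filter: only search chunks from the requested space.
    candidates = [c for c in chunks if c.space == space]
    if not candidates:
        return ""
    # 2. "Small" step: match the query against small chunks.
    best = max(candidates, key=lambda c: score(query, c))
    # 3. "Large" step: return the best chunk plus its neighbours
    #    from the same document, in reading order.
    neighbours = [
        c for c in chunks
        if c.doc_id == best.doc_id and abs(c.position - best.position) <= window
    ]
    neighbours.sort(key=lambda c: c.position)
    return " ".join(c.text for c in neighbours)


# Made-up sample data for illustration only.
CHUNKS = [
    Chunk("hr-1", 0, "Vacation policy overview.", "HR"),
    Chunk("hr-1", 1, "Staff accrue vacation days monthly.", "HR"),
    Chunk("hr-1", 2, "Requests go through the manager.", "HR"),
    Chunk("it-1", 0, "Reset your password in the portal.", "IT"),
]

context = retrieve("how do staff accrue vacation", CHUNKS, space="HR")
print(context)
```

Matching on small chunks keeps retrieval precise, while returning the surrounding chunks gives the language model enough context to answer.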

Deloitte – Strategy & Analytics
Sep 2021 – Jul 2024

Data Engineer – Consultant – FSI Domain AWS Glue · PySpark · OracleDB · Postgres (Jun 2023 – Jul 2024)

  • Built production batch ETL pipelines in AWS Glue & PySpark to ingest a new XML-based insurance source system, processing 12–15M records/day into an enterprise data warehouse.
  • Designed a multi-stage ingestion architecture (Landing → Staging → Master → ODS), integrating with Informatica workflows and downstream analytics.
  • Orchestrated pipelines using AWS Glue Workflows with EventBridge scheduling and SNS alerting, reducing issue resolution time by 25–30%.
  • Developed CodETL, a rule-based code generation accelerator that cut ETL pipeline development time by 40%.

Data Scientist – Consultant – Customer Strategy & Pricing Python · R · ML · AWS S3 · Tableau (Sep 2021 – May 2023)

  • Modeled demand elasticity across 5,000+ stores using linear mixed-effects regression, generating a $1M–$3M profit increase while limiting customer churn to 1%.
  • Delivered executive-ready Tableau dashboards and scenario analysis to inform pricing roadmaps and influence leadership decisions.
  • Streamlined the Pricing-as-a-Service data pipeline, boosting efficiency by 30% and supporting client acquisition through success story presentations.
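
As a rough illustration of what demand elasticity modeling estimates, the sketch below fits a single log-log regression, whose slope is the price elasticity of demand. It is a deliberate simplification: the work described above used linear mixed-effects regression across stores, whereas this example uses plain ordinary least squares on one synthetic series. The prices, quantities, and function name are made up for the example.

```python
import math


def price_elasticity(prices: list[float], quantities: list[float]) -> float:
    """Slope of ln(quantity) regressed on ln(price) via ordinary least squares."""
    x = [math.log(p) for p in prices]
    y = [math.log(q) for q in quantities]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


# Synthetic demand curve: quantity = 1000 * price^(-1.5), so the true elasticity is -1.5.
prices = [2.0, 2.5, 3.0, 3.5, 4.0]
quantities = [1000 * p ** -1.5 for p in prices]

elasticity = price_elasticity(prices, quantities)
print(round(elasticity, 2))
```

An elasticity below -1 means demand is elastic: a 1% price increase reduces quantity by more than 1%, so revenue falls.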

Samsung Research Institute (PRISM Team)
Jan 2021 – Jun 2021

Computer Vision Research Intern Python · NumPy · OpenCV

  • Developed a CV algorithm to enhance low-light astrophotography captured on mobile phones, significantly improving the signal-to-noise ratio using advanced image processing techniques.
  • Co-authored an IEEE research paper; received Samsung Excellence Award ($300 reward) for outstanding contributions to the PRISM internship program.

Projects & Publications

Publications

Star identification in night sky images using mobile phone camera

View Paper

Cross-Geography Generalization of Machine Learning Methods for Classification of Flooded Regions in Aerial Images

View Paper

AI / ML

TA-Lite

A modular, instructor-aligned teaching assistant powered by LLMs.

View on GitHub View Demo

MockChain

Multi-agent AI-powered mock interview platform with personalized feedback.

View on GitHub View Demo

DataLens

LLM-powered RAG search and Q&A for structured and unstructured data. (Deloitte Internal Project)

View Certificate

Online Sign Recognition

Time-series handwriting recognition and fraud detection.

View Report

Music Emotion Recognition

Classifies music by emotion using ML.

View Report

Flight Price Prediction

Collected data and trained a model to predict flight prices.

View on GitHub

SprintlessAI

Generates Agile user stories from a requirements document + codebase context using RAG. Outputs structured stories and supports optional upload to Jira or GitHub.

View on GitHub

Semantic Intent Router

Multi-agent routing pipeline that classifies user intent, retrieves the relevant domain via FAISS vector search, and dispatches to the correct agent — all using open-source embeddings.
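
The routing idea can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the project's code: a toy bag-of-words vector stands in for the open-source embeddings, a brute-force cosine comparison stands in for the FAISS index, and the domain names, descriptions, and agent functions are hypothetical.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a stand-in for a real embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical domains, each described by a few representative words.
DOMAIN_VECTORS = {
    "billing": embed("invoice payment refund charge billing"),
    "tech_support": embed("error crash bug login password reset"),
}

# Hypothetical agents: one handler per domain.
AGENTS = {
    "billing": lambda q: f"[billing agent] handling: {q}",
    "tech_support": lambda q: f"[tech support agent] handling: {q}",
}


def route(query: str) -> tuple[str, str]:
    """Pick the domain most similar to the query and dispatch to its agent."""
    q_vec = embed(query)
    # Brute-force nearest neighbour; a vector index would replace this at scale.
    domain = max(DOMAIN_VECTORS, key=lambda d: cosine(q_vec, DOMAIN_VECTORS[d]))
    return domain, AGENTS[domain](query)


domain, reply = route("I need a refund for my invoice")
print(domain, "->", reply)
```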

Data Science, Engineering & Visualization

Retail Sales Price Optimization

Used price elasticity modeling to recommend optimal pricing in retail.

View on GitHub

Retail Store Pricing Dashboard

Dashboard for visualizing product and city trends in retail pricing.

View Dashboard View on GitHub

CodETL

ETL engine with rule-based code generation. Cut dev time by 40%. (Deloitte Internal Project)

Analyzing & Visualizing Google Analytics Data for HRA UI

Built an interactive dashboard using Google Analytics data to visualize the user journey. It mapped how users interacted with the interface, highlighted the most used features, and tracked adoption of new functionalities. This helped identify which features were driving engagement and which needed improvement.

View Dashboard

Covid Dashboard

Dashboard that scraped and visualized India's COVID-19 stats.

View on GitHub

Computer Vision

Drowsy Driver Assistant

Created a computer vision–based system to detect driver drowsiness using in-car cameras. The system issued real-time alerts, and in severe cases, initiated autonomous vehicle control to safely guide the car to the shoulder and halt, while sending an emergency alert.

View Poster View Demo

Astrophotography

Collaborated with Samsung R&D to develop a system that enhances night sky images captured on mobile devices. Improved signal-to-noise ratio to deliver clearer visuals and more accurate star detection.

View Paper

Estimation of Flooded Regions

Cross-region flood segmentation using UAV aerial imagery.

View Paper

Virtual Mouse using Python-OpenCV

Gesture-based virtual mouse using OpenCV hand tracking.

View on GitHub

Image to Text using CNN

Converts handwritten text to digital text using OpenCV and CNN prediction.

View on GitHub

Miscellaneous

Streamlit YouTube Playlist

Created a series of videos explaining how to use Streamlit. Gathered over 210k views and 13.5k hours of watch time.

View on YouTube

Sneaker Update Bot

Created a real-time Discord bot that alerted users about sneaker drops and restocks. It supported a client’s resale business, generating $300K in sales across 1,000+ pairs.

View on GitHub

Covid Vaccine Appointment Bot

Sends alerts for available vaccine slots by scraping appointment data by zip code.

University Assistant Chatbot

NLP-powered FAQ chatbot for university queries. Built a custom dataset and trained multiple NLP models for intent classification and response generation. Served clients across 5+ countries.

Honors & Certifications

Certifications

Databricks Fundamentals

IBM Professional Certification

Deloitte AI Academy

APEX Certificate

Tableau

Honors

Samsung Excellence Award

Deloitte Outstanding Award

Deloitte Applause Award - QSR Client

Deloitte Applause Award - Insurance Client

Deloitte Applause Award - Hackathon

APEX Training - Best Presentation

Feel free to reach out via email or connect on LinkedIn.