Harsh Gupta
Data & AI Engineer @ Indiana University Bloomington
Building scalable data systems and applied AI products.
3+ years of experience across Data & AI engineering, data analysis and ML.
Ex-Deloitte · Ex-Samsung Research Intern
Actively seeking full-time roles in Data Science & Engineering, Analytics, Machine Learning, or AI.
Download Resume
About Me
I'm Harsh Gupta, with over 3 years of experience in AI and Data roles across
diverse industries including education, research, retail, and
insurance. I’ve contributed to projects at Deloitte and Samsung
Research, working on everything from data engineering and analysis
to computer vision and intelligent automation.
Currently, I'm pursuing a Master’s in Data Science at Indiana University
Bloomington while working part-time as a Data & AI Engineer. I completed my undergraduate
studies at Manipal Institute of Technology with a major in Electronics & Communication
Engineering and a minor in Data Science.
What drives me is building systems that don’t just work, but learn, adapt, and scale. I love
automating repetitive tasks, designing LLM-powered assistants, and
creating tools that support better decision-making. Whether it’s optimizing retail
pricing, enhancing educational platforms, or accelerating developer
workflows, I enjoy applying AI to create meaningful impact.
Let’s connect — I’d love to hear what you’re working on!
What I Do: The Full Stack Flow
From ingestion to intelligence: building systems end to end.
Ingestion & ETL
Designing reliable pipelines to ingest, clean, and unify data from multiple structured and unstructured sources.
Processing & Analytics
Transforming raw data into meaningful features, metrics, and insights that drive downstream modeling.
Modeling & Intelligence
Training machine learning models to predict, classify, and forecast real-world outcomes.
GenAI & Agentic Systems
Building LLM-powered applications, RAG pipelines, and agents that reason, act, and automate workflows.
Skills
Programming Foundations
LLM & AI Systems
Data Engineering & Platforms
Machine Learning & MLOps
Cloud, Storage & Analytics
Backend & APIs
Education
Indiana University Bloomington, USA
MS in Data Science (Aug 2024 – May 2026)
GPA: 4/4
Manipal Institute of Technology, India
B.Tech in Electronics & Communication Engineering
Minor in Data Science (Jul 2017 – May 2021)
Experience
Indiana University Bloomington
Part-Time AI & Data Engineer
Azure · Python · SQL · Tableau · LLM
- Developed time-series forecasting models for dining footfall prediction, reducing average forecast error from about 2,000 to about 500 swipes/day.
- Built a Data Ingestion and Feature Engineering pipeline using Azure Data Factory, achieving a 21% increase in prediction accuracy.
- Boosted pipeline throughput 6× using distributed Azure compute, achieving a residual error of 5%.
- Enabled ~$2K–$3K daily operational cost optimization through improved food preparation and labor scheduling decisions.
Indiana University Campus Auxiliaries
AI & Data Engineer Intern
Azure AI Search · Azure ML · Python · FastAPI · LangChain
- Designed and built a scalable document ingestion and indexing pipeline to onboard 12,000+ Confluence pages into Azure AI Search, automating extraction, chunking, embedding, and index updates.
- Enhanced RAG response accuracy from 76% to 93% by implementing metadata filtering & small-to-large context expansion, improving document retrieval and user confidence.
- Deployed a knowledge retrieval system enabling 250+ staff members to reduce document lookup time from 20–25 minutes to under 10 minutes.
Deloitte – Strategy & Analytics
Data Engineer – Consultant – FSI Domain (Jun 2023 – Jul 2024)
AWS Glue · PySpark · OracleDB · Postgres
- Built production batch ETL pipelines in AWS Glue & PySpark to ingest a new XML-based insurance source system, processing 12–15M records/day into an enterprise data warehouse.
- Designed a multi-stage ingestion architecture (Landing → Staging → Master → ODS), integrating with Informatica workflows and downstream analytics.
- Orchestrated pipelines using AWS Glue Workflows with EventBridge scheduling and SNS alerting, reducing issue resolution time by 25–30%.
- Developed CodETL, a rule-based code generation accelerator that cut ETL pipeline development time by 40%.
Data Scientist – Consultant – Customer Strategy & Pricing (Sep 2021 – May 2023)
Python · R · ML · AWS S3 · Tableau
- Modeled demand elasticity across 5,000+ stores using linear mixed-effects regression, generating a $1M–$3M profit increase while limiting customer churn to 1%.
- Delivered executive-ready Tableau dashboards and scenario analysis to inform pricing roadmaps and influence leadership decisions.
- Streamlined the Pricing-as-a-Service data pipeline, boosting efficiency by 30% and supporting client acquisition through success story presentations.
Samsung Research Institute (PRISM Team)
Computer Vision Research Intern
Python · NumPy · OpenCV
- Developed a CV algorithm to enhance low-light astrophotography captured on mobile phones, significantly improving the signal-to-noise ratio using advanced image processing techniques.
- Co-authored an IEEE research paper; received Samsung Excellence Award ($300 reward) for outstanding contributions to the PRISM internship program.
Projects & Publications
Publications
Cross-Geography Generalization of Machine Learning Methods for Classification of Flooded Regions in Aerial Images
View Paper
AI / ML
MockChain
Multi-agent AI-powered mock interview platform with personalized feedback.
View on GitHub · View Demo
DataLens
LLM-powered RAG search and Q&A for structured and unstructured data. (Deloitte Internal Project)
View Certificate
SprintlessAI
Generates Agile user stories from a requirements document + codebase context using RAG. Outputs structured stories and supports optional upload to Jira or GitHub.
View on GitHub
Semantic Intent Router
Multi-agent routing pipeline that classifies user intent, retrieves the relevant domain via FAISS vector search, and dispatches to the correct agent — all using open-source embeddings.
Data Science, Engineering & Visualization
Retail Sales Price Optimization
Used price elasticity modeling to recommend optimal pricing in retail.
View on GitHub
Retail Store Pricing Dashboard
Dashboard for visualizing product and city trends in retail pricing.
View Dashboard · View on GitHub
CodETL
ETL engine with rule-based code generation. Cut dev time by 40%. (Deloitte Internal Project)
Analyzing & Visualizing Google Analytics Data for HRA UI
Built an interactive dashboard using Google Analytics data to visualize the user journey. It mapped how users interacted with the interface, highlighted the most used features, and tracked adoption of new functionalities. This helped identify which features were driving engagement and which needed improvement.
View Dashboard
Computer Vision
Drowsy Driver Assistant
Created a computer vision–based system to detect driver drowsiness using in-car cameras. The system issued real-time alerts, and in severe cases, initiated autonomous vehicle control to safely guide the car to the shoulder and halt, while sending an emergency alert.
View Poster · View Demo
Astrophotography
Collaborated with Samsung R&D to develop a system that enhances night sky images captured on mobile devices. Improved signal-to-noise ratio to deliver clearer visuals and more accurate star detection.
View Paper
Virtual Mouse using Python-OpenCV
Gesture-based virtual mouse using OpenCV hand tracking.
View on GitHub
Miscellaneous
Streamlit YouTube Playlist
Created a series of videos explaining how to use Streamlit. Gathered over 210K views and 13.5K hours of watch time.
View on YouTube
Sneaker Update Bot
Created a real-time Discord bot that alerted users about sneaker drops and restocks. It supported a client’s resale business, generating $300K in sales across 1,000+ pairs.
View on GitHubCovid Vaccine Appointment Bot
Sends alerts for available vaccine slots by scraping appointment listings by zip code.
University Assistant Chatbot
NLP-powered FAQ chatbot for university queries. Built a custom dataset and trained multiple NLP models for intent classification and response generation. Served clients across 5+ countries.