Projects
During my Master’s in Data Analytics, I tackled hands-on projects that apply to real-world scenarios.
The topics and data I chose reflect my interests, and the data ranges from Kaggle datasets to real business data from my previous jobs.
Contents
01
Remote Work & Employee Well-Being
The working world changed after the COVID-19 pandemic, and those changes continue to affect how
businesses operate and interact today. The trend toward remote work was already underway, thanks to
the rise of telecommunications and freelance opportunities. The pandemic accelerated this trend,
leading to the widespread adoption of working from home and transforming traditional work
structures. The research objective is to help businesses that are considering a remote or hybrid
work policy judge whether such a policy would be appropriate and successful. The aim is to provide
actionable insights that inform these decisions and ultimately foster a more effective and
sustainable work environment for both employers and employees in the new norm of remote work.
The results show that a work-from-home policy should focus on the needs of the demographics that
make up the majority of the current and upcoming workforce. Based on these findings, the
recommendation is to put employees' needs first: “What is good for people is good for the
organization” [7, p. 5]. A practical starting point is to research the policies that similar
companies have established and then tailor them to your company’s specific needs. Most important is
to understand your employees by studying their demographics and their needs. To recommend an
implementation, I drew on remote working policy guidelines from UCLA and from Vanderbloemen, an
executive search firm.
02
Predicting Customer Repurchase
Environmental Products & Accessories, LLC (EPA Sales) is a manufacturer
and reseller of vacuum truck parts. Despite its industry presence, EPA Sales currently lacks
outbound sales efforts, limiting its ability to anticipate customer needs and encourage repeat
purchases. This project aims to predict customer reorders and reorder timing while identifying
high-value customer segments to inform sales and marketing strategies.
Using historical 2024 transaction data, we applied four methods: K-means clustering to refine
customer segmentation based on purchasing behavior; a random forest to determine feature importance
and predict whether a customer will repurchase within 30 days; association rule mining to find
useful relationships between items; and a Cox proportional hazards model to estimate the likelihood
and timing of future purchases.
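As a rough illustration of how a 30-day repurchase target could be derived from order history, here is a minimal sketch. The customer IDs, dates, and the `label_repurchase` helper are hypothetical and are not taken from the EPA Sales data or the project code.

```python
from datetime import date

# Hypothetical order history: (customer_id, order_date). Not real EPA Sales data.
orders = [
    ("C1", date(2024, 1, 5)),
    ("C1", date(2024, 1, 20)),
    ("C1", date(2024, 4, 1)),
    ("C2", date(2024, 2, 10)),
    ("C2", date(2024, 3, 30)),
]


def label_repurchase(orders, window_days=30):
    """Flag each order by whether the same customer ordered again within window_days."""
    by_customer = {}
    for customer, day in orders:
        by_customer.setdefault(customer, []).append(day)

    labels = []
    for customer, days in by_customer.items():
        days.sort()
        for i, day in enumerate(days):
            if i + 1 < len(days):
                gap = (days[i + 1] - day).days
                labels.append((customer, day, gap, gap <= window_days))
            else:
                # No later order observed. Labeled False here for simplicity;
                # a survival model would treat this order as censored instead.
                labels.append((customer, day, None, False))
    return labels


labels = label_repurchase(orders)
for row in labels:
    print(row)
```

Each row carries the gap in days to the next order, so the same table could feed both a classifier (the boolean flag) and a time-to-event model (the gap).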
We analyzed joined customer, transaction, and product data to uncover purchase behaviors,
product performance patterns, and time-based trends. Using K-means clustering, we identified three
customer segments with distinct spend and frequency profiles; a random forest classifier (AUC > 0.95)
effectively predicted 30-day reorders; association rule mining revealed complementary product
bundles; and a Cox proportional hazards model highlighted a critical repurchase window within 50
days. Our findings inform segmentation-driven marketing, inventory optimization, and early lifecycle
outreach strategies to boost retention and revenue.
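To make the association rule step more concrete, the sketch below computes support, confidence, and lift for item pairs in plain Python. The baskets and product names are invented for illustration, and `pair_rules` is a hypothetical helper, not the tooling used in the project.

```python
from itertools import combinations

# Hypothetical baskets with made-up product names (not real order data).
baskets = [
    {"hose", "clamp"},
    {"hose", "clamp", "nozzle"},
    {"hose", "nozzle"},
    {"clamp"},
]


def pair_rules(baskets):
    """Return {(lhs, rhs): (support, confidence, lift)} for every item pair."""
    n = len(baskets)
    item_count = {}
    pair_count = {}
    for basket in baskets:
        for item in basket:
            item_count[item] = item_count.get(item, 0) + 1
        for a, b in combinations(sorted(basket), 2):
            pair_count[(a, b)] = pair_count.get((a, b), 0) + 1

    rules = {}
    for (a, b), count in pair_count.items():
        support = count / n  # share of baskets containing both items
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_count[lhs]  # P(rhs | lhs)
            lift = confidence / (item_count[rhs] / n)  # above 1: bought together more than chance
            rules[(lhs, rhs)] = (support, confidence, lift)
    return rules


rules = pair_rules(baskets)
for (lhs, rhs), (support, confidence, lift) in sorted(rules.items()):
    print(f"{lhs} -> {rhs}: support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

Rules with lift above 1 are the candidates for product bundles; a real analysis would also set minimum support and confidence thresholds to filter out noise.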
03
Analyzing the Impact of Technology & Digital Media Consumption on the Human Attention Span
This project examines whether and how modern technology and digital
media consumption have impacted human attention spans, framing attention as three components:
direction (task switching), depth (level of immersion), and duration (sustained focus). The analysis
combines behavioral smartphone data with large-scale media datasets to evaluate how digital habits
and content formats align with changing attention patterns.
Using datasets from Lancaster University and Kaggle, the project analyzes smartphone usage (device
checks, screen time, age cohorts), music trends (Spotify song durations and popularity from
1956–2019), and video platforms (YouTube trending videos and TikTok engagement data). Data
preparation included cleaning inconsistent fields, standardizing durations, removing invalid
records, creating age cohorts, and normalizing engagement metrics to enable fair cross-platform
comparisons.
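The preparation steps above can be sketched in a few small functions. This is only an illustration under assumptions: the "M:SS" duration format, the cohort boundaries beyond the 18–20 and 21–25 groups, and the per-view engagement rate are choices made for this example, not details confirmed by the project.

```python
def duration_to_seconds(text):
    """Convert 'SS', 'M:SS', or 'H:MM:SS' to seconds; return None for invalid input."""
    parts = text.strip().split(":")
    if not 1 <= len(parts) <= 3 or not all(p.isdigit() for p in parts):
        return None
    seconds = 0
    for part in parts:
        seconds = seconds * 60 + int(part)
    return seconds


def age_cohort(age):
    """Bucket an age into a cohort label (boundaries are assumptions for this sketch)."""
    if age < 18:
        return "under 18"
    if age <= 20:
        return "18-20"
    if age <= 25:
        return "21-25"
    return "26+"


def engagement_rate(likes, comments, views):
    """Interactions per view, so platforms with very different audience sizes are comparable."""
    if views <= 0:
        return None  # invalid record; the caller can drop it
    return (likes + comments) / views


print(duration_to_seconds("3:45"))
print(age_cohort(19))
print(engagement_rate(likes=50, comments=10, views=1000))
```

Returning `None` for unusable values keeps the cleaning rule explicit: invalid records can be filtered out in one place rather than silently coerced.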
Key findings show that younger users (18–25) engage with their phones more frequently, with distinct
patterns: ages 18–20 check devices often for short bursts, while ages 21–25 engage in longer
sessions. Screen time follows a trimodal distribution, suggesting light, moderate, and heavy user
types. Media analysis indicates a long-term shift toward shorter, more concise content: popular
songs have shortened since the early 2000s, and high-popularity tracks cluster between 2 and 5 minutes.
On video platforms, TikTok drives rapid, high-like engagement through short-form content, while
YouTube supports deeper engagement via longer videos and higher comment activity, particularly for
educational or searchable content.
Overall, the project concludes that digital platforms increasingly favor short-form, high-energy
content optimized for quick engagement, reinforcing frequent attention switching and reduced
sustained focus. While content length alone does not determine engagement, platform design strongly
shapes how attention is captured and expressed. The findings highlight the importance of mindful
media consumption and provide evidence-based insight into how technology both responds to and
reinforces shrinking attention spans.
04
Creating a Database
This project involved designing and building a centralized product
inventory management database for Environmental Products & Accessories (EPA), a small manufacturer
and reseller of vacuum truck and hydro-excavation equipment. The core problem was fragmented, poorly
integrated data across Shopify, Sage, and HubSpot, which caused inconsistent inventory records, weak
visibility into stock levels, and operational inefficiencies. Rather than attempting to clean legacy
systems, the project intentionally started from scratch to design a clean, scalable database aligned
to business needs.
The solution was a relational SQL database structured around five core entities: Product, Supplier,
Category, Inventory, and Sales (with Transactions separated for normalization). An ER diagram and
relational schema were created to enforce data integrity and clearly define one-to-many and
one-to-one relationships across departments. This design supports key operational use cases
including real-time inventory tracking, supplier management, sales analysis, and reorder monitoring
for purchasing and warehouse teams.
The database was implemented using SQL, including table creation with primary and foreign keys, data
type enforcement, and sample data inserts. Analytical queries were written to answer realistic
business questions such as identifying products sold after a specific date, detecting items below
reorder levels, ranking top revenue-generating products, and calculating category- and
supplier-based revenue. These queries demonstrate how stakeholders would use the system for
reporting and decision-making.
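A compact sketch of this kind of schema and its queries, using Python's built-in `sqlite3` module, might look like the following. All table names, column names, and sample rows are assumptions made for illustration, and the separate Transactions table is folded into `sales` to keep the example short. It is not the project's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE supplier (supplier_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE category (category_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE product (
    product_id  INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    unit_price  REAL NOT NULL,
    supplier_id INTEGER NOT NULL REFERENCES supplier(supplier_id),
    category_id INTEGER NOT NULL REFERENCES category(category_id)
);
-- One inventory row per product (one-to-one).
CREATE TABLE inventory (
    product_id       INTEGER PRIMARY KEY REFERENCES product(product_id),
    quantity_on_hand INTEGER NOT NULL,
    reorder_level    INTEGER NOT NULL
);
-- Many sales rows per product (one-to-many).
CREATE TABLE sales (
    sale_id    INTEGER PRIMARY KEY,
    product_id INTEGER NOT NULL REFERENCES product(product_id),
    quantity   INTEGER NOT NULL,
    sale_date  TEXT NOT NULL  -- ISO dates compare correctly as text
);
-- Hypothetical sample data
INSERT INTO supplier VALUES (1, 'Acme Hose Co'), (2, 'Delta Fittings');
INSERT INTO category VALUES (1, 'Hoses'), (2, 'Fittings');
INSERT INTO product VALUES
    (1, 'Suction hose', 120.0, 1, 1),
    (2, 'Cam lock', 15.0, 2, 2),
    (3, 'Nozzle', 40.0, 2, 2);
INSERT INTO inventory VALUES (1, 4, 10), (2, 50, 20), (3, 5, 5);
INSERT INTO sales VALUES
    (1, 1, 2, '2024-01-15'), (2, 2, 10, '2024-02-01'),
    (3, 3, 1, '2024-03-10'), (4, 1, 1, '2024-03-12');
""")

# Items below their reorder level
below_reorder = conn.execute("""
    SELECT p.name FROM product p JOIN inventory i USING (product_id)
    WHERE i.quantity_on_hand < i.reorder_level ORDER BY p.name
""").fetchall()

# Products ranked by revenue
top_revenue = conn.execute("""
    SELECT p.name, SUM(s.quantity * p.unit_price) AS revenue
    FROM sales s JOIN product p USING (product_id)
    GROUP BY p.product_id ORDER BY revenue DESC
""").fetchall()

# Revenue by category
category_revenue = conn.execute("""
    SELECT c.name, SUM(s.quantity * p.unit_price) AS revenue
    FROM sales s
    JOIN product p USING (product_id)
    JOIN category c USING (category_id)
    GROUP BY c.category_id ORDER BY revenue DESC
""").fetchall()

# Products sold after a given date (parameterized)
sold_after = conn.execute("""
    SELECT DISTINCT p.name FROM sales s JOIN product p USING (product_id)
    WHERE s.sale_date > ? ORDER BY p.name
""", ("2024-02-15",)).fetchall()

print(below_reorder, top_revenue, category_revenue, sold_after, sep="\n")
```

Storing the reorder level next to the on-hand quantity keeps the low-stock check to a single join, which is the sort of query a purchasing or warehouse team would run routinely.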
Overall, the project highlights strong fundamentals in database design, normalization, and applied
SQL analytics. The key takeaway is that thoughtful upfront data modeling prevents downstream
inefficiencies. By consolidating inventory, sales, and supplier data into a single source of truth,
the proposed system improves accuracy, reduces operational friction, and provides a scalable
foundation for future integrations and growth at EPA.
05
Academic Risk Detection
This project focused on developing a data-driven framework to
proactively identify high school students who are academically at risk before failure occurs. The
core problem addressed was the lack of scalable, early-warning systems in schools, where
interventions often happen only after performance has already declined. The goal was to surface risk
factors early so that students, families, educators, and administrators can take targeted action to
improve engagement, retention, and academic outcomes.
The solution combined exploratory data analysis with predictive and unsupervised machine learning
models using a synthetic high-school dataset of 2,392 students. The data captured demographics,
academic behavior, parental involvement, tutoring, absences, study habits, and extracurricular
participation. Age was mapped to grade level as a proxy, categorical variables were encoded for
modeling, and GPA thresholds were used to define academic risk. Exploratory analysis revealed clear
patterns linking absences, parental support, and extracurricular engagement to academic performance,
while showing minimal performance differences by gender.
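The feature preparation described above could look roughly like this. The age-to-grade mapping, the support-level encoding, the 2.0 GPA cutoff, and the field names are all assumptions chosen for the sketch; the project's actual thresholds and encodings are not stated here.

```python
# Assumed mapping from age to grade level, used as a proxy.
AGE_TO_GRADE = {15: 9, 16: 10, 17: 11, 18: 12}

# Assumed ordinal encoding for a categorical parental-support field.
SUPPORT_LEVELS = {"none": 0, "low": 1, "moderate": 2, "high": 3, "very high": 4}

# Assumed cutoff: students below this GPA are labeled at risk.
AT_RISK_GPA = 2.0


def prepare_student(record):
    """Turn one raw student record into model-ready features plus a risk label."""
    return {
        "grade_level": AGE_TO_GRADE.get(record["age"]),  # None if age is outside the map
        "parental_support": SUPPORT_LEVELS[record["parental_support"].lower()],
        "absences": record["absences"],
        "at_risk": record["gpa"] < AT_RISK_GPA,
    }


# Hypothetical records, not rows from the project's dataset.
students = [
    {"age": 16, "parental_support": "High", "absences": 2, "gpa": 3.4},
    {"age": 17, "parental_support": "Low", "absences": 21, "gpa": 1.6},
]
prepared = [prepare_student(s) for s in students]
for row in prepared:
    print(row)
```

An ordinal encoding fits here because support levels have a natural order; unordered categories would usually be one-hot encoded instead.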
Multiple modeling approaches were implemented to both predict outcomes and understand drivers of
risk. A linear regression model was used to predict GPA and identify statistically significant
predictors, achieving strong explanatory power and demonstrating that absences, parental support,
tutoring, and study time were among the most influential factors. A decision tree classifier was
then used to predict grade classification, reinforcing attendance as the most critical early-warning
signal while highlighting limitations related to class imbalance. K-means clustering was applied to
segment students into distinct behavioral profiles, revealing groups such as high-effort but
low-performance students and low-engagement, high-risk students. These insights go beyond simple
pass/fail labels.
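For readers unfamiliar with K-means, the sketch below implements the basic algorithm (Lloyd's iterations) in plain Python on a handful of invented points. The data, the two features, and the fixed starting centroids are assumptions for illustration. The project itself presumably used a library implementation on many more features.

```python
import math


def kmeans(points, centroids, max_iter=100):
    """Lloyd's algorithm: assign points to the nearest centroid, then recompute centroids."""
    centroids = list(centroids)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the previous centroid if a cluster ends up empty
                centroids[k] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids


# Hypothetical (weekly study hours, GPA) pairs, not rows from the project's dataset.
students = [
    (18, 1.8), (20, 2.0), (19, 1.9),  # high effort, low performance
    (3, 1.2), (2, 1.0), (4, 1.4),     # low engagement, high risk
    (15, 3.6), (16, 3.8), (14, 3.5),  # high effort, high performance
]
# Fixed starting centroids keep the example deterministic.
initial = [students[0], students[3], students[6]]

labels, centers = kmeans(students, initial)
print(labels)
print(centers)
```

In practice the features would be standardized first, since unscaled study hours would otherwise dominate the distance calculation, and the starting centroids would be chosen by a method such as k-means++ rather than by hand.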
Overall, the project demonstrates how combining predictive modeling with clustering can support
early intervention strategies in education. The key takeaway is that academic risk is multifaceted
and cannot be addressed through grades alone. By identifying attendance patterns, support
structures, and engagement behaviors, this framework provides a practical foundation for
early-warning dashboards, targeted tutoring programs, and policy decisions that shift schools from
reactive remediation to proactive student support.