Projects

During my Master’s in Data Analytics, I tackled hands-on projects that apply to real-world scenarios.

The topics and data I chose to work on reflect my interests. The data ranges from Kaggle datasets to real business data from my past jobs.

01

Remote Work & Employee Well-Being

The working world changed after the COVID-19 pandemic, and those changes continue to shape how businesses operate and interact today. The trend toward remote work was already underway, thanks to the rise of telecommunications and freelance opportunities, but the pandemic accelerated it, leading to the widespread adoption of working from home and transforming traditional work structures. The research objective is to give businesses that are considering a remote or hybrid work policy actionable insights into whether such a policy would be appropriate and successful, ultimately fostering a more effective and sustainable work environment for both employers and employees in the new norm of remote work.
[Image: Person working from home at a desk with a laptop, on a video call, with a dog nearby and clothes hanging in the background. Messages and icons float around, showing online communication.]
The results show that when crafting a work-from-home policy, it is important to focus on the needs of the demographics that make up the majority of the current and upcoming workforce. Based on these findings, the recommendation is to focus on the needs of employees: “What is good for people is good for the organization” [7, p. 5]. Companies should research the policies that similar companies have established and then tailor them to their own specific needs. The most important step is to understand your employees by studying their demographics and their needs. To recommend an implementation, I drew on remote working policy guidelines from UCLA and from Vanderbloemen, an executive search firm.

02

Predicting Customer Repurchase

Environmental Products & Accessories, LLC (EPA Sales) is a manufacturer and reseller of vacuum truck parts. Despite its industry presence, EPA Sales currently has no outbound sales effort, which limits its ability to anticipate customer needs and encourage repeat purchases. This project aims to predict whether and when customers will reorder, and to identify high-value customer segments to inform sales and marketing strategies. Using historical 2024 transaction data, we applied K-means clustering to refine customer segmentation based on purchasing behavior, a random forest to determine feature importance and predict whether a customer will repurchase within 30 days, association rule mining to find useful relationships between items, and a Cox proportional hazards model to estimate the likelihood and timing of future purchases.
[Image: Illustration of a person sitting cross-legged using a laptop with a large “BUY” button and arrow pointing to the screen, representing online purchasing or digital consumer behavior.]
We analyzed joined customer, transaction, and product data to uncover purchase behaviors, product performance patterns, and time-based trends. K-means clustering identified three customer segments with distinct spend and frequency profiles; the random forest classifier (AUC > 0.95) effectively predicted 30-day reorders; association rule mining revealed complementary product bundles; and the Cox proportional hazards model highlighted a critical repurchase window within 50 days. Our findings inform segmentation-driven marketing, inventory optimization, and early lifecycle outreach strategies to boost retention and revenue.
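As a rough illustration of the 30-day repurchase target, here is a minimal sketch of how such a label could be derived from order dates. The transactions, the field names, and the `label_repurchases` helper are all hypothetical; this is not the project's actual pipeline or data.

```python
from datetime import date

# Hypothetical transactions: (customer_id, order_date). Not real EPA Sales data.
transactions = [
    ("C1", date(2024, 1, 5)),
    ("C1", date(2024, 1, 25)),
    ("C1", date(2024, 4, 1)),
    ("C2", date(2024, 2, 10)),
    ("C2", date(2024, 5, 20)),
    ("C3", date(2024, 3, 3)),
]


def label_repurchases(rows, window_days=30):
    """For each order, compute the gap in days to the same customer's next
    order and flag whether that gap falls within the window."""
    by_customer = {}
    for customer_id, order_date in rows:
        by_customer.setdefault(customer_id, []).append(order_date)

    labeled = []
    for customer_id, dates in by_customer.items():
        dates.sort()
        for i, order_date in enumerate(dates):
            # The last order of each customer has no observed follow-up.
            next_date = dates[i + 1] if i + 1 < len(dates) else None
            gap = (next_date - order_date).days if next_date is not None else None
            labeled.append({
                "customer_id": customer_id,
                "order_date": order_date,
                "days_to_next": gap,
                "repurchased_in_window": gap is not None and gap <= window_days,
            })
    return labeled


labels = label_repurchases(transactions)
for row in labels:
    print(row)
```

Orders with no later purchase come out with `days_to_next` set to `None`. A classifier would need a rule for those rows, while a survival model such as Cox proportional hazards would typically treat them as censored observations rather than as non-repurchases.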

03

Analyzing the Impact of Technology & Digital Media Consumption on The Human Attention Span

This project examines whether and how modern technology and digital media consumption have affected human attention spans, framing attention as three components: direction (task switching), depth (level of immersion), and duration (sustained focus). The analysis combines behavioral smartphone data with large-scale media datasets to evaluate how digital habits and content formats align with changing attention patterns. Using datasets from Lancaster University and Kaggle, the project analyzes smartphone usage (device checks, screen time, age cohorts), music trends (Spotify song durations and popularity from 1956 to 2019), and video platforms (YouTube trending videos and TikTok engagement data). Data preparation included cleaning inconsistent fields, standardizing durations, removing invalid records, creating age cohorts, and normalizing engagement metrics to enable fair cross-platform comparisons.
Key findings show that younger users (18–25) engage with their phones more frequently, with distinct patterns: ages 18–20 check devices often for short bursts, while ages 21–25 engage in longer sessions. Screen time follows a trimodal distribution, suggesting light, moderate, and heavy user types. Media analysis indicates a long-term shift toward shorter, more concise content: popular songs have shortened since the early 2000s, and high-popularity tracks cluster between 2 and 5 minutes. On video platforms, TikTok drives rapid, high-like engagement through short-form content, while YouTube supports deeper engagement via longer videos and higher comment activity, particularly for educational or searchable content.
Overall, the project concludes that digital platforms increasingly favor short-form, high-energy content optimized for quick engagement, reinforcing frequent attention switching and reduced sustained focus. While content length alone does not determine engagement, platform design strongly shapes how attention is captured and expressed. The findings highlight the importance of mindful media consumption and provide evidence-based insight into how technology both responds to and reinforces shrinking attention spans.
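The engagement normalization mentioned in the data preparation step can be sketched as follows. This is one plausible approach (rates per 1,000 views), not necessarily the one used in the project; the records, field names, and numbers are made up for illustration.

```python
# Hypothetical records; field names and numbers are illustrative only.
videos = [
    {"platform": "tiktok", "views": 200_000, "likes": 30_000, "comments": 400},
    {"platform": "youtube", "views": 1_000_000, "likes": 40_000, "comments": 6_000},
    {"platform": "youtube", "views": 0, "likes": 0, "comments": 0},  # invalid record
]


def engagement_rates(record):
    """Return likes and comments per 1,000 views, or None when the view
    count is not positive (such records are dropped as invalid)."""
    views = record["views"]
    if views <= 0:
        return None
    return {
        "platform": record["platform"],
        "likes_per_1k": 1000 * record["likes"] / views,
        "comments_per_1k": 1000 * record["comments"] / views,
    }


rates = [r for r in map(engagement_rates, videos) if r is not None]
for r in rates:
    print(r)
```

Dividing by views puts videos with very different audience sizes on the same scale, so raw like or comment counts from a larger platform do not dominate the comparison.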

04

Creating a Database

This project involved designing and building a centralized product inventory management database for Environmental Products & Accessories (EPA), a small manufacturer and reseller of vacuum truck and hydro-excavation equipment. The core problem was fragmented, poorly integrated data across Shopify, Sage, and HubSpot, which caused inconsistent inventory records, weak visibility into stock levels, and operational inefficiencies. Rather than attempting to clean legacy systems, the project intentionally started from scratch to design a clean, scalable database aligned to business needs.
The solution was a relational SQL database structured around five core entities: Product, Supplier, Category, Inventory, and Sales (with Transactions separated for normalization). An ER diagram and relational schema were created to enforce data integrity and clearly define one-to-many and one-to-one relationships across departments. This design supports key operational use cases including real-time inventory tracking, supplier management, sales analysis, and reorder monitoring for purchasing and warehouse teams.
The database was implemented using SQL, including table creation with primary and foreign keys, data type enforcement, and sample data inserts. Analytical queries were written to answer realistic business questions such as identifying products sold after a specific date, detecting items below reorder levels, ranking top revenue-generating products, and calculating category- and supplier-based revenue. These queries demonstrate how stakeholders would use the system for reporting and decision-making. Overall, the project highlights strong fundamentals in database design, normalization, and applied SQL analytics.
The key takeaway is that thoughtful upfront data modeling prevents downstream inefficiencies. By consolidating inventory, sales, and supplier data into a single source of truth, the proposed system improves accuracy, reduces operational friction, and provides a scalable foundation for future integrations and growth at EPA.
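To give a feel for the design, here is a simplified sketch of the five core entities and two of the analytical queries, run against an in-memory SQLite database from Python. All table names, column names, and sample rows are assumptions made for illustration; the project's actual schema is not reproduced here, and the separate Transactions table is omitted for brevity.

```python
import sqlite3

# Table and column names are assumptions; this is not the project's real schema.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Supplier (supplier_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE Category (category_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- Each supplier and each category has many products (one-to-many).
CREATE TABLE Product (
    product_id  INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    unit_price  REAL NOT NULL,
    supplier_id INTEGER NOT NULL REFERENCES Supplier(supplier_id),
    category_id INTEGER NOT NULL REFERENCES Category(category_id)
);
-- One inventory row per product: product_id is both primary and foreign key.
CREATE TABLE Inventory (
    product_id    INTEGER PRIMARY KEY REFERENCES Product(product_id),
    quantity      INTEGER NOT NULL,
    reorder_level INTEGER NOT NULL
);
CREATE TABLE Sales (
    sale_id    INTEGER PRIMARY KEY,
    product_id INTEGER NOT NULL REFERENCES Product(product_id),
    sale_date  TEXT NOT NULL,
    quantity   INTEGER NOT NULL
);
INSERT INTO Supplier VALUES (1, 'Supplier A');
INSERT INTO Category VALUES (1, 'Hoses'), (2, 'Nozzles');
INSERT INTO Product VALUES (1, 'Hose X', 120.0, 1, 1), (2, 'Nozzle Y', 45.0, 1, 2);
INSERT INTO Inventory VALUES (1, 3, 5), (2, 40, 10);
INSERT INTO Sales VALUES (1, 1, '2024-03-01', 2), (2, 2, '2024-03-02', 10);
""")

# Items whose stock has fallen below the reorder level.
below_reorder = conn.execute("""
    SELECT p.name, i.quantity, i.reorder_level
    FROM Inventory AS i
    JOIN Product AS p ON p.product_id = i.product_id
    WHERE i.quantity < i.reorder_level
""").fetchall()

# Products ranked by total revenue (units sold times unit price).
top_revenue = conn.execute("""
    SELECT p.name, SUM(s.quantity * p.unit_price) AS revenue
    FROM Sales AS s
    JOIN Product AS p ON p.product_id = s.product_id
    GROUP BY p.product_id
    ORDER BY revenue DESC
""").fetchall()

print(below_reorder)
print(top_revenue)
```

Keeping price on Product and stock on Inventory means each fact lives in one place, which is the kind of normalization the design aims for. Both reports then reduce to a single join.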

05

Academic Risk Detection

This project focused on developing a data-driven framework to proactively identify high school students who are academically at risk before failure occurs. The core problem addressed was the lack of scalable, early-warning systems in schools, where interventions often happen only after performance has already declined. The goal was to surface risk factors early so that students, families, educators, and administrators can take targeted action to improve engagement, retention, and academic outcomes.
The solution combined exploratory data analysis with predictive and unsupervised machine learning models using a synthetic high-school dataset of 2,392 students. The data captured demographics, academic behavior, parental involvement, tutoring, absences, study habits, and extracurricular participation. Age was mapped to grade level as a proxy, categorical variables were encoded for modeling, and GPA thresholds were used to define academic risk. Exploratory analysis revealed clear patterns linking absences, parental support, and extracurricular engagement to academic performance, while showing minimal performance differences by gender.
Multiple modeling approaches were implemented to both predict outcomes and understand drivers of risk. A linear regression model was used to predict GPA and identify statistically significant predictors, achieving strong explanatory power and demonstrating that absences, parental support, tutoring, and study time were among the most influential factors. A decision tree classifier was then used to predict grade classification, reinforcing attendance as the most critical early-warning signal while highlighting limitations related to class imbalance. K-means clustering was applied to segment students into distinct behavioral profiles, revealing groups such as high-effort but low-performance students and low-engagement, high-risk students, insights that go beyond simple pass/fail labels.
Overall, the project demonstrates how combining predictive modeling with clustering can support early intervention strategies in education. The key takeaway is that academic risk is multifaceted and cannot be addressed through grades alone. By identifying attendance patterns, support structures, and engagement behaviors, this framework provides a practical foundation for early-warning dashboards, targeted tutoring programs, and policy decisions that shift schools from reactive remediation to proactive student support.
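Two of the steps above, defining risk with a GPA threshold and regressing GPA on a predictor, can be sketched in a few lines. This is a deliberately reduced illustration: the student records are invented, the 2.0 cutoff is an assumed threshold, and a single-predictor least-squares fit stands in for the project's multi-variable regression.

```python
# Hypothetical records; the 2.0 GPA cutoff is an assumed threshold.
RISK_GPA = 2.0
students = [
    {"absences": 2, "gpa": 3.6},
    {"absences": 10, "gpa": 2.8},
    {"absences": 18, "gpa": 2.0},
    {"absences": 26, "gpa": 1.2},
]


def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Flag students whose GPA falls below the assumed risk threshold.
at_risk = [s["gpa"] < RISK_GPA for s in students]

# Estimate how GPA changes with each additional absence.
slope, intercept = fit_line(
    [s["absences"] for s in students],
    [s["gpa"] for s in students],
)
print(f"at risk: {sum(at_risk)} of {len(students)}")
print(f"gpa = {intercept:.2f} + ({slope:.3f}) * absences")
```

A negative slope here would correspond to the attendance signal described above: each additional absence is associated with a lower predicted GPA.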