Hi, everyone! My name is Basty, and I'm a data scientist and educator from Metro Manila. If there's one thing I truly value, it's education. Education has allowed me not only to dive into the amazing world of code and data, but also to encourage and inspire others to do the same. Read more about me here.
Welcome back to our 2-part Demo Day series! In the first part, we talked about what Demo Day is and why it is the culminating event of the Data Science Fellowship in Eskwelabs. In this second part, as promised, we'll go through each capstone project prepared by the Fellows (students).
The capstone project is the final project produced by the Fellows in the 15-week Data Science Fellowship. It is the culmination of everything they've learned and worked hard for during the Fellowship. Its counterpart in the Data Analytics Bootcamp is the Company Business Review.
Now let's try to understand how they used data science to solve their chosen problems. Without further ado, let’s gooo!
Group 1 is composed of Fellows Jacob, Jota, Jopet, Ron, and Gelo, and their project is titled Road to Zero Poverty: A machine learning approach to alleviating poverty in Cabanatuan City.
In this capstone project, they’ve decided to take on the problem of poverty in our country, specifically in Cabanatuan City. Poverty in Cabanatuan has resulted in the neglect and marginalization of its people. In fact, Nueva Ecija, the province that Cabanatuan City is in, was not recognized by Spain as a separate country only because of poverty.
Nationwide, an estimated 26.14 million Filipinos live below the poverty line, which is set at roughly Php 12,000 a month for a family of 5. Aside from income, the team also highlighted that poverty is such a big problem in the city because it is multidimensional, spanning access to water and sanitation, education, and more. With all of this in mind, the group came up with a question: "What data-driven solutions can we provide to Cabanatuan City in alleviating poverty?"
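To make the income threshold concrete, here is a minimal sketch of how a household could be labeled poor or non-poor from the Php 12,000-per-month line for a family of 5. It assumes, for illustration only, that the line scales linearly with family size (Php 2,400 per person per month); the function name and the sample households are hypothetical.

```python
# Poverty line from the text: about Php 12,000 per month for a family of 5.
FAMILY_OF_FIVE_LINE = 12_000
# Assumption: the line scales linearly with family size (Php 2,400 per person per month).
PER_CAPITA_LINE = FAMILY_OF_FIVE_LINE / 5


def is_poor(monthly_income: float, family_size: int) -> bool:
    """Hypothetical helper: True if per-capita monthly income is below the per-capita line."""
    if family_size <= 0:
        raise ValueError("family_size must be positive")
    return monthly_income / family_size < PER_CAPITA_LINE


# Hypothetical households, not from the CBMS dataset
households = [
    {"id": "H1", "monthly_income": 10_000, "family_size": 5},
    {"id": "H2", "monthly_income": 15_000, "family_size": 5},
    {"id": "H3", "monthly_income": 9_000, "family_size": 3},
]
for h in households:
    label = "poor" if is_poor(h["monthly_income"], h["family_size"]) else "non-poor"
    print(h["id"], label)
```

A label like this is the kind of target a poor/non-poor classifier would then learn to predict from the other indicators.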
In order for them to answer their question, they’ve identified 3 objectives, namely:
To achieve these objectives, they used the Community-Based Monitoring System (CBMS) 2018 Cabanatuan dataset, which featured poverty indicators on health, nutrition, housing, water, education, income, employment, and peace & order.
For the first objective, they built a classification model, with income as the main factor, to classify whether a household is poor or non-poor. Through this model, they were able to uncover the different factors affecting poverty at the household level.
For the second objective, which looks at the barangay level, they used linear regression to quantify which factors have the most impact on poverty.
Lastly, for the third objective, they used clustering to strategically allocate the solutions they came up with.
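The post doesn't say which classifier the group used, so as a stand-in, here is a minimal logistic regression sketch trained with plain gradient descent. The three household indicators and the eight rows are hypothetical; the point is that the learned weights show which factors push a household toward the "poor" label.

```python
import math


def sigmoid(z: float) -> float:
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, b, x):
    """Probability that a household is poor, given its features x."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit logistic regression with full-batch gradient descent on log loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi  # derivative of log loss w.r.t. the logit
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Hypothetical indicators: [has_safe_water, head_finished_high_school, has_employed_member]
X = [[0, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 1],
     [1, 1, 1], [1, 1, 0], [1, 0, 1], [0, 1, 1]]
y = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = poor, 0 = non-poor (labels derived from income)

w, b = train_logistic(X, y)
names = ["has_safe_water", "head_finished_high_school", "has_employed_member"]
for name, wj in zip(names, w):
    # Negative weights lower the predicted odds of a household being poor
    print(f"{name}: {wj:+.2f}")
```

With real CBMS data, the features would be on different scales, so standardizing them first makes the weights easier to compare.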
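Here is a minimal sketch of how linear regression can quantify barangay-level factors: ordinary least squares solved through the normal equations. The two features and the six barangays are hypothetical, and the target is generated from a known relationship so the recovered coefficients can be checked; the group's actual variables aren't listed in the post.

```python
def fit_ols(X, y):
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    rows = [[1.0, *x] for x in X]  # prepend a column of ones for the intercept
    k = len(rows[0])
    # Augmented matrix [X^T X | X^T y]
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * c for a, c in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Hypothetical barangay-level features (shares between 0 and 1):
# (share_without_safe_water, share_without_hs_graduate)
X = [(0.10, 0.20), (0.30, 0.10), (0.25, 0.40), (0.50, 0.30), (0.15, 0.35), (0.40, 0.15)]
# Synthetic poverty incidence (%) from a known relationship, so the fit can be verified
y = [5 + 20 * water + 30 * school for water, school in X]

intercept, b_water, b_school = fit_ols(X, y)
print(f"intercept={intercept:.2f}, water={b_water:.2f}, school={b_school:.2f}")
```

Because both features here are shares on the same 0 to 1 scale, their coefficients can be compared directly to see which factor has more impact.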
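The post doesn't name the clustering algorithm for this objective, so here is a minimal K-means (Lloyd's algorithm) sketch of the idea: group barangays with similar deprivation profiles so that each group can receive a matching intervention. The barangay names, features, and initial centroids are hypothetical.

```python
import math


def kmeans(points, init_centroids, max_iter=100):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid updates."""
    centroids = [list(c) for c in init_centroids]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old centroid if a cluster empties out
                centroids[k] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


# Hypothetical barangays: (share without safe water, share without an employed member)
barangays = {
    "Barangay 1": (0.80, 0.70), "Barangay 2": (0.75, 0.65),
    "Barangay 3": (0.20, 0.15), "Barangay 4": (0.25, 0.10),
    "Barangay 5": (0.78, 0.20), "Barangay 6": (0.82, 0.15),
}
points = list(barangays.values())
labels, centroids = kmeans(points, init_centroids=[points[0], points[2], points[4]])
for name, label in zip(barangays, labels):
    print(name, "-> cluster", label)
```

In this toy data, one cluster lacks both water and jobs, one is relatively well off, and one mainly lacks safe water, which suggests a water-focused program for that last group.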
Group 2 is composed of Fellows Aleta, Geniston, Lacar, Laurel, and Perillo, and their project is Serving Philippines delicacies on the Global Menu: Making local products competitive on the global platform Amazon in collaboration with eCFulfill.
eCFulfill helps make Philippine MSME products available all over the world. However, many of these businesses lack the knowledge needed to set up and manage their products so that they are globally competitive. And so, the group identified their data science problem as, "How can we make our Philippine products Amazon Best Sellers?"
In order for them to answer this question, they came up with two objectives:
To see how they accomplished these objectives, let's take a look at their project overview:
They first gathered data from the Amazon website through web scraping, then preprocessed it into a usable format and built a binary classification model. Based on their initial findings, they further explored the data using Natural Language Processing (NLP).
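The post doesn't detail the NLP techniques used, so as an illustration, here is a minimal bag-of-words sketch: tokenize product reviews, drop common stopwords, and count the most frequent terms. The reviews and the stopword list are hypothetical.

```python
import re
from collections import Counter

# Small hypothetical stopword list; real projects usually use a fuller list
STOPWORDS = {"the", "a", "and", "is", "it", "to", "of", "this", "i", "for",
             "very", "but", "was", "are", "be", "could"}


def top_terms(reviews, n=5):
    """Count non-stopword tokens across all reviews and return the n most common."""
    counts = Counter()
    for text in reviews:
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.update(t for t in tokens if t not in STOPWORDS)
    return counts.most_common(n)


# Hypothetical reviews of a Philippine delicacy listed on Amazon
reviews = [
    "The dried mangoes are sweet and chewy.",
    "Sweet, fresh taste but the packaging was damaged.",
    "Love the sweet flavor! Packaging could be better.",
]
print(top_terms(reviews, 2))
```

Term counts like these can hint at what buyers praise (taste) and complain about (packaging).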
Here are some interesting insights that they’ve got from their analysis:
Group 3 is composed of Fellows Anjelo, Tin, and Tan, with a project entitled BooCA, A Machine Learning Solution to deal with Booking Cancellations for Hotel Lita.
In this project, they decided to focus on hotel booking cancellations, since cancellations mean lost revenue opportunities and cause problems in staffing, supply purchasing, and profitability. To counter these, hotels implement policies to avoid cancellations and to force cancellations.
In the group’s case, their client, Hotel Lita, needs an approach to address these cancellations by answering these questions:
With these questions in place, the group came up with a solution: predict the probability of each booking being cancelled using machine learning.
With this approach, the group will be able to:
To do this, the group collected reservation data from 30 different hotels worldwide. With that data, they built a CatBoost model, a gradient-boosting model that handles categorical features natively, and identified the features that contribute most to hotel cancellations.
These features are the number of booking changes, lead time, and average daily rate in USD.
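Predicting a probability per booking, rather than a hard yes/no, lets a hotel reason about totals. Here is a minimal sketch: by linearity of expectation, the expected number of cancellations is simply the sum of the predicted probabilities. The probabilities and the 0.5 follow-up threshold are hypothetical.

```python
def summarize_cancellation_risk(probs, threshold=0.5):
    """Return the expected number of cancellations and the indices of high-risk bookings."""
    expected = sum(probs)  # E[sum of cancellation indicators] = sum of probabilities
    high_risk = [i for i, p in enumerate(probs) if p >= threshold]
    return expected, high_risk


# Hypothetical model outputs for five upcoming bookings
probs = [0.10, 0.85, 0.40, 0.65, 0.05]
expected, high_risk = summarize_cancellation_risk(probs)
print(f"Expected cancellations: {expected:.2f}")
print("Bookings to follow up on:", high_risk)
```

The threshold is a business choice: lowering it catches more cancellations at the cost of contacting more guests who would have shown up anyway.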
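CatBoost's distinguishing trait is how it encodes categorical features: "ordered target statistics", where each row's category is replaced by a smoothed cancellation rate computed only from rows that come before it, which avoids leaking the row's own label. The sketch below is a simplified illustration of that idea, not CatBoost's implementation: it uses the given row order instead of CatBoost's random permutations, and the feature values, prior, and smoothing weight `a` are hypothetical.

```python
def ordered_target_stats(categories, targets, prior=0.5, a=1.0):
    """Encode each category using only earlier rows: (sum_y + a * prior) / (count + a)."""
    sums, counts = {}, {}
    encoded = []
    for cat, y in zip(categories, targets):
        s = sums.get(cat, 0.0)
        c = counts.get(cat, 0)
        encoded.append((s + a * prior) / (c + a))  # uses only rows seen so far
        sums[cat] = s + y  # update statistics after encoding the current row
        counts[cat] = c + 1
    return encoded


# Hypothetical booking channel and cancellation label (1 = cancelled)
categories = ["Online", "Direct", "Online", "Online", "Direct"]
targets = [1, 0, 1, 0, 0]
encoded = ordered_target_stats(categories, targets)
print([round(v, 3) for v in encoded])
```

Note how the first time a category appears it gets the prior, and later rows drift toward the observed cancellation rate for that category.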
Group 4 is composed of the Moreno brothers, Fellows Juancho and Niño, together with Fellow Kyle. This group took a different route from the other groups by going with sports analytics. Their capstone project is titled PLAYER ARCHETYPES: What types of PBA and NBA players are out there?
Basketball is a huge business, especially in the Philippines. The NBA alone is estimated to be worth $49.5 billion. In the Philippines, it's estimated that nearly 40 million Filipinos play or have played the sport. On top of that, the Philippines will also co-host the FIBA Basketball World Cup in 2023, which makes their project relevant and timely.
With all these in mind, the group came up with two problem statements:
By answering these problems, the group will be able to identify different types of players in the NBA and PBA, compare their similarities and differences, and provide meaningful insights for PBA stakeholders.
To achieve these, the group collected data from basketball-reference.com and dribblemedia.com. The data consists of quantitative stats such as points per game and rebounds per game.
After scraping the data from these sources, they cleaned it to make sure it was in the proper format. Next, they modeled the data using K-Means clustering and Soft K-Means. From the model results, they came up with player archetypes that later became recommendations for PBA stakeholders.
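Per-game stats like these are often derived from season totals, and because points and rebounds sit on different scales, distance-based clustering usually benefits from standardizing them first. The post doesn't say whether the group standardized their features, so treat this as an assumed preprocessing step; the players and totals are hypothetical.

```python
from statistics import mean, pstdev


def per_game(totals, games):
    """Convert season totals into per-game averages."""
    return {stat: value / games for stat, value in totals.items()}


def zscores(values):
    """Standardize values to mean 0 and (population) standard deviation 1."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma if sigma else 0.0 for v in values]


# Hypothetical season totals: (stat totals, games played)
players = {
    "Player A": ({"points": 1500, "rebounds": 300}, 75),
    "Player B": ({"points": 800, "rebounds": 800}, 80),
    "Player C": ({"points": 1200, "rebounds": 480}, 80),
}
rates = {name: per_game(totals, games) for name, (totals, games) in players.items()}
ppg_z = zscores([r["points"] for r in rates.values()])
rpg_z = zscores([r["rebounds"] for r in rates.values()])
for name, pz, rz in zip(rates, ppg_z, rpg_z):
    print(f"{name}: PPG z={pz:+.2f}, RPG z={rz:+.2f}")
```

Without standardization, a stat with a larger numeric range (like points) would dominate the Euclidean distances that K-Means relies on.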
Here are all the different archetypes that the group came up with for the NBA:
As for the PBA, here are the different archetypes they’ve established:
Wasn't it amazing to see all the different capstone projects? Each one is the final product and culmination of the Fellows' entire Fellowship experience!
It was inspiring to see how all four groups were able to apply data science to real-world cases and different industries, and of course let’s not forget, to create impact with the use of data!
This blog post is only a glimpse of what YOU can also experience and achieve if you join the Data Science Fellowship.
Harness the power of data and use it to create impact in fields and industries you are passionate about! Who knows, your capstone project might be the next one featured.
From the scrapbook of Basty Vergara | Connect with Basty via LinkedIn and Notion
Updated for Data Science Fellowship Cohort 10 | Classes for Cohort 10 start on September 12, 2022.