How did our Cohort 11 Data Science Fellows create their Capstone Projects?
Data Science Fellowship


Basty’s Notebook

Hi, everyone! My name is Basty, and I’m a data scientist and educator from Metro Manila. If there’s one thing I value above all, it’s education. Education has allowed me not only to dive into the amazing world of code and data, but also to encourage and inspire others to do the same.

Outside of work and school, I love playing video games like Valorant and League of Legends. I also love listening to Broadway musicals (HAMILTON, DEH, TICK TICK BOOM ALL THE WAY!). Lastly, I LOVE watching Friends, New Girl, HIMYM, and The Big Bang Theory.

Now, let’s take a look at my notebook!

July 2023 Notebook entry

With the Cohort 11 Fellows graduating from Eskwelabs’ Data Science Bootcamp and Demo Fest right around the corner, it’s the perfect moment to step away from our usual theoretical blogs and see how data science can be a tool to empower our passions.

Are you a food lover who's always on the lookout for new restaurants to try? Or perhaps you have a strong passion for culinary experiences and exploring diverse cuisines? While our adventurous spirit in the food world is exciting, it’s essential to also nurture our mental health and well-being. And so, in this blog post, we’re gonna explore two of the capstone projects done by our very own fellows. Let’s get started!

MM Foodies Restaurant Recommender Engine

Fellows Ben, Martel, Rex, Queenie, and Zee are all food lovers, and they noticed that one question people ask most often is “Saan tayo kakain?” (“Where should we eat?”). Choosing where to eat is one of the most notorious everyday dilemmas. With Zomato, an app that listed information about restaurants in Metro Manila, having bid farewell, the team set out to give Filipinos a tool that helps them decide where to eat.

Exploratory Data Analysis (EDA)

The dataset comprises 3,491 restaurants.

The team also noticed that when people talk about Metro Manila, they often think only of its most popular areas, such as BGC (Bonifacio Global City), the CBD (Central Business District), and Quezon City, which has the highest number of restaurants. While most of the reviewed restaurants are located within Metro Manila’s central business districts, there is a lot to experience in other places as well.

To compare average ratings across cities more fairly, the team used the Bayesian average, which balances a city’s ratings against how many ratings it has, so that a handful of very positive reviews cannot push a city to the top.
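The Bayesian average fits in a few lines. This is a minimal sketch, not the team’s code: the prior mean and prior weight below are assumed values (a common choice is the overall average rating and a typical number of ratings per group), and the two cities are hypothetical.

```python
def bayesian_average(mean_rating: float, n_ratings: int,
                     prior_mean: float, prior_weight: float) -> float:
    """Bayesian average: (C * m + sum of ratings) / (C + n).

    C (prior_weight) acts like C "virtual" ratings equal to m (prior_mean),
    so a group with only a few real ratings is pulled toward the prior,
    while a group with many ratings keeps roughly its own mean.
    """
    rating_sum = mean_rating * n_ratings
    return (prior_weight * prior_mean + rating_sum) / (prior_weight + n_ratings)


# Hypothetical cities: (mean rating, number of ratings)
cities = {"City A": (5.0, 3), "City B": (4.4, 200)}

PRIOR_MEAN = 3.5   # assumed prior mean rating
PRIOR_WEIGHT = 25  # assumed prior strength, in "virtual" ratings

for city, (mean, n) in cities.items():
    score = bayesian_average(mean, n, PRIOR_MEAN, PRIOR_WEIGHT)
    print(f"{city}: raw mean = {mean:.2f}, Bayesian average = {score:.2f}")
```

With these assumed numbers, City A’s three perfect ratings shrink to a Bayesian average of about 3.66, while City B’s 200 ratings keep it at 4.30, so City B ranks higher even though its raw mean is lower.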

With all this information in mind, the team came up with two questions:

Is customer satisfaction amplified by a network of restaurant branches?

Quality over quantity! The team used a correlation plot to check for relationships among the number of branches a restaurant has, its average review count, and its average rating, and found that having multiple branches does not guarantee a restaurant’s quality.
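A correlation plot is essentially a heatmap of pairwise correlation coefficients. As a minimal sketch, assuming Pearson correlation (the source doesn’t name the coefficient) and using made-up per-restaurant numbers, the matrix behind such a plot can be computed like this:

```python
from itertools import combinations
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (spread_x * spread_y)


# Hypothetical per-restaurant features (one entry per restaurant)
features = {
    "branches":    [1, 1, 2, 3, 5, 8, 12],
    "avg_reviews": [15, 40, 22, 60, 120, 300, 450],
    "avg_rating":  [4.6, 4.2, 4.5, 3.9, 4.1, 4.0, 3.8],
}

# Every pairwise coefficient; a correlation plot shows these as a heatmap
for a, b in combinations(features, 2):
    print(f"{a} vs {b}: r = {pearson(features[a], features[b]):+.2f}")
```

A coefficient near zero or below zero between `branches` and `avg_rating` would be consistent with the team’s finding, though correlation alone says nothing about cause.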

Is cost a guarantee of culinary brilliance?

Quality meets affordability! While the metro has many luxurious establishments known for their quality, the team found that budget-friendly restaurants rival them in quality as well.

Building the Recommender Engine

After the EDA, the team built a recommender engine to help Filipinos decide where to eat.

For the data, the team scraped 3,491 listings with 176 restaurant categories and 7,655 reviews from Yelp. Since the data was mostly text, it required substantial text cleaning, and the team used KNN imputation and iterative imputation to fill in missing values. They then applied topic modelling to identify the different cuisine groups and restaurant types, and TF-IDF vectorization to represent the text numerically. Finally, the recommender engine lets users set filters such as Cuisine, Type of Restaurant, City, Price Bucket, and Number of Recommendations, and it outputs the matching Restaurant Names, Ratings, and a geospatial map of their locations.
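To illustrate how TF-IDF vectors can drive recommendations, here is a minimal from-scratch sketch. The source doesn’t say how the team scored matches, so ranking by cosine similarity between a query and each restaurant’s text is an assumption, as are the restaurant names and descriptions. The IDF formula mirrors the default of scikit-learn’s `TfidfVectorizer` (`smooth_idf=True`), though the tokenization here is simpler.

```python
import math
from collections import Counter


def fit_idf(docs: list[list[str]]) -> dict[str, float]:
    """Smoothed IDF: ln((1 + N) / (1 + df)) + 1."""
    n_docs = len(docs)
    doc_freq = Counter(term for tokens in docs for term in set(tokens))
    return {t: math.log((1 + n_docs) / (1 + df)) + 1 for t, df in doc_freq.items()}


def vectorize(tokens: list[str], idf: dict[str, float]) -> dict[str, float]:
    """L2-normalised TF-IDF vector; terms unseen during fitting are ignored."""
    weights = {t: c * idf[t] for t, c in Counter(tokens).items() if t in idf}
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {t: w / norm for t, w in weights.items()} if norm else {}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    # Both vectors are unit length, so the dot product is the cosine similarity
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def recommend(query: str, descriptions: dict[str, str], k: int = 3) -> list[str]:
    """Return the k restaurant names whose text is most similar to the query."""
    names = list(descriptions)
    docs = [descriptions[n].lower().split() for n in names]
    idf = fit_idf(docs)
    query_vec = vectorize(query.lower().split(), idf)
    scores = {n: cosine(query_vec, vectorize(d, idf)) for n, d in zip(names, docs)}
    # Stable sort: ties keep their original order
    return sorted(names, key=lambda n: scores[n], reverse=True)[:k]


# Hypothetical restaurant "documents", e.g. concatenated categories and review text
restaurants = {
    "Resto 1": "spicy ramen noodles broth japanese",
    "Resto 2": "sushi sashimi japanese fresh fish",
    "Resto 3": "grilled pork sisig filipino rice",
    "Resto 4": "adobo sinigang filipino comfort food rice",
}
print(recommend("filipino rice", restaurants, k=2))
```

Fitting the IDF on the restaurant corpus only, then projecting the query onto it, keeps unfamiliar query words from distorting the weights.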

In addition, by employing text analysis on reviews, the team was able to consolidate the restaurant categories from 176 to 19 cuisines to simplify the user experience and mitigate issues of confusion and inaccurate listings.

Why use MM Foodies?

MM Foodies gives users a personalized, user-friendly interface for convenient browsing. It also offers a unique experience through its modified filters and interactive map. Lastly, it supports small businesses by prioritizing those with high ratings but low review counts.
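The filters and the small-business priority can be sketched as a filter step followed by a sort. This is a hypothetical illustration: the `Restaurant` fields, the price bucket names, and the "hidden gem" thresholds (rating of at least 4.5 with at most 50 reviews) are assumptions, not the team’s actual rules, and the map output is omitted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Restaurant:
    name: str
    cuisine: str
    kind: str          # type of restaurant, e.g. "Casual Dining"
    city: str
    price_bucket: str  # hypothetical buckets: "budget", "mid", "premium"
    rating: float
    review_count: int


def is_hidden_gem(r: Restaurant, min_rating: float = 4.5, max_reviews: int = 50) -> bool:
    """High rating but few reviews; both thresholds are assumptions."""
    return r.rating >= min_rating and r.review_count <= max_reviews


def filter_and_rank(restaurants: list[Restaurant], cuisine: Optional[str] = None,
                    kind: Optional[str] = None, city: Optional[str] = None,
                    price_bucket: Optional[str] = None,
                    n: int = 5) -> list[tuple[str, float]]:
    wanted = {"cuisine": cuisine, "kind": kind, "city": city, "price_bucket": price_bucket}
    # Keep restaurants matching every filter the user set (None means "any")
    matches = [r for r in restaurants
               if all(v is None or getattr(r, field) == v for field, v in wanted.items())]
    # Hidden gems first (False sorts before True), then higher rating, then fewer reviews
    matches.sort(key=lambda r: (not is_hidden_gem(r), -r.rating, r.review_count))
    return [(r.name, r.rating) for r in matches[:n]]


# Hypothetical listings
restaurants = [
    Restaurant("Carinderia Uno", "Filipino", "Casual Dining", "Quezon City", "budget", 4.7, 12),
    Restaurant("Big Chain Grill", "Filipino", "Casual Dining", "Quezon City", "mid", 4.8, 900),
    Restaurant("Lola's Kitchen", "Filipino", "Casual Dining", "Makati", "budget", 4.8, 30),
    Restaurant("Ramen Place", "Japanese", "Casual Dining", "Quezon City", "mid", 4.9, 20),
    Restaurant("Corner Eatery", "Filipino", "Fast Food", "Quezon City", "budget", 4.0, 15),
]

print(filter_and_rank(restaurants, cuisine="Filipino", city="Quezon City", n=2))
```

Here "Carinderia Uno" (4.7 stars, 12 reviews) ranks ahead of "Big Chain Grill" (4.8 stars, 900 reviews) because it qualifies as a hidden gem under the assumed thresholds.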

While the tool has clear advantages, the team acknowledged its limitations: it cannot yet show a Menu, Photo Gallery, Amenities, or Information Updates for the recommended restaurants.

Try out MM Foodies firsthand: https://mm-foodies.streamlit.app/

Beyond the Numbers of Mental Health

Team JARDiS, composed of Justin, Austin, Ron, Denise, and Shen, decided to analyze Reddit posts on mental health in the Philippines and developed their own chatbot, BESHY (Bot for Emotional Support and a Happy You).

Globally, 1 in 8 people live with a mental disorder, according to the World Health Organization. In the Philippines specifically, 1 in 7 people live with a mental disorder, with depression and anxiety being the most prevalent. In 2019 alone, the country lost ₱68.9 billion through expenditure and decreased productivity due to mental illness. Despite this, only 3% of the national health budget goes to mental health programs.

The team asked: What problems exist with mental health data?

  • Lack of national data for mental health

  • The burden of these mental health disorders is difficult to understand through numbers alone

The objectives of this project are to:

  • Describe the experience of mental health among Reddit users in the Philippines

  • Create a chatbot that recommends next steps for users’ mental health concerns

Methodology

The data was scraped from three subreddits, with around 2,000 post submissions per subreddit. Each record contained the post’s score, title, flair, text, and number of comments. Note that demographic data for users is not available; the team assumed that the user base is composed mostly of millennials from the middle to upper-middle class.

Exploratory Data Analysis (EDA)

Based on the data, most Filipino users in these subreddits engage on Reddit at night, when depression and anxiety dominate the discussions.
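Finding when posts are made comes down to converting each post’s timestamp to Philippine time and counting by hour. A minimal sketch, assuming each post carries a Unix timestamp in UTC (as Reddit’s `created_utc` field does); defining "night" as 18:00 to 05:59 is an assumption, not the team’s definition.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

# Philippine Time is UTC+8 with no daylight saving time
MANILA = timezone(timedelta(hours=8), "PHT")


def posts_by_local_hour(created_utc_values: list[float]) -> Counter:
    """Count posts per local hour (0-23) in Manila."""
    return Counter(datetime.fromtimestamp(ts, tz=MANILA).hour for ts in created_utc_values)


def share_at_night(counts: dict[int, int], start: int = 18, end: int = 6) -> float:
    """Fraction of posts made between `start`:00 and `end`:00 (hypothetical window)."""
    total = sum(counts.values())
    if not total:
        return 0.0
    night = sum(c for hour, c in counts.items() if hour >= start or hour < end)
    return night / total


# Hypothetical timestamps: 00:00 UTC (08:00 PHT) and 14:00 UTC (22:00 PHT)
counts = posts_by_local_hour([0, 14 * 3600])
print(counts, share_at_night(counts))
```

Grouping by flair as well as by hour would show which topics dominate the night-time posts.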

They also found that most Filipino users are in these subreddits because they want to feel listened to: they want to be able to discuss, share, and vent any pent-up emotions they are experiencing. The analysis also showed that Filipinos need consolidated and timely information about mental healthcare, since the flairs that drew the most comments on average revolved around topics such as NSFW (not safe for work), treatment, help, and trigger warning. The average scores per flair reflected this insight as well.
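Averages per flair are a simple group-by. A minimal sketch, assuming each post is a dict with hypothetical keys `flair`, `num_comments`, and `score`; the sample posts are made up.

```python
from collections import defaultdict


def average_by_flair(posts: list[dict], field: str) -> list[tuple[str, float]]:
    """Average `field` per flair, sorted from highest to lowest average."""
    totals: dict[str, float] = defaultdict(float)
    counts: dict[str, int] = defaultdict(int)
    for post in posts:
        flair = post.get("flair") or "(no flair)"  # unflaired posts get their own group
        totals[flair] += post[field]
        counts[flair] += 1
    averages = {flair: totals[flair] / counts[flair] for flair in totals}
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


# Hypothetical posts
posts = [
    {"flair": "Help", "num_comments": 10, "score": 5},
    {"flair": "Help", "num_comments": 20, "score": 1},
    {"flair": "Vent", "num_comments": 3, "score": 8},
]
print(average_by_flair(posts, "num_comments"))
print(average_by_flair(posts, "score"))
```

Running the same function on both comment counts and scores makes it easy to check whether the two rankings agree.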

The team also analyzed the content of the posts more closely through text analysis and word clouds, and found common themes: general life experiences, struggles with depression, holding on to hope, withstanding adversity, and getting help.
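A word cloud sizes each word by how often it appears, so the core step is a frequency count after removing filler words. A minimal sketch; the stopword list is a tiny hypothetical sample of English and Filipino function words, not the team’s, and the posts are made up.

```python
import re
from collections import Counter

# Tiny hypothetical stopword list (English plus a few Filipino function words);
# a real analysis would use a much fuller list for both languages.
STOPWORDS = {
    "the", "and", "that", "this", "with", "for", "but", "just", "have", "was",
    "are", "you", "not", "ako", "ang", "lang", "yung", "pero", "siya", "mga",
}


def word_frequencies(texts: list[str], top_n: int = 10) -> list[tuple[str, int]]:
    """Count content words across posts, most frequent first."""
    counts: Counter = Counter()
    for text in texts:
        tokens = re.findall(r"[a-zñ']+", text.lower())
        # Drop stopwords and very short tokens such as "i", "sa", "ng"
        counts.update(t for t in tokens if len(t) > 2 and t not in STOPWORDS)
    return counts.most_common(top_n)


# Hypothetical posts
posts = [
    "I feel so tired and anxious",
    "Anxious again, tired of everything",
    "Finally got help from a therapist",
]
print(word_frequencies(posts, top_n=5))
```

The resulting counts can then be drawn, for example with the `wordcloud` package’s `WordCloud().generate_from_frequencies`.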

Recommendations based on EDA

Based on the insights they gathered through EDA, the team proposed a set of recommendations:

  • Provide services and customer support at night when people have time and space to work on their mental health

  • Create more safe spaces to discuss mental health to help people feel listened to and to allow perceptions about mental health to change

  • Amid the rise of self-care apps and one-on-one therapy, explore mental health interventions that allow people to connect with each other, since Filipinos are very social

  • Highlight good stories to inspire people and let them know they can get better

  • Make it easy for Filipinos to get help through actionable information about mental health services

BESHY (Bot for Emotional Support and a Happy You)

Aside from these recommendations, the team also created a chatbot that gives users the information they need to start reaching out to a mental health professional or another qualified health provider. It is essential to note that the chatbot is not intended to be a substitute for professional advice, diagnosis, or treatment.

BESHY’s core functionalities are recommending a mental health facility and, based on user input, flagging high-risk suicidal thoughts as well as depressive and anxious thoughts.
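The source doesn’t describe how BESHY detects these thoughts, so the sketch below only shows the general shape of flag-then-route logic. The keyword patterns and action names are hypothetical placeholders; simple keyword matching would miss most real expressions of distress, especially in Filipino or Taglish, and is not a substitute for a validated model reviewed by clinicians.

```python
import re

# Hypothetical keyword patterns per flag, from most to least severe.
# Placeholders only: real screening needs a validated model and clinical input.
FLAG_RULES = [
    ("high_risk", [r"\bkill myself\b", r"\bend my life\b", r"\bsuicid\w*"]),
    ("depressive", [r"\bhopeless\b", r"\bworthless\b", r"\bdepress\w*"]),
    ("anxious", [r"\banxi\w*", r"\bpanic\w*", r"\bnervous\b"]),
]


def flag_message(text: str) -> list[str]:
    """Return every flag whose patterns appear in the message."""
    lowered = text.lower()
    return [label for label, patterns in FLAG_RULES
            if any(re.search(p, lowered) for p in patterns)]


def next_step(flags: list[str]) -> str:
    """Map flags to a hypothetical bot action; high risk always takes priority."""
    if "high_risk" in flags:
        return "urgent_referral"          # e.g. surface crisis resources immediately
    if flags:
        return "facility_recommendation"  # e.g. suggest a mental health facility
    return "continue_conversation"


for message in ["I feel hopeless and my anxiety is bad", "Had a great lunch today"]:
    flags = flag_message(message)
    print(message, "->", flags, next_step(flags))
```

Checking the high-risk flag before anything else means an urgent message is never routed to a routine facility suggestion.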

Improvements

Every analysis leaves room for improvement. The team identified four points for improvement to help the chatbot get better at understanding user input and providing recommendations:

  • Include more posts from Reddit through other Application Programming Interfaces (APIs) and expand to other social media platforms as well to improve the model’s generalization capability

  • Find better ways to pre-process and apply Natural Language Processing (NLP) on posts written in the Filipino language

  • Find training data that is specifically from the Philippines

  • Consult with mental health professionals and collaborate with existing initiatives, enable the chatbot to respond in Filipino, and find ways to improve its conversational ability

Try out BESHY firsthand: https://jardis-beyond-the-numbers-mental-health.streamlit.app/

Through these two projects, the Fellows demonstrated their remarkable potential in data science by tackling two vastly different but equally important domains. Through the MM Foodies Restaurant Recommender Engine, we witnessed the power of data-driven algorithms in enhancing the dining experiences of food enthusiasts, offering personalized and delightful suggestions to tantalize taste buds.

The mental health project showcased the impact data science can have in promoting well-being and support systems. By harnessing the insights from Reddit alone, the team contributed to the advancement of mental health interventions, paving the way for more informed and compassionate care.

As data scientists, we are inspired by these projects, fueling our passion to explore new frontiers and harness the power of data to create a better, healthier, and more connected world. Together, we embark on a journey where data science meets human needs, striving to make a meaningful impact in the lives of individuals and communities alike. Congratulations, Fellows!

Never stop learning!