What were the Data Science Capstone Projects presented during Demo Fest?
Tags
Data Science Fellowship

Basty’s Notebook

Hi, everyone! My name is Basty, and I’m a data scientist and educator from Metro Manila. If there’s one thing I value deeply, it’s education. Education has allowed me not only to dive into the amazing world of code and data, but also to encourage and inspire others to do the same. Read more about me here.

Outside of work and school, I love playing video games like Valorant and League of Legends. I also love listening to Broadway musicals (HAMILTON, DEH, TICK TICK BOOM ALL THE WAY!). Lastly, I LOVE watching Friends, New Girl, HIMYM, and The Big Bang Theory.

Now, let’s take a look at my notebook!

August 2023 Notebook entry

Did you catch Demo Fest last August 10? Don’t worry if you missed it, because I’m here to walk you through the two capstone projects that were presented on behalf of the Data Science Fellowship.

In this blog post, we’ll explore simulations of how power outages affect the lives of our fellow kababayans in Visayas through Network Analytics. Aside from that, we’ll also talk about how data science and machine learning can be a tool to empower the gaming industry by predicting the success of a game using player retention. Let’s get started!

VTS (Visayas Transmission Simulator)

Visayas Transmission Simulator, also known as VTS, is one of the two capstone projects showcased during Demo Fest. It was made by fellows Kurt, Gelo, Pau, and Jed, together with their mentor JC. The project focuses on simulating the effect of line outages in the Visayas power grid through Network Analytics.

Problem

Power crises have been a recurring problem in Visayas this year. They have also delayed public information, with delays of approximately 1.5 hours on social media and 7 hours in news outlets.

Objective

This study aims to improve outage understanding and enhance warning systems by utilizing live alerts data and examining the network structure of the Visayas grid.

Scope and Limitations

The team identified four scope items and limitations for the project:

  • Current flow directions assume a normal grid condition, which holds 98.47% of the time.
  • Only major transmission line outages are considered; substation-level outages are not included.
  • Line capacities are assumed to maintain normal thresholds, and single line connections are assumed between nodes.
  • The SLD, cooperatives (coops), distribution utilities (DUs), and customer connections are taken from the Market Network Model (MNM).

Methodology

Exploratory Data Analysis (EDA)

Here is what we know about the Visayas grid:

  • It is more prone to natural disasters like typhoons and earthquakes, and therefore to transmission issues.
  • It has relatively fewer interconnections, making it less flexible in managing grid fluctuations.
  • It experiences higher transmission losses due to long line distances, resulting in grid inefficiency.
  • Expanding its transmission infrastructure is more challenging due to the scattered nature of the islands.

75% of all advisories in Visayas refer to line outages, while line trippings and restoration in Visayas exhibit a random occurrence pattern, suggesting a lack of predictable factors or consistent triggers. 

Here is an image of the Visayas Grid Single Line Diagram (SLD), which was extracted from the Market Network Model (MNM) as of April 2023. The diagram is composed of 5 minor grids (Cebu, Leyte-Samar, Bohol, Negros, and Panay) and serves as the basis of the network and for identifying affected nodes.

Network Analysis

After conducting EDA, here are the results of the Network Analysis through different centrality measures.
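
To make the idea of centrality measures concrete, here is a minimal NetworkX sketch. It is not the team's actual code: the node names are borrowed from this post, but the connections between them are made up for illustration and do not reflect the real SLD. The choice of degree, betweenness, and closeness centrality is also an assumption, since the post does not list which measures were used.

```python
import networkx as nx

# Hypothetical toy grid. The edges are invented for illustration only
# and are NOT the real Visayas line connections.
edges = [
    ("Ormoc", "Cebu"),
    ("Cebu", "Colon"),
    ("Colon", "Amlan"),
    ("Amlan", "Bacolod"),
    ("Cebu", "Bohol"),
]
grid = nx.Graph(edges)

# Three common centrality measures (assumed; the post does not name them).
degree = nx.degree_centrality(grid)            # share of other nodes directly connected
betweenness = nx.betweenness_centrality(grid)  # how often a node lies on shortest paths
closeness = nx.closeness_centrality(grid)      # how near a node is to all other nodes

# Rank nodes from most to least "in between" the rest of the network.
ranked = sorted(grid.nodes, key=lambda n: betweenness[n], reverse=True)
for node in ranked:
    print(
        f"{node:8s} degree={degree[node]:.2f} "
        f"betweenness={betweenness[node]:.2f} closeness={closeness[node]:.2f}"
    )
```

In this toy network, the node that bridges the most branches ranks highest on betweenness, which is the kind of signal that can flag a node as important to the grid.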

Conclusion

  • Text alerts published by the National Grid Corporation of the Philippines (NGCP) are great sources of electricity grid information and can be made available and understandable to the public.
  • Network Analysis using the Visayas grid SLD can be used to determine the importance of electricity grid nodes.
  • NetworkX is a powerful solution to pinpoint nodes affected by line outages.
  • An additional layer of information on the node connection of cooperatives, distribution utilities and contestable customers makes the project relatable to the regular consumer.
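
As a rough illustration of how NetworkX can pinpoint nodes affected by a line outage, the sketch below removes the tripped lines from a copy of the network and reports every node that no longer has a path to a supply node. Everything here is hypothetical: the `affected_nodes` helper, the node names, and the toy connections are my own assumptions, and the graph is undirected, whereas the team's model also accounts for flow directions.

```python
import networkx as nx


def affected_nodes(grid, supply_nodes, outage_lines):
    """Return nodes that lose every path to a supply node after the outages.

    Hypothetical helper: a simplified, undirected view of the grid that
    ignores line capacities and flow directions.
    """
    damaged = grid.copy()
    damaged.remove_edges_from(outage_lines)  # take the tripped lines out

    powered = set()
    for source in supply_nodes:
        if source in damaged:
            # Everything still reachable from a supply node keeps power.
            powered |= nx.node_connected_component(damaged, source)
    return set(damaged.nodes) - powered


# Made-up network: one plant, two substations, three customer nodes.
grid = nx.Graph([
    ("Plant", "Sub1"),
    ("Sub1", "Sub2"),
    ("Sub2", "CoopX"),
    ("Sub2", "CoopY"),
    ("Sub1", "CoopZ"),
])

lost = affected_nodes(grid, supply_nodes=["Plant"], outage_lines=[("Sub1", "Sub2")])
print(sorted(lost))
```

Tagging each node with the cooperative, distribution utility, or customer it serves would then turn the list of affected nodes into a list of affected consumers.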

Recommendations

In terms of further improving the project, the team recommends:

  • Integrate line capacities and supply and demand information into the network to accurately simulate abnormal grid situations.
  • Provide real-time simulations that show affected nodes and customers as outages occur.
  • Apply analysis to the Luzon and Mindanao grids.

For the development of the power grid:

  • Add transmission lines to and from Cebu (the load and supply center), since any disruption in Cebu’s power distribution can significantly impact numerous surrounding areas.
  • Strengthen the grid resilience of highly important nodes like Colon, Cebu, Ormoc, Bacolod, and Amlan to better maintain a stable power supply across the region.
  • Reinforce the grid on Panay Island to support the growing demand of Boracay Island and ensure a reliable, uninterrupted power source for Boracay’s vibrant tourism industry.

Player Retention in the Gaming Realm

“Leveling up with data science: Player retention in the gaming realm” was the second project showcased during Demo Fest. The team consists of Jamie, Harold, Ace, Gian, and Patrick. The project uses a machine learning model to predict the success of a game through player retention.

Problem

The Philippine gaming industry earned 78 trillion PHP in 2020 and is expected to grow to 121 trillion PHP by 2027. However, while the industry is highly valued, local game developers struggle to succeed as stakeholders within the ecosystem.

According to the flowchart, a player can either finish a game or drop it before finishing. The team therefore decided to focus on player retention to assess the likelihood that a game would succeed.

Methodology

Exploratory Data Analysis (EDA)

Since the project is a classification problem, the team first looked at the correlations among the features to check for multicollinearity and to confirm that the features they used had a relationship with the target variable.
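
Here is a small pandas sketch of what such a correlation check might look like. The data values, the `retained` target column, the `playtime_minutes` column, and the 0.9 threshold are all made up for illustration; only the other feature names come from the project.

```python
import pandas as pd

# Hypothetical data: every value below is invented for illustration.
games = pd.DataFrame({
    "playtime":         [2, 10, 35, 4, 50, 8, 22, 1],
    "playtime_minutes": [120, 600, 2100, 240, 3000, 480, 1320, 60],  # redundant copy
    "no_of_reviews":    [5, 40, 120, 9, 300, 25, 80, 2],
    "dropped":          [30, 12, 5, 25, 3, 18, 9, 40],
    "retained":         [0, 1, 1, 0, 1, 0, 1, 0],  # assumed binary target
})

features = games.drop(columns="retained")
corr = features.corr()  # Pearson correlation between every pair of features

# Flag feature pairs that are almost duplicates of each other.
THRESHOLD = 0.9  # assumed cutoff, not from the project
high_pairs = [
    (a, b, corr.loc[a, b])
    for i, a in enumerate(corr.columns)
    for b in corr.columns[i + 1:]
    if abs(corr.loc[a, b]) > THRESHOLD
]
print("Highly correlated pairs:", high_pairs)

# Check which features move together with the target.
target_corr = games.corr()["retained"].drop("retained")
print(target_corr.sort_values())
```

A pair flagged in `high_pairs` is a candidate for dropping one of its two features, while a feature with a near-zero value in `target_corr` may not be worth keeping.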

The dataset was scraped from both the RAWG API and the Steam API. It contained a total of 30,251 games, which was later cut down to 13,280 after pre-processing.

Modelling

The team ultimately decided to use a Gradient Boosting algorithm to model the data. They also used SHAP, a game-theoretic package for explaining the output of any machine learning model, and found that the features contributing most to the results were “to_play”, “no_of_reviews”, “dropped”, and “playtime”. Higher values of these features, except for “dropped”, all pushed the model toward a positive prediction.
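
For readers who want to see the general shape of this step, here is a minimal scikit-learn sketch of a Gradient Boosting classifier. It is not the team's model: the data is synthetic, the labelling rule is an assumption that simply mirrors the direction of the findings above, and the sketch reports scikit-learn's built-in impurity-based feature importances as a simpler stand-in for the SHAP analysis the team actually ran.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_games = 400
feature_names = ["to_play", "no_of_reviews", "dropped", "playtime"]

# Synthetic stand-in data: standardized random values, NOT the scraped dataset.
X = rng.normal(size=(n_games, len(feature_names)))

# Assumed labelling rule: higher to_play, no_of_reviews, and playtime help,
# while a higher dropped count hurts. Noise keeps the task non-trivial.
score = X[:, 0] + X[:, 1] - X[:, 2] + X[:, 3]
y = (score + rng.normal(scale=0.5, size=n_games) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = GradientBoostingClassifier(random_state=0)
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.2f}")

# Impurity-based importances (a stand-in here; the team used SHAP instead).
importances = dict(zip(feature_names, model.feature_importances_))
for name, value in sorted(importances.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:14s} {value:.3f}")
```

Unlike these importances, SHAP values also show the direction of each feature's effect, which is how the team could tell that “dropped” works against a positive prediction.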

Conclusion

In summary, organizations such as the Game Developers Association of the Philippines (GDAP) should focus on four main features based on the model: to_play, no_of_reviews, dropped, and playtime.

Another helpful tactic is to establish an online presence by creating social media pages or press releases for their games. Lastly, developers should also consistently engage with the community to figure out what gamers actually want.

To further improve the analysis, the team recommends:

1. Conduct a time series analysis of player data such as:

  • Playtime analysis
  • In-game purchase habits
  • User segmentation

2. Use NLP for social media impact and coverage

  • Analysis of live streaming of games

Through the two projects showcased during Demo Fest, the fellows demonstrated their remarkable potential in this field by working in two vastly different domains. Through the VTS project, the team raised social awareness of power outages in Visayas through network analytics.

The Player Retention in the Gaming Realm project showcased that while the gaming industry still has a long journey towards data maturity, there are already numerous ways we can utilize data in order to improve the gaming experience. 

As data scientists, we found that these projects truly sparked our curiosity, pushing us to explore new possibilities and leverage the power of data for a better world. The fellows’ journey at Eskwelabs was about merging the brilliance of data science with real human needs, aiming to create a tangible impact in people's lives and communities. To all our remarkable Fellows, congratulations and see you in the industry!

Never stop learning!