This series is called “Data for Good.” Expect the blog posts to tackle a variety of topics, from the environment and politics to business and more, because using data for good is not bound to just one industry. We will feature noteworthy data scientists and data analysts, both local and international, in this series.

Meet Our Expert

Albert “Bash” Yumol is an AI consultant and is currently the senior data scientist at ING, a global bank that has pioneered digital banking—making them one of the most innovative banks in the world. Bash used to be one of our Data Science Fellowship data sprint instructors here at Eskwelabs.

The Hot Topic

The Philippines’ May 9, 2022 elections will go down in history as one we’ll never forget. One reason is that the season was extensively documented, especially with the rise of social media and the intensity of various candidates’ supporters. These people (volunteers, fans, advocates) were incredibly vocal online on the night of the Monday elections and in the days that followed.

Charts and spreadsheet screenshots started popping up on Facebook and Twitter. Some people were left confused and distressed, trying to make sense of the initial results.

The question on everyone’s minds: “What happened?”

This is what Bash Yumol will be answering in today’s blog post. With permission, the following content was taken from Bash’s lecture and demo session with the Eskwelabs alumni last May 24 in an event we call the “Alumni Learning Circle.”

What is an Automated Election System?

As of January 15, 2022, there were reportedly 65,745,529 registered voters. Of these voters, 60% had little to no knowledge of what the Automated Election System (AES) was.

AES uses computers to capture voter input, to aggregate it, and to send it electronically to servers for canvassing.

What’s the process of an Automated Election System?

The process is as simple as: Vote (manual) → Count (automated) → Canvass (automated).

There are many points of aggregation. Votes are aggregated and then canvassed following a hierarchy of geographical levels: starting from the precinct level, we can check the municipal, provincial, and regional levels, all the way up to the national level. These numbers are passed through two servers: the central server and the transparency server.

  • Central server - The official server which tells us the final results of the elections.
  • Transparency server - The server that captures results directly from the Precinct Count Optical Scanner (PCOS) machines. This server is open to media and poll watching groups. For example, the real-time partial results shown in news graphics on the night of the elections and in the days after come from the transparency server.
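
To make the hierarchy of aggregation concrete, here is a minimal sketch of how precinct tallies could be rolled up to each geographical level. The place names, candidate labels, vote counts, and the `roll_up` helper are all made up for illustration; this is not how the actual servers are implemented.

```python
from collections import defaultdict

# Hypothetical precinct tallies: (region, province, municipality, precinct, votes).
# All names and numbers are made up for illustration.
precinct_results = [
    ("Region A", "Province 1", "Town X", "P-001", {"Candidate 1": 120, "Candidate 2": 80}),
    ("Region A", "Province 1", "Town X", "P-002", {"Candidate 1": 90, "Candidate 2": 110}),
    ("Region A", "Province 2", "Town Y", "P-003", {"Candidate 1": 60, "Candidate 2": 40}),
    ("Region B", "Province 3", "Town Z", "P-004", {"Candidate 1": 30, "Candidate 2": 70}),
]


def roll_up(results, depth):
    """Sum votes per candidate, grouped by the first `depth` geographic fields.

    depth=3 groups by municipality, depth=2 by province, depth=1 by region,
    and depth=0 gives a single national total under the empty key ().
    """
    totals = defaultdict(lambda: defaultdict(int))
    for row in results:
        key = row[:depth]
        votes = row[4]
        for candidate, count in votes.items():
            totals[key][candidate] += count
    return {key: dict(counts) for key, counts in totals.items()}


municipal = roll_up(precinct_results, 3)
provincial = roll_up(precinct_results, 2)
regional = roll_up(precinct_results, 1)
national = roll_up(precinct_results, 0)
```

Because every level is computed from the same precinct rows, the totals at any level should add up to the national figure, which is a handy consistency check when comparing sources.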

Our Hot Takes

Before doing any analysis, we must first get the data. Thankfully, data from polling precincts are available to us. In Bash’s lecture and demo, he shared two ways of web scraping the data that we’ll be using.
  • Option 1: You can use Python and Beautiful Soup to parse through a list of polling precincts and capture the numbers there.
  • Option 2: You can also go directly to the transparency server and capture the JSON files.
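
As a rough illustration of Option 2, the sketch below parses one precinct-level JSON record using only Python's standard library. The field names (`precinct_id`, `contests`, `results`, and so on) and the `extract_votes` helper are assumptions made for this example; the actual files on the transparency server may be structured differently, so inspect a real file before adapting this.

```python
import json

# Hypothetical payload. The real files on the transparency server may use
# different field names and nesting.
sample_payload = """
{
  "precinct_id": "P-001",
  "region": "Region A",
  "contests": [
    {
      "position": "President",
      "results": [
        {"candidate": "Candidate 1", "votes": 120},
        {"candidate": "Candidate 2", "votes": 80}
      ]
    }
  ]
}
"""


def extract_votes(raw_json, position):
    """Return {candidate: votes} for one position in one precinct record."""
    record = json.loads(raw_json)
    for contest in record["contests"]:
        if contest["position"] == position:
            return {r["candidate"]: r["votes"] for r in contest["results"]}
    return {}  # the position was not found in this record


president_votes = extract_votes(sample_payload, "President")
```

In practice you would download each file first and pass its text to a function like this, then collect the results into one table for analysis.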

Angle 1: Looking at the distribution of votes at the precinct level

This is a sample box plot of Marcos Jr.’s votes for simple anomaly detection, calculated from the proportion of votes that Marcos Jr. collected in each precinct. According to Bash, the box plot shows no outlying precincts, meaning there were no dots beyond the whiskers of the box plot. Statistically, nothing seems out of the ordinary.
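
For readers who want to try the same kind of check, here is a minimal sketch of the rule that box plots commonly use to draw outlier dots: a value is flagged when it falls more than 1.5 times the interquartile range (IQR) below the first quartile or above the third. The proportions below are made-up numbers, not actual precinct data, and `find_outliers` is a hypothetical helper rather than the code from Bash’s demo.

```python
import statistics

# Made-up vote shares for one candidate across nine precincts.
precinct_shares = [0.52, 0.55, 0.56, 0.58, 0.59, 0.60, 0.62, 0.63, 0.66]


def find_outliers(shares, whisker=1.5):
    """Return the values that fall outside the box plot whiskers."""
    # "inclusive" uses linear interpolation between the observed data points.
    q1, _median, q3 = statistics.quantiles(shares, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - whisker * iqr
    upper_fence = q3 + whisker * iqr
    return [s for s in shares if s < lower_fence or s > upper_fence]


outliers = find_outliers(precinct_shares)
```

An empty list here corresponds to a box plot with no dots outside the whiskers. Adding an extreme value such as 0.95 to the sample would make it show up in the result.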

Angle 2: Analyzing the proportions of votes from Marcos Jr. and Robredo from accumulated counts while the votes are being transmitted

Because the live media feeds updated the vote counts every few hours or so, we can analyze how regular the numbers are as they come in over time.

People on social media have raised their concerns after doing this kind of analysis. But the problem with their analysis is that they’ve been using cumulative data. Each new point already includes every earlier transmission, so they’re building on top of a pattern that was established previously, and later transmissions can only shift the running share slightly.

We need to understand that we are plotting cumulative values. So how do we correct this? A better way is to remove the cumulativeness of the data: subtract each cumulative value from the one that follows it to get the actual count per transmission. When we use those per-transmission numbers and plot the shares, we observe more variance in the data.
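
The differencing step can be sketched in a few lines of Python. The cumulative counts below are illustrative numbers for two unnamed candidates, not actual election figures, and the helper names are made up for this example. The sketch also assumes every transmission contains at least one vote, so the share is always defined.

```python
# Illustrative cumulative counts after each transmission (not real figures).
cumulative_a = [100, 250, 380, 600]
cumulative_b = [50, 130, 190, 290]


def per_transmission(cumulative):
    """Turn running totals into the count added by each transmission."""
    first = cumulative[0]
    rest = [later - earlier for earlier, later in zip(cumulative, cumulative[1:])]
    return [first] + rest


def share_of_a(counts_a, counts_b):
    """Candidate A's share of the two-candidate total at each point."""
    return [a / (a + b) for a, b in zip(counts_a, counts_b)]


batch_a = per_transmission(cumulative_a)
batch_b = per_transmission(cumulative_b)

cumulative_share = share_of_a(cumulative_a, cumulative_b)
batch_share = share_of_a(batch_a, batch_b)

# Spread (max minus min) of the share under each view of the data.
cumulative_spread = max(cumulative_share) - min(cumulative_share)
batch_spread = max(batch_share) - min(batch_share)
```

With these sample numbers, the share computed per transmission swings more than the share computed from running totals, which is the effect described above: cumulative plots smooth over the variation between transmissions.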

Angle 3: Investigating where election receipts are coming from

We can also investigate the diversity of where these election receipts are coming from. We can break down results per region. Based on this graph, we can say each region was well represented at all times of transmission.
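
One way to check regional representation, sketched under assumptions: if each election receipt is tagged with the transmission it arrived in and the region it came from, we can compute each region’s share per transmission and list any regions missing from a transmission. The receipt log, region labels, and helper names below are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical receipt log: (transmission number, region). Labels are made up.
receipts = [
    (1, "Region A"), (1, "Region B"), (1, "Region C"), (1, "Region A"),
    (2, "Region B"), (2, "Region C"), (2, "Region A"),
    (3, "Region C"), (3, "Region A"), (3, "Region B"), (3, "Region B"),
]


def regional_mix(log):
    """Return {transmission: {region: share of that transmission's receipts}}."""
    counts = defaultdict(Counter)
    for batch, region in log:
        counts[batch][region] += 1
    mix = {}
    for batch, counter in counts.items():
        total = sum(counter.values())
        mix[batch] = {region: n / total for region, n in counter.items()}
    return mix


def missing_regions(log):
    """Return {transmission: set of regions with no receipts in it}."""
    all_regions = {region for _, region in log}
    seen = defaultdict(set)
    for batch, region in log:
        seen[batch].add(region)
    return {
        batch: all_regions - regions
        for batch, regions in seen.items()
        if all_regions - regions
    }


mix = regional_mix(receipts)
gaps = missing_regions(receipts)
```

An empty `gaps` result means every region appears in every transmission, which matches the "well represented at all times" reading of the graph.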

Our Firm Conviction

Eskwelabs believes in using data for good. What are some principles to keep in mind?

  • Whether you are a data scientist or a data analyst, all of us have a responsibility to use data for good. Collectively, we produce an estimated 2.5 quintillion bytes of data every day. There’s no excuse, since we are all data producers and users.
  • We say no to laziness. We check, check, and check again. The cost of misinformation and misuse of data is too high. The more rigorous we are in our processes, the better.
  • Using data for good isn’t just for the common good of our society today. The way we use and act on what data tells us right now can have either positive or negative consequences on future generations.

Join the discussions

We’ve covered the most basic parts of Bash’s discussion with our alums in this blog post. There is also a more advanced section, which we’ll leave as a bonus for when you watch the entire lecture here:
Access the code used here.

If you are interested in joining the ‘Data for Good’ movement, join the community by upskilling yourself.

  • If this is your first time reading about the Data Science Fellowship, we recommend checking out the program in more detail here. Ready to apply? Get started by signing up here.

  • If this is your first time reading about the Data Analytics Bootcamp, we recommend checking out the program in more detail here. Ready to apply? Get started by signing up here.