This series is called “Data for Good,” and you can expect its blog posts to tackle a variety of topics, from the environment to politics, business, and more, because using data for good is not bound to just one industry. In this series, we will feature noteworthy data scientists and data analysts, both local and international.
Albert “Bash” Yumol is an AI consultant and currently a senior data scientist at ING, a global bank that pioneered digital banking and is regarded as one of the most innovative banks in the world. Bash was formerly one of our data sprint instructors for the Data Science Fellowship here at Eskwelabs.
The Philippines’ May 9, 2022 elections will go down in history as ones we’ll never forget. One of the many reasons is that the election season was extraordinarily well documented, especially with the rise of social media and the intensity of the various candidates’ supporters. These people (volunteers, fans, and advocates) were extremely vocal online on the night of the Monday elections and in the days that followed.
Charts and spreadsheet screenshots started popping up on Facebook and Twitter. Some people were left confused and distressed, trying to make sense of the initial results.
Making sense of those results is what Bash Yumol tackles in today’s blog post. The content below is taken, with permission, from Bash’s lecture and demo session with Eskwelabs alumni on May 24, at an event we call the “Alumni Learning Circle.”
As of January 15, 2022, there were reportedly 65,745,529 registered voters, and 60% of them had little to no knowledge of what the Automated Election System (AES) was.
AES uses computers to capture voter input, to aggregate it, and to send it electronically to servers for canvassing.
In short, the process is: Vote (manual) → Count (automated) → Canvass (automated).
Votes are aggregated at many points, following a geographic hierarchy: results are aggregated and canvassed from the precinct level up to the municipal, provincial, regional, and finally the national level. Along the way, these numbers pass through two servers: the central server and the transparency server.
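To make the roll-up concrete, here is a minimal Python sketch of hierarchical aggregation, assuming each precinct record carries its region, province, and municipality plus a mapping of candidate to votes. The field names, place codes, and counts are hypothetical, and the sketch only illustrates the summing logic; it does not model the actual AES servers or canvassing software.

```python
from collections import defaultdict

# Hypothetical precinct-level results; place codes and counts are made up.
PRECINCTS = [
    {"region": "R1", "province": "P1", "municipality": "M1", "precinct": "0001", "votes": {"A": 120, "B": 80}},
    {"region": "R1", "province": "P1", "municipality": "M1", "precinct": "0002", "votes": {"A": 95, "B": 105}},
    {"region": "R1", "province": "P2", "municipality": "M2", "precinct": "0003", "votes": {"A": 60, "B": 140}},
    {"region": "R2", "province": "P3", "municipality": "M3", "precinct": "0004", "votes": {"A": 150, "B": 50}},
]

# Geographic levels above the precinct, from broadest to narrowest.
LEVELS = ["region", "province", "municipality"]

def roll_up(precincts, level=None):
    """Sum candidate votes per area at `level`; level=None gives the national total."""
    depth = LEVELS.index(level) + 1 if level else 0
    totals = defaultdict(lambda: defaultdict(int))
    for p in precincts:
        # Key is the path down to `level`, e.g. ("R1", "P1") for province.
        key = tuple(p[name] for name in LEVELS[:depth])
        for candidate, n in p["votes"].items():
            totals[key][candidate] += n
    return {key: dict(counts) for key, counts in totals.items()}

print(roll_up(PRECINCTS, "region"))
print(roll_up(PRECINCTS))
```

Because every level is a sum of the same precinct rows, the totals at any level should add up to the national total; that invariant is a cheap sanity check when working with scraped data.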
Before doing any analysis, we first need the data. Thankfully, results from the polling precincts are available to us. In his lecture and demo, Bash shared two ways of web scraping the data we’ll be using.
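The two scraping methods from the lecture aren’t described in this post, so the sketch below is not either of them. It is a generic, standard-library-only illustration assuming results are served as one JSON document per precinct: `fetch_json` downloads a document from a URL you supply, and `parse_precinct` flattens it into rows. The field names (`precinct_id`, `results`, `candidate`, `votes`) are a hypothetical schema, not the layout of any official COMELEC file.

```python
import json
from urllib.request import urlopen

def fetch_json(url, timeout=30):
    """Download and decode one JSON document. No real endpoint is assumed here."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)

def parse_precinct(payload):
    """Flatten one precinct's results into (precinct, candidate, votes) rows.

    The keys used below are a hypothetical schema for illustration only.
    """
    return [
        (payload["precinct_id"], r["candidate"], int(r["votes"]))
        for r in payload["results"]
    ]

# Offline example with a made-up payload (vote counts may arrive as strings).
SAMPLE = '{"precinct_id": "0001", "results": [{"candidate": "A", "votes": "120"}, {"candidate": "B", "votes": 80}]}'
rows = parse_precinct(json.loads(SAMPLE))
print(rows)
```

Keeping download and parsing separate makes the parser testable on saved files, which also avoids re-hitting a server every time the analysis is rerun.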
This is a sample box plot of Marcos Jr.’s vote share per precinct (the proportion of each precinct’s votes that went to Marcos Jr.), used here for simple anomaly detection. According to Bash, the box plot shows no outlying precincts: there are no dots beyond the whiskers. Statistically, nothing falls outside the norm.
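The box-plot check can also be done numerically. A minimal sketch, assuming per-precinct counts stored as a mapping of precinct ID to candidate votes (the data below is made up): compute each precinct’s share for one candidate, then flag shares outside Tukey’s fences, Q1 − 1.5·IQR and Q3 + 1.5·IQR. This is the same 1.5 × IQR rule that matplotlib’s `boxplot` uses by default to decide which points are drawn as dots beyond the whiskers.

```python
from statistics import quantiles

def vote_shares(precincts, candidate):
    """Fraction of each precinct's counted votes that went to `candidate`."""
    return {pid: votes[candidate] / sum(votes.values())
            for pid, votes in precincts.items()}

def iqr_outliers(shares, k=1.5):
    """Return the precincts whose share lies outside Tukey's fences."""
    # method="inclusive" interpolates like NumPy's default percentile.
    q1, _, q3 = quantiles(shares.values(), n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return {pid: s for pid, s in shares.items() if s < low or s > high}

# Hypothetical precinct counts; "0006" is deliberately extreme.
PRECINCTS = {
    "0001": {"A": 120, "B": 80},
    "0002": {"A": 130, "B": 70},
    "0003": {"A": 110, "B": 90},
    "0004": {"A": 125, "B": 75},
    "0005": {"A": 118, "B": 82},
    "0006": {"A": 195, "B": 5},
}

flagged = iqr_outliers(vote_shares(PRECINCTS, "A"))
print(flagged)
```

An empty result corresponds to what Bash describes, with no precinct beyond the whiskers. A flagged precinct is only a prompt to look closer, not evidence of wrongdoing by itself.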
Because the media’s live feeds released updated vote counts every few hours, we can also analyze how regular the numbers were as they came in over time.
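After doing this kind of analysis, some people on social media raised concerns. The problem with their analysis is that they used cumulative data: each new update includes every vote counted before it, so the plot builds on a pattern that has already been established. As the running total grows, each new batch shifts the overall share less and less, so the line naturally flattens out.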
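A small simulation shows why cumulative plots look so steady. A minimal sketch with made-up batches: each batch’s share for one candidate is drawn independently between 45% and 75%, so there is no pattern at all, yet the running (cumulative) share settles down as the totals grow. The batch count, batch sizes, and random seed are arbitrary assumptions.

```python
import random
from statistics import pstdev

def cumulative_shares(batches):
    """Running vote share after each batch; batches are (candidate_votes, total_votes)."""
    cand_sum = total_sum = 0
    shares = []
    for cand, total in batches:
        cand_sum += cand
        total_sum += total
        shares.append(cand_sum / total_sum)
    return shares

# Simulated, made-up batches with independent, patternless shares.
rng = random.Random(2022)
batches = []
for _ in range(40):
    total = rng.randint(50_000, 150_000)
    batches.append((round(total * rng.uniform(0.45, 0.75)), total))

per_batch = [c / t for c, t in batches]
running = cumulative_shares(batches)

# Compare how much each series moves over the second half of the count.
print(f"spread of per-batch shares (last 20):  {pstdev(per_batch[20:]):.4f}")
print(f"spread of cumulative shares (last 20): {pstdev(running[20:]):.4f}")
```

The cumulative line is much smoother than the batches behind it, so a smooth cumulative trend says little about whether the underlying transmissions were unusually regular.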
We need to remember that we are plotting cumulative values. So how do we correct this? We can remove the cumulative effect by subtracting each update’s totals from the next one’s, which gives the actual count per transmission. When we plot the shares based on those counts, we see more variance in the data.
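The correction described above is a first difference. A minimal sketch, assuming each live-feed update is recorded as a mapping of candidate to cumulative votes in chronological order (the snapshots below are made up): subtract each snapshot from the next to recover the votes added by each transmission, then compute shares on those increments instead of on the running totals.

```python
def per_transmission(snapshots):
    """Turn chronological cumulative snapshots into per-transmission increments.

    The first snapshot is kept as-is: it holds everything counted before it.
    """
    increments = []
    previous = {}
    for snap in snapshots:
        increments.append({c: n - previous.get(c, 0) for c, n in snap.items()})
        previous = snap
    return increments

def share_of(counts, candidate):
    """Candidate's share of the votes in `counts` (NaN if the batch is empty)."""
    total = sum(counts.values())
    return counts[candidate] / total if total else float("nan")

# Hypothetical cumulative tallies from three successive updates.
SNAPSHOTS = [
    {"A": 600, "B": 400},
    {"A": 1150, "B": 850},
    {"A": 1800, "B": 1200},
]

increments = per_transmission(SNAPSHOTS)
print([share_of(s, "A") for s in SNAPSHOTS])   # cumulative shares
print([share_of(i, "A") for i in increments])  # per-transmission shares
```

In this toy example the cumulative share stays between 57.5% and 60%, while the per-transmission shares range from 55% to 65%, which is the extra variance the text refers to.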
We can also look at where these election returns are coming from by breaking down the results per region. Based on this graph, every region was well represented throughout the transmissions.
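A regional breakdown can also be checked without a chart. A minimal sketch, assuming per-transmission rows of (batch ID, region, votes in that batch), with made-up region codes and counts: compute each region’s fraction of every batch, then list any region that contributed nothing to a given batch. An empty set for every batch would match what the graph shows.

```python
from collections import defaultdict

def regional_mix(rows):
    """Map each batch to {region: fraction of that batch's votes}."""
    per_batch = defaultdict(lambda: defaultdict(int))
    for batch, region, votes in rows:
        per_batch[batch][region] += votes
    mix = {}
    for batch, by_region in per_batch.items():
        total = sum(by_region.values())
        # Skip empty batches to avoid dividing by zero.
        if total:
            mix[batch] = {r: v / total for r, v in by_region.items()}
    return mix

def missing_regions(mix, all_regions):
    """Regions with no votes in each batch; an empty set means all were represented."""
    return {batch: set(all_regions) - {r for r, f in fracs.items() if f > 0}
            for batch, fracs in mix.items()}

# Hypothetical per-transmission rows: (batch_id, region, votes_in_batch).
ROWS = [
    (1, "Region A", 400), (1, "Region B", 100), (1, "Region C", 500),
    (2, "Region A", 300), (2, "Region C", 200),
]
REGIONS = ["Region A", "Region B", "Region C"]

mix = regional_mix(ROWS)
missing = missing_regions(mix, REGIONS)
print(mix)
print(missing)
```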
Eskwelabs believes in using data for good. What are some principles to keep in mind?
This post covers the most basic parts of Bash’s discussion with our alumni. There is also a more advanced section, which you can catch as a bonus by watching the entire lecture here:
Access the code used here.
If you are interested in the ‘Data for Good’ movement, join the community by upskilling yourself.