Would it sound odd if you heard the words “Eskwelabs is on the case”? Though this statement seems to come straight from a crime novel, it is entirely factual. Our Data Science Fellowship, a 12-week intensive, part-time, cohort-based program, is not just about tech skills but about applying them to solve problems that are important to society.

Bianca Bencio, Matthew Chan, Fidel Racines, Jay Silverio, and Romeo Ben Manangu joined forces on one such data-for-good project, where they harnessed the power of data science to democratize access to legal research and lawyers, working with Digest.ph as a client.

Capstone Project Summary

Have you ever heard of Digest.ph? Though you may be tempted to think it’s a cooking or foodie website, it is not. Digest is a legal technology platform that aims to significantly improve access to quality legal services and research. Through the Data Science Fellowship capstone, our learners contributed to Digest’s platform by building something that would have taken years to complete without data science.

Bianca, Matthew, Fidel, Jay, and Ben worked with their Eskwelabs mentor to build a legal case classifier powered by artificial intelligence (AI). Their project, called “DIGEST-ION,” was able to classify, according to crime type, the thousands of cases that the Supreme Court of the Philippines has handled since its inception.

How were they able to accomplish this using data science? Here’s a simplified rundown of how they did it:

  • The Fellows sourced the volumes of Supreme Court texts from Digest’s own online legal library.
  • Fourteen criminal case categories were created during exploratory data analysis, in consultation with practicing lawyers.
  • Rule-based algorithms or models were used to develop labels for the case dataset.
  • Using the labeled data, the Fellows tested out different artificial intelligence (AI) models to classify each case according to the type of crime involved.
  • The most accurate AI model was selected based on the Fellows’ metrics and Digest’s evaluation.

The Challenge

Much like it was harder to find assignment answers in the library before the Internet was invented, lawyers used to manually look for cases in archives. This task was incredibly time-consuming. Although the last few decades have seen computers enabling mass digitization of physical documents, digital databases have remained largely uncategorized and sorted only chronologically.

But why is it important for lawyers to look at past cases? The Philippines has a mixed legal system. Practically speaking, this means that both statutes (written law) and cases (Supreme Court decisions) form part of the law of the land. Lawyers must look through large volumes of past cases in search of what might be relevant to their client’s particular situation. As Digest founder Raymond Rodis puts it, “Digest aims to help lawyers find relevant laws to cite in court. To do that, we needed a way of classifying thousands of Supreme Court cases into relevant categories.”

The Solution


Outline of Project Procedures from DIGEST-ION Team’s Demo Day Presentation

From 1901 to 2017, the Supreme Court of the Philippines resolved around 60,000 cases in total. Of these, the project group focused on 18,000 criminal cases. Classifying these cases by hand would require lawyers to read each case and decide which category it belongs to. At a speed of 10 cases a day, it would still take 5 lawyers collectively more than 1 year to complete the criminal law section. With data science, however, lawyers only needed to advise on the development of the labels. Once the Data Science Fellows finished the challenging task of labelling, software powered by machine learning (a subfield of artificial intelligence) did most of the heavy lifting and classified the selected Supreme Court cases into specific criminal offense categories. This machine learning-driven case classification software is the classifier engine.
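The manual-effort estimate in the paragraph above can be checked with a few lines of arithmetic. This is only a rough sketch: the case count and the number of lawyers come from the article, while reading the rate as 10 cases per lawyer per day and using 250 working days per year are assumptions made here for illustration.

```python
import math

CRIMINAL_CASES = 18_000        # figure from the article
LAWYERS = 5                    # figure from the article
CASES_PER_LAWYER_PER_DAY = 10  # assumption: the article's rate applies to each lawyer
WORKING_DAYS_PER_YEAR = 250    # assumption: roughly five working days a week


def manual_effort(cases, lawyers, rate, days_per_year):
    """Return (working days, working years) needed to label every case by hand."""
    days = math.ceil(cases / (lawyers * rate))
    return days, days / days_per_year


days, years = manual_effort(
    CRIMINAL_CASES, LAWYERS, CASES_PER_LAWYER_PER_DAY, WORKING_DAYS_PER_YEAR
)
print(f"{days} working days, about {years:.2f} working years")
```

Under these assumptions the team of five would need 360 working days, which is consistent with the article's "more than 1 year".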
To build the classifier engine, the Fellows did the following:
  1. Label creation - 14 labels were created, such as “Crimes Against Property”, based on the classification scheme within the Revised Penal Code of the Philippines (the country’s primary criminal law statute) and various law school syllabi.
  2. Data Wrangling - Processed large volumes of Supreme Court cases from Digest’s online library, including cutting up bodies of case text into words or categories that the computer can process. Unnecessary parts like punctuation marks and HTML tags (code used in creating online content) were also eliminated in the data wrangling process.
  3. Exploratory Data Analysis (EDA) - Carried out with the help of various models or algorithms, legal text references, and consultation with the legal professionals of Digest. EDA involves the following substeps:
    • Identifying criminal offense categories (for example, “Crimes against Persons” and “Crimes against Public Order”) for the engine to work on, using the Revised Penal Code of the Philippines as reference material.
    • Using algorithms like Latent Dirichlet Allocation (LDA), Latent Semantic Indexing (LSI), and Hierarchical Dirichlet Process (HDP) in order to determine the main keywords from the large body of criminal cases.
    • Matching the main keywords from the cases to their corresponding criminal offense category (for example, the keywords “adultery” and “rape” are matched with the category “Crimes Against Chastity”).


Criminal Offense Categories and Related Keywords from DIGEST-ION Team’s Demo Day Presentation
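To make the data wrangling and keyword matching steps more concrete, here is a minimal sketch of how rule-based labelling could work. It is not the Fellows' actual code: it uses only Python's standard library instead of NLTK, the function names are hypothetical, and the keyword lists are illustrative assumptions apart from “adultery” and “rape”, which the article pairs with “Crimes Against Chastity”.

```python
import re
from html import unescape

# Illustrative keyword lists. Only the "Crimes Against Chastity" pairing comes
# from the article; the other keywords are assumptions for this sketch.
CATEGORY_KEYWORDS = {
    "Crimes Against Chastity": {"adultery", "rape"},
    "Crimes Against Property": {"theft", "robbery", "estafa"},
    "Crimes Against Persons": {"murder", "homicide"},
}

TAG_RE = re.compile(r"<[^>]+>")   # matches HTML tags such as <p> or </b>
TOKEN_RE = re.compile(r"[a-z]+")  # keeps runs of letters, dropping punctuation


def clean_and_tokenize(raw_html):
    """Strip HTML tags and punctuation, lowercase, and split into word tokens."""
    text = unescape(TAG_RE.sub(" ", raw_html))
    return TOKEN_RE.findall(text.lower())


def rule_based_labels(raw_html, category_keywords=CATEGORY_KEYWORDS):
    """Return every category whose keywords appear in the case text."""
    tokens = set(clean_and_tokenize(raw_html))
    return sorted(
        category
        for category, keywords in category_keywords.items()
        if tokens & keywords
    )


sample = "<p>The accused was charged with <b>robbery</b> and rape.</p>"
print(clean_and_tokenize(sample))
print(rule_based_labels(sample))
```

Because a single case can mention keywords from several categories, the function returns a list of labels rather than one label, which is what makes the downstream task a multi-label classification problem.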


  4. Testing of predictive models - Five different multi-label classification models were then trained on the labeled dataset: XLNet, RoBERTa, BERT, DistilBERT, and logistic regression. The models were used to find the main keywords in each Supreme Court criminal case and to label each case, with a case allowed to carry more than one label. Without getting too much into the details, these models are natural language processing (NLP) techniques; NLP is the ability of a computer to understand written and spoken words like a human being. You can learn more about the models used via the DIGEST-ION team’s Demo Day presentation video.
  5. Model Selection - Choosing the most accurate and efficient NLP model and incorporating it into the classifier engine.

Tools

    • Pandas - An open source data analysis and manipulation tool which is based on the Python programming language
    • scikit-learn - Another Python-based library that gives users access to machine learning algorithms
    • Gensim - Software that provides models like LDA and LSI, which were models used by Cohort 7 to extract main keywords from volumes of cases
    • NLTK - A tool used for data wrangling and other processes that require natural language processing
    • Simple Transformers - A natural language processing library used for deep learning models
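For the model testing and selection steps, the sketch below shows one way candidate multi-label classifiers could be compared. The article does not say which metrics the Fellows used, so micro-averaged F1 is an assumption here. The model names are placeholders, and the tiny set of true labels and predictions is invented purely for illustration; none of it is a result from the project.

```python
def micro_f1(y_true, y_pred):
    """Micro-averaged F1 for multi-label data given as one set of labels per case."""
    tp = fp = fn = 0
    for true, pred in zip(y_true, y_pred):
        tp += len(true & pred)   # labels predicted and correct
        fp += len(pred - true)   # labels predicted but wrong
        fn += len(true - pred)   # labels missed
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy ground truth: one set of labels per case (illustrative only).
y_true = [{"Property"}, {"Persons", "Chastity"}, {"Public Order"}]

# Toy predictions from two placeholder models (illustrative only).
predictions = {
    "model_a": [{"Property"}, {"Persons"}, {"Public Order"}],
    "model_b": [{"Property", "Persons"}, {"Chastity"}, set()],
}

scores = {name: micro_f1(y_true, preds) for name, preds in predictions.items()}
best = max(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name}: micro-F1 = {score:.3f}")
print("selected:", best)
```

Micro-averaging pools every label decision across all cases, so frequent categories weigh more heavily than rare ones. A macro average, which treats each category equally, would be a reasonable alternative when rare crime types matter as much as common ones.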

    Impact

    Empowering lawyers, their clients, and other stakeholders of the legal system.

    The classifier engine built by the DIGEST-ION team has the potential to save lawyers time in legal research. Since lawyers typically charge by the hour, saving time also means saving money. Automation of the search for cases can help democratize reasonably-priced legal services for more Filipinos.

    In Rodis’ own words, “The project was able to show us a proof of concept on how an algorithm can accurately classify cases instead of lawyers spending thousands of hours categorizing [the cases] manually.”

    In addition to establishing proof of concept, the Fellows gleaned important insight as to what particular techniques, models, and tools work most effectively for this challenge. Another key insight was the indispensability of working in close collaboration with domain experts, in this case, legal professionals from the Digest team. Their expertise during the labelling phase was a necessary complement to the Fellows’ data science practice. Furthermore, engaging in direct and consistent dialogue with the individuals who are most likely to use the tool allowed the Fellows to develop their project with a user-centered end goal in mind.

    Work on Fascinating Cases through Eskwelabs

    Cases in the Supreme Court are not the only exciting cases you’ll work on if you enroll in one of our data upskilling programs. Learners in our Data Science Fellowship or Data Analytics Bootcamp can now join a new offering called the Industry Apprenticeship.

    Currently, all of our learners take part in cohorts where they complete data projects with their peers and mentors from the industry. We are taking this further by introducing paid industry apprenticeships, which will give our learners the option to extend the projects they work on in the bootcamp while getting paid to do so. We are partnering with Accenture and the Asian Institute of Management - Dado Banatao Incubator (AIM-DBI) to offer this to our next cohort of Data Science Fellowship learners. Sign up to learn more below.

    RECOMMENDED READING

    • Interested in our Industry Apprenticeship? You may click any of the links below depending on your program of choice to find out more.
    • Like Cohort 7, you too can invent your own data science-driven social impact tool by enrolling in our Data Science Fellowship. Click here to get started with your data science journey.
    • If you think analyzing data is your way of contributing to the common good, you may try to enroll in our Data Analytics Bootcamp. Proceed to this page for more information and to start your application.