Metrics and Insights: Mastering Exploratory Data Analysis

Basty’s Notebook

Hi, everyone! My name is Basty, and I’m a data scientist and educator from Metro Manila. If there’s one thing I value deeply, it’s education. Education has allowed me not only to dive into the amazing world of code and data, but also to encourage and inspire others to do the same.

Outside of work and school, I love playing video games like Valorant and League of Legends. I also love listening to Broadway musicals (HAMILTON, DEH, TICK TICK BOOM ALL THE WAY!). Lastly, I LOVE watching Friends, New Girl, HIMYM, and The Big Bang Theory.

Now, let’s take a look at my notebook!

June 2023 Notebook entry

Hello there, data enthusiasts! In the age of AI, it’s easy to get caught up in advanced models, algorithms, and complex methodologies. However, sometimes we need to remind ourselves to take a step back and appreciate the beauty of raw data. Exploring it is an art that empowers us to discover the underlying story of all the information around us. So for this month’s technical blog, gear up as we embark on an adventure through Exploratory Data Analysis. Prepare to sharpen your analytical skills, ignite your curiosity, and discover the untold stories that reside within your data. Let’s get started!

Unraveling stories with Exploratory Data Analysis

To start, what even is EDA? It’s a term that you’ll often hear in the data industry, and while it stands for one thing, it can mean different things to different data enthusiasts. EDA, or Exploratory Data Analysis, is a preliminary step in the data analysis process that involves examining and summarizing the main characteristics of a dataset. It’s basically the step where we explore our dataset, hence the word “Exploratory”.

Arguably, EDA is one of the most crucial steps in data analysis and machine learning because it helps data scientists gain a comprehensive understanding of the data, identify potential issues such as missing values or outliers, and explore relationships between variables—which is important when creating models. 

EDA is also beneficial because exploring the relationships between variables leads us to formulate hypotheses. It provides a foundation for making informed decisions about which statistical techniques and models to apply.

EDA also has a unique power—it allows us to unravel the captivating stories hidden within our datasets. Like skilled detectives, we embark on a quest to unveil the tales that lie beneath the surface. We become storytellers, using data as our medium, and EDA as our compass. Through the art of EDA, we can breathe life into numbers and charts, transforming them into narratives that captivate our audience. We piece together the intricate plotlines, unravel the mysteries, and unveil the insights that shape our understanding of the data. Each variable and relationship becomes a character, and every observation holds a clue waiting to be discovered.

Choosing Metrics

One of the key challenges in EDA, and in data analysis in general, is selecting the appropriate metrics to quantify and measure the variables of interest. The metrics that we choose should align with the goals and objectives of the analysis. Different types of data also require different types of metrics, so there’s usually no one-size-fits-all solution when it comes to choosing them. For numerical data, common metrics include the mean, median, standard deviation, and correlation coefficients. Categorical data, on the other hand, might call for metrics such as counts, proportions, or the mode. Time-series data often uses metrics such as moving averages or growth rates.
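To make these three families of metrics concrete, here is a minimal sketch using only Python’s standard library. The sample values and the variable names (ad_spend, revenue, plan, monthly_users) are made up for illustration, and the helper functions are my own small implementations rather than anything from a specific library. In practice, you would likely reach for a data analysis library instead.

```python
from collections import Counter
from math import sqrt
from statistics import mean, median, stdev

# Hypothetical sample data, made up for illustration only.
ad_spend = [10.0, 12.0, 9.0, 15.0, 14.0]        # numerical
revenue = [100.0, 120.0, 95.0, 150.0, 135.0]    # numerical
plan = ["free", "pro", "free", "free", "pro"]   # categorical
monthly_users = [200, 220, 242, 230, 253]       # time series


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def moving_average(values, window):
    """Mean of each run of `window` consecutive values."""
    return [mean(values[i:i + window]) for i in range(len(values) - window + 1)]


def growth_rates(values):
    """Period-over-period growth as a fraction of the previous value."""
    return [(curr - prev) / prev for prev, curr in zip(values, values[1:])]


# Numerical metrics: center, spread, and relationship to another variable.
numeric_summary = {
    "mean": mean(revenue),
    "median": median(revenue),
    "std": stdev(revenue),  # sample standard deviation
    "corr_with_ad_spend": pearson(ad_spend, revenue),
}

# Categorical metrics: counts, proportions, and the most frequent label.
counts = Counter(plan)
proportions = {label: n / len(plan) for label, n in counts.items()}
mode = counts.most_common(1)[0][0]

# Time-series metrics: a smoothed trend and the change between periods.
smoothed = moving_average(monthly_users, window=3)
growth = growth_rates(monthly_users)

print(numeric_summary)
print(counts, proportions, mode)
print(smoothed, growth)
```

Notice that the mean of a categorical column or the mode of a continuous one would tell us very little, which is why the data type drives the choice of metric.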

The Importance of Metrics

  • Data Understanding - Metrics act as quantitative measures that facilitate comparisons, highlight variations, and draw attention to important features in the dataset.

  • Decision-making - Metrics assist in making informed decisions by quantifying and evaluating the performance of different strategies, models, or interventions.

  • Communication - Metrics serve as a common language between data scientists and stakeholders. By presenting findings in the form of metrics, data analysts can effectively communicate complex insights to non-technical audiences.

  • Performance Evaluation - Metrics play a crucial role in evaluating the performance of machine learning models, algorithms, and the business as a whole.

Steps to EDA

While I’ve mentioned that EDA differs from analyst to analyst, there are some common steps that we follow to understand the underlying patterns and characteristics of a dataset. Think of it as a customizable blueprint. These steps provide a structured approach to analyzing and summarizing data effectively. Here are the common steps we take:

  • Data Understanding / Exploration - Assuming that you’ve already gathered or received the dataset you’ll be working on, this step is where we immerse ourselves in the data to understand the variables and their meanings, their data types, and whether there are any missing values. This is also usually the step where we formulate the questions we’d like to answer later on. Having a set of metrics to quantify will also be helpful.

  • Data Cleaning / Preprocessing - There’s a saying that goes “there is no such thing as clean data,” and so this step is perhaps the most crucial one of all because it determines the quality of your results. This step is mostly about handling missing values by either imputing or removing them, dealing with outliers by deciding whether to remove, transform, or keep them, and addressing any inconsistencies in the dataset.

  • Data Visualization - This is where you breathe life into your exploration by creating visualizations, such as histograms, box plots, and scatter plots, to explore the distribution, variability, and relationships of the variables you have. Looking at plain numbers alone is often not enough to uncover insights in our data, and graphs are great tools for seeing them.

  • Reporting - After doing the initial steps, this is where you try to answer the questions you formulated at the start using your newly found knowledge of the data. You’ll then summarize and document these findings and turn them into actionable recommendations, if necessary. 
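As a rough illustration of the steps above, here is a small sketch that walks through them on a tiny, made-up dataset using only Python’s standard library. Everything in it is an assumption for demonstration purposes: the records, the column names, the choice of median imputation, the 1.5 × IQR rule for flagging outliers, and the text_histogram helper, which stands in for a real plotting library. Your own dataset and goals may call for different choices.

```python
from collections import Counter
from statistics import median, quantiles

# Hypothetical records, made up for illustration; None marks a missing value.
rows = [
    {"age": 25, "city": "Manila"},
    {"age": 31, "city": "Cebu"},
    {"age": None, "city": "Manila"},
    {"age": 28, "city": None},
    {"age": 95, "city": "Davao"},
    {"age": 27, "city": "Manila"},
    {"age": 30, "city": "Cebu"},
]
columns = ["age", "city"]

# 1. Understanding: count the missing values in each column.
missing = {col: sum(row[col] is None for row in rows) for col in columns}

# 2. Cleaning: impute missing ages with the median of the observed ages
#    (one possible strategy; removing the rows is another).
observed_ages = [row["age"] for row in rows if row["age"] is not None]
age_median = median(observed_ages)
ages = [row["age"] if row["age"] is not None else age_median for row in rows]

#    Flag potential outliers with the 1.5 * IQR rule, then decide whether
#    to remove, transform, or keep them.
q1, _, q3 = quantiles(ages, n=4)
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [age for age in ages if age < low or age > high]


# 3. Visualization: a text histogram as a stand-in for a plotting library.
def text_histogram(values, bin_width):
    """Return one line per non-empty bin, with one '#' per value."""
    bins = Counter((int(v) // bin_width) * bin_width for v in values)
    return [
        f"{start:>3}-{start + bin_width - 1:<3} {'#' * bins[start]}"
        for start in sorted(bins)
    ]


histogram = text_histogram(ages, bin_width=10)

# 4. Reporting: summarize what we found.
print("Missing values per column:", missing)
print("Median age used for imputation:", age_median)
print("Potential outliers:", outliers)
print("\n".join(histogram))
```

Even on a toy dataset like this one, the histogram makes the unusually large age stand out much faster than scanning the raw numbers would.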

While these are the common steps we follow, it’s important to keep in mind that this blueprint is rarely a linear process. Expect it to be iterative, as new insights that emerge down the line may send you back to an earlier step.

Exploratory Data Analysis, coupled with the appropriate selection of metrics, empowers data analysts to gain valuable insights from raw data. In the world of data analysis, EDA acts as a compass, guiding us through the intricacies of data and unraveling stories. It empowers us to make informed decisions, communicate insights effectively, and extract value from raw information. So let's embrace the art of EDA and embark on a data-driven journey to discover the hidden treasures within our datasets.

Never stop learning!