Hi, everyone! My name is Basty, and I’m a data scientist and educator from Metro Manila. If there’s one thing I value deeply, it’s education. Education has allowed me to not only dive into the amazing world of code and data, but also to encourage and inspire others to do the same.
Outside of work and school, I love playing video games like Valorant and League of Legends. I also love listening to Broadway musicals (HAMILTON, DEH, TICK TICK BOOM ALL THE WAY!). Lastly, I LOVE watching Friends, New Girl, HIMYM, and The Big Bang Theory.
Now, let’s take a look at my notebook!
Hello there, data enthusiasts! In the age of AI, it’s easy to get excited about advanced models, algorithms, and complex methodologies. However, sometimes we need to remind ourselves to take a step back and appreciate the beauty of raw data. It’s an art that empowers us to discover the underlying story of all the information around us. So for this month’s technical blog, gear up as we embark on this exciting adventure through Exploratory Data Analysis. Prepare to sharpen your analytical skills, ignite your curiosity, and discover the untold stories that reside within your data. Let’s get started!
To start, what even is EDA? It’s a term that you’ll often hear in the data industry, and while it stands for one thing, it can mean different things to different data enthusiasts. EDA, or Exploratory Data Analysis, is a preliminary step in the data analysis process that involves examining and summarizing the main characteristics of a dataset. It’s basically the step where we explore our dataset, hence the word “Exploratory”.
Arguably, EDA is one of the most crucial steps in data analysis and machine learning because it helps data scientists gain a comprehensive understanding of the data, identify potential issues such as missing values or outliers, and explore relationships between variables—which is important when creating models.
EDA is also beneficial for data scientists because exploring relationships between variables leads us to formulate hypotheses. It also provides a foundation for making informed decisions about which statistical techniques and models to apply.
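To make the idea of spotting missing values and outliers a bit more concrete, here is a minimal sketch using only Python's standard library. The `ages` column is made-up toy data, the helper names `count_missing` and `iqr_outliers` are hypothetical, and the 1.5 × IQR rule is just one common convention for flagging outliers, not the only way to do it.

```python
from statistics import quantiles

# Hypothetical toy column; None marks a missing value.
ages = [23, 25, None, 24, 26, 22, 95, None, 27, 24]


def count_missing(values):
    """Count entries that are None."""
    return sum(v is None for v in values)


def iqr_outliers(values):
    """Return non-missing values lying more than 1.5 * IQR outside the quartiles."""
    present = [v for v in values if v is not None]
    q1, _, q3 = quantiles(present, n=4)  # first, second, and third quartile
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in present if v < low or v > high]


missing = count_missing(ages)
outliers = iqr_outliers(ages)
print(f"missing values: {missing}, possible outliers: {outliers}")
```

In a real project you would likely reach for a dataframe library instead, but the logic stays the same: count what is absent, then flag what sits far from the bulk of the data and decide whether it is an error or a genuine observation.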
EDA also has a unique power—it allows us to unravel the captivating stories hidden within our datasets. Like skilled detectives, we embark on a quest to unveil the tales that lie beneath the surface. We become storytellers, using data as our medium, and EDA as our compass. Through the art of EDA, we can breathe life into numbers and charts, transforming them into narratives that captivate our audience. We piece together the intricate plotlines, unravel the mysteries, and unveil the insights that shape our understanding of the data. Each variable and relationship becomes a character, and every observation holds a clue waiting to be discovered.
One of the key challenges in EDA (and also, in general) is selecting the appropriate metrics to quantify and measure the variables of interest. The metrics that we choose should align with the goals and objectives of the analysis. Different types of data also require different types of metrics. There’s usually no one-size-fits-all solution when it comes to choosing metrics. For numerical data, common metrics include mean, median, standard deviation, and correlation coefficients. Categorical data, on the other hand, might require metrics such as counts, proportions, or mode. Time-series data often employs metrics such as moving averages or growth rates.
While I’ve mentioned that EDA differs from analyst to analyst, there are some common steps that we follow to understand the underlying patterns and characteristics of a dataset. Think of it as a customizable blueprint. These steps provide a structured approach to analyze and summarize data effectively. Here are the common steps we take:
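As a rough illustration of how the metric depends on the data type, the sketch below computes a few of the metrics named above with Python's standard library. All of the data is invented for the example, and `moving_average` and `growth_rates` are hypothetical helper names, not functions from any particular library.

```python
from collections import Counter
from statistics import mean, median, stdev

# Hypothetical toy data, made up for illustration.
prices = [10.0, 12.0, 11.0, 15.0, 40.0]           # numerical
colors = ["red", "blue", "red", "green", "red"]   # categorical
monthly_sales = [100, 110, 121, 133]              # time series

# Numerical: central tendency and spread.
num_summary = {
    "mean": mean(prices),
    "median": median(prices),
    "stdev": stdev(prices),  # sample standard deviation
}

# Categorical: counts, proportions, and the most frequent value (mode).
counts = Counter(colors)
proportions = {label: n / len(colors) for label, n in counts.items()}
mode = counts.most_common(1)[0][0]


# Time series: trailing moving average and period-over-period growth.
def moving_average(series, window):
    """Mean of each run of `window` consecutive points."""
    return [
        mean(series[i - window + 1 : i + 1])
        for i in range(window - 1, len(series))
    ]


def growth_rates(series):
    """Relative change between consecutive points."""
    return [(curr - prev) / prev for prev, curr in zip(series, series[1:])]


print(num_summary)
print(counts, proportions, mode)
print(moving_average(monthly_sales, 2), growth_rates(monthly_sales))
```

Notice how the single large price pulls the mean well above the median. That gap is exactly the kind of signal that tells you which metric better represents your data.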
While these are the common steps we follow, it’s important to keep in mind that this blueprint is rarely a linear process. Expect it to be iterative, as new insights might emerge down the line and send you back to an earlier step.
Exploratory Data Analysis, coupled with the appropriate selection of metrics, empowers data analysts to gain valuable insights from raw data. In the world of data analysis, EDA acts as a compass, guiding us through the intricacies of data and unraveling stories. It empowers us to make informed decisions, communicate insights effectively, and extract value from raw information. So let's embrace the art of EDA and embark on a data-driven journey to discover the hidden treasures within our datasets.