Learn with Eskwelabs: Time Series Analysis

This is an accessible and friendly guide to the what and why of Time Series Analysis

Welcome to “Learn with Eskwelabs!” This series is called “From the Notebook of Our Fellows” because you will be guided by our very own alumni through a mix of basic and advanced data science concepts. Every time you read from one of our Fellows’ notebooks, just imagine that you have a data BFF or lifelong learning friend who’ll hold your hand at every step. 

Basty’s Notebook

Hi, everyone! My name is Basty, and I’m a data scientist and educator from Metro Manila. If there’s one thing I value deeply, it’s education. Education has allowed me not only to dive into the amazing world of code and data, but also to encourage and inspire others to do the same. Read more about me here

Outside of work and school, I love playing video games like Valorant and League of Legends. I also love listening to Broadway musicals (HAMILTON, DEH, TICK TICK BOOM ALL THE WAY!). Lastly, I LOVE watching Friends, New Girl, HIMYM, and The Big Bang Theory.

Now, let’s take a look at my notebook!


Hey, there! We’ve already been learning about different machine learning algorithms and data scraping, and it’s time to add a new skill to your data science arsenal. In this blog, I’ll help you learn a concept that is widely used in the real world: time series!

What is Time Series Analysis?

Before I go any further, two quick notes: I’ll be using the words forecast and predict interchangeably, so hopefully you don’t get confused, and I’ll be focusing mainly on the analysis aspect of time series rather than the forecasting aspect.

Time Series Analysis and Forecasting is a very prominent field in data science. Technically, it is the process of extracting information from time-series data to gain insights and make forecasts. In other words, it helps us analyze how something has changed over time and predict how likely something is to happen in the future.

“Wait, so time-series data is different from the data we’ve been talking about in the previous blogs?” Well, yes! As the name suggests, time-series data is a sequence of data points recorded in time order, so every data point has a time component. And since we’re dealing with time, this kind of data is non-static (it captures change or motion), and new data points keep arriving as time moves forward.

Difference Between Analysis and Forecasting

Earlier, I mentioned the words analysis and forecasting, but what is the difference between them when it comes to time series? The two terms are often used interchangeably, but there is a thin line between them, and where it falls depends on your time series problem.

Basically, time series analysis is the study of patterns and trends to gain useful insights from time series data, while time series forecasting involves predicting future trends based on historical data. Hopefully, you now have a clear understanding of the two.
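
To make the distinction concrete, here is a minimal sketch using only Python’s standard library. The monthly numbers are made up for illustration, and the “forecast” is a deliberately naive one (the average of the last three months) that I’m assuming just for this example, not a recommended method.

```python
# Made-up monthly values, for illustration only.
monthly_sales = [100, 110, 120, 130, 125, 140]

# Analysis: describe what already happened.
changes = [
    later - earlier
    for earlier, later in zip(monthly_sales, monthly_sales[1:])
]
average_change = sum(changes) / len(changes)

# Forecasting: estimate a value we have not observed yet.
# Naive assumption: next month looks like the average of the last three.
last_three = monthly_sales[-3:]
naive_forecast = sum(last_three) / len(last_three)

print("Month-over-month changes:", changes)
print("Average change per month:", average_change)
print("Naive forecast for next month:", round(naive_forecast, 2))
```

The first half only looks backward at the data we have, while the second half produces a number for a month that is not in the data yet.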

Applications of Time Series Analysis

As I’ve mentioned earlier, time series analysis is a prominent field in data science, and it is widely used in the real world. It is used in healthcare analytics, geospatial analysis, and weather forecasting! Here are some industries where time series analysis is used:

  • Finance industry - Time series is used for analyzing the behavior of financial markets. It can also be used to forecast interest rates, assess foreign currency risk, and track price fluctuations over time.
  • Healthcare industry - Doctors can suggest crucial measures to reduce the risk of stroke based on a patient’s heart rate over time.
  • Weather industry - Weather prediction is based upon historical time series data. 

Regression vs Time Series

You might also be wondering about the difference between regression and time series, because they seem to work in a similar way.

Regression

  • The target variable is continuous
  • Involves finding patterns in the data and predicting the target variable using this pattern

Time Series

  • The target variable is continuous
  • Involves finding trends in the data and predicting the target variable using this trend

They both have continuous target variables, and both also do the process of predicting future outcomes, so what’s the difference? 

Regression analysis is well suited to simple relationships between variables, such as predicting the age of a person based on their height or the GPA of a student based on the amount of time they study. However, if we want to study how something changes over time so that we can identify patterns and trends, that is where we use time series analysis.
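
One way to see the difference in code: in a typical regression, each row is a separate observation with its own features, while in a time series the “features” are often earlier values of the same series, so the order of the rows matters. Below is a minimal sketch of that idea. The helper name `make_lagged_pairs` is hypothetical and the numbers are made up.

```python
def make_lagged_pairs(series, lag=1):
    """Pair each value with the value observed `lag` steps earlier."""
    if lag < 1 or lag >= len(series):
        raise ValueError("lag must be between 1 and len(series) - 1")
    return [(series[i - lag], series[i]) for i in range(lag, len(series))]


# Made-up monthly revenue values, listed in time order.
revenue = [10, 12, 15, 14, 18]

# Each pair is (previous month, current month).
pairs = make_lagged_pairs(revenue, lag=1)
print(pairs)
```

If you shuffled `revenue` before building the pairs, the “previous month” values would no longer mean anything, which is exactly why time order is so important here.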

Time Series Components (Integrants)

Any time series can be broken down into several components, which can be very useful for analysis and forecasting. These components help us highlight the trend and behavior of the data over time. But hold on! Before we look into the different components, it is worth knowing that they fall into 2 categories (integrants):

  • Systematic - These are components that can be used for predictive modeling (recall supervised learning) and that occur repeatedly. Level, trend, and seasonality come under this category.
  • Non-systematic - These are components that cannot be used for predictive modeling directly. Noise comes under this category.
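
One common way to picture these two categories is an additive model, where each data point is the sum of level, trend, seasonality, and noise. The sketch below builds a synthetic monthly series that way. Every number in it (level, slope, amplitude, noise size, seed) is an arbitrary choice for illustration.

```python
import math
import random

N_MONTHS = 36
LEVEL = 100.0          # baseline value of the series
SLOPE = 1.5            # trend: increase per month
SEASON_AMPLITUDE = 10  # size of the yearly swing

rng = random.Random(42)  # fixed seed so the run is repeatable

# Systematic part: level + trend + seasonality (12-month cycle).
systematic = [
    LEVEL + SLOPE * t + SEASON_AMPLITUDE * math.sin(2 * math.pi * t / 12)
    for t in range(N_MONTHS)
]

# Non-systematic part: random noise we cannot model directly.
noise = [rng.gauss(0, 2) for _ in range(N_MONTHS)]

# The observed series is the sum of both parts.
series = [s + e for s, e in zip(systematic, noise)]
print([round(value, 1) for value in series[:6]])
```

In real data we only get to see `series`. The job of analysis is to work backward and estimate the systematic parts hiding inside it.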

The components that we use to break down time series data are:

  • Structural breaks - A structural break is a sudden change in the time series data. A sudden shift, whether upward or downward, can also affect the reliability of the results of your analysis.
    As you can see in this graph, there was a sudden upward shift from April to November, and a sudden drop from November to December.
Source: Analytics Vidhya
  • Trend - A trend is when the time series data moves higher or lower over a period of time. Basically, a trend is either a positive (increasing) or negative (decreasing) slope over the entire range of time. For example, suppose we want to analyze the trend of a company’s monthly revenue.

    Even though the graph has both positive and negative slopes along the way, there is an increasing trend over time in the company’s monthly revenue.

  • Seasonality - This component refers to periodic fluctuations. In other words, seasonality is a pattern that repeats over a fixed span of time, for example a year. An easy way to understand this component is to think about seasons like summer, winter, spring, and monsoon, each of which comes and goes at a specific time of the year. For example, online sales are high during the Christmas season!
  • Noise - In simple terms, noise is the random fluctuation in your time series data. It is the irregular part left over once the other components are accounted for: its values are not correlated with each other, and its variance (how spread out the data is from the mean) is similar across the series. Noise can make your data messy and get in the way of forecasting, so removing or reducing it is very important when preprocessing time series data.
  • Cyclicity - This component refers to when the time series data repeats after some interval of time, like months, years, or sometimes even decades.

    In this plot, we can see the demand for electricity per week. Notice that there is a recurring pattern every 2 weeks. That pattern is what we call cyclicity.

Source: Analytics Vidhya
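
To see how the trend and seasonality components above can be pulled apart, here is a simplified sketch using only the standard library. It estimates the trend with a trailing moving average (which also smooths out noise) and then averages what is left over at each position in the cycle. The function names and the quarterly numbers are made up for illustration, and dedicated libraries use more careful variants of this idea.

```python
def moving_average(values, window):
    """Trailing moving average; the first window - 1 positions are None."""
    result = [None] * len(values)
    for i in range(window - 1, len(values)):
        result[i] = sum(values[i - window + 1 : i + 1]) / window
    return result


def seasonal_indices(values, period):
    """Average deviation from the trend at each position in the cycle.

    Assumes at least 2 * period - 1 values, so every position is observed.
    """
    trend = moving_average(values, period)
    buckets = [[] for _ in range(period)]
    for i, (value, level) in enumerate(zip(values, trend)):
        if level is not None:
            buckets[i % period].append(value - level)
    raw = [sum(bucket) / len(bucket) for bucket in buckets]
    # Center the indices so they sum to zero across one cycle.
    offset = sum(raw) / period
    return [r - offset for r in raw]


# Made-up quarterly series: upward trend plus a repeating 4-step pattern.
pattern = [5, -5, 3, -3]
series = [100 + 2 * t + pattern[t % 4] for t in range(12)]

trend = moving_average(series, window=4)
season = seasonal_indices(series, period=4)
print("Trend:", trend)
print("Seasonal indices:", [round(s, 2) for s in season])
```

Because this toy series has no noise, the seasonal indices come back as the same repeating pattern that was used to build it. With real data, whatever is left after removing trend and seasonality is the noise.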

Time Series Analysis Implementation

We’ve been focusing a lot on the theoretical side of time series analysis, and so it’s time to actually do a simple analysis ourselves! 

In this tutorial, we’ll be analyzing a dataset from Kaggle.

Before proceeding, make sure to download the dataset from here: time series dataset

Read the dataset
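
Here is a minimal sketch of this step using Python’s built-in `csv` module. The column names are the ones described in this blog, but the rows are made-up placeholder values, the date format is an assumption, and the file path in the comment is hypothetical, so adjust them to match your download.

```python
import csv
import io

# Stand-in for the downloaded file: the column names follow this blog,
# but the rows below are made-up placeholder values.
SAMPLE_CSV = """Period,Revenue,Sales_quantity,Average_cost,The_average_annual_payroll_of_the_region
01.01.2015,1000.0,10,100.0,500
01.02.2015,1200.0,12,100.0,500
"""


def read_rows(file_obj):
    """Read CSV text into a list of dictionaries, one per row."""
    return list(csv.DictReader(file_obj))


rows = read_rows(io.StringIO(SAMPLE_CSV))

# With the real download, open the file instead (the path is hypothetical):
# with open("path/to/downloaded.csv", newline="") as f:
#     rows = read_rows(f)

print("Columns:", list(rows[0].keys()))
print("Number of rows:", len(rows))
```

Note that `csv.DictReader` returns every value as a string, so the numbers and dates still need to be converted before any analysis.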

Cleaning the dataset
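
The exact cleaning steps depend on what you find in the file, so treat the following as a sketch of typical ones rather than a fixed recipe: drop rows with a missing revenue, convert the strings to proper types, and make sure the rows are in time order. The sample rows are made up, and the `%d.%m.%Y` date format is an assumption.

```python
from datetime import datetime

# Made-up rows in the shape csv.DictReader returns (all values are strings).
raw_rows = [
    {"Period": "01.02.2015", "Revenue": "1200.0"},
    {"Period": "01.01.2015", "Revenue": "1000.0"},
    {"Period": "01.03.2015", "Revenue": ""},  # missing value
]


def clean_rows(rows, date_format="%d.%m.%Y"):
    """Drop rows with a missing Revenue and convert the rest to proper types."""
    cleaned = []
    for row in rows:
        if not row["Revenue"].strip():
            continue  # skip rows without a revenue figure
        cleaned.append(
            {
                "Period": datetime.strptime(row["Period"], date_format).date(),
                "Revenue": float(row["Revenue"]),
            }
        )
    # Time series analysis relies on the rows being in time order.
    cleaned.sort(key=lambda row: row["Period"])
    return cleaned


clean = clean_rows(raw_rows)
for row in clean:
    print(row["Period"], row["Revenue"])
```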

Our dataset has 5 columns and 96 rows. The columns are:

  • Period - Contains the period for the model
  • Revenue - Company’s revenue for each month from 2015 to 2020
  • Sales_quantity - Company’s sales quantity
  • Average_cost - Average cost of production
  • The_average_annual_payroll_of_the_region - Average number of employees in the region per year

Plotting the line chart for all columns

This plot contains the data from all 5 columns at once, so we can’t really get a clear view. Let’s focus on the time series of revenue from 2015 to 2020 by dropping all the other columns.
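
Dropping the other columns can be sketched like this, assuming the rows are stored as a list of dictionaries. The values are made-up placeholders, and only the column names come from this blog.

```python
# Made-up rows with the five column names listed in this blog.
rows = [
    {
        "Period": "01.01.2015",
        "Revenue": 1000.0,
        "Sales_quantity": 10,
        "Average_cost": 100.0,
        "The_average_annual_payroll_of_the_region": 500,
    },
    {
        "Period": "01.02.2015",
        "Revenue": 1200.0,
        "Sales_quantity": 12,
        "Average_cost": 100.0,
        "The_average_annual_payroll_of_the_region": 500,
    },
]

# The only columns we want to keep for the revenue time series.
KEEP = ("Period", "Revenue")

revenue_only = [{column: row[column] for column in KEEP} for row in rows]
print(revenue_only)
```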

Now we only have the Period and Revenue columns. Let us now plot the graph!
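
A minimal plotting sketch is shown below, assuming you have matplotlib installed. The import sits inside `plot_revenue` so the data preparation part still runs without it. The function names and the sample values are made up for illustration.

```python
def split_columns(rows):
    """Turn a list of row dictionaries into x (periods) and y (revenue) lists."""
    periods = [row["Period"] for row in rows]
    revenue = [row["Revenue"] for row in rows]
    return periods, revenue


def plot_revenue(periods, revenue):
    """Draw a simple line chart (requires matplotlib to be installed)."""
    import matplotlib.pyplot as plt  # imported here so the rest runs without it

    fig, ax = plt.subplots(figsize=(10, 4))
    ax.plot(periods, revenue, marker="o")
    ax.set_xlabel("Period")
    ax.set_ylabel("Revenue")
    ax.set_title("Revenue over time")
    fig.autofmt_xdate()  # tilt the x-axis labels so they do not overlap
    plt.show()


# Made-up values for illustration only.
rows = [
    {"Period": "2015-01", "Revenue": 1000.0},
    {"Period": "2015-02", "Revenue": 1200.0},
    {"Period": "2015-03", "Revenue": 1100.0},
]

periods, revenue = split_columns(rows)
# plot_revenue(periods, revenue)  # uncomment to display the chart
```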

In this time series graph, we can see that there is an increasing trend for the company’s revenue from 2015 to 2020.
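
If you want a number to back up what your eyes see in the graph, one simple option is to fit a straight line through the points and check the sign of its slope. Here is a sketch of a least-squares slope in plain Python. The function name `trend_slope` and the revenue values are made up for illustration.

```python
def trend_slope(values):
    """Least-squares slope of values against their position 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    denominator = sum((x - mean_x) ** 2 for x in range(n))
    return numerator / denominator


# Made-up revenue values: they wobble, but drift upward overall.
revenue = [100, 120, 110, 140, 130, 160]
slope = trend_slope(revenue)

print("Average change per period:", round(slope, 2))
print("Increasing trend" if slope > 0 else "No increasing trend")
```

A positive slope means the series rises on average from one period to the next, even if some individual periods go down.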

Conclusion

Congratulations on reaching the end of this blog! We learned some pretty interesting concepts about time series, my friend (you are now a time lord—just kidding!).

To summarize everything we’ve talked about in this blog:

  1. We first learned the basics of time series.
  2. We got a brief idea of the difference between time series analysis and forecasting, and even the difference between regression and time series.
  3. We also learned about the different components of time series to help us in better analyzing our data.

I hope you had fun exploring Time Series, and that you continue to learn more about it! If you’re interested in learning more about time series analysis and forecasting and you want to apply your newfound knowledge, join me in my next blog where we’ll be conducting time series analysis on Manila’s rising sea levels data!

If you’re curious to learn more about this topic, did you know that in the 12-week Data Science Fellowship, you’ll be applying time series analysis to the music industry? If you’re as into music as I am, I encourage you to join the bootcamp where you’ll have the chance to analyze your favorite artist’s Spotify streams! Whoever your favorite artist is, or whatever genre you listen to, there is an application of time series that you can use in the bootcamp!

I hope you found this blog useful, and that you stick around for the next one where we’ll be applying it to a real project!

Never stop learning!

From the notebook of Basty Vergara | Connect with Basty via LinkedIn and Notion


RECOMMENDED NEXT STEPS

Updated for Data Science Fellowship Cohort 10 | Classes for Cohort 10 start on September 12, 2022.

If you’re ready to dive in

  • Enroll in the Data Science Fellowship via the sign up link here and take the assessment exam.
    Note:
    The assessment exam is a key part of your application. The deadline for the assessment is on August 21, 2022.
