Hi, everyone! My name is Basty, and I’m a data scientist and educator from Metro Manila. If there’s one thing I value deeply, it’s education. Education has allowed me not only to dive into the amazing world of code and data, but also to encourage and inspire others to do the same. Read more about me here.
Welcome back to the last chapter of our series, Journey into the World of Data Science. We have already tackled the paths to becoming a data scientist and a data analyst, so it is high time to discuss one of the most popular data roles in 2022: the data engineer! Let’s get started!
Let’s start by defining what a data engineer is. If you recall, we described data scientists and data analysts as detectives, but that comparison doesn’t fit data engineers.
Data engineers are like architects: they are the ones who craft the “blueprint.” By blueprint, I mean the entire foundation of an organization’s data landscape. They lay out the processes for acquiring, storing, transforming, and managing data. Basically, out of all three career paths we have talked about, data engineers are the most technical people in the field of data, because they act as a bridge between software development and data science.
Now let’s take a look at what a data engineer does. Data engineers play a vital role in developing and maintaining the data architecture of a company. They prepare large datasets so that data analysts and data scientists can use them for analysis. So whenever an analyst or scientist needs to analyze data or create models, think of the data engineer as the mediator who provides them with the data they need.
You might’ve also come across the term “ETL,” which stands for Extract, Transform, Load. ETL is the combined process of extracting data from a source, transforming it, and loading it into a different environment. Aside from ETL, data engineers also clean data to ensure that what data analysts and data scientists receive is in a structured format, ready for analysis.
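To make the three ETL steps described above a bit more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the sample data in `RAW_CSV`, the `sales` table, and the cleaning rules are all made up, and an in-memory SQLite database stands in for a real destination. Real ETL jobs usually rely on dedicated tools, so treat this as a toy version of the idea rather than how it is done in production.

```python
import csv
import io
import sqlite3

# Hypothetical raw export standing in for a real source system.
RAW_CSV = """name,city,amount
 Ana ,Manila,100
Ben,,250
carla,Quezon City,not_a_number
"""


def extract(raw_text):
    """Extract: read the CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def transform(rows):
    """Transform: tidy names, fill in missing cities, skip bad amounts."""
    cleaned = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip rows whose amount is not numeric
        name = row["name"].strip().title()
        city = row["city"].strip() or "Unknown"
        cleaned.append((name, city, amount))
    return cleaned


def load(records, connection):
    """Load: write the cleaned records into the destination table."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS sales (name TEXT, city TEXT, amount REAL)"
    )
    connection.executemany("INSERT INTO sales VALUES (?, ?, ?)", records)
    connection.commit()


# In-memory database used as a stand-in destination (an assumption).
connection = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), connection)

loaded_rows = connection.execute(
    "SELECT name, city, amount FROM sales ORDER BY name"
).fetchall()
print(loaded_rows)
```

Notice that the cleaning happens inside the transform step, so analysts and scientists only ever see rows that are already in a consistent shape.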
You might’ve also heard the term “data pipeline”. A data pipeline basically moves data through different stages. An example of this is when you want to transfer data from a database to another environment. Creating data pipelines is a huge part of a data engineer’s job, because pipelines are what keep data management automated.
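Here is a small sketch of the pipeline idea above, again in Python. It moves rows from one database to another through a chain of stages. The `users` and `clean_users` tables, the stage functions, and the `run_pipeline` helper are all hypothetical names invented for this example, and two in-memory SQLite databases stand in for real environments.

```python
import sqlite3

# Source database with some messy sample rows (made up for illustration).
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE users (id INTEGER, email TEXT)")
source.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [(1, " Ana@Example.com "), (2, "ben@example.com"), (3, "ana@example.com")],
)

# Destination database standing in for "another environment".
destination = sqlite3.connect(":memory:")
destination.execute("CREATE TABLE clean_users (id INTEGER, email TEXT)")


def read_from_source(connection):
    """Stage 1: pull rows out of the source database."""
    return connection.execute("SELECT id, email FROM users ORDER BY id").fetchall()


def normalize_emails(rows):
    """Stage 2: trim whitespace and lowercase every email."""
    return [(user_id, email.strip().lower()) for user_id, email in rows]


def drop_duplicates(rows):
    """Stage 3: keep only the first row seen for each email."""
    seen = set()
    unique = []
    for user_id, email in rows:
        if email not in seen:
            seen.add(email)
            unique.append((user_id, email))
    return unique


def write_to_destination(rows):
    """Stage 4: store the cleaned rows in the destination database."""
    destination.executemany("INSERT INTO clean_users VALUES (?, ?)", rows)
    destination.commit()
    return rows


def run_pipeline(data, stages):
    """Pass the output of each stage to the next one, in order."""
    for stage in stages:
        data = stage(data)
    return data


run_pipeline(
    source,
    [read_from_source, normalize_emails, drop_duplicates, write_to_destination],
)

moved_rows = destination.execute(
    "SELECT id, email FROM clean_users ORDER BY id"
).fetchall()
print(moved_rows)
```

Because every stage is just a function, the whole chain can be rerun on a schedule without anyone touching the data by hand, which is the kind of automation a pipeline is meant to provide.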
It’s now time to take a look at the set of technical skills needed for data engineers to take on complex data tasks:
There are many more skills you’d need to learn to be a data engineer, such as computing and stream processing frameworks, but we won’t be diving into them here. Instead, I trust you to explore them yourself. This is the joy of being a lifelong learner!
Before we discuss how you can become a data engineer, let’s first look at how data engineers differ from data analysts and data scientists. The main difference is that data engineers are responsible for designing, building, and maintaining data architectures. Data scientists and data analysts, on the other hand, are responsible for using that data to perform different types of analyses that solve business problems.
Just like with becoming a data scientist or data analyst, there is no single defined path to becoming a data engineer. Most of the time, data engineers are people who transitioned into the role from other data roles. Usually, they started out as data scientists and then moved into data engineering.
If you are someone who’s transitioning into this role, or someone who is completely new to this field, here are some other ways for you to become a data engineer:
Now, despite all these different pathways to becoming a data engineer, the simplest answer I can give you is to just start learning, or keep learning. The more you learn and train your skills, the better your chances of standing out as a data engineer!
Lastly, speaking of bootcamps, if you want a more structured and, at the same time, social approach to learning data science, make sure to check out the 15-week Data Science Fellowship of Eskwelabs. You can also check out the 9-week Data Analytics Bootcamp. Aside from the portfolio of projects you will build, you will also benefit from the bootcamp’s 1:5 mentor-to-student ratio.
I hope you found this blog post helpful. See you in the next one!