“How do I get started in data?”

This is a question we hear all the time, and there is no perfect one-size-fits-all path.

When practicing your skills, it helps to start with a domain you are interested in: a subject area where the intricacies and rabbit holes make sense to you. These are the topics where diving through the minutiae is incredibly interesting, and where holding up a metaphorical magnifying glass leaves you with a satisfied ‘ohhhh’.

This helps us think of the exploratory process of investigating a data set (commonly called EDA, or “exploratory data analysis”) not as a monotonous checklist of procedures, but as an act of play.

Play is important for adults, as described in a number of journal articles as well as news stories. EDA can be a great opportunity to flex creative muscles as much as technical ones.

Simply put, exploring data should be fun, and I sincerely hope you learn to love it as much as I do.

Finding data to play with

Depending on your interests, there are many great starting points, and often seeing how data storytellers and experts in data visualization present data narratives is great inspiration.

Teams at the New York Times work on interactive visualizations like charting Biden’s Economic Plan, teams at Mapbox have explored New York Taxi ride datasets, and FiveThirtyEight discuss their design thinking for their 2020 US Election Forecast.

Rappler has also won a data journalism award for their #SaferRoadsPH work.

Visualization professionals like Nathan Yao showcase their work and tutorials online; his site is FlowingData.com.

More community-created interactive visualizations, complete with code examples, are available on observablehq.com. These are particularly useful if you wish to learn tools like D3.js and JavaScript.

If you are interested in exploring Data Science and Machine Learning, then Kaggle Datasets are a great way to learn with small examples. GitHub hosts many datasets, as do Papers With Code and official sources like Open Data Philippines.

Many of these examples and datasets are biased towards North American or European sources. Recently, a new initiative to increase the availability of Filipino NLP data launched www.nlpinas.org.ph. Hopefully, this will be one of many initiatives in the Open Data space to encourage free and available local data.

Journalistic investigation

An essential skill in exploring data is the art of being curious. Developing critical enquiry is much like any kind of exercise: we get better with practice. Thinking about how to ask the right questions of a dataset, and determining what it means, who wrote the definitions, and why, are all important habits for a data professional.

You should imagine holding your dataset up for journalistic interrogation: does it support your hypotheses? Is it hiding things? What stories does it tell?
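To make that interrogation a little more concrete, here is a minimal sketch in Python using only the standard library. Everything in it is an assumption made for illustration: the tiny dataset is invented, and the helper names `interrogate` and `numeric_summary` are hypothetical. It is one possible way to start asking questions, not a fixed procedure.

```python
import csv
import io
from collections import Counter
from statistics import mean, median

# Hypothetical sample data, invented purely for illustration.
SAMPLE_CSV = """region,year,incidents
North,2019,120
North,2020,
South,2019,95
South,2020,310
East,2019,101
"""


def interrogate(csv_text, group_column):
    """Ask a few first questions of a small CSV dataset."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    columns = list(rows[0].keys()) if rows else []

    # "Is it hiding things?" Count the empty cells in each column.
    missing = Counter()
    for row in rows:
        for col in columns:
            if not (row[col] or "").strip():
                missing[col] += 1

    # "Who is represented?" Count how many rows each group contributes.
    groups = Counter(row[group_column] for row in rows)

    return {
        "row_count": len(rows),
        "columns": columns,
        "missing": dict(missing),
        "groups": dict(groups),
    }


def numeric_summary(csv_text, column):
    """Summarize one numeric column, skipping empty cells."""
    reader = csv.DictReader(io.StringIO(csv_text))
    values = [float(r[column]) for r in reader if (r[column] or "").strip()]
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "median": median(values),
    }


if __name__ == "__main__":
    print(interrogate(SAMPLE_CSV, "region"))
    print(numeric_summary(SAMPLE_CSV, "incidents"))
```

In this made-up example, one value is missing and one group has fewer rows than the others, and the mean sits well above the median because of a single large value. Each of those is a prompt for a follow-up question rather than an answer in itself.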

Data Literacy, Data Fluency, and Data Poetry

Asking data questions in practice helps us be more critical of data stories we are told by other sources. Did a particular study which a health article quotes use a representative sample? Are government budget figures inflated over many years to sound more impressive than they are? Is a data narrative being told in a responsible way?

Being curious is an essential part of building our own data literacy. It helps us interpret the data stories we are told and tell our own data narratives ethically. By sharpening our skills with data manipulation and exploration tools, and adding in the art of visualization, we can think deeply about what we are communicating and how different audiences will receive and interpret the data. In this sense, our data storytelling may come a little closer to being poetic.

Be part of a community

Learning and experimenting are so much easier with a supportive community sharing those same goals. At Eskwelabs we believe in lifelong learning, which is part of why we are building this kind of community with our Data Club. Check it out!


Stay Curious,
Caleb
Eskwelabs CTO and Co-Founder