5 Stories Data Tell us About Data Scientists | by Éverton Bin | Nov, 2020
Data scientists tell stories through data. But what stories can data tell about data scientists?
It may sound like the revenge of structured data, but it’s actually just a survey conducted by Kaggle. The results of the 2019 Kaggle Machine Learning and Data Science Survey were made publicly available, and that is the data we used to see what it could tell us about data scientists.
Here are the questions that guided us through this analysis:
- What is their educational background?
- Which activities do they perform in their daily work?
- Which tools do they use?
- And what about their salary?
And these are the stories they told us:
I know this first figure may look like bad news if you’re just getting started on the journey to become a data scientist. I don’t want to give away any spoilers, but if you feel disappointed now, go check the last section, called Education x Salary, and it may calm you down again.
Most of the data scientists who answered the survey do indeed hold a Master’s degree. Still, it is very common for them to take online courses on popular platforms such as Coursera, Udemy, DataCamp, Udacity and many others.
This could indicate that, no matter what your formal educational background is, you will have to continuously seek knowledge, especially in a technological field that brings us something new almost every single day.
When hearing about Data Science, it is inevitable to think about the hype around artificial intelligence and machine learning algorithms. But when it comes to a regular day in a data scientist’s life, would that be the main task?
Over 60% of them actually have to deal with data analysis instead of just worrying about which algorithm to use or finding a way to improve their models. In fact, the survey shows us that it doesn’t matter whether you are a data scientist or a data engineer: data analysis will play a big role in your activities.
Also, we can see that almost 35% of the data scientists who participated in the survey perform activities related to data infrastructure. That shows how different roles in the Data Science field relate to one another.
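Shares like these come from a question where each respondent can select several activities, so the percentages can add up to more than 100%. Here is a minimal sketch of that arithmetic. The rows below are toy values made up for illustration, not the survey data, and the activity labels and the `activity_shares` helper are hypothetical names.

```python
from collections import Counter

# Toy rows invented for illustration (NOT the Kaggle survey data).
# Each set holds the activities one respondent selected.
responses = [
    {"analyze data", "build ML prototypes"},
    {"analyze data", "data infrastructure"},
    {"build ML prototypes"},
    {"analyze data"},
]


def activity_shares(rows):
    """Return the percentage of respondents who selected each activity."""
    # Count every selected activity once per respondent.
    counts = Counter(activity for selected in rows for activity in selected)
    # Divide by the number of respondents, not by the number of selections.
    return {activity: 100 * n / len(rows) for activity, n in counts.items()}


shares = activity_shares(responses)
for activity, pct in sorted(shares.items()):
    print(f"{activity}: {pct:.0f}%")
```

Because the denominator is the number of respondents, one person can count toward several activities at once, which is why data analysis and data infrastructure can both show large shares.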
It would probably be no surprise if I told you that Python is the most popular programming language among data scientists. But what if I asked you which language they use second most?
If your answer was SQL, you got it right! SQL, or Structured Query Language, is a simple but powerful language, and its commands can be grouped into four main categories: data definition language (DDL), data manipulation language (DML), data control language (DCL) and data query language (DQL).
This shows us that a great part of a data scientist’s work is understanding data: executing queries against databases, then transforming, manipulating and analyzing the data.
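To make the SQL command groups more concrete, here is a minimal sketch using Python’s built-in `sqlite3` module with an in-memory database. The `respondents` table, its columns and its rows are hypothetical and exist only for this example. SQLite has no user accounts, so the DCL group (commands such as `GRANT` and `REVOKE`) appears only as a comment.

```python
import sqlite3


def run_sql_demo():
    """Run DDL, DML and DQL statements against an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()

    # DDL (data definition): create the structure that will hold the data.
    cur.execute(
        "CREATE TABLE respondents ("
        "id INTEGER PRIMARY KEY, role TEXT, language TEXT)"
    )

    # DML (data manipulation): insert rows, then update one of them.
    cur.executemany(
        "INSERT INTO respondents (role, language) VALUES (?, ?)",
        [
            ("Data Scientist", "Python"),
            ("Data Scientist", "SQL"),
            ("Data Engineer", "SQL"),
        ],
    )
    cur.execute("UPDATE respondents SET language = 'R' WHERE id = 1")

    # DQL (data query): read the data back, aggregated per language.
    cur.execute(
        "SELECT language, COUNT(*) FROM respondents "
        "GROUP BY language ORDER BY language"
    )
    rows = cur.fetchall()

    # DCL (data control) manages permissions, for example:
    #   GRANT SELECT ON respondents TO analyst;
    # SQLite does not support it, so it is not executed here.

    conn.close()
    return rows


print(run_sql_demo())
```

In day-to-day analysis the query group tends to dominate, since most of the work is reading and aggregating data that already exists.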
But language is just one of the tools that we can find in a data scientist’s toolbox. What about the most used databases and machine learning frameworks?