Advice to aspiring Data Scientists – your most common questions answered
By Roman Orac, Data Scientist.
I get many messages asking for advice from aspiring Data Scientists. I am no expert in career advising, so take everything that I write with a grain of salt.
I give advice based on my observations of the field and the experience that I’ve developed over the years. This is me advising my younger self, as I had similar questions at the start of my career.
What is the best way to learn and practice Data Science?
My advice would be to start with practical projects and then slowly progress with theory. Kaggle notebooks are a great way to learn the practical part.
When you become satisfied with your knowledge of tools and practices, I would suggest you construct a dataset for some problem yourself (e.g., by scraping the data) and apply ML algorithms to it. The hardest thing in ML is dataset construction. You might even build a company out of it.
What are good resources to learn Machine Learning?
I suggest you start with free resources, as many are available for Programming, Machine Learning, and Data Science.
I personally like the Machine Learning Coursera course by Andrew Ng. The course starts easy and then gradually gets harder as it goes. The good thing about it is that it focuses on the fundamentals of Machine Learning.
I suggest that you watch at least the first few lectures. Don’t worry if you don’t understand everything, because you can always revisit it later. I would also advise that you don’t focus on just a single course. We all learn differently, and that’s OK.
I hardly have any technical background. What do you think would be the best approach to learn?
Don’t study alone! Find and join online communities that can help you learn and grow.
You can start practicing Machine Learning in Excel. Try to implement a Linear Regression in Excel. It is a great first challenge, and it will get you motivated.
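If you later want to check your spreadsheet result, the same calculation can be sketched in a few lines of plain Python. This is only an illustration of ordinary least squares with one feature; the function name `fit_line` and the toy numbers are made up for this example, and the result should match what Excel's SLOPE and INTERCEPT functions return for the same columns.

```python
def fit_line(xs, ys):
    """Ordinary least squares for a single feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the co-variation of x and y divided by the variation of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The fitted line always passes through the point (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up toy data that lies exactly on the line y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(xs, ys)
print(slope, intercept)
```

Doing it once by hand, in Excel or in code, makes it easier to see what a library is computing for you later.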
Should I learn Python or R?
Let’s address the elephant in the room. If you’re just starting, I would suggest learning Python. The main reasons are:
- rich ecosystem for Data Science, Backend… you name it, Python has it.
- the language is still gaining momentum in popularity.
With Python, you can do the analysis, develop the model from scratch, and then run it in production. While I am sure that models in R also run in production, I haven’t heard about one (let me know in the comments if your experience is different).
Don’t get me wrong, if you know R, that’s totally fine. Data Science teams are usually using both languages as some prefer R and others Python.
In the end, the choice matters less than it seems, because some models have to be reimplemented in a compiled language (Java, Go) to make faster predictions in production.
Should I learn SQL?
This is a great question. The answer is YES — with capital letters.
Whether or not you’ll be using SQL databases, you should know the main concepts from relational databases: joins, group by, and window functions such as lag and lead. These concepts are essential even when working with pandas, R, or some other tool.
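To make these concepts concrete, here is a small sketch that runs real SQL from Python using the standard-library `sqlite3` module. The `customers` and `orders` tables and their rows are invented for illustration, and the window-function query assumes your Python is linked against SQLite 3.25 or newer, which is where `LAG` and `LEAD` became available.

```python
import sqlite3

# In-memory database with two tiny, made-up tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (customer_id INTEGER, day INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Ana'), (2, 'Ben');
    INSERT INTO orders VALUES
        (1, 1, 10.0), (1, 2, 30.0), (1, 3, 20.0), (2, 1, 5.0);
""")

# JOIN + GROUP BY: total amount spent per customer.
totals = conn.execute("""
    SELECT c.name, SUM(o.amount) AS total
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()

# Window functions: LAG / LEAD read the previous / next row
# within each customer's orders, sorted by day.
neighbours = conn.execute("""
    SELECT customer_id, day, amount,
           LAG(amount)  OVER (PARTITION BY customer_id ORDER BY day) AS prev_amount,
           LEAD(amount) OVER (PARTITION BY customer_id ORDER BY day) AS next_amount
    FROM orders
    ORDER BY customer_id, day
""").fetchall()

print(totals)
for row in neighbours:
    print(row)
```

The same ideas carry over to other tools: in pandas, for example, a join corresponds to `merge`, group by to `groupby`, and lag/lead to `shift` within a group.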
Should I take more math classes?
The more math you know, the better for you in the long term. Knowing math will enable you to understand what’s happening behind the scenes of a black box Machine Learning model. It’s also easier to transfer knowledge from theory to practice.
Are math and statistics important in Data Science?
Math becomes crucial when you need to improve the model. You need math to understand the difference between different types of models, distributions, etc.
Senior Machine Learning Engineers can tell the main properties of a model just by looking at the optimization function.
Which classes should I take to be better prepared for the Data Science role?
My advice would be to think ahead. Every field needs Data Scientists or will need them in the future. Ask yourself: at which company would you like to get an internship after you finish your studies? It will be easier to get a Bioinformatics internship if you have already taken a few related classes.
Do I need a Ph.D. to work in Data Science?
You don’t need a Ph.D. to work in Data Science — meaning doing analysis of real-world data, and applying the Machine Learning models.
If your goal is doing research and developing new Machine Learning algorithms (e.g., working at DeepMind), then you should pursue a Ph.D.
How to get your first job in Data Science?
Attend local meetups. Companies are looking for new hires there. Maybe start in the Data Quality Assessment department — bigger companies have those. Online communities can also help.
How do I know which job offer has the best mentor?
Recently, I wrote, “When you have multiple job offers, accept the one with a better mentor.”
How do you know which one has the best mentor? During the interview, try to get as much information as you can about the team members, the managers, and their backgrounds. Check their LinkedIn. Do they write on Quora, Stack Overflow, Medium? Do your research.
Original. Reposted with permission.