Learn Data Science for free in 2021
Ideally, a data scientist has the following skills:
- Programming Skills (Python or R)
- Data Analysis and Visualization
- Data Preprocessing
- Databases (Relational, Non-Relational)
- Machine Learning
- Linear Algebra and Statistics
- Deep Learning
- Cloud for model deployment
I will discuss all these fields and the best online courses to get started.
1. Programming Skills (~1 Month)
A good data scientist is well versed in programming, especially in Python or R, as these are the leading data science languages.
Google Trends: Blue is Python, Red is R.
We can see that worldwide interest in Python is greater than in R, so I would advise a beginner to start with Python and get a good grip on it.
You should start by learning the basics of Python on the Sentdex YouTube channel, which has a great series for beginners.
- Python 3 Basics Tutorial has 68 short videos that cover many topics and important modules.
- Intermediate Python Programming has 26 videos that cover intermediate Python topics such as OOP, error handling, async programming, decorators, and many more.
Going through these two series will help you get a good grip on Python. To build on that, I suggest you do some projects, such as these 5 intermediate projects with Python.
Alternatively, you can buy this Python Bootcamp by Jose Portilla, which covers everything from the basics to advanced topics, along with some projects.
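To give a feel for the intermediate topics mentioned above (OOP, error handling, and decorators), here is a minimal sketch. The names `timed`, `Temperature`, and `to_fahrenheit` are made up for illustration and do not come from any of the courses.

```python
import functools
import time


def timed(func):
    """Decorator that records how long the wrapped function took."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Store the duration on the wrapper so callers can inspect it.
            wrapper.last_elapsed = time.perf_counter() - start
    wrapper.last_elapsed = None
    return wrapper


class Temperature:
    """Small class (hypothetical) showing OOP with validation via exceptions."""

    def __init__(self, celsius):
        if celsius < -273.15:
            raise ValueError("temperature below absolute zero")
        self.celsius = celsius

    @property
    def fahrenheit(self):
        return self.celsius * 9 / 5 + 32


@timed
def to_fahrenheit(values):
    """Convert a list of Celsius readings, skipping invalid ones."""
    converted = []
    for value in values:
        try:
            converted.append(Temperature(value).fahrenheit)
        except ValueError:
            # Error handling: ignore physically impossible readings.
            continue
    return converted


result = to_fahrenheit([0, 100, -300])
print(result)
```

The decorator wraps the function without changing how it is called, and the `try`/`except` keeps one bad reading from stopping the whole conversion.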
2. Data Analysis and Visualization (~1.5–2 Months)
Data analysis and visualization are core parts of a data scientist's work. While there are a lot of automated tools, doing data analysis and visualization in Python is still very valuable. To start with, you should learn:
- NumPy, a well-known numerical library that stores n-dimensional arrays and performs very fast mathematical operations on them. You can learn it via freeCodeCamp, a well-known learning platform that has a great crash course on NumPy. For in-depth NumPy, you can read this free chapter from the book “Python Data Science Handbook.”
- Pandas, a popular data analysis library, is used to store, manipulate, and visualize datasets. You can learn it for free from Corey Schafer, a well-known Python programmer who has a great playlist on Pandas. You can also check the Pandas chapter from “Python Data Science Handbook,” which is free.
- Matplotlib, a popular and powerful plotting and visualization library whose interface was inspired by MATLAB, is widely used for data visualization in Python. Sentdex has a great playlist for learning Matplotlib. You can also refer to Corey Schafer’s Matplotlib playlist on YouTube. If you prefer learning from books, “Python Data Science Handbook” has a great chapter on Matplotlib.
You can refer to this mini 12-hour course by freeCodeCamp to learn all of these in a single video.
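As a small taste of what these libraries do, here is a minimal sketch covering NumPy and Pandas only (plotting is left out to keep it short). The data and column names are invented for illustration.

```python
import numpy as np
import pandas as pd

# NumPy: store n-dimensional data and apply vectorized math without loops.
heights_cm = np.array([[170.0, 165.0], [180.0, 175.0]])  # 2x2 array
heights_m = heights_cm / 100           # element-wise division
column_means = heights_m.mean(axis=0)  # mean of each column

# Pandas: labeled, tabular data with grouping and aggregation.
df = pd.DataFrame(
    {
        "city": ["A", "A", "B", "B"],  # made-up category labels
        "sales": [10, 20, 5, 15],
    }
)
sales_by_city = df.groupby("city")["sales"].sum()

print(column_means)
print(sales_by_city)
```

The point to notice is that neither step needs an explicit Python loop: the arithmetic and the aggregation are applied to whole arrays and columns at once.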
3. Data Preprocessing (~1 Month)
While learning Pandas, you will already have learned the basics of manipulating and preprocessing datasets, so here I recommend some practical courses that take you through the basics of data preprocessing with sklearn and Pandas in Python.
I recommend this data cleaning course by the Kaggle team, which is very practical and hands-on, and you will learn a lot from it.
After this course, I recommend you go through this Feature Engineering course by Kaggle, which is again very practical and will improve your skills a lot.
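To make "preprocessing" concrete, here is a minimal sketch of three common steps: filling missing values, one-hot encoding a categorical column, and scaling a numeric column. It uses Pandas only (sklearn offers equivalent tools), and the dataset is invented for illustration; it is not taken from the Kaggle courses.

```python
import pandas as pd

# A tiny made-up dataset with missing values.
raw = pd.DataFrame(
    {
        "age": [25.0, None, 40.0, 31.0],
        "city": ["Paris", "Berlin", None, "Paris"],
        "income": [50000, 62000, 58000, 50000],
    }
)

clean = raw.copy()
# Fill the missing numeric value with the column median.
clean["age"] = clean["age"].fillna(clean["age"].median())
# Fill the missing category with an explicit placeholder label.
clean["city"] = clean["city"].fillna("unknown")
# One-hot encode the categorical column so models can consume it.
clean = pd.get_dummies(clean, columns=["city"])
# Min-max scale income to the [0, 1] range.
income = clean["income"]
clean["income"] = (income - income.min()) / (income.max() - income.min())

print(clean)
```

Which fill strategy and which scaling to use depends on the data and the model, so treat these choices as one reasonable default rather than a rule.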
4. Databases (~2 Months)
Working with databases is an essential skill for a data scientist. Most of a company's data is stored in a database, and a data scientist must know the important queries needed to get work done with it.
Once again, I recommend this free course, “Intro to SQL” by Kaggle, which will teach you all the basics of SQL. After that, you can do this advanced SQL course by Kaggle, which gives you hands-on practice with advanced SQL techniques on real datasets.
For NoSQL databases, you can go through this quick course by freeCodeCamp, which will teach you all the basics of NoSQL and how to use it with Python.
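To show the kind of query a data scientist runs every day, here is a minimal sketch using Python's built-in `sqlite3` module with an in-memory database. SQLite is my own choice for a self-contained example; the courses above may use a different database engine, and the `orders` table is made up.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("alice", 30.0), ("bob", 12.5), ("alice", 20.0), ("carol", 7.5)],
)

# Aggregate per customer, keep only customers who spent more than 10,
# and sort by total spend.
query = """
    SELECT customer, COUNT(*) AS n_orders, SUM(amount) AS total
    FROM orders
    GROUP BY customer
    HAVING SUM(amount) > 10
    ORDER BY total DESC
"""
rows = conn.execute(query).fetchall()
conn.close()
print(rows)
```

`GROUP BY`, `HAVING`, and `ORDER BY` cover a large share of everyday analytical queries, so they are worth practicing early.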
5. Machine Learning (~3 Months)
I will divide this portion into 2 parts: 1.5 months for the theory of algorithms and 1.5 months for the practical implementation of the algorithm.
I recommend the popular Machine Learning course by Andrew Ng, a professor at Stanford and a world-renowned machine learning engineer. More than 3.5 million students have taken this course, and it has an outstanding rating of 4.9 stars. The course is free to audit and all exercises are free too; you pay a fee only if you want a certificate.
You can easily complete this course in 1 month if you skip the MATLAB part (the course exercises are in MATLAB) and focus on the theoretical explanation of the machine learning algorithms, where Professor Ng explains complex concepts in a very simple and precise manner.
After completing this theoretical portion, I want you to spend the next month implementing all the algorithms you learned from scratch in Python. There are graded exercises designed for this that mirror the actual course assignments; the only difference is that they are in Python. You can check this repo for them.
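To illustrate what "from scratch" means, here is a minimal sketch of linear regression trained with batch gradient descent in plain Python. It is my own illustration, not code from the course or the repo; the function name, learning rate, and number of epochs are assumptions chosen for this toy data.

```python
def fit_linear_regression(xs, ys, learning_rate=0.05, epochs=2000):
    """Fit y = w * x + b by batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Prediction errors for the current parameters.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradients of (mean squared error / 2) with respect to w and b.
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        # Step against the gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # generated from y = 2x + 1
w, b = fit_linear_regression(xs, ys)
print(round(w, 3), round(b, 3))
```

On this noise-free data the fitted slope and intercept should land very close to 2 and 1. Writing the update rule yourself like this is what makes the theory stick.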
For the next half month, you can dig into some other machine learning algorithms that are not discussed in the course, such as random forest, naive Bayes, and decision trees.
6. Linear Algebra and Statistics (~1.5 Months)
Linear algebra and statistics are the bread and butter of a data scientist and are essential to understanding many machine learning algorithms thoroughly. I recommend going through the Khan Academy Linear Algebra course or the free online MIT Linear Algebra course by Professor Gilbert Strang.
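To show how these topics surface in machine learning, here is a minimal sketch in plain Python: a linear model's predictions are just a matrix-vector product, and basic statistics summarize them. The helper names `dot` and `matvec` and the numbers are made up for illustration.

```python
import statistics


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def matvec(matrix, vector):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [dot(row, vector) for row in matrix]


# A linear model's predictions are a matrix-vector product: X times weights.
X = [[1.0, 2.0], [1.0, 4.0], [1.0, 6.0]]  # first column is the bias term
weights = [0.5, 2.0]
predictions = matvec(X, weights)

mean_prediction = statistics.mean(predictions)
spread = statistics.stdev(predictions)  # sample standard deviation
print(predictions, mean_prediction, spread)
```

In practice you would use NumPy for this, but writing the two helpers by hand once makes the notation in the courses easier to follow.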
7. Deep Learning (~2 Months)
Deep learning is an essential part of a data scientist's toolkit and a very powerful tool. A very hands-on option is fast.ai's practical deep learning course, which will help you start your deep learning journey in an easy, precise, and practical way.
You can easily complete this course in 2 months, covering both the practical and the theoretical parts. You will learn the fastai library and PyTorch along the way.
If you do not want to go the fast.ai and PyTorch route, you can do the Deep Learning Specialization by Andrew Ng (free to audit) and the TensorFlow in Practice Specialization by Laurence Moroney. The first will give you a good grip on the theory of deep learning, and the second on its practical aspects.
8. Cloud for model deployment (~2 Weeks)
You can spend a good 2 weeks learning AWS via this playlist by the official Amazon team, which will help you learn how to build, train, test, and deploy a machine learning model on AWS.
Alternatively, to learn Google Cloud, you can do the “Machine Learning with TensorFlow on Google Cloud Platform Specialization” on Coursera, which is free to audit and offered by the official Google Cloud team.
Additional Free Resources for Each Topic: