The purpose of this section is to help students who are completely new to data science get warmed up and ready to go. We will cover topics including setting up your computer for data science, using GitHub and Kaggle to share your work, and using the command line in your OS. Are you ready?

Introduction to SQL

This is an introduction to Structured Query Language (SQL), a domain-specific language designed for managing data held in a relational database management system (RDBMS). It is particularly useful for handling structured data, i.e. data incorporating relations among entities and variables. SQL is a powerful tool for creating, updating, deleting, and querying information in databases. It is an essential skill for any data scientist because relational databases are among the most important data sources in a data science workflow.
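The four operations mentioned above (creating, updating, deleting, and querying) can be tried without installing a database server. The following is a minimal sketch that assumes Python's built-in `sqlite3` module and an in-memory database; the `students` table, its columns, and the sample rows are made up for illustration and are not part of the course material.

```python
import sqlite3


def run_sql_demo():
    """Create, update, delete, and query rows in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")  # temporary database, nothing is written to disk
    cur = conn.cursor()

    # CREATE: define a table (hypothetical schema for illustration)
    cur.execute(
        "CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT, score REAL)"
    )

    # INSERT: add a few sample rows
    cur.executemany(
        "INSERT INTO students (name, score) VALUES (?, ?)",
        [("Ada", 91.0), ("Grace", 84.5), ("Linus", 58.0)],
    )

    # UPDATE: change one student's score
    cur.execute("UPDATE students SET score = ? WHERE name = ?", (88.0, "Grace"))

    # DELETE: remove rows below a threshold
    cur.execute("DELETE FROM students WHERE score < ?", (60.0,))

    # SELECT: request the remaining rows, highest score first
    cur.execute("SELECT name, score FROM students ORDER BY score DESC")
    rows = cur.fetchall()
    conn.close()
    return rows


print(run_sql_demo())  # [('Ada', 91.0), ('Grace', 88.0)]
```

The `?` placeholders pass values separately from the SQL text, which is the usual way to avoid building queries by string concatenation.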

Introduction to Python

Python is an interpreted, high-level, general-purpose programming language. Created by Guido van Rossum and first released in 1991, Python's design philosophy emphasizes code readability with its notable use of significant whitespace. Its language constructs and object-oriented approach aim to help programmers write clear, logical code for small and large-scale projects. Python is one of the most important skills for any data scientist to master, since most of the popular data science libraries are built for Python. In this module, you will learn the basics of Python programming and build a solid foundation for later modules, where you will learn all the cool things about the Python data science packages.
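To make "significant whitespace" concrete, here is a small sketch in which indentation alone marks where the function, the loop, and each branch begin and end. The function name `classify_scores` and the passing threshold of 60 are hypothetical, chosen only for this example.

```python
def classify_scores(scores, passing=60):
    """Split a list of scores into passing and failing lists."""
    passed, failed = [], []
    for score in scores:            # the loop body is everything indented below
        if score >= passing:        # each branch is defined by its indentation
            passed.append(score)
        else:
            failed.append(score)
    return passed, failed           # back at function level: runs after the loop


passed, failed = classify_scores([72, 45, 90, 58])
print(passed, failed)  # [72, 90] [45, 58]
```

There are no braces or `end` keywords: moving the `return` line one level deeper would place it inside the loop and change the result.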

Intermediate Python

Once you are familiar with the basic concepts of Python programming, it is time to take your skills to the next level! As a data scientist, mastering intermediate-level Python is extremely beneficial since it allows you to work on more complicated problems and better leverage the power of Python.

Introduction to NumPy

NumPy is a library for the Python programming language, adding support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays. NumPy provides Python with a powerful array processing library and an elegant syntax that is well suited to expressing computational algorithms clearly and efficiently. We'll introduce basic array syntax and array indexing, review some of the available mathematical functions in NumPy, and discuss how to write your own routines. Along the way, we'll learn just enough about Matplotlib to visualize results from our examples.
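As a preview of the array syntax, indexing, and built-in functions described above, here is a minimal sketch. It assumes NumPy is installed and imported as `np`; the variable names and the `standardize` helper are illustrative, not part of the course material.

```python
import numpy as np

# Basic array syntax: a 3x4 array holding the integers 0..11
grid = np.arange(12).reshape(3, 4)

# Indexing and slicing
first_row = grid[0]            # the first row: 0, 1, 2, 3
last_column = grid[:, -1]      # the last column: 3, 7, 11
evens = grid[grid % 2 == 0]    # boolean mask keeps only the even values

# Built-in mathematical functions operate on whole arrays at once
column_means = grid.mean(axis=0)   # mean of each column: 4, 5, 6, 7


def standardize(values):
    """A small custom routine: rescale values to zero mean and unit standard deviation."""
    values = np.asarray(values, dtype=float)
    return (values - values.mean()) / values.std()


z = standardize([1.0, 2.0, 3.0])
print(column_means, z)
```

Note that `standardize` contains no explicit loop: the subtraction and division are applied to every element of the array, which is the style NumPy is designed for.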

Learn the Basics of Machine Learning

Machine learning, the field of computer science that gives computer systems the ability to learn from data, is one of the hottest topics in data science. Machine learning is transforming the world: from spam filtering in social networks to computer vision for self-driving cars, the potential applications of machine learning are vast. This section covers the foundational machine learning concepts and tools that will help you advance in your career. Whether you’re trying to analyze a dataset using machine learning, or you’re a data analyst trying to upgrade your skills, this is the best place to start.

Deep Learning for Beginners

Deep learning is a machine learning technique that teaches computers to do what comes naturally to humans: learn by example. Deep learning is a key technology behind driverless cars, enabling them to recognize a stop sign, or to distinguish a pedestrian from a lamppost. It is the key to voice control in consumer devices like phones, tablets, TVs, and hands-free speakers. Deep learning is getting lots of attention lately and for good reason. It’s achieving results that were not possible before.

Natural Language Processing

Natural Language Processing, or NLP, is a field of Artificial Intelligence that gives machines the ability to read, understand, and derive meaning from human languages. It is a discipline that focuses on the interaction between data science and human language, and it is being adopted across many industries. Today NLP is booming thanks to huge improvements in access to data and increases in computational power, which allow practitioners to achieve meaningful results in areas like healthcare, media, finance, and human resources, among others.

Version Control with Git and GitHub

Intro to Git

In this lesson, you will learn the basics of version control for data science with Git. Git is a free and open-source distributed version control system designed to handle everything from small to very large projects with speed and efficiency. Git has a tiny footprint and lightning-fast performance. It is also great for anyone who wants to work on code, make changes, and still be able to go back to an earlier version. As a data scientist, you must be familiar with Git in order to collaborate efficiently with others on large-scale and complex projects.

If you prefer a short article on the same topic to get started quickly, here is a great one: “Version Control for Data Scientists: A Hands-on Introduction”.

⌨️ (0:00) Introduction
⌨️ (1:10) What is git?
⌨️ (1:30) What is version control?
⌨️ (2:10) Terms to learn in the video
⌨️ (5:20) Git commands
⌨️ (7:05) sign up in GitHub
⌨️ (11:32) using git in local machine
⌨️ (11:54) git install
⌨️ (12:48) getting code editor
⌨️ (13:30) inside VS Code
⌨️ (14:30) cloning through VS Code
⌨️ (17:30) git commit command
⌨️ (18:15) git add command
⌨️ (19:15) committing
⌨️ (20:20) git push command
⌨️ (20:30) SSH Keys
⌨️ (25:25) git push
⌨️ (30:21) Review workflow so far
⌨️ (31:40) Compare between GitHub workflow and local git workflow
⌨️ (32:42) git branching
⌨️ (56:30) Undoing in git
⌨️ (1:01:50) Forking in git
⌨️ (1:07:55) Ending
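
The local part of the workflow listed in the chapters above (git add, committing, and undoing) can be rehearsed safely in a throwaway folder. The sketch below drives the real `git` command line from Python, to stay consistent with the rest of this course; it assumes Git is installed and on your PATH. The `git` helper function, the file name `notes.txt`, and the demo user name and email are hypothetical. It does not cover pushing, SSH keys, branching, or forking.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def git(repo, *args):
    """Run one git command inside `repo` and return its standard output."""
    cmd = [
        "git",
        # Demo identity so commits work even without a global git config (assumption)
        "-c", "user.name=Demo User",
        "-c", "user.email=demo@example.com",
        "-c", "commit.gpgsign=false",
        *args,
    ]
    result = subprocess.run(cmd, cwd=repo, check=True, capture_output=True, text=True)
    return result.stdout.strip()


def demo_workflow():
    """Commit two versions of a file, then restore the first version."""
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp)
        notes = repo / "notes.txt"

        git(repo, "init")                              # start a new repository
        notes.write_text("first draft\n")
        git(repo, "add", "notes.txt")                  # stage the file
        git(repo, "commit", "-m", "Add first draft")   # record version 1
        first_commit = git(repo, "rev-parse", "HEAD")  # remember its commit id

        notes.write_text("second draft\n")
        git(repo, "add", "notes.txt")
        git(repo, "commit", "-m", "Revise draft")      # record version 2
        commit_count = int(git(repo, "rev-list", "--count", "HEAD"))

        # Undo: bring back the file exactly as it was in the first commit
        git(repo, "checkout", first_commit, "--", "notes.txt")
        return commit_count, notes.read_text()


if shutil.which("git"):  # only run the demo when git is available
    print(demo_workflow())
```

Because everything happens inside a temporary directory, the experiment leaves no repository behind once the function returns.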