Pandas Cheat Sheet — Python for Data Science – Dataquest


If you work with data in Python, you are almost certainly going to be using the pandas library. But even if you've learned pandas (perhaps in our interactive pandas course), it's easy to forget the specific syntax for doing something. That's why we've created a pandas cheat sheet to help you easily reference the most common pandas tasks.

Before we dive into the cheat sheet, it's worth mentioning that you shouldn't rely on just this. If you haven't learned any pandas yet, we'd strongly recommend working through our pandas course. This cheat sheet will help you quickly find and recall things you've already learned about pandas; it isn't designed to teach you pandas from scratch!

It's also a good idea to refer to the official pandas documentation from time to time, even if you can find what you need in the cheat sheet. Reading documentation is a skill every data professional needs, and the documentation goes into far more detail than we can fit in a single sheet anyway!

If you're looking to use pandas for a specific task, we also recommend checking out the full list of our free Python tutorials; many of them use pandas in addition to other Python libraries. In our Python datetime tutorial, for example, you'll also learn how to work with dates and times in pandas.

Pandas Cheat Sheet: Guide

First, it may be a good idea to bookmark this page, which will be easy to search with Ctrl+F when you're looking for something specific. However, we've also created a PDF version of this cheat sheet that you can download from here in case you'd like to print it out.

In this cheat sheet, we'll use the following shorthand:

df | Any pandas DataFrame object
s | Any pandas Series object

As you scroll down, you'll see we've organized related commands using subheadings so that you can quickly search for and find the correct syntax based on the task you're trying to complete.

Also, a quick reminder: to use the commands listed below, you'll need to import the relevant libraries first, like so:

import pandas as pd
import numpy as np

Importing Data

Use these commands to import data from a variety of different sources and formats.

pd.read_csv(filename) | From a CSV file
pd.read_table(filename) | From a delimited text file (like TSV)
pd.read_excel(filename) | From an Excel file
pd.read_sql(query, connection_object) | Read from a SQL table/database
pd.read_json(json_string) | Read from a JSON-formatted string, URL, or file
pd.read_html(url) | Parses an HTML URL, string, or file and extracts tables to a list of DataFrames
pd.read_clipboard() | Takes the contents of your clipboard and passes it to read_table()
pd.DataFrame(dict) | From a dict; keys for column names, values for data as lists
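As a quick illustration of the first command above, here is a minimal sketch that reads a CSV with pd.read_csv. The CSV text and column names are made up for the example; io.StringIO lets read_csv treat the in-memory string like a file, so the snippet runs without any file on disk.

```python
import io

import pandas as pd

# A small CSV held in memory; in practice you'd pass a filename instead.
csv_text = "name,score\nAda,91\nGrace,88\n"
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)          # (2, 2)
print(list(df.columns))  # ['name', 'score']
```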

Exporting Data

Use these commands to export a DataFrame to CSV, .xlsx, SQL, or JSON.

df.to_csv(filename) | Write to a CSV file
df.to_excel(filename) | Write to an Excel file
df.to_sql(table_name, connection_object) | Write to a SQL table
df.to_json(filename) | Write to a file in JSON format
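A common sanity check is to round-trip a DataFrame through to_csv and read_csv. This sketch uses a temporary directory so it cleans up after itself; the file name is arbitrary.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# Write to a temporary CSV file, then read it back to confirm the round trip.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "out.csv")
    df.to_csv(path, index=False)  # index=False keeps the row index out of the file
    restored = pd.read_csv(path)

print(restored.equals(df))  # True
```

Without index=False, the row index is written as an extra unnamed column and the read-back frame would not match the original.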

Create Test Objects

These commands can be useful for creating test objects.

pd.DataFrame(np.random.rand(20,5)) | 5 columns and 20 rows of random floats
pd.Series(my_list) | Create a Series from an iterable my_list
df.index = pd.date_range('1900/1/30', periods=df.shape[0]) | Add a date index
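Putting the three commands above together, this sketch builds a random test DataFrame and gives it a daily date index whose length matches the number of rows:

```python
import numpy as np
import pandas as pd

# 20 rows x 5 columns of random floats in [0, 1).
df = pd.DataFrame(np.random.rand(20, 5))

# A daily date index of matching length, starting 1900-01-30.
df.index = pd.date_range("1900/1/30", periods=df.shape[0])

print(df.shape)     # (20, 5)
print(df.index[0])  # 1900-01-30 00:00:00
```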

Viewing/Inspecting Data

Use these commands to take a look at specific sections of your pandas DataFrame or Series.

df.head(n) | First n rows of the DataFrame
df.tail(n) | Last n rows of the DataFrame
df.shape | Number of rows and columns
df.info() | Index, datatype, and memory information
df.describe() | Summary statistics for numerical columns
s.value_counts(dropna=False) | View unique values and counts
df.apply(pd.Series.value_counts) | Unique values and counts for all columns
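Here is a small sketch of a few of these inspection commands on a toy DataFrame (the column names and values are invented for the example):

```python
import pandas as pd

df = pd.DataFrame({"color": ["red", "blue", "red", None], "n": [1, 2, 3, 4]})

print(df.head(2))  # first two rows
print(df.shape)    # (4, 2)

# dropna=False makes value_counts report missing values too.
counts = df["color"].value_counts(dropna=False)
print(counts["red"])  # 2
```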


Selection

Use these commands to select a specific subset of your data.

df[col] | Returns column with label col as a Series
df[[col1, col2]] | Returns columns as a new DataFrame
s.iloc[0] | Selection by position
s.loc['index_one'] | Selection by index
df.iloc[0,:] | First row
df.iloc[0,0] | First element of first column
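To make the loc/iloc distinction concrete, this sketch selects from a toy DataFrame by label and by position (the labels "x" and "y" are made up for the example):

```python
import pandas as pd

df = pd.DataFrame({"a": [10, 20], "b": [30, 40]}, index=["x", "y"])

print(df["a"].tolist())  # [10, 20] -- column as a Series
print(df.loc["x", "b"])  # 30      -- label-based selection
print(df.iloc[0, 0])     # 10      -- position-based selection
```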

Data Cleaning

Use these commands to perform a variety of data cleaning tasks.

df.columns = ['a','b','c'] | Rename columns
pd.isnull() | Checks for null values; returns a Boolean array
pd.notnull() | Opposite of pd.isnull()
df.dropna() | Drop all rows that contain null values
df.dropna(axis=1) | Drop all columns that contain null values
df.dropna(axis=1,thresh=n) | Drop all columns that have fewer than n non-null values
df.fillna(x) | Replace all null values with x
s.fillna(s.mean()) | Replace all null values with the mean (mean can be replaced with almost any function from the statistics section)
s.astype(float) | Convert the datatype of the Series to float
s.replace(1,'one') | Replace all values equal to 1 with 'one'
s.replace([1,3],['one','three']) | Replace all 1 with 'one' and 3 with 'three'
df.rename(columns=lambda x: x + 1) | Mass renaming of columns
df.rename(columns={'old_name': 'new_name'}) | Selective renaming
df.set_index('column_one') | Change the index
df.rename(index=lambda x: x + 1) | Mass renaming of index
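Two of the cleaning commands above, sketched on a toy DataFrame with missing values (the columns and values are invented for the example):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.nan, np.nan, np.nan]})

# Replace missing values in column a with that column's mean (2.0 here).
filled = df["a"].fillna(df["a"].mean())

# Drop columns with fewer than 1 non-null value, i.e. columns that are all null.
kept = df.dropna(axis=1, thresh=1)

print(filled.tolist())     # [1.0, 2.0, 3.0]
print(list(kept.columns))  # ['a']
```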

Filter, Sort, and Groupby

Use these commands to filter, sort, and group your data.

df[df[col] > 0.5] | Rows where the column col is greater than 0.5
df[(df[col] > 0.5) & (df[col] < 0.7)] | Rows where 0.7 > col > 0.5
df.sort_values(col1) | Sort values by col1 in ascending order
df.sort_values(col2,ascending=False) | Sort values by col2 in descending order
df.sort_values([col1,col2],ascending=[True,False]) | Sort values by col1 in ascending order, then col2 in descending order
df.groupby(col) | Returns a groupby object for values from one column
df.groupby([col1,col2]) | Returns a groupby object for values from multiple columns
df.groupby(col1)[col2].mean() | Returns the mean of the values in col2, grouped by the values in col1 (mean can be replaced with almost any function from the statistics section)
df.pivot_table(index=col1,values=[col2,col3],aggfunc=np.mean) | Create a pivot table that groups by col1 and calculates the mean of col2 and col3
df.groupby(col1).agg(np.mean) | Find the average across all columns for every unique col1 group
df.apply(np.mean) | Apply the function np.mean() across each column
df.apply(np.max,axis=1) | Apply the function np.max() across each row
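A boolean filter and a grouped aggregation from the list above, sketched on a toy DataFrame (the "team"/"pts" columns are made up for the example):

```python
import pandas as pd

df = pd.DataFrame({"team": ["x", "x", "y"], "pts": [1, 3, 5]})

# Boolean filter: keep rows where pts is greater than 2.
high = df[df["pts"] > 2]

# Group by team and take the mean of pts within each group.
by_team = df.groupby("team")["pts"].mean()

print(len(high))     # 2
print(by_team["x"])  # 2.0  -- mean of 1 and 3
```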


Join/Combine

Use these commands to combine multiple DataFrames into a single one.

df1.append(df2) | Add the rows in df2 to the end of df1 (columns should be identical); note that append was removed in pandas 2.0, so use pd.concat([df1, df2]) in newer versions
pd.concat([df1, df2],axis=1) | Add the columns in df2 to the end of df1 (rows should be identical)
df1.join(df2,on=col1,how='inner') | SQL-style join of the columns in df1 with the columns of df2 where the rows for col1 have identical values; 'how' can be one of 'left', 'right', 'outer', 'inner'
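Since pd.concat covers both row-wise and column-wise combination, here is a minimal sketch of both directions (the column names are invented for the example):

```python
import pandas as pd

df1 = pd.DataFrame({"a": [1, 2]})
df2 = pd.DataFrame({"a": [3, 4]})

# Stack rows; ignore_index renumbers the combined index 0..3.
rows = pd.concat([df1, df2], ignore_index=True)

# Place side by side; rename first so the column labels don't collide.
cols = pd.concat(
    [df1.rename(columns={"a": "left"}), df2.rename(columns={"a": "right"})],
    axis=1,
)

print(rows["a"].tolist())  # [1, 2, 3, 4]
print(list(cols.columns))  # ['left', 'right']
```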


Statistics

Use these commands to perform various statistical operations. (These can all be applied to a Series as well.)

df.describe() | Summary statistics for numerical columns
df.mean() | Returns the mean of all columns
df.corr() | Returns the correlation between columns in a DataFrame
df.count() | Returns the number of non-null values in each DataFrame column
df.max() | Returns the highest value in each column
df.min() | Returns the lowest value in each column
df.median() | Returns the median of each column
df.std() | Returns the standard deviation of each column
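A few of these statistics on a toy DataFrame, as a quick sketch (the columns are invented; b is exactly twice a, so the correlation comes out as 1.0):

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [2, 4, 6]})

print(df.mean()["a"])           # 2.0
print(df.max()["b"])            # 6
print(df.corr().loc["a", "b"])  # 1.0 -- perfectly linear relationship
```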

Download a printable version of this cheat sheet

If you'd like to download a printable version of this cheat sheet, you can do so here.

