(Tutorial) Pandas Apply – DataCamp
One alternative to using a loop to iterate over a DataFrame is to use the pandas .apply() method. This method acts much like Python’s map() function: it takes a function as an input and applies this function to an entire DataFrame. If you are working with tabular data, you specify the axis you want your function to act on (0 for columns, which is the default, and 1 for rows). Much like the map() function, the .apply() method can also be used with anonymous functions, or lambda functions. Let’s look at some .apply() examples using baseball data.
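As a minimal sketch of how the axis argument changes what the function receives, the example below uses a small made-up DataFrame. The name demo_df and its values are illustrative assumptions and are not part of the baseball data.

```python
import pandas as pd

# Hypothetical toy data, used only to illustrate the axis argument
demo_df = pd.DataFrame({'a': [1, 2, 3], 'b': [10, 20, 30]})

# axis=0 (the default): the function receives each column as a Series
col_ranges = demo_df.apply(lambda col: col.max() - col.min(), axis=0)

# axis=1: the function receives each row as a Series
row_totals = demo_df.apply(lambda row: row['a'] + row['b'], axis=1)

print(col_ranges)
print(row_totals)
```

With axis=0 the result has one entry per column; with axis=1 it has one entry per row.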
Calculating Run Differentials With .apply()
First, you’ll call the .apply() method on the baseball_df dataframe. Then use a lambda function to iterate over the rows of the dataframe. For every row, we grab the RS and RA columns and pass them to the calc_run_diff function. Finally, you’ll specify axis=1 to tell the .apply() method that we want to apply it on the rows instead of the columns.
baseball_df.apply(
    lambda row: calc_run_diff(row['RS'], row['RA']),
    axis=1
)
You’ll notice that we don’t need to use a for loop. You can collect the run differentials directly into an object called run_diffs_apply. After creating a new column and printing the dataframe, you’ll find that the results are similar to what you would get with the loop-based approach.
run_diffs_apply = baseball_df.apply(
    lambda row: calc_run_diff(row['RS'], row['RA']),
    axis=1
)
baseball_df['RD'] = run_diffs_apply
print(baseball_df)
  Team League  Year   RS   RA   W    G  Playoffs   RD
0  ARI     NL  2012  734  688  81  162         0   46
1  ATL     NL  2012  700  600  94  162         1  100
2  BAL     AL  2012  712  705  93  162         1    7
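The snippets above assume that baseball_df and calc_run_diff already exist. A self-contained sketch might look like the following. Here calc_run_diff is assumed to return runs scored minus runs allowed, which is consistent with the RD values in the printed output, and the DataFrame is rebuilt from only the three rows shown. The loop version is included purely for comparison.

```python
import pandas as pd

# Rebuilt from the three rows printed above
baseball_df = pd.DataFrame({
    'Team': ['ARI', 'ATL', 'BAL'],
    'League': ['NL', 'NL', 'AL'],
    'Year': [2012, 2012, 2012],
    'RS': [734, 700, 712],
    'RA': [688, 600, 705],
    'W': [81, 94, 93],
    'G': [162, 162, 162],
    'Playoffs': [0, 1, 1],
})


def calc_run_diff(runs_scored, runs_allowed):
    # Assumed definition: run differential = runs scored - runs allowed
    return runs_scored - runs_allowed


# Loop-based version, for comparison only
run_diffs_loop = []
for _, row in baseball_df.iterrows():
    run_diffs_loop.append(calc_run_diff(row['RS'], row['RA']))

# .apply() version: no explicit loop
run_diffs_apply = baseball_df.apply(
    lambda row: calc_run_diff(row['RS'], row['RA']),
    axis=1
)

baseball_df['RD'] = run_diffs_apply
print(baseball_df)
```

Both versions produce the same three run differentials; the .apply() call simply moves the iteration inside pandas.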
Interactive Example Using .apply()
The Tampa Bay Rays want you to analyze their data.
They’d like the following metrics:
- The sum of each column in the data
- The total amount of runs scored in a year ('RS' + 'RA' for each year)
- The 'Playoffs' column in text format rather than using 1s and 0s
The function below can be used to convert the 'Playoffs' column to text:
def text_playoffs(num_playoffs):
    if num_playoffs == 1:
        return 'Yes'
    else:
        return 'No'
Use .apply() to get these metrics. A DataFrame (rays_df) has been printed below. This DataFrame is indexed on the year.
       RS   RA   W  Playoffs
2012  697  577  90         0
2011  707  614  91         1
2010  802  649  96         1
2009  803  754  84         0
2008  774  671  97         1
Apply sum() to each column of rays_df to collect the sum of each column. Be sure to specify the correct axis.
# Collect sum of all columns
stat_totals = rays_df.apply(sum, axis=0)
print(stat_totals)
When we run the above code, it produces the following result:
RS          3783
RA          3265
W            458
Playoffs       3
dtype: int64
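The other two metrics can be approached in the same way. The sketch below is one possible solution, not necessarily the course’s own: it rebuilds rays_df from the printed table (assuming the year is the index), redefines text_playoffs with 'Yes'/'No' return values, and uses the illustrative variable names total_runs_scored and textual_playoffs.

```python
import pandas as pd

# Rebuilt from the table printed above, indexed by year
rays_df = pd.DataFrame(
    {
        'RS': [697, 707, 802, 803, 774],
        'RA': [577, 614, 649, 754, 671],
        'W': [90, 91, 96, 84, 97],
        'Playoffs': [0, 1, 1, 0, 1],
    },
    index=[2012, 2011, 2010, 2009, 2008],
)


def text_playoffs(num_playoffs):
    # Convert the 0/1 playoff flag to a text label
    if num_playoffs == 1:
        return 'Yes'
    else:
        return 'No'


# Total runs per year: sum across each row of the RS and RA columns
total_runs_scored = rays_df[['RS', 'RA']].apply(sum, axis=1)

# Playoffs as text: pass each row's Playoffs value to text_playoffs
textual_playoffs = rays_df.apply(
    lambda row: text_playoffs(row['Playoffs']),
    axis=1
)

print(total_runs_scored)
print(textual_playoffs)
```

Both calls use axis=1 because each result is computed per row (per year), in contrast to the column totals, which use axis=0.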
To learn more about pandas alternatives to looping, please see this video from our course Writing Efficient Python Code.
This content is taken from DataCamp’s Writing Efficient Python Code course by Logan Thomas.