(Tutorial) Pandas Apply – DataCamp


One alternative to using a loop to iterate over a DataFrame is to use the pandas .apply() method. This method acts like the map() function in Python. It takes a function as an input and applies this function to an entire DataFrame.

If you’re working with tabular data, you must specify the axis you want your function to act on (0 for columns; 1 for rows).
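As a rough illustration of the axis argument, here is a minimal sketch. The DataFrame df and the helper value_range are made up for this example and are not part of the tutorial's dataset.

```python
import pandas as pd

# Hypothetical two-row table, made up purely to illustrate the axis argument
df = pd.DataFrame({'a': [1, 5], 'b': [10, 40]})

def value_range(series):
    """Hypothetical helper: largest value minus smallest value."""
    return series.max() - series.min()

# axis=0: the function receives each column as a Series
range_per_column = df.apply(value_range, axis=0)

# axis=1: the function receives each row as a Series
range_per_row = df.apply(value_range, axis=1)

print(range_per_column)  # one value per column, indexed by 'a' and 'b'
print(range_per_row)     # one value per row, indexed by 0 and 1
```

With axis=0 the result is indexed by the column names; with axis=1 it is indexed by the row labels.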

Much like the map() function, the apply() method can also be used with anonymous functions or lambda functions. Let’s look at some apply() examples using baseball data.
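To make the comparison with map() concrete, the sketch below doubles a few made-up numbers, first with the built-in map() and then with .apply(). The values and the column name 'n' are assumptions chosen only for illustration.

```python
import pandas as pd

values = [1, 2, 3]  # hypothetical sample values

# Built-in map() with a lambda returns an iterator, so wrap it in list()
doubled_with_map = list(map(lambda x: x * 2, values))

# .apply() with a lambda: by default (axis=0) the lambda receives each column
df = pd.DataFrame({'n': values})
doubled_with_apply = df.apply(lambda col: col * 2)

print(doubled_with_map)
print(doubled_with_apply)
```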

Calculating Run Differentials With .apply()

First, you’ll call the .apply() method on the baseball_df DataFrame. Then use a lambda function to iterate over the rows of the DataFrame. For every row, we grab the RS and RA columns and pass them to the calc_run_diff function. Lastly, you’ll specify axis=1 to tell the .apply() method that we want to apply it on the rows instead of the columns.

    baseball_df.apply(lambda row: calc_run_diff(row['RS'], row['RA']), axis=1)

You’ll notice that we don’t need to use a for loop. You can collect the run differentials directly into an object called run_diffs_apply. After creating a new column and printing the DataFrame, you’ll find that the results are similar to what you would get with the .iterrows() method.

run_diffs_apply = baseball_df.apply(
    lambda row: calc_run_diff(row['RS'], row['RA']),
    axis=1)
baseball_df['RD'] = run_diffs_apply
print(baseball_df)

  Team League  Year   RS   RA   W    G  Playoffs   RD
0  ARI     NL  2012  734  688  81  162         0   46
1  ATL     NL  2012  700  600  94  162         1  100
2  BAL     AL  2012  712  705  93  162         1    7
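For readers who want to run this end to end, here is a self-contained sketch. It assumes calc_run_diff simply subtracts runs allowed from runs scored (the tutorial does not show its definition, but this matches the RD column above), and it rebuilds only the three rows printed above rather than the full dataset.

```python
import pandas as pd

def calc_run_diff(runs_scored, runs_allowed):
    # Assumed definition: the tutorial does not show this function
    return runs_scored - runs_allowed

# Only the three rows printed above, not the full dataset
baseball_df = pd.DataFrame({
    'Team': ['ARI', 'ATL', 'BAL'],
    'RS': [734, 700, 712],
    'RA': [688, 600, 705],
})

# Loop version with .iterrows(), for comparison
run_diffs_iterrows = []
for _, row in baseball_df.iterrows():
    run_diffs_iterrows.append(calc_run_diff(row['RS'], row['RA']))

# .apply() version: the lambda receives each row because axis=1
run_diffs_apply = baseball_df.apply(
    lambda row: calc_run_diff(row['RS'], row['RA']),
    axis=1,
)

baseball_df['RD'] = run_diffs_apply
print(baseball_df)
```

Both approaches produce the same values; the .apply() version just avoids writing the loop and the list bookkeeping by hand.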

Interactive Example Using .apply()

The Tampa Bay Rays want you to analyze their data.

They’d like the following metrics:

  • The sum of each column in the data
  • The total amount of runs scored in a year ('RS' + 'RA' for each year)
  • The 'Playoffs' column in text format rather than using 1's and 0's

The function below can be used to convert the 'Playoffs' column to text:

def text_playoffs(num_playoffs):
    if num_playoffs == 1:
        return 'Yes'
    else:
        return 'No'

Use .apply() to get these metrics. A DataFrame (rays_df) has been printed below. This DataFrame is indexed on the 'Year' column.

       RS   RA   W  Playoffs
2012  697  577  90         0
2011  707  614  91         1
2010  802  649  96         1
2009  803  754  84         0
2008  774  671  97         1
  • Apply sum() to each column of rays_df to collect the sum of each column. Be sure to specify the correct axis.
# Collect sum of all columns
stat_totals = rays_df.apply(sum, axis=0)

When we run the above code, it produces the following result:

RS          3783
RA          3265
W            458
Playoffs       3
dtype: int64
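The other two metrics the Rays asked for can be obtained in a similar way. The sketch below is one possible approach, not necessarily the course's solution: it rebuilds rays_df from the table printed above, repeats text_playoffs so the block runs on its own, and uses the illustrative variable names total_runs_scored and textual_playoffs.

```python
import pandas as pd

def text_playoffs(num_playoffs):
    if num_playoffs == 1:
        return 'Yes'
    else:
        return 'No'

# rays_df rebuilt from the printed table, indexed by year
rays_df = pd.DataFrame(
    {
        'RS': [697, 707, 802, 803, 774],
        'RA': [577, 614, 649, 754, 671],
        'W': [90, 91, 96, 84, 97],
        'Playoffs': [0, 1, 1, 0, 1],
    },
    index=[2012, 2011, 2010, 2009, 2008],
)

# 'RS' + 'RA' for each year: apply a row-wise sum over those two columns
total_runs_scored = rays_df[['RS', 'RA']].apply(lambda row: row.sum(), axis=1)

# Convert the 1/0 'Playoffs' flag to text, one row at a time
textual_playoffs = rays_df.apply(
    lambda row: text_playoffs(row['Playoffs']),
    axis=1,
)

print(total_runs_scored)
print(textual_playoffs)
```

Both calls use axis=1 because each result depends on the values within a single row (a single year), not on a whole column.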

Try it for yourself.

To learn more about pandas alternatives to looping, please see this video from our course Writing Efficient Python Code.

This content is taken from DataCamp’s Writing Efficient Python Code course by Logan Thomas.

