Should you use “dot notation” or “bracket notation” with pandas?
If you've ever used the pandas library in Python, you probably know that there are two ways to select a Series (meaning a column) from a DataFrame:
# dot notation
df.col_name

# bracket notation
df['col_name']
Which method should you use? I'll make the case for each, and then you can decide…
Why use bracket notation?
The case for bracket notation is simple: It always works.
Here are the specific cases in which you must use bracket notation, because dot notation would fail:
# column name includes a space
df['col name']

# column name matches a DataFrame method
df['count']

# column name matches a Python keyword
df['class']

# column name is stored in a variable
var = 'col_name'
df[var]

# column name is an integer
df[0]

# new column is created through assignment
df['new'] = 0
In other words, bracket notation always works, whereas dot notation only works under certain circumstances. That's a pretty compelling case for bracket notation!
As stated in the Zen of Python:
There should be one-- and preferably only one --obvious way to do it.
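The failure cases listed above can be checked directly. The following is a minimal sketch on a made-up DataFrame (the data and the variable names are assumptions for illustration only); it shows how dot notation quietly hands back a method, is rejected by Python itself, or sets a plain attribute instead of creating a column:

```python
import pandas as pd

# Made-up DataFrame whose column names cover the problem cases above
df = pd.DataFrame({
    'col name': [1, 2, 3],    # includes a space
    'count': [4, 5, 6],       # matches the DataFrame.count method
    'class': [7, 8, 9],       # matches a Python keyword
    0: [10, 11, 12],          # integer name
})

# Bracket notation returns the column in every case
count_col = df['count']
int_col = df[0]
name = 'col name'             # column name stored in a variable
space_col = df[name]

# Dot notation returns the method, not the column
count_attr = df.count
print(callable(count_attr))               # True
print(isinstance(count_col, pd.Series))   # True

# df.class is rejected by Python before pandas ever sees it
try:
    compile('df.class', '<demo>', 'eval')
    keyword_ok = True
except SyntaxError:
    keyword_ok = False
print(keyword_ok)                         # False

# Assigning a scalar through dot notation sets a plain attribute, not a column
df.new = 0
attr_created_column = 'new' in df.columns
print(attr_created_column)                # False

# Bracket notation creates the column
df['new'] = 0
print('new' in df.columns)                # True
```

The method collision is the easiest one to miss, because `df.count` raises no error; it simply is not the column.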
Why use dot notation?
If you've watched any of my pandas videos, you may have noticed that I use dot notation. Here are four reasons why:
Reason 1: Dot notation is easier to type
Dot notation is three fewer characters to type than bracket notation. And in terms of finger movement, typing a single period is much more convenient than typing brackets and quotes.
This might sound like a trivial reason, but if you're selecting columns dozens (or hundreds) of times a day, it makes a real difference!
Reason 2: Dot notation is easier to read
Most of my pandas code is made up of chains of selections and methods. By using dot notation, my code is mostly adorned with periods and parentheses (plus an occasional quotation mark):
# dot notation
df.col_one.sum()
df.col_one.isna().sum()
df.groupby('col_two').col_one.sum()
If you instead use bracket notation, your code is adorned with periods and parentheses plus lots of brackets and quotation marks:
# bracket notation
df['col_one'].sum()
df['col_one'].isna().sum()
df.groupby('col_two')['col_one'].sum()
I find the dot notation code easier to read, as well as more aesthetically pleasing.
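Readability aside, the two styles are interchangeable in chains like these. Here is a small sketch that runs both versions on a made-up DataFrame (the values are assumptions chosen only so the results are easy to check by hand):

```python
import pandas as pd

# Made-up data: one missing value in col_one, two groups in col_two
df = pd.DataFrame({
    'col_one': [1.0, None, 3.0, 4.0],
    'col_two': ['a', 'a', 'b', 'b'],
})

# Sum of a column (missing values are skipped by default)
dot_total = df.col_one.sum()
bracket_total = df['col_one'].sum()

# Count of missing values in a column
dot_missing = df.col_one.isna().sum()
bracket_missing = df['col_one'].isna().sum()

# Per-group sum of a column
dot_grouped = df.groupby('col_two').col_one.sum()
bracket_grouped = df.groupby('col_two')['col_one'].sum()

print(dot_total == bracket_total)            # True
print(dot_missing == bracket_missing)        # True
print(dot_grouped.equals(bracket_grouped))   # True
```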
Reason 3: Dot notation is easier to remember
With dot notation, every component in a chain is separated from its neighbors by a period. For example, this line of code has 4 components, and thus there are 3 periods separating the individual components:
# dot notation
df.groupby('col_two').col_one.sum()
If you instead use bracket notation, some of your components are separated by periods, and some are not:
# bracket notation
df.groupby('col_two')['col_one'].sum()
With bracket notation, I often forget whether there's supposed to be a period before ['col_one'], or after ['col_one'], or both before and after ['col_one'].
With dot notation, it's easier for me to remember the correct syntax.
Reason 4: Dot notation limits the usage of brackets
Brackets can be used for many purposes in pandas:
df[['col_one', 'col_two']]
df.iloc[4, 2]
df.loc['row_label', 'col_one':'col_three']
df.col_one['row_label']
df[(df.col_one > 5) & (df.col_two == 'value')]
If you also use bracket notation for Series selection, you end up with even more brackets in your code:
df['col_one']['row_label']
df[(df['col_one'] > 5) & (df['col_two'] == 'value')]
As you use more brackets, each bracket becomes slightly more ambiguous as to its purpose, imposing a higher mental burden on the person reading the code. By using dot notation for Series selection, you reduce bracket usage to only the essential cases.
If you prefer bracket notation, then you can use it all of the time! However, you still need to be familiar with dot notation in order to read other people's code.
If you prefer dot notation, then you can use it most of the time, as long as you are diligent about renaming columns when they contain spaces or collide with DataFrame methods. However, you still have to use bracket notation when creating new columns.
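If you go the dot notation route, the renaming step can be automated. The sketch below is one possible approach, not a pandas feature: `dot_safe` is a hypothetical helper, and the `col_` prefix is an arbitrary choice. It only handles the simple cases discussed in this post and does not check whether two columns end up with the same new name.

```python
import keyword
import re

import pandas as pd


def dot_safe(name):
    """Hypothetical helper: return a column name that works as df.<name>."""
    # Replace spaces and any other non-word characters with underscores
    new = re.sub(r'\W', '_', str(name).strip())
    # Prefix names that are not identifiers, are Python keywords,
    # or collide with an existing DataFrame attribute or method
    if not new.isidentifier() or keyword.iskeyword(new) or hasattr(pd.DataFrame, new):
        new = 'col_' + new
    return new


# Made-up DataFrame with the problem names from earlier in the post
df = pd.DataFrame({'col name': [1, 2], 'count': [3, 4], 'class': [5, 6], 0: [7, 8]})

# rename accepts a function and applies it to every column label
df = df.rename(columns=dot_safe)

print(list(df.columns))    # ['col_name', 'col_count', 'col_class', 'col_0']
print(df.col_count.sum())  # 7
```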
Which do you prefer? Let me know in the comments below!
When selecting a Series (meaning a column) from a #pandas DataFrame, do you generally use “dot notation” or “bracket notation”?
— Kevin Markham (@justmarkham) September 13, 2019
There were some thoughtful comments about this issue on Twitter, mostly in favor of bracket notation:
Dot notation is a strict subset of the brackets. The brackets are also the canonical way to “select subsets of data” from all objects in python. strings, tuples, lists, dictionaries, numpy arrays all use brackets to select subsets of data. https://t.co/AUMwSl0Wmn
— Ted Petrou (@TedPetrou) September 13, 2019
Bracket notation for the readability spaces allow, for the ability to use f-strings in column references and for the syntax highlighting.
I've never seen any point in dot notation.
— SupineCabbage (@SublimeKarnage) September 13, 2019
I like the dot notation because tab-completion is usually available and I'm lazy, but in certain cases using it is not practical or not possible and I end up with inconsistent notation, so I switched to using brackets everywhere.
— Naïve Bayesian (@naivebayesian) September 13, 2019
When I first learned Pandas I used [] notation. Recently I've been using ‘.’ notation out of pure laziness (call it “path of least resistance”). The issue is that IDEs will autocomplete the right column name after the dot, but rarely after the brackets. It's just faster.
— Pablo Cáceres (@PabloCceres) September 13, 2019