How do I use string methods in pandas?




pandas includes powerful string manipulation capabilities that you can easily apply to any Series of strings. In this video, I’ll show you how to access string methods in pandas (along with a few examples), and then end with two bonus tips to help you maximize your efficiency.
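
As a minimal sketch of what "accessing string methods" looks like, the snippet below uses a small made-up table; the `orders` name and `item_name` column echo the comments further down, but the rows are hypothetical.

```python
import pandas as pd

# Hypothetical sample data standing in for the orders table used in the video.
orders = pd.DataFrame({
    "item_name": ["Chicken Bowl", "Steak Burrito", "Chicken Soft Tacos", "Chips"],
})

# String methods live under the .str accessor of a Series.
upper_names = orders.item_name.str.upper()

# Methods that return booleans can be used to filter the DataFrame.
is_chicken = orders.item_name.str.contains("Chicken")
chicken_orders = orders[is_chicken]

print(upper_names.tolist())
print(chicken_orders)
```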

== RESOURCES ==
GitHub repository for the series: https://github.com/justmarkham/pandas-videos
string handling documentation: http://pandas.pydata.org/pandas-docs/stable/api.html#string-handling

== LET’S CONNECT! ==
Newsletter: https://www.dataschool.io/subscribe/
Twitter: https://twitter.com/justmarkham
Facebook: https://www.facebook.com/DataScienceSchool/
LinkedIn: https://www.linkedin.com/in/justmarkham/

Comment List

  • Data School
    November 30, 2020

    That's really helpful!! Thank you so muchhh!!

  • Data School
    November 30, 2020

    Regular expressions are useful; I'm glad you included them in the video as well. It's definitely a bonus.

  • Data School
    November 30, 2020

    Wow, well described! Not so well in the Pandas Docs. Could have saved me days literally.

  • Data School
    November 30, 2020

    Hi @Kevin,

    One quick question: why doesn't the str.replace method support inplace?

    Thanks, Suren

  • Data School
    November 30, 2020

    dataset[dataset.Name.str.contains(r"Dr\.")]

    ### This line of code allowed me to select names containing "Dr." (the backslash before the dot tells the regex to treat the dot as a literal character rather than a special one).
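
A small sketch of why the escape matters, using made-up names: by default `str.contains` treats its pattern as a regular expression, so an unescaped dot matches any character.

```python
import pandas as pd

names = pd.Series(["Dr. Smith", "Drew Jones", "Mr. Brown"])  # made-up names

# Unescaped: "Dr" followed by ANY character, so "Drew" also matches.
loose = names.str.contains("Dr.")

# Escaped: "Dr" followed by a literal dot.
escaped = names.str.contains(r"Dr\.")

# Alternative: turn regex off and match the plain substring.
literal = names.str.contains("Dr.", regex=False)

print(loose.tolist(), escaped.tolist(), literal.tolist())
```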

  • Data School
    November 30, 2020

    Sir, I have a question: how can we count the occurrences of each letter of the alphabet across multiple strings?
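
One possible approach, sketched on made-up strings and assuming only the letters a-z matter and case should be ignored: split each string into letters, put one letter per row, and count.

```python
import pandas as pd

words = pd.Series(["apple", "Banana", "kiwi"])  # made-up strings

letter_counts = (
    words.str.lower()
         .str.findall(r"[a-z]")   # list of letters for each string
         .explode()               # one row per letter
         .value_counts()          # occurrences of each letter overall
)

print(letter_counts)
```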

  • Data School
    November 30, 2020

    excellent technique

  • Data School
    November 30, 2020

    I searched in the latest docs, it doesn't have string handling. What have they replaced it with?

  • Data School
    November 30, 2020

    Well done, bro! I like your way of teaching and am waiting for your new videos <3

  • Data School
    November 30, 2020

    best teacher!!!!

  • Data School
    November 30, 2020

    Hi, your videos are very useful, could you suggest any material (or video) to learn regex in Python?

  • Data School
    November 30, 2020

    A question that is a little related to core programming:
    So orders.item_name is an attribute, right?
    And then orders.item_name.str is an object belonging to the StringMethods class. Correct me if I am wrong.
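
This can be checked directly. The sketch below uses a hypothetical two-row `orders` table; the class name reported for the accessor is what the pandas versions I am aware of return, so treat it as an observation rather than a guarantee.

```python
import pandas as pd

orders = pd.DataFrame({"item_name": ["Chicken Bowl", "Chips"]})  # hypothetical data

column = orders.item_name        # attribute-style access returns the column as a Series
accessor = orders.item_name.str  # the .str accessor object attached to that Series

print(type(column).__name__)
print(type(accessor).__name__)
```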

  • Data School
    November 30, 2020

    Please do more videos on regex methods, because I have the most difficulty with handling strings. First, how can I identify the list of special characters that appear in a column without looking into the file? Also, how can I remove particular special characters from a column?
    For example, a column contains [ "USA [edit]", "New York (01)", "Washington DC (02)", "Germany [edit]", "Berlin, Capital:", "Frankfurt (01)" ] and I want this type of result: [ "USA", "New York", "Washington DC", "Germany", "Berlin, Capital", "Frankfurt" ]. If I have little data then it is easy to use a for loop, but if I have a large amount of data, such as 100000×2000, then what is the fastest way to analyse the data, remove some special characters, and replace the existing series with the result?
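
A sketch of one way to do both steps with vectorized string methods, using the six values from the comment. The regular expressions are my own assumption about what counts as "special" here, not something taken from the video.

```python
import pandas as pd

places = pd.Series([
    "USA [edit]", "New York (01)", "Washington DC (02)",
    "Germany [edit]", "Berlin, Capital:", "Frankfurt (01)",
])

# 1) Which characters are neither word characters nor whitespace?
special_chars = sorted(set(places.str.findall(r"[^\w\s]").explode().dropna()))

# 2) Drop a bracketed or parenthesised chunk (plus the space before it),
#    then strip any trailing colon.
cleaned = (
    places.str.replace(r"\s*[\[(].*?[\])]", "", regex=True)
          .str.rstrip(":")
)

print(special_chars)
print(cleaned.tolist())
```

Because these calls operate on the whole Series at once, no explicit Python for loop is needed.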

  • Data School
    November 30, 2020

    Thank you for the great tutorials, they're helping me a lot!

  • Data School
    November 30, 2020

    Hi Kevin, firstly, kudos for these great tutorials.

    My questions are:
    1) How does a pandas DataFrame respond to modifications (like slicing, filtering, etc.) on its rows or columns in terms of memory allocation? Is a new copy created every time?
    2) Also, when passing a pandas DataFrame to a function, is it passed by value or by reference? Please shed some light on this.
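
On the second question, a minimal sketch with hypothetical function names: Python hands the function a reference to the same DataFrame object, so in-place changes inside the function are visible to the caller unless the function works on an explicit copy. Whether slicing or filtering returns a view or a copy depends on the operation and the pandas version, so that part is not covered here.

```python
import pandas as pd

def add_flag(df):
    # Modifies the very object the caller passed in; nothing is copied on the way in.
    df["flag"] = True

def add_flag_safely(df):
    # Work on an explicit copy so the caller's DataFrame is left alone.
    df = df.copy()
    df["flag"] = True
    return df

original = pd.DataFrame({"a": [1, 2]})

result = add_flag_safely(original)
unchanged = "flag" not in original.columns  # the caller's object was not touched

add_flag(original)
changed = "flag" in original.columns        # now the caller's object has the new column

print(unchanged, changed)
```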

  • Data School
    November 30, 2020

    I get an error if I want to show the DataFrame after replacing a string,
    e.g.: orders[orders.choice_description.str.replace('[','')]
    The error is: ValueError: cannot mask with array containing NA / NaN values
    The error remains even after removing the NaNs.
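
A likely explanation, sketched on hypothetical rows: `str.replace` returns a Series of strings, not booleans, so it cannot serve as a filter inside `orders[...]`. To see the DataFrame with the cleaned values, build the cleaned column first and then display a frame that contains it.

```python
import pandas as pd

orders = pd.DataFrame({
    "order_id": [1, 2],
    "choice_description": [None, "[Clementine]"],  # hypothetical rows
})

# A Series of strings (missing values stay missing), not a boolean mask.
# regex=False treats "[" as a plain character rather than regex syntax.
cleaned = orders.choice_description.str.replace("[", "", regex=False)

# A new DataFrame with the cleaned column; orders itself is not modified.
preview = orders.assign(choice_description=cleaned)

print(preview)
```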

  • Data School
    November 30, 2020

    Hands down one of the best channels for learning data science. The small cues you explain alongside the normal code give you the edge over others.

    PS. You remind me of Sheldon from Big Bang Theory

  • Data School
    November 30, 2020

    Excellent video, but I am looking for the next level, which is super easy with SQL.
    I am looking for the SQL equivalent of replacing a string with a substring. Assume you had a date formatted as "xxYYYYMMDDhhmm" and you wanted to extract only the YYYYMMDD.
    Using SQL you would do something like:
    Update Tbl set DColumn = substring(DColumn, 3,8) where ….

    I used your method df['DColumn'].str[2:10] and got what I needed, then tried to use str.replace but was not able to get it to work. I think there should be a simple solution but I am not seeing it. I suspect my SQL background is getting in my way.

    Do you have any suggestions, or know of another video that may point me in the right direction? Thanks.
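
A sketch of the pandas counterpart to that UPDATE, on made-up values: assign the sliced Series back to the column. The row condition in the second half is purely hypothetical, since the comment leaves the WHERE clause unspecified.

```python
import pandas as pd

df = pd.DataFrame({"DColumn": ["xx202011301545", "xx201912251200"]})  # made-up values

# Roughly: UPDATE Tbl SET DColumn = SUBSTRING(DColumn, 3, 8)
df["DColumn"] = df["DColumn"].str[2:10]

# With a row condition (hypothetical: only values that start with "xx").
df2 = pd.DataFrame({"DColumn": ["xx202011301545", "20191225"]})
needs_trim = df2["DColumn"].str.startswith("xx")
df2.loc[needs_trim, "DColumn"] = df2.loc[needs_trim, "DColumn"].str[2:10]

print(df["DColumn"].tolist(), df2["DColumn"].tolist())
```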

  • Data School
    November 30, 2020

    It's as if … PANDAS just turned into peanuts!!! God bless you …

  • Data School
    November 30, 2020

    Keep doing the good work. Love your videos and your way of explaining. Most other tutors on YouTube often rush things or aren't clear about what they are saying. You have a strong, clear and loud voice. This makes things easy to understand. Thanks.

  • Data School
    November 30, 2020

    With the code below I am able to create a DataFrame:
    VAT = comm[comm['Particulars'].str.contains("comm|Britain", case=False)==True]

    VAT

    But I want to create a column, and
    the code below does not work:
    comm['VAT'] = comm[comm['Particulars'].str.contains("comm|Britain", case=False)==True]

    comm

    Can you provide the proper code?
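
A minimal sketch of one fix, with hypothetical rows: assign the boolean Series returned by `str.contains` directly, rather than the filtered DataFrame. Passing `na=False` is my addition so that missing values count as no match.

```python
import pandas as pd

comm = pd.DataFrame({
    "Particulars": ["Commission paid", "Office rent", "Great Britain VAT", None],
})

# The right-hand side is a Series of True/False with the same index as comm.
comm["VAT"] = comm["Particulars"].str.contains("comm|Britain", case=False, na=False)

print(comm)
```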

  • Data School
    November 30, 2020

    How do you pick only 'black beans' from choice_description?
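
If the goal is to keep only the rows whose description mentions black beans, one sketch (hypothetical rows, case-insensitive match, missing descriptions treated as no match) is:

```python
import pandas as pd

orders = pd.DataFrame({
    "item_name": ["Chicken Bowl", "Steak Burrito", "Chips"],
    "choice_description": ["[Rice, Black Beans]", "[Pinto Beans, Cheese]", None],
})

has_black_beans = orders.choice_description.str.contains(
    "black beans", case=False, na=False
)
black_bean_orders = orders[has_black_beans]

print(black_bean_orders)
```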

  • Data School
    November 30, 2020

    great as always

  • Data School
    November 30, 2020

    Hello Kevin, thank you for the video, it's helping me a lot! I would like to know: after I replace the [ ], how do I apply the changes to my DataFrame? Thank you.
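
String methods return a new Series rather than changing the column in place, so a common pattern is to assign the result back. A sketch on hypothetical rows, using a regex character class that matches either bracket:

```python
import pandas as pd

orders = pd.DataFrame({
    "choice_description": ["[Clementine]", "[Rice, Black Beans]", None],
})

# [\[\]] matches "[" or "]"; assigning back overwrites the original column.
orders["choice_description"] = orders.choice_description.str.replace(
    r"[\[\]]", "", regex=True
)

print(orders)
```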

  • Data School
    November 30, 2020

    Hello Kevin,

    Thanks for sharing your knowledge.
    At 2:29 I am able to get the rows for particular strings, but what if I need more filtering, such as where order_id = 2, and want to view only order_id, choice_description, and item_price?
    How can I achieve that? Could you help?
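
One way to combine a string filter with another condition and a column selection, sketched on hypothetical rows; the search string "Chicken" is only an example.

```python
import pandas as pd

orders = pd.DataFrame({
    "order_id": [1, 2, 2],
    "item_name": ["Chips", "Chicken Bowl", "Izze"],
    "choice_description": [None, "[Rice, Cheese]", "[Clementine]"],
    "item_price": ["$2.39", "$10.98", "$3.39"],
})

# Combine conditions with & (each wrapped in parentheses where needed),
# then pick the columns to show with .loc.
mask = (orders.order_id == 2) & orders.item_name.str.contains("Chicken")
subset = orders.loc[mask, ["order_id", "choice_description", "item_price"]]

print(subset)
```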

  • Data School
    November 30, 2020

    thank you
