Stemming – Natural Language Processing With Python and NLTK p.3




Another form of data pre-processing in natural language processing is called “stemming.”

This is the process of removing affixes (usually suffixes) from the ends of words.
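To make the idea concrete, here is a deliberately naive suffix stripper in Python. It is not the algorithm NLTK's PorterStemmer uses. The suffix list and the minimum stem length are arbitrary assumptions chosen for illustration. It removes the longest listed suffix from the end of a word, as long as at least three letters remain.

```python
# A deliberately naive suffix stripper (NOT the Porter algorithm).
SUFFIXES = ("ing", "er", "ed", "s")  # hypothetical, illustrative suffix list
MIN_STEM = 3  # hypothetical minimum number of letters that must remain


def naive_stem(word: str) -> str:
    """Strip the longest matching suffix from the end of a lowercased word."""
    word = word.lower()
    # Try longer suffixes first so "ing" wins over shorter matches
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM:
            return word[: -len(suffix)]
    return word


if __name__ == "__main__":
    for w in ["Reader", "Reading", "Read"]:
        print(w, "->", naive_stem(w))
```

Real stemmers add many more conditions than a simple length check, so their output can differ from this toy version.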

We do this so that we do not need to store the meaning of every single form of a word. For example:

Reader
Reading
Read

Aside from tense (and even though one of these is a noun), they all share the same meaning for their “root” stem: “read.”

This way, we store a single value for the root stem “read.” Then, when we want to learn more, we can look at the affixes that were on the end: “-ing” marks an ongoing action (in the present or the past), “-er” makes “reader” someone who reads, and plain “read” is either past or present tense.
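The storage idea can be sketched the same way: keep one entry per stem and record which affixes were seen with it, so the affix information is still available if we want to look at it later. The `split_affix` and `index_by_stem` helpers and their suffix list are hypothetical illustrations, not part of NLTK, and the suffix matching is the same naive rule as above rather than a real stemming algorithm.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

SUFFIXES = ("ing", "er", "ed", "s")  # hypothetical, illustrative suffix list
MIN_STEM = 3  # hypothetical minimum stem length


def split_affix(word: str) -> Tuple[str, str]:
    """Split a word into (stem, suffix) using a naive longest-suffix match."""
    word = word.lower()
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM:
            return word[: -len(suffix)], suffix
    return word, ""  # no suffix removed


def index_by_stem(words: Iterable[str]) -> Dict[str, Set[str]]:
    """Store one entry per stem, remembering the suffixes seen with it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for w in words:
        stem, suffix = split_affix(w)
        index[stem].add(suffix)
    return dict(index)


if __name__ == "__main__":
    print(index_by_stem(["Reader", "Reading", "Read"]))
```

Here “Reader”, “Reading” and “Read” collapse into a single entry, `'read'`, with the suffixes `'er'`, `'ing'` and `''` (no suffix) recorded alongside it.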

sample code: http://pythonprogramming.net
http://hkinsley.com
https://twitter.com/sentdex
http://sentdex.com
http://seaofbtc.com
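A pitfall reported by several commenters below is `TypeError: stem() missing 1 required positional argument: 'word'`. It comes from calling `stem` on the `PorterStemmer` class instead of on an instance: the word you pass gets bound to `self`, so the `word` parameter is left empty. Writing `from nltk.stem import PorterStemmer as ps` only renames the class; the fix is `ps = PorterStemmer()` before calling `ps.stem(w)`. Because the cause is ordinary Python method binding, the sketch below reproduces it with a hypothetical stand-in class (`ToyStemmer`) rather than NLTK itself.

```python
class ToyStemmer:
    """Hypothetical stand-in for a stemmer class such as nltk.stem.PorterStemmer."""

    def stem(self, word):
        # Placeholder behaviour only: drop a trailing "ing"
        return word[:-3] if word.endswith("ing") else word


def demo_wrong_call():
    """Call stem() on the class itself and return the resulting error message."""
    try:
        # "reading" is bound to `self`, so the `word` argument is missing
        ToyStemmer.stem("reading")
    except TypeError as err:
        return str(err)
    return None


if __name__ == "__main__":
    print("Wrong:", demo_wrong_call())

    # Right: create an instance first, then call the method on it
    ps = ToyStemmer()
    print("Right:", ps.stem("reading"))
```

Depending on the Python version, the error message may be prefixed with the class name (for example `ToyStemmer.stem()`).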


Comment List

  • sentdex
    December 8, 2020

    This is so so so so cool. Thank you sir.

  • sentdex
    December 8, 2020

    Hey Sentdex, love your videos. I would like to know if it is possible to develop an NLP app using English and another language that is not yet supported by any translator. Thank you

  • sentdex
    December 8, 2020

    Traceback (most recent call last):
      File "C:\Users\Asma\AppData\Local\Programs\Python\Python38\senti-text-classifier.py", line 23, in <module>
        print(ps.stem(t))
    TypeError: stem() missing 1 required positional argument: 'word'

  • sentdex
    December 8, 2020

    sentdex: I appreciate your videos very much – thank you for posting them! Just curious why you're not a fan of dark theme for your IDE. It kills my eyes staring at all the white screen. LOL.

  • sentdex
    December 8, 2020

    Why do we go for stemming?

  • sentdex
    December 8, 2020

    How can I comment the lines so quickly?

  • sentdex
    December 8, 2020

    Lemmatization is better than stemming

  • sentdex
    December 8, 2020

    why does it not work with ["happi" , "Happier" , "Happiest" , "Happened" , "Happily"]

  • sentdex
    December 8, 2020

    actively doing a python

  • sentdex
    December 8, 2020

    Hi, I wish I could express my appreciation of these videos and your introduction to NLTK. I'm working through this and learning to start using it for feedback analysis at work. You're terrific. Also... I wish we could have a compilation video of all your cute old-fashioned interjections of surprise: "Dang" and so forth.

  • sentdex
    December 8, 2020

    Very informative, but how do I resolve the problem of the word 'important' being stemmed to 'import', 'once' to 'onc', etc.? Is using another stemming algorithm a solution to the problem?
    Thank you

  • sentdex
    December 8, 2020

    for w in example_words[:2]:  # limit results to 2; useful with long strings
        print(ps.stem(w))

  • sentdex
    December 8, 2020

    "It is very important to import the importer" => After p.2 and p.3 => "It import import import"

  • sentdex
    December 8, 2020

    How do I do this on my CSV file?

  • sentdex
    December 8, 2020

    The way he says "tokenize" sounds like a stoner! 😆😎

  • sentdex
    December 8, 2020

    you haven't answered me yet. will you marry me???? 😛

  • sentdex
    December 8, 2020

    It's like taking the common part of all the words?

  • sentdex
    December 8, 2020

    Doesn't it completely strip the text of meaning? Is that not gonna affect our analysis?

  • sentdex
    December 8, 2020

    I love your tutorial. Good job, keep it up.

  • sentdex
    December 8, 2020

    I want to do stemming for Indonesian, how?

  • sentdex
    December 8, 2020

    why would someone dislike this very very useful video?

  • sentdex
    December 8, 2020

    Why not lemmatize? It provides meaningful 'stems' (lemmas)

  • sentdex
    December 8, 2020

    ps.stem(w)?? how do you use that ?

  • sentdex
    December 8, 2020

    I am getting an error:
    stem() missing 1 required positional argument: 'word'
    while executing it.

  • sentdex
    December 8, 2020

    I learnt that WordNet replaces stemming nowadays

  • sentdex
    December 8, 2020

    I observed something peculiar. For words ending in "-er" it doesn't always return the root form. For example, it returns "beginn" for "beginner" or "forgiv" for "forgiver", or, as in your example, "python" for "pythoner", but it returns "maker" for "maker" or "eater" for "eater", and so on for many other cases. Why does this happen? I do understand we don't always want to cut off the "-er" at the end, as with "father".

  • sentdex
    December 8, 2020

    I love how at the very end of the clip he says "oh and you'll never have to use any of this btw, WordNet does it all".

  • sentdex
    December 8, 2020

    use snowball, that would be more accurate

  • sentdex
    December 8, 2020

    Hey sentdex, I'm working on a project, "Text To Speech Synthesis with Expression". Will you please help me out?

  • sentdex
    December 8, 2020

    Hi, how can I do stemming on Arabic Excel files?

  • sentdex
    December 8, 2020

    Great tutorial dude!!!!!!

  • sentdex
    December 8, 2020

    i liked ur video its amazing.

  • sentdex
    December 8, 2020

    Is stemming used to avoid the duplication of data in a database?

  • sentdex
    December 8, 2020

    Thanks for uploading such informative videos. I really enjoy the way you make tutorials.

  • sentdex
    December 8, 2020

    Sir, you look like "Edward Snowden"

  • sentdex
    December 8, 2020

    Could you specify in what order we perform stemming, tokenization and stopword removal? Kinda confused about it.

  • sentdex
    December 8, 2020

    This has probably been said, or not. "onc" could be related to "oncology", etc.

  • sentdex
    December 8, 2020

    from the example code…

    for w in example_words:
        print(ps.stem(w))

    this is not working if i use…
    from nltk.stem import PorterStemmer as ps

    without the…
    ps = PorterStemmer()

    why is that?

    thanks in advance.

  • sentdex
    December 8, 2020

    hmmm, it is not a very efficient method, I would say. I know that the video might be pretty old and newer methods might already exist, but stemming is a bit weird. Valuable information such as "important" and "once" is utterly destroyed by this stemming :S. I know that for the word Python it might help us a lot, but generally speaking, the sentence loses a lot of value.

  • sentdex
    December 8, 2020

    awesome tutorials!

  • sentdex
    December 8, 2020

    why does it seem not to work with "pythonly"?

  • sentdex
    December 8, 2020

    Sir, how do I utilize wordnet or synsets with nltk??

  • sentdex
    December 8, 2020

    this stemmer seems utterly unPythonly (garbage)
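Two behaviours reported in the comments above, "important" becoming "import" and "-er" being removed from "beginner" but not from "maker", both follow from a condition in the Porter algorithm, a version of which NLTK's PorterStemmer implements. In its fourth step, a suffix such as "-ant" or "-er" is removed only when the remaining stem has a "measure" m greater than 1, where m roughly counts vowel-consonant sequences. The sketch below implements just that measure and those two suffixes. It is a simplified illustration, not NLTK's code, and it skips the algorithm's earlier steps, which leave these particular example words unchanged.

```python
VOWELS = "aeiou"


def is_consonant(word: str, i: int) -> bool:
    """Porter's rule: a letter other than a, e, i, o, u is a consonant,
    except that 'y' preceded by a consonant counts as a vowel."""
    ch = word[i]
    if ch in VOWELS:
        return False
    if ch == "y":
        return i == 0 or not is_consonant(word, i - 1)
    return True


def measure(stem: str) -> int:
    """Porter's measure m for a stem of the form [C](VC)^m[V]."""
    pattern = ""
    for i in range(len(stem)):
        kind = "c" if is_consonant(stem, i) else "v"
        if not pattern or pattern[-1] != kind:  # collapse runs of the same kind
            pattern += kind
    # In the collapsed pattern, each "vc" boundary is one VC sequence
    return pattern.count("vc")


# A small subset of Porter's step-4 suffixes (the full step has many more)
STEP4_SUFFIXES = ("ant", "er")


def step4_subset(word: str) -> str:
    """Remove a step-4 suffix only if the remaining stem has m > 1."""
    for suffix in STEP4_SUFFIXES:
        if word.endswith(suffix):
            stem = word[: -len(suffix)]
            # The first matching suffix decides; no other suffix is tried
            return stem if measure(stem) > 1 else word
    return word


if __name__ == "__main__":
    for w in ["important", "beginner", "pythoner", "maker", "eater", "reader", "father"]:
        print(w, "->", step4_subset(w))
```

By the same condition, "reader" stays "reader", because m("read") = 1, while "beginner" (m("beginn") = 2) loses its "-er".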
