Stop Words – Natural Language Processing With Python and NLTK p.2





Pre-processing is one of the largest parts of any data analysis, natural language processing included. It is the step where you “clean up” and prepare your data for analysis.

One of the first steps in pre-processing is to remove stop words. Stop words are words that you want to filter out of an analysis, typically because they carry little meaning on their own or carry ambiguous meanings that you do not want to deal with.

The NLTK module comes pre-packaged with stop word lists for many languages, and you can easily add your own words to these lists.
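As a rough illustration of the idea, the sketch below loads NLTK's English stop word list, adds one custom word, and filters a sentence. The helper names (`load_stop_words`, `remove_stop_words`), the small fallback list, and the regex tokenizer are assumptions made for this example so that it still runs when NLTK or its `stopwords` corpus is not installed; the video itself uses NLTK's `word_tokenize`.

```python
import re

# Tiny fallback list (an assumption for this sketch), used only when NLTK
# or its stopwords corpus is not available.
FALLBACK_STOP_WORDS = {"is", "an", "off", "i", "do", "not", "too", "a", "the"}


def load_stop_words(language="english", extra=()):
    """Return a set of stop words, optionally extended with custom words."""
    try:
        # Requires a one-time nltk.download("stopwords").
        from nltk.corpus import stopwords
        words = set(stopwords.words(language))
    except (ImportError, LookupError, OSError):
        words = set(FALLBACK_STOP_WORDS)
    words.update(extra)  # append your own stop words
    return words


def remove_stop_words(text, stop_words):
    """Split text into word and punctuation tokens and drop stop words."""
    tokens = re.findall(r"\w+|[^\w\s]", text)  # simple stand-in tokenizer
    return [t for t in tokens if t not in stop_words]


stop_words = load_stop_words(extra={"showing"})
sentence = "This is an example showing off stop word filtration."
filtered = remove_stop_words(sentence, stop_words)
print(filtered)
```

Note that the comparison here is case-sensitive, so a capitalized "This" is kept even though "this" is on NLTK's list.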

Playlist link: https://www.youtube.com/watch?v=FLZvOKSCkxY&list=PLQVvvaa0QuDf2JswnfiGkliBInZnIC4HL&index=1

sample code: http://pythonprogramming.net
http://hkinsley.com
https://twitter.com/sentdex
http://sentdex.com
http://seaofbtc.com


Comment List

  • sentdex
    December 19, 2020

    What about another language? After I define the stop words for my language, they are not removed. Please help me.

  • sentdex
    December 19, 2020

    Thank you! It is really helpful

  • sentdex
    December 19, 2020

    Thanks dude :)))

  • sentdex
    December 19, 2020

    Cool vro !!

  • sentdex
    December 19, 2020

    great teaching and funny way.. it's awesome

  • sentdex
    December 19, 2020

    I'm new to NLP, but it's making me wonder why 'and' is considered an irrelevant stop word. On the contrary, in maths and logic 'and' and 'or' carry significant meaning.

  • sentdex
    December 19, 2020

    I love this guys tutorials. In any ds related tutorial I search for, I always come here first. And boom, he has a tutorial about it. Awesome.

  • sentdex
    December 19, 2020

    Nobody who knows what they are doing uses NLTK any more. Could you do this over with spaCy or TensorFlow?

  • sentdex
    December 19, 2020

    Love these videos, you did a great job. Thank you!

  • sentdex
    December 19, 2020

    Great work on the videos. Can you show how to search through a text file using a theme? All related words to that theme are counted and displayed… If you have already done a video on this, please point me to that video… Thanks a million…

  • sentdex
    December 19, 2020

    Great …. But I don't know why people use idle …. I have used all and jupyter notebook is way better than all

  • sentdex
    December 19, 2020

    Thanks, crack.

  • sentdex
    December 19, 2020

    filtered_sentence = set(word_tokenize(example)) - set(stopwords.words("english"))

  • sentdex
    December 19, 2020

    Thank you very much, easy to understand, easy to download and very well explained. And english is not my mother tongue. Thanks

  • sentdex
    December 19, 2020

    Great stuff but stop using windows 😉

  • sentdex
    December 19, 2020

    actually sentence matches with sentdex thats why there is that d again
    love you sir
    big time fan of your channel

  • sentdex
    December 19, 2020

    Hi sentdex
    First of all, great videos! I'm doing sentiment analysis for the first time and I want to analyse Twitter data. I have all the tweets in a CSV file with comma-separated columns (source, text, datetime, etc.). I can follow the steps you explain in the video, where you create your own sentence and then tokenize it or remove stop words, but how can I apply this to a file, where I need to iterate through the rows of the CSV file and use only the column with the text (the tweet)?
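One possible way to approach this, sketched with Python's standard `csv` module: read the file row by row with `csv.DictReader` and filter only the `text` column. The sample rows, the short stop word set, and the `filter_rows` helper are made up for illustration; with a real file you would pass an opened file handle instead of the `io.StringIO` object, and the stop words would normally come from `stopwords.words("english")`.

```python
import csv
import io

# Illustrative stop words (assumption); NLTK's list would normally be used.
stop_words = {"this", "is", "a", "i", "do", "the"}

# Made-up sample data standing in for the real CSV file.
sample_csv = io.StringIO(
    "source,text,datetime\n"
    'web,"This is a great day",2020-12-19\n'
    'app,"I do not like the rain",2020-12-19\n'
)


def filter_rows(handle, column="text"):
    """Yield the stop-word-filtered tokens of one column, row by row."""
    for row in csv.DictReader(handle):
        tokens = row[column].split()  # simple whitespace tokenizer
        yield [t for t in tokens if t.lower() not in stop_words]


filtered_rows = list(filter_rows(sample_csv))
print(filtered_rows)  # [['great', 'day'], ['not', 'like', 'rain']]
```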

  • sentdex
    December 19, 2020

    Video is POOR QUALITY. I have a 10.5 inch tablet and I cannot read the program because it is too small.
    You need to try viewing your videos on different formats to see if they work.

  • sentdex
    December 19, 2020

    ALERT ALERT! Stop words remove negations! THE NEGATION GOES TO THE TRASH BIN!

    example_sentence = 'This is an example showing off stop word filtration. I hope I do not use too many useless words.' =>
    ['This', 'example', 'showing', 'stop', 'word', 'filtration', '.', 'I', 'hope', 'I', 'use', 'many', 'useless', 'words', '.']
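If negations matter for the task (sentiment analysis, for instance), one option is to subtract them from the stop word set before filtering. The sketch below assumes a small hand-written stop list and a hypothetical `NEGATIONS` set; with NLTK you would start from `set(stopwords.words("english"))` instead.

```python
# Illustrative subset of an English stop list (assumption for this sketch).
stop_words = {"this", "is", "an", "off", "i", "do", "not", "too", "no", "nor"}

# Words we want to keep even though they appear on the stop list.
NEGATIONS = {"not", "no", "nor"}
stop_words_keep_neg = stop_words - NEGATIONS

tokens = "I hope I do not use too many useless words".split()
filtered = [t for t in tokens if t.lower() not in stop_words_keep_neg]
print(filtered)  # ['hope', 'not', 'use', 'many', 'useless', 'words']
```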

  • sentdex
    December 19, 2020

    pyyythong

  • sentdex
    December 19, 2020

    Sir, how to remove stop words from excel sheet, please make a video for it.

  • sentdex
    December 19, 2020

    In common parlance these are called "safe words"

  • sentdex
    December 19, 2020

    Is there a nice German stop words list out there?

  • sentdex
    December 19, 2020

    the d basically because your youtube name is sentdex

  • sentdex
    December 19, 2020

    awesome_tutorial as well, but I personally feel one-liners saves lines of code so its pretty useful for me. Once again, awesome tutorial!!

  • sentdex
    December 19, 2020

    I dont know if you will see this, but your videos are so good.
    You make this subjects not fucking boring.
    Ty so mutch

  • sentdex
    December 19, 2020

    Why is 'very' a stop word? 'Very' helps in describing something. For example, 'He is a bad guy', 'He is a very bad guy' and 'He is a very very bad guy' will each create a different image of THAT BAD GUY for the listener.

  • sentdex
    December 19, 2020

    You know, you are really funny.

  • sentdex
    December 19, 2020

    Thank you very much for taking the time to produce this awesome tutorial! I just wanted to mention that the stop words are all lower case. Thus "A", "The", etc. require more preprocessing: either create your own stop words or lowercase all the text.
    Thank you!
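A minimal sketch of the difference, assuming a three-word stop list chosen only for illustration: comparing the lowercased token against the list catches capitalized stop words without changing the tokens that are kept.

```python
# Illustrative stop words (assumption); NLTK's lists are likewise lowercase.
stop_words = {"a", "the", "is"}
tokens = ["The", "cat", "is", "A", "hunter"]

# Case-sensitive check: "The" and "A" slip through.
case_sensitive = [t for t in tokens if t not in stop_words]

# Lowercase only for the comparison, keep the original token in the output.
case_insensitive = [t for t in tokens if t.lower() not in stop_words]

print(case_sensitive)    # ['The', 'cat', 'A', 'hunter']
print(case_insensitive)  # ['cat', 'hunter']
```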

  • sentdex
    December 19, 2020

    im from pakistan. your videos are very informative and helping me. thank u sir. Allah bless u. always be happy 🙂

  • sentdex
    December 19, 2020

    thank you very much for your great videos on NLTK, I try to apply it on my German corpus.
    Is there a special reason to tokenize first and then remove stop words, instead of the other way around: first removing punctuation and stop words and then tokenizing? Thanks a lot again!

  • sentdex
    December 19, 2020

    It's giving error no module named nltk
