Stop Words – Natural Language Processing With Python and NLTK p.2
One of the biggest parts of any data analysis, natural language processing included, is pre-processing: the methodology used to “clean up” and prepare your data for analysis.
One of the first steps in pre-processing is to make use of stop words. Stop words are words that you want to filter out of your analysis, either because they carry no meaning or because they carry conflicting meanings that you simply do not want to deal with.
The NLTK module comes with pre-packaged stop word lists for many languages, and you can also easily append more words to these lists, as the sketch below shows.
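A minimal sketch of the idea, assuming the 'stopwords' and 'punkt' corpora have already been fetched with nltk.download():

from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

example_sentence = "This is an example showing off stop word filtration."

# Start from the pre-packaged English list and append your own words.
stop_words = set(stopwords.words("english"))
stop_words.update(["showing", "filtration"])

words = word_tokenize(example_sentence)
filtered_sentence = [w for w in words if w.lower() not in stop_words]
print(filtered_sentence)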
Playlist link: https://www.youtube.com/watch?v=FLZvOKSCkxY&list=PLQVvvaa0QuDf2JswnfiGkliBInZnIC4HL&index=1
sample code: http://pythonprogramming.net
http://hkinsley.com
https://twitter.com/sentdex
http://sentdex.com
http://seaofbtc.com
What about for another language? After I identify my stop words for my language, it can't remove them. Please help me.
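NLTK ships stop word lists for a number of languages; if yours isn't among them you can supply your own set, and comparing lowercased tokens often fixes cases where matches silently fail. A rough sketch (the custom stop words below are just placeholders):

from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

# Languages with pre-packaged lists.
print(stopwords.fileids())

# If your language isn't listed, build your own set (placeholders here).
custom_stops = {"word1", "word2"}
sentence = "word1 Example word2 text"
# Lowercase only for the membership test, so case differences don't block the match.
filtered = [w for w in word_tokenize(sentence) if w.lower() not in custom_stops]
print(filtered)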
Thank you! It is really helpful
Thanks dude :)))
Cool vro !!
great teaching and funny way.. it's awesome
I'm new to NLP, but it makes me wonder: why is 'and' considered an irrelevant stop word? On the contrary, in maths and logic, 'and' and 'or' carry significant importance for the meaning.
I love this guys tutorials. In any ds related tutorial I search for, I always come here first. And boom, he has a tutorial about it. Awesome.
Nobody who knows what they are doing uses NLTK any more. Could you do this over with spaCy or TensorFlow?
Love these videos, you did a great job. Thank you!
Great work on the videos. Can you show how to search through a text file using a theme? All related words to that theme are counted and displayed… If you have already done a video on this, please point me to that video… Thanks a million…
Great… But I don't know why people use IDLE. I have used them all, and Jupyter Notebook is way better than all of them.
Thanks, crack.
filtered_sentence = set(word_tokenize(example)) - set(stopwords.words("english"))
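That one-liner does the job once the imports are in place, though note the set difference throws away word order and duplicates, and capitalized stop words like 'This' survive because the packaged list is lowercase. A quick sketch:

from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

example = "This is an example showing off stop word filtration."
# One line, but the result is an unordered set with duplicates removed.
filtered_sentence = set(word_tokenize(example)) - set(stopwords.words("english"))
print(filtered_sentence)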
Thank you very much, easy to understand, easy to download and very well explained. And English is not even my mother tongue. Thanks
Great stuff but stop using windows 😉
Actually 'sentence' matches 'sentdex', that's why there is that d again.
love you sir
big time fan of your channel
Hi sentdex
First of all, great videos! I'm doing sentiment analysis for the first time and I want to analyse Twitter data. I have all the tweets in a CSV file with comma-separated columns (source, text, datetime, etc.). I can follow the steps you explain in the video when you create your own sentence and then tokenize it or remove stop words, but how can I apply that to a file, where I need to iterate through the rows of the CSV using just the column with the text (the tweet)?
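One way to do it with the standard csv module; a sketch only, assuming a hypothetical file 'tweets.csv' whose header includes a 'text' column:

import csv
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

stop_words = set(stopwords.words("english"))

# Iterate over the rows and filter only the tweet text column.
with open("tweets.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        tokens = word_tokenize(row["text"])
        filtered = [w for w in tokens if w.lower() not in stop_words]
        print(filtered)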
Video is POOR QUALITY. I have a 10.5 inch tablet and I cannot read the program because it is too small.
You need to try viewing your videos on different formats to see if they work.
ALERT ALERT! Stop words remove negations! THE NEGATION GOES TO THE TRASH BIN!
example_sentence = 'This is an example showing off stop word filtration. I hope I do not use too many useless words.'
=> ['This', 'example', 'showing', 'stop', 'word', 'filtration', '.', 'I', 'hope', 'I', 'use', 'many', 'useless', 'words', '.']
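True: the packaged English list contains negations such as 'not', 'no', and 'nor', which is risky for sentiment work ("do not use" gets flattened to "use"). One workaround, as a sketch, is to subtract the negations from the stop set before filtering:

from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

# Keep negations out of the stop set so the sentence's polarity survives.
stop_words = set(stopwords.words("english")) - {"not", "no", "nor"}

sentence = "I hope I do not use too many useless words."
filtered = [w for w in word_tokenize(sentence) if w.lower() not in stop_words]
print(filtered)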
pyyythong
Sir, how do I remove stop words from an Excel sheet? Please make a video on it.
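A rough sketch using pandas (an assumption: the file is 'data.xlsx' with a 'text' column, and pandas plus openpyxl are installed):

import pandas as pd
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

stop_words = set(stopwords.words("english"))

# Read the sheet and filter the text column row by row.
df = pd.read_excel("data.xlsx")
df["filtered"] = df["text"].apply(
    lambda s: [w for w in word_tokenize(str(s)) if w.lower() not in stop_words]
)
print(df[["text", "filtered"]].head())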
In common parlance these are called "safe words"
Is there a nice German stop words list out there?
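Yes, NLTK's stopwords corpus includes a German list:

from nltk.corpus import stopwords
german_stops = set(stopwords.words("german"))
print(len(german_stops), sorted(german_stops)[:10])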
The 'd' is basically because your YouTube name is sentdex.
awesome_tutorial as well, but I personally feel one-liners save lines of code, so it's pretty useful for me. Once again, awesome tutorial!!
I don't know if you will see this, but your videos are so good.
You make these subjects not fucking boring.
Ty so much
Why is 'very' a stop word?? 'Very' helps in describing something, for example 'He is a bad guy' vs 'He is a very bad guy' vs 'He is a very very bad guy'… all three sentences create different images of THAT BAD GUY for the listener.
https://github.com/shantanu9/python3/blob/master/NLP%20NLTK/Stop%20Words%20-%20Natural%20Language%20Processing%20With%20Python%20and%20NLTK%20p.2.ipynb
You know, you are really funny.
Thank you very much for taking the time to produce this awesome tutorial! I just wanted to mention that the stop words are all lowercase. Thus, "A", "The", etc. require more preprocessing: either create your own stop words or lowercase all the text.
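Good point. Since the packaged lists are all lowercase, a sketch of lowercasing the token only for the membership test, so the output keeps its original casing:

from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

stop_words = set(stopwords.words("english"))
sentence = "The Quick Brown Fox Is An Example"
# "The", "Is", "An" are dropped even though the list only contains "the", "is", "an".
filtered = [w for w in word_tokenize(sentence) if w.lower() not in stop_words]
print(filtered)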
Thank you!
I'm from Pakistan. Your videos are very informative and are helping me. Thank you sir. Allah bless you. Always be happy 🙂
Thank you very much for your great videos on NLTK, I'm trying to apply this to my German corpus.
Is there a special reason to first tokenize and then remove stop words, instead of the other way around: first removing punctuation and stop words and then tokenizing? Thanks a lot again!
It's giving the error "No module named nltk".
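That error means the nltk package isn't installed in the Python interpreter you're running. A typical fix, assuming pip points at the same interpreter:

# In a shell:  pip install nltk
# Then, once, from Python, download the data used in this video:
import nltk
nltk.download("stopwords")
nltk.download("punkt")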