NLTK Corpora – Natural Language Processing With Python and NLTK p.9
Remember from the beginning, we talked about this term, “corpora.”
Again, a corpus is just a body of text, and corpora is the plural. Generally, corpora are grouped by some sort of defining characteristic.
NLTK is a massive toolkit. Part of what it gives you is a ton of highly valuable corpora to learn with and train against, and some of them are even good enough to use in production.
This video is going to be all about accessing your corpora!
sample code: http://pythonprogramming.net
http://hkinsley.com
https://twitter.com/sentdex
http://sentdex.com
http://seaofbtc.com
Fuck you I have seen your five videos for an highly important work there is not a single video from which I got benefit
You don't have to tokenize. Don't use raw, use the other methods.
I recommend reading the NLTK book on the NLTK website.
If I want to make my own corpora, can I put them in there and then call them like the built-in ones?
Oi Mac users just search nltk_data using command + space…. ezpz
can we get Arabic corpus?
Is it okay to create a corpus of word tokens for multi-class text classification? Is it necessary to have full sentences in a corpus? In the end we tokenise them anyway for analysis. Please respond.
haha… old video…need to update… and example with "bible-kjv" actually caused me nausea… 🙂
sentdex laugh*
How can I add my own dataset to nltk/corpora so it works like movie_reviews?
when I write
from nltk.corpus import biocreative_ppi
I get the following error
File "<ipython-input-26-7825dfc39a9b>", line 1, in <module>
from nltk.corpus import biocreative_ppi
ImportError: cannot import name 'biocreative_ppi'
Here you go Mac Users 🙂
Hi, does anyone know why I cannot find nltk_data when the downloader explicitly states the following?
"Downloading package mwa_ppdb to
[nltk_data] | C:\Users\Anak\AppData\Roaming\nltk_data…
[nltk_data] | Package mwa_ppdb is already up-to-date!"
And yes, I have downloaded nltk_data, and the directory shown in nltk.download is "C:\Users\Anak\AppData\Roaming\nltk_data".
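One way to check where NLTK actually looks for its data is to print its search path. This is a minimal sketch, assuming NLTK is installed; `nltk.data.path` is the list of directories NLTK searches, and the variable names are only for illustration.

```python
import os
import nltk

# nltk.data.path is the list of directories NLTK searches for nltk_data
search_dirs = list(nltk.data.path)

# Keep only the directories that actually exist on this machine
existing_dirs = [d for d in search_dirs if os.path.isdir(d)]

for d in search_dirs:
    status = "found" if d in existing_dirs else "missing"
    print(f"{status:8} {d}")
```

On Windows the AppData folder is usually hidden in Explorer, so pasting one of the printed paths into the address bar may be easier than browsing to it.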
how about a parallel corpus?
"gutenberg.abspath" is a better option for checking path.
It's really amazing that you still read comments from your old videos 🙂
lol the chat logs literally made me cry
You've really injected a bit of humour and joy into what could quite easily have been a dry topic. Great series, thank you.
How can you open your own corpus in NLTK? In that case it's not whatever.gutenberg.whatever, obviously, but then what is it?
Hello Sir , First of all thank you so much for this tutorial series.
I tried to make a folder of my own in the corpora directory and then tried a simple program of POS tagging on my personal corpus file but the import statement threw an error. Here is what it was:
————————————————————————————————————————
Traceback (most recent call last):
File "C:/Users/Hp/AppData/Local/Programs/Python/Python36-32/Python Programs/Corpus.py", line 2, in <module>
from nltk.corpus import personal
ImportError: cannot import name 'personal'
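The import fails because `nltk.corpus` only exposes the corpus names that NLTK itself defines, so adding a folder to the corpora directory does not create a new importable name. A common way to load your own text files is `PlaintextCorpusReader`, pointed at any directory. This is a minimal sketch, assuming NLTK is installed; the temporary directory and the file names `first.txt` and `second.txt` are hypothetical and created here only so the example runs anywhere.

```python
import tempfile
from pathlib import Path

from nltk.corpus.reader import PlaintextCorpusReader

# Hypothetical corpus directory with two small text files
corpus_root = Path(tempfile.mkdtemp())
(corpus_root / "first.txt").write_text(
    "Hello corpus. This is my own text.", encoding="utf8"
)
(corpus_root / "second.txt").write_text(
    "Another file in the same corpus.", encoding="utf8"
)

# The second argument is a regular expression selecting which files belong to the corpus
my_corpus = PlaintextCorpusReader(str(corpus_root), r".*\.txt")

file_ids = my_corpus.fileids()
first_words = list(my_corpus.words("first.txt"))

print(file_ids)
print(first_words)
```

The resulting object offers the same style of access as the built-in plain-text corpora, for example `my_corpus.raw("first.txt")` for the untokenized text.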
very nice and clear tutorial..
I tried to import gutenberg exactly like you said but it is saying that it cannot import gutenberg
I tried updating and downloading it but it says it is already up to date. Can you help me figure out what is wrong?
can somebody pls tell me what corpora is without getting mad at me or asking me that if i have watched previous videos ????
As a Mac user, you don't even have to use the /User/Username/nltk_data on mac terminal like you do when uploading a csv file. You can just type "nltk_data" on the terminal and badaboom there you go. You can do that for the data.py file too. Extremely simple.
import nltk
nltk.__path__
phew!
Hi @sentdex! Can we create our own text files in the nltk corpora and use them?
I'm new to NLTK. Could you please tell me how to get the title and subtitle of a local text file of any format?
I'm a mac user and my files were in: /Users/Simon/nltk_data/
Do you know how to use my own corpus? I have a corpus in xml format:
<article n="0" dialect="various" title="Blog Alessandra">
<s n="0-0">
<w n="0-0-0" pos="PPER">Thisss</w>
<w n="0-0-1" pos="VVFIN">isss</w>
<w n="0-0-2" pos="PTKNEG">aanother</w>
<w n="0-0-3" pos="ADJD">language</w>
<w n="0-0-4" pos="$.">.</w>
</s>
<s n="0-1">
…
</s>
…
</article>
I would like to use that corpus to segment my own text. Any ideas? Links? thx
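One possible starting point for an XML corpus like this is Python's standard `xml.etree.ElementTree`, which can turn each `<s>` element into a list of (word, tag) pairs, the same shape NLTK uses for tagged sentences. This is only a sketch: the sample below is a shortened copy of the snippet above with the elided parts left out, and the function name `tagged_sentences` is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Shortened sample in the same layout as the corpus shown above
SAMPLE = """<article n="0" dialect="various" title="Blog Alessandra">
<s n="0-0">
<w n="0-0-0" pos="PPER">Thisss</w>
<w n="0-0-1" pos="VVFIN">isss</w>
<w n="0-0-2" pos="PTKNEG">aanother</w>
<w n="0-0-3" pos="ADJD">language</w>
<w n="0-0-4" pos="$.">.</w>
</s>
</article>"""


def tagged_sentences(xml_text):
    """Return one list of (word, pos) tuples per <s> element."""
    root = ET.fromstring(xml_text)
    return [
        [(w.text, w.get("pos")) for w in s.iter("w")]
        for s in root.iter("s")
    ]


sentences = tagged_sentences(SAMPLE)
print(sentences[0])
```

For a file on disk, `ET.parse(path).getroot()` can replace `ET.fromstring(xml_text)`.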
If I have my own directory with my own set of .txt files can I drag and drop it into this location where the NLTK corpora resides so I can later pull it into my code? Will this work?
How to read .csv files using nltk??
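A CSV file does not have to go through NLTK at all: Python's standard `csv` module can read it, and the text column can then be handed to whatever tokenizer you like. This is a minimal sketch under assumptions: the column names `text` and `label` and the sample rows are hypothetical, and a plain whitespace split stands in for a real tokenizer.

```python
import csv
import io

# Hypothetical CSV content; a real file would be opened with open(path, newline="")
SAMPLE_CSV = io.StringIO(
    "text,label\n"
    "\"I loved this movie, truly\",pos\n"
    "It was a waste of time,neg\n"
)


def load_labeled_texts(handle, text_column="text", label_column="label"):
    """Return a list of (tokens, label) pairs, one per CSV row."""
    reader = csv.DictReader(handle)
    return [
        (row[text_column].lower().split(), row[label_column])
        for row in reader
    ]


documents = load_labeled_texts(SAMPLE_CSV)
print(documents[1])
```

The whitespace split leaves punctuation attached to words, so swapping in `nltk.word_tokenize` is a reasonable next step once the NLTK tokenizer data is downloaded.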
Hello, I have a question: do you know the BNC (British National Corpus)? You can use this corpus for Natural Language Processing applications. Thank you very much, these videos are great.
For Linux user, the Corpora directory is under [~/nltk_data] (Debian).
I found it with [locate nltk]. Googling it is another solution.
@sentdex: Can I add my own excel file in corpora and use it for sentiment analysis by importing it?
I have installed kali-rolling and the corpora are saved at ~/nltk_data. Isn't it too convenient?
How should I use r'? For example, at 02:53. I want to find more info about that. It appears several times and I don't know how to look it up. Thank you.
+sentdex, dude, how many monitors do you have? Seems like 4: one at the top, one at the right side, one right in front of you, and one at the left side (where your prepared cheat sheets probably are 🙂).
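The `r` prefix marks a raw string literal, meaning Python leaves backslashes alone instead of treating them as escape sequences. That matters mostly for regular expressions and Windows paths. A small sketch, with made-up variable names and sample text:

```python
import re

escaped = "\n"   # one character: a newline
raw = r"\n"      # two characters: a backslash followed by the letter n
print(len(escaped), len(raw))  # prints: 1 2

# In a regex, \b (word boundary) and \w (word character) must reach the
# regex engine intact, which is why patterns are usually raw strings
word_pattern = r"\bcorp\w+"
matches = re.findall(word_pattern, "One corpus, many corpora.")
print(matches)  # prints: ['corpus', 'corpora']
```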
@Sentdex: I am working on a chatbot and am not getting good accuracy on normal English greetings. Which corpora should I use?
Hey, thanks so much for the tutorials, they have helped an incredible amount. I'm trying to align NLTK English-Spanish corpora (europarl_raw) and was wondering whether you could tell me a trick for doing this, as the nltk.align explanation is very confusing and I don't know whether I have to go through the process of 'Stopwords', 'Lowercases', 'Stemming' and 'POS tagging' beforehand.
Geez. Your tutorials are very helpful and well explained. Thank you so much!!!
I don't have nltk_data in AppData\Roaming. Why is that?
plz reply..
@sentdex, the shakespeare corpus has the files in an xml format. Could one still perform some ML on an xml file? Is there a way to convert it to txt format? Sorry if the question is not suitable.
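XML can usually be flattened to plain text with the standard library before any further processing. This is a rough sketch: the sample below is a tiny hypothetical play, not the actual layout of the Shakespeare files, and `xml_to_text` is a made-up helper name.

```python
import xml.etree.ElementTree as ET

# Tiny hypothetical document standing in for a real XML play
SAMPLE_PLAY = """<PLAY>
<TITLE>A Hypothetical Play</TITLE>
<SPEECH>
<SPEAKER>FIRST</SPEAKER>
<LINE>Who is there?</LINE>
</SPEECH>
</PLAY>"""


def xml_to_text(xml_text):
    """Drop all tags and return the remaining text, one piece per line."""
    root = ET.fromstring(xml_text)
    # itertext() yields every text fragment in document order,
    # including whitespace-only fragments between tags
    pieces = (piece.strip() for piece in root.itertext())
    return "\n".join(piece for piece in pieces if piece)


plain_text = xml_to_text(SAMPLE_PLAY)
print(plain_text)
```

The result is an ordinary string, so it can be written to a .txt file or tokenized directly.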
@sentdex please, any instruction on how to install nltk for python3.3?
Thanks for the tutorials.
"Gutenberg" actually refers to Project Gutenberg (gutenberg.org), an online repository of about 50,000 out-of-copyright books that are free to download. NLTK comes with a sample pre-loaded. You can download lots of books in bulk to create your own corpus.
"here you go MAC Users" loved that…
@sentdex How can we create our own chat corpora?
That awkward moment when opening the chatlogs haha +1