NLTK Corpora – Natural Language Processing With Python and NLTK p.9





Remember from the beginning, we talked about this term, “corpora.”

Again, a corpus is just a body of text, and corpora is the plural. Generally, corpora are grouped by some sort of defining characteristic.

NLTK is a massive toolkit. Part of what it gives you is a ton of highly valuable corpora to learn with and train against, and some of them can even be used in production.
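
As a rough sketch of what that looks like in practice, the snippet below lists the directories NLTK searches for its data and, if the Gutenberg sample is already downloaded, peeks at one of its files. The helper name `corpus_is_installed` is made up for this example, and it assumes you fetch the corpus yourself with `nltk.download('gutenberg')`.

```python
import nltk
from nltk.corpus import gutenberg  # lazy loader: importing does not need the data yet


def corpus_is_installed(name):
    """Hypothetical helper: True if corpora/<name> is on NLTK's data search path."""
    try:
        nltk.data.find("corpora/" + name)
        return True
    except LookupError:
        return False


# Directories NLTK searches for downloaded data, in order.
for directory in nltk.data.path:
    print(directory)

if corpus_is_installed("gutenberg"):
    # File names inside the corpus, then the first few tokens of one file.
    print(gutenberg.fileids()[:3])
    print(gutenberg.words("bible-kjv.txt")[:8])
else:
    print("Gutenberg corpus not found; run nltk.download('gutenberg') first.")
```

Note that `words()` already returns tokens, so there is no need to read the raw text and tokenize it yourself.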

This video is going to be all about accessing your corpora!
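
The same reader interface also works on a folder of your own plain-text files, which is one way to get at a corpus that does not ship with NLTK. A minimal sketch, assuming a throwaway temp folder and two made-up sample files; only `PlaintextCorpusReader` itself comes from NLTK:

```python
import os
import tempfile

from nltk.corpus.reader import PlaintextCorpusReader

# Build a tiny throwaway corpus so the example is self-contained.
# The folder and file contents are illustrative, not part of NLTK.
corpus_root = tempfile.mkdtemp()
samples = {
    "first.txt": "Corpora are bodies of text.",
    "second.txt": "NLTK can read them from a folder.",
}
for filename, text in samples.items():
    with open(os.path.join(corpus_root, filename), "w", encoding="utf-8") as handle:
        handle.write(text)

# Point a reader at the folder; the pattern selects which files belong to the corpus.
my_corpus = PlaintextCorpusReader(corpus_root, r".*\.txt")

file_ids = my_corpus.fileids()
first_words = list(my_corpus.words("first.txt"))
print(file_ids)
print(first_words)
```

Copying a folder into nltk_data does not by itself create a name you can import from `nltk.corpus`, so pointing a reader at the folder is usually the simpler route.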

sample code: http://pythonprogramming.net
http://hkinsley.com
https://twitter.com/sentdex
http://sentdex.com
http://seaofbtc.com


Comment List

  • sentdex
    December 11, 2020

    Fuck you I have seen your five videos for an highly important work there is not a single video from which I got benefit

  • sentdex
    December 11, 2020

    You don't have to tokenize. Don't use raw, use the other methods.
    I recommend reading the Nltk book on the nltk website.

  • sentdex
    December 11, 2020

    if i want to make my own corpora can i put the corpora in there? then i call my own corpora?

  • sentdex
    December 11, 2020

    Oi Mac users just search nltk_data using command + space…. ezpz

  • sentdex
    December 11, 2020

    can we get Arabic corpus?

  • sentdex
    December 11, 2020

    Is it okay to create a corpus with word tokens for the purpose of multi-class text classification? Is it necessary to have full sentences in a corpus? At the end, we are tokenising them anyway for analysis. Please respond.

  • sentdex
    December 11, 2020

    haha… old video…need to update… and example with "bible-kjv" actually caused me nausea… 🙂

  • sentdex
    December 11, 2020

    sentdex laugh*

  • sentdex
    December 11, 2020

    how i can add my own dataset into nltk/corpora to make it like movie_reviews?

  • sentdex
    December 11, 2020

    when I write
    from nltk.corpus import biocreative_ppi
    I get the following error

    File "<ipython-input-26-7825dfc39a9b>", line 1, in <module>

    from nltk.corpus import biocreative_ppi

    ImportError: cannot import name 'biocreative_ppi'

  • sentdex
    December 11, 2020

    Here you go Mac Users 🙂

  • sentdex
    December 11, 2020

    Hi, Does anyone know why I cannot find nltk_data when it explicitly stated the following?
    "Downloading package mwa_ppdb to

    [nltk_data] | C:\Users\Anak\AppData\Roaming\nltk_data…

    [nltk_data] | Package mwa_ppdb is already up-to-date!"
    and yes I have downloaded nltk_data, and the directory stated in nltk.download is "C:\Users\Anak\AppData\Roaming\nltk_data"

  • sentdex
    December 11, 2020

    how about a parallel corpus?

  • sentdex
    December 11, 2020

    "gutenberg.abspath" is a better option for checking path.

  • sentdex
    December 11, 2020

    It's really amazing that you still read comments from your old videos 🙂

  • sentdex
    December 11, 2020

    lol the chat logs literally made me cry

  • sentdex
    December 11, 2020

    You've really injected a bit of humour and joy into what could quite easily have been a dry topic. Great series, thank you.

  • sentdex
    December 11, 2020

    How can you open your own corpus in nltk? In that case it's not whatever.gutenberg.whatever, obviously, but then what is it?

  • sentdex
    December 11, 2020

    Hello Sir , First of all thank you so much for this tutorial series.
    I tried to make a folder of my own in the corpora directory and then tried a simple program of POS tagging on my personal corpus file but the import statement threw an error. Here is what it was:
    ————————————————————————————————————————
    Traceback (most recent call last):

    File "C:/Users/Hp/AppData/Local/Programs/Python/Python36-32/Python Programs/Corpus.py", line 2, in <module>

    from nltk.corpus import personal

    ImportError: cannot import name 'personal'

  • sentdex
    December 11, 2020

    very nice and clear tutorial..

  • sentdex
    December 11, 2020

    I tried to import gutenberg exactly like you said but it is saying that it cannot import gutenberg
    I tried updating and downloading it but it says it is already up to date. Can you help me figure out what is wrong?

  • sentdex
    December 11, 2020

    can somebody pls tell me what corpora is without getting mad at me or asking me that if i have watched previous videos ????

  • sentdex
    December 11, 2020

    As a Mac user, you don't even have to use the /User/Username/nltk_data on mac terminal like you do when uploading a csv file. You can just type "nltk_data" on the terminal and badaboom there you go. You can do that for the data.py file too. Extremely simple.

  • sentdex
    December 11, 2020

    import nltk
    nltk.__path__
    phew!

  • sentdex
    December 11, 2020

    Hi @sentdex! Can we create our own text files in the nltk corpora and use them?

  • sentdex
    December 11, 2020

    I'm new to nltk could you pls tell me how to get the title,subtitle of a local text file of any format

  • sentdex
    December 11, 2020

    I'm a mac user and my files were in: /Users/Simon/nltk_data/

  • sentdex
    December 11, 2020

    Do you know how to use my own corpus? I have a corpus in xml format:

    <article n="0" dialect="various" title="Blog Alessandra">
    <s n="0-0">
    <w n="0-0-0" pos="PPER">Thisss</w>
    <w n="0-0-1" pos="VVFIN">isss</w>
    <w n="0-0-2" pos="PTKNEG">aanother</w>
    <w n="0-0-3" pos="ADJD">language</w>
    <w n="0-0-4" pos="$.">.</w>
    </s>
    <s n="0-1">

    </s>

    </article>

    I would like to use that corpus to segment my own text. Any ideas? Links? thx

  • sentdex
    December 11, 2020

    If I have my own directory with my own set of .txt files can I drag and drop it into this location where the NLTK corpora resides so I can later pull it into my code? Will this work?

  • sentdex
    December 11, 2020

    How to read .csv files using nltk??

  • sentdex
    December 11, 2020

    Hello, I have a question. Do you know the BNC (British National Corpus)? You can use this corpus for natural language processing applications. Thank you very much, these videos are great.

  • sentdex
    December 11, 2020

    For Linux user, the Corpora directory is under [~/nltk_data] (Debian).
    I found it with [locate nltk]. Google it is another solution.

  • sentdex
    December 11, 2020

    @sentdex: Can I add my own excel file in corpora and use it for sentiment analysis by importing it?

  • sentdex
    December 11, 2020

    I have installed kali-rolling and the corpora are saved at ~/nltk_data. Isn't it too convenient?

  • sentdex
    December 11, 2020

    How should I use r'? For example, at 02:53. I want to find more info about that. It appears several times and I don't know how to look it up. Thank you.

  • sentdex
    December 11, 2020

    +sentdex, dude, how many monitors do you have? Seems like 4: one at the top, one at the right side, one right in front of you, and one at the left side (where your prepared cheat sheets probably are 🙂)

  • sentdex
    December 11, 2020

    @Sentdex : i am working on creating a chatbot and not able to get accuracy for the normal greeting for english language, which corpora should i use?

  • sentdex
    December 11, 2020

    Hey, thanks so much for the tutorials, they have helped an incredible amount. I'm trying to align NLTK English-Spanish corpora (europarl_raw) and was wondering whether you could tell me a trick for doing this, as the nltk.align explanation is very confusing and I don't know whether I have to go through the process of 'Stopwords', 'Lowercases', 'Stemming' and 'POS tagging' beforehand.

  • sentdex
    December 11, 2020

    Geez. Your tutorials are very helpful and well explained. Thank you so much!!!

  • sentdex
    December 11, 2020

    I dont have nltk_data in appdata roaming. why so?
    plz reply..

  • sentdex
    December 11, 2020

    @sentdex, the shakespeare corpus has the files in an xml format. Could one still perform some ML on an xml file? Is there a way to convert it to txt format? Sorry if the question is not suitable.

  • sentdex
    December 11, 2020

    @sentdex please, any instruction on how to install nltk for python3.3?

  • sentdex
    December 11, 2020

    Thanks for the tutorials. 

    "Gutenberg" actually refers to Project Gutenberg (gutenberg.org), an online repository of about 50,000 out-of-copyright books that are free to download. NLTK comes with a sample pre-loaded. You can download lots of books in bulk to create your own corpus.

  • sentdex
    December 11, 2020

    "here you go MAC Users" loved that…

  • sentdex
    December 11, 2020

    @sentdex How can we create our own chat corpora?

  • sentdex
    December 11, 2020

    That awkward moment when opening the chatlogs haha +1
