Natural Language Processing (Part 5): Topic Modeling with Latent Dirichlet Allocation in Python


This six-part video series goes through an end-to-end Natural Language Processing (NLP) project in Python to compare stand-up comedy routines.

– Natural Language Processing (Part 1): Introduction to NLP & Data Science
– Natural Language Processing (Part 2): Data Cleaning & Text Pre-Processing in Python
– Natural Language Processing (Part 3): Exploratory Data Analysis & Word Clouds in Python
– Natural Language Processing (Part 4): Sentiment Analysis with TextBlob in Python
– Natural Language Processing (Part 5): Topic Modeling with Latent Dirichlet Allocation in Python
– Natural Language Processing (Part 6): Text Generation with Markov Chains in Python

All of the supporting Python code can be found here: https://github.com/adashofdata/nlp-in-python-tutorial


Comment List

  • Alice Zhao
    December 7, 2020

    Thanks for the simple explanation!

  • Alice Zhao
    December 7, 2020

    Thank you that was great

  • Alice Zhao
    December 7, 2020

    Amazing. 🔥 🔥 🔥 🔥 🔥

  • Alice Zhao
    December 7, 2020

    Excellent tutorial and explanation

  • Alice Zhao
    December 7, 2020

    Would it make sense to remove curse words and profanity from the corpus, since they are not really topics? Perhaps if a comedian just uses a lot of curse words, the actual topic his or her show is about gets lost.

  • Alice Zhao
    December 7, 2020

    Wow! What a tutorial. After spending half a day reading LDA topic modelling articles on GitHub and Kaggle, I have arrived at the best one. Your explanations and structure are brilliant. Thank you x 1,000,000

  • Alice Zhao
    December 7, 2020

    One of the best tutorials. I can't say more. Thank you

  • Alice Zhao
    December 7, 2020

    Thank you so much for presenting this LDA technique, very informative and practical. I wonder if I could use the Python code to analyze documents in other languages, such as Chinese? Thanks!

  • Alice Zhao
    December 7, 2020

    Really loved your way of explaining. Thanks for making LDA topic modelling simple to understand.

  • Alice Zhao
    December 7, 2020

    First of all, thanks for the really explanatory video. Really enjoyed that! I have a question: how did you generate the cv_stop.pkl file? Thanks in advance.

  • Alice Zhao
    December 7, 2020

    ???: "You can look at nouns and adjectives… and now they're looking pretty good!"
    topic1: joke, mom, ass, hell, dog
    topic2: mom, door, dick, stupid
    topic3: friend, n****, gay, long
    I dunno about that…
    Haha jk, I get that it's just a small dataset and you were just showing some techniques.
    Great vid! Thx

  • Alice Zhao
    December 7, 2020

    Excellent explanation. Thank you!

  • Alice Zhao
    December 7, 2020

    Nice video, but sparse_counts = scipy.sparse.csr_matrix(tdm) gives TypeError: no supported conversion for types: (dtype('O'),)
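
    A likely cause of that error, assuming tdm is the transposed document-term DataFrame from the notebook: its values are stored with pandas dtype object, which scipy cannot convert. A minimal sketch of a fix (if a raw-text row slipped into the matrix, drop it first, e.g. with select_dtypes):

    # Cast the counts to a numeric dtype before building the sparse matrix;
    # csr_matrix cannot convert pandas object columns on its own.
    sparse_counts = scipy.sparse.csr_matrix(tdm.astype(float))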

  • Alice Zhao
    December 7, 2020

    Thanks for simplifying the LDA methods. This is the first time even a layman can understand how LDA works.

  • Alice Zhao
    December 7, 2020

    What a great tutorial. Thanks for this

  • Alice Zhao
    December 7, 2020

    Great explanation and nice work

  • Alice Zhao
    December 7, 2020

    Nice! Really enjoyed the explanation! We're googling around for a technique to identify a SEQUENCE of topics within documents, to test the hypothesis that most of "these" documents follow a similar order of topics. If you happen to know a resource we can check out, we'd appreciate the nudge =) Best wishes and stay safe

  • Alice Zhao
    December 7, 2020

    Nice video, but the voice is very low. Please use a better mic.

  • Alice Zhao
    December 7, 2020

    This is by far the best LDA explanation video. Awesome job!

  • Alice Zhao
    December 7, 2020

    Haha, the words in the topics are so inappropriate. But great video!

  • Alice Zhao
    December 7, 2020

    Thank you Alice so so so much For this amazing illustration and application! 🌻

  • Alice Zhao
    December 7, 2020

    🔥🔥🔥

  • Alice Zhao
    December 7, 2020

    ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
    <ipython-input-137-cbe5724bdb63> in <module>()
          1 corpus_transformed = ldana[corpusna]
          2 corpus_transformed
    ----> 3 list(zip([a for [(a,b)] in corpus_transformed], data_dtmna.index))

    <ipython-input-137-cbe5724bdb63> in <listcomp>(.0)
          1 corpus_transformed = ldana[corpusna]
          2 corpus_transformed
    ----> 3 list(zip([a for [(a,b)] in corpus_transformed], data_dtmna.index))

    ValueError: too many values to unpack (expected 1)

    This happens when I try to run the last line. How do I fix it?
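
    A likely cause, for anyone hitting the same error: the pattern [(a,b)] assumes gensim returned exactly one (topic, probability) pair per document, but a document with several topics above the probability threshold yields a longer list and the unpack fails. A minimal sketch of a fix that keeps each document's most probable topic:

    # Pick the highest-probability topic for every document instead of
    # assuming there is exactly one pair to unpack.
    dominant_topics = [max(doc, key=lambda pair: pair[1])[0]
                       for doc in corpus_transformed]
    list(zip(dominant_topics, data_dtmna.index))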

  • Alice Zhao
    December 7, 2020

    One might need these downloads for it to work:

    import nltk
    nltk.download('punkt')                        # tokenizer models
    nltk.download('averaged_perceptron_tagger')   # part-of-speech tagger

  • Alice Zhao
    December 7, 2020

    Much better than DeepLearning's NLP series!!! What a gem on YouTube. Thank you!

  • Alice Zhao
    December 7, 2020

    Great… You explained it in an awesome way. By the way, can this be used for text clustering?

  • Alice Zhao
    December 7, 2020

    Thank you Alice. Very useful video

  • Alice Zhao
    December 7, 2020

    Thank you so much. I learned a lot from your videos.

  • Alice Zhao
    December 7, 2020

    This is the best LDA video I've ever seen. It is always easier to understand with examples.

  • Alice Zhao
    December 7, 2020

    I had fun watching your videos. They were very real, applicable and informative! Thank you for these videos and I'll look forward to more videos from you. 🙂

  • Alice Zhao
    December 7, 2020

    Are you Chinese?

  • Alice Zhao
    December 7, 2020

    Well explained! Thanks for uploading the video!!

  • Alice Zhao
    December 7, 2020

    I tried to run your topic modelling notebook, passing my own CSV file, and I am getting TypeError: no supported conversion for types: (dtype('O'),). I changed your code as shown below, since I don't have the pickle files:

    import pandas as pd
    from gensim import matutils, models
    import scipy.sparse

    data = pd.read_csv('C:/Users/tbadi/TestIncidentDataCSV.csv')  # my local file

    tdm = data.transpose()
    tdm.head()

    sparse_counts = scipy.sparse.csr_matrix(tdm)
    corpus = matutils.Sparse2Corpus(sparse_counts)

    # cv = pickle.load(open("cv_stop.pkl", "rb"))
    id2word = dict((v, k) for k, v in cv.vocabulary_.items())
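
    A likely cause of that error: csr_matrix needs a numeric document-term matrix, but a CSV of raw text transposes into object-dtype columns, and cv is still referenced on the last line even though the pickle load is commented out. A minimal pickle-free sketch, assuming the raw text sits in a hypothetical column named 'text':

    import pandas as pd
    import scipy.sparse
    from gensim import matutils, models
    from sklearn.feature_extraction.text import CountVectorizer

    data = pd.read_csv('TestIncidentDataCSV.csv')

    # Build the document-term matrix from the raw text instead of loading a
    # pickle; this also defines cv, which the id2word line needs.
    cv = CountVectorizer(stop_words='english')
    counts = cv.fit_transform(data['text'])

    # gensim's Sparse2Corpus expects terms as rows and documents as columns,
    # so transpose the scikit-learn output (documents x terms).
    sparse_counts = scipy.sparse.csr_matrix(counts.transpose())
    corpus = matutils.Sparse2Corpus(sparse_counts)
    id2word = dict((v, k) for k, v in cv.vocabulary_.items())

    lda = models.LdaModel(corpus=corpus, id2word=id2word, num_topics=2, passes=10)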

  • Alice Zhao
    December 7, 2020

    Bless your soul!

  • Alice Zhao
    December 7, 2020

    You’re amazing. Everything was so brilliantly explained. Thank you so much

  • Alice Zhao
    December 7, 2020

    ML and NLP are the most boring and useless subjects ever

  • Alice Zhao
    December 7, 2020

    Genius!!! Thanks!!!

  • Alice Zhao
    December 7, 2020

    Excellent video!!
    Can the approach model opinions or argumentative text?

  • Alice Zhao
    December 7, 2020

    Hi Alice,
    Topic A: Bananas 40%?, Kale 30%, Breakfast …?
    Topic B: Kittens 30%, Puppies 20%
    How did you assign the percentages?

    Also, when a word is randomly assigned to a topic, the probability of the word in the topic is 50%. With a 50% chance, how is a topic for a word determined in the next iteration? So my question is: how does reassignment happen in the next iteration?

  • Alice Zhao
    December 7, 2020

    Excellent explanation. Very useful video. Thanks a lot 🙂

  • Alice Zhao
    December 7, 2020

    Wow, excellent explanation. Thanks!

  • Alice Zhao
    December 7, 2020

    Simply amazing!

  • Alice Zhao
    December 7, 2020

    LDA is an unsupervised technique. After assigning each word to one of the topics randomly during the first iteration, how does it change its assignment in the next or subsequent iterations? On what basis does it change the assignment? Please provide an explanation.
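
    A sketch of an answer, for anyone with the same question: in collapsed Gibbs sampling, one common way of fitting LDA, each word's topic is resampled in proportion to how strongly its document already uses each topic, times how strongly each topic already uses that word. A rough Python sketch of that score; the count arrays and the smoothing parameters alpha and beta are hypothetical names:

    # Unnormalized probability that word w in document d is reassigned to
    # topic t (the word's current assignment is removed from the counts first).
    def reassignment_score(d, w, t, doc_topic, topic_word, topic_total,
                           alpha, beta, vocab_size):
        doc_prefers_t = doc_topic[d][t] + alpha
        t_emits_w = (topic_word[t][w] + beta) / (topic_total[t] + beta * vocab_size)
        return doc_prefers_t * t_emits_w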

  • Alice Zhao
    December 7, 2020

    Really nice to see all the failed attempts; so many instructors jump straight to the final solution!

  • Alice Zhao
    December 7, 2020

    The explanation is crystal clear, but I would be even more thankful if you explained the mathematical part.
