Natural Language Processing (Part 5): Topic Modeling with Latent Dirichlet Allocation in Python
This six-part video series goes through an end-to-end Natural Language Processing (NLP) project in Python to compare stand-up comedy routines.
– Natural Language Processing (Part 1): Introduction to NLP & Data Science
– Natural Language Processing (Part 2): Data Cleaning & Text Pre-Processing in Python
– Natural Language Processing (Part 3): Exploratory Data Analysis & Word Clouds in Python
– Natural Language Processing (Part 4): Sentiment Analysis with TextBlob in Python
– Natural Language Processing (Part 5): Topic Modeling with Latent Dirichlet Allocation in Python
– Natural Language Processing (Part 6): Text Generation with Markov Chains in Python
All of the supporting Python code can be found here: https://github.com/adashofdata/nlp-in-python-tutorial
Thanks for the simple explanation!
Thank you that was great
Amazing. 🔥 🔥 🔥 🔥 🔥
Excellent tutorial and explanation
Would it make sense to remove curse words and profanity from the corpus, since they are not really topics? Perhaps if a comedian just uses a lot of curse words, the actual topic his or her show is about gets lost.
Wow! What a tutorial. After spending half a day reading LDA topic modelling articles on GitHub and Kaggle, I have arrived at the best one. Your explanations and structure are brilliant. Thank you x 1,000,000
One of the best tutorials. I can't say more. Thank you
Thank you so much for presenting this LDA technique, very informative and practical. I wonder if I could use the Python code to analyze documents in other languages, such as Chinese? Thanks!
Really loved your way of explaining. Thanks for making LDA topic modelling simple to understand.
First of all, thanks for the really explanatory video. Really enjoyed that! I have a question: how did you generate the cv_stop.pkl file? Thanks in advance.
???: "you can look at nouns and adjectives…….and now there are looking pretty good!"
topic1: joke, mom, ass, hell, dog
topic2: mom, door, dick, stupid
topic3: friend, n****, gay, long
I dunno about that…
Haha jk I get that it's just a small dataset and you were just showing some techniques.
Great vid! Thx
Excellent explanation. Thank you.
Nice video, but sparse_counts = scipy.sparse.csr_matrix(tdm) gives TypeError: no supported conversion for types: (dtype('O'),)
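That TypeError usually means the transposed DataFrame still contains object-dtype (string) cells, for example a leftover text or index column. A minimal sketch of one workaround, assuming tdm is the transposed document-term matrix of counts from the tutorial:

# A minimal sketch, assuming `tdm` is the transposed document-term matrix
# from the tutorial and the TypeError comes from object-dtype (string)
# columns. Coercing everything to a numeric dtype first avoids the error.
import pandas as pd
import scipy.sparse

tdm_numeric = tdm.apply(pd.to_numeric, errors='coerce').fillna(0).astype(int)
sparse_counts = scipy.sparse.csr_matrix(tdm_numeric)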
Thanks for simplifying the LDA method. This is the first time even a layman can understand how LDA works.
What a great tutorial. Thanks for this
Great explanation and nice work
Nice! Really enjoyed the explanation! We're googling around for a technique to identify a SEQUENCE of topics within documents, to test the hypothesis that most of "these" documents follow a similar order of topics. If you happen to know a resource we can check out, we'd appreciate the nudge =) Best wishes and stay safe
Nice video, but the voice is very low. Please use a better mic.
This is by far the best LDA explanation video. Awesome job!
haha the words in the topics are so inappropriate. But great video !
Thank you Alice so so so much For this amazing illustration and application! 🌻
🔥🔥🔥
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-137-cbe5724bdb63> in <module>()
      1 corpus_transformed = ldana[corpusna]
      2 corpus_transformed
----> 3 list(zip([a for [(a,b)] in corpus_transformed], data_dtmna.index))

<ipython-input-137-cbe5724bdb63> in <listcomp>(.0)
      1 corpus_transformed = ldana[corpusna]
      2 corpus_transformed
----> 3 list(zip([a for [(a,b)] in corpus_transformed], data_dtmna.index))

ValueError: too many values to unpack (expected 1)
This happens when I try to run the last line. How do I fix it?
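The comprehension [a for [(a,b)] in corpus_transformed] unpacks exactly one (topic, probability) pair per document, so it fails as soon as gensim reports more than one topic above its probability threshold for some document. A minimal sketch of a fix, assuming the tutorial's variable names, is to keep the most probable topic per document:

# A minimal sketch, assuming `corpus_transformed` and `data_dtmna` come
# from the tutorial. Each document may carry several (topic, probability)
# pairs, so take the topic with the highest probability instead of
# unpacking a single pair.
dominant_topics = [max(doc, key=lambda pair: pair[1])[0] for doc in corpus_transformed]
list(zip(dominant_topics, data_dtmna.index))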
One might need these downloads for it to work:
import nltk
nltk.download('punkt')                       # tokenizer models
nltk.download('averaged_perceptron_tagger')  # POS tagger for the nouns/adjectives step
Much better than DeepLearning's NLP series!!! What a gem on YouTube. Thank you!
Great… You explained it in an awesome way. By the way, can this be used for text clustering?
Thank you Alice. Very useful video
Thank you so much. I learned a lot from your videos.
This is the best LDA video I've ever seen. It is always easier to understand with examples.
I had fun watching your videos. They were very real, applicable and informative! Thank you for these videos and I'll look forward to more videos from you. 🙂
are you Chinese?
Well explained. Thanks for uploading the video!!
I tried to run your topic modelling code, passing my own CSV file, and I am getting a TypeError: no supported conversion for types: (dtype('O'),). I changed your code as shown below, since I don't have the pickle file:
import pandas as pd
data = pd.read_csv('C:/Users/tbadi/TestIncidentDataCSV.csv')  # my local file

from gensim import matutils, models
import scipy.sparse

tdm = data.transpose()
tdm.head()
sparse_counts = scipy.sparse.csr_matrix(tdm)
corpus = matutils.Sparse2Corpus(sparse_counts)
# cv = pickle.load(open("cv_stop.pkl", "rb"))
id2word = dict((v, k) for k, v in cv.vocabulary_.items())
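Two issues likely cause the error here: the raw CSV text goes straight into scipy.sparse.csr_matrix without first being vectorized into numeric counts (hence the dtype('O') error), and cv is never defined once the pickle load is commented out. A minimal sketch of one way to fix it, assuming a hypothetical free-text column named 'description' (substitute your real column name):

# A minimal sketch, not the tutorial's exact pipeline. The CSV text must be
# turned into numeric counts before building the sparse matrix, and the
# CountVectorizer has to exist locally so that cv.vocabulary_ is available.
import pandas as pd
import scipy.sparse
from sklearn.feature_extraction.text import CountVectorizer
from gensim import matutils, models

data = pd.read_csv('C:/Users/tbadi/TestIncidentDataCSV.csv')

cv = CountVectorizer(stop_words='english')
counts = cv.fit_transform(data['description'])   # 'description' is a hypothetical column name

# gensim's Sparse2Corpus expects terms as rows and documents as columns.
sparse_counts = scipy.sparse.csr_matrix(counts.transpose())
corpus = matutils.Sparse2Corpus(sparse_counts)
id2word = dict((v, k) for k, v in cv.vocabulary_.items())

lda = models.LdaModel(corpus=corpus, id2word=id2word, num_topics=2, passes=10)
print(lda.print_topics())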
Bless your soul!
You’re amazing. Everything was so brilliantly explained. Thank you so much
ML and NLP are the most boring and useless subjects ever
Genius!!! Thanks!!!
Excellent video!!
Can the approach model opinions or argumentative text?
Hi Alice,
Topic A: Bananas 40%?, Kale 30%, Breakfast …?
Topic B: Kittens 30%, Puppies 20%
How did you assign the percentages?
Also, when a word is randomly assigned to a topic, the probability of the word in the topic is 50%. With a 50% chance, how is a topic for a word determined in the next iteration? In other words, how does reassignment happen in the next iteration?
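On the percentages question: in LDA those numbers are not assigned by hand; they are the topic's word distribution, which falls out of counting word-topic assignments. A minimal sketch of the usual smoothed estimate, with made-up counts (the Bananas/Kale numbers in the video are illustrative, not computed here):

# A minimal sketch with hypothetical counts, assuming a collapsed Gibbs
# sampler. A topic's percentage for a word is the smoothed share of times
# that word was assigned to the topic across the whole corpus.
import numpy as np

n_kw = np.array([[8., 6., 4.],   # rows = topics, columns = vocabulary words
                 [1., 9., 5.]])  # hypothetical assignment counts
beta = 0.1                       # Dirichlet smoothing hyperparameter
V = n_kw.shape[1]                # vocabulary size

phi = (n_kw + beta) / (n_kw.sum(axis=1, keepdims=True) + V * beta)
print(phi.round(2))              # each row sums to 1: that topic's word mix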
Excellent explanation. Very useful video. Thanks a lot 🙂
wow~ excellent explanation. thanks
Simply amazing!
LDA is an unsupervised technique. After assigning each word to one of the topics randomly during the first iteration, how does it change its assignment to other topics in subsequent iterations? On what basis is it changing the assignment? Please provide an explanation.
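On this reassignment question (also asked above): in the usual collapsed Gibbs sampler the random start does not persist, because each word is resampled conditioned on everyone else's current assignments. A minimal sketch of one update step, assuming the standard algorithm rather than anything shown in the video:

# A minimal sketch of one collapsed Gibbs reassignment, with hypothetical
# counts. A word is drawn toward topic k in proportion to
#   (n_dk + alpha) * (n_kw + beta) / (n_k + V * beta)
# i.e. toward topics that dominate its document and already generate this
# word elsewhere. Counts exclude the word's own current assignment.
import numpy as np

rng = np.random.default_rng(0)
alpha, beta, V = 0.1, 0.1, 1000   # hyperparameters and vocabulary size
n_dk = np.array([5., 1.])         # topic counts within this word's document
n_kw = np.array([3., 0.])         # how often each topic generated this word
n_k = np.array([400., 350.])      # total words currently assigned per topic

weights = (n_dk + alpha) * (n_kw + beta) / (n_k + V * beta)
new_topic = rng.choice(len(weights), p=weights / weights.sum())
print(new_topic)                  # very likely 0: it wins on both factors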
Really nice to see all the failed attempts; so many instructors jump straight to the final solution!
The explanation is crystal clear, but I would be even more thankful if you explained the mathematical part.