Jason Kessler – Using Scattertext and the Python NLP Ecosystem for Text Visualization


Scattertext is a Python package that lets you compare and contrast how words are used in two types of documents. It produces interactive, JavaScript-based visualizations that can easily be embedded into Jupyter Notebooks. Using spaCy and Empath, Scattertext can also show how emotional states and words relating to a particular topic differ.
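To make the idea of contrasting word use in two types of documents concrete, here is a minimal sketch that uses only the Python standard library. It is not Scattertext's API or its scoring method; the function names and the toy documents are made up for illustration. It computes, for each term, its relative frequency in each of two categories, which is roughly the kind of pair of numbers a two-category term plot can be built from.

```python
import re
from collections import Counter


def term_counts(texts):
    """Count lowercase word tokens across a list of documents."""
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts


def compare_categories(docs_a, docs_b):
    """Map each term to (relative frequency in A, relative frequency in B)."""
    counts_a, counts_b = term_counts(docs_a), term_counts(docs_b)
    total_a = sum(counts_a.values()) or 1  # avoid dividing by zero on empty input
    total_b = sum(counts_b.values()) or 1
    return {
        term: (counts_a[term] / total_a, counts_b[term] / total_b)
        for term in set(counts_a) | set(counts_b)
    }


# Toy, made-up documents for two hypothetical categories.
reviews_pos = ["great plot and great acting", "a great film"]
reviews_neg = ["dull plot", "dull and slow film"]

coords = compare_categories(reviews_pos, reviews_neg)

# Rank terms by how much more often they occur in the first category.
ranked = sorted(coords, key=lambda t: coords[t][0] - coords[t][1], reverse=True)
print("most characteristic of first category:", ranked[0])
print("most characteristic of second category:", ranked[-1])
```

A raw frequency difference is the simplest possible contrast and is used here only to keep the sketch short; the notebooks linked from this abstract show what the package itself does.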

Notebooks and presentation for this talk are available from https://github.com/JasonKessler/Scattertext-PyData.

Motivation and introduction
-What’s the matter with word clouds?
-How to read a plot made by Scattertext
How to make your own plots
-Preparing a Pandas data frame with your data set
-Plotting with Scattertext, and fine tuning plots for interpretability and speed
Scattertext and the Python NLP ecosystem
-Visualizing emotions using Empath.
-Using word vectors from spaCy and elsewhere to see how topic-specific language differs.
-Visualizing topic models from scikit-learn.
-Source code for the package is hosted on GitHub at github.com/JasonKessler/scattertext.
-For more information, please see the paper, which will appear as a 2017 ACL demo, at https://arxiv.org/abs/1703.00565.


PyData is an educational program of NumFOCUS, a 501(c)(3) non-profit organization in the United States. PyData provides a forum for the international community of users and developers of data analysis tools to share ideas and learn from each other. The global PyData network promotes discussion of best practices, new approaches, and emerging technologies for data management, processing, analytics, and visualization. PyData communities approach data science using many languages, including (but not limited to) Python, Julia, and R.

PyData conferences aim to be accessible and community-driven, with novice to advanced level presentations. PyData tutorials and talks bring attendees the latest project features along with cutting-edge use cases.


