Intro to NLP with spaCy (1): Detecting programming languages | Episode 1: Data exploration


In this new video series, data science instructor Vincent Warmerdam gets started with spaCy, an open-source library for Natural Language Processing in Python. His mission: building a system to automatically detect programming languages in large volumes of text. Follow his process from the first idea to a prototype, all the way to data collection and training a statistical named entity recognition model from scratch.
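A first prototype for this kind of task could be purely rule-based. The sketch below shows one way to do that. It is not the code from the series, which is linked below.

The sketch assumes spaCy v3 and uses a blank English pipeline together with spaCy's `Matcher`. The function name `find_languages`, the label `PROG_LANG` and the short word list are hypothetical.

```python
import spacy
from spacy.matcher import Matcher

# A blank English pipeline: only a tokenizer, no trained model to download.
nlp = spacy.blank("en")
matcher = Matcher(nlp.vocab)

# Hypothetical hand-written patterns: one single-token pattern per language name,
# matched case-insensitively via the LOWER token attribute.
LANGUAGE_NAMES = ["python", "java", "golang", "go"]
matcher.add("PROG_LANG", [[{"LOWER": name}] for name in LANGUAGE_NAMES])


def find_languages(text):
    """Return the matched language mentions in document order."""
    doc = nlp(text)
    # Each match is a (match_id, start, end) tuple; sort by start token index.
    matches = sorted(matcher(doc), key=lambda m: m[1])
    return [doc[start:end].text for _, start, end in matches]


print(find_languages("I rewrote my Python script in Golang, but Java is still a safe way to go."))
```

On this sentence the matcher finds "Python", "Golang" and "Java", but it also matches the final "go". That last match is a verb, not the language.

Ambiguities like this are one plausible reason to move from hand-written rules to a trained statistical named entity recognition model.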

● Website:
● GitHub:
● Free online course:
● Twitter:

● Code:
● Stack Overflow dataset:

Vincent Warmerdam is a co-founder of PyData Amsterdam and an experienced data science instructor. He has been evangelizing data and open source for the last five years. You might know him from his PyData videos, where he attempts to defend common sense over hype in data science.

● Follow Vincent on Twitter:
