Top Python Libraries for Deep Learning, Natural Language Processing & Computer Vision
In a previous post, we looked at the top Python libraries for data science, data visualization, and machine learning. This time, we look at the top libraries for deep learning, natural language processing, and computer vision. These categories really don’t need any further clarification.
This separation and classification is arbitrary, in some instances more than others, but we have done our best to group tools together by intended use case, hoping this is most useful for readers.
Clearly, not all NLP and CV work these days is performed using deep learning techniques, but as trends move toward such techniques for state-of-the-art results, we stand by this otherwise arbitrary categorization logic.
Our list is made up of libraries that our team decided by consensus were representative of common and well-used Python libraries. Also, to be included, a library must have a GitHub repository. The categories are in no particular order, and neither are the libraries within each. We contemplated ordering them by stars or some other metric, but decided against it to avoid explicitly assigning any perceived value or importance to the libraries. Their listing here, then, is purely random. Library descriptions are taken directly from the GitHub repositories, in some form or another.
Thanks again to Ahmed Anis for contributing to the collection of this data, and to the rest of the KDnuggets staff for their inputs, insights, and suggestions.
Note that the visualization below, by Gregory Piatetsky, represents each library by type and plots it by stars and contributors; symbol size reflects the number of commits the library has on GitHub, on a logarithmic scale.
Figure 1: Top Python Libraries for Deep Learning, Natural Language Processing & Computer Vision
Plotted by number of stars and number of contributors; relative size by log number of commits
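The encoding described in the caption can be sketched in a few lines of Python. This is only an illustration, not the code behind the figure: the four sample rows are copied from the listing in this post, while the axis assignment (stars on x, contributors on y), the base-10 logarithm, and the `scale` factor are assumptions, and `marker_size` is a hypothetical helper.

```python
import math

# (stars, commits, contributors) as listed in this post
libraries = {
    "TensorFlow": (149000, 97741, 2754),
    "Keras": (50000, 5349, 864),
    "fastText": (21700, 379, 47),
    "Pillow": (7800, 10799, 303),
}


def marker_size(commits, scale=10.0):
    """Hypothetical sizing rule: size grows with log10 of the commit count."""
    return scale * math.log10(commits)


# One plot point per library: position from stars and contributors,
# symbol size from the log-scaled commit count.
points = [
    {"name": name, "x": stars, "y": contributors, "size": marker_size(commits)}
    for name, (stars, commits, contributors) in libraries.items()
]

for p in points:
    print(f"{p['name']:<12} stars={p['x']:>7} contributors={p['y']:>5} size={p['size']:.1f}")
```

With a logarithmic size scale, a library with hundreds of times more commits than another is drawn only about twice as large, which keeps the smaller projects visible next to the largest ones.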
And so, without further ado, here are the 30 top Python libraries for deep learning, natural language processing & computer vision, as best determined by KDnuggets staff.
Deep Learning
1. TensorFlow
Stars: 149000, Commits: 97741, Contributors: 2754
TensorFlow is an end-to-end open source platform for machine learning. It has a comprehensive, flexible ecosystem of tools, libraries, and community resources that lets researchers push the state-of-the-art in ML and developers easily build and deploy ML-powered applications.
2. Keras
Stars: 50000, Commits: 5349, Contributors: 864
Keras is a deep learning API written in Python, running on top of the machine learning platform TensorFlow.
3. PyTorch
Stars: 43200, Commits: 30696, Contributors: 1619
Tensors and Dynamic neural networks in Python with strong GPU acceleration
4. fastai
Stars: 19800, Commits: 1450, Contributors: 607
fastai simplifies training fast and accurate neural nets using modern best practices
5. PyTorch Lightning
Stars: 9600, Commits: 3594, Contributors: 317
The lightweight PyTorch wrapper for high-performance AI research. Scale your models, not the boilerplate.
6. JAX
Stars: 10000, Commits: 5708, Contributors: 221
Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more
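To make "composable transformations" concrete, here is a minimal sketch that chains three of the transformations named in the description: `jax.grad` (differentiate), `jax.vmap` (vectorize), and `jax.jit` (compile). The `loss` function and the sample arrays are made up for illustration and are not taken from the library's documentation.

```python
import jax
import jax.numpy as jnp


def loss(w, x):
    """Hypothetical scalar function: squared dot product of w and x."""
    return jnp.dot(w, x) ** 2


# Differentiate with respect to the first argument (w).
grad_loss = jax.grad(loss)

# Vectorize over the rows of x while keeping w fixed.
batched_grad = jax.vmap(grad_loss, in_axes=(None, 0))

# JIT-compile the composed function.
fast_batched_grad = jax.jit(batched_grad)

w = jnp.array([1.0, 2.0])
xs = jnp.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])

# One gradient per row of xs; analytically each is 2 * (w . x) * x.
grads = fast_batched_grad(w, xs)
print(grads)
```

Each transformation takes a function and returns a function, which is why they can be stacked in any order that makes sense for the computation.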
Stars: 19100, Commits: 11387, Contributors: 839
8. Ignite
Stars: 3100, Commits: 747, Contributors: 112
High-level library to help with training and evaluating neural networks in PyTorch flexibly and transparently.
Natural Language Processing
9. fastText
Stars: 21700, Commits: 379, Contributors: 47
fastText is a library for efficient learning of word representations and sentence classification.
10. spaCy
Stars: 17400, Commits: 11628, Contributors: 482
Industrial-strength Natural Language Processing (NLP) with Python and Cython
11. Gensim
Stars: 11200, Commits: 4024, Contributors: 361
Gensim is a Python library for topic modelling, document indexing and similarity retrieval with large corpora. Target audience is the natural language processing (NLP) and information retrieval (IR) community.
12. NLTK
Stars: 9300, Commits: 13990, Contributors: 319
NLTK — the Natural Language Toolkit — is a suite of open source Python modules, data sets, and tutorials supporting research and development in Natural Language Processing.
13. Datasets (Huggingface)
Stars: 4300, Commits: 568, Contributors: 64
Fast, efficient, open-access datasets and evaluation metrics for Natural Language Processing and more in PyTorch, TensorFlow, NumPy and Pandas
14. Tokenizers (Huggingface)
Stars: 3800, Commits: 1252, Contributors: 30
Fast State-of-the-Art Tokenizers optimized for Research and Production
15. Transformers (Huggingface)
Stars: 3500, Commits: 5480, Contributors: 585
Transformers: State-of-the-art Natural Language Processing for PyTorch and TensorFlow 2.0.
16. Stanza
Stars: 4800, Commits: 1514, Contributors: 19
Official Stanford NLP Python Library for Many Human Languages
17. TextBlob
Stars: 7300, Commits: 542, Contributors: 24
Simple, Pythonic, text processing–Sentiment analysis, part-of-speech tagging, noun phrase extraction, translation, and more.
18. PyTorch-NLP
Stars: 1800, Commits: 442, Contributors: 15
Basic Utilities for PyTorch Natural Language Processing (NLP)
19. Textacy
Stars: 1500, Commits: 1324, Contributors: 23
A Python library for performing a variety of natural language processing (NLP) tasks, built on the high-performance spaCy library.
20. Finetune
Stars: 626, Commits: 1405, Contributors: 13
Finetune is a library that allows users to leverage state-of-the-art pretrained NLP models for a wide variety of downstream tasks.
21. Texthero
Stars: 1900, Commits: 266, Contributors: 17
Text preprocessing, representation and visualization from zero to hero.
22. Spark NLP
Stars: 1700, Commits: 4363, Contributors: 50
Spark NLP is a Natural Language Processing library built on top of Apache Spark ML.
23. GluonNLP
Stars: 2200, Commits: 712, Contributors: 72
GluonNLP is a toolkit that enables easy text preprocessing, datasets loading and neural models building to help you speed up your Natural Language Processing (NLP) research.
Computer Vision
24. Pillow
Stars: 7800, Commits: 10799, Contributors: 303
Pillow is the friendly PIL fork. PIL is the Python Imaging Library.
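As a quick, hedged illustration of what working with Pillow looks like, the sketch below builds a small image in memory and applies a few common operations from the `PIL.Image` module. The image size and color are arbitrary choices made for the example.

```python
from PIL import Image

# Build a small solid-color RGB image in memory (no file on disk needed).
img = Image.new("RGB", (64, 48), color=(255, 0, 0))

# Convert to 8-bit grayscale.
gray = img.convert("L")

# Resize to half the width and height.
thumb = img.resize((32, 24))

# Rotate by 90 degrees, expanding the canvas so nothing is cropped.
rotated = img.rotate(90, expand=True)

print(img.size, gray.mode, thumb.size, rotated.size)
```

Each call returns a new `Image` object and leaves the original untouched, so operations can be chained or applied independently.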
25. OpenCV
Stars: 49600, Commits: 29453, Contributors: 1234
Open Source Computer Vision Library
26. scikit-image
Stars: 4000, Commits: 12352, Contributors: 403
Image processing in Python
27. Mahotas
Stars: 644, Commits: 1273, Contributors: 25
Mahotas is a library of fast computer vision algorithms (all implemented in C++ for speed) operating over numpy arrays.
28. SimpleCV
Stars: 2400, Commits: 2625, Contributors: 69
SimpleCV is a framework for Open Source Machine Vision, using OpenCV and the Python programming language.
29. GluonCV
Stars: 4300, Commits: 774, Contributors: 101
GluonCV provides implementations of the state-of-the-art (SOTA) deep learning models in computer vision.
30. torchvision
Stars: 7500, Commits: 1286, Contributors: 334
The torchvision package consists of popular datasets, model architectures, and common image transformations for computer vision.