AI, Analytics, Machine Learning, Data Science, Deep Learning Research Main Developments in 2020 and Key Trends for 2021


To the chagrin of absolutely no one, 2020 is finally drawing to a close. It has been a rollercoaster of a year, one defined almost exclusively by the COVID-19 pandemic. But other things have happened too, including in the fields of AI, data science, and machine learning. To that end, it’s time for KDnuggets’ annual year-end expert analysis and predictions. This year we posed the question:

What were the main developments in AI, Data Science, Machine Learning Research in 2020 and what key trends do you see for 2021?

Last year’s noted main developments and predictions included continued advancements in many research areas, NLP in particular. While there can be debate as to whether 2020’s big NLP advancement was as formidable as some may have originally thought (or continue to think), there is no doubt that there was a continued and intense focus on NLP research in 2020. It should not be difficult to surmise that this continues into 2021 as well.

Topics such as ethics and diversity were taking center stage in 2019, and this past year they stayed there. There seems to have been a transition from thinking of diversity, ethics, and related subjects as peripheral concerns in machine learning to viewing them as core considerations alongside technology. Let’s hope this trend continues into 2021 and beyond.

What did our panel come up with as the main developments of 2020, and what do they see as the most likely key trends for 2021? Our group this year consists of Dan Becker, Pedro Domingos, Ajit Jaokar, Ines Montani, Brandon Rohrer, Dipanjan Sarkar, Rosaria Silipo, Rachael Tatman, and Daniel Tunkelang. More so than in any other year, we thank our contributors for taking time out of their busy schedules in these tumultuous times to share their insights with our readers.

This is the first in a series of 3 such posts over the coming week. While they will be split up into research, technology, and industry, there is considerable and understandable overlap between these disciplines, and as such we recommend you check out all 3 as they are published.



Without further delay, here are the 2020 key trends and 2021 predictions from this year’s group of experts.

Dan Becker (@dan_s_becker) is Founder of Decision AI, and previously founded Kaggle Learn.

ML research followed some established themes this year:

  1. Transformers: GPT-3 received the most attention of any development this year, and it shows the continued evolution of Transformer models trained on huge corpora. We also saw the first successful experiments using transformers for computer vision, which was historically dominated by convolutional networks.
  2. Generative Models: Research like Vid2Player shows computer-generated video at a level of quality beyond what we’ve seen in the past. The social impact of generative models will be huge and hard to predict.
  3. Reinforcement Learning: I saw less attention to RL in 2020 than I saw in the previous couple of years. But the transfer learning across tasks in One Policy To Rule Them All looks hugely promising. I expect this to be less important than GPT-3 over the next couple of years, but likely far more important over a longer time horizon. Most people don’t realize the huge impact RL will have once it works more reliably.


For 2021, these are the trends I expect or hope to see:

  1. Probabilistic Programming and Bayesian Models: We’ve seen a lot of experimentation in new probabilistic programming languages. This reminds me of the experimentation I saw in deep learning frameworks 5 years ago. So I hope probabilistic programming is a key trend in 2021, though it will also require more education for users to take advantage of the new tools.
  2. GPT-4: As more people experiment with GPT-3, I think we’ll find it falls a little short of most practical usefulness… Extrapolating from recent trends, GPT-4 will be much better and will likely cross that threshold of practical usefulness.
  3. GPUs for structured data: The NVIDIA RAPIDS team is creating data science tools that promise a sudden speedup beyond anything we’ve seen in the last decade. My sense is that this software isn’t yet ready for prime time, but that could come in 2021.
  4. AutoML becomes uninteresting: Most data scientists are still tuning parameters through ad hoc experimentation. It’s a matter of time until we all use more automatic solutions, and the time may come in the next year.
  5. Reinforcement learning becomes practically useful: This is what I’m most excited about. Conventional machine learning is focused on prediction, but few data scientists optimize the decision layer that translates those predictions into real-world business decisions. This has resulted in models that are accurate without being useful. We’ll see a mindset shift in 2021 to use models for optimal decision-making in complex environments.

Pedro Domingos (@pmddomingos) is a Professor in the Dept. of Computer Science & Engineering, University of Washington.

To my mind, the main developments in 2020 were the emergence of graph neural networks and neuro-symbolic AI as major research directions. I think in 2021 we’ll see the latter subsume the former: GNNs are a limited form of relational learning, and before long we’ll have neuro-symbolic approaches that accomplish all that GNNs do, and then some. After this, where you turn the dial of representational power for specific applications is mainly the usual matter of overfitting control and scalability. At the high end, how far neuro-symbolic AI gets us toward human-level AI is the trillion-dollar question.

Ajit Jaokar (@AjitJaokar) is the course director of the “Artificial Intelligence: Cloud and Edge implementations” course at the University of Oxford, and is an entrepreneur.

2020 was the year of COVID but also of tech. AI matured through MLOps deployments. The Cloud platforms (e.g., AWS, Azure, GCP) continue to drive innovation in all areas of AI, including AI on Edge devices. I expect to see much more innovation in this space following Nvidia’s announced acquisition of ARM.

In the AI world, the big trend was NLP (GPT-3 and other models). For 2021, the real question is: will few-shot learner models (like GPT-3) change the way models are built? Instead of the traditional sequence of building a model from data reflecting a problem, we could flip it. We can think of just a forward pass with very large models, i.e., Model – Problem – Inference. Of course, we need a massive pre-trained model like GPT-3. If this trend does take off, it will be transformative for AI over the next two years.

In 2021, traditional machine learning models could become a commodity in the sense that everyone would be using some form of basic ML or DL. So, we could shift from data science to decision science. The output of data science is a model with a performance metric (for example, accuracy). With decision science, we could take this further by suggesting actions and executing them. That means algorithms like reinforcement learning could be a part of 2021 and beyond.

Ines Montani (@_inesmontani) is a software developer working on Artificial Intelligence and Natural Language Processing technologies, and the co-founder of Explosion.

2020 has been an extraordinary year, and while we’ve seen many exciting advancements in the field, the most important developments in my opinion have been about consolidation rather than revolution. In previous years, the technology was evolving so quickly that for many companies, it was wise to wait. That calculation has changed now, and there’s a much better understanding of what projects are likely to succeed. Building prototypes and applying machine learning to business problems has never been easier, but what remains challenging is closing the gap between prototyping and shipping successful projects into production. In 2021, we’ll likely keep seeing more focus on the whole lifecycle of a machine learning project: from prototype to production, and from iterative development to ongoing maintenance and monitoring.

Brandon Rohrer (@_brohrer_) is Principal Data Scientist at iRobot and Instructor at End-to-End Machine Learning.

Convolutional and recurrent neural networks are beginning to show they can’t solve every problem as well as we would like. Two papers this year sum up this trend. The Hardware Lottery describes how much serendipity can be involved in which algorithms rise to prominence and become entrenched as an industry standard. And the tour de force paper Underspecification Presents Challenges for Credibility in Modern Machine Learning casts a harsh light on how we have been evaluating models and measuring progress. These are good things. In 2021, if we choose to, we can invest in exploration and in solving new families of problems.

Also, because we’ve been left no choice, we’ve sprung to develop tools and practices for remote instruction, distributed teams, and asynchronous work. The machine learning research environment of 2020 would be unrecognizable to our 2019 selves. In 2021 I predict the quality and quantity of online instruction and collaboration will double.

Dipanjan Sarkar is a Data Science Lead at Applied Materials, a Google Developer Expert in Machine Learning, a published author, and an editor at Towards Data Science.

Based on my prediction last year, 2020 has rightly been the year of NLP, with transformers paving the way to solve tough problems like question answering, search and translation with ease. Explainable AI has also started moving out of the ‘inflated expectations’ Gartner Hype Cycle phase, with a lot of practical implementations being available and used to explain complex models across diverse problems and data.

For 2021, I am sure we are going to see the advent of powerful yet efficient models for both vision and natural language processing. We have already seen progress with efficient transformer models like DistilBERT, Reformer and Performer. Deep learning frameworks like TensorFlow are focusing on ML for mobile and IoT devices with TFLite and TF.js, as edge and on-device computing are in demand.

I also foresee more progress in unsupervised and self-supervised learning within deep learning, with methodologies like SimCLR, SimSiam and SwAV, which have achieved huge success in pre-training models to give better performance during the adaptation phase. Last but not least, low-code automated machine learning platforms and responsible AI are two other areas to watch, as we can definitely expect some interesting advancements there.

Rosaria Silipo (@DMR_Rosaria) is Principal Data Scientist at KNIME.

In this strange year of 2020, given the uncertainty about the future, attention has focused on getting data science solutions ready to work and produce results: safe deployment, application monitoring, and secure solutions. This trend will probably continue in 2021.

Deployment remains the critical phase in a data science project, where all unnoticed mistakes from the previous steps resurface. So, in addition to the classic enterprise features, we are starting to feel the need for productionizing an application from within the training environment to avoid needless mistakes during the transfer.

Some focus in 2021 will also be on the interpretation of the data analysis process, especially in the life sciences via machine learning interpretability (MLI) or eXplainable AI (XAI) techniques for black-box models.

On a side note, I really suspect that if the COVID-isolation persists in many countries around the world, the number of books about machine learning and artificial intelligence will skyrocket.

Rachael Tatman (@rctatman) is Developer Advocate at Rasa working on natural language processing.

I know a lot of folks would probably consider GPT-3 to be a major new development in NLP this year, but I’d consider it a pretty straightforward extension of existing NLP methods on a scale that’s utterly impractical for the vast majority of NLP applications. What I personally find far more exciting is the growing trend of focusing on small, efficient models that still perform well. The first SustaiNLP workshop is a great example of this. From a research perspective, I think finding ways to get really excellent model performance with limited data and compute resources is going to be a huge challenge in the field, but also really rewarding.

Daniel Tunkelang is an independent consultant specializing in search, discovery, and ML/AI.

In 2020, AI has continued to improve incrementally. We’ve seen iterations on transformer-based models for natural language understanding and generation, the most notable being OpenAI’s GPT-3. Autonomous vehicles continue to be almost ready for mainstream use. More broadly, AI has moved from being a buzzword to a critical capability for companies in all industries.

Meanwhile, 2020 has been dominated by the COVID-19 pandemic. While AI has played a part in the fight against the virus, what’s been more interesting is how, because of the pandemic, most people working on or studying machine learning are doing so from home. If the mainstream acceptance of remote work and education persists after the pandemic, which seems likely, then we can look forward to two competing trends. On one hand, AI expertise will become truly global rather than tied to specific hubs. On the other hand, technology powerhouses will recruit talent globally, at the expense of smaller regional firms.

And yet, even as remote work drives the globalization of AI, the growing conflict between the United States and China splinters it. It seems likely that we will spend the next decade in an AI arms race.

P.S. On November 30, the submission deadline for this post, DeepMind researchers announced that their AlphaFold system has solved the Critical Assessment of protein Structure Prediction (CASP) grand challenge by predicting protein folding with revolutionary accuracy and speed. It’s too early to digest this announcement, but this may indeed turn out to be the biggest AI breakthrough of 2020.

