By David Talby.
2020 has been a year of massive growth for applied natural language processing (NLP). Even in the wake of COVID-19 and stunted IT budgets, a recent study showed that NLP spending increased 10-30 percent across industries, company sizes, and geographies (Gradient Flow). NLP tools can be implemented in various forms, in verticals from retail to finance to healthcare, and have the power to improve patient and customer experiences, reduce the need for human input, and even help save lives.
Take eCommerce, for example. NLP is widely used to chat with customers, gauge sentiment and topics of interest in conversations, answer questions, and filter fake and toxic content. In finance, NLP serves as the eyes and ears of FinTech organizations, as algorithms both read and write financial news, from SEC filings to tweets. In healthcare, powerful NLP techniques such as named entity recognition (NER) and text classification are enabling data scientists to detect and prevent adverse drug events, a huge burden on patients and the healthcare system.
With these use cases and many more, it’s clear that 2021 is poised for even greater strides in NLP, and there are several trends driving this uptick in adoption and spending. While there are many contributing factors, below are four of the key trends that will have a major impact on the NLP industry over the next year.
1. State-of-the-Art Models Are Reduced to a Single Line of Code
Democratizing natural language processing is a surefire way to guarantee continued growth in the field, enabling practitioners of all skill levels to start realizing its benefits. Fortunately, running many of the most accurate and complex deep learning models in history has been reduced to a single line of Python code, which significantly lowers the barrier to entry: even people who know nothing about NLP can get started. And if you’re a data scientist who already knows how to train models, this ease of use enables a level of automation that frees up time for more complex undertakings.
A formal education in the field and hands-on experience with the core NLP, deep learning, and transfer learning libraries used to be necessary to apply the technology in practice. Take sentiment analysis, for example: Inferring that “a beautiful day” is a positive statement was something you needed a data scientist to train a model for, while today, full emotion analysis (i.e., being able to tell joy, fear, surprise, and sadness apart) is available out of the box in many languages. Many entry barriers are things of the past, and as the technology improves, it will become accessible to even more people.
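As a concrete illustration of the “single line of code” claim, here is a minimal sketch using the open-source Hugging Face transformers library, one popular option for this kind of one-liner (Spark NLP offers similar pretrained pipelines). The `pipeline()` call downloads a default pretrained sentiment model on first use; the specific model it resolves to is the library’s choice, not something prescribed by the article.

```python
# A minimal sketch of out-of-the-box sentiment analysis with a pretrained
# model, via the Hugging Face transformers library.
from transformers import pipeline

# The one line that loads a ready-to-use pretrained sentiment model.
classifier = pipeline("sentiment-analysis")

# Run the article's example sentence through it.
result = classifier("a beautiful day")
print(result)  # a list with one dict containing a 'label' and a 'score'
```

No training, labeled data, or model selection is required to get a first result, which is exactly the lowered barrier to entry described above.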
2. Auto NLP: Who Needs a Data Scientist Anyway?
When it comes to code, what’s better than a one-liner like the Python example mentioned above? How about no code at all? While you still have to train your own models to understand domain-specific text, Auto-NLP is coming fast on the heels of the Auto-ML trend to make this possible. As the name implies, automation opens machine learning to everyone rather than restricting the technology to data scientists and software engineers.
Not only does Auto-NLP serve to close a skills gap, but it’s performing surprisingly well. In fact, new research investigating state-of-the-art Auto-ML frameworks found that such tools perform on par with, or better than, their human counterparts. While Auto-ML and Auto-NLP tools cannot serve as standalone solutions just yet, they can complement the skills of data scientists, reducing time spent on mundane tasks while helping novices get acquainted with the technology.
3. Better Model Discovery, Search, and Curation
While putting models at the fingertips of eager users is great, the larger the selection becomes, the harder it is to find the one you should actually use for your next project. Just think of how the number of publicly available NLP models from the TensorFlow, PyTorch, and Hugging Face communities has exploded over the last few years. Hugging Face, for example, enables anyone to upload models for free and now offers more than 3,000 models to choose from. That makes it exceedingly difficult to find the one that best meets your needs.
Model hubs are quickly improving to help users get this right, with better faceted search, curated “most popular” and “top-rated” suggestions, and smarter ranking of search results. The Spark NLP model hub takes a different approach: it limits community uploads but officially supports all published models as part of the library. This means that models and pipelines for each NLP task are regularly updated and replaced when a better state-of-the-art algorithm, model, or embedding becomes available, and that licensed customers can depend on enterprise support if they encounter issues.
4. NLP Goes Multilingual
Historically, the highest quality NLP software was built for English and then for Mandarin Chinese. Now, companies like Google and Facebook are publishing pre-trained embeddings for 150-plus languages as free and open source. NLP libraries are following suit, too. Take Spark NLP, for example, which now offers models in 46 languages. This level of multilingual support was unheard of just a few years ago, so this is a huge step for inclusion and diversity, putting NLP in the hands of data scientists all over the globe.
According to the aforementioned NLP survey, technical leaders cited language support as one of the biggest challenges with the technology. Thanks to recent advances like language-agnostic sentence embeddings, zero-shot learning, and the public availability of multilingual embeddings, open source libraries that support dozens of languages are becoming the norm for the first time.
These trends all have one thing in common: They are democratizing NLP. With more accurate software becoming easier to apply, better tools for finding and using the best models, and widespread access to the technology, 2021 will be another year of significant growth for natural language technology.