The State of Natural Language Processing: 5 Trends Shaping the Industry

By Ben Lorica.

Natural Language Processing (NLP) has been on the rise for several years, and for good reason. Because it can help identify new variants of COVID-19, improve customer service, and significantly refine search capabilities, its use cases are expanding as the technology proliferates. While some verticals have adopted NLP faster than others, new global research shows that budgets are growing across industries, geographies, company sizes, and levels of expertise.

Now in its second year, John Snow Labs and Gradient Flow’s NLP Industry Survey shows that a majority of technologists have increased their NLP investments, with jumps ranging from at least 10% to nearly double. But even as budgets rise, NLP is not devoid of challenges, and barriers to entry remain – especially for those early in their artificial intelligence (AI) journeys.

In order to get ahead of these roadblocks, it’s important to understand the trends driving NLP adoption in the enterprise. Here are five that technologists should keep in mind.

1. Popular Use Cases Remain Practical – and Expanding

For the second year, named entity recognition (NER) and document classification were cited as the primary use cases for NLP among tech leaders. Companies further along the NLP adoption curve tend to use NER at a higher rate than less mature companies. This suggests that NER is a building block for NLP and a good place for organizations to get started. As the technology becomes more sophisticated, we can expect growth in question answering (Q&A) and natural language generation use cases powered by large language prediction models and related open-source alternatives.
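To make the idea of NER as a building block concrete, here is a minimal sketch of what entity tagging produces: spans of text paired with labels. It is a toy dictionary lookup, not how production NER models work (those are learned from annotated data), and the `GAZETTEER` table, the `tag_entities` helper, and the sample sentence are all hypothetical names chosen for illustration.

```python
import re

# Hypothetical gazetteer: a tiny lookup of known entity strings and labels.
# Real NER systems learn to recognize entities from annotated data;
# this dictionary exists only to illustrate the shape of the output.
GAZETTEER = {
    "John Snow Labs": "ORG",
    "Gradient Flow": "ORG",
    "Ben Lorica": "PERSON",
}


def tag_entities(text):
    """Return (start, end, surface, label) tuples for gazetteer matches, in text order."""
    spans = []
    for surface, label in GAZETTEER.items():
        # Match the entry as a whole phrase, not as part of a longer word.
        pattern = r"\b" + re.escape(surface) + r"\b"
        for match in re.finditer(pattern, text):
            spans.append((match.start(), match.end(), surface, label))
    # Sorting the tuples orders them by start offset first.
    return sorted(spans)


sentence = "Ben Lorica discussed the survey from John Snow Labs and Gradient Flow."
for start, end, surface, label in tag_entities(sentence):
    print(f"{surface!r} [{start}:{end}] -> {label}")
```

Downstream tasks such as document classification or Q&A can then consume these labeled spans, which is one reason NER tends to be an early step in an NLP project.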

2. NLP Data Sources Remain Consistent

Text fields in databases, files, and online content are the main data sources powering NLP. Although files like PDFs are cited as one of the main sources, there are data quality issues inherent in extracting text from this type of document. Deep learning models have made advances, but it can still be more cost-effective to scan a PDF and apply optical character recognition (OCR) – treating the document more like an image – to extract its text before using an NLP library. It’s encouraging to see these advances, as it’s likely that data sources will remain the same in years to come.

3. Accuracy Is a Top Priority – and Challenge

As with extracting text from a PDF, accuracy poses big challenges for NLP practitioners. In fact, 40% of survey respondents agreed accuracy was a top priority when evaluating an NLP solution. In many cases, achieving it requires not only a data scientist but also a domain expert. Here’s why: models need to be tuned and customized for their specific purpose. That is why a model trained on patient data in a health care setting will not perform the same in retail. What’s more, because NLP projects involve pipelines, where the results from a previous task are used downstream, accuracy is extremely important from the get-go. It’s an ongoing process that requires constant monitoring and tweaking.
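A short calculation shows why accuracy matters so early in a pipeline. The sketch below makes a simplifying assumption that is not from the survey: each stage's errors are independent, so the chance that every stage gets an item right is the product of the per-stage accuracies. The stage names and the accuracy figures are hypothetical.

```python
from math import prod


def end_to_end_accuracy(stage_accuracies):
    """Probability that every stage is correct, assuming independent errors."""
    return prod(stage_accuracies)


# Hypothetical per-stage accuracies for an illustrative three-stage pipeline
# (for example: text extraction, entity recognition, classification).
stages = [0.95, 0.90, 0.85]
print(f"End-to-end accuracy: {end_to_end_accuracy(stages):.3f}")
```

Under these assumptions, three stages that each look reasonably accurate on their own combine to roughly 0.73 end to end, which is why errors introduced in the first step are costly for everything downstream.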

4. Cloud NLP Solutions Are Widely Used

Most technologists using NLP rely on a cloud service, either exclusively or supplemented with another solution. In fact, 83% of survey respondents stated that they used at least one of the following NLP cloud services: AWS Comprehend, Azure Text Analytics, Google Cloud Natural Language AI, or IBM Watson NLU. Despite the popularity of these services, difficulty tuning models and cost were two top challenges mentioned by tech leaders. And as mentioned before, models need to be tuned regularly to maintain accuracy.

5. NLP Libraries Are Gaining Steam

The now expansive ecosystem of available tools and libraries makes this a great time to get started with NLP. Many of these libraries can be used together, and most NLP developers take this approach for their projects. Not only are there many open-source libraries with active communities to choose from, but new models and improvements are being integrated at a rapid pace. A third of all respondents reported using Spark NLP, making it the most popular NLP library, while more than half reported using at least one of the following NLP libraries popular within the Python ecosystem – Hugging Face, spaCy, NLTK, Gensim, or Flair. As with cloud services, there are many options for users to choose from, and often, a combination of NLP tools is the best solution.

With NLP investments continuing to trend upward, it will be interesting to see how the technology matures post-pandemic. But despite its many benefits and expanding use cases, it’s important to keep its shortcomings in mind when implementing the technology and to make sure it thrives over time.
