A Brief History of Natural Language Processing (NLP)


In the early 1900s, a Swiss linguistics professor named Ferdinand de Saussure died, and in the process, almost deprived the world of the concept of “Language as a Science.” From 1906 to 1911, Professor Saussure offered three courses at the University of Geneva, where he developed an approach describing languages as “systems.” Within the language, a sound represents a concept – a concept that shifts meaning as the context changes.

He argued that meaning is created inside language, in the relations and contrasts between its parts, and that a shared language system makes communication possible. Saussure viewed society as a system of “shared” social norms that provides conditions for reasonable, “extended” thinking, resulting in decisions and actions by individuals. (The same view can be applied to modern computer languages.)



Saussure died in 1913, but two of his colleagues, Albert Sechehaye and Charles Bally, recognized the importance of his concepts. (Imagine the two, days after Saussure’s death, in Bally’s office, drinking coffee and wondering how to keep his discoveries from being lost forever.) The two took the unusual step of collecting “his notes for a manuscript” along with his students’ notes from the courses. From these, they wrote the Cours de Linguistique Générale, published in 1916. The book laid the foundation for what has come to be called the structuralist approach, starting with linguistics, and later expanding to other fields, including computers.

In 1950, Alan Turing wrote a paper describing a test for a “thinking” machine. He stated that if a machine could take part in a conversation through a teleprinter, and imitate a human so completely that there were no noticeable differences, then the machine could be considered capable of thinking. Shortly after this, in 1952, the Hodgkin-Huxley model described how neurons generate and transmit electrical signals. These events helped inspire the idea of Artificial Intelligence (AI), Natural Language Processing (NLP), and the evolution of computers.

Natural Language Processing

Natural Language Processing (NLP) is an aspect of Artificial Intelligence that helps computers understand, interpret, and utilize human languages. NLP allows computers to communicate with people, using a human language. Natural Language Processing also provides computers with the ability to read text, hear speech, and interpret it. NLP draws from several disciplines, including computational linguistics and computer science, as it attempts to close the gap between human and computer communications.

Generally speaking, NLP breaks down language into shorter, more basic pieces, called tokens (words, periods, etc.), and attempts to understand the relationships of the tokens. This process often uses higher-level NLP features, such as:

  • Content Categorization: A linguistic document summary that includes content alerts, duplication detection, search, and indexing.
  • Topic Discovery and Modeling: Captures the themes and meanings of text collections, and applies advanced analytics to the text.
  • Contextual Extraction: Automatically pulls structured data from text-based sources.
  • Sentiment Analysis: Identifies the general mood, or subjective opinions, stored in large amounts of text. Useful for opinion mining.
  • Text-to-Speech and Speech-to-Text Conversion: Transforms voice commands into text, and vice versa.
  • Document Summarization: Automatically creates a synopsis, condensing large amounts of text.
  • Machine Translation: Automatically translates the text or speech of one language into another.
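The token-splitting step described before the list can be illustrated with a minimal sketch. The `tokenize` function and its regular expression are assumptions made for illustration; production NLP toolkits use far more elaborate rules.

```python
import re


def tokenize(text):
    """Split text into word tokens and single punctuation tokens.

    Hypothetical helper for illustration only.
    """
    # \w+ matches a run of letters, digits, or underscores (a "word");
    # [^\w\s] matches one character that is neither a word character
    # nor whitespace, such as a period or a comma.
    return re.findall(r"\w+|[^\w\s]", text)


tokens = tokenize("NLP breaks language into tokens.")
print(tokens)
```

Each word and the final period come out as separate items, which is the form the higher-level features listed above would typically work from.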

NLP Begins and Stops

Noam Chomsky published his book, Syntactic Structures, in 1957. In it, he revolutionized previous linguistic concepts, concluding that for a computer to understand a language, the sentence structure would have to be changed. With this as his goal, Chomsky created a style of grammar called Phrase-Structure Grammar, which methodically translated natural language sentences into a format usable by computers. (The overall goal was to create a computer capable of imitating the human brain in terms of thinking and communicating, or AI.)

In 1958, John McCarthy released LISP (short for List Processing), a programming language still in use today. In 1964, ELIZA was developed: a “typewritten” comment-and-response program designed to imitate a psychiatrist using reflection techniques. (It did this by rearranging sentences and following relatively simple grammar rules, but there was no understanding on the computer’s part.) Also in 1964, the U.S. National Research Council (NRC) created the Automatic Language Processing Advisory Committee, or ALPAC, for short. This committee was tasked with evaluating the progress of Natural Language Processing research.
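To give a feel for how sentence rearranging can imitate a conversation without any understanding, here is a minimal sketch in the spirit of ELIZA. The rules, the pronoun table, and the function names are all hypothetical and are not taken from the original program's script.

```python
import re

# Hypothetical pronoun swaps used to "reflect" a statement back at the speaker.
REFLECTIONS = {"i": "you", "my": "your", "me": "you", "am": "are"}

# Hypothetical (pattern, response template) rules, tried in order.
RULES = [
    (re.compile(r"i feel (.*)", re.IGNORECASE), "Why do you feel {0}?"),
    (re.compile(r"i am (.*)", re.IGNORECASE), "How long have you been {0}?"),
]


def reflect(fragment):
    """Swap first-person words for second-person ones, word by word."""
    words = fragment.lower().split()
    return " ".join(REFLECTIONS.get(word, word) for word in words)


def respond(statement):
    """Return a canned reply built by rearranging the user's own words."""
    cleaned = statement.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.match(cleaned)
        if match:
            return template.format(reflect(match.group(1)))
    # No rule matched: fall back to a generic prompt.
    return "Please tell me more."


print(respond("I feel sad about my job."))
```

The reply is assembled purely from pattern matching and word substitution, which is why such a program can seem attentive while understanding nothing.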

In 1966, the NRC and ALPAC initiated the first AI and NLP stoppage, by halting the funding of research on Natural Language Processing and machine translation. After twelve years of research, and $20 million, machine translations were still more expensive than manual human translations, and there were still no computers that came anywhere near being able to carry on a basic conversation. In 1966, Artificial Intelligence and Natural Language Processing (NLP) research was considered a dead end by many (though not all).

Return of the NLP

It took nearly fourteen years (until 1980) for Natural Language Processing and Artificial Intelligence research to recover from the broken expectations created by extreme enthusiasts. In some ways, the AI stoppage had initiated a new phase of fresh ideas, with earlier concepts of machine translation being abandoned, and new ideas promoting new research, including expert systems. The mixing of linguistics and statistics, which had been popular in early NLP research, was replaced with a theme of pure statistics. The 1980s initiated a fundamental reorientation, with simple approximations replacing deep analysis, and the evaluation process becoming more rigorous.

Until the 1980s, the majority of NLP systems used complex, “handwritten” rules. But in the late 1980s, a revolution in NLP came about. This was the result of both the steady increase of computational power, and the shift to Machine Learning algorithms. While some of the early Machine Learning algorithms (decision trees provide a good example) produced systems similar to the old school handwritten rules, research has increasingly focused on statistical models. These statistical models are capable of making soft, probabilistic decisions. Throughout the 1980s, IBM was responsible for the development of several successful, complicated statistical models.

In the 1990s, the popularity of statistical models for Natural Language Processing rose dramatically. Pure statistical NLP methods have become remarkably valuable in keeping pace with the tremendous flow of online text. N-grams have become useful for recognizing and numerically tracking clumps of linguistic data. In 1997, LSTM recurrent neural net (RNN) models were introduced, and found their niche in 2007 for voice and text processing. Currently, neural net models are considered the cutting edge of research and development in NLP for understanding text and generating speech.
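As a rough illustration of tracking clumps of words numerically, the sketch below counts bigrams (n-grams with n = 2) in a toy sentence and estimates how often one word follows another. The sample text and the helper names `ngrams` and `bigram_probability` are assumptions made for this example.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bigram_probability(bigram_counts, first, second):
    """Estimate P(second | first) from raw bigram counts."""
    # Total number of bigrams that start with `first`.
    total = sum(count for (w1, _), count in bigram_counts.items() if w1 == first)
    if total == 0:
        return 0.0
    return bigram_counts[(first, second)] / total


# Toy corpus, made up for illustration.
tokens = "the cat sat on the mat and the cat slept".split()
bigram_counts = Counter(ngrams(tokens, 2))

print(bigram_counts[("the", "cat")])
print(bigram_probability(bigram_counts, "the", "cat"))
```

In this toy text "the" is followed by "cat" in two of its three occurrences, so the estimate is about 0.67. This kind of soft, probabilistic judgment is what statistical models make at much larger scale.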

After the Year 2000

In 2001, Yoshua Bengio and his team proposed the first neural “language” model, using a feed-forward neural network. A feed-forward neural network is an artificial neural network whose connections do not form a cycle. In this type of network, the data moves in only one direction, from the input nodes, through any hidden nodes, and then on to the output nodes. Having no cycles or loops, it is quite different from a recurrent neural network.
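The one-directional flow of data can be sketched in a few lines of plain Python. This is a generic forward pass with one hidden layer, not the language model from the 2001 work; the layer sizes, weights, and function names are made up for illustration.

```python
import math


def dense(inputs, weights, biases):
    """One fully connected layer: each output is a weighted sum plus a bias."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


def sigmoid(values):
    """Squash each value into the range (0, 1)."""
    return [1.0 / (1.0 + math.exp(-v)) for v in values]


def feed_forward(inputs, hidden_w, hidden_b, output_w, output_b):
    """Input nodes -> hidden nodes -> output nodes, with no loops back."""
    hidden = sigmoid(dense(inputs, hidden_w, hidden_b))
    return dense(hidden, output_w, output_b)


# Hypothetical weights: 2 inputs, 2 hidden nodes, 1 output node.
hidden_w = [[0.5, -0.5], [0.25, 0.75]]
hidden_b = [0.0, 0.0]
output_w = [[1.0, 1.0]]
output_b = [0.0]

print(feed_forward([1.0, 2.0], hidden_w, hidden_b, output_w, output_b))
```

Nothing computed in a later layer is ever fed back into an earlier one. A recurrent network, by contrast, would pass some of its state back in at the next step.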

In the year 2011, Apple’s Siri became known as one of the world’s first successful NLP/AI assistants to be used by general consumers. Within Siri, the Automated Speech Recognition module translates the owner’s words into digitally interpreted concepts. The Voice-Command system then matches those concepts to predefined commands, initiating specific actions. For example, if Siri asks, “Do you want to hear your balance?” it would understand a “Yes” or “No” response, and act accordingly.

By using Machine Learning techniques, the owner’s speaking pattern doesn’t have to match exactly with predefined expressions. The sounds just have to be reasonably close for an NLP system to translate the meaning correctly. By using a feedback loop, NLP engines can significantly improve the accuracy of their translations, and increase the system’s vocabulary. A well-trained system would understand the words “Where can I get help with Big Data?”, “Where can I find an expert in Big Data?”, or “I need help with Big Data,” and provide the appropriate response.
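The idea that an utterance only has to be reasonably close to a predefined expression can be illustrated with a very simple stand-in. The sketch below uses plain word overlap (Jaccard similarity) rather than a trained Machine Learning model, and the command table, threshold, and function names are all hypothetical.

```python
import re

# Hypothetical predefined expressions, keyed by the command they trigger.
COMMANDS = {
    "big_data_help": "where can i find an expert in big data",
    "play_music": "play some music",
}


def words(text):
    """Lowercase the text and return its set of distinct words."""
    return set(re.findall(r"\w+", text.lower()))


def similarity(a, b):
    """Jaccard similarity: shared words divided by all distinct words."""
    wa, wb = words(a), words(b)
    if not wa or not wb:
        return 0.0
    return len(wa & wb) / len(wa | wb)


def match_command(utterance, threshold=0.2):
    """Return the closest command name, or None if nothing is close enough."""
    best = max(COMMANDS, key=lambda name: similarity(utterance, COMMANDS[name]))
    if similarity(utterance, COMMANDS[best]) >= threshold:
        return best
    return None


print(match_command("Where can I get help with Big Data?"))
print(match_command("I need help with Big Data"))
print(match_command("What time is it?"))
```

The first two phrasings share enough words with the stored expression to map to the same command, while the unrelated question falls below the threshold. A real assistant would learn such matches from data instead of relying on a fixed word-overlap rule.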

The combination of a dialog manager with NLP makes it possible to develop a system capable of holding a conversation, and sounding human-like, with back-and-forth questions, prompts, and answers. Our modern AIs, however, are still not able to pass Alan Turing’s test, and currently do not sound like real human beings. (Not yet, anyway.)

