“Think about just how interconnected the world is now,” said Matt Gould, the co-founder of Arria NLG, a prominent enterprise in the development and deployment of Natural Language Generation (NLG) technologies worldwide. “Think about it from just a personal context. How much data are you generating personally every day?” Modern, connected humans interact constantly online with computers, mobile phones, and many other devices. They pay bills, watch movies, purchase products, interact with medical professionals, use fitness apps, listen to music, and work online.
“That whole drifting miasma of invisible data is spilling off you constantly and consistently, and it’s happening for at least half the world’s population now,” said Gould during a recent DATAVERSITY® phone interview. “Now, the big challenge with that huge generation of data is its synthesis.” Such synthesis means deriving understanding from the data in a comprehensible, easily digestible form. Scale the enormous amount of data coming off individuals up across organizations, and you reach the volumes of Big Data currently being captured and stored in global enterprises.
At this point in the history of Data Management, the actual analytics of such vast quantities of data is still quite immature. The Business Intelligence (BI) industry has created a multitude of tools that range from simple descriptive analysis to considerably more advanced prescriptive analysis, drawing on developments such as Machine Learning, Data Science, and Artificial Intelligence. Such tools can effectively – as long as all the proverbial data ducks are in a row – help aggregate, organize, analyze, and display data in various forms such as visualizations, graphs, dashboards, and the like. But, as discussed in a white paper by Dr. Robert Dale, the Chief Strategy Scientist at Arria NLG, “[C]urrent business intelligence tools, information visualization applications and dashboards only go so far.” It still takes an expert, or in many cases a meeting of experts, to interpret the data in comprehensible ways for people to understand – no matter whether they are marketing analysts, IT specialists, C-level executives, researchers, or consumers. Someone has to sit down, write out an explanation of the data, and present it in a clear format… or put simply, in human language. Gould noted:
“Nothing currently can give personalized communication directly to you in language. The best they can do is prepared statements that someone has thought about before. That’s the crisis, that’s the problem. We’ve created this huge global machine, this global mind that knows stuff. It knows so much stuff — Big Data stuff — but it can’t tell us, it can’t speak to us. It can’t talk unless we pre-program it with answers to questions it thinks we’re going to ask it, so we are changing that. Our Natural Language Platform can turn that data, all that stuff that the Internet knows about you and your business, and provide it to you.”
What is Natural Language Generation?
One way to answer such a question is to discuss what it is not. According to Gould, “[I]t’s not just templates. It’s a rich narrative, generated from scratch.” He discussed the Google Translate system, where a user puts in a phrase in one language and out comes an equivalent phrase in another. “It’s amazing,” he said. “And it’s pretty accurate.” The Google system is essentially trying to simulate what the mind does at a very basic level. A user types in a word, phrase, or sentence and the system compares it, looks for nuances in context as best it can, and gives an answer. “But at no point did the system actually know or understand what is being asked of it,” said Gould. “It didn’t need to. It just matched it. That’s what the NLG system does not do… What it does is what your mind does. It starts with a process of data.”
That Arria NLG data process begins with more than thirty years of research and experience from some of the best global scientists in computational linguistics and heuristics, “a whole lot of algorithms,” and numerous technologies and patents within a purpose-built, multilevel NLG engine. These include explicit reasoning, data mining, pattern recognition, space-time analytics, criticality assessment, document planning, sentence aggregation, lexical choice, referring expression generation, linguistic realization, and many others, which, according to Gould, together turn all that data into written or spoken language, much as the human brain does:
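Several of the stage names above – document planning, sentence aggregation, lexical choice, linguistic realization – come from the classic data-to-text pipeline described in the NLG research literature. The following is a minimal, illustrative sketch of how such a pipeline can flow from raw data to a sentence; it is not Arria’s engine, and all data fields, thresholds, and function names here are invented for illustration.

```python
# Illustrative sketch of a classic data-to-text NLG pipeline:
# document planning -> microplanning -> realization.
# All names and thresholds are hypothetical, not Arria's implementation.

raw_data = {"region": "North", "sales": 1.42e6, "prior": 1.25e6}

def document_plan(data):
    """Decide WHAT to say: select and order the messages worth reporting."""
    change = (data["sales"] - data["prior"]) / data["prior"]
    messages = [("sales_total", data["region"], data["sales"])]
    if abs(change) > 0.05:  # criticality assessment: report only notable changes
        messages.append(("sales_change", data["region"], change))
    return messages

def microplan(messages):
    """Decide HOW to say it: lexical choice for each message."""
    clauses = []
    for kind, region, value in messages:
        if kind == "sales_total":
            clauses.append(f"{region} region sales reached ${value / 1e6:.2f}M")
        elif kind == "sales_change":
            verb = "rose" if value > 0 else "fell"  # lexical choice
            clauses.append(f"{verb} {abs(value):.0%} over the prior period")
    return clauses

def realize(clauses):
    """Produce the surface text: aggregate clauses into one fluent sentence."""
    return " and ".join(clauses) + "."

report = realize(microplan(document_plan(raw_data)))
print(report)
```

The point of the three-stage split is that each decision – what to mention, which words to use, how to combine clauses – is made from the data itself rather than baked into a fixed template.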
“It’s revolutionary in the simplest and most fundamental way. It’s not a self-driving car, it’s not a space robot, it’s not a new kind of battery. It’s so fundamental and natural. We’re giving the Internet a voice, where the Internet can speak to you, generate its own language in real time and not rely on people writing things and storing things for it to reproduce at the right time.”
In the interview, Gould gave many use cases. The technical overview, white papers, and case studies on the Arria NLG site give many more. At a personal/consumer level, one such example is a person out mountain biking who stops in a field many miles outside of town, wondering about the weather. In today’s world, they can get a report written by a meteorologist many hours ago (if not a couple of days ago) about a town that may or may not be near the field they are standing in. It might show wind direction (in that particular town), an estimated temperature based on averages and time of day, etc. “It will either be too old or it will be simplistic, and it will not be relevant to you,” remarked Gould. The full application of Natural Language Generation by Arria will allow that person to get an individual, written or spoken account of the weather in that field, based on the fact that they are likely mountain biking (since that is what they often do while out in the country). They may learn that it’s forty-five minutes until sunset, and there are headwinds going back towards where they live (among thousands of other personal details the Internet already knows about all of us that can be included in the individualized report). That is only a single example of the personal applications of NLG.
Actionable Analytics in Human Terms
The business narrative coming out of many cubicles, board rooms, and various staff members puzzling over dashboards and spreadsheets is that there is too much data and it’s too hard to compile it into meaningful information – data is useless in and of itself. It must become an information asset before real insight is gained. According to Gould:
“Look at what happens when you are a large company and you’ve spent millions, literally millions on CRM systems and ERP systems, stock control systems, and point-of-sale systems, to make it as efficient as possible to manage your business. You’ve integrated all these systems, or you hope you have anyway, and that has cost you a lot. Now the system is reporting to you on the health of the entire system. It knows what is going on; it knows what is happening from the factory floor to the supermarket shelf or Internet web page.”
How does that system, or conglomeration of systems, then communicate back to the humans who have to derive actionable insights from it? “If you want to know the health of the company right now,” remarked Gould, “you have to knock on the wall of that big virtual data center and ask it.” But it then gives back a range of reports, visualizations, spreadsheets, and other documents that might or might not be useful. “You then have to have a CFO, or Chief Marketing Officer, or stock guy, or IT specialist, or whomever explain what it all means.” And days, weeks, or months may pass without any actionable insights being gained from all this data, as more and more is collected ad infinitum.
The Arria NLG engine allows an enterprise to analyze all that information and present it back to whatever human needs it, for whatever reason, in actual written or natural language:
“This isn’t mail merge or document assembly. It isn’t filling slots in templates. It’s about taking the data source and using domain knowledge to massage and aggregate that data, identifying packets of information that can be expressed linguistically, then using rich knowledge of language to work out how best to express that information in text or voice.”
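One of the techniques the quote above names – aggregation of “packets of information” before expressing them linguistically – can be shown concretely. The sketch below demonstrates simple sentence aggregation, merging messages that share a predicate into one fluent sentence instead of emitting one stilted template sentence per message; the message format and function are hypothetical.

```python
# Illustrative sketch of sentence aggregation: messages sharing a verb
# phrase are merged into one sentence rather than filled into fixed
# template slots. The message format here is invented for illustration.

def aggregate(messages):
    """Group (subject, verb-phrase) messages by verb and merge subjects."""
    by_verb = {}
    for subject, verb in messages:
        by_verb.setdefault(verb, []).append(subject)
    sentences = []
    for verb, subjects in by_verb.items():
        if len(subjects) == 1:
            subj = subjects[0]
        else:  # coordinate shared subjects: "sales and profits"
            subj = ", ".join(subjects[:-1]) + " and " + subjects[-1]
        sentences.append(subj[0].upper() + subj[1:] + " " + verb + ".")
    return " ".join(sentences)

messages = [("sales", "rose in Q3"),
            ("profits", "rose in Q3"),
            ("costs", "fell in Q3")]
print(aggregate(messages))
```

Without the aggregation step, the same three messages would come out as three repetitive sentences – exactly the flat, template-like output the quote distinguishes real NLG from.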
It literally takes an organization’s data and transforms it into language – not standard computer-generated text that is overly technical and difficult to read, but natural human language that reads as if a literate, well-educated person wrote it. Humans and our forebears have been communicating in natural language for around 100,000 years. It’s through the technologies embedded in a multilevel Natural Language Generation platform like Arria’s that computers can now present actionable intelligence gained from all an enterprise’s data assets and give that business real insight in the same natural way. Gould put it this way:
“We’ve taken the CFO, the CEO, the data analyst, and we’ve spent time with them. We’ve looked at how they write. We’ve looked at how their minds respond to the data, and we’ve reverse-engineered their writing — their minds — into our NLG platform, and now it’s acting as they do in response to the data.”