
New On The Speech Recognition Scene: Droids With NLP Processors And More

By Jennifer Zaino  /  July 24, 2013

There are new Motorola Droid devices in town: the three Verizon Android 4.2 smartphones unveiled at a press event yesterday are the Motorola Droid Mini, Ultra, and Maxx. The line includes what the company touts as the longest-lasting 4G LTE smartphone in the Maxx, which it claims runs 48 hours on a single charge, and what it says is the thinnest 4G LTE smartphone around in the Ultra. All three reportedly come with a unique Kevlar fiber 3D unibody design and a few months’ free Google Music All Access subscription, too. But what will catch the eyes of readers of this blog is the proprietary Motorola X8 Mobile Computing System behind the sleek-looking handsets.

In addition to the graphics and application processor cores found within the eight-core System are two new low-power cores, one to power contextual computing and one aimed at natural language processing. The natural language processor, which includes speech recognition technology, audio sensors, and noise cancellation, kicks in to handle voice-related requests for the phones’ Touchless Control feature. According to this report, the low-power specialized processor enables always-on voice interaction without sacrificing battery life. It works in response to “OK Google Now”, as shown here in a video of the announcement demo in which Google’s intelligent personal assistant service is called on to call a (pretend-)lost Droid. According to Motorola, which was bought by Google last year, the X8 natural language processor responds to a user speaking the “OK Google Now” phrase whenever news, calendar, or other information is needed, whether the device is on or off and whether or not the user is holding the phone.
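Motorola hasn’t published how the X8’s always-on detection works internally, but the triggering logic it describes can be modeled simply: keep a short rolling buffer of recognized words and fire when the buffer matches the wake phrase. Here’s a minimal, purely illustrative sketch; the class and names are hypothetical, not anything from Motorola’s firmware:

```python
# Hypothetical sketch of wake-phrase triggering logic (not Motorola's code).
# A rolling buffer holds the last N recognized words; when it equals the
# wake phrase, the detector fires.
from collections import deque

WAKE_PHRASE = ("ok", "google", "now")

class WakeWordDetector:
    def __init__(self, phrase=WAKE_PHRASE):
        self.phrase = tuple(w.lower() for w in phrase)
        self.buffer = deque(maxlen=len(self.phrase))

    def feed(self, word):
        """Feed one recognized word; return True when the wake phrase completes."""
        self.buffer.append(word.lower().strip(".,!?"))
        return tuple(self.buffer) == self.phrase

detector = WakeWordDetector()
stream = ["where", "is", "it", "OK", "Google", "Now", "find", "my", "phone"]
triggered_at = [i for i, w in enumerate(stream) if detector.feed(w)]
print(triggered_at)  # the wake phrase completes at index 5
```

The real work, of course, is recognizing the words at all; the point of the X8’s dedicated low-power core is that this listening loop can run continuously without draining the main application processor or the battery.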

The trend clearly is underway: speech recognition/natural language interfaces are becoming a de facto part of the smartphone experience, married to artificial intelligence, Knowledge Graphs, and predictive search engines. Without getting into the gory details of the lawsuits underway among Apple, Samsung, and Google over who’s infringing on whose voice patents, Apple and Siri got the ball rolling in a big way, with Samsung and its S-Voice, Android handsets with Google Now, BlackBerry OS 10 devices with their voice recognition and control systems, and Windows Phone’s Speech adding to the mix.

Microsoft, in fact, said last month that it had made voice recognition on Windows Phones 15 percent more accurate and twice as fast. The Bing speech team worked with Microsoft Research to use Deep Neural Network technology, a computational framework for automatic pattern recognition, to achieve the results for Bing Voice Search. “By coupling MSR’s major research breakthroughs in the use of DNNs with the large datasets provided by Bing’s massive index, the DNNs were able to learn more quickly and help Bing voice capabilities get noticeably closer to the way humans recognize speech,” the Bing speech team wrote in a blog post.
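To make the DNN idea concrete: at inference time, a deep-network acoustic model is essentially stacked matrix multiplies with nonlinearities that map a frame of acoustic features to a probability per sound class. The toy sketch below is purely illustrative, with made-up sizes and random weights; it is in no way Bing’s model, just the shape of the computation:

```python
# Illustrative forward pass of a tiny feedforward network (not Bing's DNN).
# Maps a 4-value acoustic feature frame through one hidden layer to
# softmax probabilities over 3 toy phoneme classes.
import math
import random

random.seed(0)

def layer(inputs, weights, biases, activation):
    """One fully connected layer: weighted sum per unit, then activation."""
    return [
        activation(sum(i * w for i, w in zip(inputs, w_row)) + b)
        for w_row, b in zip(weights, biases)
    ]

def relu(z):
    return max(0.0, z)

def softmax(zs):
    m = max(zs)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

# Random toy weights: 4 features -> 5 hidden units -> 3 classes.
w1 = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(5)]
b1 = [0.0] * 5
w2 = [[random.uniform(-1, 1) for _ in range(5)] for _ in range(3)]
b2 = [0.0] * 3

features = [0.2, -0.5, 1.0, 0.3]  # stand-in for one frame of acoustic features
hidden = layer(features, w1, b1, relu)
scores = softmax(layer(hidden, w2, b2, lambda z: z))
print(scores)  # three probabilities summing to 1
```

The “deep” part is simply more hidden layers than the single one here; what MSR’s breakthroughs and Bing’s large datasets buy is the training of good weights, which is where the accuracy and speed gains come from.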

And things in the speech space continue to get even more interesting. Also this week, news came that Intel is developing an accelerator to improve voice recognition from Nuance, maker of the Dragon voice recognition technology that powers as many as 6 billion connected devices.

Some vendors even are trying to move the needle from understanding what users are saying to how they’re saying it, whether that message comes across mobile or other devices. BeyondVerbal, for example, yesterday announced that it has received an additional $1 million from Israel-based startup investment fund Winnovation for research and development and business development, adding to the $2.8 million the Israeli startup received earlier to launch its business. Its emotion analytics technology and artificial intelligence, according to the company, “can extract, decode, and measure a full spectrum of human emotions from a person’s raw voice via a set of emotional detection engines that allow devices and applications to understand an individual’s mood, attitude, and decision-making characteristics as they speak.”
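BeyondVerbal’s engines are proprietary, but emotion analytics of this kind typically starts from low-level prosodic features of the raw waveform — how loud and how rapidly varying the voice is, rather than which words are spoken. Two classic examples are short-time energy and zero-crossing rate, sketched here on a synthetic tone standing in for a voice frame (this is illustrative only, not BeyondVerbal’s method):

```python
# Two classic low-level voice features used as inputs to higher-level
# analysis: short-time energy (loudness) and zero-crossing rate
# (a rough correlate of pitch/noisiness).
import math

def short_time_energy(frame):
    """Mean squared amplitude of one frame of samples."""
    return sum(s * s for s in frame) / len(frame)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs where the signal changes sign."""
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)

# Synthetic 100 Hz sine sampled at 8 kHz, standing in for a 50 ms voice frame.
sr = 8000
frame = [math.sin(2 * math.pi * 100 * n / sr) for n in range(400)]

energy = short_time_energy(frame)
zcr = zero_crossing_rate(frame)
print(round(energy, 3), round(zcr, 4))
```

A real system would compute features like these (plus pitch contours, spectral shape, and timing) over many frames and feed them to trained classifiers — the “emotional detection engines” in BeyondVerbal’s description.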

About the author

Jennifer Zaino is a New York-based freelance writer specializing in business and technology journalism. She has been an executive editor at leading technology publications, including InformationWeek, where she spearheaded an award-winning news section, and Network Computing, where she helped develop online content strategies including review exclusives and analyst reports. Her freelance credentials include being a regular contributor of original content to The Semantic Web Blog; acting as a contributing writer to RFID Journal; and serving as executive editor at the Smart Architect Smart Enterprise Exchange group. Her work also has appeared in publications and on web sites including EdTech (K-12 and Higher Ed), Ingram Micro Channel Advisor, The CMO Site, and Federal Computer Week.
