by Angela Guess
A new press release reports, “Baidu, Inc., China’s leading search engine, announced today it is making available Chinese language APIs for its four key speech technologies: Long Utterance Speech Recognition, Far-Field Speech Recognition, Expressive Speech Synthesis and Wake Word. The announcement coincides with the three-year anniversary of Baidu’s speech API launch. The speech APIs are the latest in a series of AI-based technologies Baidu has released publicly; these include facial recognition, optical character recognition, natural language processing and others. In September, the company also open sourced its deep learning framework PaddlePaddle, an easy-to-use platform allowing developers to apply deep learning to their products and services. ‘We are at the dawn of the AI era. By opening our AI technologies, we will make it easier for everyone to create AI-enabled applications,’ says Andrew Ng, chief scientist of Baidu.”
The release goes on, “The newly released speech technologies are being used in a range of products and services from Baidu and its partners. Long Utterance Speech Recognition enables products to automatically transcribe long audio clips such as interviews, speeches and lectures. Far-Field Speech Recognition enables the recognition of speech from audio sources that are up to 16 feet away, such as voice controlled televisions. Baidu’s deep learning-based Expressive Speech Synthesis provides a collection of realistic voices, differing in tone and accents, that can be used for devices to read audio books or news aloud – a service already available in Baidu’s products to enhance users’ experiences. With Wake Word technology (previously released as an earlier version) developers can create customized short words or phrases that can be spoken to ‘wake up’ devices, without additional user input needed. For example, a user can take a selfie with his or her phone by just uttering the word ‘cheese’.”
Read more at Marketwired.
Photo credit: Baidu