by Angela Guess
Chris Jager recently wrote for Lifehacker, “Before a computer can even understand what you mean, it needs to be able to understand what you said. This involves a complex process of audio sampling, feature extraction and then actual speech recognition to identify individual sounds and convert them to text. Researchers have been working on this technology for many years. They have developed techniques that extract features in much the same way the human ear does and recognise them as phonemes, the individual sounds that make up human speech. This involves the use of artificial neural networks, hidden Markov models and other ideas that are all part of the broad field of artificial intelligence. Through these models, speech-recognition accuracy has steadily improved: Google reported error rates of less than 8% this year.”
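To make the front half of that pipeline concrete, here is a minimal sketch in Python and NumPy of sampling and frame-level feature extraction. The frame sizes, the synthetic sine-wave input and the simple log-power spectrum are illustrative assumptions, not anyone’s production code; real recognisers typically use mel-scale or MFCC features as the input to the neural networks and hidden Markov models Jager mentions.

```python
# A minimal sketch (not a production recogniser) of the first two stages
# Jager describes: sampling an audio signal and extracting frame-level
# spectral features of the kind a speech recogniser consumes.
import numpy as np

SAMPLE_RATE = 16_000   # samples per second, a common rate for speech
FRAME_LEN = 400        # 25 ms analysis frames
HOP = 160              # 10 ms hop between frames

def extract_features(signal: np.ndarray) -> np.ndarray:
    """Slice the signal into overlapping frames and return one
    log-power spectrum (feature vector) per frame."""
    n_frames = 1 + (len(signal) - FRAME_LEN) // HOP
    window = np.hamming(FRAME_LEN)                   # taper frame edges
    features = []
    for i in range(n_frames):
        frame = signal[i * HOP : i * HOP + FRAME_LEN] * window
        spectrum = np.abs(np.fft.rfft(frame)) ** 2   # power spectrum
        features.append(np.log(spectrum + 1e-10))    # log compresses dynamic range
    return np.array(features)

# Stand-in for sampled microphone audio: one second of a 440 Hz tone.
t = np.arange(SAMPLE_RATE) / SAMPLE_RATE
audio = np.sin(2 * np.pi * 440 * t)
print(extract_features(audio).shape)   # (98, 201): frames x frequency bins
```

Each row of the output is the kind of feature vector an acoustic model would then classify into phonemes.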
Jager goes on, “But even with these advancements, auditory recognition is only half the battle. Once a computer has gone through this process, it has only the text of what you said, and you could have said anything at all. The next step, natural language processing, is for the machine to work out what you actually meant. This is arguably more difficult than voice recognition itself, because human language is full of context and semantics that make it hard to interpret. Anybody who has used earlier voice-recognition systems can testify to how difficult this can be. Early systems had a very limited vocabulary, and you had to say commands in just the right way to ensure that the computer understood them.”
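Jager’s point about early systems is easy to demonstrate. The toy sketch below (the command table and phrasings are invented for illustration) does what those systems effectively did: exact string matching on the recognised text. The same intent expressed in different words fails, which is exactly the gap that natural language processing, with its handling of context and semantics, has to close.

```python
# A toy illustration of the rigid command matching in early voice
# interfaces: the recogniser's text output must match a known phrase
# exactly, so any natural rewording of the same request fails.
COMMANDS = {
    "call home": "dialing home...",
    "play music": "starting playback...",
    "what time is it": "reading the clock...",
}

def interpret(transcript: str) -> str:
    """Map recognised text to an action by exact lookup:
    no context, no semantics, just string matching."""
    action = COMMANDS.get(transcript.lower().strip())
    return action if action else "Sorry, I didn't understand that."

print(interpret("call home"))            # matches: dialing home...
print(interpret("could you call home"))  # same intent, different words: fails
```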
photo credit: Flickr/diongillard