AI Is Booming with Image Recognition, but Audio Recognition Lags Behind

By Rachel Roumeliotis

Artificial intelligence (AI) has made considerable inroads in the enterprise. Image recognition, in particular, has been gaining steam. Its applications range from assisting in the screening and diagnosis of disease through medical imaging to enabling self-driving cars to accurately interpret their surroundings. However, the technology still has a way to go before it is ready for consumers.

Image recognition has become almost synonymous with AI. It brings to mind applications that range from augmented and virtual reality to more practical uses of computer vision. The technology analyzes digital images and video to derive richer insights from users. In fact, many of us interact with computer vision applications, such as facial recognition, in our daily lives without thinking twice.

Image recognition is having a moment, but the same can't necessarily be said for speech recognition. Audio and visual components often go hand in hand to create a cohesive whole, but that doesn't hold true in AI. In fact, the two have diverged, with audio lagging well behind. Recent research delves into both of these areas of AI, among others.

My company's survey compared the opinions of consumers with those of AI creators: the people working to develop AI, including CTOs, data scientists, software engineers, solutions architects, and IT directors. It found that, in the workplace, image recognition (34.6%) ranked among the top three areas of AI technology where the most progress is being made. Trailing just behind automation, image recognition is already providing business value in settings ranging from supply chain management in manufacturing to surveillance and security systems.

Frameworks for building image recognition systems are also taking hold among technical workers. PyTorch is well suited to this work, and its adoption is growing fast: another survey found that PyTorch's share grew to more than 36% last year. Combine that growth with how useful practitioners find it (a majority of respondents, 55%, rated PyTorch very useful), and you have a recipe for AI success.

In contrast, audio recognition ranked among the least-used AI technologies, mentioned by only 13.2% of respondents. Image recognition is being productized, but audio recognition has fewer use cases, at least for now. Simple speech recognition is already enough to help power chatbots and carry out basic speech-to-text functions. Customers aren't yet asking for more advanced features, such as the ability to distinguish between different voices. Unlike with image recognition, the return on investment isn't there yet from a business perspective.

However, some practical areas of audio recognition show great promise. Of survey respondents, 27% expressed a desire for voice-to-text translation as an AI function to help with their work, while 19% said immediate translation is an exciting area of development. It's easy to understand why: processing voice and text is critical to capturing data, and there is a great deal of it to process. Customer service and conversational commerce are two examples of how AI-enabled audio recognition is improving operational efficiency.

In the enterprise, image recognition is clearly outpacing its audio counterpart, and the same pattern holds on the consumer side. When asked about the most useful areas of consumer AI technology, 79% of respondents named health and fitness insights, such as those in Apple Health, as a space to watch. Further, 47% agreed that detailed health insights were one of the most exciting areas for AI development. Health insights that incorporate image recognition and analysis could have a major impact on human well-being. That impact will only grow as people come to expect more personalized health care.

On the other hand, only 7% of respondents found virtual assistants that rely on audio technology, such as Siri and Alexa, useful. Despite this, 30% said they are excited for AI to develop in this area. That is a hopeful outlook, but for now, usability and privacy concerns could hinder progress. And, as with most emerging technology, people are not yet accustomed to interacting with computers by voice. This is poised to change, but it will take time.

Both image and audio recognition are clearly areas of AI with great potential, in the enterprise and in everyday life. Both will continue to appear in our work and home environments, but image recognition is leading the charge in both demand and applications. That said, we shouldn't count out audio recognition; it will be interesting to see how it evolves over the next few years.
