Lost in Translation: Language Gap Holds Back GenAI in Life Sciences Industries

*Read more about author Sanmugam Aravinthan.*

Across industries, organizations continue to seek out a range of use cases in which to deploy advanced intelligence. With the development of generative artificial intelligence (GenAI), various industries are leveraging the technology to process and analyze complex data, identify hidden patterns, automate repetitive tasks and generate creative content. The promise of GenAI is transformative, offering the potential to revolutionize the way we interact with information. However, there exists one crucial caveat to the rapid deployment of this emerging technology: its current language limitations.

Most of the information used to train GenAI comes from internet sources and databases primarily written in English and other globally dominant languages, like Spanish and Chinese. With more than 7,000 languages spoken around the globe, most languages, and therefore the populations that speak them, are likely neglected in current GenAI training data and models. It’s crucial to recognize that these language limitations could have unintended consequences, potentially disadvantaging diverse populations by producing biased results or limiting access to information.

Life Sciences Use of Generative Artificial Intelligence

The life sciences industry, grappling with an influx of information, is a primary candidate for the use of GenAI technology. A recent survey indicates that industry investment in GenAI has more than tripled in just four months, jumping from 13% to 46%. This industry, at the forefront of discovery, is undoubtedly seeking out cutting-edge technology to improve critical insights, accelerate research timelines and advance data analysis.

In fact, GenAI has already helped address several challenges in the life sciences industry. With its advanced capabilities, organizations have improved signal detection, data integration, and automated reporting, all of which are critical to safety surveillance. Currently, this technology can be used to forecast and detect adverse reactions to drugs or medical devices across various platforms, such as social media. By training GenAI technology to recognize patterns through ontologies and character recognition, organizations can better predict and identify adverse events, ultimately improving patient safety.

GenAI can also be used to assess clinical data and identify potential patients for clinical trials of promising new treatments. This can increase patient recruitment and, over time, reduce overall trial durations. GenAI’s capabilities have also expanded to chatbots, providing a new, accessible resource for patients. This technology gathers patient symptoms and provides recommendations based on them. Not only does this improve patient engagement, but it also decreases the burden on healthcare staff.

Despite the incredible capabilities of GenAI, limited language fluency hinders its ability to improve outcomes for diverse populations. Current AI and GenAI models have a blind spot for non-English-speaking patients, preventing capabilities such as early detection of adverse events, participant identification and chatbots from revolutionizing patient outcomes.

Addressing Generative Artificial Intelligence’s Language Blind Spot

The digital language divide has implications that reach well beyond any single industry. In the life sciences industry, however, expanding GenAI’s language capabilities carries the potential to dramatically improve patient outcomes. Addressing GenAI’s language gap now will help ensure that future technologies can rely on larger databases covering the many languages used around the world. Expanding training models to include multilingual data, diverse patient data and language-agnostic development will be critical to increasing the accessibility of GenAI in healthcare and life sciences efforts around the globe.

So, what steps can be taken to develop technology that safeguards patients worldwide?

Increasing the number of “high-resource” languages, meaning languages with enough available data to effectively train language-based systems such as GenAI, is a critical first step. Expanding the number of languages that reach this categorization will require improved global access to digital devices and services. One significant reason that low-resource languages have such a small digital footprint is that their speakers lack access to digital services, which leaves little data to train on. Improving global accessibility through high-speed broadband and internet-enabled devices can help close the gap between high- and low-resource languages.

Similarly, fortifying GenAI’s language capabilities means accounting not only for which languages are covered but also for variation and dialect within each language. Linguistic biases against non-standard forms of language can be just as detrimental to patient safety and outcomes as gaps in language coverage. Limiting the language variation in GenAI’s inputs can lead to unintended biases. GenAI’s ability to understand real-world conversations, vernacular, slang and code-switching is critical to its success in detecting abnormalities and potential concerns related to patient outcomes.

Global Implications of Generative Artificial Intelligence

As GenAI continues to permeate the life sciences landscape, it is critical to understand its limitations as well as its capabilities. For decades, the healthcare and life sciences industries have struggled to reach marginalized communities and improve diversity metrics in research. For example, one study of a global sample of thousands of participants across almost 50 countries found that 85% of participants were Caucasian. Such a sample seriously misrepresents the global population and can perpetuate health inequities on a global scale. Already, the industry struggles with underrepresentation and accessibility challenges in recruiting patients. Failing to recognize the limitations of GenAI’s current language capabilities will only exacerbate these existing issues.

While GenAI offers immense potential to revolutionize the healthcare and life sciences industry, its current language limitations pose a significant barrier to achieving equitable patient outcomes. By expanding access to multilingual and diverse patient data, increasing the availability of digital services globally and recognizing language variations, organizations can be better prepared to narrow the digital language divide. Addressing GenAI’s shortcomings now will help ensure that its transformative power fosters a future in which healthcare and life sciences advancements benefit all populations, regardless of language.