
Balancing Generative AI Risk with Reward

Have you heard the story about the lawyer who used ChatGPT to conduct legal research and unknowingly cited non-existent cases in a New York federal court filing? The GenAI tool just made them up. Did you know that X’s Grok accused Golden State Warriors guard Klay Thompson of vandalizing a string of homes in Sacramento after misinterpreting tweets about a game? Or that early last year, Google’s Gemini AI image generator rendered historically inaccurate depictions of humans, such as illustrations of Nazi soldiers as people of color?

Even more unsettling was the news that a teenager took his life “after forming a deep emotional attachment to an artificial intelligence (AI) chatbot on the Character AI website.” The boy’s mother has filed a lawsuit, which noted that transcripts of her son’s conversation with the chatbot show often-sexual discussions, as well as talk of suicide, with the chatbot using phrases such as “That’s not a reason not to go through with it.”

You can see why, in many organizations, there’s as much trepidation as there is excitement about leveraging GenAI. The State of the Generative AI Market report from ISG notes that enterprises were spending on average $2.6 billion on their single largest GenAI use case in 2024, and they were expected to increase their spending on GenAI by 50% this year. 

Approximately 70% of them were using ChatGPT for software development activities. At the same time, Statista has reported that, according to a 2024 survey, 53% of respondents from organizations worldwide stated their main worry about adopting GenAI within their company was that it would open them up to greater risks. 

Mixed feelings about GenAI are all very normal, noted Kira Rodarte, lead data scientist of data and analytics at TriNet, during her “Balancing Generative AI Risk with Reward” presentation at DATAVERSITY’s recent DGIQ Conference. Many companies feel they need to embrace GenAI for competitive reasons, but they question: How do you apply it and not open yourself up to a lot of risk?

As businesses start evolving in their use of this technology and exposing it to a broader base inside and outside their companies, risks can increase. “I’ve always loved to say AI likes to please,” said Danielle Derby, director of enterprise data management at TriNet, who joined Rodarte at the presentation.

Risk manifests “because AI doesn’t know when to stop,” said Derby. You may not, for example, have thought to include a human or technological guardrail to keep it from answering a question it wasn’t prepared to handle accurately. “There are a lot of areas where you’re just not sure how someone who’s not you is going to handle this new technology,” she said.

The AI Lifecycle

The ISG report mentioned above found that 37% of respondents were concerned that GenAI adoption would not be done strategically or methodically. And indeed, AI is not something you can just build and forget, Derby explained. 

Enter the AI lifecycle: “It’s actually something that you need to continue to nurture. … You need to consider, to understand, what it does and then how it can be changing based on the data and the interactions that it’s currently receiving after you release it.”

There’s added risk that continues to amplify if you’re not paying attention to what your AI is doing, said Rodarte. For instance, what if ChatGPT changes its model and your prompt breaks? “The finish line kind of feels like it continues to move, but it becomes more of a cycle, right? And you have to keep coming back, checking in, paying attention to it, governing it wonderfully.”
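One lightweight way to catch that kind of drift is a small “golden” set of prompts that is re-run on a schedule and fails loudly when a response no longer matches the structure downstream code expects. The sketch below assumes a JSON-producing prompt; the `call_model` stub is a hypothetical placeholder for whatever provider client you actually use, not the presenters’ tooling.

```python
import json

def call_model(prompt: str) -> str:
    # Placeholder: returns a canned reply so the sketch runs end to end.
    # In practice this would call your GenAI provider's API.
    return '{"category": "billing", "urgency": "low"}'

# Fixed "golden" prompts whose responses must keep a known shape.
GOLDEN_PROMPTS = [
    "Summarize this ticket as JSON with keys 'category' and 'urgency': ...",
]

def check_prompt_contract() -> list[str]:
    """Re-run golden prompts and flag any response that no longer parses
    into the structure downstream code expects."""
    failures = []
    for prompt in GOLDEN_PROMPTS:
        reply = call_model(prompt)
        try:
            parsed = json.loads(reply)
            if not {"category", "urgency"} <= parsed.keys():
                failures.append(f"missing keys in reply to: {prompt[:40]}...")
        except json.JSONDecodeError:
            failures.append(f"non-JSON reply to: {prompt[:40]}...")
    return failures

if __name__ == "__main__":
    problems = check_prompt_contract()
    print("OK" if not problems else "\n".join(problems))
```

Run on a schedule, a check like this turns “keep coming back and checking in” into an alert rather than a surprise.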

How to get, and stay, ahead of potential risks? Rodarte and Derby offered the following advice:

Problem definition: Start by identifying problem scope, defining aims and outcomes, and gathering relevant data sources.

Overlooking users’ real needs, ignoring data requirements, catering to unrealistic expectations, and failing to involve stakeholders all set the stage for a project that fails to drive business value. 

“Orient your use case into what your users are actually asking for. … Do not go for shiny object syndrome,” cautioned Rodarte. Instead, go for the true pain point. “You want to have that conversation with your business, with your users, and say, ‘Is this actually something that we need to do, is this something that we actually should do, and what is the goal of doing this?’” 

It’s also important that users understand that the model will not always provide the same answer to a particular type of question — AI models are probabilistic and generate responses based on a probability distribution. 
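To make that concrete, here is a toy illustration (not any production model) of why identical inputs can produce different outputs: generation samples from a probability distribution over candidate tokens rather than returning one fixed answer.

```python
import math
import random

# Toy next-token scores; a real model produces thousands of these per step.
logits = {"refund": 2.1, "credit": 1.7, "escalate": 0.4}

def sample_token(scores: dict[str, float], temperature: float = 1.0) -> str:
    """Sample one token from a softmax over the scores; higher temperature
    flattens the distribution and increases variability."""
    scaled = {t: s / temperature for t, s in scores.items()}
    z = sum(math.exp(s) for s in scaled.values())
    probs = {t: math.exp(s) / z for t, s in scaled.items()}
    return random.choices(list(probs), weights=list(probs.values()))[0]

# The same "prompt" (same scores) can yield different answers on each call.
print([sample_token(logits, temperature=0.8) for _ in range(5)])
```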

It’s key, then, to pull users along with you all along your journey. “Ideally, if you can build some sort of framework to be able to iterate over these use cases quickly, I think you’re going to see a lot more value,” Derby said.

Critically, you’ve got to consider issues such as data privacy and security. “In the world of customer service, maybe it is easier to have an AI agent go and perform an action for the customer service representative, right? But now you’re giving that AI a lot of access to PII, to confidential information,” said Rodarte. “Where are your risks? Where are the concerns, what happens if that AI does it wrong? What happens if that person inputs the command wrong? There’s a lot of different things to consider, and so you really have to think about not only just in terms of giving access to data,” but about the data management foundation as well.
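As one sketch of a guardrail in that spirit (an illustration, not the presenters’ implementation), the snippet below masks obvious PII before text is handed to an external model. The patterns are deliberately simple and not exhaustive; anything like this should be designed with your privacy and security teams.

```python
import re

# Illustrative PII patterns; real redaction needs broader coverage and review.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}

def redact(text: str) -> str:
    """Replace recognizable PII with typed placeholders before the text
    is sent to an external GenAI service."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

print(redact("Customer jane.doe@example.com, SSN 123-45-6789, called 555-867-5309."))
```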

Data collection: Collect raw data, clean and preprocess data, and identify missing or irrelevant data.

Not necessarily earth-shattering concepts, Rodarte admitted, but when you take into account how to address their components – data privacy and security concerns, bias in data and ethical considerations, data quality and consistency issues, data ownership and consent, and data relevance – “that’s when you go, ‘Hey, this is a framework,’” she said. 

As such, it demands that you work early with your privacy and security teams; audit and test for biases using broader data sets; establish quality standards upfront and enforce them from the source; and implement data governance with the involvement of your stewards and data owners. Finally, tie data requirements back to the objectives you seek and have robust conversations about whether specific data matters.

“Your data, your quality, what you have, what you’re capturing, ethically, that matters so much more now, because you need data to power any type of AI,” she said. 
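A minimal sketch of what “audit and test” can look like in code, using pandas and hypothetical column names: profile missingness and duplicates, and compare outcome rates across a group attribute as a first-pass bias probe.

```python
import pandas as pd

# Illustrative data; in practice this would be the raw data you collected.
df = pd.DataFrame({
    "age": [34, 51, None, 29, 42, 38],
    "region": ["west", "west", "east", "west", "east", "west"],
    "approved": [1, 1, 0, 1, 0, 1],
})

# Basic quality checks: how much is missing, how many exact duplicates.
print("missing rate per column:\n", df.isna().mean())
print("duplicate rows:", int(df.duplicated().sum()))

# A simple bias probe: does the outcome rate differ sharply across a
# group attribute? Large gaps warrant a closer audit, not automatic removal.
print("approval rate by region:\n", df.groupby("region")["approved"].mean())
```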

Training: Split data into training and test data sets, optimize hyperparameters, and use cross-validation.

Improper data splitting can lead to data leakage, resulting in overly optimistic model performance, which you can mitigate by using techniques like stratified sampling to ensure representative splits and by always splitting the data before performing any feature engineering or preprocessing. 

Inadequate training data can lead to overfitting, and too little test data can yield unreliable performance metrics. You can mitigate both by making sure there is enough data for training and testing given the size of the problem, and by using a validation set in addition to the training and test sets. 
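A minimal scikit-learn sketch of both points, using a public dataset rather than any real enterprise data: split first (stratified), hold out a validation set, and keep preprocessing inside a pipeline so it is fit on training data only.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# Split BEFORE any preprocessing so statistics from held-out data never leak
# into training; stratify to keep class proportions in every split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
X_train, X_val, y_train, y_val = train_test_split(
    X_train, y_train, test_size=0.25, stratify=y_train, random_state=42
)

# The pipeline fits the scaler on training data only, then applies it
# unchanged to the validation and test sets.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)
print("validation accuracy:", round(model.score(X_val, y_val), 3))
print("test accuracy:", round(model.score(X_test, y_test), 3))
```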

“I need to go and use synthetic data to be able to mitigate the risk introduced by this,” Rodarte said. You can actually build that by asking someone knowledgeable in your organization about a certain issue to give you a brief list of questions and leverage GenAI to get multiple ways of asking those questions. “It’s a little bit reductive, but it at least starts to build that data set. And then you can start augmenting with real questions,” she said. “You can go and start to iterate on, and you’re building up a synthetic set.”
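A bare-bones sketch of that bootstrapping loop follows. The `paraphrase_with_llm` helper is hypothetical and stubbed out so the example runs; in practice it would call your GenAI provider.

```python
# Start from a short list of expert-written seed questions and expand each
# into several phrasings, then augment with real questions over time.
def paraphrase_with_llm(question: str, n: int = 3) -> list[str]:
    # Hypothetical stand-in for a GenAI call; returns canned variants here.
    return [f"{question} (variant {i + 1})" for i in range(n)]

seed_questions = [
    "How do I update my direct deposit information?",
    "When is open enrollment for benefits?",
]

synthetic_set = []
for q in seed_questions:
    synthetic_set.append(q)
    synthetic_set.extend(paraphrase_with_llm(q, n=3))

# Later, fold in real user questions as they arrive.
real_questions = ["Can I change my bank account for payroll?"]
dataset = synthetic_set + real_questions
print(f"{len(dataset)} questions in the starter set")
```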

As for what Rodarte termed the other “data-nerdy” points, she and Derby suggested conducting systematic hyperparameter tuning using techniques such as grid or random search to correct for default hyperparameter settings that may not be optimal for the specific data sets, and implementing k-fold cross-validation to assess model performance more robustly. 

Be aware that computationally intensive cross-validation processes can lead to excessive resource use and long training times, so limit the number of folds to balance between computational cost and reliability, and consider using stratified k-fold to maintain class distribution while optimizing computation.
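A scikit-learn sketch of that combination, again on a public dataset: a randomized search over a small pipeline, scored with stratified five-fold cross-validation so the compute cost stays bounded.

```python
from scipy.stats import loguniform
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# Stratified k-fold keeps class proportions in every fold; five folds is a
# common balance between reliability and compute cost.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Random search samples a fixed budget of candidates instead of exhaustively
# enumerating a grid, which caps the total training time.
search = RandomizedSearchCV(
    pipeline,
    param_distributions={"logisticregression__C": loguniform(1e-3, 1e2)},
    n_iter=20,
    cv=cv,
    scoring="f1",
    random_state=42,
)
search.fit(X, y)
print("best C:", search.best_params_)
print("best cross-validated F1:", round(search.best_score_, 3))
```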

Model evaluation: Use accuracy, precision, recall, and F1 score; the ROC curve and AUC; and confusion matrix analysis.

The speakers highlighted the risks of inappropriate metrics, noting the importance of using a suite of metrics suited to the outcome you are seeking. They also advised analyzing the confusion matrix to see where the model is making errors, since misreading its components can lead to incorrect assumptions about model effectiveness, and using domain knowledge to set thresholds based on acceptable levels of false positives and false negatives. 
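As a sketch of that advice (public dataset, scikit-learn assumed), the block below reports a suite of metrics rather than accuracy alone and prints the confusion matrix so you can see which errors the model actually makes.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
y_score = model.predict_proba(X_test)[:, 1]

# Look at a suite of metrics, not just accuracy, and at where the errors sit.
print(classification_report(y_test, y_pred))          # precision, recall, F1
print("ROC AUC:", round(roc_auc_score(y_test, y_score), 3))
print("confusion matrix (rows = actual, cols = predicted):")
print(confusion_matrix(y_test, y_pred))
```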

A default threshold is rarely right for every use case. Mitigate this by using ROC and precision-recall curves to select a threshold that aligns with business goals and risk tolerance, and by adjusting it to balance recall and precision as the application requires.
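Continuing the same kind of sketch, the snippet below reads a threshold off the precision-recall curve instead of accepting the 0.5 default; in practice you would tune it on validation data and against the business’s real cost tradeoffs rather than simply maximizing F1.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_curve
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)
y_score = model.predict_proba(X_test)[:, 1]

# Instead of the default 0.5 cutoff, pick the threshold whose precision and
# recall best match what the business can tolerate; maximizing F1 is used
# here as a simple stand-in for that judgment. Tune on validation data.
precision, recall, thresholds = precision_recall_curve(y_test, y_score)
f1 = 2 * precision * recall / (precision + recall + 1e-12)
best = int(np.argmax(f1[:-1]))  # last precision/recall pair has no threshold
print("chosen threshold:", round(float(thresholds[best]), 3))
print("precision:", round(float(precision[best]), 3),
      "recall:", round(float(recall[best]), 3))
```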

Interpretability: Consider model transparency, explainable AI and feature importance, and SHAP values.

“Having those lines of communication is a very important aspect of this technology,” Derby noted. To that point, it’s important to give your audience simple, clear, and visual explanations of the model, balancing simplicity with accuracy. Providing training and education helps avoid misinterpretations that can lead to incorrect conclusions. And SHAP (SHapley Additive exPlanations) values, which are based on cooperative game theory, offer a way to surface feature interactions that might otherwise be overlooked and leave an incomplete understanding of the model. 
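A small sketch of SHAP in practice, assuming the `shap` package and a public regression dataset rather than the presenters’ stack: compute per-prediction attributions, then aggregate them into a feature ranking you can present visually.

```python
import numpy as np
import shap
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = RandomForestRegressor(n_estimators=200, random_state=42)
model.fit(X_train, y_train)

# SHAP assigns each feature a contribution to each individual prediction;
# averaging absolute contributions gives a global feature-importance view.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)          # shape: (n_samples, n_features)
importance = np.abs(shap_values).mean(axis=0)
for name, score in sorted(zip(X_test.columns, importance), key=lambda p: -p[1])[:5]:
    print(f"{name:>6}: {score:.2f}")

# shap.summary_plot(shap_values, X_test) renders the same information visually.
```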

It’s critical to the user experience that you make it clear when someone is interacting with an AI model and what it means. It’s critical to the governance experience that you document everything in preparation for regulatory scrutiny and keep up to date on all data regulations.

“What I always recommend is thoughtful AI,” said Rodarte. “Yes, it’s fun to chase the shiny object. [But] if you want to chase it, try it in a POC. Small, bite-sized things … [and] really think about what is the point of it? What can this bring us? And then use that as your use case to drive it.” 

You don’t need to apply GenAI to every single chatbot, she said. “Thoughtful application of any technology can really drive a lot of change, right? And that’s what we’re seeing – there’s a lot of risk, but when you manage the risk by thoughtful application, you end up with this win-win situation.”

Want to learn more about DATAVERSITY’s upcoming events? Check out our current lineup of online and face-to-face conferences here.
