Hello! I’m Mark Horseman and welcome to The Cool Kids Corner. This is my monthly check-in to share with you the people and ideas I encounter as a data evangelist with DATAVERSITY. (Read last month’s column here.) This month, we’re talking about the interplay between Data Governance and artificial intelligence (AI). What do we need to watch for while our organizations rush to adopt AI solutions like ChatGPT and many others? What are the benefits? What are the risks? And, ultimately, we’ll see what the Cool Kids are saying.
Artificial intelligence (AI) has been around for decades. It didn’t just spring into existence. What it did do, however, is jump into the cultural zeitgeist in a way that we rarely see technology do. How did this change happen? Many of the methods and theories that AI researchers proposed in the early days of computing became a reality once enough computing power was readily available to run the algorithms put forth by those early visionaries.
The computing power mentioned above enabled the creation of popular artificial intelligence tools like ChatGPT and many others. These tools now raise questions around Data Governance. Your Data Governance team is concerned with compliance, risk, and ultimately the production, use, and definition of data. When using AI, especially a large language model (LLM), to produce or curate content, there are several significant Data Governance challenges.
- Privacy: The data you put into the model can sometimes be used for future training, which means it could surface in the results other users generate. In the last year, there was also a breach involving an LLM in which conversation data was leaked. If you are entering confidential data, either scenario can be seriously damaging.
- Accuracy: When an LLM makes up untruthful results, it is called a hallucination. Depending on the model being used, hallucinations can occur in as many as 40% of results. Users need usage guidelines to prevent quality issues.
- Compliance: Depending on the types of activity that users wish to engage in, the use of ChatGPT could run afoul of various legislative and industry controls, including GDPR, CCPA, HIPAA, and PCI DSS. This of course varies by use case but is not outside the realm of possibility. Again, users would need guidelines on using artificial intelligence from their Data Governance team to use the tool without fear.
- Ethics: There have been several stories in the news lately of AI having inherent bias because of its training data, whether it’s the story of AI white-washing an MIT student to make her look more professional, or facial recognition tools only working well with Caucasian users because the training data didn’t have much diversity. Ethics is something that we need to consider at every step when using any AI solution. An algorithm has no ability to be moral or ethical – it’s up to humans to step in and ensure fairness in the operation of any model.
- Future-Proofing: Many legislative bodies are looking to specifically regulate the use of artificial intelligence, whether it’s an LLM or a machine learning algorithm that operates a recommender engine. A bill requiring an organization to describe how an algorithm makes its selections based on its training data could be quite onerous for some organizations to comply with. If the algorithms in question are a complete black box, it might not be practicable to use these tools in the future. An example of potential legislation is the Artificial Intelligence and Data Act (AIDA) proposed in Canada.
- Shadow AI: Organizations are struggling with employees using AI tools off the side of their desk without policy, procedure, or guideline support to address the privacy, accuracy, compliance, and ethics concerns above. Abhishek Gupta wrote an illuminating piece called “Beware the Emergence of Shadow AI” that highlights some of these issues. In short – get something in place and socialize it, because this can happen in your organization.
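To make the privacy and compliance points above a little more concrete, here is a minimal sketch of the kind of guardrail a usage guideline might call for: scrubbing obviously sensitive values from a prompt before it goes to an external AI tool. Everything in it is an assumption for illustration. The `redact` function and the two patterns (email addresses and U.S. Social Security number formats) are hypothetical, and a real policy from your Data Governance team would cover far more than this.

```python
import re

# Hypothetical patterns for illustration only; a real list would come
# from your organization's Data Governance policy.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def redact(prompt):
    """Replace sensitive matches with placeholders.

    Returns the redacted prompt and the list of labels that were found,
    so the caller can log or block the request.
    """
    found = []
    for label, pattern in PATTERNS.items():
        prompt, count = pattern.subn(f"[{label}]", prompt)
        if count:
            found.append(label)
    return prompt, found


text, found = redact("Email jane@example.com about SSN 123-45-6789")
print(text)
print(found)
```

Simple pattern matching like this will miss plenty of confidential content, so it is best thought of as a supplement to clear guidelines and training, not a replacement for them.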
What are the Cool Kids saying about the interplay between Data Governance and AI? This month, our Cool Kid is Sarah Rasmussen, engagement partner for First San Francisco Partners. Sarah is presenting at Enterprise Data World (EDW) in just a few short days. Her talk, “How Data Governance and Data Management Capability Maturity Should Influence Your Investment in AI/ML Models,” will cover the following:
- Present an AI/ML model deconstructed, a subject area model detailing the most critical and common business, governance, technology, and data-related components that are part of or influence its lifecycle.
- Demonstrate the need for a Data Governance operating model to ensure AI/ML model inputs and outputs are understood and proactively managed. This sets the stage to identify and overcome regulatory and reputational risks and ensure the ethical use of an individual’s data is always at the forefront.
- Explore the surprisingly large ecosystem of stakeholders, within and outside the organization, who influence the model’s success and with whom model developers must interact transparently.
- Summarize why an organization’s focus on maturing its Data Governance and management capabilities to support AI/ML models benefits a host of other enterprise-level use cases.
Remember that you can meet Cool Kids like Sarah at DATAVERSITY events:
- Enterprise Data World – Anaheim, Calif. (Sept. 18-21, 2023)
- Enterprise Analytics Online (Oct. 25, 2023)
- Data Governance & Information Quality Conference – Washington, D.C. (Dec. 4-8, 2023)
Want to become one of the Cool Kids? All you need to do is share your ideas with the community! To be active in the community, come to DATAVERSITY webinars, participate in events, and network with like-minded colleagues.
Next month, we’ll be discussing the rising demand for master data management – why is it in vogue now more than ever? Stay tuned!