AI Data Retention Creates Environmental Stumbling Block

By Soniya Bopache

As artificial intelligence reshapes our world, an environmental crisis is building in its digital wake. Data center power demand is projected to surge 160% by 2030, potentially generating up to $149 billion in social costs, including resource depletion, environmental impact, and public health effects. While most of the conversation focuses on the energy that AI processing demands, another environmental threat lurks within data centers: AI-related data storage is quietly accumulating at a rapid pace.

AI models are trained using massive amounts of data, sometimes even millions of records. The volume of output data can also be enormous. Without clear regulatory guidelines for AI data retention, many organizations are not confident about which data is “safe” to delete. In the interim, they’ve adopted a “save everything” approach, which saddles them with growing volumes of unnecessary storage. That doesn’t bode well for their environmental stewardship or their bottom line.

Global regulators must act swiftly to help address this problem; however, the absence of codified laws shouldn’t be an excuse for organizations to avoid taking practical steps of their own. 

Don’t Let AI-Related Data “Go Dark”

AI training and output data can quickly become dark data when it’s collected and stored but never used. It’s like the attic in a home where things get set aside and forgotten. What’s out of sight is often out of mind, as the saying goes. By some estimates, more than half of an organization’s data is dark data.

Filtering dark data, and deleting the information that’s not needed, should become a moral imperative for businesses everywhere. With data volumes increasing yearly and more businesses adopting large language models, the time to act is now. 

  • Start with data mapping and discovery to understand how information flows through your organization. This will show where data and sensitive information are stored, who has access to them, and how long they are retained.
  • Proactively manage your data, so you can take control of associated risks and make well-educated decisions regarding what can be deleted. It’s also advisable to automate these discovery and insight routines. 
  • If your organization is handling petabytes of data and billions of files, your insights approach should integrate with archiving, backup, and security solutions to prevent data loss and ensure policy-based retention. 
  • Minimize and place controls around data. This reduces the amount of data being stored and ensures the data you retain is directly related to its collection purpose. Classification and compliance tools can provide additional confidence in the deletion of non-relevant information and serve as a cornerstone for dark data projects and companywide compliance. Continue to monitor for ongoing adherence to existing compliance standards.
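
As a rough illustration of the automated discovery step described above, the following sketch walks a directory tree and lists files that have not been modified within a chosen window, largest first. It is a minimal example under stated assumptions, not a complete discovery tool: the function name `find_stale_files`, the 365-day default, and the use of modification time as a proxy for "unused" are all illustrative choices, and a flagged file is only a candidate for review, not something that is automatically safe to delete.

```python
import os
import time
from pathlib import Path


def find_stale_files(root, max_idle_days=365, now=None):
    """Return (path, size_bytes, idle_days) tuples for files under `root`
    whose last modification is older than `max_idle_days`.

    Assumption: modification time stands in for "last used". Real
    discovery tools typically also look at access logs and ownership.
    """
    now = time.time() if now is None else now
    cutoff = now - max_idle_days * 86400
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                info = path.stat()
            except OSError:
                # Skip files that vanish or cannot be read mid-scan.
                continue
            if info.st_mtime < cutoff:
                idle_days = int((now - info.st_mtime) // 86400)
                stale.append((str(path), info.st_size, idle_days))
    # Largest files first, since they offer the biggest storage savings.
    return sorted(stale, key=lambda record: record[1], reverse=True)
```

Calling `find_stale_files("/path/to/share", max_idle_days=730)` would return the review candidates for that (hypothetical) location; the results would still need classification and a retention check before anything is removed.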

Focus on Preparing, Not Repairing

Dark data will likely never go away completely, but applying the steps above along with techniques like compression and deduplication can keep it from stockpiling. These steps should ideally be part of an overall data governance framework that is applied across your entire data ecosystem. You can get ahead of problems by tracking data as it’s created and applying controls. It’s always easier to prepare in advance than to fix things after the fact.
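
To make the deduplication idea concrete, here is a small sketch that groups files with identical content by comparing SHA-256 digests. It illustrates file-level deduplication only; storage platforms often deduplicate at the block level, which this example does not attempt. The function name `find_duplicate_files` and the 1 MiB read size are assumptions made for the example.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def find_duplicate_files(root, chunk_size=1 << 20):
    """Group files under `root` that have byte-identical content.

    Returns a list of groups, each a list of paths sharing one
    SHA-256 digest. Files are read in chunks to limit memory use.
    """
    by_digest = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        digest = hashlib.sha256()
        with path.open("rb") as handle:
            # Read until an empty bytes object signals end of file.
            for chunk in iter(lambda: handle.read(chunk_size), b""):
                digest.update(chunk)
        by_digest[digest.hexdigest()].append(str(path))
    # Only digests seen more than once represent duplicates.
    return [paths for paths in by_digest.values() if len(paths) > 1]
```

Each returned group contains copies of the same content, so in principle all but one could be replaced with a reference, subject to whatever retention policy applies to them.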

As you create this framework, be sure it’s clear and concise. All effective data management, AI data or otherwise, starts with clear, enforceable internal policies. These guidelines should be flexible enough to adapt to future regulations while maintaining strict controls against unnecessary data hoarding. A risk-based approach should inform all retention decisions, with thorough documentation as to why certain data is kept or deleted. Regular review cycles ensure stored data maintains its relevance and value while helping to proactively identify opportunities for data reduction or archiving. 
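
One way to picture a documented, policy-based retention decision is the sketch below. The categories, retention periods, and the `legal_hold` flag are hypothetical placeholders rather than recommended values; the point is only that every decision carries a recorded reason that can be reviewed later.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical retention limits in days; real values come from your
# own policy and any applicable regulation.
RETENTION_DAYS = {"training_data": 730, "model_output": 180, "scratch": 30}


@dataclass(frozen=True)
class Decision:
    dataset: str
    action: str  # "keep", "delete", or "review"
    reason: str  # recorded so the decision can be audited later


def decide(dataset, category, last_used, legal_hold=False, today=None):
    """Return a documented retention decision for one dataset."""
    today = today or date.today()
    if legal_hold:
        # Higher-risk data is never deleted automatically.
        return Decision(dataset, "keep", "legal hold in place")
    limit = RETENTION_DAYS.get(category)
    if limit is None:
        return Decision(dataset, "review",
                        f"no retention rule for category '{category}'")
    idle = (today - last_used).days
    if idle > limit:
        return Decision(dataset, "delete",
                        f"idle {idle} days exceeds {limit}-day limit for {category}")
    return Decision(dataset, "keep",
                    f"idle {idle} days within {limit}-day limit for {category}")
```

Running `decide` over an inventory during each review cycle would yield a log of keep, delete, and review decisions, each with its stated reason.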

To support all of this, organizations must create a culture of responsible data management through comprehensive staff training. It’s the responsibility of every individual within an organization to maintain good data hygiene. Yes, the ultimate owners are compliance and infrastructure teams, but all of us should be asking what we can do to limit the amount of dark data and unnecessary AI-related data storage. 

The Road Ahead

As we balance technological advancement and environmental stewardship, our choices about AI data management will have far-reaching consequences. It’s not simply about more efficient algorithms or greener data centers – it’s about being more thoughtful about the data we choose to keep. We must ensure we’re not building the future of AI on a landfill of unnecessary data. 

Many companies have made significant strides in their decarbonization efforts over the last several years. Unfortunately, some of that progress could be partly undone by AI data storage bloat. By adopting proactive data management strategies, businesses can lead the way in responsible AI usage, even as we await more precise regulatory guidance.