We live in a data-driven economy, but what lies beneath the data is hidden gold. Metadata, or data that describes data, delivers many benefits for storage and IT managers. Yet metadata is complex, vast, and distributed across hybrid cloud infrastructure. Understanding and strategically managing metadata as part of your overall data storage strategy has become central to optimizing unstructured data management and data governance practices across the organization.
Explaining Metadata for File and Object Storage
Metadata management includes both standard metadata that most storage systems create and track as well as extended attributes that are customized and specific. Standard metadata are system attributes such as when the file was created, who created it, what type of file it is, its size, when it was last accessed, and when it was last modified.
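As a rough illustration of what standard metadata looks like in practice, the sketch below reads the system attributes of a single file using Python's standard library. The function name `standard_metadata` and the sample file name are hypothetical, chosen only for this example. Creation time is left out because it is not reported consistently across operating systems.

```python
import os
import stat
import tempfile
from datetime import datetime, timezone


def standard_metadata(path):
    """Return system attributes the file system already tracks for `path`."""
    info = os.stat(path)

    def as_utc(timestamp):
        return datetime.fromtimestamp(timestamp, tz=timezone.utc)

    return {
        "name": os.path.basename(path),
        "extension": os.path.splitext(path)[1].lower(),  # rough file type
        "size_bytes": info.st_size,
        "owner_uid": info.st_uid,  # numeric owner id on POSIX systems
        "is_regular_file": stat.S_ISREG(info.st_mode),
        "last_accessed": as_utc(info.st_atime),
        "last_modified": as_utc(info.st_mtime),
    }


# Usage: write a small scratch file (hypothetical name) and inspect it.
with tempfile.TemporaryDirectory() as scratch:
    sample_path = os.path.join(scratch, "scan_001.dcm")
    with open(sample_path, "wb") as handle:
        handle.write(b"\x00" * 1024)
    sample = standard_metadata(sample_path)

print(sample["name"], sample["size_bytes"], sample["extension"])
```

None of these attributes says anything about what the file is for, which is the gap that custom tags are meant to fill.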
Extended metadata is handled differently in file storage and object storage environments. File storage organizes data in directory hierarchies and tracks a largely fixed set of attributes per file, so you can’t easily add custom metadata attributes. Object storage lacks the hierarchical directory structure of file storage, but it lets you attach custom metadata to each object. For instance, a clinical image stored as a file would carry only metadata such as creation date, owner, location, and size. Stored as an object, its metadata could also include patient details such as name, age, and diagnosis.
Ideally, metadata leverages both standard attributes and customized tags (by users or systems), which add context. For example, a metadata tag could identify a project, sensitive or PII data, demographics, location, or financial results such as quarterly sales.
Metadata Management Benefits for Unstructured Data Storage
Why invest in metadata management for data storage? Firstly, metadata brings structure to unstructured data, which is critical for search, data mobility, management, and analytics. Below are some additional benefits of metadata management for data storage teams:
- Gain data visibility: Metadata supplies more information on your data, which allows storage teams to understand top data owners, top file types and sizes, and usage information such as last access date. These basic file characteristics are a great starting point to help guide decisions, such as where to store the data based on its business priority or to answer questions, such as, “Who are the top data owners in a department?” As you enrich metadata, authorized users can segment and search for data based on keywords so they can reuse it, delete it, or move it.
- Improve cost savings and decision-making for data storage: Since metadata improves overall visibility and understanding of your data, you can keep it in the right place at the right time. For instance, set a policy whereby once a research project has concluded, all files tagged with the project name and date are archived, preserving costly, top-tier storage for your latest, most active data.
- Improve compliance: By tagging regulated or audited data sets, such as PII, IP, or FDA data, you can search across the enterprise to ensure sensitive files are stored according to compliance rules. You can expand this to include internal corporate policies, such as how to handle former-employee or financial data or when to set files aside (confine them) for deletion.
- Improve search and workflows for AI/ML: Metadata management is becoming central to AI and machine learning initiatives, helping data owners and stakeholders find key data sets faster and move them to the right location for projects. With AI tools needing massive sets of the right kind of data for a project, the ability to automate this process will become increasingly vital to successful AI/ML outcomes.
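The archiving policy described in the cost-savings item above can be sketched in a few lines. This is a minimal, in-memory illustration rather than any product's policy engine: the `FileRecord` class, the `project` tag key, the tier names, and the 30-day grace period are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class FileRecord:
    path: str
    last_accessed: date
    tags: dict = field(default_factory=dict)  # custom key/value tags
    tier: str = "primary"


def apply_archive_policy(records, project, concluded_on, today, grace_days=30):
    """Mark files of a concluded project for the archive tier.

    A file qualifies when it carries the project tag, the grace period after
    the project's end has passed, and it has not been accessed since the
    project concluded.
    """
    if today < concluded_on + timedelta(days=grace_days):
        return []
    moved = []
    for record in records:
        if record.tags.get("project") != project:
            continue
        if record.last_accessed > concluded_on:
            continue  # still in use; leave it on primary storage
        record.tier = "archive"
        moved.append(record.path)
    return moved


# Usage with hypothetical paths and tags.
records = [
    FileRecord("/lab/run1.csv", date(2024, 1, 10), {"project": "trial-7"}),
    FileRecord("/lab/run2.csv", date(2024, 6, 20), {"project": "trial-7"}),
    FileRecord("/hr/roster.xlsx", date(2023, 12, 1), {"project": "hr-2023"}),
]
archived = apply_archive_policy(
    records, "trial-7", concluded_on=date(2024, 3, 1), today=date(2024, 7, 1)
)
print(archived)
```

Combining the tag with last-access time keeps a file that is still being read on primary storage even though its project has formally ended.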
Challenges for Managing Metadata on Unstructured Data
Metadata is massive because the volume and variety of unstructured data – files and objects – are massive and difficult to wrangle. Data is spread across on-premises and edge data centers and clouds and stored in potentially many different systems. To leverage metadata, you first need a process and tools for managing data.
Managing metadata requires both strategy and automation. Choosing the best path forward can be difficult when business needs are constantly changing and when the mix of data itself keeps shifting as organizations collect new types such as IoT data, surveillance data, geospatial data, and instrument data.
Managing metadata as it grows can also be problematic. Can you have too much? One risk is a decrease in file storage performance. Organizations must consider how to mitigate this; one large enterprise we know switched from tagging metadata at the file level to the directory level.
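One way to picture directory-level tagging is as tag inheritance: tags are stored once per directory, and each file's effective tags are computed from its ancestors when needed. The sketch below assumes a hypothetical in-memory tag store keyed by directory path; how the enterprise mentioned above actually implemented its approach is not known.

```python
from pathlib import PurePosixPath

# Hypothetical directory-level tag store: one entry per tagged directory.
DIRECTORY_TAGS = {
    "/projects/campaign-a": {"program": "campaign-a"},
    "/projects/campaign-a/legal": {"sensitivity": "restricted"},
}


def effective_tags(file_path, directory_tags):
    """Merge tags from every ancestor directory, nearest directory winning."""
    merged = {}
    # `parents` lists the nearest directory first, so reverse it to apply
    # the outermost directory first and let closer directories override.
    ancestors = list(PurePosixPath(file_path).parents)
    for ancestor in reversed(ancestors):
        merged.update(directory_tags.get(str(ancestor), {}))
    return merged


contract_tags = effective_tags(
    "/projects/campaign-a/legal/contract.pdf", DIRECTORY_TAGS
)
banner_tags = effective_tags("/projects/campaign-a/banner.png", DIRECTORY_TAGS)
print(contract_tags)
print(banner_tags)
```

Two stored entries cover every file beneath those directories, so the number of tags to store and track grows with the directory count rather than the file count.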
How to Optimize Metadata for Storage Insights and Savings
While you can benefit from the metadata that your storage systems automatically create, an optimal plan will include curated or refined metadata that adds additional information to your files. Here are some considerations:
- Develop a holistic metadata strategy, which includes rules and guidelines for using, searching for, and customizing metadata. This can ensure that metadata does not get out of control and that it is used appropriately. A strategy may include policies for security and privacy, such as separation of duty. For instance, in a highly regulated business, users can tag the files they have access to, but only certain IT users should be authorized to execute action on the data once tagged. Your strategy should spell out goals and desired outcomes for metadata management. It is a good idea to create a tagging taxonomy and/or metadata catalog so users know when to use what tags.
- Decide on directory-/folder-level tagging versus file-level tagging. The former is easier to manage, as it reduces the number of tags you must create, track, store, and manage. For instance, you can collect all files related to one program within an integrated marketing campaign into a directory and use a data management system to automatically tag it as such. However, review directory contents regularly to ensure that no errant files have landed in the directory and are now being inappropriately tagged.
- Enrich metadata with custom tagging: There are many use cases, from legal to research to marketing to product development, where it’s useful to add metadata tags to files. For example, a biotech company running one experiment in Munich and another in Palo Alto could create tags for each experiment so that later, a researcher wanting to run additional analysis could select the specific files she needs from the specific location. Metadata enrichment is easiest using unstructured data management software. Otherwise, you will need a database to store and track metadata tags and policies, and all tagging will be manual. This demands a significant number of staff hours, so consider whether you have the people to do it.
- Collaborate with data stakeholders: IT and storage managers don’t typically have insight into the data itself; their focus is on managing storage and file access. IT must rely on data scientists and data owners to tag data accurately. You will need a process for collaborative metadata tag management.
- Metadata management automation: It’s highly advisable to use automation where you can, given the volume and variety of metadata today. You can do this with your existing storage solutions, with data governance software such as master data management or data catalog software, and/or with unstructured data management solutions. There are caveats: Storage solutions have some metadata features, but these are limited to the files in that system; you’ll need to maintain and integrate multiple metadata processes and tools across all storage. Further, file storage systems typically offer little or no support for adding or editing custom metadata on files. Depending upon your goals and the diversity of your storage infrastructure, consider a unified solution that can look across all data and metadata to centralize your efforts.
- Use tools that combine queries and tagging: Metadata management tools should not overuse tags or make users generate tags for information already available in standard metadata. Doing so is cumbersome for users and leads to tag proliferation, tag conflicts, and scaling issues. Solutions should also provide the ability to build and save queries that combine both standard and extended metadata. This query-plus-tag approach automates efficiently, scales well, and minimizes manual effort for users.
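The query-plus-tag idea from the last item above can be sketched as a saved query that filters on standard metadata (file type, last access) and uses a tag only for what the file system cannot know (here, whether a file contains PII). The `Entry` class, `build_query` function, query name, and catalog contents are hypothetical and stand in for whatever index a real tool maintains.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class Entry:
    path: str
    size_bytes: int
    last_accessed: date
    tags: frozenset = field(default_factory=frozenset)


def build_query(extension=None, not_accessed_since=None, required_tags=()):
    """Return a reusable predicate over standard and extended metadata."""
    required = frozenset(required_tags)

    def matches(entry):
        # Standard metadata: file type and last access need no tags.
        if extension and not entry.path.lower().endswith(extension):
            return False
        if not_accessed_since and entry.last_accessed >= not_accessed_since:
            return False
        # Extended metadata: every required tag must be present.
        return required <= entry.tags

    return matches


# A saved query: CSV files tagged as PII and untouched since 1 January 2024.
SAVED_QUERIES = {
    "cold-pii-csv": build_query(".csv", date(2024, 1, 1), ["pii"]),
}

catalog = [
    Entry("/data/a.csv", 10, date(2023, 5, 1), frozenset({"pii"})),
    Entry("/data/b.csv", 20, date(2024, 5, 1), frozenset({"pii"})),
    Entry("/data/c.csv", 30, date(2023, 5, 1)),
    Entry("/data/d.txt", 40, date(2023, 5, 1), frozenset({"pii"})),
]
hits = [entry.path for entry in catalog if SAVED_QUERIES["cold-pii-csv"](entry)]
print(hits)
```

Nobody had to create a "csv" or "cold" tag for this query to work, which is the kind of tag proliferation the approach is meant to avoid.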
As unstructured data volumes grow, IT and storage managers need to control the chaos and the costs – and that encompasses the metadata. The optimal metadata management strategy includes close collaboration with business and security teams on data governance and analytics needs, tagging tools to enrich the metadata, and automation to analyze and track it. With some effort and the right investment, you can reap greater cost savings and long-term value from your mountains of unstructured data and metadata.