A successful data strategy generally requires certain key components. Most businesses have invested in some form of Data Management, but all too often different sections of a business aren’t well coordinated. A data strategy can be described as a dynamic process that governs how data is collected, organized, analyzed, and delivered in support of business goals and activities.
The coordination of different project teams and business departments is a necessity for optimizing the flow of data and maximizing its value.
When problems with the data come up, a data strategy provides a process for identifying them so that staff can find solutions. Essential data strategy components provide a system that delivers the best possible solutions to an organization. A data strategy can be considered a roadmap for identifying both current and future Data Management issues.
A data strategy can be used to proactively support goals that promote the business’s growth.
Developing a data strategy begins with identifying the business’s goals. This is followed by identifying the problems that exist within the data environment – evaluating each team or department and selecting achievable goals that will make the data more accessible and sharing easier. Stephen Yu, president of Willow Data Strategy, emphasizes focusing on the future:
“Take a phased approach. The initial steps must be about eliminating pain points, but one should not lose sight of the long-term business goals. This is important as each phase may call for different types of talents and expertise.”
Warning: Developing a data strategy is not a once-and-we’re-done effort. Some of the selected goals will take a while to complete, and others may start at a later date. (It’s common to schedule the completion of goals, which may include the completion of milestones as subgoals.) Additionally, business goals change, which means altering the data strategy to support those changes. Reviewing and adjusting the data strategy to maintain its efficiency should be done on a regular, scheduled basis.
Scheduled data strategy reviews provide the potential to resolve problems before they become serious. The steps include examining the following data strategy components:
- Data integration
- Labeling data
- Data storage
- Data security
- Sharing and selling data
- Data Governance
Gathering and combining data from different sources normally involves transforming the data into a unified, consistent format (typically a relational schema that can be queried with SQL). NoSQL databases, data lakes, and data lakehouses, by design, do not require a unified, consistent format.
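As a rough illustration of what "transforming the data into a unified, consistent format" can mean in practice, here is a minimal Python sketch. The two sources, their field names, and the target schema are all hypothetical and are not taken from any of the tools discussed in this article.

```python
from datetime import date, datetime

# Hypothetical rows from two sources that use different field names and date formats.
crm_rows = [{"CustomerID": "17", "FullName": "Ada Lopez", "Joined": "03/15/2023"}]
shop_rows = [{"id": 42, "name": "ben ortiz", "signup_date": "2023-04-02"}]


def from_crm(row: dict) -> dict:
    """Map a CRM row onto the (assumed) unified schema."""
    return {
        "customer_id": int(row["CustomerID"]),
        "name": row["FullName"].strip().title(),
        "joined": datetime.strptime(row["Joined"], "%m/%d/%Y").date(),
        "source": "crm",
    }


def from_shop(row: dict) -> dict:
    """Map a web shop row onto the same unified schema."""
    return {
        "customer_id": int(row["id"]),
        "name": row["name"].strip().title(),
        "joined": date.fromisoformat(row["signup_date"]),
        "source": "shop",
    }


# Every record now has the same keys and the same types, whatever its origin.
unified = [from_crm(r) for r in crm_rows] + [from_shop(r) for r in shop_rows]
```

Integration tools automate this kind of mapping at scale, but the underlying idea is the same: one small adapter per source, one shared target format.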
The solution for integrating data taken from multiple sources and using different formats is basically a matter of finding the right software. Fortunately, there are a large number of tools available. Some of the more popular ones are:
- Informatica PowerCenter: A user-friendly ETL tool that provides extended interfacing capabilities. This tool has useful GUI capabilities and supports drag-and-drop technology from one end of a data pipeline to the other.
- Fivetran: Although it is more expensive than other options, it comes with a variety of features the others lack. It offers seamless connectivity and regular security updates, and it replicates applications, databases, events, and files.
- Jitsu: An open-source platform for data integration. It can gather data from external sources using an API and supports several native connectors. Additionally, Jitsu can act as a bridge to Singer, an open-source framework for API connectors.
Using and sharing data within an organization requires establishing ways to identify and communicate the data’s contents. A name or label is needed to locate, process, and update data. Generally, this involves documents and files, and these need to be named. If no name is given, the computer normally defaults to “untitled” document or “untitled” file.
Establishing a consistent system for naming files and documents will make the data easier to find, particularly if there are multiple data users.
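One way to keep names consistent is to generate them rather than type them by hand. The sketch below assumes a hypothetical convention of department, short description, creation date, and version number; the convention itself and the `build_filename` helper are illustrative only, and each organization would choose its own.

```python
import re
from datetime import date


def build_filename(department: str, description: str, created: date,
                   version: int, ext: str) -> str:
    """Build a name of the form <department>_<description>_<YYYY-MM-DD>_v<NN>.<ext>."""

    def slug(text: str) -> str:
        # Lowercase the text and collapse anything that is not a letter or digit into a hyphen.
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    extension = ext.lstrip(".").lower()
    return f"{slug(department)}_{slug(description)}_{created.isoformat()}_v{version:02d}.{extension}"
```

For example, `build_filename("Finance", "Q3 Revenue Report", date(2024, 10, 1), 2, ".CSV")` produces `finance_q3-revenue-report_2024-10-01_v02.csv`. Using ISO dates means the files sort chronologically within each department and description.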
It is also important to use metadata as a referencing resource. Metadata is essentially small amounts of data used to locate and describe “packages of data.” Libraries provide a good analogy. When a library patron wants to find a book, they type in the title, and the computer shows the important information – a brief description and where the item is located, the publishing date, and the author’s name.
Metadata provides very similar useful information (or should). While each organization has control of how they organize the metadata applied to their packages of data, generally included are the title, the date created, a brief description, and perhaps the author’s name. Metadata is also applied to unstructured data, for example, web pages, video, images, and audio.
The use of metadata offers a way of identifying and managing unstructured data.
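To make the idea concrete, the following Python sketch models a metadata record with the fields mentioned above (title, date created, brief description, author) plus two hypothetical extras, `media_type` and `keywords`. The record layout and the `search` helper are assumptions for illustration, not a standard.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class MetadataRecord:
    title: str
    created: date
    description: str
    author: str = ""
    media_type: str = "document"  # e.g. "video", "image", "audio", "web page"
    keywords: List[str] = field(default_factory=list)


def search(records: List[MetadataRecord], term: str) -> List[MetadataRecord]:
    """Return records whose title, description, or keywords mention the term."""
    term = term.lower()
    return [
        r for r in records
        if term in r.title.lower()
        or term in r.description.lower()
        or any(term == k.lower() for k in r.keywords)
    ]


records = [
    MetadataRecord("Q3 revenue report", date(2024, 10, 1),
                   "Quarterly revenue by region", author="Finance team"),
    MetadataRecord("Warehouse walkthrough", date(2024, 5, 20),
                   "Recorded tour for new hires", media_type="video",
                   keywords=["onboarding", "training"]),
]
```

Here `search(records, "onboarding")` finds the video even though nothing inside the video file itself is searchable, which is how metadata makes unstructured data manageable.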
Metadata can be created and assigned through the use of automation, or it can be created manually. Interestingly, efficiency increases with manually created metadata labels. This is because automated metadata tends to be elementary, displaying only the most basic information, while manually created metadata can supply more useful information.
Metadata supports a successful data strategy.
Data storage is one of the most important data strategy components for organizations that work with data.
However, while most organizations have a data storage system, that doesn’t necessarily mean it is efficient. Generally speaking, smaller businesses don’t give data storage a lot of consideration, and as they grow they often don’t develop a plan for managing their data until need forces them to. This philosophy of waiting until the last minute should be reconsidered. A smooth transition to a new data storage system requires planning, as opposed to desperately scrambling to install a new storage system that may or may not be the best option.
Many growing businesses have discovered that, as their storage needs have grown, storing all their data in a single location simply isn’t realistic (the size of the organization, its data distribution patterns, and the diversity of data sources often make loading data into one storage system impractical).
Having data stored in multiple locations has become normal behavior; however, it is important to provide ways for staff to find and access it.
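A lightweight way to help staff find data spread across several systems is a shared catalog that records where each dataset lives. The sketch below is a minimal in-memory version; the dataset names, systems, locations, and owners are all hypothetical, and a real catalog would typically live in a dedicated tool or database.

```python
# Hypothetical catalog: dataset name -> where it lives and who owns it.
CATALOG = {
    "customer_orders": {"system": "warehouse", "location": "sales.orders", "owner": "sales-ops"},
    "web_clickstream": {"system": "data_lake", "location": "raw/clickstream/", "owner": "marketing"},
    "support_tickets": {"system": "saas_export", "location": "exports/tickets.csv", "owner": "support"},
}


def locate(name: str) -> str:
    """Describe where a dataset is stored, or raise KeyError if it is not cataloged."""
    entry = CATALOG.get(name)
    if entry is None:
        raise KeyError(f"No catalog entry for dataset '{name}'")
    return f"{entry['system']}:{entry['location']} (owner: {entry['owner']})"


def datasets_in(system: str) -> list:
    """List the cataloged datasets held in one storage system, in alphabetical order."""
    return sorted(name for name, entry in CATALOG.items() if entry["system"] == system)
```

A staff member who only knows the dataset name can call `locate("customer_orders")` and learn both where to look and whom to ask.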
Data security, while perhaps tedious, is also responsible behavior, which your customers will not only appreciate but demand. Should they discover an organization has leaked their personal information, they will stop doing business with that organization at the very least.
Implementing and maintaining a data security program is key to developing a successful data strategy.
There are essentially two philosophies regarding data security and access to data stored within an organization. One supports the idea that everyone within the organization should have access to all the data. The other dictates that only those with a need to know can access data that is pertinent to their job. Both are a little extreme, with the first making the customers’ personal data accessible to potential criminals and the second blocking the flow of work because staff can’t access needed data.
A middle-road philosophy is more practical, with the customer’s personal data being restricted to two or three trusted managers, while the rest of the data is available to the staff that work with data. (Access to data projects would generally be restricted to the project team, but that depends on the project and the organization.)
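The middle-road policy described above can be expressed as a small access check. This is only a sketch under stated assumptions: the three sensitivity levels, the `User` and `Dataset` fields, and the `can_access` function are hypothetical names chosen for illustration, and a production system would rely on the access controls of its storage platform.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, Optional


class Sensitivity(Enum):
    GENERAL = "general"    # ordinary business data
    PROJECT = "project"    # data belonging to a specific data project
    PERSONAL = "personal"  # customers' personal data


@dataclass(frozen=True)
class User:
    name: str
    is_data_staff: bool = False
    is_trusted_manager: bool = False
    projects: FrozenSet[str] = frozenset()


@dataclass(frozen=True)
class Dataset:
    name: str
    sensitivity: Sensitivity
    project: Optional[str] = None


def can_access(user: User, dataset: Dataset) -> bool:
    """Apply the middle-road policy: tight on personal data, open on the rest."""
    if dataset.sensitivity is Sensitivity.PERSONAL:
        return user.is_trusted_manager
    if dataset.sensitivity is Sensitivity.PROJECT:
        return dataset.project in user.projects
    return user.is_data_staff


analyst = User("analyst", is_data_staff=True, projects=frozenset({"churn-model"}))
manager = User("manager", is_data_staff=True, is_trusted_manager=True)
customers = Dataset("customer_profiles", Sensitivity.PERSONAL)
churn = Dataset("churn_features", Sensitivity.PROJECT, project="churn-model")
sales = Dataset("monthly_sales", Sensitivity.GENERAL)
```

With these definitions the analyst can read `sales` and `churn` but not `customers`, while the manager can read `customers` but not the project data of a team they do not belong to.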
An efficient, streamlined business will ensure there is an easy-to-use process for storing all the data generated by the business, while giving trustworthy staff easy access to the data they use.
Sharing and Selling Data
Data can be sold or shared. Shared data can be information publicly available on a website for customers, research data shared by academic institutions, or data shared by businesses in the same industry for developing business information. Most shared data falls into two basic categories: data gathered from outside sources (cloud applications, academic institutions, third-party data, etc.) and internally generated data (customer information, purchase details, etc.).
If a business is selling data, that data needs to be prepared and packaged before the sale. If data is going to be sold, or shared, the process should be included in the data strategy.
The question to ask is whether selling the data is worth both the effort and the potential loss of customers concerned about their privacy.
Many individuals are opposed to the sale of personal information for ethical reasons (such as the potential for manipulation or acquisition by people with criminal intent). This has resulted in consumer protection laws in Europe (GDPR), California (CCPA), and Brazil (LGPD). Sadly, U.S. federal lawmakers have provided no significant protections for U.S. citizens.
A fully functional Data Governance system is complex. It is a combination of software and rules for staff when dealing with data. A Data Governance program dictates the policies and procedures that are used to gather, organize, and manage accurate data.
It is used to improve data analytics, which in turn promotes better decision-making and more efficient management. Data Governance also addresses compliance with GDPR and other regulations (such as those that apply to employee data, financial records, and other legal issues).
If an organization does not comply with the various data regulations, it risks monetary fines and legal action.
The majority of businesses begin using Data Governance to deal with specific problems or improvements (data accuracy, data regulations, improved efficiencies). However, as use and awareness of the Data Governance program grow, staff and management typically start exploring its other features. As this develops, management realizes that establishing policies, rules, and behaviors can promote the use of more accurate, higher-quality data.
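Part of a governance program's rules can be written down as executable checks, so that inaccurate records are caught when they enter the system. The sketch below is one hypothetical way to do this: the field names, the allowed country codes, and the `violations` helper are assumptions for illustration, not features of any particular Data Governance product.

```python
import re

# Hypothetical data quality rules: field name -> check that a valid value must pass.
RULES = {
    "email": lambda v: isinstance(v, str)
    and re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", v) is not None,
    "country": lambda v: v in {"US", "BR", "DE"},
    "order_total": lambda v: isinstance(v, (int, float)) and v >= 0,
}


def violations(record: dict) -> list:
    """Return the names of fields that are missing or that break a rule."""
    return [
        name for name, rule in RULES.items()
        if name not in record or not rule(record[name])
    ]
```

A record that passes every rule yields an empty list, and any other result tells staff exactly which fields need attention before the data is used for analytics.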
Emphasizing Business Success
It is important to focus on business goals. Remember that implementing new technology is a way of achieving your business goals. Shiny and new means nothing if it doesn’t help in achieving the business’s goals and increasing profits.
Listing the business goals should be the first step in developing a successful data strategy. This list should not be restricted or limited, but should reflect the ideal, an image of the business you want to build.