What Are Data Products and Why Do They Matter?

Data products are specialized software tools and apps designed to support the use of data as a service. They may be as simple as a program that converts a dataset into a visualization, or as complex as a machine learning system based on large language models (LLMs), such as ChatGPT. What all data products have in common is that they achieve a specific goal through the application of data.

One potentially confusing aspect of the technology is the distinction between data products and “data as a product,” which merges data tools with strategies to meet the needs of specific data consumers, whether one person or an entire department or organization. In contrast, data products serve as the raw material that companies can combine in unique ways to implement strategies to achieve their short-term and long-term goals. They operate at the level of individuals, teams, departments, businesses, and entire industries.

What Is a Data Product?

AI and other burgeoning technologies allow organizations to glean insights from their data assets in ways that maximize the data’s value. Data products serve as the means by which companies convert data into the actions that improve their efficiency, competitiveness, and profitability. Former U.S. Chief Data Scientist DJ Patil coined the term “data jujitsu” in 2012, defining it as “the art of turning data into product.”

Through the clever application of data elements, data jujitsu allows otherwise intractable iterative data problems to be solved by using the “weight” of the problem against itself, just as jujitsu combatants attempt to turn their opponents’ weight to their own advantage. The standard approach of attacking a problem head-on with every available kind of technical expertise often complicates the problem and makes it more difficult to solve.

The goal of data products is to simplify problem-solving by addressing a simple question at the outset: Who wants or needs this product? To answer this question quickly, developers take shortcuts that may either make it into the finished version or be replaced by more complicated approaches later in the process. The key is to start simply to avoid getting bogged down at the beginning of the project.

Components of Data Products

Even the simplest data products are made up of a diverse list of elements that combine to support decisions and solve business problems. These are the eight key components of a data product:

  • Data sources must be reliable, accessible in real time or in batches, relevant to the problem being solved, and in compliance with data protection regulations such as GDPR and HIPAA, as well as with legal and ethical standards.
  • Data pipelines automate any required data conversions (ETL, for example), scale to accommodate growing datasets, include robust error-handling tools and Data Quality checks, and are modular to support configuration changes.
  • Data storage must meet performance requirements, scale horizontally and vertically without disruption, apply encryption and access controls, and be cost-effective while supporting structured, semi-structured, and unstructured data types.
  • Data models and algorithms provide accurate insights and predictions that have been validated using techniques such as cross-validation. They need to be easy for stakeholders to understand, computationally efficient, and easy to maintain.
  • The user interface should be intuitive enough to require minimal user training. It should make use of visualizations and facilitate users’ interaction with the data, including feedback mechanisms and multi-device support.
  • APIs and endpoints require secure authorization and authentication, limits on the number of API calls from each user or system, and sufficient developer documentation. They should support data formats such as JSON and XML to ensure compatibility.
  • Real-time monitoring and logging allow problems with data products to be identified and addressed quickly. Administrators are alerted to performance issues and errors, and audit trails help firms meet compliance requirements. Performance metrics to be monitored include latency, throughput, and error rates.
  • Documentation includes user manuals, technical specifications, API documentation, change logs, and compliance records.
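
To make the pipeline and monitoring items above more concrete, here is a minimal Python sketch of a pipeline step that applies Data Quality checks before transforming records, and that reports an error rate that could be monitored. The field names (customer_id, amount), the validation rules, and every function name are assumptions made for illustration; they are not taken from any particular product or library.

```python
import math
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PipelineResult:
    """Rows that passed the checks, plus rejected rows with the reason."""
    loaded: list = field(default_factory=list)
    rejected: list = field(default_factory=list)

    @property
    def error_rate(self) -> float:
        # Share of input rows that failed a quality check (0.0 for empty input).
        total = len(self.loaded) + len(self.rejected)
        return len(self.rejected) / total if total else 0.0


def check_quality(row: dict) -> Optional[str]:
    """Return a rejection reason, or None if the row passes (hypothetical rules)."""
    if not row.get("customer_id"):
        return "missing customer_id"
    try:
        amount = float(row.get("amount", ""))
    except (TypeError, ValueError):
        return "amount is not numeric"
    if not math.isfinite(amount) or amount < 0:
        return "amount out of range"
    return None


def transform(row: dict) -> dict:
    """Transform step: normalize types and convert the amount to integer cents."""
    return {
        "customer_id": str(row["customer_id"]).strip(),
        "amount_cents": round(float(row["amount"]) * 100),
    }


def run_pipeline(rows: list) -> PipelineResult:
    """Validate each row, transform the good ones, and keep the bad ones for review."""
    result = PipelineResult()
    for row in rows:
        reason = check_quality(row)
        if reason is None:
            result.loaded.append(transform(row))
        else:
            result.rejected.append((row, reason))
    return result
```

Keeping the rejected rows together with a reason, rather than silently dropping them, is one simple way to support both error handling and an audit trail. A real pipeline would also need logging, alerting, and checks specific to its own data.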

Examples of Data Products

The most popular example of a data product may be ChatGPT, the free AI-based tool that answers simple and complex questions in a conversational manner, takes follow-up questions, admits its mistakes, and challenges inaccuracies. ChatGPT qualifies as a data product because it depends on a very large text dataset, although the system is much more complex than typical data products.

However, in its current state, ChatGPT lacks one important aspect of data products: accuracy. The data product’s owner is responsible for ensuring both a positive user experience and a trustworthy resolution to the problem that the product was designed to help remedy. This requires best practices in product management, and consistent and reliable access to analyses that support business decisions.

These six categories of data products demonstrate the use of the technology in everyday products:

  • Recommendation engines offered by companies such as Amazon, Netflix, and TripAdvisor personalize their responses to enhance customer engagement and improve conversion rates.
  • Predictive analytics tools include those used by FICO, LinkedIn, and Zillow that identify trends in data and generate forecasts based on advanced data mining and modeling techniques.
  • Data APIs such as Google Maps, LinkedIn Profiles, and IO Weather facilitate the smooth flow of data between disparate systems. Common API styles and protocols are representational state transfer (REST), Simple Object Access Protocol (SOAP), XML-RPC, and JSON-RPC.
  • Real-time dashboards present data visually and update users’ screens automatically as new information becomes available. They’re used to monitor inventory, sales, and operational data in support of business decisions. Popular dashboard tools include Tableau, Microsoft Power BI, and Zoho Analytics.
  • Personal finance tools include Empower (formerly Personal Capital), Quicken, and You Need a Budget (YNAB), all of which attempt to bring more clarity and confidence to individuals’ financial planning.
  • Wearable health monitoring products such as Apple Watch, Fitbit, and Dexcom’s continuous glucose monitor go beyond tracking pulse rates, sleep patterns, and other health matters by sharing the information with healthcare providers.
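
The Data APIs item above names REST and JSON-RPC among the ways such APIs are exposed. As a rough illustration of the difference, the Python sketch below builds the same forecast request in both styles without sending anything over the network. The host api.example.com, the /forecasts and /rpc paths, and the getForecast procedure are hypothetical names chosen for the example; only the jsonrpc, method, params, and id members come from the JSON-RPC 2.0 specification.

```python
import json
from urllib.parse import urlencode

BASE_URL = "https://api.example.com"  # hypothetical host, for illustration only


def rest_request(city: str, days: int) -> dict:
    """REST style: the resource is named in the URL and the HTTP verb carries the intent."""
    query = urlencode({"city": city, "days": days})
    return {"method": "GET", "url": f"{BASE_URL}/forecasts?{query}", "body": None}


def json_rpc_request(city: str, days: int, request_id: int = 1) -> dict:
    """JSON-RPC 2.0 style: one endpoint, with the procedure named in the request body."""
    payload = {
        "jsonrpc": "2.0",
        "method": "getForecast",  # hypothetical procedure name
        "params": {"city": city, "days": days},
        "id": request_id,
    }
    return {"method": "POST", "url": f"{BASE_URL}/rpc", "body": json.dumps(payload)}
```

In the REST version the URL identifies what is being fetched, while in the JSON-RPC version the URL stays fixed and the body says which procedure to call. Either way, the consuming system receives structured data it can process without knowing how the provider stores it.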

Why Data Products Are Important

Data products benefit data consumers in several ways:

  • They gain insights faster by using pre-built products rather than having to start each project from scratch.
  • The integrity of the data is verified beforehand, so trust is built into the products.
  • Real-time situational awareness enhances the value of data analyses.
  • The ability to respond in real time supports faster informed decision-making.
  • Governance is facilitated by up-front guarantees of Data Quality and compliance.
  • The products make data easy to find and access from diverse systems.

Organizations see data products as the key to greater efficiency and profitability:

  • Data products help sharpen the company’s focus on positive outcomes.
  • They improve the agility of organizations and deliver value incrementally.
  • Reuse of data products maximizes the value of data with very little overhead.
  • Data Architectures are rendered future-proof by the adaptability of data products.
  • Fewer questions arise about the trustworthiness and integrity of the underlying data.
  • Business and IT departments communicate using the same language.

Perhaps the greatest benefit of data products to organizations is their ability to unlock the value of data by serving as the glue that bonds together physical systems, data modeling, and business processes and use cases. They replace the piecemeal approach that many companies take to their data operations while also decentralizing Data Management. This frees the underlying data to be applied on the fly in diverse situations and conditions, with minimal or no preprocessing. 

According to McKinsey, data products can allow new business use cases to be implemented as much as 90% faster and total cost of ownership to decline by 30%. They also reduce risk and the time and money spent on governance operations.

Realizing the benefits promised by data products requires adopting an agile approach to Data Management that starts small, releases quickly, iterates, and demonstrates the products’ value. Adding a few capabilities with each release boosts the product’s value incrementally, which spurs adoption and helps attract investment for new products and use cases. Once data products become integrated with your company’s everyday business processes, the tools will begin to sell themselves as their value becomes apparent to users and managers.
