DJ Patil coined a concise definition of a data product: “a product that facilitates an end goal through the use of data.” This includes not only pure datasets, both raw and refined, but also products based heavily on algorithms, models, and similar data-intensive workloads. Leaders in data are leaders in business, and treating data as a product is a proven route to successful outcomes; build trust in data to build business value.
But do we really manage our data as a product? What does that mean for customers? Is a data product really that different from a traditional product? And what can we learn from the products of other, more established industries and apply to our fledgling data product space?
A More Established Product: The Car
The car may be the most expensive consumer product we encounter, and at over 100 years old its market is a mature one. That makes it a good benchmark for data products, as it should be well thought out by now.
Let’s go on a drive to see how we interact with this mature product, and compare that with our data product needs:
A product is designed with a set of usage patterns in mind. Taking a Land Rover on a sand dune is fine; a Lamborghini, maybe not. Each was designed for a certain class of usage, purpose, type of user, level of experience – in short, effectiveness to do a job.
The same applies to data. Certain data products might be too complex for basic users; conversely, everyone can work with CSV files, but they won’t drive your autonomous vehicle. We now offer more types of data experiences to specific types of users, to help them be more effective: call an API service and get an answer instead of copying and crunching the data yourself; be alerted when data events occur; receive a recommended course of action. Can your data product be more effective for your users’ needs?
Opinion matters, and affects what we decide to purchase or use. It is commonplace to check who else uses a product and how they rate it. We look to star ratings based on objective criteria for reassurance and trust.
Data should also be reviewed, ranked, and compared. Consumers have an idea if the product is working for them, and can help others either during the selection process or later when in use. Data product reviews can be useful input for product owners too.
Often with purchases you don’t know much about support until later. For major purchases, like a car, you may receive support in the buying process too. Which model, version, and options are right for me?
Once we’re committed, the true customer support experience becomes more evident, especially when we have a question or issue: How easy is it to make contact? Through which channels? Are responses quick? Is my issue being tracked? Am I kept updated? Is there a satisfaction survey? How do I escalate? Support for data products should be managed in much the same way as for other products.
When renting a car, the agreement specifies the usage: the car cannot be used as a taxi or taken off tarmac roads, and it may be limited to a certain distance or jurisdiction. The consumer is legally tied to specific usage of the product. Where it matters, usage may be monitored with trackers and black boxes, often to manage risk more effectively.
We can now also rent data. In established verticals with mature data vendors, stipulations exist to purge “rented” data at the end of the agreement. When you give your data to a third party, both legislation and legal terms specify the allowed usage of that data. Data usage should also be monitored, both to confirm that the agreed-upon usage is not being violated and to understand and control risks.
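As a minimal sketch of how such terms could be checked in software, the example below models an agreement with a set of permitted purposes and an expiry date after which the data must be purged. The names `UsageAgreement` and `check_usage`, the purpose labels, and the dates are all hypothetical and chosen only for illustration; real agreements carry far richer terms.

```python
from dataclasses import dataclass
from datetime import date
from typing import FrozenSet, List


@dataclass(frozen=True)
class UsageAgreement:
    """Hypothetical terms attached to a rented data product."""
    product_id: str
    allowed_purposes: FrozenSet[str]
    expires_on: date  # rented data must be purged after this date


def check_usage(agreement: UsageAgreement, purpose: str, on: date) -> List[str]:
    """Return a list of violations; an empty list means the usage is allowed."""
    violations = []
    if purpose not in agreement.allowed_purposes:
        violations.append(f"purpose '{purpose}' is not permitted")
    if on > agreement.expires_on:
        violations.append("agreement expired; data should have been purged")
    return violations


# Illustrative agreement: internal reporting only, until the end of 2024.
agreement = UsageAgreement(
    product_id="vendor-prices",
    allowed_purposes=frozenset({"internal-reporting"}),
    expires_on=date(2024, 12, 31),
)

for purpose, when in [("internal-reporting", date(2024, 6, 1)),
                      ("resale", date(2025, 1, 15))]:
    print(purpose, when, check_usage(agreement, purpose, when))
```

Logging each access with its purpose and running a check like this over the log is one simple way to turn contractual terms into something that can be monitored.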
Product Safety and Security
The safety and security of vehicles have improved, and much of that improvement can be attributed to legislation and to standards for objective measurement.
Safety and security apply to data products too: a self-driving car’s algorithm, personal data on a social media site, smart motorway traffic systems. Unlike cars, however, data products can easily be copied, manipulated, leaked, and misinterpreted. The risks and standards are different, yes, but are you measuring these data risks objectively?
Traceability in automotive manufacturing is expected. Safety and reliability are often drivers. Risks are sizeable and need to be managed.
The same holds for data products: financial regulators ask for data traceability, and legislation exists. When a report looks incorrect, we look at the data and processes behind it to identify where the data came from, where it went, and what happened to it. We might also issue a data product recall, which is hard to do because, unlike a physical asset, data can be copied, manipulated, and sent on again. One might easily argue that tracing data is harder than tracing physical parts.
Cars, being expensive physical assets that need to be traced, have multiple identification mechanisms used for different purposes: one for manufacturers (VIN), another for governments or insurance companies (number plate). They can be immutable or reassignable.
Identification is a basic function needed to perform others (e.g., traceability). Does your data product have a unique identifier? A product version identifier? What about the code that created the data product — can you identify that accurately? If not, read more about data-as-code.
When you’ve identified your product, then you need to be able to register it, find it, track it, and manage this asset.
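The identification questions above can be made concrete with a small sketch. It assumes a product is identified by four things: a stable identifier (playing the role of the VIN), a version, the revision of the code that built it, and a fingerprint of the data itself. The class `DataProductId`, the helpers `identify` and `register`, and the in-memory `registry` are hypothetical names used for illustration only, not an established scheme.

```python
import hashlib
import uuid
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class DataProductId:
    product_id: str     # stable identifier, analogous to a VIN
    version: str        # product version identifier
    code_revision: str  # revision of the code that created the data
    content_hash: str   # SHA-256 fingerprint of the data itself


def identify(name: str, version: str, code_revision: str, data: bytes) -> DataProductId:
    # uuid5 is deterministic: the same name always yields the same identifier.
    product_id = str(uuid.uuid5(uuid.NAMESPACE_URL, name))
    content_hash = hashlib.sha256(data).hexdigest()
    return DataProductId(product_id, version, code_revision, content_hash)


# A toy registry keyed by (product_id, version) so a product can be found later.
registry: Dict[Tuple[str, str], DataProductId] = {}


def register(ident: DataProductId) -> None:
    registry[(ident.product_id, ident.version)] = ident


v1 = identify("sales/daily-orders", "1.0.0", "abc123", b"id,amount\n1,10\n")
v2 = identify("sales/daily-orders", "1.1.0", "def456", b"id,amount\n1,10\n2,20\n")
register(v1)
register(v2)
print(v1.product_id == v2.product_id)      # same product across versions
print(v1.content_hash == v2.content_hash)  # different data, different fingerprint
```

Keeping the product identifier stable while the version, code revision, and content hash change mirrors the car analogy: some identifiers stay with the asset for life, while others are reassigned as it changes.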
Would you buy a car without knowing its specifications? Does it fit in my garage? How far can my EV go without charging? Product specifications are objective measurements and configurations of a (class of) product, available via product catalogs and updated as versions change.
The same goes for data. The data catalog is the key point to identify, compare, evaluate, and trace data products. Do you have product specifications in your data catalog, and what functions are on offer?
Inventories are used for multiple purposes, including financial or supply chain management. It’s important for a car manufacturer to know what products are sitting outside of the factory and for a reseller or a consumer to know what stock is where.
The inventory of data products is no less important, assuming you believe that data is an asset. Managing data as a product means you need to have an inventory to understand what your stock levels are: What data do you have? Need but don’t have? Need to remove? Can monetize? For this you must know the type of products and their specification, hence catalog and inventory are symbiotic.
Cars are registered and can be referenced by many parties for different purposes. Product registries provide a safe place where the question of legal ownership can be resolved. They can also reduce fraud and legal disagreements.
Data products, especially those sold on a commercial basis, need the same. Ownership, rights to use, rights to distribute, and even contractual details can either be held in the registry or linked from it. For a DaaS business selling data products, or for a buyer of data products, a registry should be available; this could be combined with the inventory or catalog.
With cars, quality and reliability are inextricably linked, as their primary capability is to provide transport. Cars also have safety ratings that are seen as quality marks.
But what about data? Automated pipelines of data delivery need reliable inputs too, for instance, knowing that an API is always available, or a file will be delivered correctly at a certain time, or that the values in a column have no blanks. From the consumer’s point of view, they want to know that the product has been quality-checked so they can trust it.
This check should be executed before it leaves the factory and also on a regular basis during its lifetime (safety checks, servicing for cars). Both cars and data can be modified and suffer wear and tear!
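One of the checks mentioned above, that the values in a column have no blanks, can be sketched in a few lines. This is an illustrative example using only the Python standard library; the function names `blank_cells` and `passes_quality_check` and the sample data are assumptions, and a real pipeline would cover many more checks, such as delivery time and API availability.

```python
import csv
import io
from typing import Iterable, List


def blank_cells(csv_text: str, column: str) -> List[int]:
    """Return the 1-based data row numbers where `column` is blank or missing."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        row_number
        for row_number, row in enumerate(reader, start=1)
        if not (row.get(column) or "").strip()
    ]


def passes_quality_check(csv_text: str, required_columns: Iterable[str]) -> bool:
    """True only if every required column is fully populated."""
    return all(not blank_cells(csv_text, column) for column in required_columns)


good = "id,amount\n1,10\n2,20\n"
bad = "id,amount\n1,10\n2,\n"

print(passes_quality_check(good, ["id", "amount"]))
print(blank_cells(bad, "amount"))
```

The same function can run once before the product "leaves the factory" and again on a schedule during its lifetime, much like a pre-delivery inspection followed by regular servicing.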
Cars come with long instruction manuals, as they are complex products. The documentation is available in many forms (on-screen, on-paper, searchable, indexed, etc.).
Our data products also need documentation to explain what terms mean, how to use the product properly, and its data scope, size, history, performance expectations, etc. Next time you create a dataset for consumption, ask yourself whether the documentation is sufficient. Better still, ask your consumers. Get it right and the number of support inquiries should fall.
Product Consumption (as-a-Service)
Lastly, our products are consumed in different ways. Cars aren’t always bought; more and more, we lease them or hire them for a day, an hour (Zipcar), or just a journey (Uber). Over time this has become more about a Transport-as-a-Service capability and less about the product’s features.
Gartner labeled this trend “XaaS” (Everything-as-a-Service) several years back, and data products are no different. Data-as-a-Service (DaaS) is commonplace, as are data marketplaces. We are also seeing the rise of Platforms-as-a-Service that provide capabilities to create and manage data products.
We can learn from experiences in other industries to make our data products better and hence provide greater trust and value to our customers. This age of greater data product maturity is upon us – just remember that the data industry is still a young one.