By Kim Kaluba.
The world has entered a new era in which the convergence of data and analytics is more important than ever. To best understand the COVID-19 virus and to anticipate what happens next, a strong Data Strategy is essential. The data must be fit for purpose and trusted by the decisioning community, so confident decisions can be made and executed.
It Starts with the Data
Data is a foundational element of any analytical, reporting, and decisioning function, and this is especially true during a crisis (like the COVID-19 pandemic). We are all familiar with the various data dashboards showing the current state of the pandemic. The data feeding these dashboards comes from reporting agencies around the globe. These reports are being used to make judgments that affect the daily lives of citizens and the businesses operating in each country.
A data professional reviewing any data reports needs to question the accuracy and reliability of the data feeding the dashboards: What type of vetting has the data gone through to ensure that the data is accurate, consistent, complete, and free of duplicates? Do we have all the data, or are we only getting some of the information? What type of Data Governance practices are in place to ensure that the results being reported can be trusted for decision-making? What is the level of transparency for the type of data being used?
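The vetting questions above can be turned into simple automated checks. The following is a minimal sketch, assuming a hypothetical list of daily case reports; the field names (`region`, `date`, `active`, `deaths`) and the sample values are invented for illustration and do not come from any real dashboard feed.

```python
from collections import Counter

# Hypothetical daily case reports; field names and values are invented for illustration.
reports = [
    {"region": "A", "date": "2020-04-01", "active": 120, "deaths": 3},
    {"region": "A", "date": "2020-04-01", "active": 120, "deaths": 3},   # repeated report
    {"region": "B", "date": "2020-04-01", "active": None, "deaths": 1},  # missing value
    {"region": "C", "date": "2020-04-01", "active": 40, "deaths": -2},   # impossible count
]

REQUIRED = ("region", "date", "active", "deaths")


def quality_report(rows):
    """Count basic duplication, completeness, and validity problems."""
    # Rows that share a region and date are treated as duplicates of one another.
    keys = Counter((r.get("region"), r.get("date")) for r in rows)
    duplicates = sum(n - 1 for n in keys.values() if n > 1)
    # A row is incomplete if any required field is absent or None.
    incomplete = sum(1 for r in rows if any(r.get(f) is None for f in REQUIRED))
    # A row is invalid if a count field holds a negative number.
    invalid = sum(
        1 for r in rows
        if any(isinstance(r.get(f), int) and r[f] < 0 for f in ("active", "deaths"))
    )
    return {"rows": len(rows), "duplicates": duplicates,
            "incomplete": incomplete, "invalid": invalid}


print(quality_report(reports))
```

Publishing counts like these alongside a dashboard is one way to give readers a sense of how far the underlying data can be trusted.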
A Data Strategy is the best mechanism to ensure that the data being used will produce accurate analytics and reports, so the correct decisions can be made. The goal of a Data Strategy is to make sure that the data resources are positioned so they can be used, shared, and moved easily and efficiently while ensuring accurate, transparent, and reliable data for analytics and decisioning. The following foundational Data Strategy components can help build trust in a dashboard, whether for a crisis or not.
It’s imperative to identify data and understand its meaning, regardless of structure, origin, or location. Establishing consistent data element naming and value conventions is core to using and sharing data. These details should be independent of how the data is stored (e.g., a database or file) or the physical system where it resides. This component will help identify data that is missing.
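As an illustration of consistent naming and value conventions, the sketch below maps records from two hypothetical sources onto one shared set of field names and country values. The alias tables and sample records are assumptions made up for this example; a real convention would be agreed on by the data's owners and users.

```python
# Hypothetical mapping from source-specific field names to one shared convention.
FIELD_ALIASES = {
    "country_region": "country",
    "Country": "country",
    "confirmed_cases": "cases",
    "Cases": "cases",
}

# Hypothetical mapping of value variants to one canonical value.
COUNTRY_VALUES = {
    "us": "United States",
    "usa": "United States",
    "united states": "United States",
}


def standardize(record):
    """Rename fields and normalize country values to the shared convention."""
    out = {}
    for name, value in record.items():
        # Unknown field names pass through unchanged.
        out[FIELD_ALIASES.get(name, name)] = value
    country = out.get("country")
    if isinstance(country, str):
        cleaned = country.strip()
        # Unknown country values are kept, with surrounding whitespace removed.
        out["country"] = COUNTRY_VALUES.get(cleaned.lower(), cleaned)
    return out


source_a = {"country_region": "US", "confirmed_cases": 10}
source_b = {"Country": " usa ", "Cases": 7}
print(standardize(source_a))
print(standardize(source_b))
```

Because the convention lives in the mapping tables rather than in any one database or file format, it stays independent of where the data is stored.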
Reviewing some of the available COVID-19 dashboards reveals a few elements missing from a complete and accurate account of the virus. Yes, the dashboards show the number of active cases and the survival and mortality rates by country, province, state, etc., but there is often little information on the statistics within each category. For example, can we identify why one person dies from the virus and another does not? Can we tell whether social distancing or home or state lockdowns are working?
Data professionals should consider what other data is needed to answer additional questions. For example, should we combine the data that is being reported with historical non-identifiable health data (e.g., age, gender, country, and city) for the patients who died or recovered to determine what the common elements are for mortality vs. survival? Should we overlay additional data like pollution levels, distance to medical facilities, type of medical care, cell phone location data, demographic elements, and other third-party data to understand whether those factors play a role?
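Overlaying an additional data source usually comes down to joining it to the reported figures on a shared key. The sketch below is one possible shape for that step, assuming aggregated, non-identifiable outcome counts per region and a hypothetical third-party pollution index; every name and number here is invented for illustration.

```python
# Hypothetical outcome counts by region (aggregated and non-identifiable).
outcomes = {
    "North": {"recovered": 90, "died": 10},
    "South": {"recovered": 45, "died": 5},
    "East": {"recovered": 70, "died": 30},
}

# Hypothetical third-party overlay: a pollution index per region.
pollution_index = {"North": 35, "South": 40, "East": 80}


def overlay(outcomes, extra, extra_name):
    """Join the overlay onto each region and compute the mortality share."""
    rows = []
    for region, counts in outcomes.items():
        total = counts["recovered"] + counts["died"]
        rows.append({
            "region": region,
            "mortality_share": counts["died"] / total if total else None,
            extra_name: extra.get(region),  # None marks a region missing from the overlay
        })
    return rows


for row in overlay(outcomes, pollution_index, "pollution_index"):
    print(row)
```

A table like this only lines the factors up side by side. Whether a factor actually plays a role is a question for proper statistical analysis on real data.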
By being able to identify the data in place and reconcile the missing data, we are on our way to understanding what is required to support the best decisions and enable answering deeper and more complete questions.
Storing and Provisioning
The next step is ensuring that the data needed is readily available for analytics and decisioning. Data will be shared with numerous other systems, so it is critical to address storage efficiently, in a way that simplifies access. The goal of storing the data is to ensure that it is available and provisioned in a way that allows shareability and reduces the need to copy data. In the case of the current pandemic, to be useful to other organizations, the data needs to be stored in a way that is accessible to the larger community and can be shared and consumed broadly.
Process and Governance
Data needs to be accurately managed to ensure transparency and trust in the analytics process. Accurate management confirms that the data has been vetted and adjusted appropriately to meet the needs of the data users/community and that it adheres to governance policies. The role governance plays within an overall Data Strategy is to ensure that data is managed consistently and that it provides the transparency and trust needed for decisions. Governance ensures that once data is decoupled from the application that created it, the rules and details of the data are known and respected by all the data constituents.
For such dashboards, a few process and governance considerations are as follows: Is there documentation that outlines the various reporting elements and their meanings? If data is extrapolated, how is it being done? Are we using the entire data set or a sample? Is the data being processed, enhanced, or standardized, and if so, how?
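One way to make these considerations checkable is to keep the documentation in a machine-readable data dictionary. The sketch below assumes a hypothetical dictionary format with `meaning`, `derived`, and `method` entries; it is not a standard schema, and the element names are invented for illustration.

```python
# Hypothetical data dictionary: each reported element documents its meaning
# and, if it is derived, how it was produced.
DATA_DICTIONARY = {
    "active": {"meaning": "Confirmed cases not yet resolved", "derived": False},
    "deaths": {"meaning": "Cumulative confirmed deaths", "derived": False},
    "projected_cases": {
        "meaning": "Seven-day projection",
        "derived": True,
        "method": "Extrapolated from a sample of reporting sites",
    },
}


def undocumented_fields(record, dictionary):
    """Return fields present in a record but absent from the dictionary."""
    return sorted(set(record) - set(dictionary))


def derived_without_method(dictionary):
    """Return derived elements whose derivation method is not documented."""
    return sorted(
        name for name, meta in dictionary.items()
        if meta.get("derived") and not meta.get("method")
    )


record = {"active": 120, "deaths": 3, "recovered": 50}
print(undocumented_fields(record, DATA_DICTIONARY))
print(derived_without_method(DATA_DICTIONARY))
```

Here the `recovered` field is flagged because it appears in the record without a documented meaning, which is the kind of gap a governance review would want to close.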
In summary, a Data Strategy ensures that data is properly identified, stored, provisioned, processed, and governed for analytics and decisioning needs. With the right data, in the right place, at the right time, more accurate and more timely judgments can be made to answer business problems or, in today’s environment, to address and respond to the spread of a pandemic.