The shift in how businesses perceive data has catapulted Data Management to new heights. Data Science is now a core component of Data Management, yet the two are often seen as separate activities. Working among data analysts, data engineers, and DBAs, data scientists spend their time getting the data infrastructure right for data analysis and competitive intelligence. In the growing next-generation data market, however, Data Management and analytics will be the core differentiators for market success, so Data Management and Data Science must work together.
A Forbes post refers to an Everest Group study that states the global Data Management and analytics market will reach $135 billion by 2025. Over the years, vendors in this market have moved from a function orientation to a process orientation, and then to a platform orientation. In a platform orientation, data is no longer viewed as a byproduct of business processes but as the nerve center of the business.
Data Management vs. Data Science: The Fundamental Difference
The Data Management function of an organization is in overall control of enterprise data acquisition, storage, quality, governance, and integrity, and thus oversees the development and implementation of all data-related policies within that organization. However, the Data Management team only manages the data assets; it does not usually get involved in the core technical applications of the data. The Data Management function owns all the data. In the webinar Data Management vs Data Strategy, Peter Aiken talked about “prioritizing organizational Data Management needs versus Data Strategy needs.”
On the other hand, the Data Science function in an organization conceives, develops, implements, and practices all “technical applications” of the data assets. Here, “technical applications” means the science, technology, craft, and business practices involving the enterprise data.
The Data Science team never owns any data; it simply collects, stores, processes, and analyzes the data, then reports data-driven outcomes to the rest of the organization for business gains. The data scientist is considered an expert on Data Science and associated technologies, who relies on highly specialized knowledge (statistics, computer science, AI, and so on) to advise the enterprise on data-driven practices.
In practice, the Data Science function sits under the Data Management function in the organization. The Data Science team brings a set of core technical skills to the organization to implement best practices, as set out in Data Management policies, procedures, and guidelines.
Data Management Practices vs. Data Science Practices
With data rising exponentially in volume and complexity, Data Management has become one of the most important aspects of business functioning. Data Management practices involve setting up data-related policies, procedures, roles, responsibilities, and stringent access-control mechanisms.
A well-structured Data Management strategy, which focuses on Data Governance for maximizing business value, is now a central theme of discussion among business leaders and operators. The Data Management team in an enterprise conceives and develops all the policies.
The data professionals in the different parts of an organization are responsible for implementing and following all policies and guidelines in their daily data-related work. Data Governance has been identified as a core component of Data Management, as explained in Data Management vs. Data Governance: Improving Organizational Data Strategy.
In the Data Science world, the strategic policies, procedures, and guidelines play a major role in the implementation of data technology projects, although none of the management roles are directly present at this stage. In other words, the organizational data strategists conclude their work by shaping the policies, procedures, and guidelines for managing data; it is then the duty of data scientists and other data professionals to adhere to those policies and guidelines so that the organizational Data Strategy blueprint remains intact.
Data Management strategists also define possible violations and penalties, so that controls can be used to oversee the implementation of the enterprise Data Strategy.
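As an illustration of how such controls might look in practice, here is a minimal Python sketch in which a policy written by the Data Management team is checked against requests from data professionals, and violations are recorded for later review. The class names, roles, dataset name, and actions are all hypothetical and chosen only for this example; real enterprise access control is typically handled by dedicated governance or database tooling.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class AccessPolicy:
    """A hypothetical policy authored by the Data Management team."""
    dataset: str
    allowed_actions: Dict[str, Set[str]]  # role -> actions that role may perform


@dataclass
class PolicyEnforcer:
    """Checks requests against policies and keeps a log of violations."""
    policies: Dict[str, AccessPolicy]
    violations: List[Tuple[str, str, str]] = field(default_factory=list)

    def check(self, role: str, dataset: str, action: str) -> bool:
        policy = self.policies.get(dataset)
        permitted = (
            policy is not None
            and action in policy.allowed_actions.get(role, set())
        )
        if not permitted:
            # Record the attempt so that governance staff can review it.
            self.violations.append((role, dataset, action))
        return permitted


# Illustrative policy: data scientists may read, data engineers may read and write.
customer_policy = AccessPolicy(
    dataset="customers",
    allowed_actions={
        "data_scientist": {"read"},
        "data_engineer": {"read", "write"},
    },
)
enforcer = PolicyEnforcer(policies={"customers": customer_policy})

can_read = enforcer.check("data_scientist", "customers", "read")
can_write = enforcer.check("data_scientist", "customers", "write")
```

In this sketch the policy is plain data, separate from the code that enforces it, which mirrors the split described above: strategists define the rules, and the people and systems working with the data are checked against them.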
What the Data Scientist Should Know about Data Management
Towards Data Science states that several recent technology movements have required data scientists to rethink Data Management practices for advanced analytics. These technology movements are:
- Reduced cost and rising capacity of data storage
- Rise of IoT devices with streaming data
- The reinvention of data lakes to store and analyze multi-type data
- Big data analytics
- Use of machine learning models
With these movements taking center stage in modern businesses, the data scientist now faces the challenge of building the right governance-enabled data infrastructure to conduct advanced analytics and extract value-added insights.
Augmented Data Management: Relieving the Data Scientist
In a typical augmented Data Management system, five core Data Science activities, namely data integration, Data Quality, Master Data Management (MDM), Metadata Management, and Database Management Systems (DBMS), are fully or partially automated through tools.
The data scientist is relieved of the “drudgery of data preparation” through the use of advanced AI, ML, or analytics tools. Typically, about 80 percent of a data scientist’s time is spent on preparing data for analytics; these tools take over much of that time-consuming work, leaving ample time for complex analytics work, which may include model development or data interpretation. Augmented Data Management featured as one of Gartner’s Top 10 Data Analytics Trends for 2020.
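To make the idea of automated data preparation more concrete, here is a small Python sketch of one kind of Data Quality check that such tools automate: counting missing values in required fields and flagging duplicate keys. It is a simple rule-based illustration, not a representation of how any augmented Data Management product works, since those rely on AI and ML. The function name, field names, and sample records are assumptions made for the example.

```python
from collections import Counter


def profile_records(records, required_fields, key_field):
    """Return a simple data-quality report for a list of dict records."""
    missing = Counter()
    for record in records:
        for name in required_fields:
            value = record.get(name)
            # Treat None and blank strings as missing values.
            if value is None or (isinstance(value, str) and not value.strip()):
                missing[name] += 1

    # A key that appears more than once points to duplicate records.
    key_counts = Counter(record.get(key_field) for record in records)
    duplicate_keys = sorted(
        key for key, count in key_counts.items()
        if key is not None and count > 1
    )

    return {
        "row_count": len(records),
        "missing_by_field": dict(missing),
        "duplicate_keys": duplicate_keys,
    }


# Hypothetical sample data with one blank email, one missing country,
# and a repeated id.
records = [
    {"id": 1, "email": "a@example.com", "country": "DE"},
    {"id": 2, "email": "", "country": "US"},
    {"id": 2, "email": "b@example.com", "country": None},
]
report = profile_records(
    records, required_fields=["email", "country"], key_field="id"
)
```

A report like this could be produced automatically whenever new data arrives, so the data scientist reviews a summary of problems instead of hunting for them by hand.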
The Role of Data Regulations in Data Management and Data Science
The emergence of data regulations such as the General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA) has added a new dimension to existing Data Management practices that overlap with Data Science. The new regulations offer better governance mechanisms, especially in the areas of data privacy, data security, and ethics, but they complicate AI-powered Data Science platforms. Data managers now have to think not only about implementing strict controls for data privacy, security, and ethics, but also about the impact of advanced technologies (AI, ML) on Data Governance.
In the new world of regulation-centric Data Governance, Data Management and Data Science practices will remain parallel activities but will intersect at several points.
The net result of such collisions? Vendors and service providers will merge, acquire, and integrate.
From a strictly technical standpoint, Gartner has laid down the following observable shifts in enterprise Data Management and Data Science practices:
- Learning by doing
- Business information architecture
- Thinking of a data hub for enhanced Data Governance
- Whether to centralize or decentralize, and the new CDO role, whether it’s Chief Data Officer or Chief Digital Officer
How Do Data Management and Data Science Align?
In an ideal business scenario, Data Management and Data Science practices align to get the best results. So, how can the two practices align?
- Through mutual agreements on preserving Data Governance guidelines
- Through better understanding of how and where Data Management and Data Science overlap
- Through having a well-structured Data Science framework in place, so that junior data scientists can get the job done
According to a discussion on Quora, Data Management focuses on well-governed data collection and data access, while Data Science focuses on deriving strategic business decisions from data analysis. The absence of Data Management creates the risk of “Data Science delivering bad analytics due to poor quality or inaccessible data.”