Data Modeling has become easier, at least in certain instances. It has moved out of the nearly exclusive reach of Data Scientists and IT departments and taken up residence where it is needed most: within the business itself.
There are still a number of caveats involved with this trend, as end users may still need some basic statistical knowledge and initial configuration of data models may require the touch of Data Scientists or IT.
However, the shortage of Data Scientists has greatly contributed to an emerging Data Management landscape in which, as Gartner observed, “Predictive analytics vendors are trying to reach a broader audience than traditional statisticians and data scientists by adding more exploration and visualization capabilities for novices and business users” (Linden et al., 2014).
One of the most fundamental aspects of those capabilities is a form of self-service Data Modeling (largely augmented by Machine Learning and incorporating elements of Natural Language Processing) that automates the data modeling process, enabling users to focus on their data and the data’s significance.
The automation and reduced complexity of Data Modeling offered by a number of analytics vendors today is largely attributable to Machine Learning. Machine Learning algorithms learn patterns from existing data and apply them to new data, so that models built on the former can account for the latter. When this technology is applied to data models, users can effectively create new data models from previously existing ones to address specific business problems or use cases. Many analytics vendors use this technology to automate the data modeling process. BeyondCore, for example, can create data models that analyze all possible variables in a specific situation. RapidMiner uses Machine Learning to base future data models on the results of queries and analytics from earlier ones. In almost all cases (which are starting to include an increasing number of Data Discovery/Business Intelligence options) the data modeling process is done automatically.
There are several types of Machine Learning algorithms that can assist with automated data modeling and other aspects of Data Management in 2015, including:
- Deep Learning: Gartner defines Deep Learning as “an increasingly popular variant of neural networks, with more than the typical two processing layers. The objective of the additional layers is to have higher-level abstractions (that is, features), induced from data that aim at better classification and prediction accuracy” (Linden et al., 2014). Deep Learning is associated with Cognitive Computing and is well suited to Big Data sets.
- Ensemble Learning: Ensemble learning algorithms aggregate the outputs from a series of predictive analytics models to form a single output. One of the advantages of this approach is that it combines different types of models and helps to coalesce their outcomes; this method is employed by RapidMiner.
- Bootstrap Aggregating: Also known as bagging, this technique trains several copies of a model on random resamples of the training data (drawn with replacement) and averages their outputs, which improves the stability and accuracy of regression models and other model types.
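To make the last two items more concrete, the following is a minimal sketch of bootstrap aggregating applied to a one-variable linear regression, using only the Python standard library. It is not drawn from any vendor's product; the function names (`fit_line`, `bagged_fit`, `bagged_predict`) and the default parameters are assumptions chosen for illustration.

```python
import random
from statistics import mean


def fit_line(points):
    """Ordinary least squares fit of y = slope * x + intercept to (x, y) pairs."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    var_x = sum((x - x_bar) ** 2 for x in xs)
    if var_x == 0:
        # Degenerate resample (every x identical): fall back to a constant model.
        return 0.0, y_bar
    slope = sum((x - x_bar) * (y - y_bar) for x, y in points) / var_x
    return slope, y_bar - slope * x_bar


def bagged_fit(points, n_models=25, seed=0):
    """Fit one model per bootstrap resample (same size, drawn with replacement)."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        resample = [rng.choice(points) for _ in points]
        models.append(fit_line(resample))
    return models


def bagged_predict(models, x):
    """Ensemble output: the average of the individual model predictions."""
    return mean(slope * x + intercept for slope, intercept in models)
```

Calling `bagged_fit(points)` on a list of `(x, y)` pairs and then `bagged_predict(models, x)` yields a single aggregated prediction. With noisy data, averaging over resamples tends to reduce the variance of the individual fits, which is the improvement the bullet above describes.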
Finally, some analytics solutions can return query results via Natural Language Processing, or use this semantic-based technology to explain those results. In the coming years, users will increasingly be able to ask data-related questions via NLP as well.
Traditionally, data models for advanced analytics were built by Data Scientists, many of whom had to spend a good portion of their time not only creating and testing such models, but also monitoring and adjusting them throughout production to meet the needs of the business. Although these professionals will still be required to aid users in the initial calibration of many Data Discovery and analytics offerings, the self-service nature of Machine Learning algorithms will greatly reduce users' need for further assistance. Rather than modifying their initial models as users implement them every day and encounter new business requirements or use cases, Data Scientists can focus on more complex analytics applications, identify new ways for them to solve business issues, and work with analysts who need occasional assistance. Noted Data Scientist Dr. Kira Radinsky discussed the automated data modeling capabilities of SalesPredict’s marketing and sales product for analytics:
“Whether it’s for lead, maturity, or retention, we build a model which automatically gives you, for each potential deal, a score and then explain to you how it was derived and then give you recommendations on how to close the deal. And you know what the cool thing is? All of this is done automatically without a Data Scientist.”
Additionally, Machine Learning advancements in data modeling should help Data Scientists increase their productivity, which is vital at a time when there is still a shortage of these workers. Those same advancements should also help to incorporate some of the most rudimentary aspects of Data Science and model creation into the jobs of business users, which is another way in which the newfound ease of data modeling is helping to address the shortage of Data Scientists.
Another emerging trend in Data Modeling, and one that could see cross-industry prominence, is the reuse of model patterns within various domains. According to a DATAVERSITY webinar on the subject, the key to successfully reproducing data model patterns lies in the fact that ideally, “One third of a data model contains fields common to all business…one third contains fields common to the industry, and the…other third is specific to the organization.” According to this rule of thumb, organizations can more rapidly produce data infrastructure for different use cases by using previous models as a starting point, which should greatly decrease the time spent on customizing models for new solutions.
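One way to picture this rule of thumb is as three layers of fields that are merged to produce a concrete model. The sketch below is purely illustrative: the entity, the field names, the type labels, and the `compose_model` helper are all hypothetical and do not come from the webinar.

```python
def compose_model(*layers):
    """Merge field layers left to right; later layers override earlier ones."""
    model = {}
    for layer in layers:
        model.update(layer)
    return model


# Hypothetical layers for a "customer" entity (field name -> type label).
COMMON_FIELDS = {"customer_id": "int", "name": "str", "email": "str"}
INDUSTRY_FIELDS = {"loyalty_tier": "str", "preferred_store": "str"}
ORGANIZATION_FIELDS = {"internal_segment": "str", "email": "str or None"}

customer_model = compose_model(COMMON_FIELDS, INDUSTRY_FIELDS, ORGANIZATION_FIELDS)
```

Under this assumption, the first two layers can be reused unchanged for a new use case, and only the organization-specific layer needs to be written or adjusted.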
There are several different ways in which the burgeoning adoption of Cloud Computing is affecting Data Modeling, both directly and indirectly. Machine Learning is so prevalent in analytics applications today partly because the scalability and on-demand provisioning of Cloud resources can accommodate the massive quantities of data requiring automated data modeling; developments in parallel computing and in-memory technologies help facilitate a similar environment on premises. The current emphasis on Service-Oriented Architecture (SOA) and the many varieties of Business Intelligence and analytics delivered through the Cloud reinforce this fact.
2015 will witness more and more service providers and customers interested in Data-as-a-Service (DaaS). The proliferation of SOA offerings is partly responsible for the perceived value in on-demand data and applications fitting specific use cases and business needs. When DaaS is used, business analysts can treat the resultant data as merely another source to add to the integration capabilities of more recent Data Discovery offerings, leveraging the Machine Learning capabilities for data modeling to analyze it.
The reduced complexity of Data Modeling will also be furthered by developments in Platform-as-a-Service (PaaS) offerings pertaining to analytics. Vendors are beginning to issue PaaS offerings that combine several different aspects of analytics, including reusable data models and methods for testing, applications and their frameworks, and a number of different algorithms. Most importantly, perhaps, these platforms are also supported by third-party vendors that add their expertise and technologies in a collaborative effort that eases several aspects of Data Science, Data Modeling, and analytics, making them less time-consuming and more widely accessible.
The Data Modeling process is not yet completely automated, and at the rate at which technology advances and business requirements change, it is possible that it never will be. Still, there is no ignoring the advancements in Data Modeling that are making it much more accessible to Data Scientists and end users alike, particularly in the context of analytics. Machine Learning, which is the basis for many of these developments, and some of the more recent algorithms it incorporates will continue to have a direct effect on this aspect of Data Management throughout 2015. Its reverberations will be felt throughout the spheres of Data Science, analytics and Business Intelligence, and Cloud Computing.
Linden, A., Kart, L., Lakshmi, R., Schulte, W.R., Sallam, R.L., Chandler, N., Herschel, G. (2014). Predicts 2015: a step change in the industrialization of analytics. www.gartner.com