Click to learn more about author Mike Lamble.
In 2017, “Data Scientist” was LinkedIn’s fastest-growing job title; yet, in the same year, McKinsey reported that fewer than 10 percent of the Analytic Models that are developed actually make it to production, where they can deliver ROI. The bottleneck lies in what industry insiders call “the last mile,” the stage at which models are moved into production after they are built and trained. As we look ahead to the New Year, I predict 2019 will be the year the Data Science industry makes huge moves toward removing last mile challenges by making progress on five key fronts:
- IT will prioritize model productionization.
A Wall Street Data Scientist recently told me, “It took two months to build and train a model, but six months to deploy it.” While this is the norm, the deployment problem felt by Data Scientists has not been a priority for most IT departments. Production IT is a complex, fast-moving, high-volume, governed universe where Data Scientists neither make nor necessarily understand the rules. For models to become embedded into real-time workflows – e.g., credit decisions, pricing, offer management – Data Scientists’ delivery processes must meld with production IT processes in a manner that scales.
- “Model Deployment” and “Model Management” will become buzzwords, but the buzz will also lead to confusion.
Over the last few years, venture capitalists have invested bullishly in last mile solutions, but 2019 will be the year that vendors and VCs begin to see big returns.
However, for enterprises, this will lead to confusion as they struggle to compare footprints and offerings. When trying to make sense of vendors’ various capabilities and claims, enterprises should ask pointed questions to determine the best approach, including:
- How does it abstract models from data, applications, and infrastructure?
- Does it limit model development tool choices?
- What phases of the MDLC (model development life cycle) does it apply to?
- What monitoring tools come out-of-the-box?
- Is it a toolkit or a platform?
- How does it enable model re-use?
- How does it enable collaboration between Data Scientists?
- Is it Cloud native?
- Does it ensure reproducibility?
- Data Scientists will get help in the form of ModelOps support.
Most Data Science organizations are missing a unit: a support team dedicated to Data Scientists’ models, aka a “ModelOps” group. Absent a ModelOps team, the workload, which is part IT and part Data Science operations, defaults to Data Scientists. In 2019, enterprises will begin to fill this organizational gap with a new job title, the “Analytics Engineer,” whose role will be to glue the IT and Data Science pieces together. The ModelOps team will also bridge the Data Scientists’ model development life cycle with IT’s DevOps processes.
- Enterprises will begin to centralize model deployment and management.
It’s hard to imagine, but there was a time when Data Integration via a Data Lake or Data Warehouse wasn’t a given. 2019 will be the tipping point for enterprises’ recognition of the need for a shared services solution for model deployment and management. Making this work at scale (i.e., supporting many internal data science clients and hundreds of models, minimizing deployment time and compute costs, ensuring governance, meeting SLAs and compliance requirements) is a sophisticated technical and organizational challenge that needs to be tackled only once.
- Consensus will emerge on best practices for model deployment and management.
2019 will find a critical mass of vendors evangelizing similar design patterns that will enable enterprises to converge on views of best practices. For example, microservices-based architectures will be used to support diverse model development tools and frameworks and to future-proof against changes in application, data, and infrastructure. Solutions will focus on metadata logging and support for reproducibility and traceability; version control and CI/CD compliance; API management; collaboration and re-use of models; performance monitoring dashboards; API wrappers; and more. Bottom line? Vendors will be competing more on solution scope than design patterns.
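To make a couple of these patterns concrete, here is a minimal Python sketch of an API wrapper that logs metadata for traceability. It is an illustration only, not any vendor's design: `ModelWrapper`, `toy_credit_model`, the field names, and the approval rule are all hypothetical, and a real deployment would sit behind a service endpoint and write to durable storage rather than an in-memory list.

```python
import hashlib
import json
import time
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class ModelWrapper:
    """Hypothetical uniform wrapper around any model's predict function."""
    name: str
    version: str
    predict_fn: Callable[[Dict[str, Any]], Any]
    audit_log: List[Dict[str, Any]] = field(default_factory=list)

    def predict(self, features: Dict[str, Any]) -> Dict[str, Any]:
        # Hash the canonical JSON form of the input so identical inputs can
        # be matched later without storing the raw data in the log.
        payload = json.dumps(features, sort_keys=True).encode("utf-8")
        input_hash = hashlib.sha256(payload).hexdigest()
        output = self.predict_fn(features)
        record = {
            "model": self.name,
            "version": self.version,
            "input_hash": input_hash,
            "output": output,
            "timestamp": time.time(),
        }
        # Every call leaves a trace tying the output to a model version.
        self.audit_log.append(record)
        return record


def toy_credit_model(features: Dict[str, Any]) -> str:
    # Stand-in for a trained model; this rule is made up for illustration.
    return "approve" if features["income"] >= 3 * features["debt"] else "review"


wrapper = ModelWrapper("credit_decision", "1.0.0", toy_credit_model)
result = wrapper.predict({"income": 90000, "debt": 20000})
print(result["output"], result["version"])
```

Because the calling application only sees `predict` and the returned record, the model behind the wrapper could be swapped for one built in a different tool or framework without changing the caller, which is the abstraction the microservices pattern is after.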