As the number of users of a predictive model grows, organizations often expect (frequently wrongly) that their machine-learning-powered systems will scale automatically to keep up. If the system fails to scale, processing demand can outpace its capacity. In an example from a LinkedIn article, a recommender system that cannot keep up fails to produce its list of products or services in time, so the customer never sees the recommendations at the point of purchase.
Although building a scalable system can be a serious challenge, shying away from it can become a bigger problem and result in lost customers or unrealized revenue. During scaling, technical problems such as workload issues, memory representation, framework restrictions, and the trade-off between resource use and performance can surface and stall production.
Scaling ML models can mean anything between training them from a “humble beginning” and deploying them for “world domination.” Review this Towards Data Science article to get a sense of the journey of training a machine learning model from zero to one million users.
IT industry literature indicates that too many machine learning (ML) systems intended to run at scale are dying within the walls of research labs. According to the author of a VentureBeat article, Gartner predicts that 80 percent of AI projects will die in the research lab. Here are some known challenges in building scalable ML systems:
- Lack of proper planning in designing the system
- Inflated and unrealistic goals and expectations
- Failure to operationalize ML models
- Technical issues faced during scaling: workloads, resource use vs performance, or memory handling
- Lack of consensus building between stakeholders
- Lack of focus among development team members
- Scalability best practices yet to be explored
Gain familiarity with the unexpected challenges of scaling in this ResearchGate publication.
In the article Contemporary Data Scientists: Working Machine Learning at Scale, author Jennifer Zaino noted that Gartner’s Magic Quadrant for Data Science and Machine Learning Platforms had predicted that the “Data Science and ML platform market will be in a state of flux over the next few years.” Two years later, the market is experiencing that foretold flux, with ML technologies and tools still waiting for approval and countless ML systems at scale failing in production. Zaino’s article focuses on the Anaconda Enterprise 5.2 platform and its open-source distribution, which caters to more than six million users building ML for Windows, Linux, and Mac OS X platforms. The AI and ML communities are increasingly opting for low-cost R&D platforms for building scalable ML applications.
Machine Learning at Scale
The author of a Codementor blog post explains that the growing internet population and rising average network speeds have contributed significantly to the sudden, explosive growth of data. This translates into a rapidly growing volume of data for training ML models. The author further cites an example from Facebook, where 25 percent of engineers working with training models end up training 600,000 models per month. This figure indicates the rising popularity of automation in data management platforms that claim to perform “real-world tasks with human-like (or in some cases even better) efficiency.”
This statistic alone is enough to show why ML systems must be designed to scale. Without planned scaling, even the best ML systems will fail to deliver results as user volume grows. According to the author of this post, the trickiest part of scaling is implementing learning algorithms within particular frameworks such as TensorFlow or PyTorch. During scaling, the memory representation of iteratively fed data is particularly challenging, and the trade-off between resource use and performance is an additional challenge.
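The memory concern around iterative data feeds can be made concrete with a small sketch. The idea is to stream the data in fixed-size mini-batches so that only one batch is held in memory at a time, however large the full dataset is. This is not taken from the Codementor post, nor does it use TensorFlow or PyTorch; it is a plain-Python illustration, the function names (stream_samples, mini_batches, sgd_step, train) are hypothetical, and the data is synthetic.

```python
import random
from typing import Iterable, Iterator, List, Tuple

Sample = Tuple[List[float], float]


def stream_samples(n: int, seed: int = 0) -> Iterator[Sample]:
    """Hypothetical data source: yields one (features, target) pair at a time.

    Assumption: the target follows a fixed linear rule so the result is easy
    to check. A real feed would read from disk, a queue, or a database.
    """
    rng = random.Random(seed)
    for _ in range(n):
        x = [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)]
        y = 3.0 * x[0] - 2.0 * x[1] + 0.5
        yield x, y


def mini_batches(samples: Iterable[Sample], batch_size: int) -> Iterator[List[Sample]]:
    """Group a stream into lists of at most batch_size samples."""
    batch: List[Sample] = []
    for sample in samples:
        batch.append(sample)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly smaller, batch
        yield batch


def sgd_step(weights: List[float], bias: float, batch: List[Sample], lr: float):
    """One gradient step of linear regression on a single mini-batch."""
    grad_w = [0.0] * len(weights)
    grad_b = 0.0
    for x, y in batch:
        err = sum(w * xi for w, xi in zip(weights, x)) + bias - y
        for i, xi in enumerate(x):
            grad_w[i] += err * xi
        grad_b += err
    n = len(batch)
    new_weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
    return new_weights, bias - lr * grad_b / n


def train(n_samples: int = 20000, batch_size: int = 32, lr: float = 0.1):
    """Fit the model while holding only one mini-batch in memory at a time."""
    weights, bias = [0.0, 0.0], 0.0
    for batch in mini_batches(stream_samples(n_samples), batch_size):
        weights, bias = sgd_step(weights, bias, batch, lr)
    return weights, bias


if __name__ == "__main__":
    w, b = train()
    print("weights:", w, "bias:", b)
```

Because both the data source and the batching are generators, peak memory depends on batch_size rather than on n_samples, which is the property that matters as data volume grows.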
Scalable Machine Learning (ML) Applications of the Future
For next-generation machine learning applications, the success rate of scalable ML systems will determine whether “machine learning at scale” remains sustainable as a technological concept. Future ML applications are likely to face heavier workloads, so scaling those workloads as the user base grows will be of paramount importance. The current emphasis is on finding better ways of scaling ML systems. Investments in ML R&D are expected to spike to $60 billion by 2021, and ML system implementations grew fourfold from 2017 to 2020. The future adoption rate of ML depends largely on how well systems scale. The current ML architecture is neither “elastic nor efficient at scale.” Read this Western Digital post to find out more about the use of GPUs for ML processing needs.
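To illustrate what scaling a workload with user growth involves, here is a rough capacity estimate based on Little's law (average requests in flight = arrival rate × average time per request). It is a back-of-the-envelope sketch, not a description of any particular platform: the function name replicas_needed, its parameters, and the 20 percent headroom default are all assumptions made for illustration.

```python
import math


def replicas_needed(
    requests_per_second: float,
    seconds_per_request: float,
    concurrency_per_replica: int,
    headroom: float = 0.2,
) -> int:
    """Estimate how many model-serving replicas a given load requires.

    Assumptions (hypothetical, for illustration only):
    - each replica handles concurrency_per_replica requests at once
    - headroom is extra capacity kept in reserve for traffic spikes
    """
    if requests_per_second <= 0:
        return 0
    # Little's law: average number of requests being processed at any moment.
    in_flight = requests_per_second * seconds_per_request
    return math.ceil(in_flight * (1.0 + headroom) / concurrency_per_replica)


if __name__ == "__main__":
    # Same model latency, growing traffic: the replica count must grow with it.
    for rps in (100, 1_000, 10_000):
        print(rps, "req/s ->", replicas_needed(rps, 0.05, 4), "replicas")
```

An elastic system would recompute an estimate like this continuously and add or remove replicas as traffic changes, while a fixed deployment sized for launch-day traffic would fall behind as the user base grows.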
Machine Learning (ML) at Scale Use Cases: Apache Spot for Cloudera, Computer Vision, and Big Data
- Cyber-Threat Detection at Scale
A number of years ago, the press release Open Source Innovation Accelerates Cloudera’s Machine Learning at Scale announced that Cloudera, an innovative machine learning platform, had made Apache Spot 1.0 available on its platform for “fast, easy, and more scalable cybersecurity machine learning.” Apache Spot is an open-source cybersecurity project designed to offer advanced analytics to all. The distinctive feature of this platform is its scalability, and it has so far provided cyber-threat detection at scale. The community-driven approach of Apache Spot enables organizations to collaborate and detect cyber attacks in a hyper-connected world with the help of ML-powered analytics technology.
- Computer Vision Recognition at Scale
Another prominent scalable ML application surfaced during a ScaledML podcast, where Reza Zadeh, adjunct professor at Stanford University and co-organizer of ScaledML, talked about the “real-world strategies for scaling ML.” The conversation naturally branched into hardware and software interfaces for ML, the growth of deep learning, and computer vision recognition.
- Use of Machine Learning (ML) for Big Data at Scale
The last use case comes from the world of big data. For illustration, Spark, scalable ML, and MLF can be used to demonstrate the application of machine learning to big data. It has been emphasized that most scalable ML models either live or die during the production phase; production testing is the litmus test for scalable ML.
Here is a presentation devoted to the problems that surface in machine learning models “when data size grows in terms of sample count, feature count, and model parameter count.” The presentation covers the topic in sufficient technical depth and can serve as a useful overview of scalable ML systems for further study.