Streamlining the Production of Artificial Intelligence


People often think the algorithms used for Machine Learning (ML) are the most important factor in developing a successful ML system. However, how Artificial Intelligence (AI) and Machine Learning systems are run in production (managing the data at all stages, across multiple models) has much more impact on the success of a model than the specific learning algorithm. In their book AI and Analytics in Production, Ted Dunning and Ellen Friedman describe how organizations can get their AI systems into production and delivering value.

They wrote:

“In addition to the platform and application orchestration technologies, you will need an architectural design that simplifies logistics, supports multiple models and multiple teams easily, and gives you agility to respond quickly as the world (and data) changes, as indeed it will.”

Database storage has traditionally been governed by specific processes that assured accessibility, security, and accuracy. However, increasing amounts of unstructured data and the increased use of data lakes have caused significant problems in Data Management. Building a large-scale data system that works, and getting that system to work in production, are very different things. The value of Big Data becomes real only after data-intensive applications are put into production.

Big Data Fabric

The new term Data Fabric is rather loosely defined at present and expresses a need rather than a specific concept. Big Data Fabric describes a system providing the seamless, real-time integration of data coming from multiple data silos and a variety of data sources. Many systems designated as Big Data Fabrics focus on Hadoop, although integrating the data with a non-Hadoop storage system is also an option.

Organizations have struggled in recent years to integrate all of their data into a single, scalable platform, typically for purposes of Big Data research. Data can easily become isolated in silos, where it grows stagnant and inaccessible, and legacy systems make it even more difficult to access. These issues typically result in lower productivity and efficiency. Silos are not the only things separating data, either. Modern technologies, such as the cloud and data centers, can inadvertently divide data into separate clusters.

The issues influencing Data Fabric have become more complex as new data sources have opened and diversified. The integration of data is a problem, because data coming from diverse operations has often been held in discrete silos. Organizations may need to combine data from data warehouses, data lakes, cloud storage, transactional stores, machine logs, application storage, unstructured data sources, and social media storage. Data lakes are proliferating as interest in cloud storage and the IoT grows.

In a recent DATAVERSITY® interview, Ted Dunning, the Chief Application Architect at MapR, said:

“You’re going to want a data system that connects from edge to center, and cloud to cloud, to all of your premises, and to small computers. When you hook data systems together like that, you’re forming what is called a ‘Data Fabric.’ Now, you want one that actually works. One that gives you all of these APIs both magically, automatically, and moves the data as necessary so that you feel like you’ve got one copy of it. The system may be replicating it, or caching it locally, or all kinds of tricks like that. That’s what the Data Fabric should do for you. It would make you feel that you’ve got one system that goes from sea to shining sea.”
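The idea of one logical copy of the data, with caching handled behind the scenes, can be illustrated with a small Python sketch. Everything here is hypothetical: the `DataFabric` class, the `edge` and `cloud` mount names, and the sample values are invented for illustration and do not represent MapR's implementation or any real product's API. Plain dictionaries stand in for remote stores, and the cache is never invalidated, which a real system would need to handle.

```python
class DataFabric:
    """Hypothetical facade: one logical namespace over several backing stores."""

    def __init__(self):
        self._stores = {}      # mount prefix -> dict standing in for a remote store
        self._cache = {}       # locally cached copies of values already fetched
        self.remote_reads = 0  # how many reads actually reached a backing store

    def mount(self, prefix, store):
        """Attach a backing store under a logical prefix such as 'edge' or 'cloud'."""
        self._stores[prefix] = store

    def read(self, path):
        """Resolve a logical path to its store; serve repeat reads from the cache."""
        if path in self._cache:
            return self._cache[path]
        prefix, _, key = path.partition("/")
        value = self._stores[prefix][key]  # raises KeyError for an unknown path
        self.remote_reads += 1
        self._cache[path] = value
        return value


fabric = DataFabric()
fabric.mount("edge", {"sensor-7": 71.3})
fabric.mount("cloud", {"customers": ["ann", "raj"]})

print(fabric.read("edge/sensor-7"))
print(fabric.read("cloud/customers"))
fabric.read("edge/sensor-7")  # the repeat read is served from the local cache
print(fabric.remote_reads)
```

The caller only ever sees logical paths; which store holds the data, and whether a read was served locally, stays hidden behind `read`.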

A Production Ready Culture

Generally speaking, software is “production ready” once it has been thoroughly tested and shown to comply with the client’s needs. Code undergoes performance and load testing to ensure it can handle a large number of users. After the code (or product) has been rigorously tested in all possible ways, it is production ready and can be used in an actual production environment, said Dunning.

Good Business Intelligence and quick decision-making require access to high-quality data as quickly as possible. Developing Artificial Intelligence requires significant testing and training, remarked Dunning. It is possible to process millions of transactions in real time, while simultaneously showing offers and advertisements to the appropriate customers. A properly trained AI can initiate real-time repair alerts, warning technicians before machinery breaks down.

The MapR Data Platform, said Dunning, offers excellent data access performance using an open approach that supports AI and Deep Learning workloads. The NVIDIA DGX is an AI supercomputer, designed for the novel demands of Deep Learning. The strengths of their platform include:

  • Speed: The platform can read at speeds of 18 GB/s and can write at speeds of 16 GB/s. This platform offers speeds 10 times faster than traditional GPU-based DL systems.
  • Future Proof Architecture: Clients can leverage this platform for additional DL workloads as GPU technologies evolve.
  • Flexibility: Clients can choose from several multi-tenant data access combinations.

Dunning commented that:

“Production ready Deep Learning provides some significant advantages. You can have an advanced system or a simple system, and the Data Management is often a comparable challenge. A lot of people break down the Data Management to just feature extraction, but it’s actually a whole lot more than that, because if you’re going to be putting these models into production, then you’re taking on a burden of building a reproducible system, something that you can build over and over again, and get pretty close to the same results. That is not handled well, at all, by the conventional sort of networks.”
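One way to picture the reproducibility burden Dunning describes is to record, alongside each trained model, exactly which data and which random seed produced it. The following is a minimal Python sketch under stated assumptions: the `fingerprint`, `train`, and `run` helpers are hypothetical names, and the "training" step is a toy subsample average standing in for a real model fit.

```python
import hashlib
import json
import random


def fingerprint(rows):
    """Return a SHA-256 digest of the training rows so the exact data can be identified later."""
    digest = hashlib.sha256()
    for row in rows:
        digest.update(json.dumps(row, sort_keys=True).encode("utf-8"))
    return digest.hexdigest()


def train(rows, seed):
    """Toy 'training': average a random half of the rows. Stands in for a real model fit."""
    rng = random.Random(seed)  # local generator, so no hidden global random state
    sample = rng.sample(rows, k=len(rows) // 2)
    return sum(r["value"] for r in sample) / len(sample)


def run(rows, seed):
    """Train and return the result together with what is needed to repeat the run."""
    return {
        "data_sha256": fingerprint(rows),
        "seed": seed,
        "model": train(rows, seed),
    }


rows = [{"id": i, "value": float(i)} for i in range(100)]
first = run(rows, seed=42)
second = run(rows, seed=42)
print(first == second)  # prints True: same data and seed give the same result
```

A real pipeline would also need to pin code and library versions, which this sketch leaves out.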

Preferred Habits in an AI Development Team

Many industry experts believe that fewer than 20 percent of all Hadoop-based systems are actually in production. Dunning said MapR has identified the common approaches used by many AI research teams and discussed how others can use them to bring AI and other analytics systems into production. The approaches used by these teams include:

  • The Embrace of Multi-tenancy: Strictly and securely insulating separate tenants, while still providing shared access to data.
  • Simplicity and Flexibility: If a design needs a lot of workarounds, it is a warning that the architecture and technology are in conflict and too complicated.
  • The Data Fabric: Data Fabric isn’t a purchased product, but a system of data built by the data owner that is assembled and integrated from multiple sources.
  • Use Kubernetes to Orchestrate Containers: Kubernetes is an open-source orchestration system for deploying and managing containers, and is considered the leading tool for this process.
  • Extend Applications to the Clouds and Edges: The platform should reach across various data centers on location or in the cloud and be able to interact with Kubernetes across these geo-distributed locations.
  • Include Streaming Architecture and Streaming Microservices: Streaming can provide the base of an overall architecture, offering significant advantages, such as the flexibility of microservices.
  • Develop a Production-ready Culture: Good social habits improve the chances of success in any project. An overall data-awareness within the culture keeps the team focused, and support for the freedom to explore, experiment, and innovate encourages progress.
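The streaming item in the list above can be illustrated with a small Python sketch in which two independent services read the same event stream at their own pace. This is only an illustration under stated assumptions: the in-memory `Stream` class is a hypothetical stand-in for a real streaming system, and the service names, the fraud threshold, and the sample amounts are invented.

```python
from collections import defaultdict


class Stream:
    """Hypothetical in-memory, append-only message log (a stand-in for a real stream)."""

    def __init__(self):
        self._messages = []
        self._offsets = defaultdict(int)  # next unread position for each consumer

    def publish(self, message):
        self._messages.append(message)

    def poll(self, consumer):
        """Return the messages this consumer has not yet seen and advance its offset."""
        start = self._offsets[consumer]
        self._offsets[consumer] = len(self._messages)
        return self._messages[start:]


def fraud_scorer(messages):
    """One microservice: flag large transactions (the threshold is made up)."""
    return [m["id"] for m in messages if m["amount"] > 1000]


def revenue_totaler(messages):
    """Another microservice: total the same events for reporting."""
    return sum(m["amount"] for m in messages)


stream = Stream()
for i, amount in enumerate([20, 5000, 300, 1500]):
    stream.publish({"id": i, "amount": amount})

flagged = fraud_scorer(stream.poll("fraud"))
total = revenue_totaler(stream.poll("revenue"))
print(flagged, total)  # prints [1, 3] 6820
```

Because each consumer tracks its own offset, a new service can be added later and read the same events without changing the producer or the existing services.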

Recognizing the importance of promoting innovative behavior, using the right Machine Learning tools, and maximizing distribution will give any organization a boost in efficiency. Promoting these production behaviors will help AI teams become more successful and get into production on large-scale systems. While algorithms are important, streamlined and efficient AI and Machine Learning production systems have much more impact. Dunning closed by saying that:

“The fact of life is that a year from now, you’ll be using some different Machine Learning tools than you are right now. If you’re on the ball, you’re probably using more than one now. You may not be pushing more than one kind in production, but you should be evaluating four or five, to make sure that they don’t do a whole lot better on your problems and even for problems that are pretty well-defined, like machine translation.”
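Evaluating several candidate tools side by side, as Dunning recommends, mostly means scoring each one on the same held-out data. The Python sketch below assumes two toy candidates (`fit_constant` and `fit_line`, both hypothetical) and synthetic data generated from y = 2x + 1. A real evaluation would swap in actual ML tools and real data, but the comparison loop would look similar.

```python
from statistics import mean


def fit_constant(train):
    """Baseline candidate: always predict the average target seen in training."""
    avg = mean(y for _, y in train)
    return lambda x: avg


def fit_line(train):
    """Second candidate: least-squares line through the training points."""
    xs = [x for x, _ in train]
    ys = [y for _, y in train]
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in train)
             / sum((x - x_bar) ** 2 for x in xs))
    return lambda x: y_bar + slope * (x - x_bar)


def mean_abs_error(model, holdout):
    """Average absolute prediction error on data the model was not trained on."""
    return mean(abs(model(x) - y) for x, y in holdout)


# Synthetic data: the first 15 points train the models, the last 5 evaluate them.
data = [(x, 2 * x + 1) for x in range(20)]
train, holdout = data[:15], data[15:]

candidates = {"constant": fit_constant, "line": fit_line}
scores = {name: mean_abs_error(fit(train), holdout)
          for name, fit in candidates.items()}
best = min(scores, key=scores.get)
print(best, scores)
```

Keeping every candidate behind the same fit-then-score interface makes it cheap to add a fourth or fifth tool to the comparison.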

