Addressing Edge Computing Needs with Advanced Data Storage

The number of companies using edge computing is rapidly growing as real-time insights are becoming more important to business success.

Edge computing “uses a network of microdata servers to process and store Big Data locally, taking the concept of a distributed architecture to the next level.” It is meant to help decrease latency problems, allow businesses more flexibility with storage and analytics needs, expand IoT effectiveness, and “screen incoming information, processing useful data on the spot, and sending it directly to the user.”

According to Gartner, “edge computing is a solution that facilitates data processing at or near the source of data generation.” Sensors or devices embedded in cars, tractors, or factory machinery, for example, collect data at the “edge” as part of a decentralized extension of a computer network.

Rather than going to a centralized cloud, this collected data is sent to a location near where it’s created, where it can be used for analytics. The process of sending that data to a CPU or GPU is energy intensive, and companies are concerned about the energy and hardware required to get those insights. In a 2018 research study entitled Challenges with Deploying Edge Infrastructure, 73 percent of respondents said that reducing server and storage power consumption was a priority, and 81 percent said they were working toward minimizing server and storage footprints.

The Bottleneck

Scott Shadley, Vice President of Marketing at NGD Systems, said that, historically, efforts to address the growing need for power have focused on making memory or compute faster:

“You started getting multi-core processors. You got into the Xeon/AMD battle with Intel and AMD, you get in-memory computing where people are adding flash to DIMM so that you can have all this done.”

Storage was an afterthought, even though that is where the data resides. Data volumes have become so vast that the ability to move data has become a challenge. No matter what architecture is used and how much money is spent, “You still have to move it,” Shadley said.

Storage media was formerly the slowest component of storage infrastructure, but with the advent of flash, it has become the fastest. George Crump, Chief Steward at Storage Switzerland, said:

“The interface between the flash controller and each NAND chip, known as the common flash memory interface (CFI), has significantly more bandwidth than the PCIe interface used by NVMe flash drives. While networks will get faster, they will still always be one of the major roadblocks for organizations looking to optimize NVMe performance.”

In a recent interview with DATAVERSITY®, Nader Salessi, the Founder and CEO of NGD Systems, said that bottlenecks are a fundamental pain point. “It’s everybody at 5pm trying to get into a two-lane freeway at the same time.” Respondents in the Dimensional Research study said that compute-storage bottlenecks were among the top three barriers to using AI, machine learning, and real-time analytics, and 54 percent of respondents reported that bottlenecks occur at 10 terabytes or less.

Organizations deploying IoT are looking to 5G as a way to move more information, but it doesn’t really solve the problem, Shadley said. “It’s higher bandwidth, but we’ve already saturated that bandwidth with trying to move all the stuff (Data) around.”

NVMe storage emerged because SATA and SCSI had many legacy constraints related to spinning media that the SSDs using those same interfaces couldn’t overcome. Early PCIe drives could be placed in a server or a platform, and provided speed, “But then you had to augment it with a whole bunch of other storage,” he said. In the last two years, NVMe use has expanded throughout the system. In response to the resulting bottlenecks, NVMe over fabrics and other solutions emerged, “But none of that ever solves the problem that the data itself was not being addressed more effectively.”

Computational Storage

Rather than beefing up the bandwidth, computational storage technology addresses the bottleneck issue from a different perspective. “Instead of moving the data, we moved that concept of isolated compute into the storage device,” said Salessi. Storage devices can now do analysis and computation inside the drive, in parallel with each other, and because data is no longer being moved, the process is a lot more energy efficient.

Shadley added that these new technologies have reached a price point that people are able to pay, especially as the amount of data being generated creates a real need for them.

Computational storage puts compute capabilities directly on storage to enable data to be processed in place where it resides, solving the issue of flooding a server’s PCIe bus. By bringing intelligence to the storage drives themselves, the system’s overall throughput is increased.

In addition to facilitating faster processing, this also enables the host CPU to be better utilized and to scale across a larger number of workloads, thus cutting costs for the enterprise, according to Krista Macomber, Senior Analyst, in a post on Storage Switzerland. The key benefits of computational storage are related to the gravity of data: sorting, analytics, and searching can be performed faster within a reduced power envelope, and with reduced demand for server dynamic random-access memory (DRAM) and network bandwidth resources.

How it Works

To illustrate how computational storage changes a simple search, Shadley described telling an application to find all the pictures of cats residing in a centralized processing system:

“That one CPU is going through and trawling the entire storage footprint. So, you’re loading memory up with information from storage, searching it, getting the nugget you want, storing it off, flushing your memory, and doing this repeat process. And it just takes forever.”

In contrast, computational storage takes that application intact as the user wrote it and puts an instance of it inside the quad-core processor, where it is executed in place on every drive. The host is still responding to a call to pull information from storage, “But the drives simply say, ‘Here’s the pictures you really want. Everything else I’ve got on me, you don’t need today.’” Searching the device locally provides a faster, more efficient response. Drives that don’t contain a matching picture don’t return any data, and the host sees the output or results at the application level, he said:

“It’s very easy to implement, and that’s one of the biggest benefits you get from it. So if I want to do a cat picture search, or to run a Hadoop instance, or do compression — all those can be restarted, stopped, or rerun — whatever you want to over the course of the life of that product.”
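The contrast between the two search paths described above can be sketched as a toy simulation. This is a minimal illustration only, not NGD Systems' implementation or any real drive API: the `Drive` and `Picture` classes, the `run_in_place` method, and the byte sizes are all hypothetical names and values chosen for the example. It counts how many bytes would cross from the drives to the host in each case.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Picture:
    name: str
    label: str        # hypothetical tag, e.g. "cat" or "dog"
    size_bytes: int   # made-up sizes, for counting data movement only


@dataclass
class Drive:
    """Toy stand-in for one storage drive (not a real device API)."""
    pictures: List[Picture]

    def read_all(self) -> List[Picture]:
        # Conventional path: every record is shipped to the host.
        return list(self.pictures)

    def run_in_place(self, predicate: Callable[[Picture], bool]) -> List[Picture]:
        # Computational-storage path: the filter runs "on the drive",
        # so only matching records are returned to the host.
        return [p for p in self.pictures if predicate(p)]


def is_cat(p: Picture) -> bool:
    return p.label == "cat"


def host_side_search(drives: List[Drive]) -> Tuple[List[Picture], int]:
    """One host CPU pulls everything, then filters. Returns (hits, bytes moved)."""
    hits: List[Picture] = []
    moved = 0
    for drive in drives:
        data = drive.read_all()
        moved += sum(p.size_bytes for p in data)
        hits.extend(p for p in data if is_cat(p))
    return hits, moved


def in_drive_search(drives: List[Drive]) -> Tuple[List[Picture], int]:
    """Each drive filters its own data in parallel. Returns (hits, bytes moved)."""
    with ThreadPoolExecutor() as pool:
        per_drive = list(pool.map(lambda d: d.run_in_place(is_cat), drives))
    hits = [p for result in per_drive for p in result]
    moved = sum(p.size_bytes for p in hits)
    return hits, moved


drives = [
    Drive([Picture("a.jpg", "cat", 100), Picture("b.jpg", "dog", 200)]),
    Drive([Picture("c.jpg", "dog", 300)]),  # no cats: returns nothing
    Drive([Picture("d.jpg", "cat", 400), Picture("e.jpg", "bird", 500)]),
]

host_hits, host_moved = host_side_search(drives)
drive_hits, drive_moved = in_drive_search(drives)
print(f"host-side search moved {host_moved} bytes")   # 1500
print(f"in-drive search moved {drive_moved} bytes")   # 500
```

Both paths return the same two pictures, but in this toy data set the in-drive path moves a third of the bytes, and the drive with no cat pictures sends nothing at all. Real savings would depend on the workload and on how selective the search is.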

Shadley considers NVMe the interface of the future for SSDs:

“If we’re going to put a product in the market, it’s got to be on the interface that makes sense. We took an innovative and patented view and said we’re going to pass information over that NVMe bus over TCP packets or TCP/IP connections and allow that transfer of information.”

Typically, improved performance comes at the cost of increased power consumption. With computational storage, as queries per second improve, energy consumption decreases because there’s no need to move the data. “With computational storage query per second performance has improved by 600 percent and energy consumption has been reduced by 300 percent,” said Salessi, referring to a specific published customer example.

SNIA Members Collaborate to Create Standards

Shadley talked about the value of being part of the Storage Networking Industry Association (SNIA), a nonprofit global organization dedicated to developing standards and education programs to advance storage and information technology. The SNIA comprises 185 member companies and 50,000 IT end users and storage professionals worldwide. The mission of SNIA, a globally recognized and trusted authority, is, according to its website, to lead the storage industry in developing and promoting vendor-neutral architectures, standards, and educational services that facilitate the efficient management, movement, and security of information.

Shadley said there are computational storage solutions from other vendors that do different things for different customers, but members don’t consider each other as competition because they are working together to define the computational storage ecosystem as a market segment. “We’ve now got industry events like Flash Memory Summit, where we have an entire day on the concept of computational storage,” he said.

One of the advantages he sees is that the working group is not just for vendors, so a variety of perspectives are at the table. “We’ve got consumers and integrators, and the actual code writers, so we have someone like RedHat involved and Oracle and VMware.”

NGD Systems

NGD’s goal in bringing computational storage to market is to reduce the amount of data physically moving, especially when time is critical. Regardless of the architecture used, computational storage provides a way to find a needle in a haystack, and by only moving what is needed, saves vast amounts of bandwidth and cost while improving performance, said Shadley. “And that’s really why it’s unique.” Storage hardware has always been something companies are required to buy, he said: “We’re saying, ‘Well, since you have to buy it, let’s make it do value-added work for you.’”

Computational Storage Vocabulary

From the SNIA dictionary online:

Computational Storage Device (CSx): A Computational Storage Drive, Computational Storage Processor, or Computational Storage Array.

Computational Storage Drive (CSD): A storage element that provides Computational Storage Services and persistent data storage.

Computational Storage Processor (CSP): A component that provides Computational Storage Services for an associated storage system without providing persistent data storage.

Computational Storage Array (CSA): A collection of Computational Storage Devices, control software, and optional storage devices.

Computational Storage Service (CSS): A data service or information service that performs computation on data where the service and the data are associated with a storage device. The Computational Storage Service may be a Fixed Computational Storage Service or a Programmable Computational Storage Service.

Fixed Computational Storage Service (F-CSS): CSS that provides a given function that may be configured and used. Service examples: compression, RAID, erasure coding, regular expression, encryption.

Programmable Computational Storage Service (P-CSS): CSS that is able to be programmed to provide one or more CSSs. Service examples: this service may host an operating system image, container, Berkeley packet filter, FPGA bitstream.

Image used under license from Shutterstock.com
