Yet many enterprises may worry that they won’t be able to keep all their potentially valuable data assets around, organized, and protected as information volumes explode. Companies hope to expose their Machine Learning models to different or new data to support the models’ evolution and drive increasingly reliable results, but they come face to face with the fact that they don’t know a priori which information could be most important to those models, he noted.
“We see that people are saying that if that’s the way things are going, we want to retain every piece of data we can,” said Brockett. Critically, that includes lots of ever finer-grained unstructured data. “What that means is that the scale of the problem people are dealing with today is very different than ten years ago.”
Recent research conducted by StorageCraft revealed the worries of IT decision makers on these fronts. Forty-three percent of respondents said they are struggling with data growth now and believe it is going to get worse.
Organizations well outside the Fortune 500 realm that used to be gigabyte-scale data storage shops are now petabyte-scale storage shops, facing challenges they never expected, said Brockett. More and more businesses will need high-performing and scalable systems that intelligently store and also protect massive quantities of data over extended periods.
According to StorageCraft’s research, though, 51 percent of respondents know they would benefit from more frequent data backup but are unable to back up more often because their existing IT infrastructure doesn’t allow it. More than half are not confident that their organizations’ IT infrastructures can perform instant data recovery in the event of a failure.
In many cases, especially among smaller companies, dealing with these issues has become the function of the person Brockett refers to as the “accidental storage administrator.” Such individuals are in search of ways to bring data together and seamlessly assign it, sending data that will be critical to efforts like Machine Learning and Data Analytics to primary data storage and less critical data to secondary data storage.
They want help handling backup storage to support improved disaster recovery and to meet compliance requirements around standards like HIPAA and GDPR, “without having to have a PhD in storage,” he remarked.
Business units in larger companies often must address these issues, too. They may enter into a data-intensive project thinking they can manage it without a dedicated IT team, he said, only to be flummoxed by unexpectedly rapid data growth. “Enterprises are chasing down some of the very same problems you might see in the mid-market,” he noted.
What all afflicted parties should be able to draw upon to realize those goals is next-generation data storage technology that adds intelligence and analytics to the software stack to deliver self-managing and efficiently scalable environments.
Importantly, according to Brockett, taking the route of buying a big, traditional SAN system and amortizing it over a few years isn’t a good model to follow anymore. As an example, one of StorageCraft’s customers is running an IoT application with myriad sensors collecting a multitude of data that it wants to leverage for machine learning analytics.
“It’s not like you can look at a project like that today and know how big the data sets are going to be five years from now. Data might grow between 80 and 120 percent a year, but you can’t take that risk and buy all that storage at once. You need something scalable and intelligent.”
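To put rough numbers on that range, the arithmetic can be sketched as simple annual compounding. This is only an illustration: the 100 TB starting point and the `project_capacity` helper are assumptions made for the example, not figures or tools from the interview.

```python
def project_capacity(start_tb: float, annual_growth: float, years: int) -> list[float]:
    """Return the capacity needed at the end of each year, compounding annually."""
    capacities = []
    current = start_tb
    for _ in range(years):
        current *= 1 + annual_growth  # e.g. 0.80 means 80 percent growth per year
        capacities.append(round(current, 1))
    return capacities


# Hypothetical starting point: 100 TB today (an assumption for illustration).
low = project_capacity(100, 0.80, 5)   # 80 percent growth per year
high = project_capacity(100, 1.20, 5)  # 120 percent growth per year

print("Year 5 at 80% growth: ", low[-1], "TB")
print("Year 5 at 120% growth:", high[-1], "TB")
```

Under these assumptions the five-year requirement lands somewhere between roughly 1.9 PB and 5.2 PB, a spread of more than 3 PB from the same starting point, which is why sizing the purchase up front is such a gamble.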
The solution also shouldn’t “require a forklift upgrade” if data growth exceeds predictions. “Companies faced with huge growth don’t want to change out their entire infrastructure two years down the road,” said Shridar Subramanian, the VP of Marketing and Product Management at StorageCraft, during the same interview.
Indeed, dealing with the scale of data that may end up populating that infrastructure may be “beyond what humans can think about in a coherent way now,” Brockett said. “Terabytes and petabytes of information boggles the mind and it’s hard for people to get their heads around organizing principles for that.”
No human, he remarked, can figure out what is or isn’t important when file systems host tens of millions of files. Many of those files are not the structured kind that programmers can tag with metadata to identify, a priori, important or unimportant pieces of information and to establish relationships among information, projects, and categories.
As more and more data is sourced from “fuzzier” places – and not necessarily tagged or well-controlled – the default becomes treating everything with the same degree of importance.
“When you think about how you could move less important information to less expensive data tiers or figure out what is critical to the business to have a solid disaster recovery plan around it, people can’t catch that with any level of reliability,” said Brockett.
To that end, StorageCraft is putting R&D dollars into figuring out how to apply intelligence – and specifically Machine Learning – to managing that information:
“That’s about Real-Time Analytics done by the storage system itself for optimal placement and protection of any element of information in the data set,” Subramanian remarked. “That’s going to be the only way to keep up with this in the future.”
Such capabilities will complement existing technology that starts with an easy-to-use scale-out data storage appliance that lets users simply add any number of drives in any capacity, growing the storage pool with no configuration or application downtime. It’s backed up by a multi-tenant Cloud-based Management Service that dynamically protects itself against the fallout from issues like disk failures, sporting the ability to grab very fine-grained snapshots from computer systems and restore them with a high degree of reliability across physical and virtual environments.
Comfortable with Content Addressability
Buyers today want the company to help bring together primary, secondary, and data protection storage in a strategic way, Brockett explained: “They know that they have multiple islands of storage that they lack the ability to do analytics across, and they need a simple and easy to manage place to get that going.”
So, putting a smart, scalable, and unified data storage infrastructure in place is the crawl that leads to the walk, which is feeding Machine Learning algorithms enough data to make sense from an analytics perspective.
“Run is where you get to a full-on intelligent system where Machine Learning takes over a lot of the grunt work that humans do today. We’ve got to get to the point where being a storage administrator is articulating sensible policy and letting an AI system implement it for you.”
StorageCraft’s arsenal is prepared for “moving higher on the assurance curve while reducing the burden of administration,” Brockett said, noting that its technology is built on a content-addressable storage system that lets it search through huge amounts of storage in a fixed amount of time. “When you think about what ML and AI guys have to do to separate the wheat from the chaff, it’s going to be based on content, so that’s an advantage.”
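For readers unfamiliar with the term, content addressing means that a piece of data is stored and found by a hash of its contents and not by a file name or location. The toy sketch below shows the general idea only. It is not StorageCraft’s implementation, and the `ContentStore` class, its methods, and the choice of SHA-256 are assumptions made for illustration.

```python
import hashlib


class ContentStore:
    """Toy in-memory content-addressable store (illustrative only)."""

    def __init__(self):
        self._blobs = {}  # address (hex digest) -> stored bytes

    def put(self, data: bytes) -> str:
        # The address is derived from the content itself.
        address = hashlib.sha256(data).hexdigest()
        # Identical content maps to the same address, so it is stored once.
        self._blobs.setdefault(address, data)
        return address

    def get(self, address: str) -> bytes:
        # A dictionary lookup by address does not scan the stored data.
        return self._blobs[address]

    def __len__(self):
        return len(self._blobs)


store = ContentStore()
a = store.put(b"sensor reading 42")
b = store.put(b"sensor reading 42")  # same content, same address
c = store.put(b"sensor reading 43")  # different content, different address

print(a == b, a != c, len(store))
```

Because the lookup key is computed from the content, finding a known item takes about the same time whether the store holds a handful of objects or millions, and duplicate content collapses into a single stored copy.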
As Brockett and Subramanian see it, any company contemplating its future data ecosystem should start preparing for that by breaking things down into digestible chunks.
“Don’t try to pretend you know what the world brings for you next year,” said Brockett. “Instead, say here’s how you can plan an architecture, starting small with something that scales with you over time. Don’t fall for the idea that you need to boil the ocean with your first step and spend a ton of money on storage technology today. Be nimble and flexible and buy into something scalable that requires a relatively small investment now and that you can develop over time.”
Photo Credit: Tommy Lee Walker