
The Challenge of Data-Driven Security in the Petabyte Era

By James Kobielus  /  January 28, 2015

What keeps security professionals up all night? It’s worrying about the things that might be slipping through the cracks.

In an online world of extraordinary complexity, it’s safe to assume that something somewhere is always slipping through the cracks. As I stated in a slightly different context here, the security devil’s in the details. Complex networks are nothing but an endless supply of fresh details.

Once your systems have become too complex for any single human to track closely, the vulnerabilities mount and risks threaten to spin out of control. In a world where cloud computing is ramping the complexities up to unimagined new levels, how can security professionals stay on top of it all? How can you track all security-relevant components, entities, and interactions at all levels of abstraction? How can you combine historical, current, and predictive analyses into a single view of intrusions, attacks, infections, and other events of concern? How can you mine this data to look for security-relevant patterns and events that you can proactively protect your systems against in the future? How large is all this security-relevant big data likely to grow? How sophisticated are the analytics algorithms and tools you’ll need to buy or build in the coming era in which petabyte data clouds will be commonplace?

Being able to collect and analyze literally EVERYTHING that might be relevant to IT security is the ultimate pipe dream. However, this Network World article from 2014 notes that, on a practical level, it may be more feasible, practical, and cost-effective than people think.

“We used to be limited by analytics platforms and the cost of storage, but this is no longer the case,” states author Jon Oltsik. “Big data, cheap storage, and cloud-based storage services have altered the rules of the games from an analytics and economics perspective.” He points to the new generation of big-data security analytics platforms as an essential component of any comprehensive approach. And he notes that predictive algorithms, advanced visualizations, machine learning, and threat-intelligence automation are also important pieces of the security equation.

Essentially, Oltsik is arguing for a core big-data use case that I’ve referred to elsewhere as “whole-population analytics.” This refers to the value of having interactive access to the entire population of analytical data, rather than just to convenience samples, subsets, or slices. Until big data came our way, few data scientists had the luxury of being able to amass petabytes of data on every relevant variable of every entity in the population under study. As the prices of storage, processing, and bandwidth continue to decline, computational analysts will be able to keep the entire population of all relevant data under their algorithmic microscopes.
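
To make the contrast with sampling concrete, here is a minimal sketch in Python with pandas. The file name, column name, and the “rare” alert type are all hypothetical; the point is simply that a convenience sample can miss rare events that a full-population scan will always surface:

    # Minimal sketch of whole-population analytics vs. sampling.
    # File name, column names, and the rare alert type are hypothetical.
    import pandas as pd

    events = pd.read_parquet("security_events.parquet")  # entire event population

    # A small convenience sample can easily miss rare events entirely.
    sample = events.sample(frac=0.01, random_state=42)
    print("rare events in 1% sample:", (sample["alert_type"] == "dns_tunneling").sum())

    # Scanning the full population guarantees every occurrence is seen.
    rare = events[events["alert_type"] == "dns_tunneling"]
    print("rare events in full population:", len(rare))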

According to Oltsik, whole-population security analytics is not just feasible but imperative in today’s world. In the cited article, he pooh-poohs the notion of identifying any one type of security-relevant data as being more or less important than any other. Specifically, he takes issue with the list of data types–firewall logs, IDS/IPS alerts, PC/laptop forensic data, IP packet capture, server logs–that Chief Information Security Officers (CISOs) were asked in a survey to prioritize as inputs in malware protection schemes. “[This list] is skewed toward the network perimeter which no longer makes sense in a mobile device/mobile user world,” says Oltsik. He argues that, in addition to the types included in the survey, security professionals should be correlating those inputs with metadata, identity data, transaction data, emails, and other sources to get a rounded view of what’s going on at all levels.
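
As a rough illustration of what that correlation might look like in practice, here is a sketch in Python with pandas. The table names, column names, and join keys are hypothetical; it simply ties firewall log records back to identity data so perimeter events can be viewed per user rather than per IP address:

    # Sketch of correlating perimeter logs with identity data.
    # Table names, column names, and join keys are hypothetical.
    import pandas as pd

    firewall = pd.read_csv("firewall_logs.csv")        # src_ip, dst_ip, action, timestamp
    identity = pd.read_csv("identity_directory.csv")   # assigned_ip, user_id, department

    # Tie each denied connection back to a user and department.
    denied = firewall[firewall["action"] == "deny"]
    enriched = denied.merge(identity, left_on="src_ip", right_on="assigned_ip", how="left")

    # Users generating the most denied connections may warrant a closer look.
    print(enriched.groupby("user_id").size().sort_values(ascending=False).head(10))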

Lest this whole discussion come off as an IT budget-buster in the making, Oltsik makes an excellent point that often gets overlooked among big-data devotees: you don’t need to store all, or even most, of this security-relevant big data in perpetuity. The point of big-data analytics is to find patterns that may be difficult or impossible to glimpse at lesser scales.

Consequently, security professionals only need to analyze these gobs of data long enough to find whatever correlations, events, and interactions they’re searching for. Once those outcomes have been achieved, much of the data can be safely purged. For example, some security-relevant events may occur only once in the proverbial blue moon, so it’s likely they’ll never be detected under any approach less comprehensive than whole-population historical event analytics. But once you’ve analyzed the factors behind the few times those events occurred over the past, say, ten years, there may be little operational need (for this use case, at least) to justify storing all that data.

“To be clear, big data security analytics doesn’t demand retention of data,” Oltsik states, “but it does demand scanning the data in search of suspicious/anomalous behavior. In many cases, CISOs only retain the metadata, a fraction of the whole enchilada.”
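
One simple way to picture “retaining only the metadata” is to scan each raw event while it is in hand, then keep just a handful of small descriptive fields for the long term and let the bulky payload go. The sketch below, in plain Python, uses hypothetical event fields and an invented suspicion rule purely for illustration:

    # Sketch: retain lightweight metadata, purge the heavy raw events once scanned.
    # Event structure, field names, and the suspicion rule are hypothetical.

    def looks_suspicious(event):
        # Placeholder rule: flag unusually large outbound transfers.
        return event.get("bytes_out", 0) > 100_000_000

    def scan_and_retain(raw_events):
        metadata_store, flagged = [], []
        for event in raw_events:
            if looks_suspicious(event):       # analyze while the full event is available
                flagged.append(event)
            metadata_store.append({           # keep only a small descriptive record
                "timestamp": event["timestamp"],
                "src_ip": event["src_ip"],
                "bytes_out": event.get("bytes_out", 0),
            })
        raw_events.clear()                    # full payloads are not kept in perpetuity
        return metadata_store, flagged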

So bear in mind that collecting and analyzing everything doesn’t always mean you need to store everything. Filtering all the relevant data streams–and analyzing them against past and expected patterns–should be enough for many of the most critical security threats your enterprise faces.
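
One elementary way to analyze a stream against past patterns is to compare current activity with a historical baseline and flag large deviations. Here is a minimal sketch in plain Python; the input shape (event counts per time interval) and the threshold are assumptions made for the example, not a prescribed method:

    # Sketch: flag stream activity that deviates sharply from its historical baseline.
    # The per-interval event counts and the threshold value are hypothetical.
    from statistics import mean, stdev

    def flag_anomalies(history, current_counts, threshold=3.0):
        """history: past per-interval event counts; current_counts: new intervals."""
        baseline, spread = mean(history), stdev(history)
        return [
            (i, count)
            for i, count in enumerate(current_counts)
            if spread > 0 and abs(count - baseline) / spread > threshold
        ]

    # Example: a quiet baseline, then a burst that stands out.
    past = [120, 130, 125, 118, 131, 127, 122, 129]
    print(flag_anomalies(past, [124, 128, 560]))   # -> [(2, 560)]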

About the author

James Kobielus, Wikibon, Lead Analyst

Jim is Wikibon's Lead Analyst for Data Science, Deep Learning, and Application Development. Previously, Jim was IBM's data science evangelist. He managed IBM's thought leadership, social and influencer marketing programs targeted at developers of big data analytics, machine learning, and cognitive computing applications. Prior to his 5-year stint at IBM, Jim was an analyst at Forrester Research, Current Analysis, and the Burton Group. He is also a prolific blogger, a popular speaker, and a familiar face from his many appearances as an expert on theCUBE and at industry events.
