by Angela Guess
Stephen Swoyer of Search Data Management recently argued that big data analytics is bringing an end to data sampling. He writes, “To sample or not to sample? It didn’t used to be a question. Data sets were so huge, and compute resources so inadequate, that most BI professionals simply accepted sampling as a kind of pragmatic (albeit inadequate) necessity. The good news is that BI professionals now have a better choice: big data analytics, according to experts. ‘If you really want the lowdown on what’s happening in your business, you need large volumes of highly detailed data,’ wrote Philip Russom, research director for data warehousing with The Data Warehousing Institute (TDWI), in Big Data Analytics, a recent TDWI report. ‘If you truly want to see something you’ve never seen before, it helps to tap into data that’s never been tapped for business intelligence or analytics.’ That’s the radical raison d’être of big data analytics, and it’s radical because it is unprecedented.”
He explains, “Not the notion of big data itself, which — as Russom reminds us — dates back at least to ‘the early 2000s, [when] storage and CPU technologies were overwhelmed by the numerous terabytes of big data … to the point that IT faced a data scalability crisis.’ What’s unprecedented is the application of advanced analytics technologies (such as data mining) to massive and diverse data sets. That’s what’s meant by big data analytics, the advent of which, Russom said, signals the end of this data scalability crisis. It used to be that organisations couldn’t meaningfully process — i.e., mine, analyze and in some cases, report against — all of the data that they were collecting. That’s why practices such as sampling came to be viewed as pragmatic necessities — even if almost everyone conceded that they were inherently problematic, to say nothing of capricious.”