by Angela Guess
William McKnight recently reported on the virtualization of Big Data distribution: “The management of data and the integration of data are two sides of the same coin. The more you separate the data, the more you will need to integrate it to satisfy business queries and compile the analytics to feed to other systems. And separate data is what we continue to do. The general-purpose data warehouse, in a row-based relational database management system (DBMS), has got to be feeling neglected. Stealing the buzz and some budget, research and data are the other systems that are filling niches that are really beginning to expand on the possibilities started by the data warehouse.”
McKnight continues, “Hadoop has essentially co-opted ‘big data’ and is not only allowing companies to think about storing every log, every click, every everything that can be captured, but to do it scalably and cheaply. These NoSQL stores are by definition distributed, and often across dozens to hundreds of servers and networks, which creates the need for integration. And, like several of the other systems, price-performant fit only for specific use cases. Many companies are thinking ‘outside the box’ to satisfy the growing abilities of analysts and systems to utilize information.”
McKnight goes on, “While SQL purists may question some of the Hadoop movement, the low cost and lower overhead are alluring. When companies need to capture their web-scale data, the sparse, multidimensional structure and the co-locating of relevant data to reduce I/O and heavy distribution makes for a fair tradeoff of full SQL functionality… Integration requirements between these systems will be many however. Workload distribution across the array of possibilities is still subject to some debate and it’s important enough that some company’s futures will rise or fall based on good data distribution and integration decisions.”
For more on this topic, see the full article.
photo credit: Tom Raftery