by Angela Guess
Big Data is often broken down into four Vs: volume, velocity, variety, and value. A recent article takes a look at managing the first of these aspects, volume. It begins, “Two main patterns are apparent when dealing with large volumes. The most obvious is parallelism, and while we spend a lot of effort as an industry parallelizing computation, data parallelism remains a challenge and is the focal point of most current solutions. Additionally, it’s becoming apparent that in many cases compute grids are bottlenecking data access. Therefore the pattern of moving compute tasks to the data rather than moving large amounts of data over the network is also becoming paramount. Several technical approaches combine these patterns, parallelizing both data and computation while bringing the compute tasks closer to the data.”
One of these approaches is NoSQL: “The concept of schema-less data management (which is what NoSQL is really all about) has been steadily gaining momentum in recent years. At its core is the notion that developers can be more productive by circumventing the need for complex schema design during the development lifecycle of data-intensive applications, especially when the data lends itself to being modeled in key-value pairs (e.g. time series data). Despite being based on different principles, many of these technologies essentially follow a similar philosophy for data grids: they distribute data horizontally across many nodes and model it in an Object-Oriented rather than a relational manner.”
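To make the two patterns more concrete, here is a minimal sketch of a schema-less key-value store for time series data that is spread horizontally across nodes and that runs an aggregation where the data lives. It is an illustration under stated assumptions, not how any particular NoSQL product works. In-process dicts stand in for nodes, a thread pool stands in for remote execution, and SHA-1 hash partitioning on the series ID is just one common way to shard. The names `ShardedKVStore`, `node_for` and `mean_of` are hypothetical.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Optional, Tuple

Key = Tuple[str, int]  # (series_id, timestamp)


def node_for(series_id: str, num_nodes: int) -> int:
    """Pick a node with a stable hash; Python's built-in hash() is salted per process."""
    digest = hashlib.sha1(series_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


class ShardedKVStore:
    """Hypothetical toy store: schema-less records spread over in-process 'nodes'."""

    def __init__(self, num_nodes: int) -> None:
        if num_nodes < 1:
            raise ValueError("need at least one node")
        # Each dict stands in for one node's local storage (an assumption for illustration).
        self.nodes: List[Dict[Key, dict]] = [{} for _ in range(num_nodes)]

    def put(self, series_id: str, timestamp: int, value: dict) -> None:
        # Partition on series_id so every point of a series lives on the same node.
        # Values are plain dicts: no schema is declared, and records may differ in fields.
        self.nodes[node_for(series_id, len(self.nodes))][(series_id, timestamp)] = value

    def get(self, series_id: str, timestamp: int) -> Optional[dict]:
        return self.nodes[node_for(series_id, len(self.nodes))].get((series_id, timestamp))

    def mean_of(self, field: str) -> float:
        """Send the computation to each node and combine only small (sum, count) partials."""

        def local_partial(shard: Dict[Key, dict]) -> Tuple[float, int]:
            # Runs against one node's data; records lacking the field are skipped.
            vals = [v[field] for v in shard.values() if field in v]
            return float(sum(vals)), len(vals)

        # One task per node, executed in parallel; raw records never leave their shard.
        with ThreadPoolExecutor(max_workers=len(self.nodes)) as pool:
            partials = list(pool.map(local_partial, self.nodes))
        total = sum(s for s, _ in partials)
        count = sum(c for _, c in partials)
        return total / count if count else float("nan")


store = ShardedKVStore(num_nodes=3)
store.put("sensor-1", 1000, {"temp": 20.0})
store.put("sensor-1", 1060, {"temp": 22.0, "humidity": 0.4})  # extra field, no schema change
store.put("sensor-2", 1000, {"temp": 24.0})
print(store.get("sensor-1", 1060))  # {'temp': 22.0, 'humidity': 0.4}
print(store.mean_of("temp"))  # 22.0
```

The design choice to note is that `mean_of` moves only one `(sum, count)` pair per node across the "network" rather than every record. This is the "move compute to the data" idea from the article, and the partial results combine correctly because sums and counts are additive.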
photo credit: Rubber Dragon