The marriage between Big Data and NoSQL appears to be one of necessity. As data grows ever larger, the weaknesses of the relational data model become more pronounced. NoSQL technologies grew out of the need for fast queries and real-time analytics against data sources too large for traditional SQL.
As 2012 enters its second half, this article looks at emerging trends in products and practices from the married worlds of Big Data and NoSQL.
Informatica Speaks Big Data
A worldwide leader in data integration software, Informatica embraces Big Data with the release of version 9.5 of its flagship platform. The company sponsored the Big Data/NoSQL track at the recent Enterprise Data World 2012.
The Informatica Platform was engineered to handle the growing size and velocity of today’s data. Informatica 9.5 also processes data from a variety of sources: high-throughput transactional systems, social media and web data, and even Hadoop/MapReduce frameworks.
Speaking of Hadoop, the latest version of Informatica comes with native Hadoop support, including transforms developers can use for matching, parsing, and cleaning data. The 9.5 platform also includes a visual development environment, allowing even non-experts to deploy to Hadoop when necessary while still leveraging existing expert-level technical skills.
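Informatica’s transforms are proprietary and driven from its visual environment, so the snippet below is only a minimal, generic sketch of what a parse-and-clean step looks like when expressed as a Hadoop Streaming mapper in Python; the pipe-delimited layout, field names, and cleaning rules are invented for illustration.

```python
#!/usr/bin/env python
# Hypothetical Hadoop Streaming mapper sketching a parse-and-clean step:
# reads pipe-delimited customer records from stdin, trims whitespace,
# normalizes e-mail addresses, and silently drops malformed rows.
import sys

EXPECTED_FIELDS = 3  # assumed layout: id | name | email

for line in sys.stdin:
    fields = [f.strip() for f in line.rstrip("\n").split("|")]
    if len(fields) != EXPECTED_FIELDS:
        continue  # skip malformed records instead of failing the whole job
    rec_id, name, email = fields
    email = email.lower()
    if "@" not in email:
        continue  # minimal validity check on the email field
    # Emit a tab-separated record keyed by id so a reducer could de-duplicate.
    print("\t".join([rec_id, name, email]))
```

A reducer keyed on the record id could then collapse duplicates before the cleansed data is loaded downstream.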
In addition to the new focus on Big Data, Informatica promises that its platform still helps enterprises increase the business value of their data, while helping to lower the cost of managing that data.
Franz Inc.’s AllegroGraph Provides Storage for NoSQL Applications
As part of an array of semantic web products, Franz Inc. offers AllegroGraph, a graph database platform well-suited as the database layer for a variety of enterprise-level NoSQL applications.
The latest version of AllegroGraph, 4.7, embraces NoSQL by featuring support for MongoDB, including the ability for developers to create, delete, and modify JSON-notated objects directly within a MongoDB store.
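The AllegroGraph/MongoDB integration itself is configured through Franz’s own tooling, but the document operations it builds on can be sketched generically. The following Python/pymongo snippet, with made-up database and collection names, shows the create/modify/delete cycle on JSON-notated objects in a MongoDB store; it is not AllegroGraph’s API.

```python
# Minimal sketch of creating, modifying, and deleting JSON-style documents
# in a MongoDB store with pymongo. The database and collection names are
# hypothetical, and this is not AllegroGraph's own integration API.
from pymongo import MongoClient

client = MongoClient("localhost", 27017)
people = client["demo"]["people"]

# Create: insert a JSON-notated object as a document.
doc_id = people.insert_one({"name": "Ada", "knows": ["Grace"]}).inserted_id

# Modify: push another relationship onto the same document.
people.update_one({"_id": doc_id}, {"$push": {"knows": "Alan"}})

# Delete: remove the document once it is no longer needed.
people.delete_one({"_id": doc_id})
```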
AllegroGraph sports the insanely fast query times typical of graph databases, along with full support for ACID transaction processing. It is a definite option for customers looking for a graph database that now speaks the language of NoSQL.
SAP Sybase IQ Turns Big Data into a Big Opportunity
Sybase IQ has long been well-regarded as a columnar database useful for analytics applications. After Sybase’s acquisition by SAP, the database is now branded as SAP Sybase IQ. Additionally, the current buzz factor of Big Data has led Sybase and SAP to trumpet IQ’s suitability for huge data stores.
SAP’s main argument for this trumpeted “Big Opportunity” rests on a University of Texas research study showing that a ten percent improvement in data usability at a Fortune 1000 company leads to an increase of over two billion dollars in yearly revenue, while a ten percent improvement in data accessibility leads to an increase of over 65 million dollars in yearly net income. These are impressive statistics, indeed.
There is little doubt that enterprises looking for an experienced partner to navigate the turbulent seas of today’s rapidly changing “datasphere” would do well to explore SAP Sybase’s product line, including the Big Data applications for Sybase IQ.
Neil Raden’s Fresh Take on Big Data
Neil Raden is known as a thought leader in the world of business intelligence. He is currently a VP and Principal Analyst at Constellation Research. Recently, he gave a talk at Enterprise Data World providing a fresh take on Big Data.
Beyond its sheer size, Raden feels Big Data also represents diversity in the sources of data, as well as new aggregators serving as consumers of that data. He feels the volume and velocity of Big Data get too much attention compared to its actual business value.
Another major point of Raden’s presentation is to not always associate Big Data with Hadoop. Sure, Hadoop is an important technology created in part to deal with massive datasets, but other platforms, some even based on relational databases, also remain relevant.
Raden also stressed that Big Data and analytics by themselves do not provide business value. The processing of Big Data allows analytics to occur, and those analytics drive the informed decisions that produce true value.
Current challenges for Big Data include inefficiencies in server clusters, difficulty deriving clear value from implementations, and an IT skills shortage around data scientists. Raden feels industry uptake will happen more rapidly given better programming language support and application interactivity.
Ultimately, Neil strongly feels the true promise of Big Data lies not in improved algorithms predicting consumer desires for an online retailer, but in the potential advancements in the worlds of medicine, science, and the environment.
Eventual Consistency Drops ACID
ACID is a familiar term to most data professionals. Atomicity, Consistency, Isolation, and Durability remain requirements for many traditional relational database systems. With the advent of high-velocity Big Data, however, the ACID model has become less relevant for these newer systems.
Eventual consistency adds a modifier to the “C” in ACID. Enterprise architecture firm ZapThink specializes in cloud-based architecture, and its president, Jason Bloomberg, gave a talk on eventual consistency at Enterprise Data World 2012.
The core of Bloomberg’s presentation covered application architecture in the cloud, where fault tolerance and elasticity are important concepts. In this context, elasticity means an application that is rapid, measured, and automated, and that can scale both up and down.
Cloud-based fault tolerance relies on stateless processing and automated post-failure bootstrapping, as compared to the mirroring and RAID disk arrays used in traditional architectures.
Bloomberg also touched on the CAP theorem (consistency, availability, partition tolerance), leading to a discussion of eventual consistency, where data becomes consistent after some period of time rather than immediately. He compared the concept to any process ending with a settlement, like bank transactions or mobile phone roaming.
In these nascent cloud-based days, BASE becomes a more useful acronym than ACID: Basically Available, Soft state, and Eventual consistency make a better architectural fit for high-velocity data systems. When re-architecting, existing systems need to support BASE, be horizontally scalable, and be cloud friendly.
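To make the settlement analogy concrete, here is a minimal Python sketch in which two replicas disagree right after a write and only converge once a later synchronization pass runs. The Replica class and its last-writer-wins merge are invented for illustration, not a production replication protocol.

```python
# Toy illustration of eventual consistency: two replicas accept writes
# independently and converge later via a last-writer-wins merge. This is
# a teaching sketch, not a production replication protocol.
import time

class Replica:
    def __init__(self, name):
        self.name = name
        self.store = {}                    # key -> (timestamp, value)

    def write(self, key, value):
        self.store[key] = (time.time(), value)

    def read(self, key):
        entry = self.store.get(key)
        return entry[1] if entry else None

    def sync_from(self, other):
        # Anti-entropy pass: adopt any entry with a newer timestamp.
        for key, (ts, value) in other.store.items():
            if key not in self.store or self.store[key][0] < ts:
                self.store[key] = (ts, value)

a, b = Replica("a"), Replica("b")
a.write("balance", 100)                    # the write lands on replica "a" only
print(b.read("balance"))                   # None: the replicas briefly disagree
b.sync_from(a)                             # the settlement-style sync runs later
print(b.read("balance"))                   # 100: the replicas have converged
```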
Big Data Drives the New Data Warehouse
The Data Warehouse as a concept really took off in the mid-1990s when Ralph Kimball’s seminal book, The Data Warehouse Toolkit, provided data modelers with a collection of logical, real-world examples of the differences between OLAP and OLTP, along with an easy-to-follow road map for data warehousing architecture.
In the current era, those original data warehousing architectural patterns have been modified to take into account current trends like Big Data, mobility, analytics, and new visualization needs.
Krish Krishnan is the Founder and President of Sixth Sense Advisors and a thought leader on current data warehousing architecture. His recent EDW seminar covered the modern Data Warehouse centered on the concepts of Big Data, mobile accessibility, and social intelligence.
That last concept, social intelligence, is a key point driving many online retailers to look at new data warehousing models. Seventy percent of these retailers want to introduce personal product recommendations on their websites, and peer-to-peer recommendations are also important.
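Krishnan’s architecture is far richer than this, but the core of a peer-driven product recommendation can be sketched in a few lines. The orders and item names below are invented, and a real retailer would compute these co-occurrences over Big Data volumes rather than three sample orders.

```python
# Toy sketch of a "customers also bought" recommendation: suggest the items
# most often co-purchased with a given item. Orders and item names are
# invented examples, not a production recommendation engine.
from collections import Counter
from itertools import combinations

orders = [
    {"shoes", "socks"},
    {"shoes", "socks", "laces"},
    {"shoes", "laces"},
]

co_counts = {}                              # item -> Counter of co-purchased items
for order in orders:
    for a, b in combinations(sorted(order), 2):
        co_counts.setdefault(a, Counter())[b] += 1
        co_counts.setdefault(b, Counter())[a] += 1

def recommend(item, n=2):
    return [other for other, _ in co_counts.get(item, Counter()).most_common(n)]

print(recommend("shoes"))                   # e.g. ['socks', 'laces']
```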
Big Data obviously plays a role in achieving the real-time analysis necessary for quality social intelligence. This includes data from unstructured and semi-structured sources in addition to the traditional structured datasets.
Improved accessibility includes mobile device access to real-time data, as well as enhanced visualization. In many cases, firms need to embrace Agile processes to architect and implement the desired solutions at the current speed of competitive business.
Krishnan feels heterogeneity is another key point of modern data warehousing architecture, as multiple platforms, multiple data providers, and multiple consumption devices are a part of any system. On the other hand, consensus needs to happen on models, interfaces, and interoperability of any fresh data warehousing design.
Semantic technology is another key component of the New Data Warehouse. The ability to use linguistics and other techniques to glean meaningful information from masses of unstructured textual data is vital.
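Real semantic technology goes far beyond keyword counting, but even a tiny Python sketch shows the basic idea of reducing unstructured text to structured, queryable facts; the reviews and product names here are made-up examples.

```python
# Minimal sketch of pulling structured signal out of unstructured text:
# count how often each known product name is mentioned in free-form reviews.
# Real semantic technology uses linguistics, ontologies, and entity resolution;
# the product list and review text here are made-up examples.
from collections import Counter
import re

PRODUCTS = {"laptop", "tablet", "phone"}

reviews = [
    "Love the new tablet, much lighter than my old laptop.",
    "The phone battery outlasts the tablet by a full day.",
]

mentions = Counter()
for review in reviews:
    for token in re.findall(r"[a-z]+", review.lower()):
        if token in PRODUCTS:
            mentions[token] += 1

print(mentions.most_common())   # [('tablet', 2), ('laptop', 1), ('phone', 1)]
```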
Integration is obviously important as well. Most modern RDBMS providers now include NoSQL and Big Data technologies in their offerings. Metadata helps greatly in the overall integration process.
The bottom line is that Big Data and NoSQL are as much evolutionary growths in data processing as they are revolutionary ones. Hopefully this article provided some food for thought, as well as some points to inspire further research and exploration into these state-of-the-art concepts, software platforms, and applications.