Teradata has long aspired to provide an environment in which it can readily analyze and integrate all forms of data for the enterprise.
On April 15, 2013, it revealed a number of technologies that have made the dream a reality.
The prominent data analytics solutions provider substantially bolstered its Unified Data Architecture (UDA) with the Teradata Enterprise Access for Hadoop technology and a fabric-based computing system built on Mellanox’s InfiniBand. Both technologies are meant to make it far easier to access and run analytics on Big Data through Hadoop. Teradata also announced the release of an updated Teradata Active Enterprise Data Warehouse 6700 and a smaller version, the Teradata Data Mart Appliance 670, a departmental warehouse designed for testing and development.
The company’s commitment to fabric-based computing is essential to its goal of providing universal analytics through UDA. InfiniBand operates as the common backbone through which users can move data seamlessly between UDA’s principal components, the Teradata Integrated Data Warehouse and the Aster Discovery Platform. Enterprise Access for Hadoop brings Hadoop into this free exchange and analysis of data, which UDA also extends to conventional data marts and analytical archives.
According to Tasso Argyros, who heads Teradata’s strategic deployment of Big Data solutions and who founded Aster Data before Teradata acquired it in 2011:
“The reason this is important is because one size doesn’t fit all anymore. You can’t build your data architecture on only one data source. There’s a huge advantage to using best of breed technologies working together. Having a monitoring infrastructure that allows you to view all sources from one place is very important for operation efficiency.”
UDA’s “best of breed” technologies include not only Big Data access through Hadoop and Apache developments like HCatalog, but also a variety of products such as Hortonworks Data Platform, Intel Xeon processors, and a Linux enterprise server operating system. UDA users can monitor and manage data in any location with Viewpoint, while InfiniBand provides the hardware for Teradata’s BYNET V5 software, which handles broadcast functions for massively parallel processing.
InfiniBand provides the foundation for Teradata’s fabric-based computing and is considered a fast, highly scalable means of connecting analytic and reporting tools and of transferring data between sources. Its reliability, speed, and scalability are largely enhanced by BYNET, which boosts the capacity of Teradata’s Enterprise Data Warehouse to 61 petabytes and works best when moving data between dual networks. As a result, users can perform real-time in-query data sorting in an environment designed to optimize the speed and performance of business intelligence and analytics, which is crucial for integrating various types of structured and unstructured data under tight time constraints. BYNET also increases network fail-over capability.
Enterprise Access for Hadoop
The ultimate benefit of fabric-based computing is the unified analytics it makes possible by moving data between sources, which Teradata Enterprise Access for Hadoop facilitates when Big Data is involved. The most important features of this release are Teradata’s SQL-H (the H stands for Hadoop) and Teradata’s Smart Loader for Hadoop. Smart Loader enables analysts and non-specialists alike to manipulate and move data from Hadoop to Teradata’s secure, proprietary integrated data warehouse. It does so through the power of SQL-H, which allows users to formulate queries and issue reports on Big Data using SQL, the impact of which Argyros says should not be taken lightly:
“Now, all the SQL analysts that most enterprises already have can do SQL analytics on Big Data without knowing anything about Hadoop. That’s one of the reasons SQL-H has been so successful so far, because instead of enterprises having to go and hire 30 Hadoop data scientists, they can utilize the 25 SQL analysts they already have and, with SQL-H, only hire five more people.”
SQL-H also mitigates security concerns about accessing data in Hadoop (which is open source), since it allows users to move data into their own data warehouses. Thanks to InfiniBand and BYNET, analysts can access Big Data in real time and issue queries and reports without writing code or scripts. This self-service aspect of SQL-H encourages operations, business, and executive use for either ad-hoc or planned analysis. Organizations can still extract information from Big Data sources using the conventional BI architecture and methods they already know, without extensive overhead costs for hiring and training in NoSQL.
SQL-H integrates with Hortonworks Data Platform and Apache HCatalog to facilitate intelligent data access across a multitude of systems. HCatalog enables users to minimize replication and data movement costs by moving into Teradata’s integrated data warehouse only the data that a query requires. This combined approach of integrating and performing analytics on data from virtually all sources is the basis for Teradata’s claim for UDA. Users can choose between Cloudera Distribution and Hortonworks Data Platform as their commercial distribution of Hadoop, while Teradata’s integrated data warehouse grants numerous users simultaneous access.
Teradata Studio with Smart Loader for Hadoop simplifies Hadoop browsing by presenting data in tables (with table properties) that users can navigate by pointing and clicking. Bi-directional table copies map data by type between Teradata and Hadoop sources for ready comparison. Other features include transfer status and history functions that let users track the progress of loads.
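The idea of moving only the data a query requires can be illustrated with a minimal Python sketch. It is not SQL-H or HCatalog code: the `fetch_required` helper, the table contents, and the column names are all hypothetical, and the sketch only shows row filtering and column pruning applied at the source before anything is transferred.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, object]


def fetch_required(
    source_rows: Iterable[Row],
    columns: List[str],
    predicate: Callable[[Row], bool],
) -> Iterator[Row]:
    """Yield only the rows and columns a query needs (hypothetical helper)."""
    for row in source_rows:
        if predicate(row):  # row filter applied before any data is moved
            yield {name: row[name] for name in columns}  # column pruning


# Hypothetical rows standing in for a table stored in Hadoop.
hadoop_table: List[Row] = [
    {"user_id": 1, "region": "EU", "clicks": 12, "raw_log": "GET /home"},
    {"user_id": 2, "region": "US", "clicks": 7, "raw_log": "GET /cart"},
    {"user_id": 3, "region": "EU", "clicks": 3, "raw_log": "GET /help"},
]

# Only EU rows, and only the two columns the query reads, are transferred.
moved = list(
    fetch_required(hadoop_table, ["user_id", "clicks"], lambda r: r["region"] == "EU")
)
```

In this toy case the bulky `raw_log` column and the US row never leave the source, which is the kind of saving in replication and movement cost the paragraph above describes.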
The Teradata Active Enterprise Data Warehouse (EDW) 6700 provides operational and strategic intelligence with real-time updates. Its speed is due in part to running BYNET on InfiniBand, as is the extreme scalability it offers. The most recent version of the Active EDW Platform is available in two models, the 6700C and the 6700H. The 6700H has more memory, more storage capacity, and higher Teradata performance per node. One of the central differences between the two is that the 6700H comes with a hybrid storage architecture that uses both Solid State Drive (SSD) and Hard Disk Drive (HDD) technologies; the 6700C comes with HDD and can be upgraded to include SSD. One of the primary benefits of this platform is that frequently used “hot” data is placed on SSD for fast access, whereas less frequently used data is relegated to HDD. Teradata’s Virtual Storage allows users to specify on which technology they would like data placed.
The primary distinction between the recently released Teradata Active EDW Platform and its predecessor is that the updated version incorporates an eight-core Intel Xeon processor and high-performance computing nodes that, when combined with recent fabric-based computing technologies, make it significantly faster. It uses Viewpoint for convenient monitoring of data and works alongside both earlier and later platform generations to protect customers’ investments. The Data Mart Appliance 670 also features an Intel Xeon processor and high-performance computing nodes and is available in either HDD or hybrid versions, yet has substantially less storage than the 6700. Argyros commented:
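The hot and cold placement described above can be sketched in a few lines of Python. This is an illustrative frequency-based policy under stated assumptions, not the algorithm used by Teradata’s Virtual Storage: the `place_blocks` function, the block names, and the fixed number of SSD slots are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def place_blocks(access_log: List[str], ssd_slots: int) -> Dict[str, str]:
    """Assign the most frequently accessed blocks to SSD, the rest to HDD.

    Hypothetical policy: rank blocks purely by access count.
    """
    counts = Counter(access_log)
    # The ssd_slots most frequently accessed blocks are treated as "hot".
    hot = {block for block, _ in counts.most_common(ssd_slots)}
    return {block: ("SSD" if block in hot else "HDD") for block in counts}


# Hypothetical access history: each entry is one read of a data block.
access_log = ["orders", "orders", "customers", "orders", "archive_2009", "customers"]

placement = place_blocks(access_log, ssd_slots=2)
```

With two SSD slots, the frequently read `orders` and `customers` blocks land on SSD while the rarely touched `archive_2009` block stays on HDD. A production system would also weigh recency, block size, and any user-specified placement.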
“We have products that are very cost effective and geared toward point problems all the way to high end products like the 6700. That allows you to integrate structured data from across the enterprise scaled to many terabytes, and it supports hundreds of thousands of users.”
Ultimately, Teradata representatives rest the case for UDA’s viability and comprehensive data analytics on the strength of its integrated data warehouse, which draws on Hadoop’s Big Data and Aster’s discovery tools to unlock its full potential. When one considers all of the other data sources that can integrate with it, Teradata’s claim to offer unified analytics appears convincing. Argyros reflected on the process of UDA’s development:
“We were looking for how we could unify the analytics and the processing trail. In order to do so you need to be able to move data from Hadoop and Teradata into Aster, and from Aster and Teradata into Hadoop. And you ideally want to make sure that analytics frame data from Aster, Teradata and Hadoop at the same time. That’s kind of the holy grail of software integration, and we’ve done that.”