by Angela Guess
The Pentaho Corporation recently announced that “it has made freely available under open source all of its big data capabilities in the new Pentaho Kettle 4.3 release, and has moved the entire Pentaho Kettle project to the Apache License, Version 2.0. Because Apache is the license under which Hadoop and several of the leading NoSQL databases are published, this move will further accelerate the rapid adoption of Pentaho Kettle for Big Data by developers, analysts and data scientists as the go-to tool for operationalizing big data.”
The article continues, “Big data capabilities available under open source Pentaho Kettle 4.3 include the ability to input, output, manipulate and report on data using the following Hadoop and NoSQL stores: Cassandra, Hadoop HDFS, Hadoop MapReduce, Hadapt, HBase, Hive, HPCC Systems and MongoDB.”
It adds, “With regard to Hadoop, Pentaho Kettle makes available job orchestration steps for Hadoop, Amazon Elastic MapReduce, Pentaho MapReduce, HDFS File Operations, and Pig scripts. All major Hadoop distributions are supported including: Amazon Elastic MapReduce, Apache Hadoop, Cloudera’s Distribution including Apache Hadoop (CDH), Cloudera Enterprise, EMC Greenplum HD, HortonWorks Data Platform powered by Apache Hadoop, and MapR’s M3 Free and M5 Edition. Pentaho Kettle can execute ETL transforms outside the Hadoop cluster or within the nodes of the cluster, taking advantage of Hadoop’s distributed processing and reliability.”
photo credit: Pentaho

















