Seminar Overview
This new two-day workshop is aimed at getting Data Scientists, Data Warehousing and BI professionals up to speed on Big Data, Hadoop, other NoSQL DBMSs, and Multi-Platform Analytics. What is Big Data? How can you make use of it? How does it fit within a traditional analytical environment? What skills do you need to develop for Big Data Analytics? All of these questions are addressed in this new knowledge-packed workshop.
Audience
IT Directors, CIOs, IT Managers, BI Managers, Data Warehousing Professionals, Data Scientists, Enterprise Architects, Data Architects.
Big Data Analytics: From Strategy to Implementation
Learning Objectives:
What Big Data is
How Big Data creates several new types of analytical workload
Big Data technology platforms beyond the data warehouse
Big Data analytical techniques and front-end tools
How to analyze un-modeled, multi-structured data using Hadoop, MapReduce & Spark
How to integrate Big Data with traditional data warehouses and BI systems
How to clearly understand business use cases for different Big Data technologies
How to set up and organize Big Data projects, including the skills required
How to make use of Big Data to deliver business value
Module 1: An Introduction to Big Data
This session defines Big Data and looks at the business reasons for wanting to make use of this new area of technology. It looks at Big Data use case studies and at how traditional BI and Data Warehousing differ from Big Data.
What is Big Data?
Types of Big Data
Why analyze Big Data?
The need to analyze new, more complex data sources
Industry use cases – Popular big data analytic applications
What is Data Science?
Data Warehousing and BI versus Big Data
Popular patterns for Big Data technologies
Module 2: An Introduction to Big Data Analytics
This session looks at Big Data analytical workloads, the technology components involved, and how you can integrate these with existing DW/BI systems in a new architecture for end-to-end analytics and to enrich business insight. It also looks at how to preserve existing investment in data management and BI tools across DW and Big Data platforms.
Traditional data warehousing and BI in the enterprise
The need to analyze new, more complex data sources
Types of Big Data analytical workloads
Streaming data at high velocity
Structured data analysis
Multi-structured data analysis
Challenges when managing and analyzing big data
Key components in a Big Data Analytics environment
The Big Data Extended Analytical Ecosystem
Module 3: Big Data Platforms and Storage Options
This session looks at platforms and data storage options for Big Data analytics.
The new multi-platform Analytical Ecosystem
Beyond the Data Warehouse – Analytical databases, Hadoop and NoSQL DBMSs
Big Data Appliances – Oracle Big Data Appliance, IBM BigInsights, Microsoft PDW and HDInsight, EMC Greenplum DCA & Pivotal HD
NoSQL databases, e.g. Neo4j, YarcData, MongoDB
Creating a multi-platform analytical ecosystem
The role of Data Virtualization in a Big Data environment
Multi-platform optimization – the new trend in Big Data Analytics
Module 4: Big Data Integration and Governance in a Multi-Platform Analytical Environment
This session will look at the challenge of integrating and governing Big Data and the unique issues it raises. How do you deal with very large data volumes and different varieties of data? How does loading data into Hadoop differ from loading data into analytical relational databases? What about NoSQL databases? How should low-latency data be handled? Topics that will be covered include:
Types of Big Data
Connecting to Big Data sources, e.g. web logs, clickstream, sensor data, unstructured and semi-structured content
The role of information management in an extended analytical environment
Supplying consistent data to multiple analytical platforms
Best practices for integrating and governing multi-structured and structured Big Data
Change data capture – what’s possible
Dealing with data quality in a Big Data environment
Big Data transformation and integration
Loading Big Data – what’s different about loading HDFS, Hive & NoSQL vs. analytical relational databases
Tools for ELT processing on Hadoop – The Enterprise Data Refinery
ETL tools vs. Pig vs. self-service DI/DQ
Governing data in a Data Science environment
Joined-up analytical processing from ETL to analytical workflows
Mapping discovered data of value into your DW and business vocabulary
Module 5: Tools and Techniques for Analyzing Big Data
This session looks at tools and techniques available to data scientists, business analysts and traditional DW/BI professionals to analyze Big Data. It looks at how different types of developers and users can exploit Big Data platforms such as Hadoop and NoSQL databases using programming techniques, text analytics, search and self-service BI tools, as well as how vendors are making it easier to gain access to both the NoSQL/Hadoop world and the Analytical RDBMS world by using data virtualization.
Data Science projects
Creating Sandboxes for Data Science projects
MapReduce developers versus SQL developers
MapReduce developer tools – What is R?
Using R as an analytical language for Big Data
Managing stream computing in a Big Data environment
Tools and techniques for streaming analytics
Using Data Virtualization to simplify access to Big Data and traditional DW/BI systems
SQL connectivity initiatives to Big Data – e.g. Impala, Hive
Speeding up Hive with Stinger
Analyzing Big Data using Self-Service BI Tools, e.g. Tableau, QlikView, Spotfire, MicroStrategy, SAP BO
NoSQL BI Tools and applications for Hadoop, e.g. Datameer, Karmasphere, Platfora, IBM Customer Insight
Big data analytics – query performance enablers
Data visualization and in-memory data in a Big Data environment
Module 6: Integrating Big Data Analytics into the Enterprise
This session looks at how new Big Data platforms can be integrated with traditional Data Warehouses and Data Marts. It looks at stream processing, Hadoop, NoSQL databases, Data Warehouse appliances and shows how to put them together to maximize business value from Big Data Analytics.
Integrating Big Data platforms with traditional DW/BI environments – what’s involved
Integrating event processing with Hadoop and Analytical DW Appliances
Integrating Hadoop with DW Appliances and Enterprise Data Warehouses
Tying together front end tools
Multi-platform Analytics
About the Instructor – Mike Ferguson
Mike Ferguson is Managing Director of Intelligent Business Strategies Limited. As an analyst and consultant he specialises in BI/Analytics, Big Data and Data Management. With over 32 years of IT experience, Mike has consulted for dozens of companies on BI, technology selection, Big Data, enterprise architecture, and data management. He has spoken at events all over the world and written numerous articles. Mike provides articles, blogs and his insights on the industry. Formerly he was a principal and co-founder of Codd and Date Europe Limited – the inventors of the Relational Model, a Chief Architect at Teradata on the Teradata DBMS and European Managing Director of Database Associates. He teaches popular master classes in BI, Big Data Analytics, Data Governance & Master Data Management.