- Intro to Hadoop and Big Data
- What is Big Data?
- Parallel Computing vs. Distributed Computing
- Brief history of Hadoop
- RDBMS/SQL vs. Hadoop
- Scaling with Hadoop
- Intro to the Hadoop ecosystem
- Optimal hardware and network configurations for Hadoop
- Vendor Comparison
- Use cases
LAB #1: Virtual Machine Setup
- HDFS – Hadoop Distributed File System
- Linux file system options
- NameNode architecture
- Secondary NameNode architecture
- DataNode architecture
- Heartbeats, Rack Awareness, Health Check
- Exploring the HDFS Web UI
LAB #2: HDFS Command Line
- Beginning MapReduce
- MapReduce Architecture
- Shuffle and Sort
- Exploring the MapReduce Web UI
- Walkthrough of a simple Java MapReduce example
- Use case: Word Count in MapReduce
LAB #3: Running MapReduce in Java
- Advanced MapReduce
- Data types and file formats
- Driver, Mapper, and Reducer class code
- Building Map and Reduce programs using Eclipse
- Serialization and File-Based Data Structures
- Input/output formats
- Running MapReduce locally and on a cluster
LAB #4: Java MapReduce API
- Hive for Structured Data
- Hive architecture
- Hive vs. RDBMS
- HiveQL and Hive Shell
- Managing tables
- Data types and schemas