Big Data / Hadoop Administrator Course Outline
Audience - Oracle Database Administrators, SQL DBAs, Windows Administrators, Unix Administrators, and Network Administrators
Module 1
Getting Started with Big Data
|
What is Big Data?
What is Apache Hadoop?
History of Hadoop
Understanding distributed file systems and Hadoop
Hadoop ecosystem components
Hadoop use cases
Ubuntu Installation
JDK Installation
|
Module 2
Hadoop Distributed File system
|
Eclipse Installation
Overview of HDFS
Communication Protocols
Hadoop cluster Topology Overview
Setting up SSH for Hadoop Cluster
Running Hadoop: pseudo-distributed mode
Linux basic commands
HDFS file commands
Reading and writing to HDFS programmatically
Hands-on Lab Exercises
|
Module 3
MapReduce Framework
|
Java Basics
Anatomy of a MapReduce Program
Writables
InputFormat
OutputFormat
Streaming API
Inherent failure handling
Reading and writing
Hands-on Lab Exercises
|
Module 4
Advanced MapReduce Programming
|
Input splits, Record Reader, Mapper, Partition & Shuffle, Reduce, OutputFormat
Writing MapReduce program
Streaming in Hadoop
Counters
Performance Tuning
Joins
Sorting
Determining the optimal number of reducers and partitions
Hadoop cluster performance tuning
Hands-on Lab Exercises
|
Module 5 - Apache Hadoop Administration
|
Level 1
|
Operating System Preparation
Deployment Setup
Software
Hostname, DNS, and Identification
Users, Groups, and Privileges
Kernel Tuning
vm.overcommit_memory
vm.swappiness
Best Practices for Hadoop setup and infrastructure
Hadoop cluster Installation preparation & Configuration
- Cluster network design
- Installation of Linux operating system
- Configuring SSH
- Walkthrough of rack topology and setup
Managing Hadoop cluster
- HDFS cluster management
- Secondary NameNode configuration
- TaskTracker management
- Configuring the HDFS quota
- Configuring Fair Scheduler
- Upgrading Hadoop
- Deploying and managing Hadoop clusters with Ambari
Monitoring Hadoop cluster
- Monitoring Hadoop cluster with Ganglia
- Monitoring Hadoop cluster with Ambari
- Monitoring Hadoop cluster with Nagios
Hadoop Cluster Performance Tuning
- Benchmarking and profiling
- Using compression for input and output
- Configuring optimal map and reduce slots for the TaskTracker
- Fine-tuning JobTracker configuration
- Fine-tuning TaskTracker configuration
- Tuning shuffle, merge, and sort parameters
Security Implementation
Kerberos security implementation
Workflow Scheduler
FIFO Scheduler Configuration
Capacity Scheduler Configuration
Fair Scheduler Configuration
dfsadmin & mradmin commands
Administration of HCatalog and Hive
Backup and Recovery
|
Level 2 Cluster maintenance
|
Starting and stopping Processes with Init Scripts
Starting and Stopping processes manually
HDFS maintenance Tasks
- DataNode failure & recovery
- NameNode failure & recovery
- JobTracker & TaskTracker failure & recovery
- Removing DataNodes
- Adding DataNodes
- Commissioning and decommissioning of nodes
MapReduce maintenance tasks
- Shared upon request
|
Level 3 Monitoring
|
Hadoop Metrics
Health-check
Hadoop Processes
Remaining topics shared upon request
|
Level 4 Backup and Recovery
|
Data Backup
Name Node backup
|
Module 6
Pig and Pig Latin
|
Installation and configuration
Running Pig Latin through Grunt
Working with Scripts
Lab Exercises
|
Module 7
HBase and ZooKeeper
|
NoSQL vs. SQL
CAP Theorem
Architecture
Installation
Configuration
Java API
Performance Tuning
Lab Exercises
|
Module 8
Hive
|
Features of Hive
Architecture
Installation and configuration
HiveQL
Lab Exercises
|
Module 9
Other Hadoop ecosystem components
|
Overview of Ambari, Oozie, and Mahout
Installing & configuring Sqoop and mysql-server
Installing & configuring Flume
Lab Exercises
|
Module 10
Hadoop on Cloud
|
Hosting Hadoop on Amazon EC2
EMR Hands-on
|