Wednesday 12 March 2014

Hadoop Administrator Course Content

Big Data / Hadoop Administrator Course Outline

Audience: Oracle Database Administrators, SQL DBAs, Windows Administrators, Unix Administrators, and Network Administrators

Module 1
Getting Started with Big Data
What is Big Data?
What is Apache Hadoop?
History of Hadoop
Understanding distributed file systems and Hadoop
Hadoop ecosystem components
Hadoop use cases
Ubuntu Installation
JDK Installation
Module 2
Hadoop Distributed File System

Eclipse Installation
Overview of HDFS
Communication Protocols
Hadoop cluster Topology Overview
Setting up SSH for Hadoop Cluster
Running Hadoop:
1. Pseudo-distributed mode
Linux basic commands
HDFS file commands
Reading and writing to HDFS programmatically
Hands-on Lab Exercises
Module 3
MapReduce Framework

Java Basics
Anatomy of a MapReduce Program
Writables
InputFormat
OutputFormat
Streaming API
Inherent failure handling
Reading and writing
Hands-on Lab Exercises
Module 4
Advanced MapReduce Programming
Input splits, Record Reader, Mapper, Partition & Shuffle, Reduce, OutputFormat
Writing MapReduce program
Streaming in Hadoop
Counters
Performance Tuning
Joins
Sorting
Determining the optimal number of reducers and partitions
Hadoop cluster performance tuning
Hands-on Lab Exercises
Module 5 - Apache Hadoop Administration

Level 1  
Operating System Preparation
      Deployment Setup
      Software
      Hostname, DNS, and Identification
      Users, Groups and Privileges

 Kernel Tuning
     vm.overcommit_memory
     vm.swappiness

Best Practices for Hadoop setup and infrastructure

Hadoop cluster Installation preparation & Configuration
-          Cluster network design
-          Installation of Linux operating system
-          Configuring SSH
-          Walkthrough on rack topology and setup

Managing Hadoop cluster
-          HDFS cluster management
-          Secondary NameNode configuration
-          TaskTracker management
-          Configuring the HDFS quota
-          Configuring Fair Scheduler
-          Upgrading Hadoop
-          Deploying and managing Hadoop clusters with Ambari

Monitoring Hadoop cluster
-          Monitoring Hadoop cluster with Ganglia
-          Monitoring Hadoop cluster with Ambari
-          Monitoring Hadoop cluster with Nagios

Hadoop Cluster Performance Tuning
-          Benchmarking and profiling
-          Using compression for input and output
-          Configuring optimal map and reduce slots for the TaskTracker
-          Fine-tuning JobTracker configuration
-          Fine-tuning TaskTracker configuration
-          Tuning shuffle, merge and sort parameters
Security Implementation
              Kerberos security Implementation      
Workflow Scheduler
              FIFO Scheduler Configuration
              Capacity Scheduler Configuration
              Fair Scheduler Configuration

dfsadmin & mradmin commands

Administration of HCatalog and Hive

Backup and Recovery
Level 2  Cluster maintenance
Starting and stopping Processes with Init Scripts
Starting and Stopping processes manually

  HDFS maintenance tasks
-          DataNode failure & recovery
-          NameNode failure & recovery
-          JobTracker & TaskTracker failure & recovery
-          Removing DataNodes
-          Adding DataNodes
-          Commissioning and decommissioning of nodes
  MapReduce maintenance tasks
-          Shared upon request
Level 3  Monitoring
Hadoop Metrics

Health-check
        Hadoop Processes
     Rest of them shared upon request
Level 4 Backup and Recovery
Data Backup
NameNode backup


Module 6
Pig and Pig Latin
Installation and configuration
Running Pig Latin through Grunt
Working with Scripts
Lab Exercises
Module 7
HBase and ZooKeeper
NoSQL vs SQL
CAP Theorem
Architecture
Installation
Configuration
Java API
Performance Tuning
Lab Exercises
Module 8
Hive
Features of Hive
Architecture
Installation and configuration
HiveQL

Lab Exercises
Module 9
Other Hadoop ecosystem components
Overview of Ambari, Oozie, Mahout
Installing & configuring Sqoop, mysql-server
Installing & configuring Flume

Lab Exercises
Module 10
Hadoop on Cloud
Hosting Hadoop on Amazon EC2
EMR Hands-on
