Big Data and Hadoop Training provided by learntek
Big Data and Hadoop Training free videos and free material uploaded by learntek staff.
Big Data Hadoop Training:
Hadoop Introduction
Introduction to Data and System
Types of Data
Traditional way of dealing with large data and its problems
Types of Systems & Scaling
What is Big Data
Challenges in Big Data
Challenges in Traditional Applications
New Requirements
What is Hadoop? Why Hadoop?
Brief history of Hadoop
Features of Hadoop
Hadoop and RDBMS
Hadoop Ecosystem Overview
Hadoop Installation
Installation in detail
Creating an Ubuntu image in VMware
Downloading Hadoop
Installing SSH
Configuring Hadoop, HDFS & MapReduce
Download, Installation & Configuration of Hive
Download, Installation & Configuration of Pig
Download, Installation & Configuration of Sqoop
Configuring Hadoop in Different Modes
Hadoop Distributed File System (HDFS)
File System – Concepts
Blocks
Replication Factor
Version File
Safe mode
Namespace IDs
Purpose of Name Node
Purpose of Data Node
Purpose of Secondary Name Node
Purpose of Job Tracker
Purpose of Task Tracker
HDFS Shell Commands – copy, delete, create directories, etc.
Reading and Writing in HDFS
Differences between Unix commands and HDFS commands
Hadoop Admin Commands
Hands-on exercises with Unix and HDFS commands
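The HDFS shell commands above mirror their Unix counterparts closely; a few representative invocations (the `/user/train` paths are placeholders for this sketch):

```shell
# Representative HDFS shell commands; the /user/train paths are placeholders.
hdfs dfs -mkdir -p /user/train/input        # create directories
hdfs dfs -put words.txt /user/train/input   # copy a local file into HDFS
hdfs dfs -ls /user/train/input              # list a directory
hdfs dfs -cat /user/train/input/words.txt   # read a file
hdfs dfs -rm -r /user/train/input           # delete recursively
```

These require a configured cluster (or pseudo-distributed setup) on the machine's PATH.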
Read / Write in HDFS – Internal Process between Client, NameNode & DataNodes
Accessing HDFS using the Java API
Various Ways of Accessing HDFS
Understanding HDFS Java classes and methods
Admin: Commissioning / Decommissioning DataNodes
Balancer
Replication Policy
Network Distance / Topology Script
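The "Blocks" and "Replication Factor" topics above come down to simple arithmetic. A dependency-free sketch, assuming the common Hadoop 2.x defaults of 128 MB blocks and replication factor 3 (both are configurable via `dfs.blocksize` and `dfs.replication`):

```java
// Back-of-the-envelope arithmetic for HDFS blocks and replication.
// The 128 MB block size and replication factor 3 are the usual
// Hadoop 2.x defaults; both are configurable per cluster and per file.
public class BlockMath {

    // Number of HDFS blocks needed for a file (ceiling division).
    static long blocks(long fileSizeMb, long blockSizeMb) {
        return (fileSizeMb + blockSizeMb - 1) / blockSizeMb;
    }

    public static void main(String[] args) {
        long fileSizeMb = 1024;   // a 1 GB file
        long blockSizeMb = 128;   // default dfs.blocksize in Hadoop 2.x
        int replication = 3;      // default dfs.replication

        long b = blocks(fileSizeMb, blockSizeMb);
        System.out.println("blocks = " + b);                                  // 8
        System.out.println("stored block copies = " + b * replication);       // 24
        System.out.println("raw storage (MB) = " + fileSizeMb * replication); // 3072
    }
}
```

The 3x raw-storage cost is what the Balancer and the replication policy manage across DataNodes.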
Map Reduce Programming
About MapReduce
Understanding blocks and input splits
MapReduce Data Types
Understanding Writable
Data Flow in a MapReduce Application
Understanding MapReduce problems on datasets
MapReduce and Functional Programming
Writing MapReduce Application
Understanding Mapper function
Understanding Reducer Function
Understanding Driver
Usage of Combiner
Understanding Partitioner
Usage of Distributed Cache
Passing parameters to the mapper and reducer
Analysing the Results
Log files
Input Formats and Output Formats
Counters; Skipping Bad and Unwanted Records
Writing Joins in MapReduce with 2 Input Files; Join Types
Execute MapReduce Job – Insights
Exercises on MapReduce
Job Scheduling: Types of Schedulers
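As a preview of the mapper/shuffle/reducer data flow covered above, here is a dependency-free Java sketch of word count. A real Hadoop job expresses the same logic with `Mapper`/`Reducer` subclasses, Writable types, and a Driver; this sketch only shows the flow using plain collections:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Dependency-free sketch of the MapReduce data flow for word count:
// "map" emits (word, 1) pairs, the shuffle groups pairs by key,
// and "reduce" sums each group.
public class WordCountFlow {

    // "map" phase: one (word, 1) pair per token in the input line
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String token : line.toLowerCase().split("\\s+")) {
            if (!token.isEmpty()) {
                pairs.add(Map.entry(token, 1));
            }
        }
        return pairs;
    }

    // "shuffle + reduce" phase: group by key and sum the values
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> pairs = map("big data and big clusters");
        System.out.println(reduce(pairs)); // {and=1, big=2, clusters=1, data=1}
    }
}
```

A Combiner runs the same summing logic on each mapper's local output before the shuffle, cutting network traffic.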
Hive
Hive concepts
Schema on Read VS Schema on Write
Hive architecture
Install and configure Hive on a cluster
Meta Store – Purpose & Types of Configurations
Different types of tables in Hive
Buckets
Partitions
Joins in Hive
Hive Query Language
Hive Data Types
Data Loading into Hive Tables
Hive Query Execution
Hive library functions
Hive UDF
Hive Limitations
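To make topics such as partitions, buckets, and HiveQL concrete, here is an illustrative sketch; the table, column, and partition names are invented for this example:

```sql
-- Illustrative HiveQL only: table, column, and partition names are made up.
CREATE TABLE page_views (
  user_id  STRING,
  url      STRING,
  duration INT
)
PARTITIONED BY (view_date STRING)      -- partitions: one directory per date
CLUSTERED BY (user_id) INTO 8 BUCKETS  -- buckets within each partition
STORED AS ORC;

-- Partition pruning: only the 2024-01-01 directory is scanned.
SELECT url, COUNT(*) AS hits
FROM page_views
WHERE view_date = '2024-01-01'
GROUP BY url;
```

Because Hive is schema-on-read, the table definition is metadata in the Meta Store; the data files themselves are only interpreted at query time.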
Pig
Pig basics
Install and configure Pig on a cluster
Pig Library Functions
Pig vs Hive
Write sample Pig Latin scripts
Modes of running Pig
Running in the Grunt shell
Running as a Java program
Pig UDFs
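A sample Pig Latin script of the kind written in this module; the input path and field names are placeholders for this sketch:

```pig
-- Illustrative only: the file path and field names are assumptions.
logs    = LOAD '/data/access_log' USING PigStorage('\t')
          AS (user:chararray, url:chararray, bytes:int);
by_url  = GROUP logs BY url;
hits    = FOREACH by_url GENERATE group AS url,
                                  COUNT(logs) AS n,
                                  SUM(logs.bytes) AS total_bytes;
sorted  = ORDER hits BY n DESC;
top10   = LIMIT sorted 10;
DUMP top10;
```

Run it in the Grunt shell line by line, or as a script with `pig script.pig`; Pig compiles the data-flow into MapReduce jobs.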
HBase
HBase concepts
HBase architecture
Region server architecture
File storage architecture
HBase basics
Column access
Scans
HBase use cases
Install and configure HBase on a multi-node cluster
Create a database; develop and run sample applications
Access data stored in HBase using the Java API
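The HBase shell gives the quickest feel for column access and scans before moving to the Java API; an illustrative session, with made-up table and column-family names:

```
hbase> create 'users', 'profile'                      # table with one column family
hbase> put 'users', 'u1', 'profile:name', 'Alice'     # write one cell
hbase> get 'users', 'u1'                              # read a single row
hbase> scan 'users', {COLUMNS => ['profile:name'], LIMIT => 10}   # bounded scan
```

The Java API (`Table.put`, `Table.get`, `Table.getScanner`) exposes the same row/column-family/qualifier model programmatically.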
Sqoop
Install and configure Sqoop on a cluster
Connecting to an RDBMS
Installing MySQL
Import data from MySQL to Hive
Export data to MySQL
Internal mechanism of import/export
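Typical Sqoop invocations for the import and export paths above; the host, database, table names, and credentials file are placeholders:

```shell
# Import a MySQL table into Hive (creates the Hive table if needed):
sqoop import \
  --connect jdbc:mysql://dbhost:3306/sales \
  --username etl --password-file /user/etl/.pw \
  --table orders \
  --hive-import \
  --num-mappers 4

# Export processed results from HDFS back to MySQL:
sqoop export \
  --connect jdbc:mysql://dbhost:3306/sales \
  --username etl --password-file /user/etl/.pw \
  --table order_summary \
  --export-dir /warehouse/order_summary
```

Internally, each import runs as a map-only MapReduce job, with `--num-mappers` parallel tasks splitting the table on its primary key.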
Oozie
Introduction to OOZIE
Oozie architecture
XML file specifications
Specifying Work flow
Control nodes
Oozie job coordinator
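A minimal workflow XML of the shape specified in this module; the workflow name, action name, and paths are placeholders:

```xml
<!-- Minimal illustrative workflow.xml; names and paths are placeholders. -->
<workflow-app xmlns="uri:oozie:workflow:0.5" name="demo-wf">
  <start to="count-step"/>
  <action name="count-step">
    <map-reduce>
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <configuration>
        <property>
          <name>mapred.input.dir</name>
          <value>/data/input</value>
        </property>
        <property>
          <name>mapred.output.dir</name>
          <value>/data/output</value>
        </property>
      </configuration>
    </map-reduce>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail">
    <message>Job failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
  </kill>
  <end name="end"/>
</workflow-app>
```

`start`, `kill`, and `end` are the control nodes; the `action` node wraps the actual Hadoop job, with `ok`/`error` transitions deciding the next step.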
Flume
Introduction to Flume
Configuration and Setup
Flume Sink with example
Channel
Flume Source with example
Complex Flume architectures
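The classic single-agent configuration that the source/channel/sink topics above build on: a netcat source feeds a memory channel, which a logger sink drains. The agent and component names (`a1`, `r1`, `c1`, `k1`) are conventional, not required:

```
# Single Flume agent "a1": netcat source -> memory channel -> logger sink
a1.sources = r1
a1.channels = c1
a1.sinks = k1

a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000

a1.sinks.k1.type = logger

# Wire the components together
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
```

Complex topologies chain agents (one agent's sink feeding another's source) or fan one source out to multiple channels.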
ZooKeeper
Introduction to ZooKeeper
Challenges in distributed applications
Coordination
ZooKeeper: Design Goals
Data Model and Hierarchical Namespace
Client APIs
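The hierarchical namespace is easiest to see from the bundled CLI; an illustrative session with made-up znode names:

```
$ zkCli.sh -server localhost:2181
create /app ""                     # persistent znode
create /app/workers ""
create -e /app/workers/w1 "host1"  # ephemeral znode: removed when the session ends
ls /app/workers                    # lists [w1]
```

Ephemeral znodes are the building block for coordination patterns such as worker membership and leader election.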
YARN
Hadoop 1.0 Limitations
MapReduce Limitations
History of Hadoop 2.0
HDFS 2: Architecture
HDFS 2: Quorum based storage
HDFS 2: High availability
HDFS 2: Federation
YARN Architecture
Classic vs YARN
YARN Apps
YARN multitenancy
YARN Capacity Scheduler
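A Capacity Scheduler fragment illustrating YARN multitenancy; the queue names and percentages are made up for this sketch:

```xml
<!-- Illustrative capacity-scheduler.xml: two tenant queues sharing the cluster. -->
<configuration>
  <property>
    <name>yarn.scheduler.capacity.root.queues</name>
    <value>analytics,etl</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.analytics.capacity</name>
    <value>60</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.etl.capacity</name>
    <value>40</value>
  </property>
</configuration>
```

Each tenant submits to its own queue with a guaranteed share of cluster capacity, while idle capacity can flow between queues.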
Big Data Hadoop Training: Hadoop is a free, Java-based programming framework that supports the processing of large data sets in a distributed computing environment. It is part of the Apache project sponsored by the Apache Software Foundation. Hadoop makes it possible to run applications on systems with thousands of nodes and thousands of terabytes of storage. Its distributed file system provides rapid data transfer between nodes and lets the system continue operating uninterrupted when a node fails. Because data blocks are replicated across several nodes, this approach lowers the risk of catastrophic system failure even if a significant number of nodes become inoperative.