COURSE CURRICULUM
Prerequisites for the Big Data Hadoop Training Course
There are no prerequisites. Knowledge of Java/Python, SQL & Linux is beneficial but not mandatory; Ducat provides a crash course covering the prerequisites required to begin Big Data training.
Apache Hadoop on AWS Cloud
This module will help you understand how to configure Hadoop Cluster on AWS Cloud:
- Introduction to Amazon Elastic MapReduce
- AWS EMR Cluster
- AWS EC2 Instance: Multi Node Cluster Configuration
- AWS EMR Architecture
- Web Interfaces on Amazon EMR
- Amazon S3
- Executing MapReduce Job on EC2 & EMR
- Apache Spark on AWS, EC2 & EMR
- Submitting Spark Job on AWS
- Hive on EMR
- Available Storage types: S3, RDS & DynamoDB
- Apache Pig on AWS EMR
- Processing NY Taxi Data using Spark on Amazon EMR
Learning Big Data and Hadoop
- Common Hadoop ecosystem components
- Hadoop Architecture
- HDFS Architecture
- Anatomy of File Write and Read
- How MapReduce Framework works
- Hadoop high level Architecture
- MR2 Architecture
- Hadoop YARN
- Hadoop 2.x core components
- Hadoop Distributions
- Hadoop Cluster Formation
Hadoop Architecture and HDFS
This module will help you to understand Hadoop & HDFS Cluster Architecture:
- Configuration & metadata files in a Hadoop Cluster (FSImage & EditLog files)
- Setting up of Single & Multi node Hadoop Cluster
- HDFS File permissions
- HDFS Installation & Shell Commands
- Daemons of HDFS
- Node Manager
- Resource Manager
- NameNode
- DataNode
- Secondary NameNode
- YARN Daemons
- HDFS Read & Write Commands
- NameNode & DataNode Architecture
- HDFS Operations
- Hadoop MapReduce Job
- Executing MapReduce Job
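The map, shuffle & reduce phases listed above can be sketched in plain Python. This is a hedged, single-process stand-in for a real Hadoop job, with made-up input lines; it only illustrates the data flow, not the distributed execution:

```python
from itertools import groupby

def mapper(line):
    # Map phase: emit a (word, 1) pair for every word in the input split
    for word in line.lower().split():
        yield (word, 1)

def reducer(word, counts):
    # Reduce phase: sum all counts that arrived for one key
    return (word, sum(counts))

def run_job(lines):
    # Shuffle & sort: group intermediate pairs by key,
    # as Hadoop does between the map and reduce phases
    intermediate = sorted(kv for line in lines for kv in mapper(line))
    return [reducer(k, (c for _, c in group))
            for k, group in groupby(intermediate, key=lambda kv: kv[0])]

counts = dict(run_job(["big data big cluster", "data node"]))
```

On a real cluster the same mapper/reducer logic would be submitted as a MapReduce job and the shuffle would happen over the network between nodes.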
Hadoop MapReduce Framework
This module will help you to understand Hadoop MapReduce framework:
- How MapReduce works on HDFS data sets
- MapReduce Algorithm
- MapReduce Hadoop Implementation
- Hadoop 2.x MapReduce Architecture
- MapReduce Components
- YARN Workflow
- MapReduce Combiners
- MapReduce Partitioners
- MapReduce Hadoop Administration
- MapReduce APIs
- Input Split & String Tokenizer in MapReduce
- MapReduce Use Cases on Data sets
Advanced MapReduce Concepts
This module will help you to learn:
- Job Submission & Monitoring
- Counters
- Distributed Cache
- Map & Reduce Join
- Data Compressors
- Job Configuration
- Record Reader
Pig
This module will help you to understand Pig Concepts:
- Pig Architecture
- Pig Installation
- Pig Grunt shell
- Pig Running Modes
- Pig Latin Basics
- Pig LOAD & STORE Operators
- Diagnostic Operators
- DESCRIBE Operator
- EXPLAIN Operator
- ILLUSTRATE Operator
- DUMP Operator
- Grouping & Joining
- GROUP Operator
- COGROUP Operator
- JOIN Operator
- CROSS Operator
- Combining & Splitting
- UNION Operator
- SPLIT Operator
- Filtering
- FILTER Operator
- DISTINCT Operator
- FOREACH Operator
- Sorting
- ORDER BY Operator
- LIMIT Operator
- Built-in Functions
- EVAL Functions
- LOAD & STORE Functions
- Bag & Tuple Functions
- String Functions
- Date-Time Functions
- MATH Functions
- Pig UDFs (User Defined Functions)
- Pig Scripts in Local Mode
- Pig Scripts in MapReduce Mode
- Analysing XML Data using Pig
- Pig Use Cases (Data Analysis on Social Media sites, Banking, Stock Market & Others)
- Analysing JSON data using Pig
- Testing Pig Scripts
Hive
This module will build your concepts in learning:
- Hive Installation
- Hive Data types
- Hive Architecture & Components
- Hive Meta Store
- Hive Tables(Managed Tables and External Tables)
- Hive Partitioning & Bucketing
- Hive Joins & Sub Query
- Running Hive Scripts
- Hive Indexing & View
- Hive Queries (HQL): ORDER BY, GROUP BY, DISTRIBUTE BY, CLUSTER BY, with Examples
- Hive Functions: Built-in & UDF (User Defined Functions)
- Hive ETL: Loading JSON, XML, Text Data Examples
- Hive Querying Data
- Hive Tables (Managed & External Tables)
- Hive Use Cases
- Hive Optimization Techniques
- Partitioning (Static & Dynamic) & Bucketing
- Hive Joins: Map, Bucket Map, SMB (Sort-Merge-Bucket) & Skew Joins
- Hive File Formats (ORC, SequenceFile, Text, Avro & Parquet)
- Cost-Based Optimization (CBO)
- Vectorization
- Indexing (Compact + BitMap)
- Integration with TEZ & Spark
- Hive SerDes (Custom & Built-in)
- Hive Integration with NoSQL (HBase, MongoDB & Cassandra)
- Thrift API (Thrift Server)
- UDF, UDTF & UDAF
- Hive Multiple Delimiters
- Loading XML & JSON Data into Hive
- Aggregation & Windowing Functions in Hive
- Hive Connect with Tableau
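The Hive ETL step for JSON data can be previewed with a small Python sketch: flattening JSON records into delimited rows that a Hive table declared with `ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'` could ingest. The records and field names below are made up for illustration:

```python
import json

# Hypothetical JSON records, as they might arrive from an upstream feed
records = [
    '{"id": 1, "name": "alice", "city": "NYC"}',
    '{"id": 2, "name": "bob", "city": "Delhi"}',
]

# Flatten each record into a tab-delimited row suitable for a
# Hive TEXTFILE table with fields terminated by '\t'
fields = ["id", "name", "city"]
rows = ["\t".join(str(json.loads(r)[f]) for f in fields) for r in records]
```

In practice the same job is usually done inside Hive with a JSON SerDe or `get_json_object`, but the flattening idea is the same.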
Sqoop
- Sqoop Installation
- Loading Data from RDBMS using Sqoop
- Sqoop Import & Import-All-Table
- Fundamentals & Architecture of Apache Sqoop
- Sqoop Job
- Sqoop Codegen
- Sqoop Incremental Import & Incremental Export
- Sqoop Merge
- Import Data from MySQL to Hive using Sqoop
- Sqoop: Hive Import
- Sqoop Metastore
- Sqoop Use Cases
- Sqoop- HCatalog Integration
- Sqoop Script
- Sqoop Connectors
Flume
This module will help you to learn Flume Concepts:
- Flume Introduction
- Flume Architecture
- Flume Data Flow
- Flume Configuration
- Flume Agent Component Types
- Flume Setup
- Flume Interceptors
- Multiplexing (Fan-Out), Fan-In-Flow
- Flume Channel Selectors
- Flume Sink Processors
- Fetching of Streaming Data using Flume (Social Media Sites: YouTube, LinkedIn, Twitter)
- Flume + Kafka Integration
- Flume Use Cases
KAFKA
This module will help you to learn Kafka concepts:
- Kafka Fundamentals
- Kafka Cluster Architecture
- Kafka Workflow
- Kafka Producer, Consumer Architecture
- Integration with SPARK
- Kafka Topic Architecture
- Zookeeper & Kafka
- Kafka Partitions
- Kafka Consumer Groups
- KSQL (SQL Engine for Kafka)
- Kafka Connectors
- Kafka REST Proxy
- Kafka Offsets
Oozie
This module will help you to understand Oozie concepts:
- Oozie Introduction
- Oozie Workflow Specification
- Oozie Coordinator Functional Specification
- Oozie H-catalog Integration
- Oozie Bundle Jobs
- Oozie CLI Extensions
- Automate MapReduce, Pig, Hive, Sqoop Jobs using Oozie
- Packaging & Deploying an Oozie Workflow Application
HBase
This module will help you to learn HBase Architecture:
- HBase Architecture, Data Flow & Use Cases
- Apache HBase Configuration
- HBase Shell & general commands
- HBase Schema Design
- HBase Data Model
- HBase Region & Master Server
- HBase & MapReduce
- Bulk Loading in HBase
- Create, Insert, Read Tables in HBase
- HBase Admin APIs
- HBase Security
- HBase vs Hive
- Backup & Restore in HBase
- Apache HBase External APIs (REST, Thrift, Scala)
- HBase & SPARK
- Apache HBase Coprocessors
- HBase Case Studies
- HBase Troubleshooting
Data Processing with Apache Spark
Spark performs in-memory data processing, which is why a Spark job typically runs faster than an equivalent Hadoop MapReduce job. This module also covers the Spark ecosystem & its related APIs, such as Spark SQL, Spark Streaming, Spark MLlib & Spark GraphX, as well as Spark Core concepts. It will help you understand Data Analytics & Machine Learning algorithms, applying them to various datasets to process & analyze large amounts of data.
- Spark RDDs
- Spark RDDs Actions & Transformations
- Spark SQL: Connectivity with various relational sources & converting them into DataFrames
- Spark Streaming
- Understanding role of RDD
- Spark Core Concepts: Creating RDDs (Parallelized RDDs, MappedRDD, HadoopRDD, JdbcRDD)
- Spark Architecture & Components
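The key Spark Core idea above, that transformations are lazy and only an action triggers computation, can be sketched with plain Python generators. This is a hedged, single-machine stand-in for PySpark's RDD API, not actual Spark code:

```python
# Transformations build a plan (lazy generators); an action forces evaluation,
# mirroring how Spark defers work until an action like reduce() or collect().
data = range(1, 6)                            # stands in for parallelize([1..5])

mapped = (x * x for x in data)                # "map": square each element (lazy)
filtered = (x for x in mapped if x % 2 == 1)  # "filter": keep odd squares (lazy)

result = sum(filtered)                        # "action": triggers the whole pipeline
```

Nothing is computed until `sum` runs, just as an RDD lineage is only executed when an action is called.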
BIG DATA PROJECTS
Project #1: Working with MapReduce, Pig, Hive & Flume
Problem Statement: Fetch structured & unstructured datasets from various sources, such as social media sites, web servers & structured sources like MySQL, Oracle & others, and dump them into HDFS. Then analyze those datasets using Pig, HQL queries & MapReduce to gain proficiency in the Hadoop stack & its ecosystem tools.
Data Analysis Steps:
- Dump XML & JSON datasets into HDFS
- Convert semi-structured data formats (JSON & XML) into structured format using Pig, Hive & MapReduce
- Push the datasets into the Pig & Hive environments for further analysis
- Write Hive queries to push the output into a relational database (RDBMS) using Sqoop
- Render the results as Box Plots, Bar Graphs & others using R & Python integration with Hadoop
Project #2: Analyze Stock Market Data
Data: The dataset contains stock information such as daily quotes, the stock's highest price & the stock's opening price on the New York Stock Exchange.
Problem Statement: Calculate covariance for the stock data to solve storage & processing problems related to a huge volume of data.
- Positive covariance: if investment instruments or stocks tend to be up or down during the same time periods, they have positive covariance
- Negative covariance: if returns move inversely, i.e. one investment tends to be up while the other is down, they show negative covariance
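The covariance calculation behind this project can be worked through in a few lines of Python. The daily-return figures below are made up for illustration; a real run would compute this over the full exchange dataset:

```python
# Sample covariance of two return series (divides by n - 1)
def covariance(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)

stock_a = [0.01, 0.02, -0.01, 0.03]    # moves together with stock_b
stock_b = [0.02, 0.03, -0.02, 0.04]
stock_c = [-0.02, -0.03, 0.02, -0.04]  # moves opposite to stock_a

pos = covariance(stock_a, stock_b)     # > 0: positive covariance
neg = covariance(stock_a, stock_c)     # < 0: negative covariance
```

In the project itself, the same formula is distributed across the cluster, with the per-term products computed in the map phase and summed in the reduce phase.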
Project #3: Hive,Pig & MapReduce with New York City Uber Trips
- Problem Statement: What was the busiest dispatch base by trips for a particular day & for the entire month?
- Which day had the most active vehicles?
- Which day had the most trips, sorted from most to fewest?
- Dispatching_Base_Number is the NYC Taxi & Limousine Commission code of the base that dispatched the Uber
- active_vehicles shows the number of active Uber vehicles for a particular date & company (base); trips is the number of trips for a particular base & date
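The "busiest dispatch base" question reduces to a group-by-and-sum, which the project implements in Pig, Hive or MapReduce. A hedged Python sketch of the same aggregation, over invented rows in the dataset's column layout:

```python
from collections import Counter

# Hypothetical rows: (dispatching_base_number, date, active_vehicles, trips)
rows = [
    ("B02512", "2014-09-01", 120, 1500),
    ("B02598", "2014-09-01", 300, 4200),
    ("B02512", "2014-09-02", 130, 1700),
    ("B02598", "2014-09-02", 310, 4500),
]

# Group by base and sum trips, the same shape as GROUP ... SUM in Pig/Hive
trips_by_base = Counter()
for base, date, vehicles, trips in rows:
    trips_by_base[base] += trips

busiest_base, busiest_trips = trips_by_base.most_common(1)[0]
```

The per-day variant is the same aggregation keyed on `(base, date)` instead of `base`.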
Project #4: Analyze Tourism Data
Data: The tourism dataset contains: city pair, seniors traveling, children traveling, adults traveling, car booking price & air booking price.
Problem Statement: Analyze the tourism data to find out:
- Top 20 destinations tourists frequently travel to: based on the given data, find the most popular destinations where people travel frequently, using the number of trips booked for each destination
- Top 20 high air-revenue destinations, i.e. the 20 cities that generate the highest airline revenues, so that discount offers can be given to attract more bookings for these destinations
- Top 20 locations from where most of the trips start, based on booked trip count
Project #5: Airport Flight Data Analysis: We will analyze Airport Information System data that gives information regarding flight delays, source & destination details, diverted routes & others
- Industry: Aviation
- Problem Statement: Analyze the flight data to:
- List delayed flights
- Find flights with zero stops
- List active airlines across all countries
- Find source & destination details of flights
- Identify reasons why flights get delayed
- Represent time in different formats
Project #6: Analyze Movie Ratings
Data: Movie data from sites like Rotten Tomatoes, IMDb, etc.
Problem Statement: Analyze the movie ratings by different users to:
- Get the user who has rated the most number of movies
- Get the user who has rated the least number of movies
- Get the count of total number of movies rated by user belonging to a specific occupation
- Get the number of underage users
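The first two questions are a count of ratings per user followed by a max/min, which the project expresses in Pig or HQL. A hedged Python sketch over invented rating tuples:

```python
from collections import Counter

# Hypothetical (user_id, movie_id, rating) tuples standing in for a ratings file
ratings = [
    (1, "m1", 5), (1, "m2", 3), (1, "m3", 4),
    (2, "m1", 2),
    (3, "m2", 4), (3, "m3", 5),
]

# Count how many movies each user rated, then pick the extremes
ratings_per_user = Counter(user for user, _, _ in ratings)
most_active = max(ratings_per_user, key=ratings_per_user.get)   # rated the most movies
least_active = min(ratings_per_user, key=ratings_per_user.get)  # rated the fewest
```

The occupation and underage-user questions are the same pattern: filter on a user-profile column first, then count.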
Project #7: Analyze Social Media Channels:
- Facebook
- Twitter
- Instagram
- YouTube
- Industry: Social Media
- Data: Dataset columns: VideoID, Uploader, Interval (days between YouTube's establishment & the video's upload date), Category, Length, Rating, Number of comments
- Problem Statement: Identify the top 5 categories in which the most videos are uploaded, the top 10 rated videos & the top 10 most viewed videos
- Apart from these, there are some twenty more use cases to choose from, such as: Twitter Data Analysis
- Market data Analysis