BIG DATA TRAINING COURSE

Big Data training course provided by Ducat IT Training School

Level: Beginner
Created by ducatittrainingschool staff. Last updated: Mon, 02-May-2022. Language: English


Free videos and free material for the Big Data training course, uploaded by ducatittrainingschool staff.

Syllabus / What will I learn?

COURSE CURRICULUM

Pre-requisites for the Big Data Hadoop Training Course?

There are no prerequisites. Knowledge of Java/Python, SQL, and Linux is beneficial but not mandatory. Ducat provides a crash course covering the prerequisites needed to begin Big Data training.

Apache Hadoop on AWS Cloud

This module will help you understand how to configure a Hadoop cluster on AWS Cloud:

  • Introduction to Amazon Elastic MapReduce
  • AWS EMR Cluster
  • AWS EC2 Instance: Multi Node Cluster Configuration
  • AWS EMR Architecture
  • Web Interfaces on Amazon EMR
  • Amazon S3
  • Executing MapReduce Job on EC2 & EMR
  • Apache Spark on AWS, EC2 & EMR
  • Submitting Spark Job on AWS
  • Hive on EMR
  • Available Storage types: S3, RDS & DynamoDB
  • Apache Pig on AWS EMR
  • Processing NY Taxi Data using Spark on Amazon EMR

Learning Big Data and Hadoop

  • Common Hadoop ecosystem components
  • Hadoop Architecture
  • HDFS Architecture
  • Anatomy of File Write and Read
  • How MapReduce Framework works
  • Hadoop high level Architecture
  • MR2 Architecture
  • Hadoop YARN
  • Hadoop 2.x core components
  • Hadoop Distributions
  • Hadoop Cluster Formation

Hadoop Architecture and HDFS

This module will help you to understand Hadoop & HDFS Cluster Architecture:

  • Configuration files in Hadoop Cluster (FSimage & editlog file)
  • Setting up of Single & Multi node Hadoop Cluster
  • HDFS File permissions
  • HDFS Installation & Shell Commands
  • Daemons of HDFS
    • Node Manager
    • Resource Manager
    • NameNode
    • DataNode
    • Secondary NameNode
    • YARN Daemons
    • HDFS Read & Write Commands
    • NameNode & DataNode Architecture
    • HDFS Operations
    • Hadoop MapReduce Job
    • Executing MapReduce Job

Hadoop MapReduce Framework

This module will help you to understand Hadoop MapReduce framework:

  • How MapReduce works on HDFS data sets
  • MapReduce Algorithm
  • MapReduce Hadoop Implementation
  • Hadoop 2.x MapReduce Architecture
  • MapReduce Components
  • YARN Workflow
  • MapReduce Combiners
  • MapReduce Partitioners
  • MapReduce Hadoop Administration
  • MapReduce APIs
  • Input Split & String Tokenizer in MapReduce
  • MapReduce Use Cases on Data sets
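
The map, shuffle, and reduce stages listed above can be illustrated with a small word count. This is a minimal pure-Python sketch of the programming model only, not the Hadoop MapReduce API; the function names and the sample lines are hypothetical and chosen for illustration.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every whitespace-separated token."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group emitted values by key, as happens between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reducer: sum the counts collected for one key."""
    return key, sum(values)


def word_count(lines):
    # Run every line through the mapper, then group and reduce.
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical input standing in for the lines of a file stored in HDFS.
sample_lines = ["big data hadoop", "hadoop mapreduce", "big data"]
counts = word_count(sample_lines)
print(counts)
```

On a real cluster the mappers and reducers run in parallel on different nodes and the framework performs the shuffle; here everything runs in one process so the data flow is easy to follow.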

Advanced MapReduce Concepts

This module will help you to learn:

  • Job Submission & Monitoring
  • Counters
  • Distributed Cache
  • Map & Reduce Join
  • Data Compressors
  • Job Configuration
  • Record Reader

Pig

This module will help you to understand Pig Concepts:

  • Pig Architecture
  • Pig Installation
  • Pig Grunt shell
  • Pig Running Modes
  • Pig Latin Basics
  • Pig LOAD & STORE Operators
  • Diagnostic Operators
    • DESCRIBE Operator
    • EXPLAIN Operator
    • ILLUSTRATE Operator
    • DUMP Operator
  • Grouping & Joining
    • GROUP Operator
    • COGROUP Operator
    • JOIN Operator
    • CROSS Operator
  • Combining & Splitting
    • UNION Operator
    • SPLIT Operator
  • Filtering
    • FILTER Operator
    • DISTINCT Operator
    • FOREACH Operator
  • Sorting
    • ORDER BY Operator
    • LIMIT Operator
  • Built-in Functions
    • EVAL Functions
    • LOAD & STORE Functions
    • Bag & Tuple Functions
    • String Functions
    • Date-Time Functions
    • MATH Functions
  • Pig UDFs (User Defined Functions)
  • Pig Scripts in Local Mode
  • Pig Scripts in MapReduce Mode
  • Analysing XML Data using Pig
  • Pig Use Cases (Data Analysis on Social Media sites, Banking, Stock Market & Others)
  • Analysing JSON data using Pig
  • Testing Pig Scripts

Hive

This module will help you to learn Hive concepts:

  • Hive Installation
  • Hive Data types
  • Hive Architecture & Components
  • Hive Meta Store
  • Hive Tables (Managed Tables and External Tables)
  • Hive Partitioning & Bucketing
  • Hive Joins & Sub Query
  • Running Hive Scripts
  • Hive Indexing & View
  • Hive Queries (HQL): Order By, Group By, Distribute By, Cluster By, Examples
  • Hive Functions: Built-in & UDF (User Defined Functions)
  • Hive ETL: Loading JSON, XML, Text Data Examples
  • Hive Querying Data
  • Hive Use Cases
  • Hive Optimization Techniques
    • Partitioning (Static & Dynamic Partition) & Bucketing
    • Hive Joins: Map, Bucket Map, SMB (Sort-Merge-Bucket) & Skew
    • Hive File Formats (ORC, SequenceFile, Text, Avro, Parquet)
    • CBO (Cost-Based Optimizer)
    • Vectorization
    • Indexing (Compact & Bitmap)
    • Integration with Tez & Spark
  • Hive SerDe (Custom & Built-in)
  • Hive Integration with NoSQL (HBase, MongoDB, Cassandra)
  • Thrift API (Thrift Server)
  • UDF, UDTF & UDAF
  • Hive Multiple Delimiters
  • Loading XML & JSON Data into Hive
  • Aggregation & Windowing Functions in Hive
  • Connecting Hive with Tableau

Sqoop

  • Sqoop Installation
  • Loading Data from RDBMS using Sqoop
  • Sqoop Import & Import-All-Tables
  • Fundamentals & Architecture of Apache Sqoop
  • Sqoop Job
  • Sqoop Codegen
  • Sqoop Incremental Import & Incremental Export
  • Sqoop Merge
  • Import Data from MySQL to Hive using Sqoop
  • Sqoop: Hive Import
  • Sqoop Metastore
  • Sqoop Use Cases
  • Sqoop-HCatalog Integration
  • Sqoop Script
  • Sqoop Connectors

Flume

This module will help you to learn Flume Concepts:

  • Flume Introduction
  • Flume Architecture
  • Flume Data Flow
  • Flume Configuration
  • Flume Agent Component Types
  • Flume Setup
  • Flume Interceptors
  • Multiplexing (Fan-Out), Fan-In-Flow
  • Flume Channel Selectors
  • Flume Sink Processors
  • Fetching of Streaming Data using Flume (Social Media Sites: YouTube, LinkedIn, Twitter)
  • Flume + Kafka Integration
  • Flume Use Cases

Kafka

This module will help you to learn Kafka concepts:

  • Kafka Fundamentals
  • Kafka Cluster Architecture
  • Kafka Workflow
  • Kafka Producer, Consumer Architecture
  • Integration with SPARK
  • Kafka Topic Architecture
  • Zookeeper & Kafka
  • Kafka Partitions
  • Kafka Consumer Groups
  • KSQL (SQL Engine for Kafka)
  • Kafka Connectors
  • Kafka REST Proxy
  • Kafka Offsets

Oozie

This module will help you to understand Oozie concepts:

  • Oozie Introduction
  • Oozie Workflow Specification
  • Oozie Coordinator Functional Specification
  • Oozie HCatalog Integration
  • Oozie Bundle Jobs
  • Oozie CLI Extensions
  • Automate MapReduce, Pig, Hive, Sqoop Jobs using Oozie
  • Packaging & Deploying an Oozie Workflow Application

HBase

This module will help you to learn HBase Architecture:

  • HBase Architecture, Data Flow & Use Cases
  • Apache HBase Configuration
  • HBase Shell & general commands
  • HBase Schema Design
  • HBase Data Model
  • HBase Region & Master Server
  • HBase & MapReduce
  • Bulk Loading in HBase
  • Create, Insert, Read Tables in HBase
  • HBase Admin APIs
  • HBase Security
  • HBase vs Hive
  • Backup & Restore in HBase
  • Apache HBase External APIs (REST, Thrift, Scala)
  • HBase & SPARK
  • Apache HBase Coprocessors
  • HBase Case Studies
  • HBase Troubleshooting

Data Processing with Apache Spark

This module explains how Spark executes in-memory data processing and how a Spark job runs faster than a Hadoop MapReduce job. The course will also help you understand the Spark ecosystem and its related APIs, such as Spark SQL, Spark Streaming, Spark MLlib, and Spark GraphX, as well as Spark Core concepts. It will help you understand how Data Analytics & Machine Learning algorithms are applied to various datasets to process and analyze large amounts of data.

  • Spark RDDs
  • Spark RDDs Actions & Transformations
  • Spark SQL: Connecting to various relational sources & converting their data into DataFrames
  • Spark Streaming
  • Understanding role of RDD
  • Spark Core concepts: Creating RDDs: Parallel RDDs, MappedRDD, HadoopRDD, JdbcRDD
  • Spark Architecture & Components

BIG DATA PROJECTS

Project #1: Working with MapReduce, Pig, Hive & Flume

Problem Statement: Fetch structured & unstructured datasets from various sources, such as social media sites, web servers, and structured sources like MySQL, Oracle & others, and dump them into HDFS. Then analyze the same datasets using Pig, HQL queries & MapReduce to gain proficiency in the Hadoop stack & its ecosystem tools.

Data Analysis Steps:

  • Dump XML & JSON datasets into HDFS
  • Convert semi-structured data formats (JSON & XML) into structured format using Pig, Hive & MapReduce
  • Push the datasets into the Pig & Hive environments for further analysis
  • Write Hive queries and push the output into a relational database (RDBMS) using Sqoop
  • Render the results as Box Plots, Bar Graphs & others using R & Python integration with Hadoop

Project #2: Analyze Stock Market Data

Data: The dataset contains stock information such as daily quotes, highest stock price, and opening stock price on the New York Stock Exchange

Problem Statement: Calculate covariance for stock data to solve storage & processing problems related to huge volumes of data

  • Positive Covariance: if investment instruments or stocks tend to move up or down during the same time periods, they have positive covariance
  • Negative Covariance: if returns move inversely, so that one investment tends to be up while the other is down, they have negative covariance
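
The sign convention described above can be checked with a small calculation. This is a minimal sketch assuming the sample covariance, cov(X, Y) = sum((x_i - mean_x) * (y_i - mean_y)) / (n - 1); the function name and the three short return series are hypothetical and are not taken from the course dataset.

```python
def covariance(xs, ys):
    """Sample covariance of two equally long series (divides by n - 1)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


# Hypothetical daily returns for three stocks (illustrative values only).
stock_a = [1.0, 2.0, 3.0, 4.0]
stock_b = [2.0, 4.0, 6.0, 8.0]   # moves in the same direction as stock_a
stock_c = [8.0, 6.0, 4.0, 2.0]   # moves in the opposite direction

cov_ab = covariance(stock_a, stock_b)
cov_ac = covariance(stock_a, stock_c)

print("A vs B:", "positive" if cov_ab > 0 else "negative")
print("A vs C:", "positive" if cov_ac > 0 else "negative")
```

In the project itself the same sums would be computed over the full dataset with the Hadoop tools covered in the course; the arithmetic stays the same.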

Project #3: Hive, Pig & MapReduce with New York City Uber Trips

Problem Statement:

  • What was the busiest dispatch base by trips for a particular day over the entire month?
  • Which day had the most active vehicles?
  • Which days had the most trips, sorted from most to fewest?

Data fields:

  • Dispatching_Base_Number is the NYC Taxi & Limousine company code of the base that dispatched the Uber
  • active_vehicles is the number of active Uber vehicles for a particular date & company (base)
  • Trips is the number of trips for a particular base & date
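
The three questions above reduce to grouping and aggregating by date. The following is a minimal pure-Python sketch of that logic; the rows, base codes, and variable names are hypothetical, and in the project the same aggregations would be expressed in Hive, Pig, or MapReduce.

```python
from collections import defaultdict

# Hypothetical rows: (dispatching_base_number, date, active_vehicles, trips)
rows = [
    ("BASE_A", "2015-01-01", 190, 1100),
    ("BASE_B", "2015-01-01", 225, 1700),
    ("BASE_A", "2015-01-02", 175, 1200),
    ("BASE_B", "2015-01-02", 200, 1000),
]

busiest_base_by_day = {}            # date -> (base, trips) with the most trips
vehicles_by_day = defaultdict(int)  # date -> total active vehicles
trips_by_day = defaultdict(int)     # date -> total trips

for base, date, active_vehicles, trips in rows:
    # Keep the base with the highest trip count seen so far for this date.
    if date not in busiest_base_by_day or trips > busiest_base_by_day[date][1]:
        busiest_base_by_day[date] = (base, trips)
    vehicles_by_day[date] += active_vehicles
    trips_by_day[date] += trips

# Day with the most active vehicles across all bases.
day_most_vehicles = max(vehicles_by_day, key=vehicles_by_day.get)

# Days ordered from most to fewest trips.
days_by_trips = sorted(trips_by_day.items(), key=lambda item: item[1], reverse=True)

print(busiest_base_by_day)
print(day_most_vehicles)
print(days_by_trips)
```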

Project #4: Analyze Tourism Data

Data: Tourism data comprises: city pair, seniors travelling, children travelling, adults travelling, car booking price & air booking price

Problem Statement: Analyze tourism data to find out:

  • Top 20 destinations tourists frequently travel to: from the given data, find the most popular destinations based on the number of trips booked for each destination
  • Top 20 high air-revenue destinations, i.e., the 20 cities that generate the highest airline revenues, so that discount offers can be given to attract more bookings for these destinations
  • Top 20 locations from where most of the trips start, based on booked trip count
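
Each of the three rankings above is a "top N by aggregate" query. Here is a minimal sketch using Python's collections.Counter, assuming each record holds the two cities of a city pair and an air booking price; the records, city codes, and N = 2 are hypothetical stand-ins for the real dataset and the top 20.

```python
from collections import Counter

# Hypothetical records: (origin, destination, air_booking_price)
bookings = [
    ("DEL", "GOA", 120.0),
    ("BOM", "GOA", 90.0),
    ("DEL", "JAI", 60.0),
    ("DEL", "GOA", 110.0),
    ("BLR", "JAI", 70.0),
]

TOP_N = 2  # the project asks for 20; 2 keeps the sample readable

trips_by_destination = Counter()
air_revenue_by_destination = Counter()
trips_by_origin = Counter()

for origin, destination, air_price in bookings:
    trips_by_destination[destination] += 1
    air_revenue_by_destination[destination] += air_price
    trips_by_origin[origin] += 1

# most_common(n) returns the n entries with the largest values, highest first.
top_destinations = trips_by_destination.most_common(TOP_N)
top_air_revenue = air_revenue_by_destination.most_common(TOP_N)
top_origin = trips_by_origin.most_common(1)

print(top_destinations)
print(top_air_revenue)
print(top_origin)
```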

Project #5: Airport Flight Data Analysis

We will analyze Airport Information System data that gives information regarding flight delays, source & destination details, diverted routes & others

Industry: Aviation

Problem Statement: Analyze flight data to find:

  • List of delayed flights
  • Flights with zero stops
  • List of active airlines in all countries
  • Source & destination details of flights
  • Reasons why flights get delayed
  • Time in different formats

Project #6: Analyze Movie Ratings

Data: Movie data from sites like Rotten Tomatoes, IMDb, etc.

Problem Statement: Analyze the movie ratings by different users to:

  • Get the user who has rated the most number of movies
  • Get the user who has rated the least number of movies
  • Get the count of total number of movies rated by users belonging to a specific occupation
  • Get the number of underage users
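
The four queries above can be sketched on a tiny in-memory sample. The user and rating records, the field layout, and the age threshold of 18 for "underage" are all assumptions made for illustration; the course dataset may be structured differently.

```python
from collections import Counter

# Hypothetical users: user_id -> (age, occupation)
users = {
    1: (25, "engineer"),
    2: (16, "student"),
    3: (40, "engineer"),
}

# Hypothetical ratings: (user_id, movie_id, rating)
ratings = [
    (1, 101, 4), (1, 102, 5), (1, 103, 3),
    (2, 101, 2),
    (3, 101, 5), (3, 102, 4),
]

UNDERAGE_LIMIT = 18  # assumed threshold, not stated in the source

# Number of movies rated per user (users with no ratings do not appear here).
ratings_per_user = Counter(user_id for user_id, _movie_id, _rating in ratings)

most_active_user = max(ratings_per_user, key=ratings_per_user.get)
least_active_user = min(ratings_per_user, key=ratings_per_user.get)

# Total number of ratings given by users of each occupation.
ratings_by_occupation = Counter()
for user_id, count in ratings_per_user.items():
    ratings_by_occupation[users[user_id][1]] += count

underage_users = sum(1 for age, _occupation in users.values() if age < UNDERAGE_LIMIT)

print(most_active_user, least_active_user)
print(ratings_by_occupation["engineer"])
print(underage_users)
```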

Project #7: Analyze Social Media Channels:

  • Facebook
  • Twitter
  • Instagram
  • YouTube

Industry: Social Media

Data: Dataset columns: VideoId, Uploader, Interval between the day of establishment of YouTube & the date of uploading of the video, Category, Length, Rating, Number of comments

Problem Statement: Identify the top 5 categories in which the most number of videos are uploaded, the top 10 rated videos, and the top 10 most viewed videos

Apart from these, there are some twenty more use cases to choose from:

  • Twitter Data Analysis
  • Market Data Analysis


Material price: Free