Apache Spark with Java 8 Training Provided by Learntek
Free Apache Spark with Java 8 videos and material uploaded by Learntek staff.
Apache Spark with Java – Overview of Java 8
Overview of interfaces, static methods and default methods in interfaces
Anonymous Inner Classes
Introduction to Lambda Expressions
Functional Interface, type inference
Method references
Composing Lambda
Understanding Closure
Overview of Streams
Working with Streams
Infinite Streams
Apache Spark with Java – Introduction to Spark
Introduction to Big Data
Big Data Problem
Scale-Up Vs Scale-Out Architecture
Characteristics of Scale-Out
Introduction to Hadoop, Map-Reduce and HDFS
Introducing Spark
Hortonworks Data Platform (HDP) using VirtualBox
Importing the HDP VM image using VirtualBox on a local machine
Configuring HDP
Overview of Ambari and its components
Overview of services configuration using Ambari
Overview of Apache Zeppelin
Creating, importing and executing notebooks in Apache Zeppelin
IDEs for Spark Applications
IntelliJ IDEA
Eclipse
Resolving dependencies for Spark applications
Spark Basics
Spark Shell
Overview of Spark architecture
Storage layers for Spark
Initializing a SparkContext and building applications
Submitting a Spark Application
Use of Spark History Server
Spark Components
Spark Driver Process
Spark Executor
SparkConf and SparkContext
SparkSession object
Overview of spark-submit command
Spark UI
RDDs
Overview of RDD
RDD and Partitions
Ways of Creating RDD
RDD transformations and Actions
Lazy evaluation
RDD Lineage Graph (DAG)
Element wise transformations
map() vs flatMap() transformations
Set Transformation
RDD Actions
Overview of RDD persistence
Methods for persisting RDD
Persisting RDD with Storage option
Illustration of Caching on an RDD in DAG
Removal of Cached RDD
Pair RDDs
Overview of Key-Value Pair RDD
Ways of creating Pair RDDs
Transformations on Pair RDD
reduceByKey(), foldByKey(), mapValues(), flatMapValues(), keys() and values() transformations
Grouping, Joining, Sorting on Pair RDD
reduceByKey() vs groupByKey()
Pair RDD Action
Launching Spark on cluster
Configure and launch Spark Cluster on Google Cloud
Configure and launch Spark Cluster on Microsoft Azure
Logging and Debugging a Spark Application
Setting up a Windows environment for executing a Spark application using an IDE
Steps for using the SLF4J logging mechanism in a Spark application
Attaching a debugger to a Spark application
Example of debugging a Spark application running inside a cluster
Spark Application Architecture
Spark Application Distributed Architecture
Spark Application submission Mode
Overview of Cluster Manager
Example of using Standalone Cluster Manager
Driver and its responsibilities
Overview of Job, Stage and Tasks
Spark Job Hierarchy
Executor
Spark-submit command and various submission options
YARN Cluster Manager
YARN Architecture
Client and Cluster Deploy-mode
Advanced concepts in Spark
Accumulator
Broadcast
RDD partitioning
Re-partition RDD
Determining RDD partitioner
Partition-based RDD operations like mapPartitions, mapPartitionsWithIndex, mapPartitionsToPair
Spark SQL
Introduction to Spark SQL
Creating SparkSession with Hive Support
DataFrame
Ways of Creating DataFrame
Registering a DataFrame as View
DataFrame Transformations API
DataFrame SQL statement
Aggregate Operations
DataFrame Action
Catalyst Optimizer
Limitation of DataFrame
Introduction to Dataset
Introduction to Encoder
Creating Dataset
Functional transformation on Dataset
Loading CSV, JSON and Parquet files in Spark SQL
Loading and saving data from/in Hive, JDBC, HDFS, Cassandra
Introduction to User-Defined-Function (UDF)
Customizing a UDF
Usage of UDF in DataFrame Transformations API
Usage of UDF in Spark SQL statement
Introduction to Window Function
Steps of defining a window function
Illustration of Window function usage
Introduction to UDAF
Customizing a UDAF
Illustration of customized UDAF usage
Basic Spark Streaming
Introduction to data streaming
Spark Streaming framework
Spark Streaming and Micro batch
Introduction to DStreams
DStreams and RDD
Word Count example using Socket Text Stream
Streaming with Twitter feeds
Setting up a Twitter App
Resolving Twitter dependency in Spark Streaming Application
Steps of creating Uber Jar
Example of extracting hashtags from tweet data
Troubleshooting Twitter Streaming issue in Spark Application
Steps of creating Spark Streaming Application
Architecture of Spark Streaming
Stateless Transformations
Twitter Streaming examples using stateless transformation
Introduction to stateful Transformations
Window Transformations
Window Duration and Slide Duration
Window Operations
Naive and inverse window reduce operation
Checkpoint
Tracking State of an event using updateStateByKey operation
Interacting directly with RDDs using the transform() operation
Example of HDFS file streaming
Example of Spark-Kafka interaction
Saving DStreams to external file system
Apache Spark with Java 8 Training: Spark is an Apache Software Foundation project designed to speed up Hadoop-style data processing.
The main feature of Spark is its in-memory cluster computing, which greatly increases application processing speed.
Spark is designed to cover a wide range of workloads, such as batch applications, iterative algorithms, interactive queries and streaming applications, which reduces the management burden of maintaining separate tools.
Apache Spark also has the following features.