SETUP – Hadoop, Spark, Kafka and NoSQL Environment
- Install and configure VirtualBox
- Load and configure a RHEL-based virtual machine
- Install and configure basic software on the VM
- User setup and Database account creation
- Configure SSH and check port availability
Hadoop – HDFS (Hadoop Distributed File System)
- Hadoop Distributed File System: background and the Google File System (GFS)
- HDFS config files – core-site.xml, hdfs-site.xml, mapred-site.xml
- Data Replication – Static and Dynamic configuration
- Data Storage – Block Size details
- HDFS – DFS shell commands
- HDFS – Admin commands and data recovery
Hadoop – MapReduce Framework
- MapReduce Introduction
- Writing MapReduce Programs
- Mapper and Reducer details
- Running MR jobs
- Configure custom Map and Reduce jobs
Hadoop – Apache Hive
- Hive installation and Metastore setup
- Hive Shell commands
- HiveQL Basics
- Data loading in Hive local and MR modes
- Working with tables, databases, etc.
- Hands-on Exercises and Assignments
Spark – Spark Installation and Introduction
- Apache Spark installation (version 2.x)
- Spark shell and PySpark shell setup
- Spark executor and executor core setup
- Spark logging configuration
- Writing UDFs (user-defined functions)
Spark – Scala Installation and Introduction
- Scala Installation (version 2.x)
- Scala setup for Spark environment
- Scala-based Spark exercise
Spark – Resilient Distributed Datasets (RDD)
- Working with RDDs in Spark
- Creating RDDs from scratch
- Creating RDDs from preexisting data
- Accumulators and Broadcast variables
- RDD – Transformation commands
- RDD – Action commands
- RDD complex exercises
Spark – Spark SQL and DataFrames
- Spark SQL and the SQLContext
- Creating DataFrames from raw datasets
- Transforming and Querying DataFrames
- Using CSV files and mapping schema
- Using case structures and user-defined data types
Spark – Spark MLlib (Machine Learning)
- Basic Principles of Machine Learning
- Supervised and Unsupervised Learning
- Set up Machine Learning for Spark
- Transformations, Correlation algorithm
- Exercises for Regression and Correlation
Kafka – Apache Kafka
- Introduction to Apache Kafka
- Identifying the major Kafka components
- Determining what data is appropriate for use with Kafka
- Developing with Kafka producers, consumers, and brokers
Kafka – Installation and Labs
- Kafka features and terminology
- High-level Kafka architecture
- Kafka installation on Linux/Windows
- Install ZooKeeper for Kafka
- Install Kafka Server
Kafka – Consumer, Producer and Topics
- Writing Kafka Consumer Labs
- Create Kafka Messages
- Create Kafka Topics
- Message structure and topic configuration
- Write Kafka Producer
- Configure Producer and Kafka Server
- Kafka Multi Broker Configuration
NoSQL – Introduction and Details
- NoSQL databases introduction
- Types of NoSQL databases – MongoDB, Cassandra, CouchDB
- Use cases for NoSQL databases
- Document DB types
- Comparison with RDBMS
NoSQL – MongoDB
- MongoDB installation on Linux/Windows
- Mongo daemon threads
- Mongo Shell configuration
- Mongo collection creation
- Mongo data load in collections
NoSQL – Mongo Query Language
- MongoDB query language
- Mongo create(), update() and delete() queries
- Mongo find() query
Study Materials and Labs
A complete virtual machine is shared with students. It has Java, Oracle DB, Mozilla Firefox and other components pre-installed.
Please note that this is NOT a remote lab environment: you keep the VM and all labs, and can continue to use them after the training is completed.