Way2IT4U – Big Data Hadoop Online Training


Why Way2IT4U Big Data – Hadoop Online Training

Way2IT4U – Big Data Hadoop Online Training: Big Data refers to the analysis of large data sets to find trends, correlations or other insights not visible with smaller data sets or traditional processing methods. The exponential growth of internet-connected devices and sensors is a major contributor to this volume of data, and storing, processing and analyzing it can require hundreds or thousands of computers. One example of big data in use is the development of autonomous vehicles: the sensors on self-driving vehicles capture millions of data points that can be analyzed to help improve performance and avoid accidents.

Table of Contents

SETUP – Hadoop, Spark, Kafka and NoSQL Environment

  • Install and configure Virtual Box
  • Load and configure RHEL based Virtual Machine
  • Install/Configure VM with basic software
  • User setup and Database account creation
  • Configure SSH and check port availability

Hadoop – HDFS (Hadoop – Distributed File System)

  • Hadoop Distributed file system, Background, GFS
  • HDFS config files – core-site.xml, hdfs-site.xml, mapred-site.xml
  • Data Replication – Static and Dynamic configuration
  • Data Storage – Block Size details
  • HDFS – DFS shell commands
  • HDFS – Admin commands and data recovery
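
As a rough illustration of the block size and replication topics listed above, the sketch below estimates how many blocks a file occupies and how much raw cluster storage it uses. The 128 MB block size and replication factor of 3 are the usual Hadoop 2.x defaults and are assumed here; the function name `hdfs_footprint` is hypothetical and not part of any Hadoop API.

```python
import math

BLOCK_SIZE_MB = 128  # assumed dfs.blocksize (common Hadoop 2.x default)
REPLICATION = 3      # assumed dfs.replication (common default)

def hdfs_footprint(file_size_mb, block_size_mb=BLOCK_SIZE_MB, replication=REPLICATION):
    """Return (number of blocks, raw storage in MB) for a single file."""
    # A file is split into fixed-size blocks; the last block may be partial.
    blocks = math.ceil(file_size_mb / block_size_mb)
    # The last block is not padded, so raw usage is file size times replication.
    raw_mb = file_size_mb * replication
    return blocks, raw_mb

blocks, raw_mb = hdfs_footprint(1000)
print(blocks, raw_mb)
```

With these assumed settings, a 1000 MB file is stored as 8 blocks and takes 3000 MB of raw storage across the cluster.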

Hadoop – MapReduce Framework

  • MapReduce Introduction
  • Writing MapReduce Programs
  • Mappers and Reducers details
  • Running MR jobs
  • Configure custom Map and Reduce jobs
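
To give a feel for the mapper and reducer roles listed above, here is a minimal word-count sketch in plain Python. It only simulates the map, shuffle and reduce phases in a single process and does not use the Hadoop API; the names `mapper`, `reducer` and `run_job` are hypothetical.

```python
from collections import defaultdict

def mapper(line):
    """Map phase: emit a (word, 1) pair for every word in the line."""
    for word in line.lower().split():
        yield word, 1

def reducer(word, counts):
    """Reduce phase: sum all counts collected for one word."""
    return word, sum(counts)

def run_job(lines):
    """Simulate map -> shuffle (group by key) -> reduce in one process."""
    groups = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            groups[key].append(value)  # shuffle: group values by key
    return dict(reducer(key, values) for key, values in groups.items())

result = run_job(["big data hadoop", "hadoop spark", "big data"])
print(result)
```

On a real cluster the framework runs many mappers and reducers in parallel and performs the grouping step itself; the sketch keeps everything in memory purely for illustration.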

Hadoop – Apache HIVE

  • Hive Installation and Metastore setup
  • Hive Shell commands
  • HiveQL Basics
  • Hive Local and MR mode data load
  • Working with Tables, Databases, etc.
  • Hands on Exercises and Assignments

Spark – Spark Installation and Introduction

  • Apache Spark Installation (version 2.x)
  • Spark shell and PySpark shell setup
  • Spark Executor cores and Executors setup
  • Spark configurations for logs
  • Writing UDF (user defined functions)

Spark – Scala Installation and Introduction

  • Scala Installation (version 2.x)
  • Scala setup for Spark environment
  • Scala based Spark exercise

Spark – Resilient Distributed Datasets (RDD)

  • Working with RDDs in Spark
  • Creating RDDs from scratch
  • Creating RDD from preexisting data
  • Accumulators and Broadcast variables
  • RDD – Transformation commands
  • RDD – Action commands
  • RDD complex exercises

Spark – Spark SQL and Data Frames

  • Spark SQL and the SQLContext
  • Creating DataFrames from raw datasets
  • Transforming and Querying DataFrames
  • Using CSV files and mapping schema
  • Using case structures and user defined data types

Spark – Spark MLlib (Machine Learning)

  • Basic Principles of Machine Learning
  • Supervised and Unsupervised Learning
  • Set up Machine Learning for Spark
  • Transformations, Correlation Algorithm
  • Exercises for Regression and Correlation
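
As a small preview of the correlation exercises listed above, the sketch below computes the Pearson correlation coefficient in plain Python. It is not Spark MLlib code; the function name `pearson` and the sample data are hypothetical and chosen only for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means (unnormalized covariance)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalized variances)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

hours = [1, 2, 3, 4, 5]     # hypothetical sample data
scores = [2, 4, 6, 8, 10]   # exactly twice the hours, so perfectly correlated
r = pearson(hours, scores)
print(r)
```

A value close to 1 indicates a strong positive linear relationship, a value close to -1 a strong negative one, and a value near 0 little linear relationship.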

Kafka – Apache Kafka

  • Introduction to Apache Kafka
  • Identifying the major Kafka components
  • Determining what data is appropriate for use with Kafka
  • Developing with Kafka producers, consumers, and brokers

Kafka – Installation and Labs

  • Kafka Features and terminology
  • High level Kafka architecture
  • Kafka Installation on Linux/Windows
  • Install ZooKeeper for Kafka
  • Install Kafka Server

Kafka – Consumer, Producer and Topics

  • Writing Kafka Consumer Labs
  • Create Kafka Messages
  • Create Kafka Topics
  • Message structure and topic configuration
  • Write Kafka Producer
  • Configure Producer and Kafka Server
  • Kafka Multi Broker Configuration

NoSQL – Introduction and Details

  • NoSQL databases introduction
  • Types of NoSQL databases – MongoDB, Cassandra, CouchDB
  • Use cases for NoSQL databases
  • Document DB types
  • Comparison with RDBMS

NoSQL – MongoDB

  • MongoDB installation on Linux/Windows
  • Mongo daemon threads
  • Mongo Shell configuration
  • Mongo collection creation
  • Mongo data load in collections

NoSQL – Mongo Query Language

  • MongoDB query language
  • Mongo create(), update() and delete() queries
  • Mongo find() query

Study Materials and Labs

A complete Virtual Machine is shared with students. It has Java, Oracle DB, Mozilla Firefox and other components pre-installed.

Please note that this is not a remote lab environment: you keep the VM and all labs, and can continue using them after the training is completed.