Way2IT4U – Big Data Hadoop Online Training
Why Way2IT4U Big Data – Hadoop Online Training
Big Data refers to the analysis of very large data sets to find trends, correlations or other insights that are not visible in smaller data sets or with traditional processing methods. The exponential growth of internet-connected devices and sensors is a major contributor to this volume of data, and storing, processing and analyzing it can require hundreds or thousands of computers. One example of big data in use is the development of autonomous vehicles: the sensors on self-driving cars capture millions of data points that can be analyzed to help improve performance and avoid accidents.
SETUP – Hadoop, Spark, Kafka and NoSQL Environment
- Install and configure VirtualBox
- Load and configure a RHEL-based virtual machine
- Install/configure the VM with basic software packages
- User setup and Database account creation
- Configure SSH and check port availability
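For illustration, the SSH and port checks in this setup typically come down to a few shell commands like the ones below (the user name, host name and port are placeholders, not the actual lab values):

    # generate a key pair and copy it to the training VM
    ssh-keygen -t rsa -b 4096
    ssh-copy-id hadoop@training-vm

    # confirm password-less login and check that the SSH port is listening
    ssh hadoop@training-vm 'echo connected'
    ss -ltn | grep ':22'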
Hadoop – HDFS (Hadoop Distributed File System)
- Hadoop Distributed File System – background and GFS
- HDFS config files – core-site.xml, hdfs-site.xml, mapred-site.xml
- Data Replication – Static and Dynamic configuration
- Data Storage – Block Size details
- HDFS – DFS shell commands
- HDFS – Admin commands and data recovery
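A few of the DFS shell and admin commands practiced in this module look like the following (the paths are illustrative):

    hdfs dfs -mkdir -p /user/training/input
    hdfs dfs -put localfile.txt /user/training/input/
    hdfs dfs -ls /user/training/input
    hdfs dfs -cat /user/training/input/localfile.txt

    # admin view of datanodes, block usage and replication health
    hdfs dfsadmin -report
    hdfs fsck /user/training -files -blocks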
Hadoop – MapReduce Framework
- MapReduce Introduction
- Writing MapReduce Programs
- Mappers and Reducers details
- Running MR jobs
- Configure custom Map and Reduce jobs
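The module's labs may use the Java MapReduce API; purely as an illustration of the mapper/reducer split, here is a minimal word-count sketch using Hadoop Streaming with Python (file names and paths are placeholders):

    #!/usr/bin/env python
    # mapper.py - emit (word, 1) for every word read from stdin
    import sys

    for line in sys.stdin:
        for word in line.strip().split():
            print("%s\t%d" % (word, 1))

    #!/usr/bin/env python
    # reducer.py - sum the counts for each word (input arrives sorted by key)
    import sys

    current, total = None, 0
    for line in sys.stdin:
        word, count = line.rstrip("\n").split("\t", 1)
        if word != current:
            if current is not None:
                print("%s\t%d" % (current, total))
            current, total = word, 0
        total += int(count)
    if current is not None:
        print("%s\t%d" % (current, total))

The job is then submitted with the streaming jar shipped with Hadoop (the jar path varies by version):

    hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-*.jar \
        -files mapper.py,reducer.py \
        -mapper mapper.py -reducer reducer.py \
        -input /user/training/input -output /user/training/wordcount-out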
Hadoop – Apache HIVE
- Hive Installation and Meta store setup
- Hive Shell commands
- Hive QL Basics
- Hive local and MapReduce mode data loads
- Working with tables, databases, etc.
- Hands on Exercises and Assignments
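A minimal HiveQL session of the kind used in these exercises might look like this (the database, table and file names are illustrative):

    CREATE DATABASE IF NOT EXISTS training;
    USE training;

    CREATE TABLE employees (id INT, name STRING, salary DOUBLE)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    STORED AS TEXTFILE;

    -- local-mode load from the Linux file system
    LOAD DATA LOCAL INPATH '/home/training/employees.csv' INTO TABLE employees;

    -- a simple query that runs as a MapReduce job
    SELECT name, salary FROM employees WHERE salary > 50000 ORDER BY salary DESC;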
Spark – Spark Installation and Introduction
- Apache Spark installation (version 2.x)
- Spark shell and PySpark shell setup
- Spark Executor cores and Executors setup
- Spark configuration for logging
- Writing UDF (user defined functions)
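As a rough sketch of the UDF topic (assuming PySpark on Spark 2.x; the column and function names are made up for illustration):

    # register a Python function as a Spark SQL UDF and apply it to a column
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import udf
    from pyspark.sql.types import StringType

    spark = SparkSession.builder.appName("udf-demo").getOrCreate()
    df = spark.createDataFrame([(1, "alice"), (2, "bob")], ["id", "name"])

    upper_udf = udf(lambda s: s.upper() if s else None, StringType())
    df.withColumn("name_upper", upper_udf(df["name"])).show()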
Spark – Scala Installation and Introduction
- Scala Installation (version 2.x)
- Scala setup for Spark environment
- Scala based Spark exercise
Spark – Resilient Distributed Datasets (RDD)
- Working with RDDs in Spark
- Creating RDDs from scratch
- Creating RDD from preexisting data
- Accumulators and Broadcast variables
- RDD – Transformations commands
- RDD – Actions commands
- RDD complex exercises
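A short PySpark sketch covering the RDD topics above (Spark 2.x assumed; paths and values are illustrative):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("rdd-demo").getOrCreate()
    sc = spark.sparkContext

    # create RDDs from scratch and from pre-existing data
    nums = sc.parallelize(range(1, 101))
    lines = sc.textFile("/user/training/input/localfile.txt")

    # transformations are lazy; actions trigger execution
    even_squares = nums.map(lambda x: x * x).filter(lambda x: x % 2 == 0)
    print(even_squares.take(5))                           # action
    print(lines.flatMap(lambda l: l.split()).count())     # action

    # broadcast variables and accumulators
    lookup = sc.broadcast({1: "one", 2: "two"})
    print(nums.map(lambda x: lookup.value.get(x, "other")).take(3))

    big_values = sc.accumulator(0)
    def count_big(x):
        if x > 50:
            big_values.add(1)
    nums.foreach(count_big)
    print(big_values.value)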
Spark – Spark SQL and Data Frames
- Spark SQL and the SQL Context
- Creating DataFrames from raw datasets
- Transforming and Querying DataFrames
- Using CSV files and mapping schemas
- Using case structures and user defined data types
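For example, a PySpark sketch that reads a CSV file with an explicit schema and queries it through both the DataFrame API and SQL (Spark 2.x assumed; the file path and columns are illustrative):

    from pyspark.sql import SparkSession
    from pyspark.sql.types import StructType, StructField, IntegerType, StringType, DoubleType

    spark = SparkSession.builder.appName("sql-demo").getOrCreate()

    # map an explicit schema onto a raw CSV file
    schema = StructType([
        StructField("id", IntegerType(), True),
        StructField("name", StringType(), True),
        StructField("salary", DoubleType(), True),
    ])
    df = spark.read.csv("/user/training/employees.csv", schema=schema, header=True)

    # DataFrame API
    df.filter(df.salary > 50000).select("name", "salary").show()

    # Spark SQL on a temporary view
    df.createOrReplaceTempView("employees")
    spark.sql("SELECT name, salary FROM employees ORDER BY salary DESC").show()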
Spark – Spark MLlib (Machine Learning)
- Basic Principles of Machine Learning
- Supervised and Unsupervised Learnings
- Setup of Spark MLlib transformations and the correlation algorithm
- Exercises for regression and correlation
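A minimal correlation sketch with spark.ml (available from Spark 2.2; the feature values are made up for illustration):

    from pyspark.sql import SparkSession
    from pyspark.ml.linalg import Vectors
    from pyspark.ml.stat import Correlation

    spark = SparkSession.builder.appName("mllib-demo").getOrCreate()

    data = [(Vectors.dense([1.0, 10.0]),),
            (Vectors.dense([2.0, 21.0]),),
            (Vectors.dense([3.0, 33.0]),)]
    df = spark.createDataFrame(data, ["features"])

    # Pearson correlation matrix between the feature columns
    matrix = Correlation.corr(df, "features").head()[0]
    print(matrix)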
Kafka – Apache Kafka
- Introduction to Apache Kafka
- Identifying the major Kafka components
- Determining what data is appropriate for use with Kafka
- Developing with Kafka producers, consumers, and brokers
Kafka – Installation and Labs
- Kafka Features and terminologies
- High-level Kafka architecture
- Kafka installation on Linux/Windows
- Install Kafka Zookeeper
- Install Kafka Server
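On a typical single-node lab install, starting the services from the Kafka distribution directory looks roughly like this (exact script names and options vary slightly by Kafka version):

    # start ZooKeeper and then the Kafka broker
    bin/zookeeper-server-start.sh config/zookeeper.properties
    bin/kafka-server-start.sh config/server.properties

    # quick check: list topics (older releases use --zookeeper localhost:2181 instead)
    bin/kafka-topics.sh --list --bootstrap-server localhost:9092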
Kafka – Consumer, Producer and Topics
- Writing Kafka Consumer Labs
- Create Kafka Messages
- Create Kafka Topics
- Message structure and topic configuration
- Write Kafka Producer
- Configure Producer and Kafka Server
- Kafka Multi Broker Configuration
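Purely as an illustration of the producer/consumer flow, here is a minimal sketch using the kafka-python package (the labs may instead use the Java clients or the console producer/consumer; the topic name and broker address are placeholders, and the topic is assumed to already exist, e.g. created with kafka-topics.sh --create):

    from kafka import KafkaProducer, KafkaConsumer

    # produce a keyed message to the topic
    producer = KafkaProducer(bootstrap_servers="localhost:9092")
    producer.send("demo-topic", key=b"k1", value=b"hello kafka")
    producer.flush()

    # consume from the beginning of the topic and print each record
    consumer = KafkaConsumer("demo-topic",
                             bootstrap_servers="localhost:9092",
                             auto_offset_reset="earliest",
                             consumer_timeout_ms=5000)
    for message in consumer:
        print(message.topic, message.partition, message.offset, message.value)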
NoSQL – Introduction and Details
- NoSQL databases introduction
- Types of NoSQL databases – MongoDB, Cassandra, CouchDB
- Use cases for NoSQL databases
- Document DB types
- Comparison with RDBMS
NoSQL – MongoDB
- MongoDB installation on a Linux/Windows box
- Mongo daemon (mongod) threads
- Mongo Shell configuration
- Mongo collection creation
- Mongo data load in collections
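A typical mongo shell session for this module might look like the following (the database, collection and document values are illustrative):

    use trainingdb
    db.createCollection("students")
    db.students.insertMany([
      { name: "Alice", course: "Hadoop", score: 88 },
      { name: "Bob",   course: "Spark",  score: 92 }
    ])
    db.students.find().pretty()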
NoSQL – Mongo Query Language
- MongoDB query language
- Mongo insert(), update() and delete() queries
- Mongo find() query
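Continuing with the illustrative students collection from the previous sketch, the basic query operations look like this:

    // find documents matching a condition and project selected fields
    db.students.find({ score: { $gt: 90 } }, { name: 1, score: 1, _id: 0 })

    // update a single document
    db.students.updateOne({ name: "Alice" }, { $set: { score: 90 } })

    // delete a single document
    db.students.deleteOne({ name: "Bob" })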
Study Materials and Labs
A complete virtual machine is shared with students, with Java, Oracle DB, Mozilla Firefox and other components pre-installed.
Please note this is NOT a remote-lab environment: you keep the VM and all labs even after the training is completed.