The success of many organizations depends on their ability to derive business insights from massive amounts of raw data coming from various sources. Apache Hadoop is a proven, production-ready platform for large-scale data processing that meets the most demanding technical and business requirements. This intensive training course covers the theoretical and technical aspects of programming with Hadoop-centric systems, with an emphasis on Hive. The course is supplemented by hands-on labs that help attendees reinforce their theoretical knowledge of the material.

  • Good knowledge of SQL
  • Familiarity with general database concepts

16 Hours

Data Management

Certificate: No

Price: contact us for more details

Course Outline

  • Module 1: Introduction to Hadoop
    • Hadoop vs. traditional data storage and processing
    • The Hadoop ecosystem
    • Discuss and explain the various projects
    • Open source vs. commercial distributions
    • Hadoop distributors: Cloudera, Hortonworks, MapR

  • Module 2: Becoming familiar with Hadoop and HDFS
    • General concepts
    • The building blocks of Hadoop
    • Working with HDFS
    • Configuration files
    • Performance considerations
    • Hands-on lab
    • HDFS commands
    • Writing and reading data

  • Module 3: Big Data Analytics over Hadoop – Hive
    • Intro to Hive and Impala
    • Features
    • Data Definition Language
    • Data Manipulation Language
    • Querying with Hive and Impala
    • Hands-on lab
    • Using DDL, DML and SQL

  • Module 4: Big Data Analytics over Hadoop – Continued
    • Analyzing complex data and text
    • Extending Hive
    • Hands-on lab
    • Complex scenarios