Introduction to Big Data for Practitioners is a two-day course in Big Data technologies designed for end users of Big Data: data scientists, analysts, and technical managers. When you have completed this course, you will be ready to tackle Big Data problems using a variety of powerful technologies.
Who is this class for?
- System architects
- Data scientists
- Technical managers
What skills are required?
- Understanding of relational technologies, SQL, and data management
- Understanding of data analysis and business analysis
What does this class cover?
- Big Data concepts – intention, history, current state, product family
- Business drivers – what is driving this market and how this technology is being used
- An overview of Hadoop
- The Hadoop Distributed File System
- How Hadoop processes data
- Storing and managing data using HDFS
- Using Pig to process and analyze data
- Using Hive to process and analyze data
- Using HBase to store data
- Using Hue to build and manage your data processing tasks
- Using Sqoop to transfer data to and from HDFS/HBase and relational databases
What you will learn:
Once this training is completed, students should understand what “Big Data” means, along with the history, drivers, and technologies behind Hadoop. They will become familiar with the Hadoop Distributed File System (HDFS) and be able to manage HDFS data. They will understand the concepts of MapReduce and be able to browse MapReduce jobs with the JobTracker web UI. They will also understand how to use Pig and Hive for managing data within the Hadoop system. They will understand the NoSQL database paradigm and learn how to use HBase for efficient data storage. They will learn how to integrate existing data into Hadoop using Sqoop, and how to use Hue to perform most of these tasks. Finally, students will leave the class with an understanding of the bigger picture of what Hadoop and Big Data encompass: not just data management and access, but how to solve problems.