Cloudera - Hadoop Training

Register Now

Collier IT provides certified Cloudera training in partnership with Cloudera University. Our courses provide in-depth, hands-on training in a variety of delivery formats. All courses are taught by instructors who have been certified by Cloudera. We provide training for Hadoop Administrators, Hadoop Developers, and Big Data Analysts.

Collier IT Cloudera Training

Cloudera Administrator Training for Apache Hadoop

Cloudera University’s four-day administrator course provides the technical background you need to manage and scale a Hadoop cluster in a development or production environment. This is the core administrator learning path curriculum.

Duration: 4 Days

Overview

Take your knowledge to the next level with Cloudera’s Apache Hadoop Training and Certification

Cloudera University’s four-day administrator training course for Apache Hadoop provides participants with a comprehensive understanding of all the steps necessary to operate and maintain a Hadoop cluster using Cloudera Manager. From installation and configuration through load balancing and tuning, Cloudera’s training course is the best preparation for the real-world challenges faced by Hadoop administrators.

Hands-On Hadoop

Through instructor-led discussion and interactive, hands-on exercises, participants will navigate the Hadoop ecosystem, learning topics such as:

• The internals of YARN, MapReduce, and HDFS
• Determining the correct hardware and infrastructure for your cluster
• Proper cluster configuration and deployment to integrate with the data center
• How to load data into the cluster from dynamically generated files using Flume and from relational databases (RDBMS) using Sqoop
• Configuring the FairScheduler to provide service-level agreements for multiple users of a cluster
• Best practices for preparing and maintaining Apache Hadoop in production
• Troubleshooting, diagnosing, tuning, and solving Hadoop issues

Audience & Prerequisites

This course is best suited to systems administrators and IT managers who have basic Linux experience. Prior knowledge of Apache Hadoop is not required.

Register Now

Have a Question? - Contact Instructor

Cloudera Developer Training for Spark and Hadoop

Scala and Python developers new to Hadoop will learn the key concepts and skills needed to ingest and process data on a Hadoop cluster using the most up-to-date tools and techniques, including Apache Spark, Impala, Hive, Flume, and Sqoop.

Duration: 4 Days

Overview

Learn how to import data into your Apache Hadoop cluster and process it with Spark, Hive, Flume, Sqoop, Impala, and other Hadoop ecosystem tools

This four-day hands-on training course delivers the key concepts and expertise participants need to ingest and process data on a Hadoop cluster using the most up-to-date tools and techniques. Employing Hadoop ecosystem projects such as Spark, Hive, Flume, Sqoop, and Impala, this training course is the best preparation for the real-world challenges faced by Hadoop developers. Participants learn to identify which tool is the right one to use in a given situation and gain hands-on experience developing with those tools.

Hands-On Hadoop

Through instructor-led discussion and interactive, hands-on exercises, participants will learn Apache Spark and how it integrates with the entire Hadoop ecosystem, including:

• How data is distributed, stored, and processed in a Hadoop cluster
• How to use Sqoop and Flume to ingest data
• How to process distributed data with Apache Spark
• How to model structured data as tables in Impala and Hive
• How to choose the best data storage format for different data usage patterns
• Best practices for data storage

Audience & Prerequisites

This course is designed for developers and engineers who have programming experience. Apache Spark examples and hands-on exercises are presented in Scala and Python, so the ability to program in one of those languages is required. Basic familiarity with the Linux command line is assumed. Basic knowledge of SQL is helpful; prior knowledge of Hadoop is not required.

Cloudera Developer Search Training

This course is for developers with Hadoop experience who want to index data in Hadoop for more powerful real-time queries and to integrate Cloudera Search with external applications.

Duration: 3 Days

Overview

Take Your Knowledge to the Next Level and Solve Real-World Problems with Training for Hadoop and the Enterprise Data Hub

Cloudera University’s three-day Search training course is for developers and data engineers who want to index data in Hadoop for more powerful real-time queries. Participants will learn to get more value from their data by integrating Cloudera Search with external applications.

Hands-On Hadoop

Through instructor-led discussion and interactive, hands-on exercises, participants will navigate the Hadoop ecosystem, learning topics such as:

• Performing batch indexing of data stored in HDFS and HBase
• Indexing streaming data in near-real-time with Flume
• How to index content in multiple languages and file formats
• Creating a user interface for an index using Hue
• Integrating Cloudera Search with external applications
• Improving the search experience using faceting, highlighting, and spelling correction

Audience & Prerequisites

This course is intended for developers and data engineers with at least basic familiarity with Hadoop and experience programming in a general-purpose language such as Java, C, C++, Perl, or Python. Participants should be comfortable with the Linux command line and should be able to perform basic tasks such as creating and removing directories, viewing and changing file permissions, executing scripts, and examining file output. No prior experience with Apache Solr or Cloudera Search is required, nor is any experience with HBase or SQL.

Advance Your Ecosystem Expertise

Cloudera Search brings full-text, interactive search and scalable, flexible indexing to Hadoop and an enterprise data hub. Powered by Apache Solr, Search delivers scale and reliability for a new generation of integrated, multi-workload queries.

Cloudera Developer Training for Apache HBase

Cloudera University’s three-day HBase course enables participants to store and access massive quantities of multi-structured data and perform hundreds of thousands of operations per second. This course is part of both the developer learning path and the administrator learning path.

Duration: 3 Days

Overview

Take your knowledge to the next level with Cloudera’s Apache Hadoop Training and Certification

Cloudera University’s three-day training course for Apache HBase enables participants to store and access massive quantities of multi-structured data and perform hundreds of thousands of operations per second.

Hands-On Hadoop

Through instructor-led discussion and interactive, hands-on exercises, participants will navigate the Hadoop ecosystem, learning topics such as:

• The use cases for HBase, Hadoop, and RDBMS, and when to choose each
• Using the HBase shell to directly manipulate HBase tables
• Designing optimal HBase schemas for efficient data storage and recovery
• How to connect to HBase using the Java API to insert and retrieve data in real time
• Best practices for identifying and resolving performance bottlenecks

Audience & Prerequisites

This course is appropriate for developers and administrators who intend to use HBase. Prior experience with databases and data modeling is helpful, but not required. Knowledge of Java is assumed. Prior knowledge of Hadoop is not required, but Cloudera Developer Training for Apache Hadoop provides an excellent foundation for this course.

Advance Your Ecosystem Expertise

Apache HBase is a distributed, scalable, NoSQL database for big data built on Hadoop. HBase can store data in massive tables consisting of billions of rows and millions of columns, serve data to many users in near real time, and provide fast, random read/write access to applications.

Data Science at Scale using Spark and Hadoop

Cloudera University’s three-day introduction to data science develops the skills required to build information platforms and analytical tools that reduce costs, increase profits, improve products, retain customers, and identify new opportunities. This course is part of both the developer learning path and the data analyst learning path.

Duration: 3 Days

Overview

Take Your Knowledge to the Next Level with Cloudera’s Data Science Training and Certification

Data scientists build information platforms to provide deep insight and answer previously unimaginable questions. Spark and Hadoop are transforming how data scientists work by allowing interactive and iterative data analysis at scale. Learn how Spark and Hadoop enable data scientists to help companies reduce costs, increase profits, improve products, retain customers, and identify new opportunities.

Cloudera University’s three-day course helps participants understand what data scientists do, the problems they solve, and the tools and techniques they use. Through in-class simulations, participants apply data science methods to real-world challenges in different industries and, ultimately, prepare for data scientist roles in the field.

Hands-On Hadoop

Through instructor-led discussion and interactive, hands-on exercises, participants will navigate the Hadoop ecosystem, learning topics such as:

• How to identify potential business use cases where data science can provide impactful results
• How to obtain, clean, and combine disparate data sources to create a coherent picture for analysis
• Which statistical methods to use for data exploration to gain critical insight into your data
• Where and when to leverage Hadoop streaming and Apache Spark for data science pipelines
• Which machine learning technique to use for a particular data science project
• How to implement and manage recommenders using Spark’s MLlib, and how to set up and evaluate data experiments
• The pitfalls of deploying new analytics projects to production at scale

Audience & Prerequisites

This course is best suited to developers, data analysts, and statisticians with basic knowledge of Apache Hadoop: HDFS, MapReduce, Hadoop Streaming, and Apache Hive. Students should have proficiency in a scripting language; Python is strongly preferred, but familiarity with Perl or Ruby is sufficient.

Cloudera Data Analyst Training

Cloudera University’s four-day data analyst course is for anyone who wants to access, manipulate, transform, and analyze massive data sets in the Hadoop cluster using SQL and familiar scripting languages. This is the core curriculum in the data analyst learning path.

Duration: 4 Days

Overview

Take your knowledge to the next level with Cloudera’s Apache Hadoop Training

Cloudera University’s four-day data analyst training course, which focuses on Apache Pig, Apache Hive, and Cloudera Impala, teaches you to apply traditional data analytics and business intelligence skills to big data. Cloudera presents the tools data professionals need to access, manipulate, transform, and analyze complex data sets using SQL and familiar scripting languages.

Hands-On Hadoop

Through instructor-led discussion and interactive, hands-on exercises, participants will navigate the Hadoop ecosystem, learning topics such as:

• The features that Pig, Hive, and Impala offer for data acquisition, storage, and analysis
• The fundamentals of Apache Hadoop and data ETL (extract, transform, load), ingestion, and processing with Hadoop
• How Pig, Hive, and Impala improve productivity for typical analysis tasks
• Joining diverse datasets to gain valuable business insight
• Performing real-time, complex queries on datasets

Audience & Prerequisites

This course is designed for data analysts, business intelligence specialists, developers, system architects, and database administrators. Knowledge of SQL is assumed, as is basic Linux command-line familiarity. Knowledge of at least one scripting language (e.g., Bash scripting, Perl, Python, Ruby) would be helpful but is not essential. Prior knowledge of Apache Hadoop is not required.

Advance Your Ecosystem Expertise

Apache Pig applies the fundamentals of familiar scripting languages to the Hadoop cluster. Apache Hive makes transformation and analysis of complex, multi-structured data scalable in Hadoop. Cloudera Impala enables real-time interactive analysis of the data stored in Hadoop via a native SQL environment. Together, Pig, Hive, and Impala make multi-structured data accessible to analysts, database administrators, and others without Java programming expertise.

Have a Question? – Contact Our Instructors