HDP Developer: Enterprise Apache Spark
This course is designed as an entry point for developers who need to create applications to analyse big data stored in Apache Hadoop using Spark.
Topics include an overview of the Hortonworks Data Platform (HDP), including HDFS and YARN; using Spark Core APIs for interactive data exploration; Spark SQL and DataFrame operations; Spark Streaming and DStream operations; data visualisation, reporting, and collaboration; performance monitoring and tuning; building and deploying Spark applications; and an introduction to the Spark Machine Learning Library.
This course is intended for software engineers who want to develop in-memory applications for time-sensitive and highly iterative workloads in an enterprise HDP environment.
After completing this course, students will be able to:
- Describe Hadoop, HDFS, YARN, and the HDP ecosystem
- Describe Spark use cases
- Explore and manipulate data using Zeppelin
- Explore and manipulate data using the Spark REPL
- Explain the purpose and function of RDDs
- Employ functional programming practices
- Perform Spark transformations and actions
- Work with Pair RDDs
- Query streaming data using Spark Streaming stateless and window transformations
- Visualise data, generate reports, and collaborate using Zeppelin
- Monitor Spark applications using the Spark History Server
- Apply general application optimisation guidelines and tips
- Use data caching to increase performance of applications
- Build and package Spark applications
- Deploy applications to the cluster using YARN
- Understand the purpose of Spark MLlib
Students should be familiar with programming principles and have previous experience in software development using either Python or Scala. Previous experience with data streaming, SQL, and HDP is also helpful, but not required.
- Language: English
- Assessments: Self