Apache Spark Programming with Databricks
Duration: 2 days
Industry: Information Technology
About this course
What is Apache Spark Programming with Databricks all about?
This course uses a case study-driven approach to explore the fundamentals of Spark programming with Databricks, including Spark architecture, the DataFrame API, query optimization, and Structured Streaming. First, you will become familiar with Databricks and Spark, recognize their major components, and explore datasets for the case study using the Databricks environment. After ingesting data from various file formats, you will process and analyze datasets by applying a variety of DataFrame transformations, Column expressions, and built-in functions. Lastly, you will execute streaming queries to process streaming data and see the advantages of using Delta Lake.
What is Apache Spark?
Databricks defines Apache Spark as a lightning-fast unified analytics engine for big data and machine learning. Since its release, Apache Spark has seen rapid adoption by enterprises across a wide range of industries. Internet powerhouses such as Netflix, Yahoo, and eBay have deployed Spark at a massive scale, collectively processing multiple petabytes of data on clusters of over 8,000 nodes. Its open-source community has quickly become the largest in big data, with over 1,000 contributors from 250+ organizations.
For more information, please check this blog from P2L.
Who can benefit?
- Data engineers
- Data scientists
- Machine learning engineers
- Data architects
This is what you'll learn
- Define the major components of Spark architecture and execution hierarchy
- Describe how DataFrames are built, transformed, and evaluated in Spark
- Apply the DataFrame API to explore, preprocess, join, and ingest data in Spark
- Apply the Structured Streaming API to perform analytics on streaming data
- Navigate the Spark UI and describe how the Catalyst optimizer, partitioning, and caching affect Spark's execution performance
Prerequisite Skills
- Familiarity with basic SQL concepts (select, filter, group by, join, etc.)
- Beginner programming experience with Python or Scala (syntax, conditions, loops, functions)
Schedule (iMVP)
Sep 19-20, 2022
Oct 3-4, 2022
Oct 17-18, 2022
Oct 24-27, 2022
Nov 7-10, 2022