Apache Spark gives data engineers a great deal of functionality for solving common problems around data movement and processing, particularly in the cloud. In this session, we will learn how to use Apache Spark in Microsoft Azure. We will see which Azure services provide Apache Spark integration points, look at use cases in which Apache Spark is a great choice, and use the metaphor of the data pipeline to perform data movement and transformation in the cloud. We will also learn how to use notebook workflows in Azure Databricks to simplify the process.
ADDITIONAL MEDIA
No recordings or additional media are available for this talk.
Azure Data Factory (ADF) allows you to run a Databricks notebook as a pipeline activity. This is helpful if you already use ADF for several other tasks and want to fold Databricks into the same process, rather than scheduling separate Databricks jobs to do the work.
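As a rough illustration of what that integration looks like, here is a minimal sketch that builds the JSON definition of an ADF Databricks Notebook activity in Python. The activity name, notebook path, linked service name, and `run_date` parameter are all hypothetical placeholders, and the `notebook_activity` helper is not part of any SDK; it only assembles a dictionary in the general shape ADF expects for this activity type.

```python
import json


def notebook_activity(name, notebook_path, linked_service, parameters=None):
    """Assemble a dict shaped like an ADF Databricks Notebook activity.

    This is an illustrative helper, not an Azure SDK call.
    """
    return {
        "name": name,
        "type": "DatabricksNotebook",
        "linkedServiceName": {
            "referenceName": linked_service,
            "type": "LinkedServiceReference",
        },
        "typeProperties": {
            "notebookPath": notebook_path,
            # Parameters are handed to the notebook as string key/value pairs.
            "baseParameters": {k: str(v) for k, v in (parameters or {}).items()},
        },
    }


# All of the values below are made-up placeholders.
activity = notebook_activity(
    name="RunTransformNotebook",
    notebook_path="/Shared/transform",
    linked_service="AzureDatabricksLinkedService",
    parameters={"run_date": "2024-01-01"},
)

print(json.dumps(activity, indent=2))
```

Inside the notebook, a parameter passed this way would typically be read with `dbutils.widgets.get("run_date")`, assuming the notebook runs on Databricks where `dbutils` is available.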
Databricks Connect isn't a topic I cover in this talk, but it becomes useful as you spend more time on the platform and want to do this work in a full IDE.