CMS/DP-203 : Data Engineering on Microsoft Azure

4 Days (24 Hours), Intermediate, Classroom / Online, NoSQL and Big Data

In this course, participants learn the data engineering patterns and practices for working with batch and real-time analytical solutions on Azure data platform technologies. They begin with the core compute and storage technologies used to build an analytical solution. They then explore how to design analytical serving layers and focus on data engineering considerations for working with source files. Participants learn how to interactively explore data stored in files in a data lake. They learn the ingestion techniques available for loading data with the Apache Spark capability in Azure Synapse Analytics or Azure Databricks, as well as how to ingest data with Azure Data Factory or Azure Synapse pipelines. They also learn the various ways to transform data using the same technologies used to ingest it. Participants spend time learning how to monitor and analyze the performance of an analytical system so that they can optimize data loads and the queries issued against the system. They come to understand why implementing security matters for protecting data at rest and in transit. Finally, participants demonstrate how data in an analytical system can be used to create dashboards or to build predictive models in Azure Synapse Analytics.

Course Content

Module 1: Explore compute and storage options for data engineering workloads

Lesson

Introduction to Azure Synapse Analytics
Describe Azure Databricks
Introduction to Azure Data Lake storage
Describe Delta Lake architecture
Work with data streams by using Azure Stream Analytics

Lab : Explore compute and storage options for data engineering workloads

Combine streaming and batch processing with a single pipeline
Organize the data lake into levels of file transformation
Index data lake storage for query and workload acceleration

Module 2: Design and implement the serving layer

Lesson

Design a multidimensional schema to optimize analytical workloads
Code-free transformation at scale with Azure Data Factory
Populate slowly changing dimensions in Azure Synapse Analytics pipelines

Lab : Designing and Implementing the Serving Layer

Design a star schema for analytical workloads
Populate slowly changing dimensions with Azure Data Factory and mapping data flows
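
To illustrate the slowly changing dimension topic in this module, here is a minimal plain-Python sketch of a Type 2 update, which keeps history by closing the current row and appending a new one. The lab itself uses Azure Data Factory mapping data flows; the function name `apply_scd2`, the column names, and the sample rows below are hypothetical and are not taken from the course material.

```python
from datetime import date


def apply_scd2(dimension, incoming, key, tracked, today):
    """Return a new dimension list with Type 2 history applied.

    dimension: list of dict rows carrying 'valid_from', 'valid_to', 'is_current'
    incoming:  list of dict rows from the source system
    key:       business key column name
    tracked:   columns whose changes should create a new version
    """
    result = [dict(row) for row in dimension]  # copy so the input is not mutated
    current = {row[key]: row for row in result if row["is_current"]}

    for src in incoming:
        existing = current.get(src[key])
        if existing is not None and all(existing[c] == src[c] for c in tracked):
            continue  # tracked attributes unchanged: nothing to do
        if existing is not None:
            # Close the old version instead of overwriting it.
            existing["valid_to"] = today
            existing["is_current"] = False
        new_row = {key: src[key]}
        new_row.update({c: src[c] for c in tracked})
        new_row.update({"valid_from": today, "valid_to": None, "is_current": True})
        result.append(new_row)
        current[src[key]] = new_row
    return result


# Hypothetical sample data.
customers = [
    {"customer_id": 1, "city": "Ankara", "valid_from": date(2020, 1, 1),
     "valid_to": None, "is_current": True},
]
updates = [
    {"customer_id": 1, "city": "Izmir"},   # changed attribute
    {"customer_id": 2, "city": "Bursa"},   # new business key
]
history = apply_scd2(customers, updates, "customer_id", ["city"], date(2024, 1, 1))
```

After the call, `history` holds the closed Ankara row, a current Izmir row for customer 1, and a current row for the new customer 2.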

Module 3: Data engineering considerations for source files

Lesson

Design a Modern Data Warehouse using Azure Synapse Analytics
Secure a data warehouse in Azure Synapse Analytics

Lab : Data engineering considerations

Managing files in an Azure data lake
Securing files stored in an Azure data lake

Module 4: Run interactive queries using Azure Synapse Analytics serverless SQL pools

Lesson

Explore Azure Synapse serverless SQL pools capabilities
Query data in the lake using Azure Synapse serverless SQL pools
Create metadata objects in Azure Synapse serverless SQL pools
Secure data and manage users in Azure Synapse serverless SQL pools

Lab : Run interactive queries using serverless SQL pools

Query Parquet data with serverless SQL pools
Create external tables for Parquet and CSV files
Create views with serverless SQL pools
Secure access to data in a data lake when using serverless SQL pools
Configure data lake security using Role-Based Access Control (RBAC) and Access Control Lists (ACLs)

Module 5: Explore, transform, and load data into the Data Warehouse using Apache Spark

Lesson

Understand big data engineering with Apache Spark in Azure Synapse Analytics
Ingest data with Apache Spark notebooks in Azure Synapse Analytics
Transform data with DataFrames in Apache Spark Pools in Azure Synapse Analytics
Integrate SQL and Apache Spark pools in Azure Synapse Analytics

Lab : Explore, transform, and load data into the Data Warehouse using Apache Spark

Perform Data Exploration in Synapse Studio
Ingest data with Spark notebooks in Azure Synapse Analytics
Transform data with DataFrames in Spark pools in Azure Synapse Analytics
Integrate SQL and Spark pools in Azure Synapse Analytics

Module 6: Data exploration and transformation in Azure Databricks

Lesson

Describe Azure Databricks
Read and write data in Azure Databricks
Work with DataFrames in Azure Databricks
Work with advanced DataFrame methods in Azure Databricks

Lab : Data Exploration and Transformation in Azure Databricks

Use DataFrames in Azure Databricks to explore and filter data
Cache a DataFrame for faster subsequent queries
Remove duplicate data
Manipulate date/time values
Remove and rename DataFrame columns
Aggregate data stored in a DataFrame

Module 7: Ingest and load data into the data warehouse

Lesson

Use data loading best practices in Azure Synapse Analytics
Petabyte-scale ingestion with Azure Data Factory

Lab : Ingest and load Data into the Data Warehouse

Perform petabyte-scale ingestion with Azure Synapse Pipelines
Import data with PolyBase and COPY using T-SQL
Use data loading best practices in Azure Synapse Analytics

Module 8: Transform data with Azure Data Factory or Azure Synapse Pipelines

Lesson

Data integration with Azure Data Factory or Azure Synapse Pipelines
Code-free transformation at scale with Azure Data Factory or Azure Synapse Pipelines

Lab : Transform Data with Azure Data Factory or Azure Synapse Pipelines

Execute code-free transformations at scale with Azure Synapse Pipelines
Create a data pipeline to import poorly formatted CSV files
Create Mapping Data Flows

Module 9: Orchestrate data movement and transformation in Azure Synapse Pipelines

Lesson

Orchestrate data movement and transformation in Azure Data Factory

Lab : Orchestrate data movement and transformation in Azure Synapse Pipelines

Integrate Data from Notebooks with Azure Data Factory or Azure Synapse Pipelines

Module 10: Optimize query performance with dedicated SQL pools in Azure Synapse

Lesson

Optimize data warehouse query performance in Azure Synapse Analytics
Understand data warehouse developer features of Azure Synapse Analytics

Lab : Optimize Query Performance with Dedicated SQL Pools in Azure Synapse

Understand developer features of Azure Synapse Analytics
Optimize data warehouse query performance in Azure Synapse Analytics
Improve query performance

Module 11: Analyze and Optimize Data Warehouse Storage

Lesson

Analyze and optimize data warehouse storage in Azure Synapse Analytics

Lab : Analyze and Optimize Data Warehouse Storage

Check for skewed data and space usage
Understand column store storage details
Study the impact of materialized views
Explore rules for minimally logged operations
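
As a rough illustration of what "checking for skewed data" means, the sketch below hashes a distribution key into 60 buckets (the number of distributions in a dedicated SQL pool) and compares the largest bucket to the average. It is plain Python, not T-SQL, and it uses `zlib.crc32` only as a stand-in: the hash function Synapse uses internally is different, and the helper names and sample keys are hypothetical.

```python
import zlib

NUM_DISTRIBUTIONS = 60  # dedicated SQL pools spread each table over 60 distributions


def rows_per_distribution(keys, num_distributions=NUM_DISTRIBUTIONS):
    """Count how many rows land in each distribution for a given key column."""
    counts = {d: 0 for d in range(num_distributions)}
    for k in keys:
        # crc32 is a stand-in hash (assumption), chosen because it is deterministic.
        bucket = zlib.crc32(str(k).encode("utf-8")) % num_distributions
        counts[bucket] += 1
    return counts


def skew_ratio(counts):
    """Largest distribution divided by the average; 1.0 means perfectly even."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return max(counts.values()) / (total / len(counts))


# Hypothetical key columns: one dominated by a single value, one with unique values.
skewed_keys = ["unknown"] * 9000 + [f"cust-{i}" for i in range(1000)]
even_keys = [f"cust-{i}" for i in range(10000)]

skewed = skew_ratio(rows_per_distribution(skewed_keys))
even = skew_ratio(rows_per_distribution(even_keys))
```

A key column dominated by one value sends most rows to a single distribution, so `skewed` comes out far above `even`. That imbalance is what the lab looks for when it inspects space usage per distribution.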

Module 12: End-to-end security with Azure Synapse Analytics

Lesson

Secure a data warehouse in Azure Synapse Analytics
Configure and manage secrets in Azure Key Vault
Implement compliance controls for sensitive data

Lab : End-to-end security with Azure Synapse Analytics

Secure Azure Synapse Analytics supporting infrastructure
Secure the Azure Synapse Analytics workspace and managed services
Secure Azure Synapse Analytics workspace data

Module 13: Real-time Stream Processing with Stream Analytics

Lesson

Enable reliable messaging for Big Data applications using Azure Event Hubs
Work with data streams by using Azure Stream Analytics
Ingest data streams with Azure Stream Analytics

Lab : Real-time Stream Processing with Stream Analytics

Use Stream Analytics to process real-time data from Event Hubs
Use Stream Analytics windowing functions to build aggregates and output to Synapse Analytics
Scale the Azure Stream Analytics job to increase throughput through partitioning
Repartition the stream input to optimize parallelization
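
The windowing topic in this lab can be pictured with a small plain-Python sketch of a tumbling window, that is, fixed-size, non-overlapping time buckets. This is not Stream Analytics query language; the function name `tumbling_window_counts` and the sample events are hypothetical, and windows are keyed by their start time purely for simplicity.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_seconds):
    """Count events per (window_start, key).

    events: iterable of (epoch_seconds, key) pairs
    """
    counts = defaultdict(int)
    for ts, key in events:
        # Each event belongs to exactly one window: the one whose start is
        # the timestamp rounded down to a multiple of the window size.
        window_start = (ts // window_seconds) * window_seconds
        counts[(window_start, key)] += 1
    return dict(counts)


# Hypothetical events: (seconds since start, device id).
events = [(0, "a"), (3, "a"), (9, "b"), (10, "a"), (19, "a"), (20, "b")]
result = tumbling_window_counts(events, 10)
```

Because the windows do not overlap, every event is counted exactly once, which is what distinguishes a tumbling window from hopping or sliding windows.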

Module 14: Create a Stream Processing Solution with Event Hubs and Azure Databricks

Lesson

Process streaming data with Azure Databricks structured streaming

Lab : Create a Stream Processing Solution with Event Hubs and Azure Databricks

Explore key features and uses of Structured Streaming
Stream data from a file and write it out to a distributed file system
Use sliding windows to aggregate over chunks of data rather than all data
Apply watermarking to remove stale data
Connect to Event Hubs to read and write streams
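
To make the sliding window and watermarking items concrete, here is a simplified plain-Python sketch. It is not Spark code and does not reproduce how Structured Streaming manages state; it only shows the idea that one event can fall into several overlapping windows and that events older than the watermark (latest event time seen minus an allowed delay) are discarded. The function name, parameters, and sample timestamps are hypothetical.

```python
from collections import defaultdict


def sliding_window_counts(event_times, window, slide, max_delay):
    """Count events per overlapping window, dropping events behind the watermark.

    event_times: integer event timestamps in arrival order
    window:      window length
    slide:       distance between consecutive window starts
    max_delay:   how far behind the latest event time an event may arrive
    """
    counts = defaultdict(int)
    dropped = []
    max_event_time = None

    for ts in event_times:
        if max_event_time is None or ts > max_event_time:
            max_event_time = ts
        watermark = max_event_time - max_delay
        if ts < watermark:
            dropped.append(ts)  # too late: treated as stale data
            continue
        # Smallest window start (a multiple of slide) with start > ts - window.
        first_start = ((ts - window) // slide + 1) * slide
        for start in range(first_start, ts + 1, slide):
            counts[(start, start + window)] += 1
    return dict(counts), dropped


# Hypothetical stream: timestamps 3 and 9 arrive after much later events.
counts, dropped = sliding_window_counts([1, 7, 12, 3, 20, 9],
                                        window=10, slide=5, max_delay=5)
```

With a window of 10 and a slide of 5, each accepted event is counted in two windows, while the two late arrivals are dropped rather than reopening old windows.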

Module 15: Build reports using Power BI integration with Azure Synapse Analytics

Lesson

Create reports with Power BI using its integration with Azure Synapse Analytics

Lab : Build reports using Power BI integration with Azure Synapse Analytics

Integrate an Azure Synapse workspace and Power BI
Optimize integration with Power BI
Improve query performance with materialized views and result-set caching
Visualize data with serverless SQL and create a Power BI report

Module 16: Perform Integrated Machine Learning Processes in Azure Synapse Analytics

Lesson

Use the integrated machine learning process in Azure Synapse Analytics

Lab : Perform Integrated Machine Learning Processes in Azure Synapse Analytics

Create an Azure Machine Learning linked service
Trigger an Auto ML experiment using data from a Spark table
Enrich data using trained models
Serve prediction results using Power BI

Recommended Before

Data Engineer
Processing and managing big data, improving data quality, cloud computing and coding for data science, and working with distributed architectures such as Spark and Hadoop.

Recommended After

Data Engineer
Processing and managing big data, improving data quality, cloud computing and coding for data science, and working with distributed architectures such as Spark and Hadoop.
