
microsoftlearning / dp-203-azure-data-engineer

Exercise files for Microsoft Data Engineer curriculum

Home Page: https://microsoftlearning.github.io/dp-203-azure-data-engineer/

License: MIT License

PowerShell 59.97% TSQL 13.20% Jupyter Notebook 26.83%

dp-203-azure-data-engineer's Introduction

DP-203: Azure Data Engineer

This repository contains instructions and assets for hands-on exercises in the Microsoft Official Courseware to support the Microsoft Certified: Azure Data Engineer Associate certification. The exercises are designed to complement the associated training modules on Microsoft Learn, and a subset of these exercises comprises the hands-on labs in the official DP-203T00: Data Engineering on Microsoft Azure instructor-led training course.

Exercise design principles

The exercises in the repo are designed to support both self-paced learners on Microsoft Learn, and students in official instructor-led training deliveries. In most cases, self-paced learners must provide their own cloud subscription, while students attending official instructor-led courses are typically provided with subscriptions they can use to complete each individual exercise that is included in the course. Note that Microsoft does not support instructor-led deliveries of the exercises in this repo in environments other than those provided by Microsoft authorized lab hosters (ALHs).

The exercises are designed to stand alone, independently of one another. Most labs begin with instructions to clone the repo and run an exercise-specific setup script to prepare the environment, which can take anywhere between 2 and 20 minutes depending on the resources required in the exercise. While this can result in a repetitive experience that requires some patience as the setup script runs, it is necessary to minimize cloud service costs (for both self-paced learners and hosted lab environments in instructor-led deliveries), and to support self-paced learners choosing their own path through the associated training modules on Microsoft Learn.

The numbering of the exercises in this repo indicates a suggested logical sequence that reflects the flow of modules in the official learning paths and the instructor-led materials. The numbers do not indicate the corresponding slide deck or "lab" in an instructor-led course.

Some of the exercises are suggested as instructor demos in classroom deliveries. Trainers can follow these suggestions or demonstrate any of the exercises at their discretion. Note however that hosted lab profiles and cloud subscriptions may not be provided for exercises that are not included as student labs in courses; and the exercise-specific hosted subscriptions that are provided in lab profiles may have policies applied that prevent completion of other exercises. Trainers are advised to test available lab profiles and to use their own cloud subscriptions for demonstrations if necessary.

Contributing to this repo

Microsoft Certified Trainers (MCTs) are welcome to submit issues and PRs related to content or assets in this repo, subject to the guidance in the GitHub User Guide for MCTs. Trainers should bear in mind that the repo is designed to support self-paced learners on Microsoft Learn as well as students in instructor-led courses, and that some of the exercises in the repo are not included in the hosted lab profiles for classroom delivery. Issues relating to configuration or performance of lab environments provided by ALHs are not supported here - contact your ALH if you experience problems related to the hosted lab environment.

dp-203-azure-data-engineer's People

Contributors

caeltheloved, ephiax20, epicmau5time, graememalcolm, hamelinboyerj, jeannedark, lgvacanti, mihai-ac, penchalaiahy, sabareh, shaximinion, steven-nich-sait, stijn-arends, thejamesherring, zhiliangxu


dp-203-azure-data-engineer's Issues

Lab

Lab: 19

Description of issue: Title of the lab is slightly incorrect:

"Create a realtime report with Azure Stream Analytics and Microsoft Power BI" should be changed to "Create a realtime dashboard with Azure Stream Analytics and Microsoft Power BI" as we aren't creating a real-time report in this case. We are creating a streaming dashboard.

https://github.com/MicrosoftLearning/dp-203-azure-data-engineer/blob/master/Instructions/Labs/19-Stream-Power-BI.md#create-a-realtime-report-with-azure-stream-analytics-and-microsoft-power-bi

GitHub lab 17 / VM lab 14 throws significant errors because the managed identity is not assigned correctly to the event hub

Module: 00

Lab/Demo: 14

Task: 00

Step: 00

Description of issue

Repro steps:

Multiple users report this issue: the setup runs, but they then get permissions errors.

The managed identity was not assigned to the event hub.

Thus:
https://learn.microsoft.com/en-us/azure/stream-analytics/event-hubs-managed-identity

Solution:

  1. Create a new input
  2. Assign the Stream Analytics job's managed identity the 'Azure Event Hubs Data Owner' role on the event hub
  3. Run
  4. Ignore all notifications and check the Stream Analytics job

Can we add the permission to Eventhubs in the PS script?
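A minimal sketch of what such a role assignment could look like in the setup script, assuming a system-assigned managed identity on the Stream Analytics job; the variable names ($resourceGroupName, $jobName, $namespaceName) are hypothetical and would need to match the script's own variables:

```powershell
# Grant the Stream Analytics job's system-assigned identity the
# "Azure Event Hubs Data Owner" role on the Event Hubs namespace.
$job = Get-AzStreamAnalyticsJob -ResourceGroupName $resourceGroupName -Name $jobName
$namespace = Get-AzEventHubNamespace -ResourceGroupName $resourceGroupName -Name $namespaceName
New-AzRoleAssignment -ObjectId $job.Identity.PrincipalId `
    -RoleDefinitionName "Azure Event Hubs Data Owner" `
    -Scope $namespace.Id
```

Role assignments can take a few minutes to propagate, so the script might also need a short wait before the job is started.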

UK South/Central has issues with serverless SQL

Module: 1, 2, 3 (+?)

Lab/Demo: 00

Task: 00

Step: 00

Description of issue
When the random location becomes UK, the serverless SQL pool is unreachable due to a 'firewall issue'. See https://learn.microsoft.com/en-us/answers/questions/1065383/azure-synapse-unable-to-connect-to-serverless-sql.

Not every lab requires serverless, but for those that do we need to delete the resources and redeploy.

I would understand if you point out this is an issue with Azure, not with the course materials. However, perhaps the option to have resources deployed in the UK could be removed for now (for labs that require the serverless pool).

Repro steps:

Run setup, land on UK, enter synapse studio and check your sql pools.
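One hypothetical way the setup scripts could exclude the problematic regions when picking a random location; $preferredRegions and the selection logic are assumptions, not the script's actual code:

```powershell
# Build a candidate region list, excluding the UK regions while the
# serverless SQL firewall issue persists, then pick one at random.
$preferredRegions = Get-AzLocation | Select-Object -ExpandProperty Location
$preferredRegions = $preferredRegions | Where-Object { $_ -notin @("uksouth", "ukwest") }
$Region = $preferredRegions | Get-Random
```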

Failed to copy notebook to workspace

Module: 06

Lab/Demo: 06

Task: 02

Step: 10

Description of issue
Step 10 asks to open a notebook. Setup.ps1 should have copied it there. The copy failed, but the script reported success anyway. See attached image.
IMPORTANT: I have multiple Azure subscriptions. I selected the MSDN/Visual Studio one, which I use for R&D. Everything in the setup.ps1 script worked except this particular copy.

I believe the reason for the error is that my CloudShell disk is hosted on another subscription.

Repro steps:

  1. Run setup.ps1
  2. Follow lab steps

Add a callout on sqlpassword requirements

Module: dp203

Lab/Demo: 01

Task: 00

Step: 00

Description of issue
Can we add a callout in the SQL password requirements that the password should not contain "SQLUser" (or 3 consecutive characters from it), since that is the username? I tried creating the workspace using it in the password; the creation failed, and it took a lot of time to figure out why.
Change required in the setup.ps1 file.
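A sketch of an extra validation check setup.ps1 could perform before accepting the password; the variable names are hypothetical, and the 3-character substring rule mirrors the restriction described above:

```powershell
# Reject passwords that contain the login name, or any run of 3
# consecutive characters taken from it (e.g. "SQL", "QLU", "LUs"...).
$sqlUser = "SQLUser"
$badSubstrings = 0..($sqlUser.Length - 3) | ForEach-Object { $sqlUser.Substring($_, 3) }
$rejected = $badSubstrings | Where-Object { $complexPassword -like "*$_*" }
if ($rejected) {
    Write-Host "The password must not contain '$sqlUser' or 3 consecutive characters from it."
}
```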

Lab 14 - Sweden Central is not available yet for Synapse

Module: 00

Lab/Demo: 14 Use Azure Synapse Link for Azure Cosmos DB

Task: Provision Azure resources

Step: 5 (./setup.ps1)

Description of issue

  • Synapse is not available yet in swedencentral. We had a learner this week who noticed that not all resources were deployed after running the setup script. I noticed that region swedencentral was used. According to Products available by region it will come in Q3 2023.

Repro steps:
just follow instructions

Lab 18 Exercise Ingest streaming data into Azure Synapse Analytics - Module Not Found

Module: Ingest realtime data with Azure Stream Analytics and Azure Synapse Analytics
Lab/Demo: 18
Task: Ingest streaming data into a dedicated SQL pool
Step: 06

Description of issue

The following command ( run a client app that sends 100 simulated orders to Azure Event Hubs) doesn't work:

node ~/dp-203/Allfiles/labs/18/orderclient

When running the above in PowerShell, the following error is thrown:

Error: Cannot find module '/home/name/dp-203/Allfiles/labs/18/orderclient'
at Function.Module._resolveFilename (node:internal/modules/cjs/loader:1028:15)
at Function.Module._load (node:internal/modules/cjs/loader:873:27)
at Function.executeUserEntryPoint [as runMain] (node:internal/modules/run_main:81:12)
at node:internal/main/run_main_module:22:47 {
code: 'MODULE_NOT_FOUND',
requireStack: []

Thanks for your help with this.

Databricks: Data renamed to Catalog

Module: 00

Lab/Demo: 23

Task: Create and query a table

Step: 05

Description of issue
"Data" task was renamed to "Catalog"

Repro steps:
Old: In the tab on the left, select the Data task, and verify that the products table ....
New: In the tab on the left, select the Catalog task, and verify that the products table h

Cannot provision databricks workspace using setup script.

Module: Explore Azure Databricks

Lab/Demo: 23

Task: 00

Step: 00

Description of Issue:

Hi! I am having some issues when following the instructions for the 'Explore Azure Databricks' exercise.

I manage to clone the repo using the PowerShell pane as usual, but after changing the folder and running the script I get an error, and in the end no Databricks workspace is created, or anything else for that matter (no resource group or otherwise).

After running the setup script, this is the message I receive on screen (error output attached).

I'm not sure exactly where the problem lies, so any advice is welcome! If you need more information, let me know!

Hopefully someone can help. Thanks in advance.

Minor: Lab 7 creates RG DP000****** instead of DP203******

Module: 00

Lab/Demo: 07

Task: 01

Step: 05

Description of issue
Running the setup for the lab "Use Delta Lake with Spark in Azure Synapse Analytics" created a DP000****** resource group for me, instead of DP203******. It's just a naming issue, and I'm sure people will be able to spot the correct resource group, but the instructions do mention DP203*****, which is incorrect.

Repro steps:

  1. run the setup in step 5.

Lab 3 setup failures ( Failed to connect to MSI)

Lab 3
Description of issue
The setup script runs with an error on step "Granting permissions on the datalake storage account..."

Repro steps:

Run the cloudshell setup
Chose my MSDN subscription

ERROR: Failed to connect to MSI. Please make sure MSI is configured correctly.
Get Token request returned: <Response [400]>

Could you help me with this error?

Lab 04, missing RetailDB directory inside deployed SA

Module: Analyze data in a lake database

Lab/Demo: 04

Task: 00

Step: 00

Description of issue:

I followed the instructions for the 4th lab, but after successfully executing the setup script there is no RetailDB directory inside the storage account.
As I understand it, it's required.

The script was executed 2 times and deployed resources (Azure Synapse / storage account) in different regions (West Europe, North Europe).
Stdout for the last script run is attached.

Please help me fix this, and message me if you need additional info from my side.

Thanks in advance!

setup_stdout.txt


Json and Parquet sample data files missing

Hi,

I am working through the Microsoft Learn Data engineering path and in the lesson.

Lesson
Learn Module Lesson

The sample files for the sales data in Parquet and JSON seem to be missing from LAB01, where the instructions mention them.

Path to lab instructions
Lab Instructions

Can you please assist in adding the files to the labs?

Thank you,
Kind regards
Tirendra

No datasets were found

Module: 06

Lab/Demo: 04

Task: Load data into the table's storage path

Step: 4

There is no returned data.

Warning: No datasets were found that match the expression 'RetailDB.dbo.Customer'.

Statement ID: {6D97EB15-84E0-4C75-BDD6-151350430D34} | Query hash: 0xC74B9B855CD85610 | Distributed request ID: {E9E904AE-6014-4F0B-A94F-8E622319E868}. Total size of data scanned is 0 megabytes, total size of data moved is 0 megabytes, total size of data written is 0 megabytes.
(0 record affected)

Total execution time: 00:00:11.455

Repro steps:

  1. In the main pane, switch back to the files tab, which contains the file system with the RetailDB folder. Then open the RetailDB folder and create a new folder named Customer in it. This is where the Customer table will get its data.

  2. Open the new Customer folder, which should be empty.

  3. Download the customer.csv data file from https://raw.githubusercontent.com/MicrosoftLearning/dp-203-azure-data-engineer/master/Allfiles/labs/04/data/customer.csv and save it in a folder on your local computer (it doesn't matter where). Then in the Customer folder in Synapse Explorer, use the ⤒ Upload button to upload the customer.csv file to the RetailDB/Customer folder in your data lake.

  4. In the Data pane on the left, on the Workspace tab, in the ... menu for the Customer table, select New SQL script > Select TOP 100 rows. Then, in the new SQL script 1 pane that has opened, ensure that the Built-in SQL pool is connected, and use the ▷ Run button to run the SQL code. The results should include the first 100 rows from the Customer table, based on the data stored in the underlying folder in the data lake.
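If you prefer to fetch the sample file from a terminal rather than the browser, a PowerShell one-liner such as the following would download it to the current folder (the output file name is just an example):

```powershell
# Download the customer.csv sample data file locally before uploading
# it to the RetailDB/Customer folder in Synapse Studio.
Invoke-WebRequest -Uri "https://raw.githubusercontent.com/MicrosoftLearning/dp-203-azure-data-engineer/master/Allfiles/labs/04/data/customer.csv" -OutFile "customer.csv"
```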

Lab setup failures

Lab 3
Description of issue
The setup script runs with an error when validating the Synapse Workspace. Subscription needs to be registered with the Microsoft.SQL resource provider

When refreshing the Azure portal, only the data lake can be seen.

Run the setup a second time; on completion there are 2 resource groups and 2 sets of resources. This has happened to me and to my students.

Repro steps:

  1. Run the cloudshell setup
  2. Only the resource group and datalake are visible
  3. Run the setup a second time; resources are created successfully


This is using Skillable labs - could it be related to the cloud slice?

Lab 26 (21 in ESI) issue: the new Databricks UI is the default and the instructions are based on the old UI

Module: 00

Lab/Demo: 26 - Use a SQL Warehouse in Azure Databricks

Task: 02

Step: 01

Description of issue
Databricks defaults to its new UI as of late. This has caused some confusion for many of the learners.

Repro steps:

  1. Sign in to Azure using esilearnondemand
  2. Load the necessary resources through the given Powershell scripts
  3. Start reading instructions on how to view and start the SQL warehouse (instructions won't match the UI because it has been updated by default)

Extra info/solution:
The instructions are only going to work if you have the old UI.

You can disable the new UI.

This should ideally be called out going forward.

Lab 22 Synapse-Purview

Module: 08

Lab/Demo: 22

Task: Create Microsoft Purview account

Step: 00

Description of issue

Repro steps:

  1. Only one student could create the Purview account.
  2. The others got an error because one Purview account was already created for the tenant. It looks like the rules have changed since the last time I ran the course: previously we could create multiple accounts in the same tenant, one per Azure subscription. Now we can't.
  3. As the instructor, I tried the lab and it worked fine. I couldn't reproduce the issue, as I was probably the first one creating the account when I tried the lab later in the day, after the "disaster" during the class. My students were very upset, as Purview was a subject most of them were very keen to experiment with.

15-Synapse-link-sql.md - replace SQL Database with SQL Server

15-Synapse-link-sql.md

Configure Azure SQL Database

Step: 1

In the paragraph:

In the page for your Azure SQL Database resource, in the pane on the left, in the Security section (near the bottom), select Identity. Then under System assigned managed identity, set the Status option to On. Then use the 🖫 Save icon to save your configuration change.

Replace "Azure SQL Database resource" with "Azure SQL Server resource".

Improper Star Schema Representation

Module: 2

Lab/Demo: 04 - Create Lake Database

Task: Create a Table From Existing Data

Step: 5 and 6

Description of issue
The way the instructions have it, the relationship is filtering from the fact table (SalesOrder) to the dimension tables (Product, Customer), which isn't a correctly formatted model.

The relationships for the SalesOrder table should be created in the To section, which properly represents how models should filter from dimension to fact tables (image included).

Repro steps:

Resource 'dp203-c9xwrah' was disallowed by policy

Module: Exercise - Integrate Azure Synapse Analytics and Microsoft Purview in learning path Govern data across an enterprise

Lab/Demo: Use Microsoft Purview with Azure Synapse Analytics

Task: Provision Azure resources

Step: 8. Wait for the setup script to complete

Description of issue:
Finding an available region. This may take several minutes...
Trying southeastasia
Using southeastasia
Creating dp203-c9xwrah resource group in southeastasia ...
New-AzResourceGroup: /home/inge/dp-203/Allfiles/labs/22/setup.ps1:132
Line |
132 | New-AzResourceGroup -Name $resourceGroupName -Location $Region | Out- …
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
| Resource 'dp203-c9xwrah' was disallowed by policy.

Not allowed to deploy resources in southeastasia. Policy allows me to deploy in northeurope or westeurope.
Coming from a Microsoft Learn training you never know what is in place in learner's subscriptions.

Repro steps:

  1. Try to deploy to a region while there is Azure Policy in place that does not allow you to deploy to that region.

When creating a resource group manually, I cannot select the southeastasia region, and I get the policy message on the default region eastus.

Deployment manual for resources are missing for certain exercises.

Module: Analyze and optimize data warehouse storage in Azure Synapse Analytics

https://learn.microsoft.com/en-us/training/modules/analyze-optimize-data-warehouse-storage-azure-synapse-analytics/

Lab/Demo: All exercises in the module

Task: 00

Step: 00

Description of issue
All exercises in the module require the use of a resource deployment for which the manual is in the retired repo https://github.com/MicrosoftLearning/dp-203-data-engineer.

Repro steps:

  1. https://learn.microsoft.com/en-us/training/modules/analyze-optimize-data-warehouse-storage-azure-synapse-analytics/
  2. Consider all units with prefix "Exercise"
  3. Observe that there is no step that explains how to deploy the required resources.

Remark: I did not check for any subsequent modules in the DP-203 course. So there might be more similar issues that I am not aware of.

Lab 03 CSV Header Row Missing

Lab/Demo: 03

Task: 03

Step: 06

When I executed the PS script, it created everything fine, but apparently, the CSV files don't have the header row. I had to delete them, and upload the versions that are in this repo.

Lab 26 resources

Couldn't start the starter SQL warehouse. It requires 8 cores while the quota of each region is 3.

Workspace Cost

Do you have the estimated hourly cost for this workspace?

Explore data with a Data Explorer pool

Module: Explore data with a Data Explorer pool

Lab/Demo: Explore data with a Data Explorer pool

Task: Create a Data Explorer database and ingest data into a table

Step: 06

Description of issue

Hi, I was trying to follow the steps outlined in the repo and got to the part "Explore data with a Data Explorer Pool". I couldn't find the ingest-data script mentioned in step 6 below. Was there a step I missed, or is the document missing some steps?

Switch to the Develop page, and in the KQL scripts list, select ingest-data. When the script opens, note that it contains two statements:

Repro steps:

DP-203-26-Databricks SQL - minor UI Update

Lab/Demo: DP-203-26-Databricks SQL

Task: Create a table

Step: 1

Description of issue
UI has changed. Instructions don't match the new UI.

Current instructions :

  1. In the sidebar, select (+) New and then select Table.

  2. In the Upload file area, select browse. Then in the Open dialog box, enter https://raw.githubusercontent.com/MicrosoftLearning/dp-203-azure-data-engineer/master/Allfiles/labs/26/data/products.csv and select Open.

    Tip: If your browser or operating system doesn't support entering a URL in the File box, download the CSV file to your computer and then upload it from the local folder where you saved it.

  3. In the Create table in Databricks SQL page, select the adventureworks database and set the table name to products. Then select Create.

Repro steps:

Steps should be the following

  1. In the sidebar, select (+) New and then select File upload.

  2. In the Upload data area, select browse. Then in the Open dialog box, enter https://raw.githubusercontent.com/MicrosoftLearning/dp-203-azure-data-engineer/master/Allfiles/labs/26/data/products.csv and select Open.

    Tip: If your browser or operating system doesn't support entering a URL in the File box, download the CSV file to your computer and then upload it from the local folder where you saved it.

  3. In the Upload data page, select the adventureworks database and set the table name to products. Then select Create table.

Still can't select data in Lake Database

Hi, thanks!

I still can't get any data, even changing the authentication method.

Please, check the previous post: Originally posted by @ralsouza in #33 (comment)

Error:
Started executing query at Line 1
External table 'RetailDB.dbo.Customer' is not accessible because content of directory cannot be listed.
Total execution time: 00:00:02.225

Also, I tried with Pyspark and got the same results.

%%pyspark
df = spark.sql("SELECT * FROM RetailDB.Customer")
df.show(10)

+-----------+----------+----------+--------------+------+
|CustomerId|FirstName|LastName|EmailAddress|Phone|
+-----------+----------+----------+--------------+------+
+-----------+----------+----------+--------------+------+

Originally posted by @ralsouza in #33 (comment)
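The "content of directory cannot be listed" error often indicates that the identity querying the lake database lacks read access on the underlying storage. A hedged sketch of granting the signed-in user a data-plane role on the data lake; the variable names and the choice of role are assumptions, not a confirmed fix for this issue:

```powershell
# Grant the current signed-in user "Storage Blob Data Reader" on the
# data lake storage account so serverless SQL can list the folder.
$user = Get-AzADUser -SignedIn
$storage = Get-AzStorageAccount -ResourceGroupName $resourceGroupName -Name $storageAccountName
New-AzRoleAssignment -ObjectId $user.Id `
    -RoleDefinitionName "Storage Blob Data Reader" `
    -Scope $storage.Id
```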

Creating a lake database does not initialise the GUI in Synapse

Module: 00

Lab/Demo: 04

Task: Create a lake database

Step: 6

Description of issue
A learner has reported they are unable to open the lake database in the lab.

The current detour is to go through the Knowledge center and set up a table, continue with the steps for table_1, then delete the table afterwards.

Repro steps:

Using my own environment, I am also able to recreate this same error.

databricks - finding uploaded data

Module: 00

Lab/Demo: 23 VM lab 18

Task: 00

Step: 00

Description of issue
Users are having difficulty finding uploaded data - make the instructions a little more general and have them enable the DBFS file store in the admin settings console.

solution:

If you have uploaded the data but cannot find it:

Note: the data must be uploaded already. For any errors you face afterwards:

  1. Enable the DBFS file store in your admin console.
  2. Then refresh and return to your notebook.
  3. Find your file using the DBFS explorer -> right-click and copy the file path.

Note: you may have to do a bit of digging based on where you uploaded your data but it should be there. Otherwise, you can always add it again.

REPLACE THE FOLLOWING CODE

filepath = ""  # insert your file path here (copied from the DBFS explorer)
df1 = spark.read.format("csv").option("header", "true").load(filepath)

Script error where notebooks don't get loaded in Synapse Workspace due to unauthorised IP address

Module: 6/3

Lab/Demo: 11/7 (any with spark notebook)

Task: Provision an Azure Synapse Analytics workspace

Step: 8

Description of issue

The following error has appeared twice this week for two people in the course I'm teaching: they run the PowerShell setup script for a lab that requires a Spark notebook and receive an error. I want to make you aware that not all IP addresses are accepted, and the client IP will need to be manually declared in the setup script.

Appears in cloud shell prompt. E.g.:

Importing Spark Transform.ipynb ...
ERROR: (ClientIpAddressNotAuthorized) Client Ip address : 13.73.***.**
Code: ClientIpAddressNotAuthorized
Message: Client Ip address : 13.73.***.**
Script completed at 05/31/2023

Temp fix:
GitHub repo -> Allfiles/labs -> the lab number you're in -> notebooks -> save the notebook as a raw file (.ipynb), ensuring no additional extensions are added (e.g. '.txt') -> in Synapse, import the file -> publish.

Repro steps:

  1. Allow all IP addresses, or find the current IP address and allow it, in the PS script (not sure exactly where in the setup scripts)
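One way the setup script could open the workspace firewall before importing notebooks; the rule name and $synapseWorkspace variable are hypothetical. Allowing all IPs is the coarse option, and an exercise could narrow it to the Cloud Shell's outbound address instead:

```powershell
# Add a Synapse workspace firewall rule permitting connections from
# any IP address, so the notebook import is not rejected with
# ClientIpAddressNotAuthorized.
New-AzSynapseFirewallRule -WorkspaceName $synapseWorkspace `
    -Name "AllowAll" `
    -StartIpAddress "0.0.0.0" -EndIpAddress "255.255.255.255"
```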

Lab 18 error generating event hub client app

Module: 07

Lab/Demo: 18

Task: 00

Step: 00

Description of issue
During setup the event hub client app creation fails.


Repro steps:
Don't know how to fix this problem

Lab 22 - Synapse and Purview

Module: 08

Lab/Demo: 22 -> old number (17 in Skillable)

Task: Create and run a pipeline

Step: 05

Description of issue
Student reports:
SQL table is named dbo.DimProduct instead of dbo.products and the column is not ProductKey but ProductID.


Modify setup to not require admin permissions

Module: dp-203

Lab/Demo: all

Description of issue

The setup scripts (example) require admin permissions to

  1. register resource providers
  2. create resource groups

My org does not give these permissions to our cloud users. We have mechanisms in place for users to request that resource providers be enabled, and for resource groups to be created, but users never receive the permissions to do these things themselves.

I propose that the setup scripts are modified to:

  1. fail with a clear error message if a required resource provider is not registered and the user doesn't have the ability to register it
  2. prompt the user to let them use an existing resource group instead of requiring the creation of a new one

Additionally, it looks like the script checks Azure regions for compatibility with services, but the list of preferred regions doesn't include canadacentral.
Perhaps this could also be a user prompt, and only if the user-entered region fails the check would the script attempt to find a compatible one automatically.
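The fail-fast behaviour proposed in point 1 could be sketched like this; the provider list is taken from the lab log output elsewhere in this page, and the exact wording is an assumption:

```powershell
# Fail with a clear error if a required resource provider is not
# registered, instead of continuing and failing later in the deployment.
foreach ($provider in "Microsoft.Synapse", "Microsoft.Sql", "Microsoft.Storage", "Microsoft.Compute") {
    $state = (Get-AzResourceProvider -ProviderNamespace $provider |
        Select-Object -First 1).RegistrationState
    if ($state -ne "Registered") {
        Write-Error "Resource provider $provider is not registered. Ask an administrator to register it, then re-run this script."
        exit 1
    }
}
```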

Instructions deviate from the Azure service in Stream Analytics (which impacts whether the lab works)

Module: 00

Lab/Demo: 17 (stream analytics)

Task: Create an Azure Stream Analytics job

Step: 01

Description of issue

A learner tried to use a connection string instead of a managed identity, which resulted in a failed Stream Analytics deployment. The instructions need to be clearer about configuring the storage account (i.e. using a managed identity instead of a connection string).

Repro steps:

DBC upload error generated but file loaded

Module: 09

Lab/Demo: 24

Task: Explore data using a notebook

Step: 02

Description of issue
When the learner submits the url request, there is an error generated (image included). The file does seem to import though, as there were several instances of the same file. Not sure why the error is produced.


Repro steps:

Incorrect instructions Lab 27

Module: 27

Lab/Demo: 01

Task: Enable Azure Databricks integration with Azure Data Factory

Step: Generate an access token

Description of issue: the instructions say, "In the Azure Databricks portal, on the top left menu bar, select the username and then select User Settings from the drop-down."

It should be the top right menu bar; please check and update.

Lab 9, provisioning setup script throws error

Module: 04

Lab/Demo: 09

Task: Provision an Azure Synapse Analytics workspace
Step: 05

cd dp-203/Allfiles/labs/09
./setup.ps1

Description of issue
Suspend-AzSynapseSQLPool is not loaded from the module.

Might be worth investigating - it does not impact the lab, but it can cause confusion for those running the script independently.

Set up Synapse throws error

Module: 05

Lab/Demo: 01

Task: 02

Step: 05

Description of issue
I would like to do the following course https://learn.microsoft.com/en-us/training/modules/introduction-azure-synapse-analytics/. When I follow the steps (https://microsoftlearning.github.io/dp-203-azure-data-engineer/Instructions/Labs/01-Explore-Azure-Synapse.html) to set up Synapse I get an error for the setup.ps1 script in line 141.

Repro steps:

  1. Open azure.com
  2. Open PowerShell
  3. Enter in PowerShell:
rm -r dp-203 -f
git clone https://github.com/MicrosoftLearning/dp-203-azure-data-engineer dp-203
  4. Enter in PowerShell:
 cd dp-203/Allfiles/labs/01
 ./setup.ps1
  5. Here you can see the complete PowerShell log:
    5a) as a file:
    Power shell error messages.txt

5b) as text:

Starting script at 04/26/2023 08:48:19
                                                                                                                        
Enter a password to use for the SQLUser login.                                                                          
    The password must meet complexity requirements:
     - Minimum 8 characters. 
     - At least one upper case English letter [A-Z]
     - At least one lower case English letter [a-z]
     - At least one digit [0-9]
     - At least one special character (!,@,#,%,^,&,$)
     : Data123#   
Password Data123# accepted. Make sure you remember this!
Registering resource providers...
Microsoft.Synapse : Registered
Microsoft.Sql : Registered
Microsoft.Storage : Registered
Microsoft.Compute : Registered
Your randomly-generated suffix for Azure resources is x3vf5wy
Finding an available region. This may take several minutes...
Trying centralus
Using centralus
Creating dp203-x3vf5wy resource group in centralus ...
Creating synapsex3vf5wy Synapse Analytics workspace in dp203-x3vf5wy resource group...
(This may take some time!)
New-AzResourceGroupDeployment: /home/cstx_a_schleh/dp-203/Allfiles/labs/01/setup.ps1:141
Line |
 141 |  New-AzResourceGroupDeployment -ResourceGroupName $resourceGroupName `
     |  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
     | 8:51:29 AM - Error: Code=InvalidTemplateDeployment; Message=Deployment failed with multiple errors: 'Authorization failed for template resource 'datalakex3vf5wy/default/files/Microsoft.Authorization/3aab02e9-9b0b-54ec-a155-85f5f652faaf' of type
     | 'Microsoft.Storage/storageAccounts/blobServices/containers/providers/roleAssignments'. The client '[email protected]' with object id 'cef3760c-22e4-488d-8d37-0a4007bb6fe4' does not have permission to perform action
     | 'Microsoft.Authorization/roleAssignments/write' at scope
     | '/subscriptions/ac7d4266-c54a-41d0-970b-5ba5bf9ddfae/resourceGroups/dp203-x3vf5wy/providers/Microsoft.Storage/storageAccounts/datalakex3vf5wy/blobServices/default/containers/files/providers/Microsoft.Authorization/roleAssignments/3aab02e9-9b0b-54ec-a155-85f5f652faaf'.:Authorization failed for template resource '95a444ef-4c02-5c8c-92e5-5541883279ee' of type 'Microsoft.Authorization/roleAssignments'. The client '[email protected]' with object id 'cef3760c-22e4-488d-8d37-0a4007bb6fe4' does not have permission to perform action 'Microsoft.Authorization/roleAssignments/write' at scope '/subscriptions/ac7d4266-c54a-41d0-970b-5ba5bf9ddfae/resourceGroups/dp203-x3vf5wy/providers/Microsoft.Authorization/roleAssignments/95a444ef-4c02-5c8c-92e5-5541883279ee'.'
New-AzResourceGroupDeployment: /home/cstx_a_schleh/dp-203/Allfiles/labs/01/setup.ps1:141
Line |
 141 |  New-AzResourceGroupDeployment -ResourceGroupName $resourceGroupName `
     |  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
     | The deployment validation failed
Pausing the adxx3vf5wy Data Explorer Pool...
Stop-AzSynapseKustoPool: /home/cstx_a_schleh/dp-203/Allfiles/labs/01/setup.ps1:156
Line |
 156 |  Stop-AzSynapseKustoPool -Name $adxpool -ResourceGroupName $resourceGr …
     |  ~~~~~~~~~~~~~~~~~~~~~~~
     | The 'Stop-AzSynapseKustoPool' command was found in the module 'Az.Synapse', but the module could not be loaded. For more information, run 'Import-Module Az.Synapse'.
Granting permissions on the datalakex3vf5wy storage account...
(you can ignore any warnings!)
New-AzRoleAssignment: /home/cstx_a_schleh/dp-203/Allfiles/labs/01/setup.ps1:164
Line |
 164 |  New-AzRoleAssignment -Objectid $id -RoleDefinitionName "Storage Blob  …
     |                                 ~~~
     | Cannot validate argument on parameter 'ObjectId'. The argument is null or empty. Provide an argument that is not null or empty, and then try the command again.
Creating the sqlx3vf5wy database...
Sqlcmd: Error: Microsoft ODBC Driver 18 for SQL Server : Login timeout expired.
Sqlcmd: Error: Microsoft ODBC Driver 18 for SQL Server : TCP Provider: Error code 0x2AF9.
Sqlcmd: Error: Microsoft ODBC Driver 18 for SQL Server : A network-related or instance-specific error has occurred while establishing a connection to synapsex3vf5wy.sql.azuresynapse.net. Server is not found or not accessible. Check if instance name is correct and if SQL Server is configured to allow remote connections. For more information see SQL Server Books Online..
Loading data...

/home/cstx_a_schleh/dp-203/Allfiles/labs/01/data/DimCurrency.txt
SQLState = S1T00, NativeError = 0
Error = [Microsoft][ODBC Driver 18 for SQL Server]Login timeout expired
SQLState = 08001, NativeError = 11001
Error = [Microsoft][ODBC Driver 18 for SQL Server]TCP Provider: Error code 0x2AF9
SQLState = 08001, NativeError = 11001
Error = [Microsoft][ODBC Driver 18 for SQL Server]A network-related or instance-specific error has occurred while establishing a connection to synapsex3vf5wy.sql.azuresynapse.net. Server is not found or not accessible. Check if instance name is correct and if SQL Server is configured to allow remote connections. For more information see SQL Server Books Online.

(The same three errors — login timeout, TCP Provider error code 0x2AF9, and the network-related connection failure — repeat for each remaining data file: DimCustomer.txt, DimDate.txt, DimGeography.txt, DimProduct.txt, DimProductCategory.txt, DimProductSubCategory.txt, DimPromotion.txt, DimSalesTerritory.txt, and FactInternetSales.txt.)
Pausing the sqlx3vf5wy SQL Pool...
Suspend-AzSynapseSqlPool: /home/cstx_a_schleh/dp-203/Allfiles/labs/01/setup.ps1:184
Line |
 184 |  Suspend-AzSynapseSqlPool -WorkspaceName $synapseWorkspace -Name $sqlD …
     |  ~~~~~~~~~~~~~~~~~~~~~~~~
     | The 'Suspend-AzSynapseSqlPool' command was found in the module 'Az.Synapse', but the module could not be loaded. For more information, run 'Import-Module Az.Synapse'.
Loading data...
Get-AzStorageAccount: /home/cstx_a_schleh/dp-203/Allfiles/labs/01/setup.ps1:188
Line |
 188 |  … geAccount = Get-AzStorageAccount -ResourceGroupName $resourceGroupNam …
     |                ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
     | The Resource 'Microsoft.Storage/storageAccounts/datalakex3vf5wy' under resource group 'dp203-x3vf5wy' was not found. For more details please go to https://aka.ms/ARMResourceNotFoundFix

sales.csv
Set-AzStorageBlobContent: /home/cstx_a_schleh/dp-203/Allfiles/labs/01/setup.ps1:195
Line |
 195 |      Set-AzStorageBlobContent -File $_.FullName -Container "files" -Bl …
     |      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
     | Could not get the storage context.  Please pass in a storage context or set the current storage context.
Script completed at 04/26/2023 08:54:11
PS /home/cstx_a_schleh/dp-203/Allfiles/labs/01> 
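Every authorization failure in the log above reports the same two key fields: the denied RBAC action (`Microsoft.Authorization/roleAssignments/write`) and the scope at which it was denied. As a minimal sketch, a hypothetical Python helper (not part of the lab scripts) can pull those fields out of an error message like the one above:

```python
import re

def parse_authz_error(message: str):
    """Extract the denied RBAC action and scope from an Azure
    'Authorization failed' deployment error message."""
    action = re.search(r"perform action\s+'([^']+)'", message)
    scope = re.search(r"at scope\s+'([^']+)'", message)
    return (action.group(1) if action else None,
            scope.group(1) if scope else None)

# Abbreviated version of the message from the log above.
msg = ("The client does not have permission to perform action "
       "'Microsoft.Authorization/roleAssignments/write' at scope "
       "'/subscriptions/ac7d4266-c54a-41d0-970b-5ba5bf9ddfae/resourceGroups/dp203-x3vf5wy'.")

action, scope = parse_authz_error(msg)
print(action)  # Microsoft.Authorization/roleAssignments/write
```

A denied `Microsoft.Authorization/roleAssignments/write` action means the signed-in account lacks a role that can create role assignments (for example Owner or User Access Administrator) at that scope, which is why the template deployment and every downstream step then fail.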

Lab 22 (synapse and purview) no longer works at powershell script launch - high importance

Module: 00

Lab/Demo: 00

Task: 00

Step: 00

Description of issue

I was able to reproduce the problem immediately in my own environment. The template is the cause.

New-AzResourceGroupDeployment: /home/user1-32139270/dp-203/Allfiles/labs/22/setup.ps1:142
Line |
 142 |  New-AzResourceGroupDeployment -ResourceGroupName $resourceGroupName `
     |  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
     | 6:56:26 AM - Error: Code=InvalidTemplateDeployment; Message=The template deployment 'setup' is not valid according to the validation procedure. The tracking id is 'af939892-c675-4780-ba72-f75044dbfdb2'. See inner errors for details.
New-AzResourceGroupDeployment: /home/user1-32139270/dp-203/Allfiles/labs/22/setup.ps1:142
Line |
 142 |  New-AzResourceGroupDeployment -ResourceGroupName $resourceGroupName `
     |  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
     | 6:56:27 AM - Error: Code=35001; Message=Validation failed for purview1jtdza6. A Tenant-level Purview Account already exists for tenant: LODS-Prod-MSLearn-MCA with ID: bb7ed293-2674-4aef-a74a-dbf340a8ab33. If you want to create an additional Purview Account, please reach out to Microsoft Support. Correlation ID: af939892-c675-4780-ba72-f75044dbfdb2
New-AzResourceGroupDeployment: /home/user1-32139270/dp-203/Allfiles/labs/22/setup.ps1:142
Line |
 142 |  New-AzResourceGroupDeployment -ResourceGroupName $resourceGroupName `
     |  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
     | The deployment validation failed
Granting permissions on the datalake1jtdza6 storage account...
(you can ignore any warnings!)
New-AzRoleAssignment: /home/user1-32139270/dp-203/Allfiles/labs/22/setup.ps1:160
Line |
 160 |  New-AzRoleAssignment -Objectid $id -RoleDefinitionName "Storage Blob …
     |                                 ~~~
     | Cannot validate argument on parameter 'ObjectId'. The argument is null or empty. Provide an argument that is not null or empty, and then try the command again.
Loading data to data lake...
Get-AzStorageAccount: /home/user1-32139270/dp-203/Allfiles/labs/22/setup.ps1:165
Line |
 165 | … geAccount = Get-AzStorageAccount -ResourceGroupName $resourceGroupNam …
     |               ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
     | The Resource 'Microsoft.Storage/storageAccounts/datalake1jtdza6' under resource group 'dp203-1jtdza6' was not found. For more details please go to https://aka.ms/ARMResourceNotFoundFix

products.csv
Set-AzStorageBlobContent: /home/user1-32139270/dp-203/Allfiles/labs/22/setup.ps1:172
Line |
 172 |      Set-AzStorageBlobContent -File $_.FullName -Container "files" -Bl …
     |      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
     | Could not get the storage context. Please pass in a storage context or set the current storage context.
Creating databases...
Sqlcmd: Error: Microsoft ODBC Driver 18 for SQL Server : Login timeout expired.
Sqlcmd: Error: Microsoft ODBC Driver 18 for SQL Server : TCP Provider: Error code 0x2AF9.
Sqlcmd: Error: Microsoft ODBC Driver 18 for SQL Server : A network-related or instance-specific error has occurred while establishing a connection to synapse1jtdza6-ondemand.sql.azuresynapse.net. Server is not found or not accessible. Check if instance name is correct and if SQL Server is configured to allow remote connections. For more information see SQL Server Books Online..
Sqlcmd: Error: Microsoft ODBC Driver 18 for SQL Server : Login timeout expired.
Sqlcmd: Error: Microsoft ODBC Driver 18 for SQL Server : TCP Provider: Error code 0x2AF9.
Sqlcmd: Error: Microsoft ODBC Driver 18 for SQL Server : A network-related or instance-specific error has occurred while establishing a connection to synapse1jtdza6.sql.azuresynapse.net. Server is not found or not accessible. Check if instance name is correct and if SQL Server is configured to allow remote connections. For more information see SQL Server Books Online..
Pausing the sql1jtdza6 SQL Pool...

Id Name            PSJobTypeName   State   HasMoreData Location  Command
-- ----            -------------   -----   ----------- --------  -------
2  Long Running O… AzureLongRunni… Running True        localhost Suspend-AzSynapseSqlPool

Script completed at 06/29/2023 06:56:47

Instructions need updating from "File" to "Generic Protocol"

Module: 01

Lab/Demo: 01

Task: Use the Copy Data task to create a pipeline

Step: 03

Description of issue

The instructions state to navigate to the File tab to find the HTTP icon. HTTP isn't under File; it's under Generic Protocol, so the instructions need to be updated.


Lab: Transform files using a serverless SQL pool. Bug in code

Module: 2

Lab/Demo: 1

Task: Use SQL to query CSV files

Step: 4

Description of issue
The SQL query returns an error.

The code block is this one:

    SELECT
        TOP 100 *
    FROM
        OPENROWSET(
            BULK 'https://datalakex98z7wc.dfs.core.windows.net/files/sales/csv**',
            FORMAT = 'CSV',
            PARSER_VERSION='2.0'
        ) AS [result]

It returns an error:

Consecutive wildcard characters present in path 'https://datalakex98z7wc.dfs.core.windows.net/files/sales/csv**'.

If I change the line in question to this (adding a "/" after the word csv), the query works correctly:
BULK 'https://datalakex98z7wc.dfs.core.windows.net/files/sales/csv/**',
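The difference between the rejected path and the working one is only where the `**` wildcard sits relative to the folder name. As a rough illustration (a hypothetical helper, not how Synapse actually validates paths), a check that distinguishes the two forms:

```python
def wildcard_ok(bulk_path: str) -> bool:
    """Return False for the form Synapse rejects in this lab: a '**'
    wildcard fused directly onto a folder name instead of following '/'."""
    i = bulk_path.find("**")
    if i == -1:
        return True  # no recursive wildcard at all
    return i == 0 or bulk_path[i - 1] == "/"

# The path as printed in the lab instructions is rejected...
assert not wildcard_ok("https://datalakex98z7wc.dfs.core.windows.net/files/sales/csv**")
# ...while the corrected path is accepted.
assert wildcard_ok("https://datalakex98z7wc.dfs.core.windows.net/files/sales/csv/**")
```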
