azure-player / azure.synapse.tools
PowerShell module to deploy Synapse workspace (and more) in Microsoft Azure.
License: MIT License
Deployment throws error "ASWT0029: Unknown object type: SparkConfiguration" when deploying a notebook (example shown below) that has a reference to a custom spark configuration. I believe this may be a very similar issue to #11 where SparkConfiguration would need to be added to the allowed valid types in private/!SynapseObject.class.ps1 as well as private/Get-SynapseObjectByName.ps1 so that it can pretend that the object referenced is valid and exists.
Notebook:
{
"name": "nb_example",
"properties": {
"folder": {
"name": "utility"
},
"nbformat": 4,
"nbformat_minor": 2,
"bigDataPool": {
"referenceName": "sparkSm",
"type": "BigDataPoolReference"
},
"targetSparkConfiguration": {
"referenceName": "CustomSparkConfig",
"type": "SparkConfigurationReference"
},
Failure:
VERBOSE: Analyzing notebook dependencies...
VERBOSE: Folder: D:\a\***\drop\***\notebook
VERBOSE: - nb_example.json
Failure occurred while publishing artifacts to ***
##[debug]Error record:
##[debug]Exception: D:\a\_temp\a560ab36-63de-480f-888a-bc877***f23f0a.ps***:38
##[debug] | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
##[debug] | ASWT0029: Unknown object type: SparkConfiguration.
azure.synapse.tools Version 0.23.000
When trying to use config-prod.csv to replace the script in a sqlscript as part of a Synapse deployment, I found that the library saves the file with escaped values, e.g. \\n instead of \n. After debugging I found that this is caused by the line $output = ($obj.Body | ConvertTo-Json -Compress:$true -Depth 100) in Save-SynapseObectAsFile.ps1.
Test script:
$output = "{type=SqlScript; name=Populate serverless; path=content.query; value=IF NOT EXISTS (SELECT * FROM sys.external_file_formats WHERE name = 'SynapseDeltaFor
mat') \n\t"
$output | ConvertTo-Json -Depth 100
Output:
{type=SqlScript; name=Populate serverless; path=content.query; value=IF NOT EXISTS (SELECT * FROM sys.external_file_formats WHERE name = \u0027SynapseDeltaFor\r\nmat\u0027) \\n\\t
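The escaping can be reproduced without the module. The following is a minimal sketch that assumes only the built-in ConvertTo-Json cmdlet; the variable names and sample strings are illustrative and not taken from the module. It contrasts a value that already contains the two characters backslash and n with a value that contains a real line feed.

```powershell
# A literal backslash followed by "n" (single quotes: no escape processing).
$literal = 'SELECT 1\nSELECT 2'

# A real line feed character (backtick-n inside double quotes).
$real = "SELECT 1`nSELECT 2"

# ConvertTo-Json escapes the backslash itself, so \n is written as \\n.
$literalJson = $literal | ConvertTo-Json

# A real line feed is serialized as the JSON escape \n.
$realJson = $real | ConvertTo-Json

$literalJson
$realJson
```

If the CSV value is read verbatim, a \n typed into config-prod.csv would fall into the first case, which is consistent with the doubled backslashes in the output above.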
Hello @NowinskiK,
This is not really an issue but more of a question related to the Synapse module. Are you planning to develop an Azure DevOps task similar to the Data Factory tasks? We currently use them and they are awesome! Just curious whether there are any plans for Synapse.
Thanks again,
Chris
Error occurs when deploying notebooks with a spark pool defined. Sample of notebook json:
{
"name": "my_notebook",
"properties": {
"folder": {
"name": "my_folder"
},
"nbformat": 4,
"nbformat_minor": 2,
"bigDataPool": {
"referenceName": "sspklbdp01",
"type": "BigDataPoolReference"
},
....
Returns error:
VERBOSE: Analyzing notebook dependencies...
VERBOSE: Folder: D:\a\1\b\synapse_deploy\notebook
VERBOSE: - my_notebook.json
##[debug]Error record:
##[debug]
##[debug]Exception: C:\Users\VssAdministrator\Documents\PowerShell\Modules\azure.synapse.tools\0.18.0\private\Import-SynapseObjects.ps1:20
##[debug]Line |
##[debug] 20 | Get-ChildItem "$folder" -Filter "*.json" | Where-Object { !$_.Nam …
##[debug] | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
##[debug] | ASWT0029: Unknown object type: BigDataPool.
azure.synapse.tools Version 0.18.000
I have a simple pipeline with one activity of type Notebook that references a Spark notebook. When deploying this pipeline I get the error message:
ASWT0014: Type [Notebook] is not supported.
Pipeline definition:
{ "name": "pl_execute_notebook_cicd", "properties": { "description": "Pipeline to trigger from Azure DevOps release to execute deployed Spark notebook dynamically", "activities": [ { "name": "execute_notebook", "type": "SynapseNotebook", "dependsOn": [], "policy": { "timeout": "7.00:00:00", "retry": 0, "retryIntervalInSeconds": 30, "secureOutput": false, "secureInput": false }, "userProperties": [], "typeProperties": { "notebook": { "referenceName": "generate_job_pipeline", "type": "NotebookReference" }, "snapshot": true, "sparkPool": { "referenceName": "labsparkpool01", "type": "BigDataPoolReference" } } } ], "parameters": { "notebook_name": { "type": "string" }, "sparkpool_name": { "type": "string" } }, "folder": { "name": "Lakehouse/Maintenance" }, "annotations": [] } }
The same pipeline originally had a dynamic reference to the notebook, taken from a pipeline parameter:
"typeProperties": { "notebook": { "referenceName": { "value": "@pipeline().parameters.notebook_name", "type": "Expression" }, "type": "NotebookReference" }, "snapshot": true, "sparkPool": { "referenceName": { "value": "@pipeline().parameters.sparkpool_name", "type": "Expression" }, "type": "BigDataPoolReference" } }
With the dynamic reference to the notebook I got a different error message:
ASWT0029: Unknown object type: parameters.
The desired behaviour is to ignore a reference when it is set by an Expression.
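One way to tell the two cases apart is to check whether referenceName is a plain string or an object. This is only a sketch: Test-IsStaticReference is a hypothetical helper name, not a function of azure.synapse.tools, and the JSON fragments are cut down from the pipeline definitions above.

```powershell
# Hypothetical helper: returns $true only when referenceName is a plain string.
# An Expression-based reference deserializes to an object with value/type
# properties, so a dependency scan could skip it.
function Test-IsStaticReference {
    param([Parameter(Mandatory)] $Reference)
    return ($Reference.referenceName -is [string])
}

$static = '{ "referenceName": "generate_job_pipeline", "type": "NotebookReference" }' |
    ConvertFrom-Json

$dynamic = '{ "referenceName": { "value": "@pipeline().parameters.notebook_name", "type": "Expression" }, "type": "NotebookReference" }' |
    ConvertFrom-Json

$staticResult  = Test-IsStaticReference -Reference $static
$dynamicResult = Test-IsStaticReference -Reference $dynamic
"static: $staticResult; dynamic: $dynamicResult"
```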
STEP: Stopping triggers...
Getting triggers...
##[error]Unable to find type [Microsoft.Azure.Commands.Synapse.Models.PSTrigger].
Using Az.Synapse 0.19.0
enhancement
I've found the SynapseDocDiagram function that generates code to produce a mermaid diagram very useful for understanding dependencies when taking over an existing project.
I'd like the option to also include Activities within the diagram to have a holistic view of all elements included within a pipeline.
Example of Activities option:
$synapse = Import-SynapseFromFolder -RootFolder $RootFolder -SynapseWorkspaceName 'whatever' -IncludeActivities 'Yes'
Example of Activity displayed within diagram:
[activities:type].[activities:name]
Json file is below
Synapse.zip
When including a config file (-Stage "$configCsvFile") I get an error.
STEP: Replacing all properties environment-related...
##[error]A parameter cannot be found that matches parameter name 'option'.
Code in Publish-SynapseFromJson.ps1:
if (![string]::IsNullOrEmpty($Stage)) {
Update-PropertiesFromFile -synapse $synapse -stage $Stage -option $opt
} else {
Write-Host "Stage parameter was not provided - action skipped."
}
Code in Update-PropertiesFromFile.ps1:
function Update-PropertiesFromFile {
[CmdletBinding()]
param (
[Parameter(Mandatory)] [Synapse] $synapse,
[Parameter(Mandatory)] [string] $stage,
[switch] $dryRun = $false
)
I guess this is similar/related to issue #2
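The mismatch between caller and callee can be confirmed by listing the parameters a function actually declares. A minimal sketch, assuming a stand-in function: Update-PropertiesFromFileStub is hypothetical and copies the param block quoted above, with the module's [Synapse] type replaced by [object] so that it runs without the module loaded.

```powershell
# Stand-in with the same parameters as the excerpt above (hypothetical name;
# [Synapse] replaced by [object] so the sketch is self-contained).
function Update-PropertiesFromFileStub {
    [CmdletBinding()]
    param (
        [Parameter(Mandatory)] [object] $synapse,
        [Parameter(Mandatory)] [string] $stage,
        [switch] $dryRun = $false
    )
}

# Get-Command exposes the declared parameters (plus the common ones).
$declared = @((Get-Command Update-PropertiesFromFileStub).Parameters.Keys)

$hasOption = $declared -contains 'option'
$hasDryRun = $declared -contains 'dryRun'
"option declared: $hasOption; dryRun declared: $hasDryRun"
```

Because no parameter named option is declared, a call that passes -option fails at parameter binding, which matches the reported error.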
enhancement
In a large/complex project the DocDiagram generated can become cluttered and confusing to understand due to the number of dependencies.
Please add an option to filter to a specific pipeline and display that particular pipeline and all downstream dependencies.
This will allow separate diagrams to be created for each pipeline if required.
Example:
$synapse = Import-SynapseFromFolder -RootFolder $RootFolder -SynapseWorkspaceName 'whatever' -PipelineFilter 'pipelinename'
Hi again Kamil, new customer for me and new use case for your great tool :).
I've got an error (Exception has been thrown by the target of an invocation) from Set-AzSynapsePipeline for a pipeline where I replace properties using the Stage parameter. By comparing the pipeline before and after the replacement, I found that an "activities" property with only one child activity was missing its [] in the final JSON file. In my case it was a ForEach activity with only one child, but the same error happens in a simple pipeline with one activity.
I googled a bit about ConvertTo-Json, which seems to have this bug of treating single-element arrays incorrectly. The -Depth parameter should have resolved this error, so I figured my "activities" should be converted to an array earlier. Since PowerShell is not my strongest side, the solution became a crude fix in the form of the following code:
$arr = @()
$arr += $Item.$prop
$Item.$prop = $arr
placed at the end of the if ($Item.$prop.GetType().Name -eq "OrderedDictionary") statement in ConvertFrom-OrderedHashTablesToArrays. You will surely find a more elegant fix.
Here is a test pipeline:
{ "name": "TestSingleActivity", "properties": { "activities": [ { "name": "Set variable1", "type": "SetVariable", "dependsOn": [], "userProperties": [], "typeProperties": { "variableName": "TestVariable", "value": "1" } } ], "variables": { "TestVariable": { "type": "String", "defaultValue": "test" } }, "annotations": [] } }
And a test config file:
type,name,path,value
pipeline,TestSingleActivity,variables.TestVariable.defaultValue,"test1"
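The missing [] can be reproduced in isolation. The sketch below assumes nothing about the module's internals; it only shows the general PowerShell behaviour that a pipeline returning exactly one object yields that object rather than a one-element array, and that wrapping the result in @() restores the array before serialization. The variable names are illustrative.

```powershell
# A one-element array of activities, as in the test pipeline above.
$activities = @([ordered]@{ name = 'Set variable1'; type = 'SetVariable' })

# Passing it through the pipeline unrolls it: the result is the single
# dictionary itself, no longer an array.
$unwrapped = $activities | ForEach-Object { $_ }

# The array subexpression operator guarantees an array again.
$fixed = @($unwrapped)

$bad  = [ordered]@{ activities = $unwrapped } | ConvertTo-Json -Depth 10 -Compress
$good = [ordered]@{ activities = $fixed }     | ConvertTo-Json -Depth 10 -Compress

$bad
$good
```

The @() form has the same effect as the three-line $arr workaround above, in a single expression.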
Found a bug in the incremental deployment. If the blob does not exist in the container, the deployment will fail.
Hello,
I am trying to add a config replacement for a notebook parameter, but it requires a new line at the end of the string.
Example:
type,name,path,value
notebook,notebook1,$.properties.cells[0].source[14],"output_storage_account = 'myStorage'\r\n"
However, the notebook ends up with no new line, and the value is merged with the following line:
output_storage_account = 'myStorage'\\r\\ntemp_output="temp"
Is there a specific syntax I should be using in the csv?
Thanks.
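For context, CSV parsing in PowerShell does not interpret escape sequences, so \r\n in a value arrives as four literal characters. The sketch below shows this with the built-in ConvertFrom-Csv and one possible way to turn the sequences into real characters with [regex]::Unescape. Whether the module applies any such step is not stated here, so treat this purely as an illustration of the behaviour, not as the module's syntax.

```powershell
# Single-quoted here-string: the content is taken verbatim.
$csv = @'
type,name,path,value
notebook,notebook1,$.properties.cells[0].source[14],"output_storage_account = 'myStorage'\r\n"
'@

$row = $csv | ConvertFrom-Csv

# The value still ends with the literal characters \ r \ n.
$endsWithLiteral = $row.value.EndsWith('\r\n', [System.StringComparison]::Ordinal)

# [regex]::Unescape converts escape sequences such as \r and \n into the
# characters they stand for (it would also affect any other backslashes).
$unescaped = [regex]::Unescape($row.value)
$endsWithNewline = $unescaped.EndsWith("`r`n", [System.StringComparison]::Ordinal)

"literal: $endsWithLiteral; real newline after unescape: $endsWithNewline"
```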
When running Publish-SynapseFromJson with a CSV environment configuration, pipelines that contain only a single activity fail.
The resulting "~pipeline.json" file is missing the array [] syntax. I suppose this is due to PowerShell unwrapping single-element arrays.
Example:
pipeline.json
{ "name": "SetVariable", "properties": { "activities": [ { "name": "SetVar", "type": "SetVariable", "dependsOn": [], "userProperties": [], "typeProperties": { "variableName": "MyDummyVar", "value": "MyDummyVal" } } ], "variables": { "MyDummyVar": { "type": "String" } }, "annotations": [] } }
environment.csv
type,name,path,value
pipeline,pipeline,activities[0].typeProperties.value,"MyProductionValue"
~pipeline.json
{ "name": "SetVariable", "properties": { "activities": { "name": "SetVar", "type": "SetVariable", "dependsOn": [], "userProperties": [], "typeProperties": { "variableName": "MyDummyVar", "value": "MyProductionValue" } }, "variables": { "MyDummyVar": { "type": "String" } }, "annotations": [] } }
I have tried deploying but it keeps giving me errors. I tried with all different types of objects. I tried a single LS with dependencies, a LS without dependencies, I deployed one manually to see if it got to the point where it checks that it has already been deployed but it never gets that far. Here's an example. Unfortunately, no helpful error message.
======================================================================================
### azure.synapse.tools Version 0.16.000 ###
======================================================================================
Invoking Publish-SynapseFromJson (https://github.com/SQLPlayer/azure.synapse.tools)
with the following parameters:
======================================================================================
RootFolder: D:\a\1\a\SynapseAnalytics\
ResourceGroupName: [REDACTED]
Synapse Workspace: [REDACTED]
Location: WestEurope
Stage:
Options provided: True
Publishing method: AzResource
======================================================================================
Publish options are provided.
STEP: Verifying whether Synapse workspace exists...
Synapse Workspace exists.
===================================================================================
STEP: Reading Synapse Workspace from JSON files...
IntegrationRuntimes: 2 object(s) loaded.
LinkedServices: 6 object(s) loaded.
Pipelines: 11 object(s) loaded.
DataSets: 4 object(s) loaded.
DataFlows: 1 object(s) loaded.
Triggers: 4 object(s) loaded.
SqlScripts: 0 object(s) loaded.
KqlScripts: 0 object(s) loaded.
Notebooks: 0 object(s) loaded.
Managed VNet: 1 object(s) loaded.
Managed Private Endpoints: 5 object(s) loaded.
# Number of objects marked as to be deployed: 1/34
- [linkedService].[LS_KEV]
===================================================================================
STEP: Replacing all properties environment-related...
Stage parameter was not provided - action skipped.
===================================================================================
STEP: Stopping triggers...
Getting triggers...
===================================================================================
STEP: Deployment of all Synapse objects...
Start deploying object: [linkedService].[LS_KEV] (0 dependency/ies)
##[error]
CorrelationId: 381e728a-bf2a-44dd-a9fb-bd68b7ebfc73
##[error]PowerShell exited with code '1'.
STEP: Stopping triggers...
Getting triggers...
##[error]The property 'RuntimeState' cannot be found on this object. Verify that the property exists.
My triggers have been successfully deployed when there weren't any present. The next time, it fails the step where it wants to stop them.
As the error message is saying, the property RuntimeState is missing for Synapse, where it is present for DataFactoryV2. Seems like a vital property to me..
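A defensive way to read a property that may not exist on every object type is to look it up through PSObject.Properties first. This is a sketch only: Get-PropertyOrDefault is a hypothetical helper, and the two sample objects stand in for trigger objects with and without RuntimeState; they are not the real Az.Synapse or Az.DataFactory types.

```powershell
# Hypothetical helper: returns the property value when the property exists,
# otherwise the supplied default, without triggering a "property cannot be
# found" error.
function Get-PropertyOrDefault {
    param(
        [Parameter(Mandatory)] $InputObject,
        [Parameter(Mandatory)] [string] $Name,
        $Default = $null
    )
    $prop = $InputObject.PSObject.Properties[$Name]
    if ($null -ne $prop) { return $prop.Value }
    return $Default
}

# Stand-in objects (illustrative names and values).
$withState    = [pscustomobject]@{ Name = 'tr_daily'; RuntimeState = 'Started' }
$withoutState = [pscustomobject]@{ Name = 'tr_daily' }

$stateA = Get-PropertyOrDefault -InputObject $withState    -Name 'RuntimeState' -Default 'Unknown'
$stateB = Get-PropertyOrDefault -InputObject $withoutState -Name 'RuntimeState' -Default 'Unknown'
"with property: $stateA; without property: $stateB"
```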
Hello,
I was trying to use the option DeleteNotInSource, but it does not look like the option is being applied to the publish step in Synapse. Can you please add that as a part of the Publish-AzSynapseFromJson function?
Thanks.
I don't know whether this works in Azure DevOps or not:
::: mermaid
graph LR
pipeline.pipeline1 --> dataset.DelimitedText1
dataset.DelimitedText1 --> linkedService.AzureBlobStorage1
IntegrationRuntime.AutoResolveIntegrationRuntime --> managedVirtualNetwork.default
:::
Parse error on line 4:
...nagedVirtualNetwork.default
-----------------------^
Expecting 'SEMI', 'NEWLINE', 'SPACE', 'EOF', 'SQS', 'AMP', 'STYLE_SEPARATOR', 'PS', '(-', 'STADIUMSTART', 'SUBROUTINESTART', 'CYLINDERSTART', 'DIAMOND_START', 'TAGEND', 'TRAPSTART', 'INVTRAPSTART', 'START_LINK', 'LINK', 'DOWN', 'NUM', 'COMMA', 'ALPHA', 'COLON', 'MINUS', 'BRKT', 'DOT', 'PUNCTUATION', 'UNICODE_TEXT', 'PLUS', 'EQUALS', 'MULT', 'UNDERSCORE', got 'DEFAULT'
##[error]A parameter cannot be found that matches parameter name 'option'.
Using this code:
$opt = New-SynapsePublishOption
$opt.StopStartTriggers = $false
Publish-SynapseFromJson -RootFolder "$RootFolder" -ResourceGroupName "$ResourceGroupName" -SynapseWorkspaceName "$SynapseWorkspaceName" -Location "$Location" -Option $opt
Probably because ApplyExclusionOptions.ps1 only has the $synapse parameter.
The synapse code is different from adf:
synapse:
# Apply Deployment Options if applicable
if ($null -ne $Option) {
ApplyExclusionOptions -synapse $synapse -option $opt
}
adf:
# Apply Deployment Options if applicable
if ($null -ne $Option) {
ApplyExclusionOptions -adf $adf
}
I tried out this tool this morning and it gives a nice overview of the mess I've made.
The only thing is, I added the Markdown Preview Mermaid Support plugin and found that your code starts with ::: mermaid. To show the diagram instead of the code, I needed ``` at the beginning of the file.
When I googled on Mermaid markdown, I found a lot of back ticks and not so many colons.
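Until the generator offers a choice, the output can be converted after the fact. The following is a rough sketch, assuming the generated text uses ::: mermaid and ::: as opening and closing markers on their own lines; Convert-MermaidFence is a hypothetical helper and not part of the module.

```powershell
# Hypothetical helper: rewrites "::: mermaid" / ":::" markers into
# backtick fences, leaving every other line untouched.
function Convert-MermaidFence {
    param([Parameter(Mandatory)] [string] $Text)

    # Build the fence from single backticks to keep this sample readable.
    $fence = '`' * 3

    $converted = foreach ($line in ($Text -split '\r?\n')) {
        $trimmed = $line.Trim()
        if ($trimmed -match '^:::\s*mermaid$') { $fence + 'mermaid' }
        elseif ($trimmed -eq ':::')            { $fence }
        else                                   { $line }
    }
    return ($converted -join "`n")
}

$devOps   = "::: mermaid`ngraph LR`npipeline.pipeline1 --> dataset.DelimitedText1`n:::"
$markdown = Convert-MermaidFence -Text $devOps
$markdown
```

The colon form is the one rendered by Azure DevOps wikis, while most Markdown previewers expect backtick fences, so keeping the original output and converting a copy may be the safer route.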
Capability of having an incremental deployment. Since there are no global parameters in Synapse, the idea is to utilize a storage account to hold state.
Is it on the horizon to be able to deploy managed private endpoints and managed virtual networks as it's currently not supported?
Cheers
I have a pipeline with a WebActivity that uses Basic Authentication, which throws an exception. I've narrowed it down to this part, because when I change authentication to MSI, it works fine.
Start deploying object: [pipeline].[PL_CHECK_DEVOPS_STATUS] (2 dependency/ies)
VERBOSE: 1) Depends on: [LinkedService].[LS_KEV]
VERBOSE: Object [linkedService].[LS_KEV] is already deployed.
VERBOSE: 2) Depends on: [IntegrationRuntime].[SelfHostedIntegrationRuntime]
Start deploying object: [IntegrationRuntime].[SelfHostedIntegrationRuntime] (0 dependency/ies)
VERBOSE: Ready to deploy from file: C:\Synapse\integrationRuntime\SelfHostedIntegrationRuntime.json
VERBOSE: Integration Runtime type detected: Self-Hosted
Finished deploying object: [IntegrationRuntime].[SelfHostedIntegrationRuntime]
VERBOSE: Ready to deploy from file: C:\Synapse\pipeline\PL_CHECK_DEVOPS_STATUS.json
Set-AzSynapsePipeline: C:\PowerShell\Modules\azure.synapse.tools\0.18.0\private\Deploy-SynapseObjectOnly.ps1:100
Line |
100 | Set-AzSynapsePipeline `
| ~~~~~~~~~~~~~~~~~~~~~~~
| Exception has been thrown by the target of an invocation.
Finished deploying object: [pipeline].[PL_CHECK_DEVOPS_STATUS]
It's related to this part of code in the pipeline:
"authentication": {
"type": "Basic",
"username": {
"value": "@pipeline().parameters.devopsUsername",
"type": "Expression"
},
"password": {
"type": "AzureKeyVaultSecret",
"store": {
"referenceName": "LS_KEV",
"type": "LinkedServiceReference"
},
"secretName": "<mySecret>"
}
}
When I change it to this, it works fine:
"authentication": {
"type": "MSI",
"resource": "https://management.azure.com/"
}