azure / aksarc

Welcome to the Azure Kubernetes Service on Azure Stack HCI repo. This is where the AKS-HCI team tracks features and issues with AKS-HCI. We monitor this repo to engage with our community and discuss questions, customer scenarios, and feature requests. Check out our Projects tab to see the roadmap for AKS-HCI!

License: MIT License

PowerShell 99.38% C# 0.62%

aksarc's Introduction

Welcome to the AKS repo

What's new

We are now part of the Azure adaptive cloud and the AKS family. To reflect this move, we have simplified the product name from "AKS hybrid deployment options" to AKS. When needed, we will suffix AKS with a platform name, e.g. AKS for Azure Stack HCI or AKS for Windows Server. In general, though, we are simply AKS, which you can deploy on your own infrastructure and manage through the cloud using Azure Arc.

Why, then, is the repo named aksArc? There is already an AKS repo, and merging into it is one of the next steps we want to take.

To learn more about the Azure adaptive cloud check out this blog post.

This repo is where the AKS team tracks features and issues you encounter with AKS on your infrastructure. We monitor this repo and triage new issues regularly.

Overview

All AKS versions Microsoft ships for edge or datacenter deployment are part of the "AKS" family, this includes:

What you will find here

This repository is a central place for tracking features and issues with AKS. This repository is monitored by the product team in order to engage with our community and discuss questions, customer scenarios, or feature requests.

Support through issues on this repository is provided on a best-effort basis for issues that are reproducible outside of a specific cluster configuration (see Bug Guidance below). To receive urgent support you should file a support request through official Azure support channels as production and urgent support is explicitly out of scope for issues filed in this repository.

IMPORTANT: For official customer support with response-time SLAs please see Azure Support options and AKS Support Policies.

Do not file issues for AKS-Engine, Virtual-Kubelet, Azure Container Instances, or other services on this repository unless they relate to that feature or service's functionality with AKS. For other tools, products, and services, see the Upstream Azure Compute projects page.

We want to hear from you! Respond to this short and anonymous survey to share your thoughts with us.

Important AKS hybrid links

Evaluation Guide https://aka.ms/aks-hybrid-evaluate
Roadmap https://aka.ms/aks-hybrid-roadmap
Release Notes https://aka.ms/AKS-hybrid-Releasenotes
Known Issues https://aka.ms/AKS-hybrid-issues
Documentation https://aka.ms/aks-hybrid-docs
Customer Voice Community https://aka.ms/aks-hybrid-community
Meet the product team in Microsoft Teams
Fill in the form and we will add you to the channel ASAP.
https://aka.ms/aks-hybrid-teams

Bug Reports

IMPORTANT: Bug reports that do not meet the requirements below are subject to being closed by maintainers and routed to official Azure support channels, which provide the proper support experience for resolving user issues.

Bug reports filed on this repository should follow the default issue template that is shown when opening a new issue. At a bare minimum, issues reported on this repository must:

  1. Be reproducible outside of the current cluster
  • This means that if you file an issue that would require direct access to your cluster and/or Azure resources, you will be redirected to open an Azure support ticket. Microsoft employees may not ask for personal or subscription information on GitHub.
    • For example, this applies if your issue is related to custom scenarios such as custom network devices, configuration, or authentication issues related to your Azure subscription.
  2. Contain the following information:
  • A good title: clear, relevant, and descriptive, so that a general idea of the problem can be grasped immediately.
  • Description: before you go into the detailed steps to replicate the issue, give a brief description.
    • Assume that whoever is reading the report is unfamiliar with the issue or system in question.
  • Clear, concise steps to replicate the issue outside of your specific cluster.
    • These should let anyone clearly see what you did to hit the problem and allow them to recreate it easily themselves. This section should also include results, both expected and actual, along with relevant URLs.
  • Any supporting information you have that could aid the developers.
    • This includes YAML files/deployments, scripts to reproduce, exact commands used, screenshots, etc.

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.

aksarc's People

Contributors

abhilashaagarwala, austonli, baziwane, benjaminarmstrong, dmc-tech, ekeleasonye, hariprasadv, jasongerend, jessicaguan-zz, leslielin-5, mamezgeb, mattmcspirit, microsoft-github-operations[bot], microsoftopensource, mkostersitz, olaseniadeniji, penorouzi, pragyadw, scooley, sethmanheim, subodhbhargava, tksh164

aksarc's Issues

Unable to run Install-AksHci with DO blocked by firewall

Our organization's firewall blocks anything delivered by Microsoft Delivery Optimization (such as Windows updates). Because of this, we can't fetch the sbs-config.cab file or proceed with any further steps. Can we please have a toggle to download all required files over HTTP? Alternatively, can we download all the required files, upload them to the target server, and have the installation performed from local sources?

WAC Set up AKS stuck on Package download

WAC is stuck on the Package download page. The WAC machine has full internet connectivity without needing a proxy, and the HCI nodes also don't need a proxy and can access the internet.

The wizard just counts up the duration but never finishes or reports an error. Something appears to be blocking the download, because it had not finished even after almost 2 hours of waiting.

Provide the ability within the Azure Portal (via Arc) to link AKS HCI clusters to the hosting Azure Stack HCI cluster

Title: Provide the ability within the Azure Portal (via Arc) to link AKS HCI clusters to the hosting Azure Stack HCI cluster

Description:
Currently, there is no relationship between the Azure Stack HCI cluster and each of the AKS HCI K8s clusters that are running on there in the Azure Portal (via Arc).
It would be a really useful feature to show the relationships; e.g. if there is a performance issue on the HCI cluster, it could affect the workloads running on AKS HCI. It will help with service management discovery too.

[BUG] When using the PowerShell module to deploy the AKS cluster, an error is generated when adding the wssdcloudagent generic role

Describe the bug
When using the PowerShell module to deploy the AKS cluster, an error is generated when adding the wssdcloudagent generic role. The following output is received when running the install-akshci cmdlet:

 - Adding wssdcloudagent cluster generic service role (hci-aks-test1)
Add-ClusterGenericServiceRole : Static network 10.18.73.0/25 was not configured. Please use -StaticAddress to use this network or use -IgnoreNetwork to ignore it.
At C:\Program Files\WindowsPowerShell\Modules\akshci\0.2.5\AksHci.psm1:445 char:9
+     Add-ClusterGenericServiceRole -Name $global:config["clusterRo ...
+     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : InvalidOperation: (:) [Add-ClusterGenericServiceRole], ClusterCmdletException
    + FullyQualifiedErrorId : InvalidOperation,Microsoft.FailoverClusters.PowerShell.AddClusterGenericServiceRoleCommand
 - Waiting for cloudagent API endpoint to be accessible...
 - Warning: this depends on DNS propogation and can take between and 10-30 minutes in some environments...

The environment uses DHCP.

This can be resolved by running the following on another HCI node:
Add-ClusterGenericServiceRole -Name hci-aks-test1 -ServiceName wssdcloudagent -StaticAddress <ip-address/mask> | Out-Null

Once it has run, the rest of the install-akshci script completes successfully.
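
Before passing an address to -StaticAddress, it may help to confirm that it actually falls inside the network named in the error. The following is a minimal Python sketch of that check, not part of the AksHci module; the function name and the candidate addresses are made up for illustration, and only the 10.18.73.0/25 network comes from the output above.

```python
import ipaddress


def address_in_network(address: str, network: str) -> bool:
    """Return True if `address` is an assignable host address inside `network`."""
    net = ipaddress.ip_network(network, strict=True)
    addr = ipaddress.ip_address(address)
    # The network and broadcast addresses cannot be assigned to a role.
    reserved = (net.network_address, net.broadcast_address)
    return addr in net and addr not in reserved


# 10.18.73.0/25 is the network from the error message;
# the two candidate addresses are hypothetical examples.
print(address_in_network("10.18.73.50", "10.18.73.0/25"))   # inside the /25
print(address_in_network("10.18.73.200", "10.18.73.0/25"))  # outside the /25
```

A /25 covers 10.18.73.0 through 10.18.73.127, so the first candidate is accepted and the second is rejected.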

To Reproduce
Steps to reproduce the behavior:

  1. From an HCI node, set the AKS config:
    Set-AksHciConfig -deploymentType MultiNode -wssdImageDir c:\ClusterStorage\Volume01\hci-aks-test1 -cloudConfigLocation c:\ClusterStorage\Volume01\hci-aks-test1 -vnetname ComputeSwitch -clusterRoleName hci-aks-test1
  2. Run Install-AksHci.
  3. Wait until the error appears.
  4. To fix, run the following from another HCI node:
    Add-ClusterGenericServiceRole -Name hci-aks-test1 -ServiceName wssdcloudagent -StaticAddress <ip-address/mask> | Out-Null
  5. Once the role has been installed, the script continues.

Expected behavior
The cluster role is created without manual intervention, or a parameter is exposed that allows the static IP address to be specified.

Screenshots
image

image

Environment (please complete the following information):

  • AKS-HCI Version (i.e. Public Preview 1)
  • Kubernetes Version (i.e. 1.18.6)

[BUG] WAC Extension import error

Describe the bug
Unable to import extension to WAC in GW Mode.

Invoke-WebRequest : {"error":{"code":"PathTooLongException","message":"The specified path, file name, or both are too
long. The fully qualified file name must be less than 260 characters, and the directory name must be less than 248
characters."}}
At C:\Program Files\windows admin center\PowerShell\Modules\ExtensionTools\ExtensionTools.psm1:236 char:17
+     $response = Invoke-WebRequest @params
+                 ~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : InvalidOperation: (System.Net.HttpWebRequest:HttpWebRequest) [Invoke-WebRequest], WebExc
   eption
    + FullyQualifiedErrorId : WebCmdletWebResponseException,Microsoft.PowerShell.Commands.InvokeWebRequestCommand
Failed to get the extensions
At C:\Program Files\windows admin center\PowerShell\Modules\ExtensionTools\ExtensionTools.psm1:238 char:9
+         throw "Failed to get the extensions"
+         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : OperationStopped: (Failed to get the extensions:String) [], RuntimeException
    + FullyQualifiedErrorId : Failed to get the extensions
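
The limits quoted in the error message (a fully qualified file name under 260 characters and a directory name under 248) can be checked before an import is attempted. The Python sketch below is only an illustration of that arithmetic; the function name and the example paths are hypothetical, and it says nothing about which path Windows Admin Center builds internally.

```python
import ntpath
from typing import List

MAX_FULL_PATH = 260  # from the error: full name must be less than 260 characters
MAX_DIRECTORY = 248  # from the error: directory name must be less than 248 characters


def path_length_problems(path: str) -> List[str]:
    """Return a list of limit violations for a Windows path (empty if none)."""
    problems = []
    directory = ntpath.dirname(path)
    if len(path) >= MAX_FULL_PATH:
        problems.append(
            f"full path is {len(path)} characters (must be under {MAX_FULL_PATH})"
        )
    if len(directory) >= MAX_DIRECTORY:
        problems.append(
            f"directory is {len(directory)} characters (must be under {MAX_DIRECTORY})"
        )
    return problems


# Hypothetical paths, for illustration only.
short_path = r"C:\Temp\extension.nupkg"
long_path = "C:\\" + "a" * 250 + "\\ext.nupkg"

print(path_length_problems(short_path))
for problem in path_length_problems(long_path):
    print(problem)
```

Shortening the folder from which the extension is staged is usually the simplest way to get under both limits.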

To Reproduce

Adding aka.ms/WSLab scenario

Expected behavior
Importing extension should work.

Environment (please complete the following information):

see wslab scenario
LabConfig.ps1.txt
Scenario.ps1.txt

Failed to retrieve azure-arc-onboarding pods

Describe the bug
failed to retrieve azure-arc-onboarding data

To Reproduce
Steps to reproduce the behavior:

  1. create cluster
  2. create a service principal
  3. Install-AksHciArcOnboarding -clusterName $clusterName -resourcegroup $resourceGroup -location $location -subscriptionid $subscriptionId -clientid $appId -clientsecret $password -tenantid $tenant
  4. kubectl logs job/azure-arc-onboarding -n azure-arc-onboarding --follow

Expected behavior
It shows the Arc-related pods.

Screenshots

 - Successfully retrieved cluster information.
addon.msft.microsoft/arc-onboarding-arctestwcluster created
 - Arc Onboarding Agent has been installed to the cluster
 - To watch progress for the Arc Agents Onboarding run: kubectl logs job/azure-arc-onboarding -n azure-arc-onboarding --follow
Unable to connect to the server: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "kubernetes")
PS C:\E2EWorkload> kubectl logs job/azure-arc-onboarding -n azure-arc-onboarding --follow
Unable to connect to the server: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "kubernetes")
PS C:\E2EWorkload>

Environment (please complete the following information):
Name                           Value
----                           -----
cloudConfigLocation            c:\clusterstorage\volume1\Config
controlplaneVmSize             Standard_K8S3_v1
deploymentType                 MultiNode
vnetType                       Transparent
skipUpdates                    False
vlanID                         0
nodeConfigLocation             C:\programdata\wssdagent
nodeAgentPort                  45000
clusterRoleName                ca-4f04d38a-4490-48ef-89b6-4032f7ca790e
wssdImageDir                   c:\clusterstorage\volume1\Images
vnetName                       extSwitch
stagingShare                   http://netapplinux.guest.corp.microsoft.com/AzureEdge
vippoolStartIP
forceDnsReplication            False
loadBalancerVmSize             Standard_K8S_v1
useStagingShare                False
nodeAgentAuthorizerPort        45001
skipHostLimitChecks            False
useStagingCR                   False
macPoolEnd
cloudServiceCidr
sshPublicKey                   C:\Users\wolfpack.ssh\id_rsa.pub
akshciVersion                  0.9.4.4
insecure                       False
vippoolEndIP
wssdDir                        C:\wssd
macPoolStart

cluster version:
Version : v1.18.8

[BUG] Scripts should be checking logical processor count in hosts

Describe the bug
When running in VMs (a nested environment), the infrastructure fails to create with an unfriendly error. The most probable reason (still being re-tested) is that the "host" does not have enough logical processors (LPs). If the "host" (meaning the VM that serves as the host) has, say, 2 CPUs, Install-AksHci fails with the error below. The real reason is that the Linux VM will not start because it is configured with 4 LPs. The script removes it right away and throws the error below.
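
The kind of pre-flight check requested here could look roughly like the Python sketch below. It is an illustration of the idea rather than anything from the AksHci scripts: the function name is hypothetical, and the figure of 4 logical processors is taken from this report, not from a documented minimum.

```python
import os
from typing import Optional

# Assumption: 4 LPs is the VM size mentioned in this report.
REQUIRED_LOGICAL_PROCESSORS = 4


def check_logical_processors(
    required: int = REQUIRED_LOGICAL_PROCESSORS,
    available: Optional[int] = None,
) -> None:
    """Raise a readable error if the host has fewer LPs than the VM needs."""
    if available is None:
        # os.cpu_count() reports logical processors; it may return None.
        available = os.cpu_count() or 0
    if available < required:
        raise RuntimeError(
            f"Host exposes {available} logical processor(s), but the VM "
            f"requires {required}. Add processors to the host before installing."
        )


# Simulate the nested host from the report, which has 2 CPUs.
try:
    check_logical_processors(available=2)
except RuntimeError as err:
    print(err)
```

Failing early with a message like this would point directly at the cause instead of surfacing it as a VM that is created and immediately removed.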

Screenshots
image

kubectl delete node does not remove workernode VM

Current Behavior
Installed a single Linux node configuration. After a while, the worker node went from 'Ready' to 'NotReady'. I still cannot figure out what caused this.
I used the command line to scale the cluster to two nodes. A new node was spun up, but the command did not terminate. This was probably because, as stated in #1, the original node had gone into the 'NotReady' state.
I used kubectl delete node to delete the node. It no longer showed up in the kubectl get nodes output, but the VM continued to run.

Expected Behavior
kubectl delete node should delete the node and its associated VM.
K8s should detect that the node was deleted and automatically spin up a new node.

[BUG] Get-AksHciConfig and Install-AksHci cannot find cloudconfig in the remote host

Describe the bug
No cloudconfig file can be found

To Reproduce
Not sure. I am deploying in Azure in a 2 node HCI cluster. I initialized the nodes and used Set-AksHciConfig, no difference

Expected behavior
Install-AksHci and Get-AksHciConfig should work.

Screenshots

PS C:\Users\labadmin> set-akshciconfig
[10/20/2020 02:12:42] Creating configuration
 - Removing old configuration...
 - New configuration has been saved
PS C:\Users\labadmin> get-akshciconfig
[10/20/2020 02:07:27] Checking for configuration
 - Merging Windows Admin Center configuration
 - Loading Windows Admin Center configuration from 'C:\Users\labadmin\Windows Admin Center\aks-hci-settings.json'...
 - Processing configuration...
[10/20/2020 02:07:27] Creating configuration
 - Removing old configuration...
 - New configuration has been saved
[10/20/2020 02:07:27] Reading configuration
[10/20/2020 02:07:28] Validating configuration
[10/20/2020 02:07:28] Confirming Configuration
[10/20/2020 02:07:28] Determining deployment type
 - This is a multi-node deployment using failover cluster: AZSHCI
[10/20/2020 02:07:28] Verifying cloudconfig access file
 - Retrieving access file from WIN-MUCC37Q1OIO
Copy-Item : Cannot find path '\\WIN-MUCC37Q1OIO\C$\Users\labadmin\.wssd\cloudconfig' because it does not exist.
At C:\Program Files\WindowsPowerShell\Modules\Moc\0.2.8\Common.psm1:946 char:5
+     Copy-Item -Path $remotePath -Destination $destination
+     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : ObjectNotFound: (\\WIN-MUCC37Q1O...ssd\cloudconfig:String) [Copy-Item], ItemNotFoundException
    + FullyQualifiedErrorId : PathNotFound,Microsoft.PowerShell.Commands.CopyItemCommand
Unable to locate a valid cloudconfig access file.
At C:\Program Files\WindowsPowerShell\Modules\Moc\0.2.8\Common.psm1:1437 char:5
+     throw "Unable to locate a valid cloudconfig access file."
+     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : OperationStopped: (Unable to locat...ig access file.:String) [], RuntimeException
    + FullyQualifiedErrorId : Unable to locate a valid cloudconfig access file.

Environment (please complete the following information):

  • OS: [e.g. iOS]
  • Browser [e.g. chrome, safari]
  • Version [e.g. 22]
  • AKS-HCI Version (i.e. Public Preview 1)
  • Kubernetes Version (i.e. 1.18.6)

Collect log files

  • From a PowerShell Admin window run Collect-AksHciLogs
  • If you are running into issues with the deployment wizard in Windows Admin Center, run
    Get-SMEUILogs.ps1 from the machine hosting Windows Admin Center.
PS C:\Users\labadmin> get-akshcilogs
[10/20/2020 02:15:02] Checking for configuration
 - Merging Windows Admin Center configuration
 - Loading Windows Admin Center configuration from 'C:\Users\labadmin\Windows Admin Center\aks-hci-settings.json'...
 - Processing configuration...
[10/20/2020 02:15:02] Creating configuration
 - Removing old configuration...
 - New configuration has been saved
[10/20/2020 02:15:02] Reading configuration
[10/20/2020 02:15:02] Validating configuration
[10/20/2020 02:15:02] Confirming Configuration
[10/20/2020 02:15:02] Determining deployment type
 - This is a multi-node deployment using failover cluster: AZSHCI
[10/20/2020 02:15:02] Verifying cloudconfig access file
 - Retrieving access file from WIN-PSIERIALC37
Copy-Item : Cannot find path '\\WIN-PSIERIALC37\C$\Users\labadmin\.wssd\cloudconfig' because it does not exist.
At C:\Program Files\WindowsPowerShell\Modules\AksHci\0.2.8\Common.psm1:946 char:5
+     Copy-Item -Path $remotePath -Destination $destination
+     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : ObjectNotFound: (\\WIN-PSIERIALC...ssd\cloudconfig:String) [Copy-Item], ItemNotFoundException
    + FullyQualifiedErrorId : PathNotFound,Microsoft.PowerShell.Commands.CopyItemCommand
Unable to locate a valid cloudconfig access file.
At C:\Program Files\WindowsPowerShell\Modules\AksHci\0.2.8\Common.psm1:1437 char:5
+     throw "Unable to locate a valid cloudconfig access file."
+     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : OperationStopped: (Unable to locat...ig access file.:String) [], RuntimeException
    + FullyQualifiedErrorId : Unable to locate a valid cloudconfig access file.
PS C:\Users\labadmin> get-module -name akshci
ModuleType Version    Name                                ExportedCommands
---------- -------    ----                                ----------------
Script     0.2.8      AksHci                              {Get-AksHciCluster, Get-AksHciCredential, Get-AksHciKubernetesVersion, Get-AksHciLogs...}

Not sure where to find Get-SMEUILogs.ps1, it is not in the preview package I downloaded

Static IP Address configurations are not supported

Describe the bug
Deployment fails when static IP addresses are used for the virtual machines and the host nodes.

Expected behavior
Static IP addresses to work

Environment (please complete the following information):

  • OS: [e.g. iOS]
  • Browser [e.g. chrome, safari]
  • Version [e.g. 22]
  • AKS-HCI Version (i.e. Public Preview 1)
  • Kubernetes Version (i.e. 1.18.6)

Collect log files

  • From a PowerShell Admin window run Collect-AksHciLogs
  • If you are running into issues with the deployment wizard in Windows Admin Center, run
    Get-SMEUILogs.ps1 from the machine hosting Windows Admin Center.

AKS VMs should respect Hyper-V default folder structure

Title:
AKS VMs should respect Hyper-V default folder structure

Description:
In Hyper-V, there is a folder named after the VM, containing folders for configuration (Virtual Machines) and for storage (Virtual Hard Disks). It would be nice if AKS VMs used the same layout.

Currently, the VHD is stored outside the VM folder.

image

image

image

image

[BUG] NodeConfigLocation cannot be on CSV

Describe the bug
Not sure if this is a bug, but since the CSV is available from every node, why not save the config on the CSV?

Invoke-Command -ComputerName $servers -ScriptBlock {
    Set-AksHciConfig -vnetName $using:vSwitchName `
                     -deploymentType MultiNode `
                     -wssdDir c:\clusterstorage\$using:VolumeName\Images `
                     -wssdImageDir c:\clusterstorage\$using:VolumeName\Images `
                     -cloudConfigLocation c:\clusterstorage\$using:VolumeName\Config `
                     -NodeConfigLocation c:\clusterstorage\$using:VolumeName\Config\$env:COMPUTERNAME `
                     -ClusterRoleName "$($using:ClusterName)_AKS"
}

Specifically, this line:
-NodeConfigLocation c:\clusterstorage\$using:VolumeName\Config\$env:COMPUTERNAME

To Reproduce
Steps to reproduce the behavior:
Once NodeConfigLocation is set to a CSV path, Install-AksHci complains:

[09/24/2020 10:41:11] Exception caught!!!

  • The parameter nodeConfigLocation must specify a local file path. Please re-run configuration.
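
How the module decides that a path is "local" is not shown in this report. As a guess at the shape of such a check, the Python sketch below treats UNC paths and anything under C:\ClusterStorage (the usual mount point for Cluster Shared Volumes) as non-local. The function name and the rule itself are assumptions made for illustration.

```python
from pathlib import PureWindowsPath

# Assumption: CSVs are mounted under this folder on each cluster node.
CSV_ROOT = PureWindowsPath(r"C:\ClusterStorage")


def is_node_local_path(path: str) -> bool:
    """Return True if `path` looks like a drive path outside the CSV root."""
    p = PureWindowsPath(path)
    # Reject relative paths and UNC paths such as \\server\share\folder.
    if not p.is_absolute() or p.drive.startswith("\\\\"):
        return False
    # PureWindowsPath compares case-insensitively, as Windows does.
    return CSV_ROOT not in (p, *p.parents)


# The first path is the default seen elsewhere on this page; the second
# mirrors the CSV layout used in this issue with made-up volume and node names.
print(is_node_local_path(r"C:\programdata\wssdagent"))
print(is_node_local_path(r"c:\clusterstorage\AKS\Config\node1"))
```

Under this assumed rule the default location passes and the CSV location is rejected, which matches the behavior described in this issue.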

error: the server doesn't have a resource type "job" & "deployments"

Describe the bug
I created a cluster and tried to connect it to Arc through my Azure subscription. After I ran the Install-AksHciArcOnboarding command, it reported "Arc Onboarding Agent has been installed to the cluster".
However, when I try to check the deployments and pods, I get the error "the server doesn't have a resource type "deployments"".

I checked in the Azure portal that the service principal was created successfully.

Environment (please complete the following information):
OS      Versions
--      --------
Linux   {v1.17.11, v1.18.8, v1.16.10}
Windows {v1.18.8}

command line:
PS C:\Users\wolfpack> $servicePrinciple = az ad sp create-for-RBAC --name "azure-arc-for-k8s" --scope /subscriptions/$subscriptionId/resourceGroups/$resourceGroup
Changing "azure-arc-for-k8s" to a valid URI of "http://azure-arc-for-k8s", which is the required format used for service principal names
Found an existing application instance of "087ed6e3-e91e-485d-9f75-04645db3264f". We will patch it
Creating a role assignment under the scope of "/subscriptions/5bc38b0f-3105-4d74-8024-a6ee5fd3a211/resourceGroups/AzureArcTest"
Role assignment already exists.

PS C:\Users\wolfpack> $appId = ($servicePrinciple | ConvertFrom-Json).appId
PS C:\Users\wolfpack> $password = ($servicePrinciple | ConvertFrom-Json).password
PS C:\Users\wolfpack> $tenant = ($servicePrinciple | ConvertFrom-Json).tenant

PS C:\Users\wolfpack> Get-AksHciCluster -clusterName $clusterName

[11/12/2020 07:17:45] Check for 'AksHci' module updates
 - Refreshing repository credentials.
 - Installed module version is 0.2.9
 - Latest module version is 0.2.9
 - You are already up to date
[11/12/2020 07:17:49] Checking for configuration
[11/12/2020 07:17:49] Reading configuration
[11/12/2020 07:17:49] Adjusting configuration
[11/12/2020 07:17:49] Validating configuration
[11/12/2020 07:17:49] Confirming Configuration
[11/12/2020 07:17:49] Determining deployment type
 - This is a multi-node deployment using failover cluster: jMxKfTQyzpnW
[11/12/2020 07:17:49] Retrieving configuration for workload cluster 'samplecluster'
 - Successfully retrieved cluster information.
Name : samplecluster
Version : v1.18.8
Control Planes : 1
Linux Workers : 1
Windows Workers : 0
Phase : provisioned
Ready : True

PS C:\Users\wolfpack> Install-AksHciArcOnboarding -clusterName $clusterName -resourcegroup $resourceGroup -location $location -subscriptionid $subscriptionId -clientid $appId -clientsecret $password -tenantid $tenant
[11/12/2020 07:17:56] Check for 'AksHci' module updates
 - Refreshing repository credentials.
 - Installed module version is 0.2.9
 - Latest module version is 0.2.9
 - You are already up to date
[11/12/2020 07:17:56] Checking for configuration
[11/12/2020 07:17:56] Reading configuration
[11/12/2020 07:17:56] Adjusting configuration
[11/12/2020 07:17:56] Validating configuration
[11/12/2020 07:17:56] Confirming Configuration
[11/12/2020 07:17:56] Determining deployment type
 - This is a multi-node deployment using failover cluster: jMxKfTQyzpnW
[11/12/2020 07:17:56] Retrieving configuration for workload cluster 'samplecluster'
 - Successfully retrieved cluster information.
addon.msft.microsoft/arc-onboarding-samplecluster created
 - Arc Onboarding Agent has been installed to the cluster
 - To watch progress for the Arc Agents Onboarding run: kubectl logs job/azure-arc-onboarding -n azure-arc-onboarding --follow
PS C:\Users\wolfpack> kubectl logs job/azure-arc-onboarding -n azure-arc-onboarding --follow
error: the server doesn't have a resource type "job"
PS C:\Users\wolfpack> kubectl -n azure-arc get deployments,pods
error: the server doesn't have a resource type "deployments"
PS C:\Users\wolfpack>

Enhanced Storage functionality

Tracking item for work to enhance the storage experience in AKS-HCI

  • ReadWriteMany volumes with capacity quotas
  • Volume expansion
  • Snapshot
  • QoS
  • Raw Block

[BUG] AksHci day0 install with PwSh stuck at testing cloud agent endpoint

Describe the bug
Installing AksHci via Install-AksHci on a 2-node HCI cluster deployed on Azure VMs. The installer runs more or less OK, meaning that there are some errors. First, at the very beginning it fails to delete the cloudagent directory both in the node where the Install-AksHci is running and in the neighbor node:

[10/23/2020 04:24:36] Cleaning up files on WIN-MUCC37Q1OIO
 - Removing yaml on WIN-MUCC37Q1OIO...
 - Removing cloudagent directory on WIN-MUCC37Q1OIO...
Access is denied
    + CategoryInfo          : NotSpecified: (:) [Remove-Item], Win32Exception
    + FullyQualifiedErrorId : System.ComponentModel.Win32Exception,Microsoft.PowerShell.Commands.RemoveItemCommand
    + PSComputerName        : WIN-MUCC37Q1OIO
 - Removing nodeagent directory on WIN-MUCC37Q1OIO...
 - Removing cloudagent registry on WIN-MUCC37Q1OIO...
 - Removing nodeagent registry on WIN-MUCC37Q1OIO...
 - Removing image directory on WIN-MUCC37Q1OIO...
 - Downloaded images will be preserved
 - Removing all of the installation directory contents on WIN-MUCC37Q1OIO...

[10/23/2020 04:24:37] Cleaning up files on WIN-PSIERIALC37
 - Removing yaml on WIN-PSIERIALC37...
 - Removing cloudagent directory on WIN-PSIERIALC37...
Access is denied
    + CategoryInfo          : NotSpecified: (:) [Remove-Item], Win32Exception
    + FullyQualifiedErrorId : System.ComponentModel.Win32Exception,Microsoft.PowerShell.Commands.RemoveItemCommand
    + PSComputerName        : WIN-PSIERIALC37
 - Removing nodeagent directory on WIN-PSIERIALC37...
 - Removing cloudagent registry on WIN-PSIERIALC37...
 - Removing nodeagent registry on WIN-PSIERIALC37...
 - Removing image directory on WIN-PSIERIALC37...
 - Downloaded images will be preserved
 - Removing all of the installation directory contents on WIN-PSIERIALC37...

Secondly, and probably more relevant, it fails at enabling DNS locally:

- Adding wssdcloudagent cluster generic service role (ca-7b01383a-150c-490b-af5d-141572ce4d9a)

 -  - Installing missing feature 'DNS-Server-Tools' ...
Name                                    OwnerNode       State
----                                    ---------       -----
ca-7b01383a-150c-490b-af5d-141572ce4d9a WIN-PSIERIALC37 Failed
Path   :
Online : True

 - Waiting for cloudagent API endpoint to be accessible...
 - Warning: this depends on DNS propogation and can take between and 10-30 minutes in some environments...
 - [10/23/2020 04:31:08] Testing cloudagent endpoint: ca-7b01383a-150c-490b-af5d-141572ce4d9a.azshci.local
 - [10/23/2020 04:31:29] Testing cloudagent endpoint: ca-7b01383a-150c-490b-af5d-141572ce4d9a.azshci.local
 - [10/23/2020 04:31:49] Testing cloudagent endpoint: ca-7b01383a-150c-490b-af5d-141572ce4d9a.azshci.local
 - [10/23/2020 04:32:09] Testing cloudagent endpoint: ca-7b01383a-150c-490b-af5d-141572ce4d9a.azshci.local
 - [10/23/2020 04:32:29] Testing cloudagent endpoint: ca-7b01383a-150c-490b-af5d-141572ce4d9a.azshci.local
 - [10/23/2020 04:32:50] Testing cloudagent endpoint: ca-7b01383a-150c-490b-af5d-141572ce4d9a.azshci.local
[goes on for 40min and counting]
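
The log above shows the endpoint test repeating roughly every 20 seconds with no visible upper bound. A bounded wait is one way to turn that into a clear failure. The Python sketch below is a generic illustration of polling against a deadline, not the installer's actual logic; `wait_for_endpoint` and its parameters are hypothetical names.

```python
import time
from typing import Callable


def wait_for_endpoint(
    probe: Callable[[], bool],
    timeout_s: float,
    interval_s: float = 20.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call `probe` until it returns True or the timeout would be exceeded.

    Returns True on success and False once another wait would pass the deadline.
    """
    deadline = clock() + timeout_s
    while True:
        if probe():
            return True
        # Give up rather than sleeping past the deadline.
        if clock() + interval_s > deadline:
            return False
        sleep(interval_s)


# Stand-in probe that succeeds on the third attempt; a real probe would
# test name resolution or a TCP connection to the cloudagent endpoint.
attempts = iter([False, False, True])
print(wait_for_endpoint(lambda: next(attempts), timeout_s=5, interval_s=0))
```

The clock and sleep functions are injectable so the loop can be exercised without real waiting. With a 30 minute timeout, matching the upper end of the warning in the log, the caller would get a definite failure instead of a loop that runs for 40 minutes and counting.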

To Reproduce
Following the instructions to deploy HCI on Azure, and to deploy AKS on HCI

Expected behavior
Install-AksHci to work without errors

Screenshots
See logs above

Environment (please complete the following information):

  • OS: [e.g. iOS]
  • Browser [e.g. chrome, safari]
  • Version [e.g. 22]
  • AKS-HCI Version (i.e. Public Preview 1)
PS C:\users\labadmin\Windows Admin Center> get-module -name akshci
ModuleType Version    Name                                ExportedCommands
---------- -------    ----                                ----------------
Script     0.2.8      AksHci                              {Get-AksHciCluster, Get-AksHciCredential, Get-AksHciKubernetesVersion, Get-AksHciLogs...}
  • Kubernetes Version (i.e. 1.18.6)

Collect log files

  • From a PowerShell Admin window run Collect-AksHciLogs
  • If you are running into issues with the deployment wizard in Windows Admin Center, run
    Get-SMEUILogs.ps1 from the machine hosting Windows Admin Center.

[BUG] NodeAgent provisioning fails in WSLab

Describe the bug

Provisioning NodeAgent takes forever. Is "location: WssdLocation" right?

image

To Reproduce
Steps to reproduce the behavior:
https://github.com/microsoft/WSLab/blob/ec9c6a8405c457fb09255d9e3702b3bc79ea10f2/Scenarios/AzSHCI%20and%20Kubernetes/Scenario.ps1#L98

Config:

[azshci1]: PS C:\Users\LabAdmin\Documents> Get-AksHciConfig

[09/24/2020 12:51:31] Checking for AksHci configuration


[09/24/2020 12:51:31] Reading AksHci configuration


[09/24/2020 12:51:31] Adjusting AksHci configuration


[09/24/2020 12:51:31] Validating AksHci configuration


[09/24/2020 12:51:31] Confirming AksHci Configuration

 - The final configuration is as follows...

 - Configuration name 'clusterRoleName' with value 'AzSHCI-Cluster_AKS'
 - Configuration name 'useStagingShare' with value 'False'
 - Configuration name 'vnetName' with value 'vSwitch'
 - Configuration name 'nodeAgentPort' with value '45000'
 - Configuration name 'nodeAgentAuthorizerPort' with value '45001'
 - Configuration name 'vnetType' with value 'Transparent'
 - Configuration name 'loadBalancerVmSize' with value 'Standard_K8S_v1'
 - Configuration name 'vippoolStartIP' with value ''
 - Configuration name 'nodeConfigLocation' with value 'C:\programdata\wssdagent'
 - Configuration name 'useStagingCR' with value 'False'
 - Configuration name 'sshPublicKey' with value 'C:\Users\LabAdmin\.ssh\id_rsa.pub'
 - Configuration name 'skipHostLimitChecks' with value 'False'
 - Configuration name 'insecure' with value 'False'
 - Configuration name 'wssdImageDir' with value 'c:\clusterstorage\AKS\Images'
 - Configuration name 'forceDnsReplication' with value 'False'
 - Configuration name 'stagingShare' with value ''
 - Configuration name 'akshciVersion' with value '0.9.3.1'
 - Configuration name 'macPoolStart' with value ''
 - Configuration name 'skipUpdates' with value 'False'
 - Configuration name 'cloudConfigLocation' with value 'c:\clusterstorage\AKS\Config'
 - Configuration name 'vippoolEndIP' with value ''
 - Configuration name 'deploymentType' with value 'MultiNode'
 - Configuration name 'macPoolEnd' with value ''
 - Configuration name 'controlplaneVmSize' with value 'Standard_K8S3_v1'
 - Configuration name 'wssdDir' with value 'c:\clusterstorage\AKS\Images'

[09/24/2020 12:51:31] Determining deployment type

 - This is a multi-node deployment using failover cluster: AzSHCI-Cluster

Name                           Value
----                           -----
clusterRoleName                AzSHCI-Cluster_AKS
useStagingShare                False
vnetName                       vSwitch
nodeAgentPort                  45000
nodeAgentAuthorizerPort        45001
vnetType                       Transparent
loadBalancerVmSize             Standard_K8S_v1
vippoolStartIP
nodeConfigLocation             C:\programdata\wssdagent
useStagingCR                   False
sshPublicKey                   C:\Users\LabAdmin\.ssh\id_rsa.pub
skipHostLimitChecks            False
insecure                       False
wssdImageDir                   c:\clusterstorage\AKS\Images
forceDnsReplication            False
stagingShare
akshciVersion                  0.9.3.1
macPoolStart
skipUpdates                    False
cloudConfigLocation            c:\clusterstorage\AKS\Config
vippoolEndIP
deploymentType                 MultiNode
macPoolEnd
controlplaneVmSize             Standard_K8S3_v1
wssdDir                        c:\clusterstorage\AKS\Images

After failed bootstrap process node cannot join the cluster

Implement health checks for the post-bootstrap process on Linux and Windows to ensure that the node virtual machines are configured correctly before they are added to the cluster.
This will prevent sporadic cluster join and node provisioning failures.

[BUG] Get-SMEUILogs throws error when wac directory does not exist

Describe the bug
Get-SMEUILogs.ps1 throws an error when run on a system where the %USERPROFILE%/wac directory does not exist

To Reproduce
Steps to reproduce the behavior:

  1. Run get-SMEUILogs.ps1 from an elevated session
  2. Enter params (running on local WAC system, tried with and without -NoCredentialPrompt)
  3. See error C:\nupkg\Get-SMEUILogs.ps1 : Something went wrong while writing out logs to destination(win10wac): Could not find a part of the path
    'C:\Users*user*\wac\aggregated-gateway-logs.json'.

Expected behavior
The command completes successfully and the logs are generated

Screenshots

image

Environment (please complete the following information):

  • OS: Windows 10 Enterprise
  • Browser N/A
  • Version
  • AKS-HCI Version Public Preview 1
  • Kubernetes Version N/A

Additional context

The issue is resolved by manually creating the %userprofile%\wac directory.
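
The workaround above can be scripted before running Get-SMEUILogs.ps1. This is a minimal sketch, assuming the script writes to a wac folder under the current user's profile, as the error message suggests. The function name Initialize-WacLogDirectory is hypothetical and does not ship with Windows Admin Center or AKS-HCI.

```powershell
# Hypothetical helper: make sure the log folder exists before collecting logs.
function Initialize-WacLogDirectory {
    param(
        # Base folder; defaults to the user profile (assumption based on the error text).
        [string]$BasePath = $env:USERPROFILE
    )
    $wacPath = Join-Path -Path $BasePath -ChildPath 'wac'
    if (-not (Test-Path -Path $wacPath -PathType Container)) {
        # -Force also creates any missing parent folders.
        New-Item -Path $wacPath -ItemType Directory -Force | Out-Null
    }
    return $wacPath
}
```

Calling `Initialize-WacLogDirectory` with no arguments and then running Get-SMEUILogs.ps1 should reproduce the manual fix.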

[BUG] Helm is not found correctly

Describe the bug
When running

az connectedk8s delete --name cluster-config --resource-group $resourcegroup

it complains that it cannot find Helm.

image

But helm.exe was copied to system32 and is found when running "helm version".

To Reproduce
Steps to reproduce the behavior:

#region add configuration to the cluster https://docs.microsoft.com/en-us/azure/azure-arc/kubernetes/use-gitops-connected-cluster
#install helm to cluster nodes
$ClusterName="AzSHCI-Cluster"
$servers=(Get-ClusterNode -Cluster $ClusterName).name
$ProgressPreference="SilentlyContinue"
Invoke-WebRequest -Uri https://get.helm.sh/helm-v3.3.4-windows-amd64.zip -OutFile $env:USERPROFILE\Downloads\helm-v3.3.4-windows-amd64.zip
$ProgressPreference="Continue"
Expand-Archive -Path $env:USERPROFILE\Downloads\helm-v3.3.4-windows-amd64.zip -DestinationPath $env:USERPROFILE\Downloads
$sessions=New-PSSession -ComputerName $servers
foreach ($session in $sessions){
    Copy-Item -Path $env:userprofile\Downloads\windows-amd64\helm.exe -Destination $env:SystemRoot\system32\ -ToSession $session
}
#install az cli
Invoke-WebRequest -Uri https://aka.ms/installazurecliwindows -OutFile $env:userprofile\Downloads\AzureCLI.msi
Start-Process msiexec.exe -Wait -ArgumentList "/I  $env:userprofile\Downloads\AzureCLI.msi /quiet"
#restart powershell
exit
#login
az login
#create configuration
$ClusterName="AzSHCI-Cluster"
$KubernetesClusterName="demo"
$resourcegroup="$ClusterName-rg"
az k8sconfiguration create --name cluster-config --cluster-name $KubernetesClusterName --resource-group $resourcegroup --operator-instance-name cluster-config --operator-namespace cluster-config --repository-url https://github.com/Azure/arc-k8s-demo --scope cluster --cluster-type connectedClusters
#az connectedk8s delete --name cluster-config --resource-group $resourcegroup
#endregion

[BUG] RemoteException: The underlying connection was closed: error received on step 5- Review + Create

Describe the bug
On the final step of setting up AKS via the extension from within the HCI Cluster in WAC, I receive the following error:

Failed with errors
Duration: 0 minutes 7 seconds
localhost: (1) RemoteException: The underlying connection was closed: An unexpected error occurred on a send. (2) RemoteException: The property 'StatusCode' cannot be found on this object. Verify that the property exists. (3) RemoteException: The property 'content' cannot be found on this object. Verify that the property exists. (4) RemoteException: The property 'contentid' cannot be found on this object. Verify that the property exists. (5) RemoteException: The property 'contentid' cannot be found on this object. Verify that the property exists. (6) RemoteException: The property 'contentid' cannot be found on this object. Verify that the property exists. (7) RemoteException: Index was outside the bounds of the array. (8) RemoteException: Index was outside the bounds of the array. (9) RemoteException: Index was outside the bounds of the array. (10) RemoteException: Index was outside the bounds of the array. (11) RemoteException: The property 'name' cannot be found on this object. Verify that the property exists. (12) RemoteException: Cannot index into a null array. (13) RemoteException: Cannot index into a null array. (14) RemoteException: Exception calling "IndexOf" with "2" argument(s): "Value cannot be null. Parameter name: array"

To Reproduce
Steps to reproduce the behavior:

  1. Follow the Set Up Azure Kubernetes Service wizard from within the AKS Extension.
  2. After the step 'Host Configuration and package install', the error appears.

Expected behavior
The AKS cluster is deployed successfully.

Screenshots

image

Environment (please complete the following information):

  • OS: Windows 10 Enterprise

  • Browser Chrome

  • Version : AKS extension: 0.314.0
    aggregated-gateway-logs.zip

  • AKS-HCI Version : Public Preview 1

  • Kubernetes Version (i.e. 1.18.6)


[BUG] Adding an AKS cluster deployed via PowerShell to WAC fails with error 'The cluster is unreachable'.

Describe the bug
Adding an AKS cluster that was deployed via PowerShell to WAC fails with the error 'The cluster is unreachable'. From the same system that WAC is running on, I can connect successfully using kubectl and the same kubeconfig file.

To Reproduce
Steps to reproduce the behavior:

  1. Go to Add on the WAC main screen
  2. Click on 'Kubernetes clusters (preview) - Add'
  3. Select a Kubeconfig file for the AKS cluster
  4. Select the Cluster and User from the drop downs
  5. Click on Submit
  6. See error ' The cluster is unreachable.'

Expected behavior
I would expect the AKS cluster to be added to WAC.

Screenshots
WAC error:
image

kubectl success:
image

Environment (please complete the following information):

  • OS: Windows 10
  • Browser: Edge Chromium
  • Version [e.g. 22]
  • AKS-HCI Version: Public Preview 1
  • Kubernetes Version: 1.18.6

Additional context

Collect log files

    aggregated-gateway-logs (2).zip

Enable GPU support for T4 chips

Title: Enable GPU support

Description: Many workloads require GPU support. AKS-HCI should have a simple, integrated way to enable GPUs for a node or a cluster, and then to advertise the functionality and schedule workloads in a simple-to-use way.

[BUG] Install-AksHCI checks for space on c: + typo

Describe the bug
Install-AksHci checks the free space on C: instead of the volume configured for AKS, and its output contains a typo ("minumum" should be "minimum").

[09/24/2020 01:19:42] Verifying host limits on AzSHCI2

Drive 'C:' has 51 GB free
A minumum of 40 GB disk space is required on drive 'C:'
Host has 13 GB free memory
A minumum of 10 GB memory is required

To Reproduce

Just run Install-AksHCI

Expected behavior

It should check the free space of the volume that is configured for AKS (c:\clusterstorage\something).
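
A minimal sketch of the expected check, assuming the free space is measured on the volume that hosts the configured path rather than on C:. Both function names are hypothetical and not part of the AksHci module. Get-Volume -FilePath comes from the Windows Storage module, and whether it resolves a Cluster Shared Volume path correctly in every setup is an assumption.

```powershell
# Hypothetical helper: compare free space against a minimum in GB.
function Test-MinimumFreeSpace {
    param(
        [Parameter(Mandatory)][uint64]$FreeBytes,
        [Parameter(Mandatory)][double]$MinimumGB
    )
    return (($FreeBytes / 1GB) -ge $MinimumGB)
}

# Hypothetical helper: free bytes on the volume that hosts a given path.
# Windows only; relies on the Storage module.
function Get-FreeBytesForPath {
    param([Parameter(Mandatory)][string]$Path)
    return (Get-Volume -FilePath $Path).SizeRemaining
}
```

With the 40 GB minimum from the log above, a call could look like `Test-MinimumFreeSpace -FreeBytes (Get-FreeBytesForPath 'c:\clusterstorage\AKS\Images') -MinimumGB 40`.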

[BUG] Running some scripts remotely is not possible without CredSSP due to Double hop issue.

Describe the bug
Scripts such as Initialize-AksHciNode cannot be run remotely against multiple servers without enabling CredSSP.

To Reproduce
Steps to reproduce the behavior:
https://github.com/microsoft/WSLab/blob/462137922e0974b2c8d99ed9e4034b3362c674b8/Scenarios/AzSHCI%20and%20Kubernetes/Scenario.ps1#L73

If you run the following code, some actions fail because of the double-hop problem:
Invoke-Command -ComputerName $servers -ScriptBlock {
    Initialize-AksHciNode
}

The workaround is to enable CredSSP:
https://github.com/microsoft/WSLab/blob/462137922e0974b2c8d99ed9e4034b3362c674b8/Scenarios/AzSHCI%20and%20Kubernetes/Scenario.ps1#L78
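
For readers who do not want to open the linked scenario, the CredSSP workaround can be sketched roughly as follows. This is not the code from the WSLab script. The wrapper name Invoke-WithCredSsp is hypothetical, and the sketch assumes a domain environment in which CredSSP delegation to the target servers is permitted by policy.

```powershell
# Hypothetical wrapper around the CredSSP workaround; run from the management machine.
function Invoke-WithCredSsp {
    param(
        [Parameter(Mandatory)][string[]]$ComputerName,
        [Parameter(Mandatory)][pscredential]$Credential,
        [Parameter(Mandatory)][scriptblock]$ScriptBlock
    )
    # Allow this machine to delegate credentials to the target servers.
    Enable-WSManCredSSP -Role Client -DelegateComputer $ComputerName -Force | Out-Null
    # Allow the target servers to accept delegated credentials.
    Invoke-Command -ComputerName $ComputerName -ScriptBlock {
        Enable-WSManCredSSP -Role Server -Force | Out-Null
    }
    try {
        # With CredSSP the remote session can authenticate onward (second hop).
        Invoke-Command -ComputerName $ComputerName -Credential $Credential `
            -Authentication Credssp -ScriptBlock $ScriptBlock
    }
    finally {
        # CredSSP widens the attack surface, so switch it off again afterwards.
        Invoke-Command -ComputerName $ComputerName -ScriptBlock {
            Disable-WSManCredSSP -Role Server
        }
        Disable-WSManCredSSP -Role Client
    }
}
```

A call for the case in this issue could be `Invoke-WithCredSsp -ComputerName $servers -Credential (Get-Credential) -ScriptBlock { Initialize-AksHciNode }`.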

Expected behavior
It would be great to be able to specify a node or an HCI cluster (to avoid confusion about which commands need to be run on every node and which on just one node).

Unable to specify a proxy server for download and deployment

Describe the bug
The environment requires the use of a proxy server for all outbound internet connections.
Windows Admin Center and the PowerShell commands do not honor the system proxy settings, and there is no way to specify proxy settings on the command line.


[BUG] Install-AksHci or Initialize-AksHciNode should check/install RSAT-Clustering-PowerShell

Describe the bug
If you run Install-AksHci, it fails because it tries to use the Failover Clustering PowerShell module. There is no check for RSAT-Clustering-PowerShell, nor any code that would install it.

To Reproduce
Steps to reproduce the behavior:

Skip installing RSAT-Clustering-PowerShell and run Install-AksHci.

Expected behavior
There should be a check in Initialize-AksHciNode that validates the readiness of the HCI nodes (such as the presence of RSAT-Clustering-PowerShell).

Screenshots
Sorry, I already cleaned up the lab. To reproduce, just skip this step in the WSLab scenario:
https://github.com/microsoft/WSLab/blob/462137922e0974b2c8d99ed9e4034b3362c674b8/Scenarios/AzSHCI%20and%20Kubernetes/Scenario.ps1#L96
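
The requested readiness check could look roughly like the sketch below. It is only an illustration: the function name Confirm-ClusteringPowerShell and its -Install switch are hypothetical, and Get-WindowsFeature and Install-WindowsFeature are available on Windows Server only.

```powershell
# Hypothetical pre-flight check for the Failover Clustering PowerShell tools.
function Confirm-ClusteringPowerShell {
    param(
        # When set, try to install the feature if it is missing.
        [switch]$Install
    )
    $featureName = 'RSAT-Clustering-PowerShell'
    $feature = Get-WindowsFeature -Name $featureName
    if ($feature.Installed) { return $true }
    if ($Install) {
        $result = Install-WindowsFeature -Name $featureName
        return [bool]$result.Success
    }
    Write-Warning "$featureName is missing; install it before running Install-AksHci."
    return $false
}
```

Run on every node, for example through Invoke-Command, this would surface the missing feature before Install-AksHci creates any cluster resources.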

Setup AKS on HCI nodes without need for Azure integration / authentication

We deliver our software for our clients to run completely on-prem. Setting up AKS through Windows Admin Center requires an Azure account, or an Azure AD account with domain-admin-level permissions if Azure AD is already detected in the environment. Can we have an option to set up AKS completely disconnected from Azure?

Uninstall-AksHciArcOnboarding doesn't delete the azure-arc namespace

Description:
The Uninstall-AksHciArcOnboarding PS cmdlet doesn't delete the azure-arc namespace or delete the resource in the Azure portal.
As per the Azure Arc enabled K8s public docs, Helm 3 must be installed on the host for Azure Arc enabled Kubernetes to properly manage a Kubernetes cluster. Helm 3 is not installed on the AKS-HCI host as part of the install process. This makes uninstalling the Azure Arc agents confusing: users need to review the Azure Arc enabled K8s public docs to figure out what needs to be done to remove Azure Arc from the AKS-HCI K8s cluster.

Helm 3 should be installed as part of the AKS-HCI installation so that the Azure Arc agents can be uninstalled.

[BUG] LinuxNodeVMSize is ignored in New-AksHciCluster

Describe the bug
When running New-AksHciCluster -clusterName demo -linuxNodeCount 1 -linuxNodeVmSize Standard_A2_v2, the VM that is created has 8GB RAM.

According to this table, it should have just 4 GB.

$global:vmSizeDefinitions =
@(
    # Name, CPU, MemoryGB
    ([VmSize]::Default, "4", "4"),
    ([VmSize]::Standard_A2_v2, "2", "4"),
    ([VmSize]::Standard_A4_v2, "4", "8"),
    ([VmSize]::Standard_D2s_v3, "2", "8"),
    ([VmSize]::Standard_D4s_v3, "4", "16"),
    ([VmSize]::Standard_D8s_v3, "8", "32"),
    ([VmSize]::Standard_D16s_v3, "16", "64"),
    ([VmSize]::Standard_D32s_v3, "32", "128"),
    ([VmSize]::Standard_DS2_v2, "2", "7"),
    ([VmSize]::Standard_DS3_v2, "2", "14"),
    ([VmSize]::Standard_DS4_v2, "8", "28"),
    ([VmSize]::Standard_DS5_v2, "16", "56"),
    ([VmSize]::Standard_DS13_v2, "8", "56"),
    ([VmSize]::Standard_K8S_v1, "4", "2"),
    ([VmSize]::Standard_K8S2_v1, "2", "2"),
    ([VmSize]::Standard_K8S3_v1, "4", "6"),
    ([VmSize]::Standard_NK6, "6", "12"),
    ([VmSize]::Standard_NV6, "6", "64"),
    ([VmSize]::Standard_NV12, "12", "128")

)
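
To make the mismatch easy to check, the relevant rows of the table above can be turned into a lookup. This is a sketch only: it uses plain strings instead of the module's [VmSize] enum, copies just three rows, and the function name Test-VmMemoryMatchesSize is hypothetical.

```powershell
# Subset of the size table quoted above: size name -> CPU count and memory in GB.
$expectedSizes = @{
    'Standard_A2_v2'   = @{ Cpu = 2; MemoryGB = 4 }
    'Standard_A4_v2'   = @{ Cpu = 4; MemoryGB = 8 }
    'Standard_K8S3_v1' = @{ Cpu = 4; MemoryGB = 6 }
}

# Hypothetical helper: does the observed memory match the requested size?
function Test-VmMemoryMatchesSize {
    param(
        [Parameter(Mandatory)][string]$VmSize,
        [Parameter(Mandatory)][double]$ActualMemoryGB,
        [hashtable]$SizeTable = $expectedSizes
    )
    if (-not $SizeTable.ContainsKey($VmSize)) {
        throw "Unknown VM size '$VmSize'"
    }
    return ($SizeTable[$VmSize].MemoryGB -eq $ActualMemoryGB)
}
```

For the case reported here, `Test-VmMemoryMatchesSize -VmSize 'Standard_A2_v2' -ActualMemoryGB 8` returns False. On a Hyper-V host, the actual value could be read from the MemoryStartup property (in bytes) that Get-VM returns, assuming the node VM uses static memory.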

To Reproduce
Follow the WSLab scenario:
https://github.com/microsoft/WSLab/tree/dev/Scenarios/AzSHCI%20and%20Kubernetes

Screenshots

image

[BUG] Creating Kubernetes Cluster fails in WAC

Describe the bug
When creating a Kubernetes workload cluster via WAC, I received the following error:

Failed with errors
Error: failed to get new provider: failed to create azurestackhci session: error: could not read cloudconfig file: open cloudconfig: The system cannot find the file specified.

msazure.visualstudio.com/msazure/msk8s/mgmtappl.git/sdk/pkg/appliance/provider.newProvider
    /home/nick/goroot/src/dev.azure.com/msazure/msk8s/mgmtappl/sdk/pkg/appliance/provider/provider.go:55
msazure.visualstudio.com/msazure/msk8s/mgmtappl.git/sdk/pkg/appliance/provider.New
    /home/nick/goroot/src/dev.azure.com/msazure/msk8s/mgmtappl/sdk/pkg/appliance/provider/provider.go:49
msazure.visualstudio.com/msazure/msk8s/mgmtappl.git/sdk/pkg/appliance.newApplianceClient
    /home/nick/goroot/src/dev.azure.com/msazure/msk8s/mgmtappl/sdk/pkg/appliance/client.go:86
msazure.visualstudio.com/msazure/msk8s/mgmtappl.git/sdk/pkg/appliance.New
    /home/nick/goroot/src/dev.azure.com/msazure/msk8s/mgmtappl/sdk/pkg/appliance/client.go:73
msazure.visualstudio.com/msazure/msk8s/mgmtappl.git/cmd/appliance/cmd.runClusterCreate
    /home/nick/goroot/src/dev.azure.com/msazure/msk8s/mgmtappl/cmd/appliance/cmd/cluster_create.go:50
msazure.visualstudio.com/msazure/msk8s/mgmtappl.git/cmd/appliance/cmd.glob..func2
    /home/nick/goroot/src/dev.azure.com/msazure/msk8s/mgmtappl/cmd/appliance/cmd/cluster_create.go:38
github.com/spf13/cobra.(*Command).execute
    /home/nick/goroot/pkg/mod/github.com/spf13/[email protected]/command.go:842
github.com/spf13/cobra.(*Command).ExecuteC
    /home/nick/goroot/pkg/mod/github.com/spf13/[email protected]/command.go:950
github.com/spf13/cobra.(*Command).Execute
    /home/nick/goroot/pkg/mod/github.com/spf13/[email protected]/command.go:887
msazure.visualstudio.com/msazure/msk8s/mgmtappl.git/cmd/appliance/cmd.Execute
    /home/nick/goroot/src/dev.azure.com/msazure/msk8s/mgmtappl/cmd/appliance/cmd/root.go:46
main.main
    /home/nick/goroot/src/dev.azure.com/msazure/msk8s/mgmtappl/cmd/appliance/main.go:22
runtime.main
    /home/nick/go/src/runtime/proc.go:203
runtime.goexit
    /home/nick/go/src/runtime/asm_amd64.s:1357

To Reproduce
Steps to reproduce the behavior:

  1. Add - Kubernetes Cluster in WAC
  2. Click through / enter config for Prereqs/Basics/Node Pools/Networking/Integration.
  3. Click on Create under Review + Create
  4. See error

Expected behavior
The application cluster is created.

Screenshots
If applicable, add screenshots to help explain your problem.
2020-10-27 09_38_37-Window

Environment (please complete the following information):

  • OS: Windows 10
  • Browser: Edge Chrome
  • Version 86.0.622.51
  • AKS-HCI Version: Public Preview October
  • Kubernetes Version: 1.18.8
    aggregated-gateway-logs.zip


[BUG] Unable to add a Kubernetes cluster via WAC due to Persistent Storage location option not being available

Describe the bug
Unable to deploy a K8s cluster via WAC because the persistent storage location option is not available in the Integration step. The 'Next: Review + Create' button is blank.

To Reproduce
Steps to reproduce the behavior:

image

Expected behavior
To be able to proceed to the 'Review + Create' option

Environment (please complete the following information):

  • OS: Windows 10
  • Browser Edge Chromium
  • Version 86.0.622.51
  • AKS-HCI Version Oct-2020
  • Kubernetes Version 1.18.8

Additional context
On the same HCI cluster, I had previously deployed an AKS HCI app cluster using the PowerShell module successfully. This was uninstalled to test another bug I had logged for WAC - #51. I could previously get past this stage in WAC prior to deploying/uninstalling the workload cluster via PoSh. I've restarted the WAC engine / browser and get the same outcome

image

Collect log files


aggregated-gateway-logs.zip

The HCI log files are 57MB - I can provide if required out-of-band

[BUG] Deploying an AKS cluster via Install-AksHci PowerShell Cmdlet throws an error

Describe the bug
Deploying an AKS cluster via Install-AksHci PowerShell Cmdlet throws an error:

Error: open C:\ClusterStorage\Volume01\wssd\kubeconfig-mgmt: The system cannot find the path specified.
C:\Program Files\AksHci\kvactl.exe create --configfile C:\wssd\yaml\appliance.yaml --outfile C:\ClusterStorage\Volume01\wssd\kubeconfig-mgmt failed to execute []
At C:\Program Files\WindowsPowerShell\Modules\akshci\0.2.8\Common.psm1:1248 char:9
+         throw "$command $arguments failed to execute [$err]"
+         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : OperationStopped: (C:\Program File...d to execute []:String) [], RuntimeException
    + FullyQualifiedErrorId : C:\Program Files\AksHci\kvactl.exe create --configfile C:\wssd\yaml\appliance.yaml --outfile C:\ClusterStorage\Volume01\wssd\kubeconfig-mgmt failed to execute []

To Reproduce
Steps to reproduce the behavior:

  1. Import the AksHci module from the Oct 2020 release on one of the HCI nodes
  2. Run Initialize-AksHciNode
  3. Set-AksHciConfig -deploymentType MultiNode -wssdImageDir 'C:\ClusterStorage\Volume01\wssdImages' -cloudConfigLocation C:\ClusterStorage\Volume01\aks-clus01-config -vnetName Computeswitch
  4. Install-AksHCI
  5. Error is thrown
  6. Run & 'C:\Program Files\AksHci\kvactl.exe' create --configfile C:\wssd\yaml\appliance.yaml --outfile C:\ClusterStorage\Volume01\wssd\kubeconfig-mgmt

10-27-2020 06:18:35 [Status] azurestackhciProvider: RetrieveKubeconfig
10-27-2020 06:18:35 [Status] The appliance kubeconfig is already present. Resuming an existing deployment...
Error: open C:\ClusterStorage\Volume01\wssd\kubeconfig-mgmt: The system cannot find the path specified.

  7. Create the WSSD directory and run the command in step 6 again.
    10-27-2020 06:19:25 [Status] azurestackhciProvider: RetrieveKubeconfig
    10-27-2020 06:19:25 [Status] The appliance kubeconfig is already present. Resuming an existing deployment...
    10-27-2020 06:19:25 [Status] Waiting for API server...
    10-27-2020 06:19:25 [Status] Waiting for pod 'Cloud Operator' to be ready...
    10-27-2020 06:19:25 [Status] Waiting for pod 'Cluster API core' to be ready...
    10-27-2020 06:19:25 [Status] Waiting for pod 'Bootstrap kubeadm' to be ready...
    10-27-2020 06:19:25 [Status] Waiting for pod 'Control Plane kubeadm' to be ready...
    10-27-2020 06:19:25 [Status] Waiting for pod 'Cluster API core Webhook' to be ready...
    10-27-2020 06:19:25 [Status] Waiting for pod 'Bootstrap kubeadm Webhook' to be ready...
    10-27-2020 06:19:25 [Status] Waiting for pod 'Control Plane kubeadm Webhook' to be ready...
    10-27-2020 06:19:25 [Status] azurestackhciProvider: PerformPostOperations
    10-27-2020 06:19:25 [Status] azurestackhciProvider: Waiting for pod 'AzureStackHCI Provider' to be ready...
    10-27-2020 06:19:25 [Status] azurestackhciProvider: Waiting for pod 'AzureStackHCI Provider Webhook' to be ready...

Expected behavior
If the WSSD directory is required on the CSV, I would expect it to be created automatically, or the config to be placed in the specified cloudConfigLocation directory.

Screenshots
image

image

image

Environment (please complete the following information):

  • OS: HCI OS

  • AKS-HCI Version Oct 2020

  • Kubernetes Version 1.18.8

Additional context
To fix this, I ran Uninstall-AksHci, set the config, created the WSSD directory on the CSV, and ran the install again.
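
The manual step of creating the directory can be sketched as a small helper that derives the folder from the output file path. The function name New-ParentDirectory is hypothetical; it only mirrors the workaround described here and is not something Install-AksHci is known to do.

```powershell
# Hypothetical helper: create the folder that should contain a given file.
function New-ParentDirectory {
    param([Parameter(Mandatory)][string]$FilePath)
    $parent = Split-Path -Path $FilePath -Parent
    if (-not (Test-Path -Path $parent -PathType Container)) {
        # -Force also creates any missing intermediate folders.
        New-Item -Path $parent -ItemType Directory -Force | Out-Null
    }
    return $parent
}
```

For the path in this report, the call would be `New-ParentDirectory -FilePath 'C:\ClusterStorage\Volume01\wssd\kubeconfig-mgmt'` before running Install-AksHci.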


[BUG] Uninstall-AksHCI is not cleaning cluster resources (ownergroup ca-154f2102-734b-4631-be78-0d802ce9cf71)

Describe the bug
Uninstall-AksHci does not clean up cluster objects. This may only happen under certain conditions, so triage is probably needed.

To Reproduce
I think this was caused by running Install-AksHci without the Failover Clustering RSAT tools installed. The script failed and I had to cancel it. Cluster resources had already been created, and they were still there after I ran Uninstall-AksHci, so Install-AksHci kept failing.

I had to manually clean up all resources in cluster group ca-154f2102-734b-4631-be78-0d802ce9cf71, plus the AD object.

Expected behavior
Uninstall-AksHci should clean up everything: the object in AD and the cluster resources.

Allow running configuration commands remotely, with offline environments in mind

Title:
Allow running configuration commands remotely, with offline environments in mind

Description:
Allow configuration commands to be run remotely (from a management machine) for all management tasks. It should also be possible to start the download from the management machine and then transfer the files to the cluster (in case the cluster does not have internet connectivity), with something like Install-AksHCI -ClusterName . For completely offline environments, it would be great to have a command that downloads all files as a package, which could then be provided as a parameter.

image

For multi-node clusters store the kubeconfig and RSA_Pub files on a CSV

Title: For multi-node clusters store the kubeconfig and RSA_Pub files on a CSV

Description:
For multi-node deployments, a number of critical config files and SSH keys are stored locally on the HCI node where the Install-AksHci command was run (e.g. kubeconfig in c:\wssd).

These critical config files should be stored on a CSV by default. If the node where the install was performed crashes or is reinstalled, the unsuspecting admin could be locked out of AKS cluster management.

The user should be made aware of the security implications.

uninstall-akshci exception caught - unable to locate a valid cloudconfig access file

After a failed attempt to deploy AKS on my Azure Stack HCI cluster, I now get this message in WAC: "An Azure Kubernetes Service is already setup on this Azure Stack HCI but is not configured correctly. Please clean up your environment to setup again."

However, Uninstall-AksHci only ends with the error in the title: "exception caught - unable to locate a valid cloudconfig access file".

The cloudconfig file simply does not exist in the userpath.wssd\ ... location.
Creating an empty file does not help to get past the uninstall either, as the file appears to be checked for validity.

So I am left with an unfinished, interrupted deployment that I cannot clean up.
I deleted the created cluster resources manually, but that does not seem to be enough: WAC still complains.
