
openshift-container-platform's Introduction

OpenShift Container Platform 3 Deployment Template

NOTE: Structure of Repo

The Master branch has been updated to deploy version 3.11

MAJOR UPDATES HAVE BEEN MADE - READ BEFORE DEPLOYING

The master branch contains the most current release of OpenShift Container Platform 3, which is currently version 3.11. We will maintain the templates for the current version of OCP only, as version 3.10 is no longer commercially supported by Red Hat. The older branches will not be deleted but will no longer be maintained or updated.

New as of August 27, 2019: I have added the azurestack-release-3.11 branch with templates and scripts for deploying OCP 3.11 to Azure Stack.

The following branches exist:

Commercial Azure

  • Release-3.6 (As is; no longer updated)
  • Release-3.7 (As is; no longer updated)
  • Release-3.9 (As is; no longer updated)
  • Release-3.10 (As is; no longer updated)

Azure Stack

  • azurestack-release-3.7 (As is; no longer updated)
  • azurestack-release-3.9 (As is; no longer updated)
  • azurestack-release-3.11

Bookmark aka.ms/OpenShift for future reference.

For OpenShift Origin refer to https://github.com/Microsoft/openshift-origin

OpenShift Container Platform 3.11 with Username / Password authentication for OpenShift

  1. Single master option available
  2. VM types that support Accelerated Networking will automatically have this feature enabled
  3. Custom and existing Vnet
  4. Support cluster with private masters (no public IP on load balancer in front of master nodes)
  5. Support cluster with private router (no public IP on load balancer in front of infra nodes)
  6. Support broker pool ID (for master and infra nodes) along with compute pool ID (for compute nodes)
  7. Support for default gallery RHEL On Demand image and 3rd party Marketplace offer such as BYOS image in Private Marketplace
  8. Support self-signed certificates or custom SSL certificates for master load balancer (Web Console)
  9. Support self-signed certificates or custom SSL certificates for infra load balancer (Router)

This template deploys OpenShift Container Platform with basic username / password for authentication to OpenShift. It includes the following resources:

Virtual Network (default)
  • Address prefix: 10.0.0.0/14
  • Master subnet: 10.1.0.0/16
  • Infra subnet: 10.2.0.0/16
  • Node subnet: 10.3.0.0/16

Virtual Network (custom)
  • Address prefix: your choice
  • Master subnet: your choice
  • Infra subnet: your choice
  • CNS subnet: your choice
  • Node subnet: your choice

Master Load Balancer
  • 1 probe and 1 rule for TCP 443

Infra Load Balancer
  • 2 probes and 2 rules for TCP 80 and TCP 443

Public IP Addresses
  • Bastion public IP for the Bastion node
  • OpenShift Master public IP attached to the Master Load Balancer (if masters are public)
  • OpenShift Router public IP attached to the Infra Load Balancer (if the router is public)

Storage Accounts (unmanaged disks)
  • 1 storage account for the Bastion VM
  • 1 storage account for Master VMs
  • 1 storage account for Infra VMs
  • 2 storage accounts for Node VMs
  • 2 storage accounts for diagnostics logs
  • 1 storage account for the private Docker registry

Storage Accounts (managed disks)
  • 2 storage accounts for diagnostics logs
  • 1 storage account for the private Docker registry

Network Security Groups
  • 1 network security group for the Bastion VM
  • 1 network security group for Master VMs
  • 1 network security group for Infra VMs
  • 1 network security group for CNS VMs (if CNS enabled)
  • 1 network security group for Node VMs

Availability Sets
  • 1 availability set for Master VMs
  • 1 availability set for Infra VMs
  • 1 availability set for CNS VMs (if CNS enabled)
  • 1 availability set for Node VMs

Virtual Machines
  • 1 Bastion node - used to run the Ansible playbook for the OpenShift deployment
  • 1, 3, or 5 Master nodes
  • 1, 2, or 3 Infra nodes
  • 3 or 4 CNS nodes (if CNS enabled)
  • User-defined number of compute nodes (1 to 30)
  • All VMs include a single attached data disk for the Docker thin pool logical volume
  • CNS VMs include 3 additional data disks for glusterfs storage (if CNS enabled)

Cluster Diagram

READ the instructions in their entirety before deploying!

Additional documentation for deploying OpenShift in Azure can be found here: https://docs.microsoft.com/en-us/azure/virtual-machines/linux/openshift-get-started

This template deploys multiple VMs and requires some pre-work before you can successfully deploy the OpenShift Cluster. If you don't complete the pre-work correctly, you will most likely fail to deploy the cluster using this template. Please read the instructions completely before you proceed.

By default, this template uses the On-Demand Red Hat Enterprise Linux image from the Azure Gallery.

When using the On-Demand image, there is an additional hourly RHEL subscription charge for using this image on top of the normal compute, network, and storage costs. At the same time, the instance will be registered to your Red Hat subscription, so you will also be using one of your entitlements. This leads to "double billing". To avoid this, you would need to build your own RHEL image, a process described in a Red Hat KB article.

If you have a valid Red Hat subscription, register for Cloud Access and request access to the BYOS RHEL image in the Private Azure Marketplace to avoid the double billing. To use a 3rd party marketplace offer (such as the BYOS private image), you need to provide the following information for the offer - publisher, offer, sku, and version. You also need to enable the offer for programmatic deployment.
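As a sketch only (the publisher, offer, and plan values below are placeholders rather than the actual private BYOS listing, and on older Azure CLI releases the terms command is az vm image accept-terms), accepting the offer terms and describing the offer in the parameters file could look like:

az vm image terms accept --publisher redhat --offer rhel-byos --plan <sku>

"marketplaceOsImage": {
  "value": {
    "publisher": "redhat",
    "offer": "rhel-byos",
    "sku": "<sku>",
    "version": "latest"
  }
}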

If you are only using one pool ID for all nodes, then enter the same pool ID for both 'rhsmPoolId' and 'rhsmBrokerPoolId'.

Private Clusters

Deploying private OpenShift clusters requires more than just omitting the public IP on the master load balancer (web console) or on the infra load balancer (router). A private cluster generally uses a custom DNS server (not the default Azure DNS), a custom domain name (such as contoso.com), and pre-defined virtual network(s). For private clusters, you will need to configure your virtual network with all the appropriate subnets and DNS server settings in advance. Then use existingMasterSubnetReference, existingInfraSubnetReference, existingCnsSubnetReference, and existingNodeSubnetReference to specify the existing subnet for use by the cluster.
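Each of these references is the full Azure resource ID of the subnet. A sketch of the expected format (the subscription ID, resource group, vNet, and subnet names are placeholders):

/subscriptions/<subscription_id>/resourceGroups/<vnet_resource_group>/providers/Microsoft.Network/virtualNetworks/<vnet_name>/subnets/<subnet_name>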

If private masters are selected (masterClusterType=private), a static private IP must be specified for masterPrivateClusterIp; it will be assigned to the front end of the master load balancer, must lie within the CIDR for the master subnet, and must not already be in use. masterClusterDnsType must be set to "custom", and the master DNS name must be provided for masterClusterDns; this name needs to map to the static private IP and will be used to access the console on the master nodes.

If a private router is selected (routerClusterType=private), a static private IP must be specified for routerPrivateClusterIp; it will be assigned to the front end of the infra load balancer, must lie within the CIDR for the infra subnet, and must not already be in use. routingSubDomainType must be set to "custom", and the wildcard DNS name for routing must be provided for routingSubDomain.

If both private masters and a private router are selected, the custom domain name must also be entered for domainName.
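As an illustration only (every value below is a placeholder assumption based on the defaults and examples in the parameter table later in this document), a fully private cluster would set these parameters in azuredeploy.parameters.json roughly as follows:

"masterClusterType": { "value": "private" },
"masterPrivateClusterIp": { "value": "10.1.0.200" },
"masterClusterDnsType": { "value": "custom" },
"masterClusterDns": { "value": "console.contoso.com" },
"routerClusterType": { "value": "private" },
"routerPrivateClusterIp": { "value": "10.2.0.200" },
"routingSubDomainType": { "value": "custom" },
"routingSubDomain": { "value": "apps.contoso.com" },
"domainName": { "value": "contoso.com" }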

After successful deployment, the Bastion Node is the only node with a public IP that you can ssh into. Even if the master nodes are configured for public access, they are not exposed for ssh access.

Prerequisites

Create Key Vault to store secret based information

You will need to create a Key Vault to store various secret information that will then be used as part of the deployment so that the information is not exposed via the parameters file. Secrets will need to be created for the SSH private key (sshPrivateKey), Azure AD client secret (aadClientSecret), OpenShift admin password (openshiftPassword), and Red Hat Subscription Manager password or activation key (rhsmPasswordOrActivationKey). Additionally, if custom SSL certificates are used, then 6 additional secrets will need to be created - routingcafile, routingcertfile, routingkeyfile, mastercafile, mastercertfile, and masterkeyfile. These will be explained in more detail.

The template references specific secret names, so you must use the exact names listed above (case sensitive).

It is recommended to create a separate Resource Group specifically to store the Key Vault. This way, you can reuse the Key Vault for other deployments and you won't have to create it every time you choose to deploy another OpenShift cluster.

Create Key Vault using Azure CLI

  1. Create new Resource Group: az group create -n <name> -l <location>
    Ex: az group create -n KeyVaultResourceGroupName -l 'East US'
  2. Create Key Vault: az keyvault create -n <vault-name> -g <resource-group> -l <location> --enabled-for-template-deployment true
    Ex: az keyvault create -n KeyVaultName -g KeyVaultResourceGroupName -l 'East US' --enabled-for-template-deployment true

Generate SSH Keys

You'll need to generate an SSH key pair (Public / Private) in order to provision this template. Ensure that you do NOT include a passphrase with the private key.

If you are using a Windows computer, you can download puttygen.exe. You will need to export to OpenSSH (from Conversions menu) to get a valid Private Key for use in the Template.

From a Linux or Mac, you can just use the ssh-keygen command. Once you are finished deploying the cluster, you can always generate new keys that use a passphrase and replace the original ones used during the initial deployment.
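For example, a minimal sketch (the key file path is arbitrary; -N "" creates the key without a passphrase):

ssh-keygen -t rsa -b 4096 -f ~/.ssh/openshift_rsa -N ""

The private key (~/.ssh/openshift_rsa) is the file stored in Key Vault in the next step; the matching public key (~/.ssh/openshift_rsa.pub) is what you paste into the sshPublicKey parameter.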

Store SSH Private key in Secret

  1. Create Secret: az keyvault secret set --vault-name <vault-name> -n <secret-name> --file <private-key-file-name>
    Ex: az keyvault secret set --vault-name KeyVaultName -n sshPrivateKey --file ~/.ssh/id_rsa

Generate Azure Active Directory (AAD) Service Principal

To configure Azure as the Cloud Provider for OpenShift Container Platform, you will need to create an Azure Active Directory Service Principal. The easiest way to perform this task is via the Azure CLI. Below are the steps for doing this.

Assigning permissions to the entire Subscription is the easiest method but does give the Service Principal permissions to all resources in the Subscription. Assigning permissions to only the Resource Group is the most secure as the Service Principal is restricted to only that one Resource Group.

Azure CLI 2.0

  1. Create Service Principal and assign permissions to Subscription
    a. az ad sp create-for-rbac -n <friendly name> --password <password> --role contributor --scopes /subscriptions/<subscription_id>
    Ex: az ad sp create-for-rbac -n openshiftcloudprovider --password Pass@word1 --role contributor --scopes /subscriptions/555a123b-1234-5ccc-defgh-6789abcdef01

  2. Create Service Principal and assign permissions to Resource Group
    a. If you use this option, you must have created the Resource Group first. Be sure you don't create any resources in this Resource Group before deploying the cluster.
    b. az ad sp create-for-rbac -n <friendly name> --password <password> --role contributor --scopes /subscriptions/<subscription_id>/resourceGroups/<Resource Group Name>
    Ex: az ad sp create-for-rbac -n openshiftcloudprovider --password Pass@word1 --role contributor --scopes /subscriptions/555a123b-1234-5ccc-defgh-6789abcdef01/resourceGroups/00000test

  3. Create Service Principal without assigning permissions to Resource Group
    a. If you use this option, you will need to assign permissions to either the Subscription or the newly created Resource Group shortly after you initiate the deployment of the cluster or the post installation scripts will fail when configuring Azure as the Cloud Provider.
    b. az ad sp create-for-rbac -n <friendly name> --password <password> --role contributor --skip-assignment
    Ex: az ad sp create-for-rbac -n openshiftcloudprovider --password Pass@word1 --role contributor --skip-assignment

You will get an output similar to:

{
  "appId": "2c8c6a58-44ac-452e-95d8-a790f6ade583",
  "displayName": "openshiftcloudprovider",
  "name": "http://openshiftcloudprovider",
  "password": "Pass@word1",
  "tenant": "12a345bc-1234-dddd-12ab-34cdef56ab78"
}

The appId is used for the aadClientId parameter. Store the password in the Key Vault.

az keyvault secret set --vault-name KeyVaultName -n aadClientSecret --value Pass@word1
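If you chose option 3 (no role assignment at creation time), a sketch for granting the permissions afterwards, using the appId from the output above and a placeholder scope:

az role assignment create --assignee 2c8c6a58-44ac-452e-95d8-a790f6ade583 --role Contributor --scope /subscriptions/<subscription_id>/resourceGroups/<Resource Group Name>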

OpenShift Admin Password

An initial OpenShift Cluster Admin user will be created after the cluster is deployed. This admin user will need a password. Store the password that you want to use in the Key Vault.

az keyvault secret set --vault-name KeyVaultName -n openshiftPassword --value Pass@word1

Red Hat Subscription Access

For security reasons, the method for registering the RHEL system allows the use of an Organization ID and Activation Key as well as a Username and Password. Please know that it is more secure to use the Organization ID and Activation Key.

You can determine your Organization ID by running subscription-manager identity on a registered machine. To create or find your Activation Key, please go here: https://access.redhat.com/management/activation_keys.

You will also need to get the Pool ID that contains your entitlements for OpenShift. You can retrieve this from the Red Hat portal by examining the details of the subscription that has the OpenShift entitlements. Or you can contact your Red Hat administrator to help you.
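For example, on a machine that is already registered, the following commands show your organization identity and list available subscriptions with their pool IDs (the --matches filter is just an illustrative way to narrow the output):

subscription-manager identity
subscription-manager list --available --matches '*OpenShift*'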

Store the password or activation key that you want to use in the Key Vault.

az keyvault secret set --vault-name KeyVaultName -n rhsmPasswordOrActivationKey --value Pass@word1

Custom Certificates

By default, the template will deploy an OpenShift cluster using self-signed certificates for the OpenShift web console and the routing domain. If you want to use custom SSL certificates, set 'routingCertType' to 'custom' and 'masterCertType' to 'custom'. You will need the CA, Cert, and Key files in .pem format for the certificates.

You will need to store these files in Key Vault secrets. Use the same Key Vault as the one used for the private key. Rather than require 6 additional inputs for the secret names, the template is hard-coded to use specific secret names for each of the SSL certificate files. Store the certificate data using the information in the following table.

  • mastercafile - master CA file
  • mastercertfile - master certificate file
  • masterkeyfile - master key file
  • routingcafile - routing CA file
  • routingcertfile - routing certificate file
  • routingkeyfile - routing key file

Create the secrets using the Azure CLI. Below is an example.

az keyvault secret set --vault-name KeyVaultName -n mastercafile --file ~/certificates/masterca.pem
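If your six .pem files are named after the secrets (an assumption; adjust the paths to match your actual file names), the remaining secrets can be created in one loop:

for secret in mastercafile mastercertfile masterkeyfile routingcafile routingcertfile routingkeyfile; do
  az keyvault secret set --vault-name KeyVaultName -n $secret --file ~/certificates/$secret.pem
done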

azuredeploy.parameters.json File Explained

  • _artifactsLocation - URL for artifacts (json, scripts, etc.). Default: https://raw.githubusercontent.com/Microsoft/openshift-container-platform/master
  • location - Azure region to deploy resources to.
  • masterVmSize - Size of the Master VM. Select from one of the allowed VM sizes listed in the azuredeploy.json file. Default: Standard_E2s_v3
  • infraVmSize - Size of the Infra VM. Select from one of the allowed VM sizes listed in the azuredeploy.json file. Default: Standard_D4s_v3
  • nodeVmSize - Size of the App Node VM. Select from one of the allowed VM sizes listed in the azuredeploy.json file. Default: Standard_D4s_v3
  • cnsVmSize - Size of the CNS Node VM. Select from one of the allowed VM sizes listed in the azuredeploy.json file. Default: Standard_E4s_v3
  • osImageType - The RHEL image to use (defaultgallery: On-Demand; marketplace: 3rd party image). Valid options: "defaultgallery", "marketplace". Default: defaultgallery
  • marketplaceOsImage - If osImageType is marketplace, enter the appropriate values for 'publisher', 'offer', 'sku', and 'version' of the marketplace offer. This is an object type.
  • storageKind - The type of storage to be used. Valid options: "managed", "unmanaged". Default: managed
  • openshiftClusterPrefix - Cluster prefix used to configure hostnames for all nodes. Between 1 and 20 characters. Default: mycluster
  • minoVersion - The minor version of OpenShift Container Platform 3.11 to deploy. Default: 188
  • masterInstanceCount - Number of master nodes to deploy. Valid options: 1, 3, 5. Default: 3
  • infraInstanceCount - Number of infra nodes to deploy. Valid options: 1, 2, 3. Default: 3
  • nodeInstanceCount - Number of compute nodes to deploy. Valid options: 1 to 30. Default: 2
  • cnsInstanceCount - Number of CNS nodes to deploy. Valid options: 3, 4. Default: 3
  • osDiskSize - Size of the OS disk for the VM (in GB). Valid options: 64, 128, 256, 512, 1024, 2048. Default: 64
  • dataDiskSize - Size of the data disk to attach to nodes for the Docker volume (in GB). Valid options: 32, 64, 128, 256, 512, 1024, 2048. Default: 128
  • cnsGlusterDiskSize - Size of the data disk to attach to CNS nodes for use by gluster (in GB). Valid options: 32, 64, 128, 256, 512, 1024, 2048. Default: 128
  • adminUsername - Admin username for both OS (VM) login and the initial OpenShift user. Default: ocpadmin
  • enableMetrics - Enable metrics. Metrics require more resources, so select a proper size for the Infra VM. Valid options: "true", "false". Default: false
  • enableLogging - Enable logging. The elasticsearch pod requires 8 GB RAM, so select a proper size for the Infra VM. Valid options: "true", "false". Default: false
  • enableCNS - Enable Container Native Storage (CNS). Valid options: "true", "false". Default: false
  • rhsmUsernameOrOrgId - Red Hat Subscription Manager username or organization ID.
  • rhsmPoolId - The Red Hat Subscription Manager pool ID that contains your OpenShift entitlements for compute nodes.
  • rhsmBrokerPoolId - The Red Hat Subscription Manager pool ID that contains your OpenShift entitlements for master and infra nodes. If you don't have different pool IDs, enter the same pool ID as 'rhsmPoolId'.
  • sshPublicKey - Copy your SSH public key here.
  • keyVaultSubscriptionId - The subscription ID of the subscription that contains the Key Vault.
  • keyVaultResourceGroup - The name of the resource group that contains the Key Vault.
  • keyVaultName - The name of the Key Vault you created.
  • enableAzure - Enable the Azure Cloud Provider. Valid options: "true", "false". Default: true
  • aadClientId - Azure Active Directory client ID (also known as the application ID) of the service principal.
  • domainName - Name of the custom domain to use (if applicable). Set to "none" if not deploying a fully private cluster. Default: none
  • masterClusterDnsType - Domain type for the OpenShift web console. 'default' uses the DNS label of the master infra public IP; 'custom' lets you define your own name. Valid options: "default", "custom". Default: default
  • masterClusterDns - The custom DNS name used to access the OpenShift web console if you selected 'custom' for masterClusterDnsType. Default: console.contoso.com
  • routingSubDomainType - Set to 'nipio' if you don't have your own domain, or 'custom' if you have your own domain that you would like to use for routing. Valid options: "nipio", "custom". Default: nipio
  • routingSubDomain - The wildcard DNS name you would like to use for routing if you selected 'custom' for routingSubDomainType. Default: apps.contoso.com
  • virtualNetworkNewOrExisting - Select whether to use an existing virtual network or create a new one. Valid options: "existing", "new". Default: new
  • virtualNetworkResourceGroupName - Name of the resource group for the new virtual network if you selected 'new' for virtualNetworkNewOrExisting. Default: resourceGroup().name
  • virtualNetworkName - The name of the new virtual network to create if you selected 'new' for virtualNetworkNewOrExisting. Default: openshiftvnet
  • addressPrefixes - Address prefix of the new virtual network. Default: 10.0.0.0/14
  • masterSubnetName - The name of the master subnet. Default: mastersubnet
  • masterSubnetPrefix - CIDR used for the master subnet; must be a subset of the addressPrefix. Default: 10.1.0.0/16
  • infraSubnetName - The name of the infra subnet. Default: infrasubnet
  • infraSubnetPrefix - CIDR used for the infra subnet; must be a subset of the addressPrefix. Default: 10.2.0.0/16
  • nodeSubnetName - The name of the node subnet. Default: nodesubnet
  • nodeSubnetPrefix - CIDR used for the node subnet; must be a subset of the addressPrefix. Default: 10.3.0.0/16
  • existingMasterSubnetReference - Full reference to the existing subnet for master nodes. Not needed if creating a new vNet / subnet.
  • existingInfraSubnetReference - Full reference to the existing subnet for infra nodes. Not needed if creating a new vNet / subnet.
  • existingCnsSubnetReference - Full reference to the existing subnet for CNS nodes. Not needed if creating a new vNet / subnet.
  • existingNodeSubnetReference - Full reference to the existing subnet for compute nodes. Not needed if creating a new vNet / subnet.
  • masterClusterType - Specify whether the cluster uses private or public master nodes. If private is chosen, the master nodes are not exposed to the Internet via a public IP; instead, they use the private IP specified in masterPrivateClusterIp. Valid options: "public", "private". Default: public
  • masterPrivateClusterIp - If private master nodes are selected, a private IP address must be specified for use by the internal load balancer for master nodes. This is a static IP, so it must reside within the CIDR block for the master subnet and not already be in use. If public master nodes are selected, this value is not used but must still be specified. Default: 10.1.0.200
  • routerClusterType - Specify whether the cluster uses private or public infra nodes. If private is chosen, the infra nodes are not exposed to the Internet via a public IP; instead, they use the private IP specified in routerPrivateClusterIp. Valid options: "public", "private". Default: public
  • routerPrivateClusterIp - If private infra nodes are selected, a private IP address must be specified for use by the internal load balancer for infra nodes. This is a static IP, so it must reside within the CIDR block for the infra subnet and not already be in use. If public infra nodes are selected, this value is not used but must still be specified. Default: 10.2.0.200
  • routingCertType - Use a custom certificate for the routing domain or the default self-signed certificate; follow the instructions in the Custom Certificates section. Valid options: "selfsigned", "custom". Default: selfsigned
  • masterCertType - Use a custom certificate for the master domain or the default self-signed certificate; follow the instructions in the Custom Certificates section. Valid options: "selfsigned", "custom". Default: selfsigned

Deploy Template

Once you have collected all of the prerequisites for the template, you can deploy the template by populating the azuredeploy.parameters.json file and executing Resource Manager deployment commands with PowerShell or the Azure CLI.

Azure CLI 2.0

  1. Create Resource Group: az group create -n <name> -l <location>
    Ex: az group create -n openshift-cluster -l westus
  2. Create Resource Group Deployment: az group deployment create --name <deployment name> --template-file <template_file> --parameters @<parameters_file> --resource-group <resource group name> --no-wait
    Ex: az group deployment create --name ocpdeployment --template-file azuredeploy.json --parameters @azuredeploy.parameters.json --resource-group openshift-cluster --no-wait
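Because the example uses --no-wait, the deployment continues in the background. A sketch for checking on it, using the resource group and deployment names from the example above:

az group deployment show -g openshift-cluster -n ocpdeployment --query properties.provisioningState -o tsv
az group deployment operation list -g openshift-cluster -n ocpdeployment --query "[?properties.provisioningState=='Failed']"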

NOTE

The OpenShift Ansible playbook does take a while to run when using VMs backed by Standard Storage. VMs backed by Premium Storage are faster. If you want Premium Storage, select a DS, Es, or GS series VM. It is highly recommended that Premium storage be used.


If the Azure Cloud Provider is not enabled, then the Service Catalog and Ansible Template Service Broker will not be installed as Service Catalog requires persistent storage.

Be sure to follow the OpenShift instructions to create the necessary DNS entry for the OpenShift Router for access to applications.

A Standard Storage Account is provisioned to provide persistent storage for the integrated OpenShift Registry as Premium Storage does not support storage of anything but VHD files.

TROUBLESHOOTING

If you encounter an error during deployment of the cluster, please view the deployment status. The following Error Codes will help to narrow things down.

  1. Exit Code 3: Your Red Hat Subscription User Name / Password or Organization ID / Activation Key is incorrect
  2. Exit Code 4: Your Red Hat Pool ID is incorrect or there are no entitlements available
  3. Exit Code 5: Unable to provision Docker Thin Pool Volume
  4. Exit Code 99: Configuration playbooks were not downloaded

Before opening an issue, ssh to the Bastion node and review the stdout and stderr files as explained below. The stdout file will most likely contain the most useful information so please do include the last 50 lines of the stdout file in the issue description. Do NOT copy the error output from the Azure portal.

You can SSH to the Bastion node and from there SSH to each of the nodes in the cluster and fix the issues.

A common cause of failures related to the node service not starting is that the Service Principal did not have proper permissions to the Subscription or the Resource Group. If this is indeed the issue, assign the correct permissions and manually re-run the script that failed and all subsequent scripts. Be sure to restart the service that failed (e.g. systemctl restart atomic-openshift-node.service) before executing the scripts again.

For further troubleshooting, please SSH into your Bastion node on port 22. You will need to be root (sudo su -) and then navigate to the following directory: /var/lib/waagent/custom-script/download

You should see folders named '0' and '1'. In each of these folders, you will see two files, stderr and stdout. You can look through these files to determine where the failure occurred.
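A sketch of those steps end to end (the key path and Bastion public IP are placeholders, and the folder number may be '0' or '1' depending on which extension failed):

ssh -i ~/.ssh/id_rsa ocpadmin@<bastion-public-ip>
sudo su -
cd /var/lib/waagent/custom-script/download/1
tail -n 50 stdout stderr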

Post-Deployment Operations

Service Catalog

If you enable the Azure Cloud Provider or CNS for storage, these scripts will deploy the Service Catalog as a post-deployment option.

Metrics and logging

Metrics

If you deployed Metrics, it will take a few extra minutes for deployment to complete. Please be patient.

Once the deployment is complete, log into the OpenShift Web Console and complete one additional configuration step. Go to the openshift-infra project, click on the Hawkular Metrics route, and accept the SSL exception in your browser.

Logging

If you deployed Logging, it will take a few extra minutes for deployment to complete. Please be patient.

Once the deployment is complete, log into the OpenShift Web Console and complete one additional configuration step. Go to the logging project, click on the Kibana route, and accept the SSL exception in your browser.

Creation of additional users

To create additional (non-admin) users in your environment, log in to your master server(s) via SSH and run:
htpasswd /etc/origin/master/htpasswd mynewuser
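After creating the user, you will typically also grant it access to a project; a sketch (the role, user, and project names are placeholders):

oc adm policy add-role-to-user edit mynewuser -n myproject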

Additional OpenShift Configuration Options

You can configure additional settings per the official OpenShift Container Platform documentation.


openshift-container-platform's Issues

Deployment docker-registry #2 failed.

I deployed OpenShift cluster 3.10 and I am getting these errors while accessing the pre-created projects

  1. Deployment docker-registry #2 failed.
  2. Deployment logging-es-data-master-aq7mmcyb #1 failed.

Also, I tried adding Jenkins to the project and I got the following error.

The service is not yet ready. Error provisioning ServiceInstance of ClusterServiceClass (K8S: "70ede23f-c6f0-11e8-85cb-000d3a3b2057" ExternalName: "jenkins-persistent") at ClusterServiceBroker "template-service-broker": Status: 409; ErrorMessage: ; Description: ; ResponseError:

osImageType now required by Azure?

I get errors when deploying a 3.9 cluster in Azure when it tries to spin up the nodes. It is complaining that the osImageType is missing. My deployments were working back in June.

"details": [
{
"code": "BadRequest",
"message": "{\r\n "error": {\r\n "code": "InvalidTemplate",\r\n "message": "Deployment template validation failed: 'The value for the template parameter 'osImageType' at line '41' and column '18' is not provided. Please see https://aka.ms/arm-deploy/#parameter-file for usage details.'."\r\n }\r\n}"
},

I see that the master / 3.10 templates have this field. Was this a change in Azure, breaking the 3.9 deployments? Or am I missing something else? I tried to find anything referencing this requirement in Azure, but I'm not having any luck.

Deployment of Release-3.10 fails with "The value for the template parameter 'routingCertType' at line '65' and column '22' is not provided" error

I am trying to deploy OCP from the release-3.10 branch and the deployment fails asking me for parameters that are relevant only for the master branch (3.11 release).

When looking at the azuredeploy.json of the release-3.10 branch, I can see that _artifactsLocation is still pointing to master, and the redHatTags section is showing version 3.9.

"_artifactsLocation": {
			"value": "https://raw.githubusercontent.com/Microsoft/openshift-container-platform/master"
},
and 
"redHatTags": {
"app": "OpenShiftContainerPlatform",
"version": "3.9",
"platform": "AzurePublic",
"provider": "9d2c71fc-96ba-4b4a-93b3-14def5bc96fc"
},

OpenShift deploy fails during "Rebooting cluster to complete installation", post "Cloud Provider setup of OpenShift Cluster completed successfully"

I'm working on building a POC of OpenShift (OCP) on Azure for one of our customers. I ran into multiple issues (#48, 51 & 53) and bypassed them by following the suggested workarounds with the help of @dwaiba. Now I run into the issue below:

New-AzureRmResourceGroupDeployment : 5:30:59 PM - Resource Microsoft.Resources/deployments 'OpenShiftDeployment' failed with message '{
"status": "Failed",
"error": {
"code": "ResourceDeploymentFailure",
"message": "The resource operation completed with terminal provisioning state 'Failed'.",
"details": [
{
"code": "DeploymentFailed",
"message": "At least one resource deployment operation failed. Please list deployment operations for details. Please see https://aka.ms/arm-debug for usage details.",
"details": [
{
"code": "Conflict",
"message": "{\r\n "status": "Failed",\r\n "error": {\r\n "code": "ResourceDeploymentFailure",\r\n "message": "The resource operation completed with terminal provisioning state 'Failed'.",\r\n "details": [\r\n {\r\n
"code": "VMExtensionProvisioningError",\r\n "message": "VM has reported a failure when processing extension 'deployOpenShift'. Error message: \"Enable failed: failed to execute command: command terminated with exit
status=1\n[stdout]\nonf']})\nchanged: [ocpcluster-master-2] => (item={u'key': u'kubeletArguments.cloud-provider', u'value': [u'azure']})\n\nRUNNING HANDLER [restart atomic-openshift-node] ********************************\nchanged: [ocpcluster-master-2]\n\nPLAY
RECAP *********************************************************************\nocpcluster-master-0 : ok=4 changed=2 unreachable=0 failed=0 \nocpcluster-master-1 : ok=4 changed=2 unreachable=0 failed=0 \nocpcluster-master-2 :
ok=4 changed=2 unreachable=0 failed=0 \n\nThu Mar 22 00:27:10 UTC 2018 - Cloud Provider setup of node config on Master Nodes completed successfully\nThu Mar 22 00:27:10 UTC 2018 - Sleep for 60\n\nPLAY [nodes:!masters]
**********************************************************\n\nTASK [make sure /etc/azure exists] *********************************************\nchanged: [ocpcluster-infra-0]\n\nTASK [populate /etc/azure/azure.conf]
******************************************\nchanged: [ocpcluster-infra-0]\n\nTASK [insert the azure disk config into the node] ******************************\nchanged: [ocpcluster-infra-0] => (item={u'key': u'kubeletArguments.cloud-config', u'value':
[u'/etc/azure/azure.conf']})\nchanged: [ocpcluster-infra-0] => (item={u'key': u'kubeletArguments.cloud-provider', u'value': [u'azure']})\n\nRUNNING HANDLER [restart atomic-openshift-node] ********************************\nchanged: [ocpcluster-infra-0]\n\nPLAY
[nodes:!masters] **********************************************************\n\nTASK [make sure /etc/azure exists] *********************************************\nchanged: [ocpcluster-infra-1]\n\nTASK [populate /etc/azure/azure.conf]
******************************************\nchanged: [ocpcluster-infra-1]\n\nTASK [insert the azure disk config into the node] ******************************\nchanged: [ocpcluster-infra-1] => (item={u'key': u'kubeletArguments.cloud-config', u'value':
[u'/etc/azure/azure.conf']})\nchanged: [ocpcluster-infra-1] => (item={u'key': u'kubeletArguments.cloud-provider', u'value': [u'azure']})\n\nRUNNING HANDLER [restart atomic-openshift-node] ********************************\nchanged: [ocpcluster-infra-1]\n\nPLAY
[nodes:!masters] **********************************************************\n\nTASK [make sure /etc/azure exists] *********************************************\nchanged: [ocpcluster-node-0]\n\nTASK [populate /etc/azure/azure.conf]
******************************************\nchanged: [ocpcluster-node-0]\n\nTASK [insert the azure disk config into the node] ******************************\nchanged: [ocpcluster-node-0] => (item={u'key': u'kubeletArguments.cloud-config', u'value':
[u'/etc/azure/azure.conf']})\nchanged: [ocpcluster-node-0] => (item={u'key': u'kubeletArguments.cloud-provider', u'value': [u'azure']})\n\nRUNNING HANDLER [restart atomic-openshift-node] ********************************\nchanged: [ocpcluster-node-0]\n\nPLAY RECAP
*********************************************************************\nocpcluster-infra-0 : ok=4 changed=4 unreachable=0 failed=0 \nocpcluster-infra-1 : ok=4 changed=4 unreachable=0 failed=0 \nocpcluster-node-0 : ok=4
changed=4 unreachable=0 failed=0 \n\nThu Mar 22 00:28:23 UTC 2018 - Cloud Provider setup of node config on App Nodes completed successfully\nThu Mar 22 00:28:23 UTC 2018 - Sleep for 120\n\nPLAY [masters]
*****************************************************************\n\nTASK [set masters as unschedulable] ********************************************\nchanged: [ocpcluster-master-0]\nchanged: [ocpcluster-master-2]\nchanged: [ocpcluster-master-1]\n\nPLAY RECAP
*********************************************************************\nocpcluster-master-0 : ok=1 changed=1 unreachable=0 failed=0 \nocpcluster-master-1 : ok=1 changed=1 unreachable=0 failed=0 \nocpcluster-master-2 : ok=1
changed=1 unreachable=0 failed=0 \n\nThu Mar 22 00:30:25 UTC 2018 - Cloud Provider setup of OpenShift Cluster completed successfully\nThu Mar 22 00:30:25 UTC 2018 - Rebooting cluster to complete installation\n\n[stderr]\n % Total % Received % Xferd
Average Speed Time Time Time Current\n Dload Upload Total Spent Left Speed\n\r 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0\r100 6 100 6 0 0 590 0 --:--:--
--:--:-- --:--:-- 666\n[DEPRECATION WARNING]: 'include' for playbook includes. You should use \n'import_playbook' instead. This feature will be removed in version 2.8. \nDeprecation warnings can be disabled by setting deprecation_warnings=False in
\nansible.cfg.\n[DEPRECATION WARNING]: 'include' for playbook includes. You should use \n'import_playbook' instead. This feature will be removed in version 2.8. \nDeprecation warnings can be disabled by setting deprecation_warnings=False in
\nansible.cfg.\n[DEPRECATION WARNING]: The use of 'static' for 'include_role' has been \ndeprecated. Use 'import_role' for static inclusion, or 'include_role' for \ndynamic inclusion. This feature will be removed in a future release. \nDeprecation warnings can be
disabled by setting deprecation_warnings=False in \nansible.cfg.\n[DEPRECATION WARNING]: The use of 'include' for tasks has been deprecated. Use \n'import_tasks' for static inclusions or 'include_tasks' for dynamic inclusions.\n This feature will be removed in a
future release. Deprecation warnings can be \ndisabled by setting deprecation_warnings=False in ansible.cfg.\n[DEPRECATION WARNING]: include is kept for backwards compatibility but usage is\n discouraged. The module documentation details page may explain more about
this\n rationale.. This feature will be removed in a future release. Deprecation \nwarnings can be disabled by setting deprecation_warnings=False in ansible.cfg.\n[DEPRECATION WARNING]: The use of 'static' has been deprecated. Use \n'import_tasks' for static
inclusion, or 'include_tasks' for dynamic inclusion. \nThis feature will be removed in a future release. Deprecation warnings can be \ndisabled by setting deprecation_warnings=False in ansible.cfg.\n [WARNING]: Could not match supplied host pattern, ignoring:
oo_all_hosts\n [WARNING]: Could not match supplied host pattern, ignoring: oo_lb_to_config\n [WARNING]: Could not match supplied host pattern, ignoring: oo_nfs_to_config\n [WARNING]: Consider using yum, dnf or zypper module rather than running rpm\n [WARNING]:
Consider using unarchive module rather than running tar\n [WARNING]: Consider using get_url or uri module rather than running curl\n [WARNING]: Could not match supplied host pattern, ignoring:\noo_containerized_master_nodes\n [WARNING]: Could not match supplied
host pattern, ignoring:\noo_nodes_use_flannel\n [WARNING]: Could not match supplied host pattern, ignoring:\noo_nodes_use_calico\n [WARNING]: Could not match supplied host pattern, ignoring:\noo_nodes_use_contiv\n [WARNING]: Could not match supplied host pattern,
ignoring: oo_nodes_use_kuryr\n [WARNING]: Could not match supplied host pattern, ignoring: oo_nodes_use_nuage\n [WARNING]: Could not match supplied host pattern, ignoring: glusterfs\n [WARNING]: Could not match supplied host pattern, ignoring: glusterfs_registry\n
[WARNING]: Module did not set no_log for stats_password\n [WARNING]: Module did not set no_log for external_host_password\nWarning: Permanently added 'ocpcluster-master-0,10.1.0.9' (ECDSA) to the list of known hosts.\r\nerror: 'openshift-infra' already has a value
(apiserver), and --overwrite is false\n\"."\r\n }\r\n ]\r\n }\r\n}"
}
]
}
]
}
}'
At line:1 char:1

  • New-AzureRmResourceGroupDeployment -Name OCPDeploy -ResourceGroupName ...
  •   + CategoryInfo          : NotSpecified: (:) [New-AzureRmResourceGroupDeployment], Exception
      + FullyQualifiedErrorId : Microsoft.Azure.Commands.ResourceManager.Cmdlets.Implementation.NewAzureResourceGroupDeploymentCmdlet
    
    

New-AzureRmResourceGroupDeployment : 5:30:59 PM - At least one resource deployment operation failed. Please list deployment operations for details. Please see https://aka.ms/arm-debug for usage details.
At line:1 char:1

  • New-AzureRmResourceGroupDeployment -Name OCPDeploy -ResourceGroupName ...
  •   + CategoryInfo          : NotSpecified: (:) [New-AzureRmResourceGroupDeployment], Exception
      + FullyQualifiedErrorId : Microsoft.Azure.Commands.ResourceManager.Cmdlets.Implementation.NewAzureResourceGroupDeploymentCmdlet
    
    

New-AzureRmResourceGroupDeployment : 5:30:59 PM - Template output evaluation skipped: at least one resource deployment operation failed. Please list deployment operations for details. Please see https://aka.ms/arm-debug for usage details.
At line:1 char:1

  • New-AzureRmResourceGroupDeployment -Name OCPDeploy -ResourceGroupName ...
  •   + CategoryInfo          : NotSpecified: (:) [New-AzureRmResourceGroupDeployment], Exception
      + FullyQualifiedErrorId : Microsoft.Azure.Commands.ResourceManager.Cmdlets.Implementation.NewAzureResourceGroupDeploymentCmdlet
    
    

New-AzureRmResourceGroupDeployment : 5:30:59 PM - Template output evaluation skipped: at least one resource deployment operation failed. Please list deployment operations for details. Please see https://aka.ms/arm-debug for usage details.
At line:1 char:1

  • New-AzureRmResourceGroupDeployment -Name OCPDeploy -ResourceGroupName ...
  •   + CategoryInfo          : NotSpecified: (:) [New-AzureRmResourceGroupDeployment], Exception
      + FullyQualifiedErrorId : Microsoft.Azure.Commands.ResourceManager.Cmdlets.Implementation.NewAzureResourceGroupDeploymentCmdlet
    
    
    

Details:
Deployment mode: Powershell
Command: PS C:\WINDOWS\system32> New-AzureRmResourceGroupDeployment -Name OCPDeploy -ResourceGroupName OCPRG -TemplateFile C:\Users\mandava\openshift-container-platform-master\azuredeploy.json -TemplateParameterFile C:\Users\mandava\openshift-container-platform-master\azuredeploy
.parameters.json
Docker version: 1.12.6
OpenShift version: 3.7
Instructions followed from: https://docs.microsoft.com/en-us/azure/virtual-machines/linux/openshift-prerequisites

I did look into the logs in folders "0" and "1" on the bastion node and did not find any more details beyond what's in the failure. I did not find anything wrong in the deployOpenShift.sh script, and I do not see a parameter "--overwrite" whose value I could change to "true".
Attaching logs from the "0" & "1" folders for reference, along with the scripts I'm using. Please suggest a solution for this failure. Thank you!

stderr_1.txt
stdout_1.txt
stderr.txt
stdout.txt
bastionPrep.sh.txt
deployOpenShift.sh.txt

azuredeploy.parameters.json.txt
azuredeploy.json.txt

VM has reported a failure when processing extension 'deployOpenShift' on Azure Stack.

Can somebody please help? The installation keeps failing with the message:

"VM has reported a failure when processing extension 'deployOpenShift'. Error message: Enable failed: failed to execute command: command terminated with exit status=4"

For now I am using a RHEL trial license with 10 hosts and OpenShift.
masters = 3
Infra = 2
Node =1
CNS =true (3 hosts)
Bastion = 1

If somebody can help or if more information is needed, please let me know.

{"code":"DeploymentFailed","message":"At least one resource deployment operation failed. Please list deployment operations for details. Please see https://aka.ms/arm-debug for usage details.","details":[{"code":"Conflict","message":"{\r\n "status": "Failed",\r\n "error": {\r\n "code": "ResourceDeploymentFailure",\r\n "message": "The resource operation completed with terminal provisioning state 'Failed'.",\r\n "details": [\r\n {\r\n "code": "VMExtensionProvisioningError",\r\n "message": "VM has reported a failure when processing extension 'deployOpenShift'. Error message: Enable failed: failed to execute command: command terminated with exit status=4\n[stdout]\nK [Evaluate oo_etcd_hosts_to_backup] ****************************************\nok: [localhost] => (item=tmw-d-mstr-0)\nok: [localhost] => (item=tmw-d-mstr-1)\nok: [localhost] => (item=tmw-d-mstr-2)\n\nTASK [Evaluate oo_nodes_to_config] *********************************************\nok: [localhost] => (item=tmw-d-mstr-0)\nok: [localhost] => (item=tmw-d-mstr-1)\nok: [localhost] => (item=tmw-d-mstr-2)\nok: [localhost] => (item=tmw-d-inf-0)\nok: [localhost] => (item=tmw-d-inf-1)\nok: [localhost] => (item=tmw-d-node-0)\nok: [localhost] => (item=tmw-d-cns-0)\nok: [localhost] => (item=tmw-d-cns-1)\nok: [localhost] => (item=tmw-d-cns-2)\n\nTASK [Add master to oo_nodes_to_config] ****************************************\nskipping: [localhost] => (item=tmw-d-mstr-0) \nskipping: [localhost] => (item=tmw-d-mstr-1) \nskipping: [localhost] => (item=tmw-d-mstr-2) \n\nTASK [Evaluate oo_lb_to_config] ************************************************\n\nTASK [Evaluate oo_nfs_to_config] ***********************************************\n\nTASK [Evaluate oo_glusterfs_to_config] *****************************************\nok: [localhost] => (item=tmw-d-cns-0)\nok: [localhost] => (item=tmw-d-cns-1)\nok: [localhost] => (item=tmw-d-cns-2)\n\nTASK [Evaluate oo_etcd_to_migrate] *********************************************\nok: [localhost] => (item=tmw-d-mstr-0)\nok: [localhost] => (item=tmw-d-mstr-1)\nok: [localhost] => (item=tmw-d-mstr-2)\n\nPLAY [Install and configure NetworkManager] ************************************\n\nTASK [Gathering Facts] *********************************************************\nfatal: [tmw-d-inf-0]: UNREACHABLE! => {\"changed\": false, \"msg\": \"Failed to connect to the host via ssh: ssh: Could not resolve hostname tmw-d-inf-0: Name or service not known\\r\\n\", \"unreachable\": true}\nfatal: [tmw-d-cns-1]: UNREACHABLE! 
=> {\"changed\": false, \"msg\": \"Failed to connect to the host via ssh: ssh: Could not resolve hostname tmw-d-cns-1: Name or service not known\\r\\n\", \"unreachable\": true}\nok: [tmw-d-cns-0]\nok: [tmw-d-cns-2]\nok: [tmw-d-mstr-0]\nok: [tmw-d-mstr-2]\nok: [tmw-d-inf-1]\nok: [tmw-d-node-0]\nok: [tmw-d-mstr-1]\n\nTASK [install NetworkManager] **************************************************\nok: [tmw-d-inf-1]\nok: [tmw-d-mstr-0]\nok: [tmw-d-mstr-2]\nok: [tmw-d-cns-2]\nok: [tmw-d-mstr-1]\nok: [tmw-d-node-0]\nok: [tmw-d-cns-0]\n\nTASK [configure NetworkManager] ************************************************\nchanged: [tmw-d-inf-1] => (item=USE_PEERDNS)\nchanged: [tmw-d-mstr-2] => (item=USE_PEERDNS)\nchanged: [tmw-d-node-0] => (item=USE_PEERDNS)\nchanged: [tmw-d-cns-2] => (item=USE_PEERDNS)\nchanged: [tmw-d-mstr-0] => (item=USE_PEERDNS)\nchanged: [tmw-d-mstr-1] => (item=USE_PEERDNS)\nchanged: [tmw-d-cns-0] => (item=USE_PEERDNS)\nok: [tmw-d-mstr-2] => (item=NM_CONTROLLED)\nok: [tmw-d-inf-1] => (item=NM_CONTROLLED)\nok: [tmw-d-node-0] => (item=NM_CONTROLLED)\nok: [tmw-d-mstr-0] => (item=NM_CONTROLLED)\nok: [tmw-d-cns-2] => (item=NM_CONTROLLED)\nok: [tmw-d-cns-0] => (item=NM_CONTROLLED)\nok: [tmw-d-mstr-1] => (item=NM_CONTROLLED)\n\nTASK [enable and start NetworkManager] *****************************************\nok: [tmw-d-mstr-0]\nok: [tmw-d-node-0]\nok: [tmw-d-cns-0]\nok: [tmw-d-cns-2]\nok: [tmw-d-mstr-2]\nok: [tmw-d-inf-1]\nok: [tmw-d-mstr-1]\n\nPLAY RECAP *********************************************************************\nlocalhost : ok=12 changed=0 unreachable=0 failed=0 \ntmw-d-cns-0 : ok=4 changed=1 unreachable=0 failed=0 \ntmw-d-cns-1 : ok=0 changed=0 unreachable=1 failed=0 \ntmw-d-cns-2 : ok=4 changed=1 unreachable=0 failed=0 \ntmw-d-inf-0 : ok=0 changed=0 unreachable=1 failed=0 \ntmw-d-inf-1 : ok=4 changed=1 unreachable=0 failed=0 \ntmw-d-mstr-0 : ok=4 changed=1 unreachable=0 failed=0 \ntmw-d-mstr-1 : ok=4 changed=1 unreachable=0 failed=0 \ntmw-d-mstr-2 : ok=4 changed=1 unreachable=0 failed=0 \ntmw-d-node-0 : ok=4 changed=1 unreachable=0 failed=0 \n\n\n[stderr]\n# tmw-d-cns-0:22 SSH-2.0-OpenSSH_7.4\n# tmw-d-cns-0:22 SSH-2.0-OpenSSH_7.4\n# tmw-d-cns-0:22 SSH-2.0-OpenSSH_7.4\nWarning: Permanently added the ECDSA host key for IP address '10.1.0.6' to the list of known hosts.\r\ngetaddrinfo tmw-d-cns-1: Name or service not known\r\ngetaddrinfo tmw-d-cns-1: Name or service not known\r\ngetaddrinfo tmw-d-cns-1: Name or service not known\r\nssh: Could not resolve hostname tmw-d-cns-1: Name or service not known\r\n# tmw-d-cns-2:22 SSH-2.0-OpenSSH_7.4\n# tmw-d-cns-2:22 SSH-2.0-OpenSSH_7.4\n# tmw-d-cns-2:22 SSH-2.0-OpenSSH_7.4\nWarning: Permanently added the ECDSA host key for IP address '10.1.0.7' to the list of known hosts.\r\n [WARNING]: Could not create retry file '/usr/share/ansible/openshift-\nansible/playbooks/openshift-node/network_manager.retry'. [Errno 13]\nPermission denied: u'/usr/share/ansible/openshift-ansible/playbooks/openshift-\nnode/network_manager.retry'\n"\r\n }\r\n ]\r\n }\r\n}"}]}

Kind Regards,
Arie Heukels

Do Not use Premium_LRS for Docker Registry storage account

Not sure if you have a Notes section for this project; however, I ran into the following issue and thought I would make folks aware and maybe save people some time. It came up when I changed the Docker Registry storage account to Premium_LRS:

https://github.com/theobolo/kube-registry-azure/issues/1

When the Docker Registry storage account is set to Premium_LRS, everything will deploy fine, however, when you go to deploy an application, the build will bomb out because the Docker Registry storage account will be inaccessible. To fix, scale the docker-registry pod to zero, delete the storage account and recreate with a different name using Standard_LRS, then create a new container under the new storage account using the same name as in the previous storage account. Once created, copy the access key for the new storage account and in the environment settings for the docker-registry deployment, change the storage account name to the new name and paste the access key into REGISTRY_STORAGE_AZURE_ACCOUNTKEY. Save changes and a new docker-registry pod will deploy. Application deployments will now work correctly.
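A sketch of that workaround using the oc CLI (the storage account name, access key, and container are placeholders; the environment variable names are those used by the registry's Azure storage driver):

oc scale dc/docker-registry --replicas=0 -n default
oc set env dc/docker-registry -n default \
  REGISTRY_STORAGE_AZURE_ACCOUNTNAME=<new_storage_account> \
  REGISTRY_STORAGE_AZURE_ACCOUNTKEY=<new_access_key> \
  REGISTRY_STORAGE_AZURE_CONTAINER=<container_name>
oc scale dc/docker-registry --replicas=1 -n default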

asb fails for 3.7

--> Scaling asb-1 to 1
--> Error listing events for replication controller asb-1: Get https://172.30.0.1:443/api/v1/namespaces/openshift-ansible-service-broker/events?fieldSelector=involvedObject.kind%3DReplicationController%2CinvolvedObject.name%3Dasb-1%2CinvolvedObject.namespace%3Dopenshift-ansible-service-broker%2CinvolvedObject.uid%3D0837631d-2dba-11e8-b8e9-000d3a395454: dial tcp 172.30.0.1:443: getsockopt: connection refused
error: update acceptor rejected asb-1: watch closed before Until timeout

Marketplace OS Image - BYOS - cannot be used

Hello Guys,

I'm raising an issue regarding the marketplaceOSImage.
So I don't pay double for the RHEL subscriptions, I'm trying to use the marketplace image that is BYOS.

I entered the appropriate values for 'publisher', 'offer', 'sku', and 'version' of the marketplace offer. (They got updated; we can now choose between rhel-raw75, rhel-lvm74, and rhel-lvm75.)

The issue I have is that this template is private. I tried to accept the marketplace terms, but it's not working :) !

b'{"error":{"code":"MarketplacePurchaseEligibilityFailed","message":"Marketplace purchase eligibilty check returned errors. See inner errors for details. ","details":[{"code":"BadRequest","message":"Offer with PublisherId: redhat, OfferId: rhel-byos cannot be purchased due to validation errors. See details for more information.[{\"Legal terms have not been accepted for this item on this subscription. To accept legal terms using PowerShell, please use Get-AzureRmMarketplaceTerms and Set-AzureRmMarketplaceTerms API(https://go.microsoft.com/fwlink/?linkid=862451) or deploy via the Azure portal to accept the terms\":\"StoreApi\"},{\"Offer with PublisherId: redhat, OfferId: rhel-byos, PlanId: rhel-raw75 is private and can not be purchased by subscritpionId: *******************************\":\"StoreApi\"}]"}]}}'
msrest.exceptions : Marketplace purchase eligibilty check returned errors. See inner errors for details.

If you have an idea, that would be wonderful!!

Thank you so much,

Best regards,
William

Private masterClusterType roll-out fails

Describe the bug
When choosing masterClusterType = Private, the deployment fails.

To Reproduce
Steps to reproduce the behavior:

  1. Set the masterClusterType Private
  2. Deploy a cluster with minimal settings (no metrics, no logging, no cns), 1 master, 1 infra, 1 node (I have tried with 3 masters, 3 infra and 3 nodes as well, same issue)
  3. Wait for the deployment to fail.
  4. See the error

Expected behavior
I expect that this would result in a cluster with an internal master load balancer. When choosing public, everything works fine.

stdout
changed: [ose11vip-master-0]
TASK [ansible_service_broker : create route for dashboard-redirector service] ***
TASK [ansible_service_broker : Set Ansible Service Broker deployment config] ***
changed: [ose11vip-master-0]
TASK [ansible_service_broker : set auth name and type facts if needed] *********
TASK [ansible_service_broker : Create config map for ansible-service-broker] ***
changed: [ose11vip-master-0]
TASK [ansible_service_broker : oc_secret] **************************************
TASK [ansible_service_broker : Create the Broker resource in the catalog] ******
changed: [ose11vip-master-0]
TASK [ansible_service_broker : include_tasks] **********************************
TASK [template_service_broker : include_tasks] *********************************
included: /usr/share/ansible/openshift-ansible/roles/template_service_broker/tasks/install.yml for ose11vip-master-0
TASK [template_service_broker : include_tasks] *********************************
included: /usr/share/ansible/openshift-ansible/roles/template_service_broker/tasks/deploy.yml for ose11vip-master-0
TASK [template_service_broker : oc_project] ************************************
changed: [ose11vip-master-0]
TASK [template_service_broker : command] ***************************************
ok: [ose11vip-master-0]
TASK [template_service_broker : Copy admin client config] **********************
ok: [ose11vip-master-0]
TASK [template_service_broker : copy] ******************************************
changed: [ose11vip-master-0] => (item=apiserver-template.yaml)
changed: [ose11vip-master-0] => (item=rbac-template.yaml)
changed: [ose11vip-master-0] => (item=template-service-broker-registration.yaml)
changed: [ose11vip-master-0] => (item=apiserver-config.yaml)
TASK [template_service_broker : yedit] *****************************************
ok: [ose11vip-master-0]
TASK [template_service_broker : slurp] *****************************************
ok: [ose11vip-master-0]
TASK [template_service_broker : Apply template file] ***************************
changed: [ose11vip-master-0]
TASK [template_service_broker : Reconcile with RBAC file] **********************
changed: [ose11vip-master-0]
TASK [template_service_broker : Verify that TSB is running] ********************
FAILED - RETRYING: Verify that TSB is running (60 retries left).
FAILED - RETRYING: Verify that TSB is running (59 retries left).
ok: [ose11vip-master-0]
TASK [template_service_broker : slurp] *****************************************
ok: [ose11vip-master-0]
TASK [template_service_broker : Register TSB with broker] **********************
changed: [ose11vip-master-0]
TASK [template_service_broker : file] ******************************************
ok: [ose11vip-master-0]
TASK [template_service_broker : include_tasks] *********************************
PLAY [Service Catalog Install Checkpoint End] **********************************
TASK [Set Service Catalog install 'Complete'] **********************************
ok: [ose11vip-master-0]
PLAY RECAP *********************************************************************
localhost : ok=11 changed=0 unreachable=0 failed=0
ose11vip-infra-0 : ok=0 changed=0 unreachable=0 failed=0
ose11vip-master-0 : ok=131 changed=49 unreachable=0 failed=0
ose11vip-node-0 : ok=0 changed=0 unreachable=0 failed=0
INSTALLER STATUS ***************************************************************
Initialization : Complete (0:00:41)
Service Catalog Install : Complete (0:04:00)
Now using project "osba" on server "https://ose11vip-master-0:443".
You can add applications to this project with the 'new-app' command. For example, try:
oc new-app centos/ruby-25-centos7~https://github.com/sclorg/ruby-ex.git
to build a new example application in Ruby.
secret/osba-redis created
secret/osba-open-service-broker-azure-auth created
secret/osba-open-service-broker-azure created
service/osba-redis created
service/osba-open-service-broker-azure created
persistentvolumeclaim/osba-redis-pv-claim created
deployment.extensions/osba-redis created
deployment.extensions/osba-open-service-broker-azure created
clusterservicebroker.servicecatalog.k8s.io/osba created
Thu Feb 14 15:41:56 UTC 2019 - Configure cluster for private masters

stderr
Warning: Permanently added 'ose11vip-master-0,10.1.0.5' (ECDSA) to the list of known hosts.
[WARNING]: Could not match supplied host pattern, ignoring: oo_lb_to_config
[WARNING]: Could not match supplied host pattern, ignoring: oo_nfs_to_config
ERROR! vars file vars.yaml was not found on the Ansible Controller.
If you are using a module and expect the file to exist on the remote, see the remote_src option

Template Information (please complete the following information):

  • OS: Red Hat (standard from template; the latest installs RHEL 7.5)
  • Branch: Master

Additional context
This problem has been here from the start. However, it has not been solved yet in the roll-out of 3.11.

Deployment Error 3.9 Release

Deployed as following:

  1. docker run -ti docker4x/create-sp-azure openshiftsp

Your access credentials ================================================== AD ServicePrincipal App ID: xxxxxx AD ServicePrincipal App Secret: xxxxxx AD ServicePrincipal Tenant ID: xxxxxx

  1. az group create -n openshiftkvrg -l 'East US' && az keyvault create -n openshiftprivkeys -g openshiftkvrg -l 'East US' --enabled-for-template-deployment true && az keyvault secret set --vault-name <<vault>> -n <<secret>> --file id_rsa

Deploying a total of 10 VMs -> 1 bastion / 3 masters / 3 infra / 3 nodes (the default for 3.9)

  1. az group create -l eastus -n openshiftclusterrg && az group deployment create -g openshiftclusterrg -n openshiftcluster --template-uri https://raw.githubusercontent.com/microsoft/openshift-container-platform/master/azuredeploy.json --parameters "{\"openshiftPassword\":{\"value\":\"<<password>>\"},\"enableMetrics\":{\"value\":\"true\"},\"enableLogging\":{\"value\":\"true\"},\"keyVaultResourceGroup\":{\"value\":\"openshiftkvrg\"},\"sshPublicKey\":{\"value\":\"<<ssh public key>>\"},\"rhsmUsernameOrOrgId\":{\"value\":\"<<rhsm username>>\"},\"rhsmPasswordOrActivationKey\":{\"value\":\"<<rhsm password>>"},\"rhsmPoolId\":{\"value\":\"<<xxxxxx>>"},\"keyVaultName\":{\"value\":\"<<vaultname>>\"},\"keyVaultSecret\":{\"value\":\"<<secret>>\"},\"aadClientId\":{\"value\":\"<<xxxxxx>>\"},\"aadClientSecret\":{\"value\":\"<<xxxxxx>>\"}}" --debug

Hitting the following error:

`

Deployment failed. Correlation ID: xxxxxxxxxxxxx. {
  "status": "Failed",
  "error": {
    "code": "ResourceDeploymentFailure",
    "message": "The resource operation completed with terminal provisioning state 'Failed'.",
    "details": [
      {
        "code": "DeploymentFailed",
        "message": "At least one resource deployment operation failed. Please list deployment operations for details. Please see https://aka.ms/arm-debug for usage details.",
        "details": [
          {
            "code": "Conflict",
            "message": "{\r\n  \"status\": \"Failed\",\r\n  \"error\": {\r\n    \"code\": \"ResourceDeploymentFailure\",\r\n    \"message\": \"The resource operation completed with terminal provisioning state 'Failed'.\",\r\n    \"details\": [\r\n      {\r\n        \"code\": \"VMExtensionProvisioningError\",\r\n        \"message\": \"VM has reported a failure when processing extension 'deployOpenShift'. Error message: \\\"Enable failed: failed to execute command: command terminated with exit status=1\\n[stdout]\\nThu Jun 21 11:52:16 UTC 2018  - Starting Script\\nConfiguring SSH ControlPath to use shorter path name\\nThu Jun 21 11:52:16 UTC 2018  - Create Ansible Playbooks for Post Installation tasks\\nThu Jun 21 11:52:16 UTC 2018  - Creating Master nodes grouping\\nThu Jun 21 11:52:16 UTC 2018  - Creating Infra nodes grouping\\nThu Jun 21 11:52:16 UTC 2018  - Creating Nodes grouping\\n\\n[stderr]\\n  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current\\n                                 Dload  Upload   Total   Spent    Left  Speed\\n\\r  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0\\r100     6  100     6    0     0    599      0 --:--:-- --:--:-- --:--:--   666\\nfatal: destination path 'openshift-container-platform-playbooks' already exists and is not an empty directory.\\n\\\".\"\r\n      }\r\n    ]\r\n  }\r\n}"
          }
        ]
      }
    ]
  }
}

`

After installation, the node labels are missing, which causes the fluentd deployment not to take place

As part of the install process, the scripts delete the node and restarts the atomic-openshift-node agent to recreate the node config: https://github.com/Microsoft/openshift-container-platform/blob/master/scripts/deployOpenShift.sh#L375

This causes all labels, especially "logging-infra-fluentd: 'true'", to go missing, hence fluentd is not deployed.

Repair: just label all nodes again with logging-infra-fluentd: 'true'

oc label nodes --all logging-infra-fluentd=true
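For reference, a quick way to confirm the label came back on every node afterwards (assuming the standard oc client on a master or the bastion):

# List all nodes with their labels and check for the fluentd label
oc get nodes --show-labels | grep logging-infra-fluentd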

@ivanthelad @rschickhaus

Feature Request: Support for Satellite servers as a method of attaching entitlements.

We're using Red Hat Satellite to manage entitlements. It would be helpful to allow the use of a Satellite server during the provisioning process.

In our use case, a large university, multiple schools run different Satellite servers, but we share an enterprise account for license management, so the current Marketplace deployment and this repository are not usable for us.

Thanks!

bastionPrep.sh caused recent deploy to fail

This line in ./scripts/bastionPrep.sh caused my recent deploy to fail...

yum install atomic-openshift-excluder atomic-openshift-docker-excluder

From stdout on bastion node...
atomic-openshift-excluder not found

Script line should be...
yum -y install atomic-openshift-excluder atomic-openshift-docker-excluder
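A minimal sketch of the corrected line with explicit error handling (hypothetical, not the repo's actual change):

# Install the excluders non-interactively and fail the prep script loudly if yum cannot install them
yum -y install atomic-openshift-excluder atomic-openshift-docker-excluder || { echo "Excluder install failed"; exit 1; }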

Executing the deployOpenShift.sh script reports an error

Describe the bug
An error occurred when executing your script in Azure China.
The script link was modified to: https://openshift1.blob.core.chinacloudapi.cn/script/deployOpenShift.sh

To Reproduce
An error was reported after running the script

Expected behavior
The OpenShift automation scripts deploy successfully in Azure China.


stdout
Include the last 100 lines of stdout from Bastion host - see troubleshooting https://docs.microsoft.com/en-us/azure/virtual-machines/linux/openshift-troubleshooting


OpenShift deploy fails during "Rebooting cluster to complete installation"

I'm working on building a POC of OpenShift (OCP) on Azure for one of our customers. I ran into multiple issues (#48, #51 & #53) and bypassed them by following the suggested workarounds, with the help of @dwaiba. Now I run into the issue below:

New-AzureRmResourceGroupDeployment : 5:30:59 PM - Resource Microsoft.Resources/deployments 'OpenShiftDeployment' failed with message '{
"status": "Failed",
"error": {
"code": "ResourceDeploymentFailure",
"message": "The resource operation completed with terminal provisioning state 'Failed'.",
"details": [
{
"code": "DeploymentFailed",
"message": "At least one resource deployment operation failed. Please list deployment operations for details. Please see https://aka.ms/arm-debug for usage details.",
"details": [
{
"code": "Conflict",
"message": "{\r\n "status": "Failed",\r\n "error": {\r\n "code": "ResourceDeploymentFailure",\r\n "message": "The resource operation completed with terminal provisioning state 'Failed'.",\r\n "details": [\r\n {\r\n
"code": "VMExtensionProvisioningError",\r\n "message": "VM has reported a failure when processing extension 'deployOpenShift'. Error message: "Enable failed: failed to execute command: command terminated with exit
status=1\n[stdout]\nonf']})\nchanged: [ocpcluster-master-2] => (item={u'key': u'kubeletArguments.cloud-provider', u'value': [u'azure']})\n\nRUNNING HANDLER [restart atomic-openshift-node] ********************************\nchanged: [ocpcluster-master-2]\n\nPLAY
RECAP *********************************************************************\nocpcluster-master-0 : ok=4 changed=2 unreachable=0 failed=0 \nocpcluster-master-1 : ok=4 changed=2 unreachable=0 failed=0 \nocpcluster-master-2 :
ok=4 changed=2 unreachable=0 failed=0 \n\nThu Mar 22 00:27:10 UTC 2018 - Cloud Provider setup of node config on Master Nodes completed successfully\nThu Mar 22 00:27:10 UTC 2018 - Sleep for 60\n\nPLAY [nodes:!masters]
**********************************************************\n\nTASK [make sure /etc/azure exists] *********************************************\nchanged: [ocpcluster-infra-0]\n\nTASK [populate /etc/azure/azure.conf]
******************************************\nchanged: [ocpcluster-infra-0]\n\nTASK [insert the azure disk config into the node] ******************************\nchanged: [ocpcluster-infra-0] => (item={u'key': u'kubeletArguments.cloud-config', u'value':
[u'/etc/azure/azure.conf']})\nchanged: [ocpcluster-infra-0] => (item={u'key': u'kubeletArguments.cloud-provider', u'value': [u'azure']})\n\nRUNNING HANDLER [restart atomic-openshift-node] ********************************\nchanged: [ocpcluster-infra-0]\n\nPLAY
[nodes:!masters] **********************************************************\n\nTASK [make sure /etc/azure exists] *********************************************\nchanged: [ocpcluster-infra-1]\n\nTASK [populate /etc/azure/azure.conf]
******************************************\nchanged: [ocpcluster-infra-1]\n\nTASK [insert the azure disk config into the node] ******************************\nchanged: [ocpcluster-infra-1] => (item={u'key': u'kubeletArguments.cloud-config', u'value':
[u'/etc/azure/azure.conf']})\nchanged: [ocpcluster-infra-1] => (item={u'key': u'kubeletArguments.cloud-provider', u'value': [u'azure']})\n\nRUNNING HANDLER [restart atomic-openshift-node] ********************************\nchanged: [ocpcluster-infra-1]\n\nPLAY
[nodes:!masters] **********************************************************\n\nTASK [make sure /etc/azure exists] *********************************************\nchanged: [ocpcluster-node-0]\n\nTASK [populate /etc/azure/azure.conf]
******************************************\nchanged: [ocpcluster-node-0]\n\nTASK [insert the azure disk config into the node] ******************************\nchanged: [ocpcluster-node-0] => (item={u'key': u'kubeletArguments.cloud-config', u'value':
[u'/etc/azure/azure.conf']})\nchanged: [ocpcluster-node-0] => (item={u'key': u'kubeletArguments.cloud-provider', u'value': [u'azure']})\n\nRUNNING HANDLER [restart atomic-openshift-node] ********************************\nchanged: [ocpcluster-node-0]\n\nPLAY RECAP
*********************************************************************\nocpcluster-infra-0 : ok=4 changed=4 unreachable=0 failed=0 \nocpcluster-infra-1 : ok=4 changed=4 unreachable=0 failed=0 \nocpcluster-node-0 : ok=4
changed=4 unreachable=0 failed=0 \n\nThu Mar 22 00:28:23 UTC 2018 - Cloud Provider setup of node config on App Nodes completed successfully\nThu Mar 22 00:28:23 UTC 2018 - Sleep for 120\n\nPLAY [masters]
*****************************************************************\n\nTASK [set masters as unschedulable] ********************************************\nchanged: [ocpcluster-master-0]\nchanged: [ocpcluster-master-2]\nchanged: [ocpcluster-master-1]\n\nPLAY RECAP
*********************************************************************\nocpcluster-master-0 : ok=1 changed=1 unreachable=0 failed=0 \nocpcluster-master-1 : ok=1 changed=1 unreachable=0 failed=0 \nocpcluster-master-2 : ok=1
changed=1 unreachable=0 failed=0 \n\nThu Mar 22 00:30:25 UTC 2018 - Cloud Provider setup of OpenShift Cluster completed successfully\nThu Mar 22 00:30:25 UTC 2018 - Rebooting cluster to complete installation\n\n[stderr]\n % Total % Received % Xferd
Average Speed Time Time Time Current\n Dload Upload Total Spent Left Speed\n\r 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0\r100 6 100 6 0 0 590 0 --:--:--
--:--:-- --:--:-- 666\n[DEPRECATION WARNING]: 'include' for playbook includes. You should use \n'import_playbook' instead. This feature will be removed in version 2.8. \nDeprecation warnings can be disabled by setting deprecation_warnings=False in
\nansible.cfg.\n[DEPRECATION WARNING]: 'include' for playbook includes. You should use \n'import_playbook' instead. This feature will be removed in version 2.8. \nDeprecation warnings can be disabled by setting deprecation_warnings=False in
\nansible.cfg.\n[DEPRECATION WARNING]: The use of 'static' for 'include_role' has been \ndeprecated. Use 'import_role' for static inclusion, or 'include_role' for \ndynamic inclusion. This feature will be removed in a future release. \nDeprecation warnings can be
disabled by setting deprecation_warnings=False in \nansible.cfg.\n[DEPRECATION WARNING]: The use of 'include' for tasks has been deprecated. Use \n'import_tasks' for static inclusions or 'include_tasks' for dynamic inclusions.\n This feature will be removed in a
future release. Deprecation warnings can be \ndisabled by setting deprecation_warnings=False in ansible.cfg.\n[DEPRECATION WARNING]: include is kept for backwards compatibility but usage is\n discouraged. The module documentation details page may explain more about
this\n rationale.. This feature will be removed in a future release. Deprecation \nwarnings can be disabled by setting deprecation_warnings=False in ansible.cfg.\n[DEPRECATION WARNING]: The use of 'static' has been deprecated. Use \n'import_tasks' for static
inclusion, or 'include_tasks' for dynamic inclusion. \nThis feature will be removed in a future release. Deprecation warnings can be \ndisabled by setting deprecation_warnings=False in ansible.cfg.\n [WARNING]: Could not match supplied host pattern, ignoring:
oo_all_hosts\n [WARNING]: Could not match supplied host pattern, ignoring: oo_lb_to_config\n [WARNING]: Could not match supplied host pattern, ignoring: oo_nfs_to_config\n [WARNING]: Consider using yum, dnf or zypper module rather than running rpm\n [WARNING]:
Consider using unarchive module rather than running tar\n [WARNING]: Consider using get_url or uri module rather than running curl\n [WARNING]: Could not match supplied host pattern, ignoring:\noo_containerized_master_nodes\n [WARNING]: Could not match supplied
host pattern, ignoring:\noo_nodes_use_flannel\n [WARNING]: Could not match supplied host pattern, ignoring:\noo_nodes_use_calico\n [WARNING]: Could not match supplied host pattern, ignoring:\noo_nodes_use_contiv\n [WARNING]: Could not match supplied host pattern,
ignoring: oo_nodes_use_kuryr\n [WARNING]: Could not match supplied host pattern, ignoring: oo_nodes_use_nuage\n [WARNING]: Could not match supplied host pattern, ignoring: glusterfs\n [WARNING]: Could not match supplied host pattern, ignoring: glusterfs_registry\n
[WARNING]: Module did not set no_log for stats_password\n [WARNING]: Module did not set no_log for external_host_password\nWarning: Permanently added 'ocpcluster-master-0,10.1.0.9' (ECDSA) to the list of known hosts.\r\nerror: 'openshift-infra' already has a value
(apiserver), and --overwrite is false\n"."\r\n }\r\n ]\r\n }\r\n}"
}
]
}
]
}
}'
At line:1 char:1

New-AzureRmResourceGroupDeployment -Name OCPDeploy -ResourceGroupName ...

  • CategoryInfo : NotSpecified: (:) [New-AzureRmResourceGroupDeployment], Exception
  • FullyQualifiedErrorId : Microsoft.Azure.Commands.ResourceManager.Cmdlets.Implementation.NewAzureResourceGroupDeploymentCmdlet

New-AzureRmResourceGroupDeployment : 5:30:59 PM - At least one resource deployment operation failed. Please list deployment operations for details. Please see https://aka.ms/arm-debug for usage details.
At line:1 char:1

New-AzureRmResourceGroupDeployment -Name OCPDeploy -ResourceGroupName ...

  • CategoryInfo : NotSpecified: (:) [New-AzureRmResourceGroupDeployment], Exception
  • FullyQualifiedErrorId : Microsoft.Azure.Commands.ResourceManager.Cmdlets.Implementation.NewAzureResourceGroupDeploymentCmdlet

New-AzureRmResourceGroupDeployment : 5:30:59 PM - Template output evaluation skipped: at least one resource deployment operation failed. Please list deployment operations for details. Please see https://aka.ms/arm-debug for usage details.
At line:1 char:1

New-AzureRmResourceGroupDeployment -Name OCPDeploy -ResourceGroupName ...

  • CategoryInfo : NotSpecified: (:) [New-AzureRmResourceGroupDeployment], Exception
  • FullyQualifiedErrorId : Microsoft.Azure.Commands.ResourceManager.Cmdlets.Implementation.NewAzureResourceGroupDeploymentCmdlet

New-AzureRmResourceGroupDeployment : 5:30:59 PM - Template output evaluation skipped: at least one resource deployment operation failed. Please list deployment operations for details. Please see https://aka.ms/arm-debug for usage details.
At line:1 char:1

New-AzureRmResourceGroupDeployment -Name OCPDeploy -ResourceGroupName ...

  • CategoryInfo : NotSpecified: (:) [New-AzureRmResourceGroupDeployment], Exception
  • FullyQualifiedErrorId : Microsoft.Azure.Commands.ResourceManager.Cmdlets.Implementation.NewAzureResourceGroupDeploymentCmdlet

Details:
Deployment mode: Powershell
Command: PS C:\WINDOWS\system32> New-AzureRmResourceGroupDeployment -Name OCPDeploy -ResourceGroupName OCPRG -TemplateFile C:\Users\mandava\openshift-container-platform-master\azuredeploy.json -TemplateParameterFile C:\Users\mandava\openshift-container-platform-master\azuredeploy
.parameters.json
Docker version: 1.12.6
OpenShift version: 3.7
Instructions followed from: https://docs.microsoft.com/en-us/azure/virtual-machines/linux/openshift-prerequisites

I did look into the logs in folders "0" and "1" on the bastion node and did not find any more details beyond what's in the failure. I did not find anything wrong in the deployOpenShift.sh script, and I do not see a parameter "--overwrite" whose value could be changed to "true".
Attaching the logs from the "0" and "1" folders for reference, along with the scripts I'm using. Please suggest a solution for this failure. Thank you!

3.7 support

Before I try to deploy a new cluster, I just wanted to see if Origin 3.7 is supported with the Azure deployment scripts.

docker cannot be started due to the OPTION configuration

I am trying to install OCP 3.11 with these scripts, and during step 1 it shows that the docker daemon cannot be started on the masters and nodes.

I guess it may be related to this (in masterPrep.sh and nodePrep.sh):

# Update docker storage
echo "
# Adding insecure-registry option required by OpenShift
OPTIONS=\"$OPTIONS --insecure-registry 172.30.0.0/16\"
" >> /etc/sysconfig/docker

Where does $OPTIONS come from? I cannot find any variable linked to it, and if "$OPTIONS" ends up literally in /etc/sysconfig/docker, docker indeed cannot be started.

So could you help with this? Thank you.
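In case it helps: depending on whether the $ is escaped in the script, that line either gets the prep shell's (usually empty) $OPTIONS expanded at write time, or a literal $OPTIONS written into the file. On RHEL 7, docker.service typically loads /etc/sysconfig/docker as a systemd EnvironmentFile, where shell expansion does not happen, so a literal $OPTIONS there is passed verbatim to dockerd and can prevent it from starting. A sketch of an alternative that merges the flag into the existing OPTIONS= line instead of appending a second assignment (hypothetical, not the repo's fix):

# Assumes the stock RHEL 7 line looks like: OPTIONS='--selinux-enabled --log-driver=journald ...'
sed -i "s|^OPTIONS='\(.*\)'|OPTIONS='\1 --insecure-registry 172.30.0.0/16'|" /etc/sysconfig/docker
# Verify the merged line, then restart docker
grep '^OPTIONS=' /etc/sysconfig/docker
systemctl restart docker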

deployOpenShift.sh unable to find /etc/ansible/ansible.cfg

Deployment failed. Correlation ID: 7a334184-682a-48d4-bec6-041a67552f9c. {
"status": "Failed",
"error": {
"code": "ResourceDeploymentFailure",
"message": "The resource operation completed with terminal provisioning state 'Failed'.",
"details": [
{
"code": "DeploymentFailed",
"message": "At least one resource deployment operation failed. Please list deployment operations for details. Please see https://aka.ms/arm-debug for usage details.",
"details": [
{
"code": "Conflict",
"message": "{\r\n "status": "Failed",\r\n "error": {\r\n "code": "ResourceDeploymentFailure",\r\n "message": "The resource operation completed with terminal provisioning state 'Failed'.",\r\n "details": [\r\n {\r\n "code": "VMExtensionProvisioningError",\r\n "message": "VM has reported a failure when processing extension 'deployOpenShift'. Error message: \"Enable failed: failed to execute command: command terminated with exit status=2\n[stdout]\nThu May 17 10:25:46 UTC 2018 - Starting Script\nConfiguring SSH ControlPath to use shorter path name\n\n[stderr]\n % Total % Received % Xferd Average Speed Time Time Time Current\n Dload Upload Total Spent Left Speed\n\r 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0\r100 13 100 13 0 0 1013 0 --:--:-- --:--:-- --:--:-- 1083\nsed: can't read /etc/ansible/ansible.cfg: No such file or directory\n\"."\r\n }\r\n ]\r\n }\r\n}"
}
]
}
]
}
}

A snippet from deployOpenShift.sh:
echo "Configuring SSH ControlPath to use shorter path name"

sed -i -e "s/^# control_path = %(directory)s/%%h-%%r/control_path = %(directory)s/%%h-%%r/" /etc/ansible/ansible.cfg
sed -i -e "s/^#host_key_checking = False/host_key_checking = False/" /etc/ansible/ansible.cfg
sed -i -e "s/^#pty=False/pty=False/" /etc/ansible/ansible.cfg

Shouldn't all of the node prep scripts invoke yum -y install ansible?
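A small sketch of how the prep step could be made robust against a missing ansible.cfg (assumption: installing Ansible on those hosts is acceptable), using an alternate sed delimiter so the '/' inside the control_path value needs no escaping:

# Make sure Ansible (and therefore /etc/ansible/ansible.cfg) exists before editing it
rpm -q ansible || yum -y install ansible
sed -i -e 's|^# control_path = %(directory)s/%%h-%%r|control_path = %(directory)s/%%h-%%r|' /etc/ansible/ansible.cfg
sed -i -e 's|^#host_key_checking = False|host_key_checking = False|' /etc/ansible/ansible.cfg
sed -i -e 's|^#pty=False|pty=False|' /etc/ansible/ansible.cfg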

Deployment failed: GlusterFS not starting on CNS-nodes

Describe the bug
Deployment of the scripts fails on all CNS nodes due to the CNS containers being unable to start. The only output we can see is: "env variable is set. Update in gluster-blockd.service."

Checking the pods showed:

[occlusteradmin@poc-azure-gluster-master-0 ~]$ date
Fr 15. Feb 18:09:13 UTC 2019
[occlusteradmin@poc-azure-gluster-master-0 ~]$ oc get pods
NAME                      READY     STATUS    RESTARTS   AGE
glusterfs-storage-9mzxn   0/1       Running   2          47m
glusterfs-storage-k2pwh   0/1       Running   2          47m
glusterfs-storage-vn59v   0/1       Running   2          47m

We suspected the kernel was not compatible with the GlusterFS version.

Kernel version of the nodes:

[root@poc-azure-gluster-cns-2 occlusteradmin]# date
Fr 15. Feb 18:11:56 UTC 2019
[root@poc-azure-gluster-cns-2 occlusteradmin]# uname -r
3.10.0-862.11.6.el7.x86_64

We did an upgrade of the kernel to version:

[occlusteradmin@poc-azure-gluster-cns-2 ~]$ date
Fr 15. Feb 18:15:43 UTC 2019
[occlusteradmin@poc-azure-gluster-cns-2 ~]$ uname -r
3.10.0-957.5.1.el7.x86_64

Afterwards the pod is starting successfully:

[occlusteradmin@poc-azure-gluster-master-0 ~]$ date
Fr 15. Feb 18:16:57 UTC 2019
[occlusteradmin@poc-azure-gluster-master-0 ~]$ oc get pods -o wide
NAME                      READY     STATUS    RESTARTS   AGE       IP         NODE                      NOMINATED NODE
glusterfs-storage-9mzxn   0/1       Running   2          1h        10.3.0.9   poc-azure-gluster-cns-0   <none>
glusterfs-storage-k2pwh   1/1       Running   3          1h        10.3.0.4   poc-azure-gluster-cns-2   <none>
glusterfs-storage-vn59v   0/1       Running   2          1h        10.3.0.7   poc-azure-gluster-cns-1   <none>
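A minimal sketch of such a kernel upgrade on a RHEL 7 node (assumes the newer kernel is available from the attached repositories and a reboot is acceptable):

# Pull the newer kernel and reboot into it
sudo yum -y update kernel
sudo reboot
# after the node is back, verify:
uname -r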


Template Information (please complete the following information):

  • Branch: master

Additional context
To deploy the scripts we used these parameters

Memory and CPU prerequisites

I've noticed that the memory and CPU prerequisites are ignored; is there any special reason for this?
It seems to me the minimum node spec should be Standard_DS3_v2 and the minimum master spec should be Standard_DS4_v2.

Randomly master-2 only has 32GB of disk

master-0 and master-1 are fine, but occasionally master-2 only has 32 GB assigned, which causes the deployment to fail. The infra and app nodes are fine too. It seems to be specific to master-2.

I don't know how this would happen since it is a loop. I've seen it three times in the last few days of testing.

CHECK [memory_availability : ocpcluster-master-2] ****************************** fatal: [ocpcluster-master-2]: FAILED! => {"changed": true, "checks": {"disk_availability": {"failed": true, "failures": [["OpenShiftCheckException", "Available disk space in \"/var\" (31.2 GB) is below minimum recommended (40.0 GB)"]], "msg": "Available disk space in \"/var\" (31.2 GB) is below minimum recommended (40.0 GB)"}, "docker_image_availability": {"skipped": true, "skipped_reason": "Disabled by user request"}, "docker_storage": {"changed": true, "data_pct_used": 0.015188001463989655, "data_threshold": 90.0, "data_total": "54.69 GB", "data_total_bytes": 58722940354.56, "data_used": "20.45 MB", "data_used_bytes": 21443379.2, "metadata_pct_used": 9.139437777983693e-05, "metadata_threshold": 90.0, "metadata_total": "138.4 MB", "metadata_total_bytes": 145122918.4, "metadata_used": "73.73 kB", "metadata_used_bytes": 75499.52, "msg": "Thinpool usage is within thresholds.", "vg_free": "76.80g", "vg_free_bytes": 82463372083.2}, "memory_availability": {"skipped": true, "skipped_reason": "Disabled by user request"}, "package_availability": {"changed": false, "invocation": {"module_args": {"packages": ["PyYAML", "atomic-openshift", "atomic-openshift-clients", "atomic-openshift-master", "atomic-openshift-node", "atomic-openshift-sdn-ovs", "bash-completion", "bind", "ceph-common", "cockpit-bridge", "cockpit-docker", "cockpit-system", "cockpit-ws", "dnsmasq", "docker", "etcd", "firewalld", "flannel", "glusterfs-fuse", "httpd-tools", "iptables", "iptables-services", "iscsi-initiator-utils", "libselinux-python", "nfs-utils", "ntp", "openssl", "pyparted", "python-httplib2", "yum-utils"]}}}, "package_version": {"changed": false, "invocation": {"module_args": {"package_list": [{"check_multi": false, "name": "openvswitch", "version": ["2.6", "2.7", "2.8"]}, {"check_multi": false, "name": "docker", "version": "1.12"}, {"check_multi": true, "name": "atomic-openshift", "version": "3.7"}, {"check_multi": true, "name": "atomic-openshift-master", "version": "3.7"}, {"check_multi": true, "name": "atomic-openshift-node", "version": "3.7"}], "package_mgr": "yum"}}}}, "msg": "One or more checks failed", "playbook_context": "install"}

logging installation on 3.7 branch fails with azure cloud provider enabled

When using the 3.7 branch with the Azure cloud provider enabled, the installation of the logging component fails because the PVC for Elasticsearch contains spec selectors:
openshift_logging_storage_labels={'storage': 'logging'}

Label selectors are not supported by azure-disk (see kubernetes/kubernetes#58413), so for the installation to complete the line above should be commented out. I see that it is indeed commented out on master but not on the release-3.7 branch.
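For anyone hitting this on the release-3.7 branch before it is fixed there, the workaround is simply to comment that variable out in the inventory used for the logging playbook, e.g.:

# openshift_logging_storage_labels={'storage': 'logging'}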

Setup of Container Native Storage

Hey there,

What do you think about a pull request which adds support for running OpenShift CNS (Container Native Storage)?

More info here: https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.3/html/container-native_storage_for_openshift_container_platform/

TL;DR: Tight integration with Gluster-based software-defined storage. Provisioning, setup, and configuration of persistent storage for pods happen automatically. Communication goes via a heketi pod to the Gluster nodes, which can also run as pods on specific nodes that have dedicated disk devices. Gluster then takes responsibility for replicating the data on the local disks across the different CNS nodes.

The initial configuration I have in mind is re-using the infra nodes to deploy the Gluster and heketi pods on, as it doesn't require much additional work. It's also a supported CNS configuration.

Hitting the following error

While deploying via the portal with all default parameters (logging, Azure cloud provider, metrics, and Tower set to false).

Hitting the following error:
{"code":"DeploymentFailed","message":"At least one resource deployment operation failed. Please list deployment operations for details. Please see https://aka.ms/arm-debug for usage details.","details":[{"code":"Conflict","message":"{\r\n \"status\": \"Failed\",\r\n \"error\": {\r\n \"code\": \"ResourceDeploymentFailure\",\r\n \"message\": \"The resource operation completed with terminal provisioning state 'Failed'.\",\r\n \"details\": [\r\n {\r\n \"code\": \"DeploymentFailed\",\r\n \"message\": \"At least one resource deployment operation failed. Please list deployment operations for details. Please see https://aka.ms/arm-debug for usage details.\",\r\n \"details\": [\r\n {\r\n \"code\": \"Conflict\",\r\n \"message\": \"{\\r\\n \\\"status\\\": \\\"Failed\\\",\\r\\n \\\"error\\\": {\\r\\n \\\"code\\\": \\\"ResourceDeploymentFailure\\\",\\r\\n \\\"message\\\": \\\"The resource operation completed with terminal provisioning state 'Failed'.\\\",\\r\\n \\\"details\\\": [\\r\\n {\\r\\n \\\"code\\\": \\\"VMExtensionProvisioningError\\\",\\r\\n \\\"message\\\": \\\"VM has reported a failure when processing extension 'deployOpenShift'. Error message: \\\\\\\"Enable failed: failed to execute command: command terminated with exit status=2\\\\n[stdout]\\\\ning (24 retries left).\\\\nFAILED - RETRYING: Verify that TSB is running (23 retries left).\\\\nFAILED - RETRYING: Verify that TSB is running (22 retries left).\\\\nFAILED - RETRYING: Verify that TSB is running (21 retries left).\\\\nFAILED - RETRYING: Verify that TSB is running (20 retries left).\\\\nFAILED - RETRYING: Verify that TSB is running (19 retries left).\\\\nFAILED - RETRYING: Verify that TSB is running (18 retries left).\\\\nFAILED - RETRYING: Verify that TSB is running (17 retries left).\\\\nFAILED - RETRYING: Verify that TSB is running (16 retries left).\\\\nFAILED - RETRYING: Verify that TSB is running (15 retries left).\\\\nFAILED - RETRYING: Verify that TSB is running (14 retries left).\\\\nFAILED - RETRYING: Verify that TSB is running (13 retries left).\\\\nFAILED - RETRYING: Verify that TSB is running (12 retries left).\\\\nFAILED - RETRYING: Verify that TSB is running (11 retries left).\\\\nFAILED - RETRYING: Verify that TSB is running (10 retries left).\\\\nFAILED - RETRYING: Verify that TSB is running (9 retries left).\\\\nFAILED - RETRYING: Verify that TSB is running (8 retries left).\\\\nFAILED - RETRYING: Verify that TSB is running (7 retries left).\\\\nFAILED - RETRYING: Verify that TSB is running (6 retries left).\\\\nFAILED - RETRYING: Verify that TSB is running (5 retries left).\\\\nFAILED - RETRYING: Verify that TSB is running (4 retries left).\\\\nFAILED - RETRYING: Verify that TSB is running (3 retries left).\\\\nFAILED - RETRYING: Verify that TSB is running (2 retries left).\\\\nFAILED - RETRYING: Verify that TSB is running (1 retries left).\\\\nfatal: [mycluster-master-0]: FAILED! 
=> {\\\\\\\"attempts\\\\\\\": 120, \\\\\\\"changed\\\\\\\": false, \\\\\\\"cmd\\\\\\\": [\\\\\\\"curl\\\\\\\", \\\\\\\"-k\\\\\\\", \\\\\\\"https://apiserver.openshift-template-service-broker.svc/healthz\\\\\\\"], \\\\\\\"delta\\\\\\\": \\\\\\\"0:00:01.023344\\\\\\\", \\\\\\\"end\\\\\\\": \\\\\\\"2018-02-07 10:03:31.482913\\\\\\\", \\\\\\\"msg\\\\\\\": \\\\\\\"non-zero return code\\\\\\\", \\\\\\\"rc\\\\\\\": 7, \\\\\\\"start\\\\\\\": \\\\\\\"2018-02-07 10:03:30.459569\\\\\\\", \\\\\\\"stderr\\\\\\\": \\\\\\\" % Total % Received % Xferd Average Speed Time Time Time Current\\\\\\\\n Dload Upload Total Spent Left Speed\\\\\\\\n\\\\\\\\r 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0\\\\\\\\r 0 0 0 0 0 0 0 0 --:--:-- 0:00:01 --:--:-- 0curl: (7) Failed connect to apiserver.openshift-template-service-broker.svc:443; Connection refused\\\\\\\", \\\\\\\"stderr_lines\\\\\\\": [\\\\\\\" % Total % Received % Xferd Average Speed Time Time Time Current\\\\\\\", \\\\\\\" Dload Upload Total Spent Left Speed\\\\\\\", \\\\\\\"\\\\\\\", \\\\\\\" 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0\\\\\\\", \\\\\\\" 0 0 0 0 0 0 0 0 --:--:-- 0:00:01 --:--:-- 0curl: (7) Failed connect to apiserver.openshift-template-service-broker.svc:443; Connection refused\\\\\\\"], \\\\\\\"stdout\\\\\\\": \\\\\\\"\\\\\\\", \\\\\\\"stdout_lines\\\\\\\": []}\\\\n\\\\nPLAY RECAP *********************************************************************\\\\nlocalhost : ok=12 changed=0 unreachable=0 failed=0 \\\\nmycluster-infra-0 : ok=192 changed=64 unreachable=0 failed=0 \\\\nmycluster-infra-1 : ok=192 changed=64 unreachable=0 failed=0 \\\\nmycluster-master-0 : ok=655 changed=270 unreachable=0 failed=1 \\\\nmycluster-master-1 : ok=421 changed=153 unreachable=0 failed=0 \\\\nmycluster-master-2 : ok=421 changed=153 unreachable=0 failed=0 \\\\nmycluster-node-0 : ok=192 changed=64 unreachable=0 failed=0 \\\\nmycluster-node-1 : ok=192 changed=64 unreachable=0 failed=0 \\\\n\\\\n\\\\nINSTALLER STATUS ***************************************************************\\\\nInitialization : Complete\\\\nHealth Check : Complete\\\\netcd Install : Complete\\\\nMaster Install : Complete\\\\nMaster Additional Install : Complete\\\\nNode Install : Complete\\\\nHosted Install : Complete\\\\nService Catalog Install : In Progress\\\\n\\\\tThis phase can be restarted by running: playbooks/byo/openshift-cluster/service-catalog.yml\\\\n\\\\n\\\\n\\\\nFailure summary:\\\\n\\\\n\\\\n 1. Hosts: mycluster-master-0\\\\n Play: Service Catalog\\\\n Task: Verify that TSB is running\\\\n Message: non-zero return code\\\\n\\\\n[stderr]\\\\n % Total % Received % Xferd Average Speed Time Time Time Current\\\\n Dload Upload Total Spent Left Speed\\\\n\\\\r 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0\\\\r100 6 100 6 0 0 593 0 --:--:-- --:--:-- --:--:-- 666\\\\n[DEPRECATION WARNING]: 'include' for playbook includes. You should use \\\\n'import_playbook' instead. This feature will be removed in version 2.8. \\\\nDeprecation warnings can be disabled by setting deprecation_warnings=False in \\\\nansible.cfg.\\\\n[DEPRECATION WARNING]: 'include' for playbook includes. You should use \\\\n'import_playbook' instead. This feature will be removed in version 2.8. \\\\nDeprecation warnings can be disabled by setting deprecation_warnings=False in \\\\nansible.cfg.\\\\n[DEPRECATION WARNING]: The use of 'static' for 'include_role' has been \\\\ndeprecated. Use 'import_role' for static inclusion, or 'include_role' for \\\\ndynamic inclusion. This feature will be removed in a future release. 
\\\\nDeprecation warnings can be disabled by setting deprecation_warnings=False in \\\\nansible.cfg.\\\\n[DEPRECATION WARNING]: The use of 'include' for tasks has been deprecated. Use \\\\n'import_tasks' for static inclusions or 'include_tasks' for dynamic inclusions.\\\\n This feature will be removed in a future release. Deprecation warnings can be \\\\ndisabled by setting deprecation_warnings=False in ansible.cfg.\\\\n[DEPRECATION WARNING]: include is kept for backwards compatibility but usage is\\\\n discouraged. The module documentation details page may explain more about this\\\\n rationale.. This feature will be removed in a future release. Deprecation \\\\nwarnings can be disabled by setting deprecation_warnings=False in ansible.cfg.\\\\n[DEPRECATION WARNING]: The use of 'static' has been deprecated. Use \\\\n'import_tasks' for static inclusion, or 'include_tasks' for dynamic inclusion. \\\\nThis feature will be removed in a future release. Deprecation warnings can be \\\\ndisabled by setting deprecation_warnings=False in ansible.cfg.\\\\n [WARNING]: Could not match supplied host pattern, ignoring: oo_all_hosts\\\\n [WARNING]: Could not match supplied host pattern, ignoring: oo_lb_to_config\\\\n [WARNING]: Could not match supplied host pattern, ignoring: oo_nfs_to_config\\\\n [WARNING]: Consider using yum, dnf or zypper module rather than running rpm\\\\n [WARNING]: Consider using unarchive module rather than running tar\\\\n [WARNING]: Consider using get_url or uri module rather than running curl\\\\n [WARNING]: Could not match supplied host pattern, ignoring:\\\\noo_containerized_master_nodes\\\\n [WARNING]: Could not match supplied host pattern, ignoring:\\\\noo_nodes_use_flannel\\\\n [WARNING]: Could not match supplied host pattern, ignoring:\\\\noo_nodes_use_calico\\\\n [WARNING]: Could not match supplied host pattern, ignoring:\\\\noo_nodes_use_contiv\\\\n [WARNING]: Could not match supplied host pattern, ignoring: oo_nodes_use_kuryr\\\\n [WARNING]: Could not match supplied host pattern, ignoring: oo_nodes_use_nuage\\\\n [WARNING]: Could not match supplied host pattern, ignoring: glusterfs\\\\n [WARNING]: Could not match supplied host pattern, ignoring: glusterfs_registry\\\\n [WARNING]: Module did not set no_log for stats_password\\\\n [WARNING]: Module did not set no_log for external_host_password\\\\n [WARNING]: Could not create retry file '/usr/share/ansible/openshift-\\\\nansible/playbooks/byo/config.retry'. [Errno 13] Permission denied:\\\\nu'/usr/share/ansible/openshift-ansible/playbooks/byo/config.retry'\\\\n\\\\\\\".\\\"\\r\\n }\\r\\n ]\\r\\n }\\r\\n}\"\r\n }\r\n ]\r\n }\r\n ]\r\n }\r\n}"}]}

failed to mount azure disks

The deployment failed to start the logging/metrics pods.
While the Azure storage class is able to create the blobs in the Azure storage account, the disks aren't discovered/mounted:

Unable to mount volumes for pod "hawkular-cassandra-1-96ps3_openshift-infra(6fce6fb3-eb0a-11e7-bcd0-000d3a23c3e8)": timeout expired waiting for volumes to attach/mount for pod "openshift-infra"/"hawkular-cassandra-1-96ps3". list of unattached/unmounted volumes=[cassandra-data]

Running scope as unit run-84014.scope. mount: special device /var/lib/origin/openshift.local.volumes/plugins/kubernetes.io/azure-disk/mounts/kubernetes-dynamic-pvc-69d650dd-eb0a-11e7-84c4-000d3a23c124.vhd does not exist

Deployment keeps failing if system is already registered

Hi,

it seems that the Azure deployment keeps failing even after PR #75.

In file

openshift-container-platform/scripts/bastionPrep.sh

and the other preparation scripts:

subscription-manager register --username="$USERNAME_ORG" --password="$PASSWORD_ACT_KEY" || subscription-manager register --activationkey="$PASSWORD_ACT_KEY" --org="$USERNAME_ORG"

if [ $? -eq 0 ]
then
   echo "Subscribed successfully"
elif [ $? -eq 64 ]
then
           echo "This system is already registered."
else
   echo "Incorrect Username / Password or Organization ID / Activation Key specified"
   exit 3
fi

The "$?" value is consumed by the first IF, it should be saved in a variable to keep checking it in a IF/ELSIF construct.

I've also checked that the next IF will not always match the exact string (sometimes subscription-manager outputs more than one line, for example while re-generating certificates):


subscription-manager attach --pool=$POOL_ID > attach.log
if [ $? -eq 0 ]
then
   echo "Pool attached successfully"
else
   evaluate=$( cut -f 2-5 -d ' ' attach.log )
   if [[ $evaluate == "unit has already had" ]]
      then
         echo "Pool $POOL_ID was already attached and was not attached again."
	  else
         echo "Incorrect Pool ID or no entitlements available"
         exit 4
   fi
fi
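And a sketch for the pool-attach check, grepping attach.log for the known message instead of comparing an exact word range, so extra subscription-manager output (e.g. certificate regeneration notices) does not break the match:

if subscription-manager attach --pool=$POOL_ID > attach.log 2>&1
then
   echo "Pool attached successfully"
elif grep -q "unit has already had" attach.log
then
   echo "Pool $POOL_ID was already attached and was not attached again."
else
   echo "Incorrect Pool ID or no entitlements available"
   exit 4
fi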

I'll submit a PR for fixing these issues.

Thanks.

Deployment Failure

Deployed as following:

  1. docker run -ti docker4x/create-sp-azure openshiftsp

Your access credentials ================================================== AD ServicePrincipal App ID: xxxxxx AD ServicePrincipal App Secret: xxxxxx AD ServicePrincipal Tenant ID: xxxxxx

  1. az group create -n openshiftkvrg -l 'East US' && az keyvault create -n openshiftprivkeys -g openshiftkvrg -l 'East US' --enabled-for-template-deployment true && az keyvault secret set --vault-name <<vault>> -n <<secret>> --file id_rsa

Deploying a total of 10 VMs -> 1 bastion / 3 CNS / 3 masters / 2 infra / 1 node

  1. az group create -l eastus -n openshiftclusterrg && az group deployment create -g openshiftclusterrg -n openshiftcluster --template-uri https://raw.githubusercontent.com/microsoft/openshift-container-platform/master/azuredeploy.json --parameters "{\"openshiftPassword\":{\"value\":\"<<password>>\"},\"enableMetrics\":{\"value\":\"true\"},\"enableLogging\":{\"value\":\"true\"},\"keyVaultResourceGroup\":{\"value\":\"openshiftkvrg\"},\"sshPublicKey\":{\"value\":\"<<ssh public key>>\"},\"infraInstanceCount\":{\"value\": 2},\"nodeInstanceCount\":{\"value\": 1},\"rhsmUsernameOrOrgId\":{\"value\":\"<<rhsm username>>\"},\"rhsmPasswordOrActivationKey\":{\"value\":\"<<rhsm password>>"},\"rhsmPoolId\":{\"value\":\"<<xxxxxx>>"},\"keyVaultName\":{\"value\":\"<<vaultname>>\"},\"keyVaultSecret\":{\"value\":\"<<secret>>\"},\"aadClientId\":{\"value\":\"<<xxxxxx>>\"},\"aadClientSecret\":{\"value\":\"<<xxxxxx>>\"}}" --debug

Hitting the following error:

{"code":"DeploymentFailed","message":"At least one resource deployment operation failed. Please list deployment operations for details. Please see https://aka.ms/arm-debug for usage details.","details":[{"code":"Conflict","message":"{\r\n \"status\": \"Failed\",\r\n \"error\": {\r\n \"code\": \"ResourceDeploymentFailure\",\r\n \"message\": \"The resource operation completed with terminal provisioning state 'Failed'.\",\r\n \"details\": [\r\n {\r\n \"code\": \"VMExtensionProvisioningError\",\r\n \"message\": \"VM has reported a failure when processing extension 'deployOpenShift'. Error message: \\\"Enable failed: failed to execute command: command terminated with exit status=2\\n[stdout]\\nodeSelector\\\": {\\\"glusterfs\\\": \\\"storage-host\\\"}, \\\"restartPolicy\\\": \\\"Always\\\", \\\"schedulerName\\\": \\\"default-scheduler\\\", \\\"securityContext\\\": {}, \\\"serviceAccount\\\": \\\"default\\\", \\\"serviceAccountName\\\": \\\"default\\\", \\\"terminationGracePeriodSeconds\\\": 30, \\\"tolerations\\\": [{\\\"effect\\\": \\\"NoSchedule\\\", \\\"key\\\": \\\"node.kubernetes.io/memory-pressure\\\", \\\"operator\\\": \\\"Exists\\\"}, {\\\"effect\\\": \\\"NoExecute\\\", \\\"key\\\": \\\"node.kubernetes.io/not-ready\\\", \\\"operator\\\": \\\"Exists\\\"}, {\\\"effect\\\": \\\"NoExecute\\\", \\\"key\\\": \\\"node.kubernetes.io/unreachable\\\", \\\"operator\\\": \\\"Exists\\\"}, {\\\"effect\\\": \\\"NoSchedule\\\", \\\"key\\\": \\\"node.kubernetes.io/disk-pressure\\\", \\\"operator\\\": \\\"Exists\\\"}], \\\"volumes\\\": [{\\\"hostPath\\\": {\\\"path\\\": \\\"/var/lib/heketi\\\", \\\"type\\\": \\\"\\\"}, \\\"name\\\": \\\"glusterfs-heketi\\\"}, {\\\"emptyDir\\\": {}, \\\"name\\\": \\\"glusterfs-run\\\"}, {\\\"hostPath\\\": {\\\"path\\\": \\\"/run/lvm\\\", \\\"type\\\": \\\"\\\"}, \\\"name\\\": \\\"glusterfs-lvm\\\"}, {\\\"hostPath\\\": {\\\"path\\\": \\\"/etc/glusterfs\\\", \\\"type\\\": \\\"\\\"}, \\\"name\\\": \\\"glusterfs-etc\\\"}, {\\\"hostPath\\\": {\\\"path\\\": \\\"/var/log/glusterfs\\\", \\\"type\\\": \\\"\\\"}, \\\"name\\\": \\\"glusterfs-logs\\\"}, {\\\"hostPath\\\": {\\\"path\\\": \\\"/var/lib/glusterd\\\", \\\"type\\\": \\\"\\\"}, \\\"name\\\": \\\"glusterfs-config\\\"}, {\\\"hostPath\\\": {\\\"path\\\": \\\"/dev\\\", \\\"type\\\": \\\"\\\"}, \\\"name\\\": \\\"glusterfs-dev\\\"}, {\\\"hostPath\\\": {\\\"path\\\": \\\"/var/lib/misc/glusterfsd\\\", \\\"type\\\": \\\"\\\"}, \\\"name\\\": \\\"glusterfs-misc\\\"}, {\\\"hostPath\\\": {\\\"path\\\": \\\"/sys/fs/cgroup\\\", \\\"type\\\": \\\"\\\"}, \\\"name\\\": \\\"glusterfs-cgroup\\\"}, {\\\"hostPath\\\": {\\\"path\\\": \\\"/etc/ssl\\\", \\\"type\\\": \\\"\\\"}, \\\"name\\\": \\\"glusterfs-ssl\\\"}, {\\\"hostPath\\\": {\\\"path\\\": \\\"/usr/lib/modules\\\", \\\"type\\\": \\\"\\\"}, \\\"name\\\": \\\"kernel-modules\\\"}, {\\\"hostPath\\\": {\\\"path\\\": \\\"/etc/target\\\", \\\"type\\\": \\\"\\\"}, \\\"name\\\": \\\"glusterfs-target\\\"}, {\\\"name\\\": \\\"default-token-4lzcn\\\", \\\"secret\\\": {\\\"defaultMode\\\": 420, \\\"secretName\\\": \\\"default-token-4lzcn\\\"}}]}, \\\"status\\\": {\\\"conditions\\\": [{\\\"lastProbeTime\\\": null, \\\"lastTransitionTime\\\": \\\"2018-06-07T14:31:17Z\\\", \\\"status\\\": \\\"True\\\", \\\"type\\\": \\\"Initialized\\\"}, {\\\"lastProbeTime\\\": null, \\\"lastTransitionTime\\\": \\\"2018-06-07T14:31:17Z\\\", \\\"message\\\": \\\"containers with unready status: [glusterfs]\\\", \\\"reason\\\": \\\"ContainersNotReady\\\", \\\"status\\\": \\\"False\\\", \\\"type\\\": \\\"Ready\\\"}, 
{\\\"lastProbeTime\\\": null, \\\"lastTransitionTime\\\": \\\"2018-06-07T14:31:29Z\\\", \\\"status\\\": \\\"True\\\", \\\"type\\\": \\\"PodScheduled\\\"}], \\\"containerStatuses\\\": [{\\\"image\\\": \\\"rhgs3/rhgs-server-rhel7:latest\\\", \\\"imageID\\\": \\\"\\\", \\\"lastState\\\": {}, \\\"name\\\": \\\"glusterfs\\\", \\\"ready\\\": false, \\\"restartCount\\\": 0, \\\"state\\\": {\\\"waiting\\\": {\\\"reason\\\": \\\"ContainerCreating\\\"}}}], \\\"hostIP\\\": \\\"10.1.0.8\\\", \\\"phase\\\": \\\"Pending\\\", \\\"podIP\\\": \\\"10.1.0.8\\\", \\\"qosClass\\\": \\\"Burstable\\\", \\\"startTime\\\": \\\"2018-06-07T14:31:17Z\\\"}}], \\\"kind\\\": \\\"List\\\", \\\"metadata\\\": {\\\"resourceVersion\\\": \\\"\\\", \\\"selfLink\\\": \\\"\\\"}}], \\\"returncode\\\": 0}, \\\"state\\\": \\\"list\\\"}\\n\\nPLAY RECAP *********************************************************************\\nlocalhost : ok=13 changed=0 unreachable=0 failed=0 \\nmycluster-cns-0 : ok=146 changed=58 unreachable=0 failed=0 \\nmycluster-cns-1 : ok=146 changed=58 unreachable=0 failed=0 \\nmycluster-cns-2 : ok=146 changed=58 unreachable=0 failed=0 \\nmycluster-infra-0 : ok=143 changed=56 unreachable=0 failed=0 \\nmycluster-infra-1 : ok=143 changed=56 unreachable=0 failed=0 \\nmycluster-master-0 : ok=479 changed=188 unreachable=0 failed=1 \\nmycluster-master-1 : ok=349 changed=142 unreachable=0 failed=0 \\nmycluster-master-2 : ok=349 changed=141 unreachable=0 failed=0 \\nmycluster-node-0 : ok=143 changed=56 unreachable=0 failed=0 \\n\\n\\nINSTALLER STATUS ***************************************************************\\nInitialization : Complete (0:00:47)\\nHealth Check : Complete (0:01:47)\\netcd Install : Complete (0:02:51)\\nMaster Install : Complete (0:07:28)\\nMaster Additional Install : Complete (0:05:00)\\nNode Install : Complete (0:13:36)\\nGlusterFS Install : In Progress (0:05:55)\\n\\tThis phase can be restarted by running: playbooks/openshift-glusterfs/config.yml\\n\\n\\n\\nFailure summary:\\n\\n\\n 1. 
Hosts: mycluster-master-0\\n Play: Configure GlusterFS\\n Task: Wait for GlusterFS pods\\n Message: Failed without returning a message.\\n\\n[stderr]\\n % Total % Received % Xferd Average Speed Time Time Time Current\\n Dload Upload Total Spent Left Speed\\n\\r 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0\\r100 6 100 6 0 0 470 0 --:--:-- --:--:-- --:--:-- 500\\n# mycluster-cns-0:22 SSH-2.0-OpenSSH_7.4\\n# mycluster-cns-0:22 SSH-2.0-OpenSSH_7.4\\n# mycluster-cns-0:22 SSH-2.0-OpenSSH_7.4\\nWarning: Permanently added the ECDSA host key for IP address '10.1.0.8' to the list of known hosts.\\r\\n# mycluster-cns-1:22 SSH-2.0-OpenSSH_7.4\\n# mycluster-cns-1:22 SSH-2.0-OpenSSH_7.4\\n# mycluster-cns-1:22 SSH-2.0-OpenSSH_7.4\\nWarning: Permanently added the ECDSA host key for IP address '10.1.0.9' to the list of known hosts.\\r\\n# mycluster-cns-2:22 SSH-2.0-OpenSSH_7.4\\n# mycluster-cns-2:22 SSH-2.0-OpenSSH_7.4\\n# mycluster-cns-2:22 SSH-2.0-OpenSSH_7.4\\nWarning: Permanently added the ECDSA host key for IP address '10.1.0.10' to the list of known hosts.\\r\\n [WARNING]: Could not match supplied host pattern, ignoring: oo_lb_to_config\\n [WARNING]: Could not match supplied host pattern, ignoring: oo_nfs_to_config\\n [WARNING]: Could not match supplied host pattern, ignoring:\\noo_hosts_containerized_managed_true\\n [WARNING]: Could not match supplied host pattern, ignoring: oo_lb_to_config\\n [WARNING]: Could not match supplied host pattern, ignoring: oo_nfs_to_config\\n [WARNING]: Consider using yum, dnf or zypper module rather than running rpm\\n [WARNING]: Consider using file module with mode rather than running chmod\\n [WARNING]: Consider using unarchive module rather than running tar\\n [WARNING]: Consider using get_url or uri module rather than running curl\\n [WARNING]: Could not match supplied host pattern, ignoring:\\noo_containerized_master_nodes\\n [WARNING]: Could not match supplied host pattern, ignoring:\\noo_nodes_use_flannel\\n [WARNING]: Could not match supplied host pattern, ignoring:\\noo_nodes_use_calico\\n [WARNING]: Could not match supplied host pattern, ignoring:\\noo_nodes_use_contiv\\n [WARNING]: Could not match supplied host pattern, ignoring: oo_nodes_use_kuryr\\n [WARNING]: Could not match supplied host pattern, ignoring: oo_nodes_use_nuage\\n [WARNING]: Could not match supplied host pattern, ignoring: glusterfs_registry\\n [WARNING]: Could not create retry file '/usr/share/ansible/openshift-\\nansible/playbooks/deploy_cluster.retry'. [Errno 13] Permission denied:\\nu'/usr/share/ansible/openshift-ansible/playbooks/deploy_cluster.retry'\\n\\\".\"\r\n }\r\n ]\r\n }\r\n}"}]}

Deployment failed on azure

Trying a new deployment on Azure with:
Enable Metrics: True
Enable Logging: True
Enable Cockpit: True
Enable Azure: True
And all other default values.

Deployment failed with the error: failed to start atomic-openshift-node.service on the infra-1 node.
Below is the error on the Azure side:
[mycluster-infra-1]\n\nTASK [populate /etc/azure/azure.conf] ******************************************\nchanged: [mycluster-infra-1]\n\nTASK [insert the azure disk config into the node] ******************************\nchanged: [mycluster-infra-1] => (item={u'value': [u'/etc/azure/azure.conf'], u'key': u'kubeletArguments.cloud-config'})\nchanged: [mycluster-infra-1] => (item={u'value': [u'azure'], u'key': u'kubeletArguments.cloud-provider'})\n\nTASK [delete the node so it can recreate itself] *******************************\nchanged: [mycluster-infra-1 -> mycluster-bastion]\n\nTASK [sleep to let node come back to life] *************************************\nPausing for 90 seconds\n(ctrl+C then 'C' = continue early, ctrl+C then 'A' = abort)\r\nok: [mycluster-infra-1]\n\nRUNNING HANDLER [restart atomic-openshift-node] ********************************\nfatal: [mycluster-infra-1]: FAILED! => {\n "changed": false, \n "failed": true\n}\n\nMSG:\n\nUnable to restart service atomic-openshift-node: Job for atomic-openshift-node.service failed because the control process exited with error code. See "systemctl status atomic-openshift-node.service" and "journalctl -xe" for details.\n\n\tto retry, use: --limit @/home/ocpadmin/setup-azure-node.retry\n\nPLAY RECAP

on node infra-1 >>> #journalctl -xe
Dec 27 13:21:40 mycluster-infra-1 atomic-openshift-node[76756]: I1227 13:21:40.505400 76756 server.go:127] Starting to listen on 0.0.0.0:10250
Dec 27 13:21:40 mycluster-infra-1 atomic-openshift-node[76756]: E1227 13:21:40.508422 76756 kubelet.go:1170] Image garbage collection failed: unable to find data for container /
Dec 27 13:21:40 mycluster-infra-1 atomic-openshift-node[76756]: I1227 13:21:40.508584 76756 kubelet_node_status.go:253] Setting node annotation to enable volume controller attach/detach
Dec 27 13:21:40 mycluster-infra-1 atomic-openshift-node[76756]: I1227 13:21:40.518859 76756 server.go:298] Adding debug handlers to kubelet server.
Dec 27 13:21:40 mycluster-infra-1 atomic-openshift-node[76756]: W1227 13:21:40.547310 76756 sdn_controller.go:38] Could not find an allocated subnet for node: mycluster-infra-1, Waiting...
Dec 27 13:21:40 mycluster-infra-1 atomic-openshift-node[76756]: W1227 13:21:40.750448 76756 sdn_controller.go:38] Could not find an allocated subnet for node: mycluster-infra-1, Waiting...
Dec 27 13:21:41 mycluster-infra-1 atomic-openshift-node[76756]: W1227 13:21:41.153590 76756 sdn_controller.go:38] Could not find an allocated subnet for node: mycluster-infra-1, Waiting...
Dec 27 13:21:41 mycluster-infra-1 atomic-openshift-node[76756]: W1227 13:21:41.956885 76756 sdn_controller.go:38] Could not find an allocated subnet for node: mycluster-infra-1, Waiting...
Dec 27 13:21:43 mycluster-infra-1 atomic-openshift-node[76756]: W1227 13:21:43.560179 76756 sdn_controller.go:38] Could not find an allocated subnet for node: mycluster-infra-1, Waiting...
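A quick check from a master usually shows whether the SDN ever allocated a subnet for the node (sketch, assuming cluster-admin access via the standard oc client):

# List the per-node SDN subnets the master has allocated; a node missing from this
# list matches the "Could not find an allocated subnet" messages above
oc get hostsubnet
oc get nodes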

azuredeploy.json only provides one RHSM Pool ID but my subscription has two

I have two Pool IDs; both are for Red Hat OpenShift Enterprise.
One is the "Red Hat OpenShift Container Platform Broker/Master Infrastructure" and the other is
"Red Hat OpenShift Container Platform, Standard, 2-Core".
It doesn't matter which one I use; the error log states
"Incorrect Pool ID or no entitlements available".
I want 9 systems to connect to the 2-core subscription and the others to the infrastructure subscription.
Hope you can help.

fix samples.json with proper github URL

See https://github.com/Microsoft/openshift-container-platform/blob/master/azuredeploy.parameters.json
and https://github.com/Microsoft/openshift-container-platform/blob/master/azuredeploy.parameters.sample.json
It contains
"_artifactsLocation": {
"value": "https://raw.githubusercontent.com/haroldwongms/openshift-containerplatform/master"
},

In that case, the deployment will fail complaining about diagStorageAccount and Role not being valid inputs in the template.

Even though both files are just samples, for the sake of first-time users, please update the GitHub URL to point to this repo ("https://raw.githubusercontent.com/Microsoft/openshift-container-platform/master") or delete the variable altogether. Otherwise, the error is very esoteric and hard to troubleshoot.
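i.e. something like this in both parameter files (a sketch of the suggested change):

"_artifactsLocation": {
"value": "https://raw.githubusercontent.com/Microsoft/openshift-container-platform/master"
},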

Thanks

OpenShiftDeployment stage failing

My deployment is failing on the "Enabling a static-website in the web storage account" step of the deployOpenShift.sh script. The error message is:

az storage blob service-properties update: error: Storage account 'XXXXXXXXXX' not found.

However, the storage account is definitely there and when I run the following command, it seems to work fine:

az storage blob service-properties update --account-name XXXXXXXXXXXX --static-website
{
  "cors": [],
  "deleteRetentionPolicy": {
    "days": null,
    "enabled": false
  },
  "hourMetrics": {
    "enabled": true,
    "includeApis": true,
    "retentionPolicy": {
      "days": 7,
      "enabled": true
    },
    "version": "1.0"
  },
  "logging": {
    "delete": false,
    "read": false,
    "retentionPolicy": {
      "days": null,
      "enabled": false
    },
    "version": "1.0",
    "write": false
  },
  "minuteMetrics": {
    "enabled": false,
    "includeApis": null,
    "retentionPolicy": {
      "days": null,
      "enabled": false
    },
    "version": "1.0"
  },
  "staticWebsite": {
    "enabled": true,
    "errorDocument_404Path": null,
    "indexDocument": null
  }
}
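When debugging this, it may be worth checking which subscription the az session created by the extension is actually pointed at, since the account clearly exists when the same command is run interactively (a sketch of the checks, not a confirmed root cause):

# Show the subscription the current az login is using
az account show --output table
# Re-run the failing call with verbose output to see which subscription/endpoint it hits
az storage blob service-properties update --account-name XXXXXXXXXXXX --static-website --debug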

Development towards OCP 3.7?

Hi all,

OpenShift Container Platform 3.7 is out. Should all new development perhaps be geared towards 3.7? I'm porting my own stuff to 3.7 now. If you like, I can create a pull request based on my 3.7 findings.

/root/.ssh/id_rsa: No such file or directory

Got this error

[root@ocpcluster-bastion 1]# cat /var/lib/waagent/custom-script/download/1/stderr
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100     6  100     6    0     0    988      0 --:--:-- --:--:-- --:--:--  1200
sed: can't read /etc/ansible/ansible.cfg: No such file or directory
{"status":"Failed","error":{"code":"ResourceDeploymentFailure","message":"The resource operation completed with terminal provisioning state 'Failed'.","details":[{"code":"DeploymentFailed","message":"At least one resource deployment operation failed. Please list deployment operations for details. Please see https://aka.ms/arm-debug for usage details.","details":[{"code":"Conflict","message":"{\r\n \"status\": \"Failed\",\r\n \"error\": {\r\n \"code\": \"ResourceDeploymentFailure\",\r\n \"message\": \"The resource operation completed with terminal provisioning state 'Failed'.\",\r\n \"details\": [\r\n {\r\n \"code\": \"VMExtensionProvisioningError\",\r\n \"message\": \"VM has reported a failure when processing extension 'deployOpenShift'. Error message: \\\"Enable failed: failed to execute command: command terminated with exit status=2\\\".\"\r\n }\r\n ]\r\n }\r\n}"}]}]}}```

Scale up/down and upgrade

Hello folks,

Is it possible to scale the deployment up/down and upgrade it after the initial deployment has been completed? Any pointers on how to do this would be great. Thanks

OCP 3.7 cluster deployment is failing while processing "deployOpenshift"

While deploying an OpenShift cluster (release 3.7), it fails with the following error message.
Can somebody look into this issue?
It was deploying successfully a couple of days back.

error message:

{"code":"DeploymentFailed","message":"At least one resource deployment operation failed. Please list deployment operations for details. Please see https://aka.ms/arm-debug for usage details.","details":[{"code":"Conflict","message":"{\r\n "status": "Failed",\r\n "error": {\r\n "code": "ResourceDeploymentFailure",\r\n "message": "The resource operation completed with terminal provisioning state 'Failed'.",\r\n "details": [\r\n {\r\n "code": "VMExtensionProvisioningError",\r\n "message": "VM has reported a failure when processing extension 'deployOpenShift'. Error message: \"Enable failed: failed to execute command: command terminated with exit status=2\n[stdout]\n: u'', u'rc': 0, u'invocation': {u'module_args': {u'warn': True, u'executable': None, u'_uses_shell': False, u'_raw_params': u\"oc get deploymentconfig router --namespace default --config /etc/origin/master/admin.kubeconfig -o jsonpath='{ .status.latestVersion }'\", u'removes': None, u'creates': None, u'chdir': None, u'stdin': None}}, 'stdout_lines': [u'1'], u'start': u'2018-11-02 06:42:53.992642', '_ansible_ignore_errors': None, 'failed': False}]) => {\"attempts\": 1, \"changed\": true, \"cmd\": [\"oc\", \"get\", \"replicationcontroller\", \"router-1\", \"--namespace\", \"default\", \"--config\", \"/etc/origin/master/admin.kubeconfig\", \"-o\", \"jsonpath={ .metadata.annotations.openshift\\\\.io/deployment\\\\.phase }\"], \"delta\": \"0:00:00.217747\", \"end\": \"2018-11-02 06:42:54.924090\", \"failed_when_result\": true, \"item\": [{\"certificate\": {\"cafile\": \"/etc/origin/master/ca.crt\", \"certfile\": \"/etc/origin/master/openshift-router.crt\", \"keyfile\": \"/etc/origin/master/openshift-router.key\"}, \"edits\": [{\"action\": \"put\", \"key\": \"spec.strategy.rollingParams.intervalSeconds\", \"value\": 1}, {\"action\": \"put\", \"key\": \"spec.strategy.rollingParams.updatePeriodSeconds\", \"value\": 1}, {\"action\": \"put\", \"key\": \"spec.strategy.activeDeadlineSeconds\", \"value\": 21600}], \"images\": \"openshift3/ose-${component}:${version}\", \"name\": \"router\", \"namespace\": \"default\", \"ports\": [\"80:80\", \"443:443\"], \"replicas\": \"1\", \"selector\": \"region=infra\", \"serviceaccount\": \"router\", \"stats_port\": 1936}, {\"_ansible_ignore_errors\": null, \"_ansible_item_result\": true, \"_ansible_no_log\": false, \"_ansible_parsed\": true, \"changed\": true, \"cmd\": [\"oc\", \"get\", \"deploymentconfig\", \"router\", \"--namespace\", \"default\", \"--config\", \"/etc/origin/master/admin.kubeconfig\", \"-o\", \"jsonpath={ .status.latestVersion }\"], \"delta\": \"0:00:00.238601\", \"end\": \"2018-11-02 06:42:54.231243\", \"failed\": false, \"invocation\": {\"module_args\": {\"_raw_params\": \"oc get deploymentconfig router --namespace default --config /etc/origin/master/admin.kubeconfig -o jsonpath='{ .status.latestVersion }'\", \"_uses_shell\": false, \"chdir\": null, \"creates\": null, \"executable\": null, \"removes\": null, \"stdin\": null, \"warn\": true}}, \"item\": {\"certificate\": {\"cafile\": \"/etc/origin/master/ca.crt\", \"certfile\": \"/etc/origin/master/openshift-router.crt\", \"keyfile\": \"/etc/origin/master/openshift-router.key\"}, \"edits\": [{\"action\": \"put\", \"key\": \"spec.strategy.rollingParams.intervalSeconds\", \"value\": 1}, {\"action\": \"put\", \"key\": \"spec.strategy.rollingParams.updatePeriodSeconds\", \"value\": 1}, {\"action\": \"put\", \"key\": \"spec.strategy.activeDeadlineSeconds\", \"value\": 21600}], \"images\": \"openshift3/ose-${component}:${version}\", 
\"name\": \"router\", \"namespace\": \"default\", \"ports\": [\"80:80\", \"443:443\"], \"replicas\": \"1\", \"selector\": \"region=infra\", \"serviceaccount\": \"router\", \"stats_port\": 1936}, \"rc\": 0, \"start\": \"2018-11-02 06:42:53.992642\", \"stderr\": \"\", \"stderr_lines\": [], \"stdout\": \"1\", \"stdout_lines\": [\"1\"]}], \"rc\": 0, \"start\": \"2018-11-02 06:42:54.706343\", \"stderr\": \"\", \"stderr_lines\": [], \"stdout\": \"Failed\", \"stdout_lines\": [\"Failed\"]}\n\nPLAY RECAP *********************************************************************\nlocalhost : ok=12 changed=0 unreachable=0 failed=0 \nmycluster-infra-0 : ok=198 changed=67 unreachable=0 failed=0 \nmycluster-master-0 : ok=548 changed=203 unreachable=0 failed=1 \nmycluster-node-0 : ok=198 changed=67 unreachable=0 failed=0 \n\n\nINSTALLER STATUS ***************************************************************\nInitialization : Complete\nHealth Check : Complete\netcd Install : Complete\nMaster Install : Complete\nMaster Additional Install : Complete\nNode Install : Complete\nHosted Install : In Progress\n\tThis phase can be restarted by running: playbooks/byo/openshift-cluster/openshift-hosted.yml\n\n\n\nFailure summary:\n\n\n 1. Hosts: mycluster-master-0\n Play: Create Hosted Resources - router\n Task: Poll for OpenShift pod deployment success\n Message: All items completed\n\n[stderr]\n % Total % Received % Xferd Average Speed Time Time Time Current\n Dload Upload Total Spent Left Speed\n\r 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0\r100 6 100 6 0 0 1010 0 --:--:-- --:--:-- --:--:-- 1200\n[DEPRECATION WARNING]: 'include' for playbook includes. You should use \n'import_playbook' instead. This feature will be removed in version 2.8. \nDeprecation warnings can be disabled by setting deprecation_warnings=False in \nansible.cfg.\n[DEPRECATION WARNING]: 'include' for playbook includes. You should use \n'import_playbook' instead. This feature will be removed in version 2.8. \nDeprecation warnings can be disabled by setting deprecation_warnings=False in \nansible.cfg.\n[DEPRECATION WARNING]: The use of 'static' for 'include_role' has been \ndeprecated. Use 'import_role' for static inclusion, or 'include_role' for \ndynamic inclusion. This feature will be removed in a future release. \nDeprecation warnings can be disabled by setting deprecation_warnings=False in \nansible.cfg.\n[DEPRECATION WARNING]: The use of 'include' for tasks has been deprecated. Use \n'import_tasks' for static inclusions or 'include_tasks' for dynamic inclusions.\n This feature will be removed in a future release. Deprecation warnings can be \ndisabled by setting deprecation_warnings=False in ansible.cfg.\n[DEPRECATION WARNING]: include is kept for backwards compatibility but usage is\n discouraged. The module documentation details page may explain more about this\n rationale.. This feature will be removed in a future release. Deprecation \nwarnings can be disabled by setting deprecation_warnings=False in ansible.cfg.\n[DEPRECATION WARNING]: The use of 'static' has been deprecated. Use \n'import_tasks' for static inclusion, or 'include_tasks' for dynamic inclusion. \nThis feature will be removed in a future release. 
Deprecation warnings can be \ndisabled by setting deprecation_warnings=False in ansible.cfg.\n [WARNING]: Could not match supplied host pattern, ignoring: oo_all_hosts\n [WARNING]: Could not match supplied host pattern, ignoring: oo_lb_to_config\n [WARNING]: Could not match supplied host pattern, ignoring: oo_nfs_to_config\n [WARNING]: Consider using yum, dnf or zypper module rather than running rpm\n [WARNING]: Consider using get_url or uri module rather than running curl\n [WARNING]: Could not match supplied host pattern, ignoring:\noo_containerized_master_nodes\n [WARNING]: Could not match supplied host pattern, ignoring:\noo_nodes_use_flannel\n [WARNING]: Could not match supplied host pattern, ignoring:\noo_nodes_use_calico\n [WARNING]: Could not match supplied host pattern, ignoring:\noo_nodes_use_contiv\n [WARNING]: Could not match supplied host pattern, ignoring: oo_nodes_use_kuryr\n [WARNING]: Could not match supplied host pattern, ignoring: oo_nodes_use_nuage\n [WARNING]: Could not match supplied host pattern, ignoring: glusterfs\n [WARNING]: Could not match supplied host pattern, ignoring: glusterfs_registry\n [WARNING]: Module did not set no_log for stats_password\n [WARNING]: Module did not set no_log for external_host_password\n [WARNING]: Could not create retry file '/usr/share/ansible/openshift-\nansible/playbooks/byo/config.retry'. [Errno 13] Permission denied:\nu'/usr/share/ansible/openshift-ansible/playbooks/byo/config.retry'\n\"."\r\n }\r\n ]\r\n }\r\n}"}]}
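
Per the installer status in the log above, the failed Hosted Install phase can be retried without rerunning the whole deployment. A sketch of doing that from the host running openshift-ansible (the inventory path is an assumption; the playbook path is the one named in the installer output):

# Re-run only the hosted components (router/registry) phase reported as "In Progress"
ansible-playbook -i /etc/ansible/hosts \
  /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/openshift-hosted.yml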

Installation fails due to Docker 1.13

Environment:

RHEL 7.4
Openshift 3.7

The Ansible installation fails in this environment because the prep scripts installed Docker 1.13. According to the 3.7 documentation at https://docs.openshift.com/container-platform/3.7/install_config/install/host_preparation.html#installing-docker, the Docker version should be pinned to 1.12.6. This could be an issue with atomic-openshift-docker-excluder not blocking 1.13 for some reason.

The current workaround is to install the fixed version with yum -y install docker-1.12.6 and to disable the package version check in the deployOpenshift.sh file by updating the openshift_disable_check parameter:
openshift_disable_check=memory_availability,docker_image_availability,package_version
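
A minimal sketch of that workaround, assuming deployOpenshift.sh contains an openshift_disable_check line like the default one (the sed pattern is illustrative; the exact line in the script may differ):

# Pin Docker to the release supported by OCP 3.7 instead of letting yum pull 1.13
yum -y install docker-1.12.6

# Append package_version to the disabled checks in deployOpenshift.sh
sed -i 's/openshift_disable_check=memory_availability,docker_image_availability/&,package_version/' deployOpenshift.sh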

Would create a PR for 3.7 branch if it is the only acceptable solution at the moment.

Request from customer to change the port from the default 8443 to 443.

Given that the router is running on a different host from the master(s), it would be nice if there were an option to change the Web UI port number by providing the following parameters in the inventory file.

The two parameters below would be used if you want the API server and master console running on 443 instead of 8443.

In this cluster, 443 is used by the router, so we cannot use 443 for the master.

openshift_master_api_port=443
openshift_master_console_port=443
Thanks!

OpenShift rollout by Ansible fails

Describe the bug
OpenShift Container Platform Deployment using Ansible can never work.

To Reproduce
Steps to reproduce the behavior:

  1. Deploy the platform according to the documentation
  2. Results in error: The error was: 'master_publicip_fqdn' is undefined

Expected behavior
A deployed cluster using Ansible

Additional context
The problem here is that in the playbook a new play is started on the Bastion host, while all variables were set on localhost. Hence the deployment fails, as the Bastion is not aware of the facts that were set on localhost. This should be easy to fix by referencing the facts that were set in the earlier play as follows:

{{ hostvars['localhost']['fact'] }}
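
A minimal sketch of that pattern, assuming a fact named master_publicip_fqdn is set in the localhost play (the bastion group name, the task names, and the sample FQDN are illustrative):

- hosts: localhost
  connection: local
  tasks:
    - name: Record the master public FQDN as a fact on localhost
      set_fact:
        master_publicip_fqdn: "mycluster.eastus.cloudapp.azure.com"

- hosts: bastion
  tasks:
    - name: Reference the fact that was set on localhost
      debug:
        msg: "Master FQDN is {{ hostvars['localhost']['master_publicip_fqdn'] }}"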

It looks like this playbook has not been tested, as it could never have worked.

MySQL or MongoDB persistent template from the OCP 3.6.1 catalog not working

Version: oc v3.6.1

I deployed a 3.6.1 OCP cluster on Azure 2-3 weeks ago using the Azure template. The deployment succeeded and the cluster is up and running. I've deployed some images (like NGINX and others) and everything looks fine.
However, if I deploy a MongoDB from the OCP catalog (plain vanilla MongoDB) using persistent storage (PV), the deployment fails with this message (similar error if I deploy MySQL or even a Cassandra DB):

=> MongoDB
=> sourcing 10-check-env-vars.sh ...
=> sourcing 20-setup-wiredtiger-cache.sh ...
=> [Wed Mar 21 09:17:40] wiredTiger cacheSizeGB set to 1
=> sourcing 30-set-config-file.sh ...
=> sourcing 35-setup-default-datadir.sh ...
ERROR: Couldn't write into /var/lib/mongodb/data
CAUSE: current user doesn't have permissions for writing to /var/lib/mongodb/data directory
DETAILS: current user id = 184, user groups: 997 0
DETAILS: directory permissions: drwxr-xr-x owned by 0:1000360000, SELinux: system_u:object_r:svirt_sandbox_file_t:s0:c9,c19

Digging into the problem, I see that the PVs are successfully created in the Azure Storage account and mounted on the OCP worker nodes. If I log in to a specific node, I can see all the mounted volumes.
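
A minimal sketch of how to confirm the permission mismatch reported above from inside the failing pod (<mongodb-pod> is a placeholder for the actual pod name from oc get pods):

oc rsh <mongodb-pod>
id                              # the error reports user id 184, groups 997 0
ls -ldZ /var/lib/mongodb/data   # the error reports drwxr-xr-x owned by 0:1000360000
exit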

Some required package(s) are available at a version that is higher than requested

Hi there,

The deployment of the master branch fails with the message "Some required package(s) are available at a version that is higher than requested docker-1.13.1".

In deployOpenshift.sh I added the flag package_version to openshift_disable_check; the line looks like this in my fork:
openshift_disable_check=memory_availability,docker_image_availability,package_version

With that the installation seems to work.

Or should I be using the branch 3.7 instead of master?

Cluster DNS not configured properly?

Deployed a 3x3x3 master/infra/node configuration. I can log in and deploy basic pod apps; however, deploying things like the example CI/CD pipeline with Jenkins integration...
https://raw.githubusercontent.com/kenthua/nodejs-ex/demo/bluegreen-pipeline.yml
...fails to work properly. The build pipeline does not work. Looking at the log for the Jenkins pod, it is unable to reach openshift.default.svc. The issue seems to point to cluster DNS? Or is there some other basic configuration that is missing that I need to complete after deploying the ARM template to Azure?
I saw similar behavior when trying to deploy this CoolStore demo: https://github.com/alezzandro/coolstore-microservice
The components of the demo cannot deploy correctly because, once started, none of the builds can look up the nexus svc.

Here is kubernetes/DNS configuration on OpenShift 3.5 on Azure:
oc get service kubernetes -n default -o yaml
apiVersion: v1
kind: Service
metadata:
  creationTimestamp: 2017-09-25T05:36:27Z
  labels:
    component: apiserver
    provider: kubernetes
  name: kubernetes
  namespace: default
  resourceVersion: "170"
  selfLink: /api/v1/namespaces/default/services/kubernetes
  uid: 79c495ad-a1b3-11e7-ba76-000d3a931eae
spec:
  clusterIP: 172.30.0.1
  ports:
  - name: https
    port: 443
    protocol: TCP
    targetPort: 8443
  - name: dns
    port: 53
    protocol: UDP
    targetPort: 53
  - name: dns-tcp
    port: 53
    protocol: TCP
    targetPort: 53
  sessionAffinity: ClientIP
  type: ClusterIP
status:
  loadBalancer: {}

[user@host]$ oc project huatest1
Already on project "huatest1" on server "https://obfuscated-mstr01.region.cloudapp.azure.com:8443".
[user@host]$ oc get pods
NAME              READY     STATUS    RESTARTS   AGE
jenkins-1-xkj7h   1/1       Running   0          2h
mongodb-1-s01pz   1/1       Running   0          2h
[user@host]$ oc rsh jenkins-1-xkj7h
sh-4.2$ curl openshift.default.svc:443
curl: (6) Could not resolve host: openshift.default.svc; Name or service not known
sh-4.2$ exit
exit

Compared to kubernetes/DNS configuration on OpenShift CDK running minishift locally...

oc get service kubernetes -n default -o yaml
apiVersion: v1
kind: Service
metadata:
  creationTimestamp: 2017-09-15T22:01:01Z
  labels:
    component: apiserver
    provider: kubernetes
  name: kubernetes
  namespace: default
  resourceVersion: "620"
  selfLink: /api/v1/namespaces/default/services/kubernetes
  uid: 5c877e73-9a61-11e7-96f8-525400ab5a26
spec:
  clusterIP: 172.30.0.1
  ports:
  - name: https
    port: 443
    protocol: TCP
    targetPort: 8443
  - name: dns
    port: 53
    protocol: UDP
    targetPort: 8053
  - name: dns-tcp
    port: 53
    protocol: TCP
    targetPort: 8053
  sessionAffinity: ClientIP
  type: ClusterIP
status:
  loadBalancer: {}

 [user@host]$ oc project huatest1
Now using project "huatest1" on server "https://192.168.42.205:8443".
 [user@host]$ oc get pods
NAME                                   READY     STATUS      RESTARTS   AGE
jenkins-1-pqzxr                        1/1       Running     1          1h
mongodb-1-8q3z7                        1/1       Running     1          1h
nodejs-mongodb-example-1-build         0/1       Completed   0          1h
nodejs-mongodb-example-green-1-tkqm9   1/1       Running     1          1h
  [user@host]$ oc rsh jenkins-1-pqzxr
sh-4.2$ curl openshift.default.svc:443
�����
sh-4.2$ exit
exit
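
A few generic checks that might help narrow this down (a sketch only; the pod name is taken from the session above, and the commands assume standard RHEL tooling on the pod and node):

# From inside the affected pod: which nameserver is it using, and can it resolve the service?
oc rsh jenkins-1-xkj7h
cat /etc/resolv.conf
getent hosts openshift.default.svc.cluster.local
exit

# On the node running the pod: is a DNS resolver listening on port 53, and what does the node itself use?
ss -lnup | grep ':53 '
cat /etc/resolv.conf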

HA not working after shutting down the first master

/etc/origin/master/master-config.yaml is not updated and still points to the first master,
so when you shut down the first master, all the apps stop serving (application not available).

Is it possible that activate-private-lb.31x.yaml doesn't get invoked during installation?
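
Independent of that, a quick way to check what each master is actually pointing at (a sketch; for an HA cluster these URLs would normally reference the master load balancer FQDN rather than the first master's hostname):

grep -E 'masterURL|masterPublicURL' /etc/origin/master/master-config.yaml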

Changing defaultSubDomain makes the installation fail (Azure Stack)

Before deploying OpenShift 3.9 on Azure Stack, I changed the following parameters and the deployment failed. The customer wants to use their own domain for the OpenShift deployment.

defaultSubDomainType: changed from nipio to custom
defaultSubDomain: changed from changeme to apps.tmw-d.domain.nl
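
For reference, a minimal sketch of how those two values might appear in an ARM parameters file (the file name and surrounding structure are assumptions, not copied from this repo; only the two parameter names come from the issue):

{
  "defaultSubDomainType": { "value": "custom" },
  "defaultSubDomain": { "value": "apps.tmw-d.domain.nl" }
}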

domain.nl is the customer's domain. Is it possible to do it this way, or am I missing something?

Kind Regards,
Arie
