
fusion-cloud-native's Introduction

Fusion Cloud Native on Kubernetes

This repo contains scripts for installing Fusion 5.x on Kubernetes (K8s). The scripts provide an option to create Kubernetes clusters that are suitable for demo / proof-of-concept purposes only. We assume that you’ll want to control how your production clusters are provisioned, secured, and managed, as these are typically concerns we’re not able to script for you.

Helm Chart Releases

Unless noted otherwise, each entry below is a Fusion release: see the official release notes for details, and consult the Migration Guide for upgrades from previous Fusion 5.x versions.

Version   Date         Notes
5.9.4     2024-06-25   Fusion 5.9.4 release.
5.12.0    2024-03-28   Fusion 5.12.0 release.
5.9.3     2024-02-26   Fusion 5.9.3 release.
5.11.0    2024-01-03   Fusion 5.11.0 release.
5.9.2     2023-12-12   Fusion 5.9.2 release.
5.10.0    2023-09-28   Fusion 5.10.0 release.
5.9.1     2023-09-05   Fusion 5.9.1 release.
5.9.0     2023-06-29   Fusion 5.9.0 release.
5.8.1     2023-06-08   Fusion 5.8.1 release.
5.8.0     2023-03-22   Fusion 5.8.0 release.
5.7.1     2023-01-30   Fusion 5.7.1 release.
5.7.0     2022-12-13   Fusion 5.7.0 release.
5.6.0     2022-08-11   Fusion 5.6.0 release.
5.5.2     2022-05-31   Fusion 5.5.2 release.
5.5.1     2022-04-13   Fusion 5.5.1 release.
5.5.0     2022-03-18   Fusion 5.5.0 release.
5.4.5     2021-12-17   Fusion 5.4.5 release.
5.4.4     2021-12-02   Fusion 5.4.4 release.
5.4.3     2021-09-29   Fusion 5.4.3 release.
5.4.2     2021-06-17   Fusion 5.4.2 release.
5.4.1     2021-05-06   Fusion 5.4.1 release.
5.4.0     2021-04-23   Fusion 5.4.0 release.
5.3.5     2021-03-08   Fusion 5.3.5 release.
5.3.4     2021-02-12   Fusion 5.3.4 release.
5.3.3     2021-02-04   Fusion 5.3.3 release.
5.3.2     2020-12-14   Fusion 5.3.2 release.
5.3.1     2020-12-11   Fusion 5.3.1 release.
5.3.0     2020-11-17   Fusion 5.3.0 release.
5.2.2     2020-10-20   Fusion 5.2.2 release.
5.1.5     2020-10-20   Fusion 5.1.5 release.
5.2.0     2020-08-18   Fusion 5.2.0 release.
5.1.4     2020-06-25   Fusion 5.1.4 release.
5.1.3     2020-06-18   Fusion 5.1.3 release.
5.1.2     2020-05-14   Fusion 5.1.2 release. IMPORTANT: Fusion 5.1.2 does not work with Kubernetes 1.17+ due to an issue introduced in Java 1.8.0_252 that prevents access to the K8s API service (see the release notes). If you're running K8s 1.17+, please run Fusion 5.1.1 until 5.1.3 is released.
5.1.1     2020-04-08   Fusion 5.1.1 release.
5.1.0     2020-03-12   Fusion 5.1.0 release.
5.0.3-4   2020-02-26   Updated the query pipeline service to support filtering Solr pods by hostname regex match, e.g. lw.nodeFilter=host:search sends queries only to Solr pods whose hostname contains "search". This is useful with TLOG and PULL replica types when you want queries to target only PULL replicas and avoid nodes hosting TLOG replicas.
5.0.3-3   2020-02-19   Updated apiVersion to apps/v1 for the logstash statefulset to support Kubernetes v1.17+. If you're running earlier versions of the Fusion 5 Helm chart, delete the logstash statefulset before upgrading to 5.0.3-3 (or beyond): kubectl delete sts <RELEASE>-logstash. The logstash statefulset is re-created during the upgrade; this operation does not delete the PVC, so the data remains intact.
5.0.3-2   2020-01-23   Improved accuracy of histogram metrics reported for query pipelines. Improved the ML model service Helm chart to allow easier overriding of the Python sidecar image.
5.0.3-1   2020-01-08   Updated the webapps service to correctly deploy AppStudio WAR files.
5.0.2     2019-12-18   Fusion 5.0.2 release. Be sure to upgrade to Helm v3 before installing Fusion 5.0.2.

Please update the CHART_VERSION in the upgrade script for your cluster to point at the latest version of the Helm chart.

Prerequisites

This section covers prerequisites and background knowledge needed to help you understand the structure of this document and how the Fusion installation process works with Kubernetes.

Release Name and Namespace

Before installing Fusion, you need to choose a Kubernetes namespace to install Fusion into. Think of a K8s namespace as a virtual cluster within a physical cluster. You can install multiple instances of Fusion in the same cluster in separate namespaces. However, please do not install more than one Fusion release in the same namespace.

NOTE: All Fusion services must run in the same namespace, i.e. you should not try to split a Fusion cluster across multiple namespaces.

Use a short name for the namespace, containing only letters, digits, or dashes (no dots or underscores). The setup scripts in this repo use the namespace for the Helm release name by default.

Install Helm

Helm is a package manager for Kubernetes that helps you install and manage applications on your Kubernetes cluster. Regardless of which Kubernetes platform you’re using, you need Helm to install Fusion. On macOS, you can do:

brew install kubernetes-helm

If you already have helm installed, make sure you’re using the latest version:

brew upgrade kubernetes-helm

For other OS, please refer to the Helm installation docs: https://helm.sh/docs/using_helm/

The Fusion Helm chart requires Helm version 3.0.0 or later; check your Helm version by running helm version --short.
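
If you need Helm on Linux, one option is Helm’s official installer script (a sketch, assuming you are comfortable running the script; review it first if that is a concern):

curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3
chmod 700 get_helm.sh
./get_helm.sh
helm version --short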

Helm User Permissions

If you require that Fusion is installed by a user with minimal permissions, rather than an admin user, the role and cluster role that must be assigned to that user within the namespace you wish to install Fusion in are documented in the install-roles directory.

Note
When working with Kubernetes on the command-line, it’s useful to create a shell alias for kubectl, e.g.:
alias k=kubectl

To use these roles in a cluster, as an admin user first create the namespace that you wish to install Fusion into:

k create namespace fusion-namespace

Apply the role.yaml and cluster-role.yaml files to that namespace:

k apply -f cluster-role.yaml
k config set-context --current --namespace=$NAMESPACE
k apply -f role.yaml

Then create a rolebinding and clusterrolebinding for the install user:

k create --namespace fusion-namespace rolebinding fusion-install-rolebinding --role fusion-installer --user <install_user>
k create clusterrolebinding fusion-install-rolebinding --clusterrole fusion-installer --user <install_user>

You will then be able to run the helm install command as the <install_user>.
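
For reference, a minimal sketch of that install step, assuming the fusion-namespace namespace created above, the Lucidworks chart repo described later in this document, a release name of f5, and a hypothetical custom values file my-values.yaml:

# run as <install_user>
helm repo add lucidworks https://charts.lucidworks.com
helm upgrade --install f5 lucidworks/fusion --namespace fusion-namespace --values my-values.yaml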

Clone fusion-cloud-native from GitHub

You should clone this repo from GitHub, as you’ll need to run the scripts on your local workstation:

git clone https://github.com/lucidworks/fusion-cloud-native.git

You should get into the habit of pulling this repo for the latest changes before performing any maintenance operations on your Fusion cluster to ensure you have the latest updates to the scripts.

cd fusion-cloud-native
git pull

Cloning the GitHub repo is preferred so that you can pull in updates to the scripts. If you are not a git user, you can instead download the project: https://github.com/lucidworks/fusion-cloud-native/archive/master.zip. Once downloaded, extract the zip and cd into the fusion-cloud-native-master directory.

Google Kubernetes Engine (GKE)

The setup_f5_gke.sh script provided in this repo is strictly optional. The script is mainly to help those new to Kubernetes and/or Fusion get started quickly. If you’re already familiar with K8s, Helm, and GKE, then you can skip the script and just use Helm directly to install Fusion into an existing cluster or one you create yourself using the process described here.

Set up the Google Cloud SDK (one time only)

If you’ve already installed the gcloud command-line tools, you can skip to Create a Fusion cluster in GKE.

These steps set up your local Google Cloud SDK environment so that you’re ready to use the command-line tools to manage your Fusion deployment.

Usually, you only need to perform these setup steps once. After that, you’re ready to create a cluster.

For a nice getting started tutorial for GKE, see: Deploy an app to a GKE cluster.

How to set up the Google Cloud SDK
  1. Enable the Kubernetes Engine API.

  2. Log in to Google Cloud: gcloud auth login

  3. Set up the Google Cloud SDK:

    1. gcloud config set compute/zone <zone-name>

      If you are working with regional clusters instead of zone clusters, use gcloud config set compute/region <region-name> instead.

    2. gcloud config set core/account <email address>

    3. New GKE projects only: gcloud projects create <new-project-name>

      If you have already created a project, for example in the Google Cloud Platform console, then skip to the next step.

    4. gcloud config set project <project-name>

Make sure you install the Kubernetes command-line tool kubectl using:

gcloud components install kubectl
gcloud components update

Create a single-node demo cluster

Run the setup_f5_gke.sh script to install Fusion 5.x in a GKE cluster. To create a new, single-node demo cluster and install Fusion, simply do:

./setup_f5_gke.sh -c <cluster_name> -p <gcp_project_id> --create demo

Use the --help option to see script usage. If you want the script to create a cluster for you, then you need to pass the --create option with either demo or multi_az. If you don’t want the script to create a cluster, then you need to create a cluster before running the script and simply pass the name of the existing cluster using the -c parameter.

If you pass --create demo to the script, then we create a single node GKE cluster (defaults to using n1-standard-8 node type). The minimum node type you’ll need for a 1 node cluster is an n1-standard-8 (on GKE) which has 8 CPU and 30 GB of memory. This is cutting it very close in terms of resources as you also need to host all of the Kubernetes system pods on this same node. Obviously, this works for kicking the tires on Fusion 5.1 but is not sufficient for production workloads.

You can change the instance type using the -i parameter; see https://cloud.google.com/compute/docs/regions-zones/#available for a list of which machine types are available in your desired region.
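
For example, a hypothetical invocation that requests a larger node type for the demo cluster (cluster and project names are placeholders):

./setup_f5_gke.sh -c my-gke-cluster -p my-gcp-project -i n1-standard-16 --create demo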

Note: If a custom values file is not provided, the script generates one named gke_<cluster>_<namespace>_fusion_values.yaml, which you can use to customize the Fusion chart.

WARNING If using Helm V2, the setup_f5_gke.sh script installs Helm’s tiller component into your GKE cluster with the cluster admin role. If you don’t want this, then please upgrade to Helm v3.

If you see an error similar to the following, then wait a few seconds and try running the setup_f5_gke.sh script again with the same arguments as this is usually a transient issue:

Error: could not get apiVersions from Kubernetes: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request

After running the setup_f5_gke.sh script, proceed to the Verifying the Fusion Installation section below.

When you’re ready to deploy Fusion to a production-like environment, see more information at Fusion 5 Survival Guide.

Create a three-node regional cluster to withstand a zone outage

With a three-node regional cluster, nodes are deployed across three separate availability zones.

./setup_f5_gke.sh -c <cluster> -p <project> -n <namespace> --region <region-name> --create multi_az
  • <cluster> value should be the name of a non-existent cluster; the script will create the new cluster.

  • <project> must match the name of an existing project in GKE. Run gcloud config get-value project to get this value, or see the GKE setup instructions.

  • <namespace> Kubernetes namespace to install Fusion into, defaults to default with release f5

  • <region-name> value should be the name of a GKE region, defaults to us-west1. Run gcloud config get-value compute/zone to get this value, or see the GKE setup instructions to set the value.

In this configuration, Kubernetes deploys a ZooKeeper and Solr pod on each of the three nodes, which allows the cluster to retain ZK quorum and remain operational after losing one node, such as during an outage in one availability zone.

When running in a multi-zone cluster, each Solr node has the solr_zone system property set to the zone it is running in, such as -Dsolr_zone=us-west1-a.

After running the setup_f5_gke.sh script, proceed to the Verifying the Fusion Installation section below.

When you’re ready to deploy Fusion to a production-like environment, see more information at Fusion 5 Survival Guide.

GKE Ingress and TLS

The Fusion proxy service provides authentication and serves as an API gateway for accessing all other Fusion services. It’s typical to use an Ingress for TLS termination in front of the proxy service.

The setup_f5_gke.sh script supports creating an Ingress with a TLS cert for a domain you own by passing: -t -h <hostname>

After the script runs, you need to create an A record in GCP’s DNS service to map your domain name to the Ingress IP. Once this is done, the setup uses Let’s Encrypt to issue a TLS cert for your Ingress.

To see the status of the Let’s Encrypt issued certificate, do:

kubectl get managedcertificates -n <namespace> -o yaml

Please refer to the Kubernetes documentation on configuring an Ingress for GKE: Setting up HTTP Load Balancing with Ingress

Note
The GCP Ingress defaults to a 30-second timeout, which can lead to false negatives for long-running requests such as importing apps. To configure the timeout for the backend in Kubernetes:

Create a BackendConfig object in your namespace:

---
apiVersion: cloud.google.com/v1beta1
kind: BackendConfig
metadata:
  name: backend-config-name
spec:
  timeoutSec: 120
  connectionDraining:
    drainingTimeoutSec: 60

Then make sure that the following entries are in the right place in your values.yaml file:

api-gateway:
  service:
    annotations:
      beta.cloud.google.com/backend-config: '{"ports": {"6764":"backend-config-name"}}'

Then upgrade your release to apply the configuration changes.
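
For example, a sketch of that sequence, assuming you saved the BackendConfig above to backend-config.yaml and use the generated upgrade script described later in this document:

kubectl apply -f backend-config.yaml -n <namespace>
./gke_<cluster>_<release>_upgrade_fusion.sh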

Ingresses and externalTrafficPolicy

When running a Fusion cluster behind an externally controlled LoadBalancer, it can be advantageous to set the externalTrafficPolicy of the proxy service to Local. This preserves the client source IP and avoids a second hop for LoadBalancer and NodePort type services, at the risk of potentially imbalanced traffic spreading. However, in a cluster with a dedicated node pool for Spark jobs that scales up and down freely, it can prevent unwanted request failures. This behaviour can be altered with the api-gateway.service.externalTrafficPolicy value, which is set to Local if the example values file is used.

You must use externalTrafficPolicy=Local for the Trusted HTTP Realm to work correctly.

If you are already using a custom values.yaml file, create an entry for externalTrafficPolicy under the api-gateway service section:

api-gateway:
  service:
    externalTrafficPolicy: Local

Considerations when using the nginx ingress controller

If you are using the nginx ingress controller to fulfil your ingress definitions, we recommend setting the following options in its ConfigMap:

enable-underscores-in-headers: "true"   # Fusion can return some headers that have underscores, these have to be explicitly enabled in nginx
proxy-body-size: "0"        # By default nginx places a maximum size on request bodies, either increase as needed or disable by setting to 0
proxy-read-timeout: "300"   # Increases the timeout for potential slow queries.
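
If you installed the controller with the community ingress-nginx Helm chart (an assumption; adapt this to however you deploy nginx), these entries can be passed through the chart’s controller.config values, for example:

helm upgrade --install ingress-nginx ingress-nginx/ingress-nginx --namespace ingress-nginx --create-namespace \
  --set-string controller.config.enable-underscores-in-headers="true" \
  --set-string controller.config.proxy-body-size="0" \
  --set-string controller.config.proxy-read-timeout="300"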

Custom values

There are some example values files that can be used as a starting point for resources, affinity and replica count configuration in the example-values folder. These can be passed to the install script using the --values option, for example:

./setup_f5_gke.sh -c <cluster> -p <project> -r <release> -n <namespace> \
  --values example-values/affinity.yaml --values example-values/resources.yaml --values example-values/replicas.yaml

The --values option can be passed multiple times, if the same configuration property is contained within multiple values files then the values from the latest file passed as a --values option are used.

Connectors custom values

If you are using Fusion 5.9 or later, you can specify resources and replica count per connector. This allows you to set different resource limits for each connector. If you do not set custom values for a connector, that connector uses the default values.

Set each connector’s resource values in the connector-plugin section under pluginValues. The pluginValues section is a list of plugins and their resources. The following is an example.

 pluginValues:
   - id: "plugin-id" (1)
     resources: (2)
       limits:
         cpu: "2"
         memory: "3Gi"
       requests:
         cpu: "250m"
         memory: "2Gi"
     replicaCount: 1 (3)
  1. The plugin ID. The plugin ID must match the plugin ID on the plugin ZIP file, without the lucidworks. prefix. For example, if the plugin ID on the plugin ZIP file is lucidworks.sharepoint-optimized, the plugin ID is sharepoint-optimized.

  2. The resources settings. You may specify the limits, the requests, and the CPU and memory for each.

  3. The number of replicas per connector. This value is 1 by default.

IMPORTANT After editing the connector-plugin section, you must reinstall the affected connector.

Upgrades and Ingress

IMPORTANT If you used the -t -h <hostname> options when installing your cluster, our script created an additional values yaml file named tls-values.yaml.

To make things easier for you when upgrading, you should add the settings from this file into your main custom values yaml file, e.g.:

api-gateway:
  service:
    type: "NodePort"
  ingress:
    enabled: true
    host: "<hostname>"
    tls:
      enabled: true
    annotations:
      "networking.gke.io/managed-certificates": "<RELEASE>-managed-certificate"
      "kubernetes.io/ingress.class": "gce"

This way you don’t have to remember to pass the additional tls-values.yaml file when upgrading.

Upgrade Fusion on GKE

Before you begin, please consult the Migration Guide.

During installation, the setup script generates a file named gke_<cluster>_<release>_fusion_values.yaml; use this file to customize Fusion settings.

In addition, the setup script creates a helper upgrade script to streamline the upgrade process. Look in the directory where you ran the setup script initially for a file named:

gke_<cluster>_<release>_upgrade_fusion.sh

where <release> is typically the same as your namespace unless you overrode the default value using the -r option.
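
For example, a hypothetical run for a cluster named my-gke-cluster with release f5 (update CHART_VERSION inside the script first if you want to pin a specific chart version):

./gke_my-gke-cluster_f5_upgrade_fusion.sh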

After running the upgrade, use kubectl get pods to see the changes being applied to your cluster. It may take several minutes to perform the upgrade as new Docker images need to be pulled from DockerHub. To see the versions of running pods, do:

kubectl get po -o jsonpath='{..image}'  | tr -s '[[:space:]]' '\n' | sort | uniq

ML Model Service Permissions

A user must grant permissions to the Google service account so the ML Model Service can use Google Cloud Storage. This way you can always reference your model even if nodes are created or destroyed as part of cluster scaling.

Grant the default service account read/write access to a GCS bucket by upgrading with these changes:

To get the service account, do:

gcloud iam service-accounts list | grep 'default service' | grep compute

In the values.yaml, provide:

ml-model-service:
  modelRepoImpl: gcs
  gcsBucketName: <GCS_BUCKET_NAME>
  gcsBaseDirectoryName: dev
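
A sketch of the bucket-level grant itself, using gsutil (the service account email comes from the previous command; the objectAdmin role choice is an assumption, adjust it to your security requirements):

gsutil iam ch serviceAccount:<PROJECT_NUMBER>-compute@developer.gserviceaccount.com:roles/storage.objectAdmin gs://<GCS_BUCKET_NAME>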

Amazon Elastic Kubernetes Service (EKS)

The setup_f5_eks.sh script provided in this repo is strictly optional. The script is mainly to help those new to Kubernetes and/or Fusion get started quickly. If you’re already familiar with K8s, Helm, and EKS, then you can use Helm directly to install Fusion into an existing cluster or one you create yourself using the process described here.

If you’re new to Amazon Web Services (AWS), then please visit the Amazon Web Services Getting Started Center to set up an account.

If you’re new to Kubernetes and EKS, then we recommend going through Amazon’s EKS Workshop before proceeding with Fusion.

Set up the AWS CLI tools

Before launching an EKS cluster, you need to install and configure kubectl, aws, eksctl, and aws-iam-authenticator using the links provided below:

Required AWS Command-line Tools:
  1. kubectl: Install kubectl

  2. aws: Installing the AWS CLI

  3. eksctl: Getting Started with eksctl

  4. aws-iam-authenticator: AWS IAM Authenticator for Kubernetes

Run aws configure to configure a profile for authenticating to AWS. You’ll use the profile name you configure in this step, which defaults to default, as the -p argument to the setup_f5_eks.sh script in the next section.

Note
When working in Ubuntu, avoid using the eksctl snap version. Alternative sources can have different versions that could cause command failures. Also, always make sure you are using the latest version for each one of the required tools.

Set up Fusion on EKS

To create a cluster in EKS the following IAM policies are required:

  • AmazonEC2FullAccess

  • AWSCloudFormationFullAccess

Table 1. EKS Permissions

eks:CreateCluster, eks:DeleteCluster, eks:DescribeCluster, eks:ListClusters, eks:UpdateClusterVersion, eks:ListUpdates, eks:DescribeUpdate

Table 2. VPC Permissions

ec2:CreateVpc, ec2:DeleteVpc, ec2:DescribeVpcs, ec2:DescribeVpcAttribute, ec2:ModifyVpcAttribute, ec2:AssociateVpcCidrBlock, ec2:DisassociateVpcCidrBlock, ec2:CreateVpcEndpoint, ec2:DeleteVpcEndpoints, ec2:ModifyVpcEndpoint, ec2:CreateSubnet, ec2:DeleteSubnet, ec2:DescribeSubnets, ec2:ModifySubnetAttribute, ec2:AssociateSubnetCidrBlock, ec2:DisassociateSubnetCidrBlock, ec2:CreateInternetGateway, ec2:DeleteInternetGateway, ec2:AttachInternetGateway, ec2:DetachInternetGateway

Table 3. IAM Permissions

iam:CreateRole, iam:DeleteRole, iam:GetRole, iam:TagRole, iam:UntagRole, iam:PassRole, iam:CreatePolicy, iam:DeletePolicy, iam:GetPolicy, iam:GetPolicyVersion, iam:CreatePolicyVersion, iam:DeletePolicyVersion, iam:AttachRolePolicy, iam:DetachRolePolicy, iam:PutRolePolicy, iam:DeleteRolePolicy, iam:GetRolePolicy, iam:CreateInstanceProfile, iam:DeleteInstanceProfile, iam:GetInstanceProfile, iam:AddRoleToInstanceProfile, iam:RemoveRoleFromInstanceProfile, iam:ListInstanceProfiles, iam:ListInstanceProfilesForRole

Download and run the setup_f5_eks.sh script to install Fusion 5.x in an EKS cluster.

Note
This script does not support multiple node pools and should not be used for production clusters.
  • To create a new cluster and install Fusion, run the following command:

    ./setup_f5_eks.sh -c my-eks-cluster -p profile-name -n fusion-namespace --create demo
    • Replace my-eks-cluster, profile-name, and fusion-namespace with your cluster, profile, and namespace values.

    • Pass the --create option with either demo or multi_az.

  • To use an existing cluster and install Fusion, run the following command:

    ./setup_f5_eks.sh -c cluster-name -p profile-name
    • Replace cluster-name with the name of the cluster you already created.

    • Replace profile-name with the name of your profile.

The profile is automatically set to default if you ran the AWS configure command without giving the profile a name.

Use the --help option to see full script usage.

Warning
If using Helm V2, the setup_f5_eks.sh script installs Helm’s tiller component into your EKS cluster with the cluster admin role. If you don’t want this, then please upgrade to Helm v3.
Warning
The setup_f5_eks.sh script creates a service account that provides S3 read-only permissions to the created pods.

After running the setup_f5_eks.sh script, proceed to the Verifying the Fusion Installation section below.

EKS cluster overview

The EKS cluster is created using eksctl (https://eksctl.io/). By default it will set up the following resources in your AWS account:

  • A dedicated VPC for the EKS cluster in the specified region with CIDR: 192.168.0.0/16

  • 3 Public and 3 Private subnets within the created VPC, each with a /19 CIDR range, along with the corresponding route tables.

  • A NAT gateway in each Public subnet

  • An Auto Scaling Group of the instance type specified by the script, which defaults to m5.2xlarge, with 3 instances spanning the public subnets.

See https://eksctl.io/usage/vpc-networking/ for more information on the networking setup.

EKS Ingress

The setup_f5_eks.sh script exposes the Fusion proxy service on an external DNS name provided by an ELB over HTTP. This is done for demo or getting started purposes. However, you’re strongly encouraged to configure a K8s Ingress with TLS termination in front of the proxy service. See: https://aws.amazon.com/premiumsupport/knowledge-center/terminate-https-traffic-eks-acm/

Our EKS script creates a classic ELB to expose the Fusion proxy service. If you need to change this behavior and use the AWS Load Balancer Controller instead, pass the following parameter when running the setup_f5_eks.sh script:

--deploy-alb     # Tells the script to deploy an ALB

By default, the kube-system namespace is used for installing the aws-load-balancer-controller because the pods’ priorityClassName is set to system-cluster-critical.

If you need to deploy an internal ALB, use the --internal-alb option. This creates the nodes in the internal subnets. Fusion will be reachable from an AWS instance located in any of the external subnets in the same VPC. Using an ALB also requires an Ingress with a DNS name; use the -h option to create an Ingress with the required DNS name.

Finally, use Route 53 or your DNS provider to create an A (ALIAS) DNS record for your DNS name pointing to the Ingress ADDRESS. You can get the address by listing the Ingress with the command kubectl get ing.
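
For example, a sketch that prints just the load balancer hostname for the first Ingress in the namespace (assumes a single Ingress):

kubectl get ing -n <namespace> -o jsonpath='{.items[0].status.loadBalancer.ingress[0].hostname}'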

Upgrade Fusion on EKS

Before you begin, please consult the Migration Guide.

To make things easier for you, our setup script creates an upgrade script you can use to perform upgrades, see:

eks_<cluster>_<release>_upgrade_fusion.sh

Provide access to the EKS cluster to other users

Initially, only the user that created the Amazon EKS cluster has system:masters permissions to configure the cluster. In order to extend the permissions, a ConfigMap should be created to allow access to IAM users or roles.

To provide these permissions, use the following YAML file as a template, replacing the required values:

aws-auth.yaml

apiVersion: v1
kind: ConfigMap
metadata:
  name: aws-auth
  namespace: kube-system
data:
  mapRoles: |
    - rolearn: <node_instance_role_arn>
      username: system:node:{{EC2PrivateDNSName}}
      groups:
        - system:bootstrappers
        - system:nodes
  mapUsers: |
    - userarn: arn:aws:iam::<account_id>:user/<username>
      username: <username>
      groups:
        - system:masters

Apply the YAML file with the following command: kubectl apply -f aws-auth.yaml
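
Alternatively, a sketch using eksctl to add the same user mapping without editing aws-auth.yaml by hand (values are placeholders; check eksctl create iamidentitymapping --help for your version):

eksctl create iamidentitymapping --cluster <cluster-name> --region <region> \
  --arn arn:aws:iam::<account_id>:user/<username> --username <username> --group system:masters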

Remove EKS cluster

If you have deployed the ALB ingress controller, you need to remove the policy that was created for managing the ALB before removing the cluster. You can use the following command:

aws iam --profile <profile-name> delete-policy --policy-arn arn:aws:iam::<account_id>:policy/eksctl-<cluster-name>-alb-policy

Alternatively, you can remove it manually using the AWS IAM console by searching for eksctl-<cluster-name>-alb-policy.

After that, remove the ALB with helm delete; you can list the current releases with helm list.

The EKS cluster is created using CloudFormation stacks, so you need to remove them to delete the cluster. You can find them in the AWS CloudFormation console; look for the following stacks:

  • eksctl-<cluster-name>-nodegroup-standard-workers

  • eksctl-<cluster-name>-cluster

Remove the eksctl-<cluster-name>-nodegroup-standard-workers stack first. After that, remove the eksctl-<cluster-name>-cluster stack.

Alternatively, you can use the following commands:

aws cloudformation --profile <profile-name> delete-stack --stack-name eksctl-<cluster-name>-nodegroup-standard-workers
aws cloudformation --profile <profile-name> delete-stack --stack-name eksctl-<cluster-name>-cluster

Azure Kubernetes (AKS)

The setup_f5_aks.sh script provided in this repo is strictly optional. The script is mainly to help those new to Kubernetes and/or Fusion get started quickly. If you’re already familiar with K8s, Helm, and AKS, then you can use Helm directly to install Fusion into an existing cluster or one you create yourself using the process described here.

If you’re new to Azure, then please visit https://azure.microsoft.com/en-us/free/search/ to set up an account.

Set up the AKS CLI tools

Before launching an AKS cluster, you need to install and configure kubectl and az using the links provided below:

Required AKS Command-line Tools:
  1. kubectl: Install kubectl

  2. az: Installing the Azure CLI

To confirm your account access and command-line tools are set up correctly, run the az login command (az login --help to see available options).

Azure Prerequisites

To launch a cluster in AKS (or pretty much do anything with Azure) you need to set up a Resource Group. Resource Groups are a way of organizing and managing related resources in Azure. For more information about resource groups, see https://docs.microsoft.com/en-us/azure/azure-resource-manager/resource-group-overview#resource-groups.

You also need to choose a location where you want to spin up your AKS cluster, such as westus2. For a list of locations you can choose, see https://azure.microsoft.com/en-us/global-infrastructure/locations/.

Use the Azure console in your browser to create a resource group, or simply do:

az group create -g $AZURE_RESOURCE_GROUP -l $AZURE_LOCATION
To recap, you should have the following requirements in place:
  1. Azure Account set up.

  2. azure-cli (az) command-line tools installed.

  3. az login working.

  4. Created an Azure Resource Group and selected a location to launch the cluster.

Set up Fusion on AKS

Download and run the setup_f5_aks.sh script to install Fusion 5.x in an AKS cluster. To create a new cluster and install Fusion, simply do:

./setup_f5_aks.sh -c <cluster_name> -p <aks_resource_group>

If you don’t want the script to create a cluster, then you need to create a cluster before running the script and simply pass the name of the existing cluster using the -c parameter.

Use the --help option to see full script usage.

By default, our script installs Fusion into the default namespace; think of a K8s namespace as a virtual cluster within a physical cluster. You can install multiple instances of Fusion in the same cluster in separate namespaces. However, please do not install more than one Fusion release in the same namespace.

You can override the namespace using the -n option. In addition, our script uses f5 for the Helm release name; you can customize this using the -r option. Helm uses the release name you provide to track a specific instance of an installation, allowing you to perform updates and rollback changes for that specific release only.

You can also pass the --preview option to the script, which enables soon-to-be-released features for AKS, such as deploying a multi-zone cluster across 3 availability zones for higher availability guarantees. For more information about the Availability Zone feature, see https://docs.microsoft.com/en-us/azure/aks/availability-zones.

It takes a while for AKS to spin up the new cluster. The cluster will have three Standard_D4_v3 nodes, each with 4 CPU cores and 16 GB of memory. Behind the scenes, our script calls the az aks create command.
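
For reference, a rough sketch of an equivalent az aks create call (the exact flags used by setup_f5_aks.sh may differ):

az aks create --resource-group <aks_resource_group> --name <cluster_name> \
  --node-count 3 --node-vm-size Standard_D4_v3 --generate-ssh-keys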

Warning
If using Helm V2, the setup_f5_aks.sh script installs Helm’s tiller component into your AKS cluster with the cluster admin role. If you don’t want this, then please upgrade to Helm v3.

After running the setup_f5_aks.sh script, proceed to Verifying the Fusion Installation.

AKS Ingress

The setup_f5_aks.sh script exposes the Fusion proxy service on an external IP over HTTP. This is done for demo or getting started purposes. However, you’re strongly encouraged to configure a K8s Ingress with TLS termination in front of the proxy service.

Use the -t and -h <hostname> options to have our script create an Ingress with a TLS certificate issued by Let’s Encrypt.

Upgrades and Ingress

Important
If you used the -t -h <hostname> options when installing your cluster, our script created an additional values yaml file named tls-values.yaml.

To make things easier for you when upgrading, you should add the settings from this file into your main custom values yaml file. For example:

api-gateway:
  service:
    type: "NodePort"
  ingress:
    enabled: true
    host: "<hostname>"
    tls:
      enabled: true
    annotations:
      "networking.gke.io/managed-certificates": "<RELEASE>-managed-certificate"
      "kubernetes.io/ingress.class": "gce"

This way, you don’t have to remember to pass the additional tls-values.yaml file when upgrading.

Upgrade Fusion on AKS

Before you begin, please consult the Migration Guide.

To make things easier for you, our setup script creates an upgrade script you can use to perform upgrades, see:

aks_<cluster>_<release>_upgrade_fusion.sh

Other Kubernetes Platforms

If you’re not running on a managed K8s platform like GKE, AKS, or EKS, you can use Helm to install the Fusion chart to an existing Kubernetes cluster.

Fusion version 5.5 now includes support for the Rancher Kubernetes Engine (RKE) platform. Before deploying Fusion to RKE, you must download and install the RKE software. After configuring your cluster, you can proceed with the Helm v3 installation.

Note
You must have a working cluster configured before performing the Helm v3 installation.

Use Helm v3 to Install Fusion

You should upgrade to the latest version of Helm v3 for working with Fusion. If you need to keep Helm V2 for other clusters, ensure Helm V3 is ahead of Helm V2 in your working shell’s PATH before proceeding.

Customize Fusion Chart Settings

Fusion aims to be well-configured out-of-the-box, but you can customize any of the built-in settings using a custom values YAML file. If you use one of our setup scripts, such as setup_f5_gke.sh, then it will create a custom values YAML file for you the first time you run it using the customize_fusion_values.yaml.example as a template.

If you’re working with Helm directly and not using one of our setup scripts, then run the customize_fusion_values.sh script to create a custom values YAML file from our customize_fusion_values.yaml.example template as a starting point:

./customize_fusion_values.sh  -c <cluster> -n <namespace> \
  --provider <provider> --num-solr 1 --node-pool "<node_pool>"
Note
Pass --help for usage details.

In this example:

  • <provider> is the K8s platform you’re running on, such as gke

  • <cluster> is the name of your cluster

  • <namespace> is the K8s namespace where you plan to install Fusion

Note
The --node-pool option specifies the node selector label that determines which nodes Fusion pods run on. You can pass "{}" to let Kubernetes decide which nodes to schedule pods on.
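
For example, on GKE the node selector label typically looks like the following (an assumption; adjust the label key and pool name to your provider and cluster):

./customize_fusion_values.sh -c <cluster> -n <namespace> --provider gke --num-solr 1 \
  --node-pool "cloud.google.com/gke-nodepool: default-pool"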

This file is referred to as ${MY_VALUES} in the commands below. Replace the filename with the correct filename for your environment. Keep this file handy, as you’ll need it to customize Fusion settings and upgrade to a newer version.

Review the settings in the custom values YAML file to ensure the defaults are appropriate for your environment, including the number of Solr and Zookeeper replicas.

Add the Lucidworks Helm repo:

helm repo add lucidworks https://charts.lucidworks.com

The customize_fusion_values.sh script creates an upgrade script to install/upgrade Fusion into Kubernetes using Helm. Look in the directory where you ran customize_fusion_values.sh for a script named like: <provider>_<cluster>_<namespace>_upgrade_fusion.sh. Run this script to install Fusion.

Upgrade Existing Installation with Helm V3

Before you begin, please consult the Migration Guide.

To update an existing installation, do:

RELEASE=f5
NAMESPACE=default
helm repo update
helm upgrade ${RELEASE} "lucidworks/fusion" --namespace "${NAMESPACE}" --values "${MY_VALUES}"

Except for Zookeeper, all K8s deployments and statefulsets use a RollingUpdate update policy:

  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate

Zookeeper instances use OnDelete to avoid changing critical stateful pods in the Fusion deployment. To apply changes to Zookeeper after performing the upgrade (uncommon), you need to manually delete the pods. For example:

kubectl delete pod f5-zookeeper-0
Important
Delete one pod at a time, and verify the new pod is healthy and serving traffic before deleting the next healthy pod.

Alternatively, you can set the updateStrategy under the zookeeper section in your "${MY_VALUES}" file:

solr:
  ...
  zookeeper:
    updateStrategy:
      type: "RollingUpdate"

RedHat OpenShift

We can deploy Fusion in an existing OpenShift cluster. This cluster should be created using OpenShift Infrastructure Provider. A Red Hat Customer Portal account is required. OpenShift Online services are not supported.

The easiest way to install on OpenShift is to run the setup_f5_k8s.sh script for your existing cluster; use the --help option to see script usage. For instance, the following command will install Fusion 5 into the specified namespace (-n) and OpenShift cluster (-c):

./setup_f5_k8s.sh -c <CLUSTER> -n <NAMESPACE> --provider oc

Tip: kubectl should work with your OpenShift cluster (see: https://docs.openshift.com/container-platform/4.1/cli_reference/usage-oc-kubectl.html) and Lucidworks recommends installing the latest kubectl for your workstation instead of using oc for installing Fusion 5. However, if you do not have kubectl installed, then you’ll need to update the upgrade script created by setup_f5_k8s.sh to use oc instead of kubectl (search and replace on the BASH script using a text editor).
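
For example, a sketch of that search-and-replace using GNU sed (on macOS, use sed -i '' instead):

sed -i 's/kubectl/oc/g' <provider>_<cluster>_<release>_upgrade_fusion.sh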

When you’re ready to deploy Fusion to a production-like environment, see more information at Fusion 5 Survival Guide.

Lucidworks recommends using Helm v3, but in case Tiller is required for Helm v2, the cluster security needs to be relaxed to allow images to run with different UIDs:

oc adm policy add-scc-to-group anyuid system:authenticated

Verifying the Fusion Installation

In this section, we provide some tips on how to verify the Fusion installation.

Tip
Check if the Fusion Admin UI is available at https://<fusion-host>:6764/admin/.

Let’s review some useful kubectl commands.

Enhance the K8s Command-line Experience

Here is a list of tools we found useful for improving your command-line experience with Kubernetes:

Useful kubectl commands

Set the namespace for kubectl if not using the default:

kubectl config set-context --current --namespace=<NAMESPACE>

This saves you from having to pass -n with every command.

Get a list of running pods: k get pods

Get logs for a pod using a label: k logs -l app.kubernetes.io/component=query-pipeline

Get pod deployment spec and details: k get pods <pod_id> -o yaml

Get details about a pod events: k describe po <pod_id>

Port forward to a specific pod: k port-forward <pod_id> 8983:8983

SSH into a pod: k exec -it <pod_id> -- /bin/bash

CPU/Memory usage report for pods: k top pods

Forcefully kill a pod: k delete po <pod_id> --force --grace-period 0

Scale up (or down) a deployment: k scale deployment.v1.apps/<id> --replicas=N

Get a list of pod versions: k get po -o jsonpath='{..image}' | tr -s '[[:space:]]' '\n' | sort | uniq

Check Fusion Pods and Services

Once the install script completes, you can check that all pods and services are available using:

kubectl get pods

If all goes well, you should see a list of pods similar to:

NAME                                                        READY   STATUS    RESTARTS   AGE
seldon-controller-manager-6675874894-qxwrv                  1/1     Running   0          8m45s
f5-admin-ui-74d794f4f8-m5jms                                1/1     Running   0          8m45s
f5-ambassador-fd6b9b5dc-7ghf6                               1/1     Running   0          8m43s
f5-api-gateway-6b9998b9c-tmchk                              1/1     Running   0          8m45s
f5-auth-ui-7565564b4c-rdc74                                 1/1     Running   0          8m42s
f5-classic-rest-service-0                                   1/1     Running   3          8m44s
f5-devops-ui-77bb867ffb-fbzxd                               1/1     Running   0          8m42s
f5-fusion-admin-78b8f8fc7f-4d7l8                            1/1     Running   0          8m42s
f5-fusion-indexing-599c8d448-xzsvm                          1/1     Running   0          8m44s
f5-insights-665fd9f6fc-g5psw                                1/1     Running   0          8m43s
f5-job-launcher-84dd4c5c96-p8528                            1/1     Running   0          8m44s
f5-job-rest-server-6d44d964b8-xtnxw                         1/1     Running   0          8m45s
f5-logstash-0                                               1/1     Running   0          8m45s
f5-ml-model-service-6987dc94c9-9ppp8                        2/2     Running   1          8m45s
f5-monitoring-grafana-5d499dbb58-pzw72                      1/1     Running   0          10m
f5-monitoring-prometheus-kube-state-metrics-54d6678dv9h7h   1/1     Running   0          10m
f5-monitoring-prometheus-pushgateway-7d65c65b85-vwrwf       1/1     Running   0          10m
f5-monitoring-prometheus-server-0                           2/2     Running   0          10m
f5-pm-ui-86cbc5bb65-nd2n8                                   1/1     Running   0          8m44s
f5-pulsar-bookkeeper-0                                      1/1     Running   0          8m45s
f5-pulsar-broker-b56cc776f-56msx                            1/1     Running   0          8m45s
f5-query-pipeline-5d75d7d5f4-l2mdf                          1/1     Running   0          8m43s
f5-connectors-7bb6cfc65f-7wfs2                              1/1     Running   0          8m42s
f5-connectors-backend-987fdc648-dldwv                       1/1     Running   0          8m45s
f5-rules-ui-6b9d55b78f-9hzzj                                1/1     Running   0          8m43s
f5-solr-0                                                   1/1     Running   0          8m44s
f5-solr-exporter-c4687c785-jsm7x                            1/1     Running   0          8m45s
f5-ui-6cdbcc68c6-rj9cq                                      1/1     Running   0          8m45s
f5-webapps-6d6bb9bfd-hm4qx                                  1/1     Running   0          8m45s
f5-workflow-controller-7b66679fb7-sjbvp                     1/1     Running   0          8m44s
f5-zookeeper-0                                              1/1     Running   0          8m45s

The number of pods per deployment / statefulset will vary based on your cluster size and replicaCount settings in your custom values YAML file. Also, don’t worry if you see some pods having been restarted as that just means they were too slow to come up and Kubernetes killed and restarted them. You do want to see at least one pod running for every service. If a pod is not running after waiting a sufficient amount of time, use kubectl logs <pod_id> to see the logs for that pod; to see the logs for previous versions of a pod, use: kubectl logs <pod_id> -p. You can also look at the actions Kubernetes performed on the pod using kubectl describe po <pod_id>.

To see a list of Fusion services, do:

kubectl get svc

For an overview of the various Fusion 5 microservices, see: Fusion microservices.

Once you’re ready to build a Fusion cluster for production, please see more information at Fusion 5 Survival Guide.

Upgrading with Zero Downtime

One of the most powerful features provided by Kubernetes and a cloud-native microservices architecture is the ability to do a rolling update on a live cluster. Fusion 5 allows customers to upgrade from Fusion 5.x.y to a later 5.x.z version on a live cluster with zero downtime or disruption of service.

When Kubernetes performs a rolling update to an individual microservice, there will be a mix of old and new services in the cluster concurrently (only briefly in most cases) and requests from other services will be routed to both versions. Consequently, Lucidworks ensures all changes we make to our service do not break the API interface exposed to other services in the same 5.x line of releases. We also ensure stored configuration remains compatible in the same 5.x release line.

Lucidworks releases minor updates to individual services frequently, so our customers can pull in those upgrades using Helm at their discretion.

To upgrade your cluster at any time, use the --upgrade option with our setup scripts in this repo.

The scripts in this repo automatically pull in the latest chart updates from our Helm repository and deploy any updates needed by doing a diff of your current installation and the latest release from Lucidworks. To see what would be upgraded, you can pass the --dry-run option to the script.
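
For example, a sketch using the GKE setup script (the other platform scripts accept the same options):

./setup_f5_gke.sh -c <cluster> -p <project> -n <namespace> --upgrade --dry-run   # preview changes
./setup_f5_gke.sh -c <cluster> -p <project> -n <namespace> --upgrade             # apply them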

Grafana Dashboards

Get the initial Grafana password from a K8s secret by doing:

kubectl get secret --namespace "${NAMESPACE}" ${RELEASE}-monitoring-grafana \
  -o jsonpath="{.data.admin-password}" | base64 --decode ; echo

With Grafana, you can either setup a temporary port-forward to a Grafana pod or expose Grafana on an external IP using a K8s LoadBalancer. To define a LoadBalancer, do (replace ${RELEASE} with your Helm release label):

kubectl expose deployment ${RELEASE}-monitoring-grafana --type=LoadBalancer --name=grafana --port=3000 --target-port=3000

You can use kubectl get services --namespace <namespace> to determine when the load balancer is set up and to get its IP address. Direct your browser to http://<GrafanaIP>:3000 and enter the username admin@localhost and the password that was returned in the previous step.

This will log you into the application. It is recommended that you create another administrative user with a more desirable password.
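
Alternatively, a sketch of the temporary port-forward option mentioned above (assumes the Grafana container listens on its default port 3000); browse to http://localhost:3000 while it runs:

kubectl port-forward deployment/${RELEASE}-monitoring-grafana 3000:3000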

The dashboards and datasource will be set up for you in Grafana; simply navigate to Dashboards > Manage to view the available dashboards.

fusion-cloud-native's People

Contributors

acesar, apurvamishra20, asmartlucidworks, bdw617, captainron2000, chairbender, ctargett, dgarson, dianaprince, dustinguericke, dzmitryk, eedmiston, falloutdurham, ian-thebridge-lucidworks, irepan, jeremynovey, josecortezsv, jvancetw, kiranchitturi, laurel, luis-munoz, marcussorealheis, mohamed-maged, molsza, puneetkhanal, roblucar, sujindocs, thelabdude, thinline72, yawboateng


fusion-cloud-native's Issues

Reinstate config-sync for customer use

We are currently looking to deploy Fusion into our Kubernetes environments, and we strive to fit the GitOps model of operating, so a feature like config-sync would be perfect for us to remove manual interaction when spinning up a new cluster.

I can see the PR to remove it from customer use (#189), but curious as to the reasons why and whether there is any chance of bringing it back.

fusion-classic-rest-service pod could not start

Hi,

we are facing an issue where fusion-classic-rest-service-0 cannot start and is being restarted over and over:

fusion-admin-ui-85b6866fbb-hfsxl                                  1/1     Running                 0          26d
fusion-ambassador-8588f45b44-qs976                                1/1     Running                 0          36d
fusion-api-gateway-5d4cf67975-k56nd                               1/1     Running                 0          20d
fusion-argo-ui-8db6b5887-2b5tm                                    1/1     Running                 0          36d
fusion-auth-ui-555cfbbf54-qmzqc                                   1/1     Running                 0          26d
fusion-classic-rest-service-0                                     0/1     Init:CrashLoopBackOff   4699       23d
fusion-devops-ui-6f6c5466bd-r6fvs                                 1/1     Running                 0          26d
fusion-fusion-admin-59cd7d4c96-flxqc                              1/1     Running                 0          26d
fusion-fusion-indexing-54b6474f57-w7wl7                           1/1     Running                 26         26d
fusion-fusion-log-forwarder-66bc598c7-wpfss                       1/1     Running                 0          26d
fusion-insights-6d9cbc5769-p99cj                                  1/1     Running                 0          26d
fusion-job-launcher-5ccc758859-jklbc                              1/1     Running                 0          26d
fusion-job-rest-server-78897f8886-8kgt8                           1/1     Running                 0          26d
fusion-ml-model-service-5c4cffd47d-gq5bd                          1/1     Running                 0          26d
fusion-monitoring-grafana-7f9d5cccf8-6m7bw                        1/1     Running                 0          36d
fusion-monitoring-prometheus-kube-state-metrics-66f6cc4bb-k8pk7   1/1     Running                 0          36d
fusion-monitoring-prometheus-pushgateway-7996489596-r4rs7         1/1     Running                 0          36d
fusion-monitoring-prometheus-server-0                             2/2     Running                 0          36d
fusion-mysql-7b97f56bdc-9rw8s                                     1/1     Running                 0          36d
fusion-pm-ui-747576df49-qqsp6                                     1/1     Running                 0          26d
fusion-pulsar-bookkeeper-0                                        1/1     Running                 0          36d
fusion-pulsar-bookkeeper-1                                        1/1     Running                 0          36d
fusion-pulsar-bookkeeper-2                                        1/1     Running                 0          36d
fusion-pulsar-broker-0                                            1/1     Running                 0          36d
fusion-pulsar-broker-1                                            1/1     Running                 0          36d
fusion-query-pipeline-6dbbf8886c-qswsc                            1/1     Running                 0          26d
fusion-rest-service-6ffc8f9cc4-ndhw4                              1/1     Running                 0          26d
fusion-rpc-service-66b5c4885-cjn4j                                1/1     Running                 0          36d
fusion-rules-ui-9ccb6db59-m74sw                                   1/1     Running                 0          26d
fusion-solr-0                                                     1/1     Running                 0          36d
fusion-solr-exporter-6fccf89d5f-4pdxq                             1/1     Running                 0          36d
fusion-templating-c96f57955-gdh9f                                 1/1     Running                 0          26d
fusion-webapps-69cc458d47-847rj                                   1/1     Running                 0          26d
fusion-workflow-controller-ffc878cc-2scvd                         1/1     Running                 0          36d
fusion-zookeeper-0                                                1/1     Running                 0          36d
fusion-zookeeper-1                                                1/1     Running                 0          36d
fusion-zookeeper-2                                                1/1     Running                 0          36d
milvus-writable-588d6c755d-w2j8m                                  1/1     Running                 0          36d
seldon-controller-manager-86f68fbcd-dk6db                         1/1     Running                 6          36d

When I described the failing pod (fusion-classic-rest-service-0), I saw that one of the init containers ("check-zk") fails:

Init Containers:
  check-zk:
    Container ID:  containerd://86cbec8dd8bb25ea8239aaa551f28e7bc8164771f911cb29be0a61a79247e5cc
    Image:         lucidworks/check-fusion-dependency:v1.2.0
    Image ID:      docker.io/lucidworks/check-fusion-dependency@sha256:9829ccb6a0bea76ac92851b51f8fd8451b7f803019adf27865f093d168a6b19e
    Port:          <none>
    Host Port:     <none>
    Args:
      zookeeper
    State:          Running
      Started:      Thu, 08 Apr 2021 14:03:56 +0200
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Thu, 08 Apr 2021 13:56:55 +0200
      Finished:     Thu, 08 Apr 2021 13:58:55 +0200
    Ready:          False
    Restart Count:  4700
    Limits:
      cpu:     200m
      memory:  32Mi
    Requests:
      cpu:     200m
      memory:  32Mi
    Environment:
      ZOOKEEPER_CONNECTION_STRING:  fusion-zookeeper-0.fusion-zookeeper-headless:2181,fusion-zookeeper-1.fusion-zookeeper-headless:2181,fusion-zookeeper-2.fusion-zookeeper-headless:2181
      CHECK_INTERVAL:               5s
      CHECK_TIMEOUT:                2s
      TIMEOUT:                      2m
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from fusion-classic-rest-service-token-hr8sx (ro)

So I got the log from the init container:

2021/04/08 12:03:56 Check returned error: dial tcp: lookup fusion-zookeeper-2.fusion-zookeeper-headless on 10.237.48.10:53: dial udp 10.237.48.10:53: connect: network is unreachable
2021/04/08 12:04:01 Check returned error: dial tcp: lookup fusion-zookeeper-1.fusion-zookeeper-headless on 10.237.48.10:53: dial udp 10.237.48.10:53: connect: network is unreachable
2021/04/08 12:04:06 Check returned error: dial tcp: lookup fusion-zookeeper-1.fusion-zookeeper-headless on 10.237.48.10:53: dial udp 10.237.48.10:53: connect: network is unreachable
2021/04/08 12:04:11 Check returned error: dial tcp: lookup fusion-zookeeper-0.fusion-zookeeper-headless on 10.237.48.10:53: dial udp 10.237.48.10:53: connect: network is unreachable
2021/04/08 12:04:16 Check returned error: dial tcp: lookup fusion-zookeeper-2.fusion-zookeeper-headless on 10.237.48.10:53: dial udp 10.237.48.10:53: connect: network is unreachable
2021/04/08 12:04:21 Check returned error: dial tcp: lookup fusion-zookeeper-2.fusion-zookeeper-headless on 10.237.48.10:53: dial udp 10.237.48.10:53: connect: network is unreachable
2021/04/08 12:04:26 Check returned error: dial tcp: lookup fusion-zookeeper-0.fusion-zookeeper-headless on 10.237.48.10:53: dial udp 10.237.48.10:53: connect: network is unreachable
2021/04/08 12:04:31 Check returned error: dial tcp: lookup fusion-zookeeper-0.fusion-zookeeper-headless on 10.237.48.10:53: dial udp 10.237.48.10:53: connect: network is unreachable
2021/04/08 12:04:36 Check returned error: dial tcp: lookup fusion-zookeeper-0.fusion-zookeeper-headless on 10.237.48.10:53: dial udp 10.237.48.10:53: connect: network is unreachable
2021/04/08 12:04:41 Check returned error: dial tcp: lookup fusion-zookeeper-2.fusion-zookeeper-headless on 10.237.48.10:53: dial udp 10.237.48.10:53: connect: network is unreachable
2021/04/08 12:04:46 Check returned error: dial tcp: lookup fusion-zookeeper-0.fusion-zookeeper-headless on 10.237.48.10:53: dial udp 10.237.48.10:53: connect: network is unreachable
2021/04/08 12:04:51 Check returned error: dial tcp: lookup fusion-zookeeper-1.fusion-zookeeper-headless on 10.237.48.10:53: dial udp 10.237.48.10:53: connect: network is unreachable
2021/04/08 12:04:56 Check returned error: dial tcp: lookup fusion-zookeeper-0.fusion-zookeeper-headless on 10.237.48.10:53: dial udp 10.237.48.10:53: connect: network is unreachable
2021/04/08 12:05:01 Check returned error: dial tcp: lookup fusion-zookeeper-0.fusion-zookeeper-headless on 10.237.48.10:53: dial udp 10.237.48.10:53: connect: network is unreachable
2021/04/08 12:05:06 Check returned error: dial tcp: lookup fusion-zookeeper-0.fusion-zookeeper-headless on 10.237.48.10:53: dial udp 10.237.48.10:53: connect: network is unreachable
2021/04/08 12:05:11 Check returned error: dial tcp: lookup fusion-zookeeper-0.fusion-zookeeper-headless on 10.237.48.10:53: dial udp 10.237.48.10:53: connect: network is unreachable
2021/04/08 12:05:16 Check returned error: dial tcp: lookup fusion-zookeeper-0.fusion-zookeeper-headless on 10.237.48.10:53: dial udp 10.237.48.10:53: connect: network is unreachable
2021/04/08 12:05:21 Check returned error: dial tcp: lookup fusion-zookeeper-0.fusion-zookeeper-headless on 10.237.48.10:53: dial udp 10.237.48.10:53: connect: network is unreachable
2021/04/08 12:05:26 Check returned error: dial tcp: lookup fusion-zookeeper-1.fusion-zookeeper-headless on 10.237.48.10:53: dial udp 10.237.48.10:53: connect: network is unreachable
2021/04/08 12:05:31 Check returned error: dial tcp: lookup fusion-zookeeper-1.fusion-zookeeper-headless on 10.237.48.10:53: dial udp 10.237.48.10:53: connect: network is unreachable
2021/04/08 12:05:36 Check returned error: dial tcp: lookup fusion-zookeeper-1.fusion-zookeeper-headless on 10.237.48.10:53: dial udp 10.237.48.10:53: connect: network is unreachable
2021/04/08 12:05:41 Check returned error: dial tcp: lookup fusion-zookeeper-2.fusion-zookeeper-headless on 10.237.48.10:53: dial udp 10.237.48.10:53: connect: network is unreachable
2021/04/08 12:05:46 Check returned error: dial tcp: lookup fusion-zookeeper-0.fusion-zookeeper-headless on 10.237.48.10:53: dial udp 10.237.48.10:53: connect: network is unreachable
2021/04/08 12:05:51 Check returned error: dial tcp: lookup fusion-zookeeper-1.fusion-zookeeper-headless on 10.237.48.10:53: dial udp 10.237.48.10:53: connect: network is unreachable
2021/04/08 12:05:56 Error checking zookeeper is running: Timed out waiting for check to complete successfully

Here is a list of all services

NAME                                              TYPE           CLUSTER-IP      EXTERNAL-IP    PORT(S)                               AGE
admin                                             ClusterIP      10.237.52.18    <none>         8765/TCP                              36d
admin-ui                                          ClusterIP      10.237.60.242   <none>         8080/TCP                              36d
auth-ui                                           ClusterIP      10.237.55.132   <none>         8080/TCP                              36d
connector-plugin-service                          ClusterIP      10.237.50.132   <none>         9020/TCP                              36d
connectors                                        ClusterIP      10.237.48.90    <none>         9010/TCP                              36d
connectors-classic                                ClusterIP      None            <none>         9000/TCP                              36d
connectors-rpc                                    ClusterIP      10.237.62.128   <none>         8771/TCP                              36d
devops-ui                                         ClusterIP      10.237.51.36    <none>         8080/TCP                              36d
fusion-ambassador                                 ClusterIP      10.237.60.104   <none>         80/TCP,443/TCP                        36d
fusion-argo-ui                                    ClusterIP      10.237.48.220   <none>         2746/TCP                              36d
fusion-monitoring-grafana                         ClusterIP      10.237.61.14    <none>         80/TCP                                36d
fusion-monitoring-prometheus-kube-state-metrics   ClusterIP      None            <none>         80/TCP,81/TCP                         36d
fusion-monitoring-prometheus-pushgateway          ClusterIP      10.237.49.238   <none>         9091/TCP                              36d
fusion-monitoring-prometheus-server               ClusterIP      10.237.62.189   <none>         80/TCP                                36d
fusion-monitoring-prometheus-server-headless      ClusterIP      None            <none>         80/TCP                                36d
fusion-mysql                                      ClusterIP      10.237.52.91    <none>         3306/TCP                              36d
fusion-pulsar-bookkeeper                          ClusterIP      None            <none>         3181/TCP,8000/TCP                     36d
fusion-pulsar-broker                              ClusterIP      None            <none>         8080/TCP,6650/TCP                     36d
fusion-solr-exporter                              ClusterIP      10.237.54.108   <none>         9983/TCP                              36d
fusion-solr-headless                              ClusterIP      None            <none>         8983/TCP                              36d
fusion-solr-svc                                   ClusterIP      10.237.61.138   <none>         8983/TCP                              36d
fusion-zookeeper                                  ClusterIP      10.237.55.91    <none>         2181/TCP,2281/TCP                     36d
fusion-zookeeper-headless                         ClusterIP      None            <none>         2181/TCP,3888/TCP,2888/TCP,2281/TCP   36d
indexing                                          ClusterIP      10.237.62.46    <none>         8765/TCP                              36d
insights                                          ClusterIP      10.237.53.178   <none>         8080/TCP                              36d
job-launcher                                      ClusterIP      10.237.50.233   <none>         8083/TCP                              36d
job-rest-server                                   ClusterIP      10.237.63.32    <none>         8081/TCP                              36d
milvus                                            ClusterIP      10.237.57.195   <none>         19530/TCP,19121/TCP                   36d
ml-model-grpc                                     ClusterIP      10.237.63.47    <none>         6565/TCP                              36d
ml-model-service                                  ClusterIP      10.237.56.36    <none>         8086/TCP                              36d
pm-ui                                             ClusterIP      10.237.61.241   <none>         8080/TCP                              36d
proxy                                             LoadBalancer   10.237.56.89    20.50.14.165   6764:31028/TCP                        36d
pulsar-broker                                     ClusterIP      None            <none>         8080/TCP,6650/TCP                     36d
query                                             ClusterIP      10.237.50.250   <none>         8787/TCP                              36d
rules-ui                                          ClusterIP      10.237.48.49    <none>         8080/TCP                              36d
seldon-webhook-service                            ClusterIP      10.237.53.165   <none>         443/TCP                               36d
templating                                        ClusterIP      10.237.54.124   <none>         5250/TCP                              36d
webapps                                           ClusterIP      10.237.61.72    <none>         8780/TCP                              36d

And a list of endpoints

NAME                                              ENDPOINTS                                                          AGE
admin                                             10.234.1.47:8765                                                   36d
admin-ui                                          10.234.1.55:8080                                                   36d
auth-ui                                           10.234.1.40:8080                                                   36d
connector-plugin-service                          <none>                                                             36d
connectors                                        10.234.0.132:9010                                                  36d
connectors-classic                                                                                                   36d
connectors-rpc                                    10.234.1.29:8771                                                   36d
devops-ui                                         10.234.0.146:8080                                                  36d
fusion-ambassador                                 10.234.0.151:8443,10.234.0.151:8080                                36d
fusion-argo-ui                                    10.234.1.32:2746                                                   36d
fusion-monitoring-grafana                         10.234.0.233:3000                                                  36d
fusion-monitoring-prometheus-kube-state-metrics   10.234.1.42:8081,10.234.1.42:8080                                  36d
fusion-monitoring-prometheus-pushgateway          10.234.0.139:9091                                                  36d
fusion-monitoring-prometheus-server               10.234.0.145:9090                                                  36d
fusion-monitoring-prometheus-server-headless      10.234.0.145:9090                                                  36d
fusion-mysql                                      10.234.1.53:3306                                                   36d
fusion-pulsar-bookkeeper                          10.234.0.140:8000,10.234.0.246:8000,10.234.1.49:8000 + 3 more...   36d
fusion-pulsar-broker                              10.234.0.141:6650,10.234.1.57:6650,10.234.0.141:8080 + 1 more...   36d
fusion-solr-exporter                              10.234.1.44:9983                                                   36d
fusion-solr-headless                              10.234.0.138:8983                                                  36d
fusion-solr-svc                                   10.234.0.138:8983                                                  36d
fusion-zookeeper                                  10.234.0.157:2181,10.234.0.241:2181,10.234.1.43:2181 + 3 more...   36d
fusion-zookeeper-headless                         10.234.0.157:2888,10.234.0.241:2888,10.234.1.43:2888 + 9 more...   36d
indexing                                          10.234.0.149:8765                                                  36d
insights                                          10.234.0.131:8080                                                  36d
job-launcher                                      10.234.0.235:8083                                                  36d
job-rest-server                                   10.234.0.231:8081                                                  36d
milvus                                            10.234.0.227:19530,10.234.0.227:19121                              36d
ml-model-grpc                                     10.234.0.249:6565                                                  36d
ml-model-service                                  10.234.0.249:8086                                                  36d
pm-ui                                             10.234.0.236:8080                                                  36d
proxy                                             10.234.0.230:6764                                                  36d
pulsar-broker                                     10.234.0.141:6650,10.234.1.57:6650,10.234.0.141:8080 + 1 more...   36d
query                                             10.234.1.45:8787                                                   36d
rules-ui                                          10.234.0.234:8080                                                  36d
seldon-webhook-service                            10.234.1.51:443                                                    36d
templating                                        10.234.0.133:5250                                                  36d
webapps                                           10.234.0.251:8780                                                  36d

And a description of the fusion-zookeeper-headless endpoints:

Name:         fusion-zookeeper-headless
Namespace:    fusion
Labels:       app=zookeeper
              app.kubernetes.io/managed-by=Helm
              chart=zookeeper-2.4.2
              heritage=Helm
              release=fusion
              service.kubernetes.io/headless=
Annotations:  endpoints.kubernetes.io/last-change-trigger-time: 2021-03-02T14:16:05Z
Subsets:
  Addresses:          10.234.0.157,10.234.0.241,10.234.1.43
  NotReadyAddresses:  <none>
  Ports:
    Name       Port  Protocol
    ----       ----  --------
    server     2888  TCP
    client     2181  TCP
    tlsclient  2281  TCP
    election   3888  TCP

Events:  <none>

And a description of the fusion-zookeeper-headless service:

Name:              fusion-zookeeper-headless
Namespace:         fusion
Labels:            app=zookeeper
                   app.kubernetes.io/managed-by=Helm
                   chart=zookeeper-2.4.2
                   heritage=Helm
                   release=fusion
Annotations:       meta.helm.sh/release-name: fusion
                   meta.helm.sh/release-namespace: fusion
Selector:          app=zookeeper,release=fusion
Type:              ClusterIP
IP:                None
Port:              client  2181/TCP
TargetPort:        client/TCP
Endpoints:         10.234.0.157:2181,10.234.0.241:2181,10.234.1.43:2181
Port:              election  3888/TCP
TargetPort:        election/TCP
Endpoints:         10.234.0.157:3888,10.234.0.241:3888,10.234.1.43:3888
Port:              server  2888/TCP
TargetPort:        server/TCP
Endpoints:         10.234.0.157:2888,10.234.0.241:2888,10.234.1.43:2888
Port:              tlsclient  2281/TCP
TargetPort:        tlsclient/TCP
Endpoints:         10.234.0.157:2281,10.234.0.241:2281,10.234.1.43:2281
Session Affinity:  None
Events:            <none>

Can somebody please advise me what is wrong, and why the init container of "fusion-classic-rest-service-0" tries to reach fusion-zookeeper-headless via such a strange IP, which differs from the IP defined in the fusion-zookeeper-headless service?
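
A note for anyone hitting the same pattern: the 10.237.48.10:53 address in the log appears to be the cluster DNS resolver that the init container is querying (the "lookup ... on <ip>:53" part of a Go DNS error), not the ZooKeeper service itself, so the "network is unreachable" message points at DNS/CNI connectivity from that pod rather than at the headless service. A minimal way to sanity-check in-cluster DNS, assuming a standard kube-dns/CoreDNS setup (adjust names for your cluster):

kubectl get svc -n kube-system kube-dns
kubectl run dns-check --rm -it --restart=Never --image=busybox:1.36 -n fusion -- \
  nslookup fusion-zookeeper-0.fusion-zookeeper-headless

If that lookup also fails with "network is unreachable", the problem is between the pod and the DNS service (node networking / CNI), not in the fusion-zookeeper-headless endpoints shown above.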

Manual helm v2 install without Tiller on EKS - unknown field "affinity"

Following your instructions under Use Helm to Install Fusion, subsection Don't want to use Tiller? (Helm v2), on an AWS EKS cluster:

  • For the LoadBalancer service type, a service annotation is required for EKS:
    • service.beta.kubernetes.io/aws-load-balancer-type: nlb for public facing
    • service.beta.kubernetes.io/aws-load-balancer-internal: 0.0.0.0/0 for private facing
      so we must use a customized values.yaml. In our case, EKS is private only, so the customized values.yaml content is as follows:
sql-service:
  service:
    thrift:
      type: "ClusterIP"
api-gateway:
  service:
    annotations: { service.beta.kubernetes.io/aws-load-balancer-internal: 0.0.0.0/0 }

With the above, and with RELEASE and NAMESPACE both set to 'f5', running kubectl apply -f "${RELEASE}_${NAMESPACE}_fusion_install.yaml" --namespace "${NAMESPACE}" produced a validation error:

error: error validating "f5_f5_fusion_install.yaml": error validating data: ValidationError(Deployment.spec): unknown field "affinity" in io.k8s.api.apps.v1.DeploymentSpec; if you choose to ignore these errors, turn validation off with --validate=false

Rerunning kubectl with --validate=false, the deployment succeeds, and http://:6764 from Chrome brings up a login page!
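
A note on the render step: the ${RELEASE}_${NAMESPACE}_fusion_install.yaml file presumably comes from a local helm template render of the chart. A minimal sketch of that flow under Helm v2, using the customized values file above (file names here are illustrative):

helm template lucidworks/fusion --name "${RELEASE}" --namespace "${NAMESPACE}" \
  --values eks_private_values.yaml > "${RELEASE}_${NAMESPACE}_fusion_install.yaml"
kubectl apply -f "${RELEASE}_${NAMESPACE}_fusion_install.yaml" --namespace "${NAMESPACE}" --validate=false

Note that --validate=false only skips the client-side schema check; the API server still accepted the objects, which is consistent with the deployment coming up afterwards.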

Upgrade fails to upgrade Solr JVM and other settings

fusion-jupyter:
  enabled: true

and

 javaMem: "-Xmx14g -Dfusion_node_type=system"

in the values.yaml file are not applied by the upgrade.sh script.
CLI:

./aks_pi-001_f5_upgrade_fusion.sh \
--values ./aks/aks_pi-001_f5_fusion_values.yaml \
--values ./aks/aks_pi-001_f5_fusion_replicas.yaml \
--values ./aks/aks_pi-001_f5_fusion_affinity.yaml \
--values ./aks/aks_pi-001_f5_fusion_replicas.yaml \
--values ./aks/aks_pi-001_f5_monitoring_values.yaml

ps in f5-solr-* pod:

solr          1  0.0  0.0   2292   776 ?        Ss   21:40   0:00 /usr/bin/tini -- solr -f
solr         19 33.6  6.0 8584128 2000760 ?     Sl   21:40  14:05 /usr/local/openjdk-11/bin/java -server -Xmx2g -D.......
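
Before digging into the Solr pod, it can help to confirm whether the overrides actually reached the Helm release, and under which keys. A quick sketch (solr.javaMem is where the chart usually reads the Solr JVM options, and SOLR_JAVA_MEM is how the Solr chart typically surfaces it, but treat both as assumptions to verify against your chart version):

helm get values "${RELEASE}" --namespace "${NAMESPACE}"
helm get manifest "${RELEASE}" --namespace "${NAMESPACE}" | grep -iE -A2 'javaMem|SOLR_JAVA_MEM'

If the overrides don't show up in helm get values, the values file carrying them was probably not picked up by the upgrade script invocation above.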

error: deployment "*******" exceeded its progress deadline

helm install ${RELEASE} lucidworks/fusion --timeout=240s --namespace "${NAMESPACE}" --values ./minikube_default_f5_fusion_values.yaml --version 5.0.3-4
SUCCEEDS

kubectl rollout status deployment/${RELEASE}-api-gateway --timeout=600s --namespace "${NAMESPACE}"
error: deployment "f5-api-gateway" exceeded its progress deadline
FAILED

kubectl version 1.16
minikube version 1.17.3
helm 3.0.1
Ubuntu 18.04
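
The rollout-status timeout is only a symptom; a few generic checks usually show why the gateway pods never became ready (the component label and the check-zk init container name are taken from the describe output in the other reports on this page):

kubectl describe deployment "${RELEASE}-api-gateway" --namespace "${NAMESPACE}"
kubectl get pods --namespace "${NAMESPACE}" -l app.kubernetes.io/component=api-gateway
kubectl logs <api-gateway-pod-name> -c check-zk --namespace "${NAMESPACE}"

On a single-node minikube, Pending Solr/ZooKeeper pods are a common reason the check-zk init container never succeeds, so checking kubectl get pods for Pending or Init: states is worth a look as well.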

AKS: Not all services are starting

The README.md for running setup_f5_aks.sh states: "The cluster will have three Standard_D4_v3 nodes which have 4 CPU cores and 16 GB of memory." However, that does not seem to be enough for all services to start.

kubectl get pods
NAME READY STATUS RESTARTS AGE
f5-admin-ui-6896d75965-g4mqp 1/1 Running 0 6d17h
f5-ambassador-57c8d798d-cjzwc 0/1 CrashLoopBackOff 1869 6d17h
f5-api-gateway-df9b65dc6-dfsd5 0/1 Init:CrashLoopBackOff 1361 6d17h
f5-argo-ui-7c87f99b64-vmx2s 1/1 Running 0 6d17h
f5-auth-ui-7765cd5488-h5wx8 1/1 Running 0 6d17h
f5-classic-rest-service-0 0/1 Init:0/3 1361 6d17h
f5-connectors-68f64c488f-m6mts 0/1 Pending 0 6d17h
f5-connectors-backend-69d877b594-k6tb5 0/1 Pending 0 6d17h
f5-devops-ui-86bc48f54-2c65h 1/1 Running 0 6d17h
f5-fusion-admin-75794787c9-pn294 0/1 Pending 0 6d17h
f5-fusion-indexing-8479f45ffc-bmqkj 0/1 Init:CrashLoopBackOff 1361 6d17h
f5-fusion-log-forwarder-9c768c45-tg4m9 0/1 Init:CrashLoopBackOff 1360 6d17h
f5-insights-5ff56c5d-95vcd 1/1 Running 0 6d17h
f5-job-launcher-6f7896dc-59g8m 0/1 CrashLoopBackOff 2220 6d17h
f5-job-rest-server-58994d99dd-6v64z 0/1 Init:CrashLoopBackOff 1361 6d17h
f5-ml-model-service-7448f97bf6-s9m6s 0/1 Init:CrashLoopBackOff 1357 6d17h
f5-monitoring-grafana-6647cddd56-m45cl 1/1 Running 0 6d17h
f5-monitoring-prometheus-kube-state-metrics-647cd65579-qc8kc 1/1 Running 0 6d17h
f5-monitoring-prometheus-pushgateway-5dd445ff4f-pccht 1/1 Running 0 6d17h
f5-monitoring-prometheus-server-0 2/2 Running 0 6d17h
f5-mysql-5666f7474f-xz7cs 1/1 Running 0 6d17h
f5-pm-ui-5d4cb9f8f6-xbsr8 1/1 Running 0 6d17h
f5-pulsar-bookkeeper-0 0/1 Init:CrashLoopBackOff 1360 6d17h
f5-pulsar-broker-0 0/1 Init:0/4 0 6d17h
f5-pulsar-broker-1 0/1 Init:0/4 0 6d17h
f5-query-pipeline-6c4ff48788-8rw6c 0/1 Pending 0 6d17h
f5-rules-ui-5fd49b5974-smq4k 1/1 Running 0 6d17h
f5-solr-0 0/1 Init:CrashLoopBackOff 1360 6d17h
f5-solr-exporter-778cfc8566-fqtg8 0/1 Init:0/1 0 6d17h
f5-templating-567f74c8c4-d8skj 0/1 Pending 0 6d17h
f5-tikaserver-6bbd4dd778-59hw8 1/1 Running 0 6d17h
f5-webapps-c5cb654cc-njjcs 0/1 Init:CrashLoopBackOff 1360 6d17h
f5-workflow-controller-7bc469557b-l2dml 1/1 Running 0 6d17h
f5-zookeeper-0 1/1 Running 0 6d17h
f5-zookeeper-1 0/1 Pending 0 6d17h
milvus-writable-64bc9f8b75-hdfsw 1/1 Running 0 6d17h
seldon-controller-manager-85cc4458dc-w9zmw 1/1 Running 2 6d17h

I believe most containers are in CrashLoopBackOff because they cannot verify the connection to ZooKeeper.

kubectl describe pod f5-api-gateway-df9b65dc6-dfsd5
Name:           f5-api-gateway-df9b65dc6-dfsd5
Namespace:      default
Priority:       0
Node:           aks-agentpool-20404971-vmss000000/10.240.0.4
Start Time:     Wed, 03 Nov 2021 01:26:45 +0000
Labels:         app.kubernetes.io/component=api-gateway
                app.kubernetes.io/instance=f5
                app.kubernetes.io/part-of=fusion
                pod-template-hash=df9b65dc6
Annotations:    prometheus.io/path: /actuator/prometheus
                prometheus.io/port: 6764
                prometheus.io/scrape: true
Status:         Pending
IP:             10.244.0.18
IPs:
  IP:           10.244.0.18
Controlled By:  ReplicaSet/f5-api-gateway-df9b65dc6
Init Containers:
  check-zk:
    Container ID:  containerd://764fad878747462caeb8147f618c8613ef9e1be76d446a31a61c452f8630056e
    Image:         lucidworks/check-fusion-dependency:v1.2.0
    Image ID:      docker.io/lucidworks/check-fusion-dependency@sha256:9829ccb6a0bea76ac92851b51f8fd8451b7f803019adf27865f093d168a6b19e
    Port:          <none>
    Host Port:     <none>
    Args:
      zookeeper
    State:          Waiting
      Reason:       CrashLoopBackOff

Events for kubectl describe pod f5-zookeeper-1:

Events:
Type Reason Age From Message


Warning FailedScheduling 2m41s default-scheduler 0/4 nodes are available: 1 node(s) exceed max volume count, 3 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 91m default-scheduler 0/4 nodes are available: 1 node(s) exceed max volume count, 3 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 79m default-scheduler 0/4 nodes are available: 1 node(s) exceed max volume count, 1 node(s) had taint {ToBeDeletedByClusterAutoscaler: 1636478510}, that the pod didn't tolerate, 2 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 78m default-scheduler 0/5 nodes are available: 1 node(s) exceed max volume count, 1 node(s) had taint {ToBeDeletedByClusterAutoscaler: 1636478510}, that the pod didn't tolerate, 3 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 78m default-scheduler 0/5 nodes are available: 1 node(s) exceed max volume count, 1 node(s) had taint {ToBeDeletedByClusterAutoscaler: 1636478510}, that the pod didn't tolerate, 3 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 77m default-scheduler 0/4 nodes are available: 1 node(s) exceed max volume count, 3 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 67m default-scheduler 0/4 nodes are available: 1 node(s) exceed max volume count, 1 node(s) had taint {ToBeDeletedByClusterAutoscaler: 1636479234}, that the pod didn't tolerate, 2 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 66m default-scheduler 0/5 nodes are available: 1 node(s) exceed max volume count, 1 node(s) had taint {ToBeDeletedByClusterAutoscaler: 1636479234}, that the pod didn't tolerate, 3 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 65m default-scheduler 0/5 nodes are available: 1 node(s) exceed max volume count, 1 node(s) had taint {ToBeDeletedByClusterAutoscaler: 1636479234}, that the pod didn't tolerate, 3 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 65m default-scheduler 0/4 nodes are available: 1 node(s) exceed max volume count, 3 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 55m default-scheduler 0/4 nodes are available: 1 node(s) exceed max volume count, 1 node(s) had taint {ToBeDeletedByClusterAutoscaler: 1636479958}, that the pod didn't tolerate, 2 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 53m default-scheduler 0/5 nodes are available: 1 node(s) exceed max volume count, 1 node(s) had taint {ToBeDeletedByClusterAutoscaler: 1636479958}, that the pod didn't tolerate, 3 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 53m default-scheduler 0/4 nodes are available: 1 node(s) exceed max volume count, 3 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 52m default-scheduler 0/4 nodes are available: 1 node(s) exceed max volume count, 3 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 42m default-scheduler 0/4 nodes are available: 1 node(s) exceed max volume count, 1 node(s) had taint {ToBeDeletedByClusterAutoscaler: 1636480742}, that the pod didn't tolerate, 2 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 41m default-scheduler 0/3 nodes are available: 1 node(s) exceed max volume count, 2 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 40m default-scheduler 0/4 nodes are available: 1 node(s) exceed max volume count, 1 node(s) had taint {node.cloudprovider.kubernetes.io/shutdown: }, that the pod didn't tolerate, 2 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 40m default-scheduler 0/4 nodes are available: 1 node(s) exceed max volume count, 3 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 39m default-scheduler 0/4 nodes are available: 1 node(s) exceed max volume count, 3 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 29m default-scheduler 0/4 nodes are available: 1 node(s) exceed max volume count, 1 node(s) had taint {ToBeDeletedByClusterAutoscaler: 1636481532}, that the pod didn't tolerate, 2 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 28m default-scheduler 0/5 nodes are available: 1 node(s) exceed max volume count, 1 node(s) had taint {ToBeDeletedByClusterAutoscaler: 1636481532}, that the pod didn't tolerate, 1 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate, 2 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 27m default-scheduler 0/5 nodes are available: 1 node(s) exceed max volume count, 1 node(s) had taint {ToBeDeletedByClusterAutoscaler: 1636481532}, that the pod didn't tolerate, 3 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 27m default-scheduler 0/4 nodes are available: 1 node(s) exceed max volume count, 3 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 17m default-scheduler 0/4 nodes are available: 1 node(s) exceed max volume count, 1 node(s) had taint {ToBeDeletedByClusterAutoscaler: 1636482255}, that the pod didn't tolerate, 2 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 15m default-scheduler 0/5 nodes are available: 1 node(s) exceed max volume count, 1 node(s) had taint {ToBeDeletedByClusterAutoscaler: 1636482255}, that the pod didn't tolerate, 3 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 15m default-scheduler 0/5 nodes are available: 1 node(s) exceed max volume count, 1 node(s) had taint {ToBeDeletedByClusterAutoscaler: 1636482255}, that the pod didn't tolerate, 3 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 15m default-scheduler 0/4 nodes are available: 1 node(s) exceed max volume count, 3 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 5m2s default-scheduler 0/4 nodes are available: 1 node(s) exceed max volume count, 1 node(s) had taint {ToBeDeletedByClusterAutoscaler: 1636482979}, that the pod didn't tolerate, 2 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 3m47s default-scheduler 0/5 nodes are available: 1 node(s) exceed max volume count, 1 node(s) had taint {ToBeDeletedByClusterAutoscaler: 1636482979}, that the pod didn't tolerate, 3 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 3m36s default-scheduler 0/5 nodes are available: 1 node(s) exceed max volume count, 1 node(s) had taint {ToBeDeletedByClusterAutoscaler: 1636482979}, that the pod didn't tolerate, 3 node(s) didn't match Pod's node affinity.
Normal NotTriggerScaleUp 2m42s (x29319 over 3d23h) cluster-autoscaler pod didn't trigger scale-up: 1 max node group size reached

Would you be able to let me know what resources are needed in order to have all services up?

Thanks,
Greg
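
For anyone hitting the same pattern: the f5-zookeeper-1 events above point at scheduling constraints ("exceed max volume count", node affinity) rather than raw CPU/memory, so before resizing anything it's worth checking what is actually blocking the Pending pods. A generic sketch:

kubectl get pods --field-selector=status.phase=Pending
kubectl describe pod f5-zookeeper-1 | grep -A10 Events
kubectl describe nodes | grep -A6 'Allocated resources'
kubectl get pvc

Azure VM sizes cap the number of attached data disks, which is what trips "exceed max volume count"; if that is the blocker here, spreading the stateful pods across more nodes (or picking a VM size with a higher data-disk limit) is likely to matter more than adding CPU or RAM.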

argo-cluster-crdview installed twice?

kubectl 1.16.0
minikube 1.9.0
git pull 04/02/2020

manifest_sorter.go:175: info: skipping unknown hook: "crd-install"
manifest_sorter.go:175: info: skipping unknown hook: "crd-install"
manifest_sorter.go:175: info: skipping unknown hook: "crd-install"
manifest_sorter.go:175: info: skipping unknown hook: "crd-install"
manifest_sorter.go:175: info: skipping unknown hook: "crd-install"
manifest_sorter.go:175: info: skipping unknown hook: "crd-install"
manifest_sorter.go:175: info: skipping unknown hook: "crd-install"
manifest_sorter.go:175: info: skipping unknown hook: "crd-install"
manifest_sorter.go:175: info: skipping unknown hook: "crd-install"
manifest_sorter.go:175: info: skipping unknown hook: "crd-install"
manifest_sorter.go:175: info: skipping unknown hook: "crd-install"
manifest_sorter.go:175: info: skipping unknown hook: "crd-install"
manifest_sorter.go:175: info: skipping unknown hook: "crd-install"
manifest_sorter.go:175: info: skipping unknown hook: "crd-install"
Error: rendered manifests contain a resource that already exists. Unable to continue with install: existing resource conflict: kind: ClusterRole, namespace: , name: fusion01-argo-cluster-crdview

Waiting up to 10 minutes to see the Fusion API Gateway deployment come online ...
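
If the fusion01-argo-cluster-crdview ClusterRole is a leftover from an earlier failed or partial install (ClusterRoles are cluster-scoped, so they survive deleting the namespace), one common workaround is to check what owns it and, only if it is orphaned, remove it before re-running the install. This is destructive, so treat it as a sketch to adapt rather than a recommendation:

kubectl get clusterrole fusion01-argo-cluster-crdview -o yaml   # check labels/annotations for an owning Helm release
kubectl delete clusterrole fusion01-argo-cluster-crdview        # only if it is an orphan from a previous failed install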



Default location of "useast2" isn't a proper Azure region

If you don't specify -z with a region name, the install will error out. I suggest this line be changed to a proper region (see the available regions in the list below):

AZURE_LOCATION="useast2"

PS /home/ernie/lucidworks/fusion-cloud-native> ./setup_f5_aks.sh -c k8s-f5-test -p rg-f5-test --aks 1.17.7

Logged in as: [email protected]

WARNING: rg-f5-test not found! Creating new with default location useast2

The provided location 'useast2' is not available for resource group. 

List of available regions is 'centralus,eastasia,southeastasia,eastus,eastus2,westus,westus2,northcentralus,southcentralus,westcentralus,northeurope,westeurope,japaneast,japanwest,brazilsouth,australiasoutheast,australiaeast,westindia,southindia,centralindia,canadacentral,canadaeast,uksouth,ukwest,koreacentral,koreasouth,francecentral,southafricanorth,uaenorth,australiacentral,switzerlandnorth,germanywestcentral,norwayeast'.

ERROR: Unable to create resource group: rg-f5-test in azure location: useast2 check account permissions!
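
Until the default in the script is changed, the workaround is to pass a valid region explicitly with -z (or edit AZURE_LOCATION to one of the regions listed above). For example, using eastus2 purely as an illustration:

./setup_f5_aks.sh -c k8s-f5-test -p rg-f5-test -z eastus2 --aks 1.17.7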

AKS: Creating new cluster fails

When running ./setup_f5_aks.sh -c fusion -p resource_group -z eastus with an existing resource group but a non-existing AKS cluster, I get the following message:

Launching AKS cluster fusion in resource group resource_group in location eastus for deploying Lucidworks Fusion 5 ...

(ResourceNotFound) The Resource 'Microsoft.OperationalInsights/workspaces/DefaultWorkspace-a0850143-21c8-48ac-a534-c95f3e652bd8-WUS' under resource group 'DefaultResourceGroup-WUS' was not found. For more details please go to https://aka.ms/ARMResourceNotFoundFix
Code: ResourceNotFound
Message: The Resource 'Microsoft.OperationalInsights/workspaces/DefaultWorkspace-a0850143-21c8-48ac-a534-c95f3e652bd8-WUS' under resource group 'DefaultResourceGroup-WUS' was not found. For more details please go to https://aka.ms/ARMResourceNotFoundFix

ERROR: Create AKS cluster fusion failed! Look for previously reported errors or check the Azure portal before proceeding!

I believe that the problem is with:

az aks create ${PREVIEW_OPTS} \
  --enable-addons http_application_routing,monitoring \
  --resource-group ${AZURE_RESOURCE_GROUP} \
  --name ${CLUSTER_NAME} \
  --node-count ${NODE_COUNT} \
  --node-vm-size ${INSTANCE_TYPE} \
  --kubernetes-version ${AKS_MASTER_VERSION} \
  --generate-ssh-keys
cluster_created=$?

in setup_f5_aks.sh; specifically, the monitoring addon tries to deploy into DefaultResourceGroup-WUS (which is for westus) even though I want to deploy into eastus.

Would it be possible to have monitoring as an optional parameter or be able to specify into which resource group it should be installed?

Thanks,
Greg
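
One possible interim workaround until the script exposes this: remove monitoring from the --enable-addons line (or add --workspace-resource-id to it), create the Log Analytics workspace yourself in the intended region, and enable the addon afterwards. A sketch using the resource group and cluster name from the command above (the workspace name is illustrative, and the --workspace-resource-id flag requires a reasonably recent Azure CLI):

az monitor log-analytics workspace create \
  --resource-group resource_group --workspace-name fusion-monitoring --location eastus
az aks enable-addons --addons monitoring \
  --resource-group resource_group --name fusion \
  --workspace-resource-id "$(az monitor log-analytics workspace show \
      --resource-group resource_group --workspace-name fusion-monitoring --query id -o tsv)"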

gke_scale_namespace_up_or_down.sh doesn't shut down all the connector or pulsar-broker pods

I ran gke_scale_namespace_up_or_down.sh on my freshly created F5 instance to bring it down:

./gke_scale_namespace_up_or_down.sh down -c lw-sales-us-west1 -n carlos-wesco-poc -p lw-sales

and while that brought down most of the pods, there were some pods that stayed up:

$ kubectl get pods
NAME READY STATUS RESTARTS AGE
carlos-wesco-poc-argo-ui-85465d7cb7-6j7zt 1/1 Running 0 2d3h
carlos-wesco-poc-connector-plugin-service-box-56b77d9487-tpkb5 0/1 Running 1 40h
carlos-wesco-poc-connector-plugin-service-ldap-7fbf59bc97-p4pfx 0/1 Running 1 40h
carlos-wesco-poc-connector-plugin-service-sharepoint-85b5cc2hkl 0/1 Running 1 40h
carlos-wesco-poc-fusion-log-forwarder-fd6b9dc9-m5dlf 1/1 Running 0 40h
carlos-wesco-poc-pulsar-broker-0 0/1 CrashLoopBackOff 6 40h
carlos-wesco-poc-pulsar-broker-1 0/1 CrashLoopBackOff 6 40h
carlos-wesco-poc-templating-d664f7996-sdnzt 1/1 Running 0 69m

After conferring with Connor for a few minutes I added the following to line 171 of gke_scale_namespace_up_or_down.sh:

declare -a deployments=("admin-ui" "api-gateway" "auth-ui" "devops-ui" "fusion-admin" "fusion-indexing" "fusion-jupyter" "monitoring-grafana" "insights" "job-launcher" "job-rest-server" "ml-model-service" "pm-ui" "monitoring-prometheus-kube-state-metrics" "monitoring-prometheus-pushgateway" "query-pipeline" "rest-service" "rpc-service" "rules-ui" "solr-exporter" "webapps" "ambassador" "pulsar-broker" "workflow-controller" "ui" "sql-service-cm" "sql-service-cr" "argo-ui" "connector-plugin-service-box" "connector-plugin-service-ldap" "connector-plugin-service-sharepoint" "fusion-log-forwarder" "templating")

That shut down all but pulsar-broker-[01].

Not sure what is happening, but the script should probably be updated to account for the connector-specific pods.

Thanks!
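
One likely reason pulsar-broker resists this approach: elsewhere on this page it shows up as a StatefulSet (statefulset.apps/...-pulsar-broker), not a Deployment, so adding it to the deployments array has no effect. Until the script also handles StatefulSets and the connector-plugin deployments, a manual workaround sketch (the replica count of 2 matches the two broker pods above):

kubectl scale statefulset carlos-wesco-poc-pulsar-broker --replicas=0 -n <namespace>
kubectl scale statefulset carlos-wesco-poc-pulsar-broker --replicas=2 -n <namespace>   # to bring it back up later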

Installation failed

1. Generate the script

sh customize_fusion_values.sh -c kubernetes -n lw5 --provider kubernetes  --num-solr 1 --node-pool "{}"

2. Execute the generated script

./kubernetes_kubernetes_lw5_upgrade_fusion.sh

The output of the generated script:

namespace/lw5 created

Created namespace lw5 with owner label 

Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "lucidworks" chart repository
...Successfully got an update from the "stable" chart repository
Update Complete. ⎈Happy Helming!⎈
Upgrading the 'lw5' release (Fusion chart: 5.2.0) in the 'lw5' namespace in the 'kubernetes' cluster using values:
      kubernetes_kubernetes_lw5_fusion_values.yaml

NOTE: If this will be a long-running cluster for production purposes, you should save the following file(s) in version control:
  kubernetes_kubernetes_lw5_fusion_values.yaml

Release "lw5" does not exist. Installing it now.
coalesce.go:199: warning: destination for client is a table. Ignoring non-table value 2181
manifest_sorter.go:192: info: skipping unknown hook: "crd-install"
manifest_sorter.go:192: info: skipping unknown hook: "crd-install"
manifest_sorter.go:192: info: skipping unknown hook: "crd-install"
manifest_sorter.go:192: info: skipping unknown hook: "crd-install"
manifest_sorter.go:192: info: skipping unknown hook: "crd-install"
manifest_sorter.go:192: info: skipping unknown hook: "crd-install"
manifest_sorter.go:192: info: skipping unknown hook: "crd-install"
manifest_sorter.go:192: info: skipping unknown hook: "crd-install"
manifest_sorter.go:192: info: skipping unknown hook: "crd-install"
manifest_sorter.go:192: info: skipping unknown hook: "crd-install"
manifest_sorter.go:192: info: skipping unknown hook: "crd-install"
manifest_sorter.go:192: info: skipping unknown hook: "crd-install"
manifest_sorter.go:192: info: skipping unknown hook: "crd-install"
manifest_sorter.go:192: info: skipping unknown hook: "crd-install"
manifest_sorter.go:192: info: skipping unknown hook: "crd-install"
manifest_sorter.go:192: info: skipping unknown hook: "crd-install"
manifest_sorter.go:192: info: skipping unknown hook: "crd-install"
manifest_sorter.go:192: info: skipping unknown hook: "crd-install"
NAME: lw5
LAST DEPLOYED: Sun Sep 13 17:53:04 2020
NAMESPACE: lw5
STATUS: deployed
REVISION: 1

Waiting up to 10 minutes to see the Fusion API Gateway deployment come online ...

Waiting for deployment spec update to be observed...
Waiting for deployment spec update to be observed...
Waiting for deployment "lw5-api-gateway" rollout to finish: 0 of 1 updated replicas are available...
error: timed out waiting for the condition

Waiting up to 5 minutes to see the Fusion Indexing deployment come online ...

Waiting for deployment "lw5-fusion-indexing" rollout to finish: 0 of 1 updated replicas are available...
error: deployment "lw5-fusion-indexing" exceeded its progress deadline
Context "kubernetes-admin@kubernetes" modified.

NAME	NAMESPACE	REVISION	UPDATED                                	STATUS  	CHART       	APP VERSION
lw5 	lw5      	1       	2020-09-13 17:53:04.223112255 +0800 CST	deployed	fusion-5.2.0	5.2.0  

Output of kubectl get all -n lw5:

NAME                                             READY   STATUS                  RESTARTS   AGE
pod/lw5-admin-ui-fc648b4bc-df8k6                 1/1     Running                 0          16m
pod/lw5-ambassador-5c56f99f85-wb98r              1/1     Running                 0          16m
pod/lw5-api-gateway-56cb494746-w9xwt             0/1     Init:CrashLoopBackOff   5          16m
pod/lw5-argo-ui-97688cdd5-ww6t5                  1/1     Running                 0          16m
pod/lw5-auth-ui-744bf58697-rqmnl                 1/1     Running                 0          16m
pod/lw5-classic-rest-service-0                   0/1     Pending                 0          16m
pod/lw5-devops-ui-84cf4bbb9f-d68gc               1/1     Running                 0          16m
pod/lw5-fusion-admin-7c66d87d99-dm75c            0/1     Init:CrashLoopBackOff   5          16m
pod/lw5-fusion-indexing-fd7f886b-hsjl9           0/1     Init:CrashLoopBackOff   5          16m
pod/lw5-fusion-log-forwarder-655f65c864-g2tmk    0/1     Init:CrashLoopBackOff   5          16m
pod/lw5-insights-6c4c6f6464-k96wl                1/1     Running                 0          16m
pod/lw5-job-launcher-7bfc6d9878-49mn6            0/1     Running                 7          16m
pod/lw5-job-rest-server-575685b498-cggtd         0/1     Init:CrashLoopBackOff   5          16m
pod/lw5-ml-model-service-858585f586-cr266        0/2     Init:CrashLoopBackOff   5          16m
pod/lw5-pm-ui-cf68fd6b6-w6s7g                    1/1     Running                 0          16m
pod/lw5-pulsar-bookkeeper-0                      0/1     Pending                 0          16m
pod/lw5-pulsar-broker-0                          0/1     Init:0/4                0          16m
pod/lw5-pulsar-broker-1                          0/1     Init:0/4                0          16m
pod/lw5-query-pipeline-5c44887974-2l5cw          0/1     Init:CrashLoopBackOff   5          16m
pod/lw5-rest-service-6f87f5f488-sf58k            0/1     Init:CrashLoopBackOff   5          16m
pod/lw5-rpc-service-59f4c7c5cb-xx7d4             0/1     Init:CrashLoopBackOff   5          16m
pod/lw5-rules-ui-7d6cc45486-jd6dw                1/1     Running                 0          16m
pod/lw5-solr-0                                   0/1     Pending                 0          16m
pod/lw5-solr-exporter-74677cf947-t46nx           0/1     Init:0/1                0          16m
pod/lw5-templating-57c96d65fc-whq8c              0/1     Init:CrashLoopBackOff   5          16m
pod/lw5-webapps-55b64587f8-gbspb                 0/1     Init:CrashLoopBackOff   5          16m
pod/lw5-workflow-controller-5b877d7c67-sx4km     1/1     Running                 0          16m
pod/lw5-zookeeper-0                              0/1     Pending                 0          16m
pod/seldon-controller-manager-7b855d7f5c-zs4cr   1/1     Running                 0          16m

NAME                               TYPE           CLUSTER-IP       EXTERNAL-IP   PORT(S)                               AGE
service/admin                      ClusterIP      10.109.175.238   <none>        8765/TCP                              16m
service/admin-ui                   ClusterIP      10.96.62.2       <none>        8080/TCP                              16m
service/auth-ui                    ClusterIP      10.103.125.149   <none>        8080/TCP                              16m
service/connector-plugin-service   ClusterIP      10.107.48.157    <none>        9020/TCP                              16m
service/connectors                 ClusterIP      10.97.196.223    <none>        9010/TCP                              16m
service/connectors-classic         ClusterIP      None             <none>        9000/TCP                              16m
service/connectors-rpc             ClusterIP      10.101.196.44    <none>        8771/TCP                              16m
service/devops-ui                  ClusterIP      10.107.250.159   <none>        8080/TCP                              16m
service/indexing                   ClusterIP      10.104.226.98    <none>        8765/TCP                              16m
service/insights                   ClusterIP      10.96.173.133    <none>        8080/TCP                              16m
service/job-launcher               ClusterIP      10.111.187.156   <none>        8083/TCP                              16m
service/job-rest-server            ClusterIP      10.97.153.182    <none>        8081/TCP                              16m
service/lw5-ambassador             ClusterIP      10.102.138.235   <none>        80/TCP,443/TCP                        16m
service/lw5-argo-ui                ClusterIP      10.96.18.81      <none>        2746/TCP                              16m
service/lw5-pulsar-bookkeeper      ClusterIP      None             <none>        3181/TCP,8000/TCP                     16m
service/lw5-pulsar-broker          ClusterIP      None             <none>        8080/TCP,6650/TCP                     16m
service/lw5-solr-exporter          ClusterIP      10.99.38.190     <none>        9983/TCP                              16m
service/lw5-solr-headless          ClusterIP      None             <none>        8983/TCP                              16m
service/lw5-solr-svc               ClusterIP      10.96.213.18     <none>        8983/TCP                              16m
service/lw5-zookeeper              ClusterIP      10.101.97.38     <none>        2181/TCP,2281/TCP                     16m
service/lw5-zookeeper-headless     ClusterIP      None             <none>        2181/TCP,3888/TCP,2888/TCP,2281/TCP   16m
service/ml-model-grpc              ClusterIP      10.101.108.200   <none>        6565/TCP                              16m
service/ml-model-service           ClusterIP      10.108.67.146    <none>        8086/TCP                              16m
service/pm-ui                      ClusterIP      10.100.61.168    <none>        8080/TCP                              16m
service/proxy                      LoadBalancer   10.108.80.194    <pending>     6764:30949/TCP                        16m
service/pulsar-broker              ClusterIP      None             <none>        8080/TCP,6650/TCP                     16m
service/query                      ClusterIP      10.110.49.50     <none>        8787/TCP                              16m
service/rules-ui                   ClusterIP      10.110.122.11    <none>        8080/TCP                              16m
service/seldon-webhook-service     ClusterIP      10.107.47.242    <none>        443/TCP                               16m
service/templating                 ClusterIP      10.108.50.41     <none>        5250/TCP                              16m
service/webapps                    ClusterIP      10.100.239.137   <none>        8780/TCP                              16m

NAME                                           READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/lw5-admin-ui                   1/1     1            1           16m
deployment.apps/lw5-ambassador                 1/1     1            1           16m
deployment.apps/lw5-api-gateway                0/1     1            0           16m
deployment.apps/lw5-argo-ui                    1/1     1            1           16m
deployment.apps/lw5-auth-ui                    1/1     1            1           16m
deployment.apps/lw5-connector-plugin-service   0/0     0            0           16m
deployment.apps/lw5-devops-ui                  1/1     1            1           16m
deployment.apps/lw5-fusion-admin               0/1     1            0           16m
deployment.apps/lw5-fusion-indexing            0/1     1            0           16m
deployment.apps/lw5-fusion-log-forwarder       0/1     1            0           16m
deployment.apps/lw5-insights                   1/1     1            1           16m
deployment.apps/lw5-job-launcher               0/1     1            0           16m
deployment.apps/lw5-job-rest-server            0/1     1            0           16m
deployment.apps/lw5-ml-model-service           0/1     1            0           16m
deployment.apps/lw5-pm-ui                      1/1     1            1           16m
deployment.apps/lw5-query-pipeline             0/1     1            0           16m
deployment.apps/lw5-rest-service               0/1     1            0           16m
deployment.apps/lw5-rpc-service                0/1     1            0           16m
deployment.apps/lw5-rules-ui                   1/1     1            1           16m
deployment.apps/lw5-solr-exporter              0/1     1            0           16m
deployment.apps/lw5-templating                 0/1     1            0           16m
deployment.apps/lw5-webapps                    0/1     1            0           16m
deployment.apps/lw5-workflow-controller        1/1     1            1           16m
deployment.apps/seldon-controller-manager      1/1     1            1           16m

NAME                                                      DESIRED   CURRENT   READY   AGE
replicaset.apps/lw5-admin-ui-fc648b4bc                    1         1         1       16m
replicaset.apps/lw5-ambassador-5c56f99f85                 1         1         1       16m
replicaset.apps/lw5-api-gateway-56cb494746                1         1         0       16m
replicaset.apps/lw5-argo-ui-97688cdd5                     1         1         1       16m
replicaset.apps/lw5-auth-ui-744bf58697                    1         1         1       16m
replicaset.apps/lw5-connector-plugin-service-6787cc8d46   0         0         0       16m
replicaset.apps/lw5-devops-ui-84cf4bbb9f                  1         1         1       16m
replicaset.apps/lw5-fusion-admin-7c66d87d99               1         1         0       16m
replicaset.apps/lw5-fusion-indexing-fd7f886b              1         1         0       16m
replicaset.apps/lw5-fusion-log-forwarder-655f65c864       1         1         0       16m
replicaset.apps/lw5-insights-6c4c6f6464                   1         1         1       16m
replicaset.apps/lw5-job-launcher-7bfc6d9878               1         1         0       16m
replicaset.apps/lw5-job-rest-server-575685b498            1         1         0       16m
replicaset.apps/lw5-ml-model-service-858585f586           1         1         0       16m
replicaset.apps/lw5-pm-ui-cf68fd6b6                       1         1         1       16m
replicaset.apps/lw5-query-pipeline-5c44887974             1         1         0       16m
replicaset.apps/lw5-rest-service-6f87f5f488               1         1         0       16m
replicaset.apps/lw5-rpc-service-59f4c7c5cb                1         1         0       16m
replicaset.apps/lw5-rules-ui-7d6cc45486                   1         1         1       16m
replicaset.apps/lw5-solr-exporter-74677cf947              1         1         0       16m
replicaset.apps/lw5-templating-57c96d65fc                 1         1         0       16m
replicaset.apps/lw5-webapps-55b64587f8                    1         1         0       16m
replicaset.apps/lw5-workflow-controller-5b877d7c67        1         1         1       16m
replicaset.apps/seldon-controller-manager-7b855d7f5c      1         1         1       16m

NAME                                        READY   AGE
statefulset.apps/lw5-classic-rest-service   0/1     16m
statefulset.apps/lw5-pulsar-bookkeeper      0/3     16m
statefulset.apps/lw5-pulsar-broker          0/2     16m
statefulset.apps/lw5-solr                   0/1     16m
statefulset.apps/lw5-zookeeper              0/3     16m

NAME                                           SCHEDULE    SUSPEND   ACTIVE   LAST SCHEDULE   AGE
cronjob.batch/lw5-job-launcher                 0 * * * *   False     0        10m             16m
cronjob.batch/lw5-job-launcher-spark-cleanup   0 * * * *   False     0        10m             16m

Wrong POD log

[root@p45460v chenpeng6]# kubectl get pods -n lw5 | grep CrashLoopBackOff |awk '{print "kubectl logs "  $1 " -n lw5"}'|sh -x
+ kubectl logs lw5-api-gateway-56cb494746-w9xwt -n lw5
Error from server (BadRequest): container "api-gateway" in pod "lw5-api-gateway-56cb494746-w9xwt" is waiting to start: PodInitializing
+ kubectl logs lw5-fusion-admin-7c66d87d99-dm75c -n lw5
Error from server (BadRequest): container "admin" in pod "lw5-fusion-admin-7c66d87d99-dm75c" is waiting to start: PodInitializing
+ kubectl logs lw5-fusion-indexing-fd7f886b-hsjl9 -n lw5
Error from server (BadRequest): container "fusion-indexing" in pod "lw5-fusion-indexing-fd7f886b-hsjl9" is waiting to start: PodInitializing
+ kubectl logs lw5-fusion-log-forwarder-655f65c864-g2tmk -n lw5
Error from server (BadRequest): container "fusion-log-forwarder" in pod "lw5-fusion-log-forwarder-655f65c864-g2tmk" is waiting to start: PodInitializing
+ kubectl logs lw5-job-launcher-7bfc6d9878-49mn6 -n lw5
Picked up JAVA_TOOL_OPTIONS: -XX:+ExitOnOutOfMemoryError -Dlogging.config=classpath:logback-kube.xml
Failed to connect to Pulsar topic persistent://lw5/_logs/system_logs at : pulsar://lw5-pulsar-broker:6650 due to: org.apache.pulsar.client.api.PulsarClientException: java.util.concurrent.ExecutionException: org.apache.pulsar.client.api.PulsarClientException: java.util.concurrent.CompletionException: java.net.UnknownHostException: failed to resolve 'lw5-pulsar-broker' after 2 queries ; will re-try after brief wait ...
+ kubectl logs lw5-job-rest-server-575685b498-cggtd -n lw5
Error from server (BadRequest): container "job-rest-server" in pod "lw5-job-rest-server-575685b498-cggtd" is waiting to start: PodInitializing
+ kubectl logs lw5-ml-model-service-858585f586-cr266 -n lw5
error: a container name must be specified for pod lw5-ml-model-service-858585f586-cr266, choose one of: [java-service python-service] or one of the init containers: [check-admin]
+ kubectl logs lw5-query-pipeline-5c44887974-2l5cw -n lw5
Error from server (BadRequest): container "query-pipeline" in pod "lw5-query-pipeline-5c44887974-2l5cw" is waiting to start: PodInitializing
+ kubectl logs lw5-rest-service-6f87f5f488-sf58k -n lw5
Error from server (BadRequest): container "rest-service" in pod "lw5-rest-service-6f87f5f488-sf58k" is waiting to start: PodInitializing
+ kubectl logs lw5-rpc-service-59f4c7c5cb-xx7d4 -n lw5
Error from server (BadRequest): container "rpc-service" in pod "lw5-rpc-service-59f4c7c5cb-xx7d4" is waiting to start: PodInitializing
+ kubectl logs lw5-templating-57c96d65fc-whq8c -n lw5
Error from server (BadRequest): container "templating" in pod "lw5-templating-57c96d65fc-whq8c" is waiting to start: PodInitializing
+ kubectl logs lw5-webapps-55b64587f8-gbspb -n lw5
Error from server (BadRequest): container "webapps" in pod "lw5-webapps-55b64587f8-gbspb" is waiting to start: PodInitializing

Generate the file

file.zip
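
A note for readers: in the kubectl get all output above, lw5-zookeeper-0, lw5-solr-0, and lw5-pulsar-bookkeeper-0 are all Pending, and every Init:CrashLoopBackOff pod is stuck on a check-* init container, so the first thing to investigate is why the stateful pods cannot schedule. A generic diagnostic sketch (on a bare "kubernetes" provider cluster, a missing default StorageClass for the persistent volume claims is a common cause, but that is an assumption to verify):

kubectl describe pod lw5-zookeeper-0 -n lw5 | grep -A10 Events
kubectl get pvc -n lw5
kubectl get storageclass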

Helm fails to install Fusion platform on minikube

Failed to install the fusion platform on minikube

fusion-platform: 5.4.0
minikube: v1.23.0
k8s: 1.22.1

Command used to install fusion platform
./minikube_fusion-platform_fusion_upgrade_fusion.sh

Error
Error: failed to install CRD crds/seldon-core-operator-crd.yaml: unable to recognize "": no matches for kind "CustomResourceDefinition" in version "apiextensions.k8s.io/v1beta1

Please note that the version of Kubernetes that minikube uses by default does not support this API version:
customresourcedefinitions crd,crds apiextensions.k8s.io/v1 false CustomResourceDefinition

Thank You
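
Until the chart ships CRDs for apiextensions.k8s.io/v1, one workaround is to pin minikube to a Kubernetes version that still serves the v1beta1 CRD API (anything below 1.22; the exact version here is just an example):

minikube delete
minikube start --kubernetes-version=v1.21.5
./minikube_fusion-platform_fusion_upgrade_fusion.sh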

Issue with running Vanilla install procedure.

ERROR: Unrecognized or misplaced argument: --provider!

Use this script to install Fusion 5 on GKE; optionally create a GKE cluster in the process

Usage: -bash [OPTIONS] ... where OPTIONS include:

-c Name of the GKE cluster (required)

-p GCP Project ID (required)

-r Helm release name for installing Fusion 5, defaults to 'f5'

-n Kubernetes namespace to install Fusion 5 into, defaults to 'default'

-z GCP Zone to launch the cluster in, defaults to 'us-west1'

-i Instance type, defaults to 'e2-standard-8'

-t Enable TLS for the ingress, requires a hostname to be specified with -h

-h Hostname for the ingress to route requests to this Fusion cluster. If used with the -t parameter,
then the hostname must be a public DNS record that can be updated to point to the IP of the LoadBalancer

--prometheus Enable Prometheus and Grafana for monitoring Fusion services, pass one of: install, provided, none;
defaults to 'install' which installs Prometheus and Grafana from the stable Helm repo,
'provided' enables pod annotations on Fusion services to work with Prometheus but does not install anything

--gke GKE Master version; defaults to '-' which uses the default version for the selected region / zone (differs between zones)

--version Fusion Helm Chart version; defaults to the latest release from Lucidworks, such as 5.0.3-2

--values Custom values file containing config overrides; defaults to gke___fusion_values.yaml
(can be specified multiple times to add additional yaml files, see example-values/*.yaml)

--num-solr Number of Solr pods to deploy, defaults to 1

--node-pool Node pool label to assign pods to specific nodes, this option is only useful for existing clusters where you defined a custom node pool;
defaults to 'cloud.google.com/gke-nodepool: default-pool', wrap the arg in double-quotes

--create Create a cluster in GKE; provide the mode of the cluster to create, one of: demo, multi_az

--upgrade Perform a Helm upgrade on an existing Fusion installation

--dry-run Perform a dry-run of the upgrade to see what would change

--purge Uninstall and purge all Fusion objects from the specified namespace and cluster.
Be careful! This operation cannot be undone.

--force Force upgrade or purge a deployment if your account is not the value 'owner' label on the namespace


EKS installation with ALB, hostname and multi AZ configured, target group healthcheck fails.

Hello everyone,
When deploying a Fusion 5 EKS setup with an ALB, hostname, multi AZ and no monitoring configured on top of default options as follows:
./setup_f5_eks.sh -c sandbox-f5 -p eks-sandbox -z us-west-2 -i m5.2xlarge --deploy-alb -h sandbox-f5.example.com --prometheus none --num-solr 1 --solr-disk-gb 50 --create multi_az
the resulting target group fails its health checks, which blocks access to the cluster.

The reason seems to be that the target group is deployed without alb.ingress.kubernetes.io/healthcheck-path set, so it falls back to its default value "/".

Adding alb.ingress.kubernetes.io/healthcheck-path: "/auth/" after line 450 of setup_f5_eks.sh seems to fix the issue. With that in place, the only thing left to do after running the command above (with your own values for the options) is to take care of the DNS mapping.
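For an ingress that is already deployed, the same annotation can be applied without re-running the script. A sketch, assuming the chart created an ingress named f5-api-gateway in the 'default' namespace (adjust both to your install):

  kubectl annotate ingress f5-api-gateway -n default --overwrite \
    alb.ingress.kubernetes.io/healthcheck-path=/auth/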

Hope this helps others. Maybe a PR should be made for this, so it is fixed for everyone using setup_f5_eks.sh to deploy Fusion 5 to EKS with an ALB (--deploy-alb) and a hostname (e.g. -h sandbox-f5.example.com)?

Support more recent Kubernetes version (> 1.21)

Many API versions (PodDisruptionBudget policies, CronJobs, etc.) are deprecated in 1.21.

I am asking for support for 1.23+ in Kubernetes.

Examples (not exhaustive)
batch/v1beta1 CronJob is deprecated in v1.21+, unavailable in v1.25+; use batch/v1 CronJob
policy/v1beta1 PodDisruptionBudget is deprecated in v1.21+, unavailable in v1.25+; use policy/v1 PodDisruptionBudget
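A simple way to check whether a target cluster still serves the deprecated versions before attempting an install or upgrade (a generic sketch):

  kubectl api-versions | grep -E 'batch/v1beta1|policy/v1beta1' \
    || echo "deprecated versions not served; manifests must use batch/v1 and policy/v1"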

Generalize Fusion 5 Survival Guide [Feedback]

NodePool naming: it seems that this concept is used by GCP to define a set of nodes sharing the same definitions (for resources). I tried to find it in Kubernetes before figuring out it wasn't a Kubernetes concept. While I understand the documentation was written with GCP in mind, I think it would benefit from avoiding, or annotating, references to specific cloud providers, to distinguish general Kubernetes knowledge from topics contextualized for GCP, AWS, and other clouds.

Feature Request: Add FileBeat Sidecar Integration

Hi there :)

Love this chart as it works quite well for us!

I was wondering if there would be interest in adding Filebeat integration like jFrog does for their charts.

If there is interest, I could work on a PR for this.

https://github.com/jfrog/charts/blob/c3da4288d4ac24d4de5c5b948d2b59662b29b19f/stable/artifactory/values.yaml#L1402

  filebeatYml: |
    logging.level: info
    path.data: {{ .Values.fusion.persistence.mountPath }}/log/filebeat
    name: fusion-filebeat
    queue.spool: ~
    filebeat.inputs:
    - type: log
      enabled: true
      close_eof: ${CLOSE:false}
      paths:
         - {{ .Values.fusion.persistence.mountPath }}/log/*.log
      fields:
        service: "fusion"
        log_type: "fusion"
    output:
      logstash:
         hosts: ["{{ .Values.filebeat.logstashUrl }}"]

Customize script on EKS cluster

Hello guys, I'm trying to test out the customize script on an EKS cluster that was already created.

my setup:

eks k8s version 1.18
helm version.BuildInfo{Version:"v3.4.0"}

and I'm getting this:

Release "andrews" does not exist. Installing it now.
coalesce.go:199: warning: destination for client is a table. Ignoring non-table value 2181
Error: failed pre-install: warning: Hook pre-install fusion/charts/api-gateway/templates/createjksjob.yaml failed: roles.rbac.authorization.k8s.io "andrews-api-gateway-jks-create" already exists

Waiting up to 10 minutes to see the Fusion API Gateway deployment come online ...

Error from server (NotFound): deployments.apps "andrews-api-gateway" not found

Waiting up to 5 minutes to see the Fusion Indexing deployment come online ...

Error from server (NotFound): deployments.apps "andrews-fusion-indexing" not found

NAME   	NAMESPACE	REVISION	UPDATED                               	STATUS	CHART       	APP VERSION
andrews	andrews  	1       	2021-01-08 08:01:33.41895756 -0800 PST	failed	fusion-5.3.0	5.3.0

I generated the bash file using this:

./customize_fusion_values.sh -c dev-eks-op5qkok3 -n andrews --provider eks --prometheus false --node-pool "{}"

Since the Helm deployment failed, no resources get created in the namespace except for a service account (andrews-api-gateway-jks-create).
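One thing that sometimes unblocks a retry in this situation is removing the leftover hook objects before installing again. A sketch based only on the object names shown in the error above (not an official procedure):

  helm uninstall andrews -n andrews || true
  kubectl delete role andrews-api-gateway-jks-create -n andrews --ignore-not-found
  kubectl delete serviceaccount andrews-api-gateway-jks-create -n andrews --ignore-not-found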

any ideas?

thanks in advance

pulsar-bookkeeper is not exporting metrics

What is the issue?

  • The 'customize_fusion_values.yaml' example shows that 'pulsar-bookkeeper' has the default endpoint '/metrics' on port '8000' for monitoring, but the 'pulsar-bookkeeper' service is not exposing that port; it only exposes the 'client:3181' port.

What is the problem?

  • On the Prometheus targets page, I'm not seeing the '/metrics' endpoint as 'UP', and the metrics are not being collected by the scraper.

What is expected?

  • The port '8000' should be exposed on the 'pulsar-bookkeeper' service for the '/metrics' endpoint, so that the Prometheus target can reach the metrics responses and labels.

--

p.s.: for all other five endpoints (api-gateway, indexing, pulsar-broker, query-pipeline and solr), the metrics are being scraped fine.
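To confirm what the bookkeeper service currently exposes, and to add the metrics port by hand, something like the following can work (the service name is a guess based on typical release naming; use the name reported by 'kubectl get svc'):

  kubectl get svc -n <namespace> | grep bookkeeper
  kubectl patch svc <release>-pulsar-bookkeeper -n <namespace> --type='json' \
    -p='[{"op":"add","path":"/spec/ports/-","value":{"name":"metrics","port":8000,"targetPort":8000}}]'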

skipping unknown hook: "crd-install"

Helm 3.0.1
Manifest is a crd-install hook. This hook is no longer supported in v3, and all CRDs should instead exist in the crds/ directory at the top level of the chart.


manifest_sorter.go:175: info: skipping unknown hook: "crd-install"
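For context, Helm 3 no longer runs the crd-install hook; it installs CRDs from a crds/ directory at the top level of the chart before rendering templates. An illustrative layout:

  fusion/
    Chart.yaml
    crds/
      seldon-core-operator-crd.yaml
    templates/
      ...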

AKS: Installation failed - deprecated API warnings

Hi,

I'm trying to install Fusion on an existing AKS cluster and there are a few issues:

  1. Deprecated API warnings (not a problem yet):
W1228 15:13:32.926579    4190 warnings.go:67] apiextensions.k8s.io/v1beta1 CustomResourceDefinition is deprecated in v1.16+, unavailable in v1.22+; use apiextensions.k8s.io/v1 CustomResourceDefinition
(the same warning is repeated once per CRD in the chart)
  2. Once installed, several pods didn't start, and the logs all show the same message:
Failed to query service connection API: 'https://akstestwe03-dns-4e5bb286.hcp.westeurope.azmk8s.io/api/v1/namespaces/fusion/pods/fusion-fusion-api-gateway-5676d987d5-fmpgp/log?tailLines=200&container=api-gateway'. Status Code: 'BadRequest', Response from server: '{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"container \"api-gateway\" in pod \"fusion-fusion-api-gateway-5676d987d5-fmpgp\" is waiting to start: PodInitializing","reason":"BadRequest","code":400}'
NAME                                                  READY   STATUS                  RESTARTS   AGE
fusion-fusion-admin-ui-5b58bdd89c-lfrtx               1/1     Running                 0          16h
fusion-fusion-ambassador-8565797fb-rg45f              1/1     Running                 0          16h
fusion-fusion-api-gateway-5676d987d5-fmpgp            0/1     Init:1/2                140        16h
fusion-fusion-argo-ui-d5c96bd94-pgp9z                 1/1     Running                 0          16h
fusion-fusion-auth-ui-6c8459548-d2pvz                 1/1     Running                 0          16h
fusion-fusion-classic-rest-service-0                  0/1     Pending                 0          16h
fusion-fusion-devops-ui-84cf877b68-2ttcp              1/1     Running                 0          16h
fusion-fusion-fusion-admin-5cb4877887-m7kn7           0/1     Evicted                 0          16h
fusion-fusion-fusion-admin-5cb4877887-tmglm           0/1     Init:CrashLoopBackOff   27         169m
fusion-fusion-fusion-indexing-668bfcc5f9-t6ssn        0/1     Init:1/3                140        16h
fusion-fusion-fusion-log-forwarder-64946cd948-stc56   0/1     Init:CrashLoopBackOff   139        16h
fusion-fusion-insights-5f8c7b664d-n6cd2               1/1     Running                 0          16h
fusion-fusion-job-launcher-6698cfd87b-xbfkt           0/1     CrashLoopBackOff        225        16h
fusion-fusion-job-rest-server-6f56cd7b47-52pbx        0/1     Evicted                 0          16h
fusion-fusion-job-rest-server-6f56cd7b47-6cqdm        0/1     CrashLoopBackOff        45         162m
fusion-fusion-ml-model-service-55996f759c-5lgq4       0/1     Init:Error              27         169m
fusion-fusion-ml-model-service-55996f759c-6npc8       0/1     Evicted                 0          16h
fusion-fusion-ml-model-service-55996f759c-tvrnb       0/1     Evicted                 0          169m
fusion-fusion-mysql-6d745bbccf-mdmxs                  1/1     Running                 0          16h
fusion-fusion-pm-ui-777b9cdf7-dfvrr                   1/1     Running                 0          16h
fusion-fusion-pulsar-bookkeeper-0                     0/1     Pending                 0          16h
fusion-fusion-pulsar-broker-0                         0/1     Pending                 0          16h
fusion-fusion-pulsar-broker-1                         0/1     Pending                 0          16h
fusion-fusion-query-pipeline-d6f8bd7b4-n68zp          0/1     Init:1/2                140        16h
fusion-fusion-rest-service-577fbb7495-ll65t           0/1     Init:1/2                140        16h
fusion-fusion-rpc-service-7d755d5c9d-74tn7            0/1     Init:1/3                140        16h
fusion-fusion-rules-ui-844ff6c67-rbfzt                1/1     Running                 0          16h
fusion-fusion-solr-0                                  1/1     Running                 0          16h
fusion-fusion-solr-exporter-7bbcd89bc-89q9w           1/1     Running                 0          16h
fusion-fusion-templating-6fcd7797f7-dmh7j             0/1     Init:CrashLoopBackOff   140        16h
fusion-fusion-webapps-f7b9f8bb8-26tk8                 0/1     Init:1/2                140        16h
fusion-fusion-workflow-controller-59777cb76-pbg95     1/1     Running                 0          16h
fusion-fusion-zookeeper-0                             1/1     Running                 0          16h
fusion-fusion-zookeeper-1                             1/1     Running                 0          16h
fusion-fusion-zookeeper-2                             1/1     Running                 0          16h
milvus-writable-5b5554c554-62rmf                      0/1     Pending                 0          16h
seldon-controller-manager-86f68fbcd-rgplb             1/1     Running                 0          16h
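The Evicted and Pending statuses above usually point at resource pressure on the nodes rather than at the chart itself. A couple of generic checks that help narrow this down (the 'fusion' namespace matches the log URL above):

  kubectl describe nodes | grep -A 8 "Allocated resources"
  kubectl get events -n fusion --sort-by=.lastTimestamp | tail -n 20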

Any ideas, or is there any data you need from me to solve this issue?

Thx a lot,
Roman

Question about "classic-rest-service"

In the Survival Guide I see this information for "classic-rest-service":

Microservice: classic-rest-service
Protocol: REST/HTTP
Deployment or StatefulSet: StatefulSet
Node Pool Assignment: analytics or system
Autoscaling Supported: Yes (CPU or custom metric)
Description: REST service for supporting non-RPC connector plugins.

But in the default replicas.yaml properties this service doesn't have autoscaling settings.
Plus, if you manually scale this StatefulSet and check the 'kubectl top pod' metrics, you can see that only one pod ever does any work and the system does not distribute load across the pods, so I have to give this service a lot of memory and CPU resources.
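For reference, you can check whether the chart actually created a HorizontalPodAutoscaler for this service (a generic check; substitute your namespace):

  kubectl get hpa -n <namespace> | grep classic-rest-service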

Could someone explain why this happens and where I might be going wrong?
Thanks

Region is not set correctly when demo cluster is chosen

Hi

If I try to create a demo cluster, it always fails because the following line cannot extract the zone correctly.
The variable "GCLOUD_ZONE" is empty.

GCLOUD_ZONE=$(gcloud compute zones list --filter=region:${GCLOUD_REGION} | grep -m1 "${GCLOUD_REGION}-[a-z]" | cut -d' ' -f 1 | tail -1)

The last part in particular fails: cut -d' ' -f 1 | tail -1
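As a possible workaround (a sketch, not the script's current behavior), gcloud can return just the zone name directly, which avoids parsing column output with cut and tail:

  GCLOUD_ZONE=$(gcloud compute zones list \
    --filter="region:${GCLOUD_REGION}" --format="value(name)" --limit=1)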

Can you please check this?

Greetings
Umut Saribiyik

typo confirmation for solr auto-scaling policy

hi,

While reading the documentation of this repository, I want to validate the property used to set the Solr auto-scaling policy in fusion-admin's definitions.

Q: does it contain a typo?

documentation link

--

Could you please confirm whether I should use solrAutocalingPolicyJson or solrAutoscalingPolicyJson (or some other property)?

thank you in advance.

.Error from server (NotFound): ingresses.extensions "f5-api-gateway" not found

When an ingress is supplied to setup_f5_k8s.sh, the install script never detects that the gateway ingress exists.
Using Minikube 1.9.0, kubectl 1.16.0.
git pull fusion-cloud-native on 04/01/2020

Getting:
 ".Error from server (NotFound): ingresses.extensions "f5-api-gateway" not found"
when:
f5-api-gateway-58c7fb78f7-fvsqp              1/1     Running           0          35m
with:
./setup_f5_k8s.sh -c minikube -r f5 -n fusion01 --provider k8s --ingress fusion01_ingress \
 --num-solr 1 --solr-disk-gb 10 --force --prometheus none
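A quick way to see whether the chart created any ingress at all for the gateway (the namespace and ingress name are taken from the command and error above):

  kubectl get ingress -n fusion01
  kubectl describe ingress f5-api-gateway -n fusion01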
