

New Relic Open Source community plus project banner.

New Relic's Helm charts repository

This is the official Helm charts repository for New Relic. It is indexed at Helm Hub, where you can find the list of available charts and their documentation.

Prerequisites

Install

You can find all the information about the installation in the New Relic documentation page for installing the Kubernetes integration using Helm.

As a quick overview of the installation and configuration process: it involves creating a values.yaml that will look like this:

global:
  licenseKey: YOUR_LICENSE_KEY
  cluster: YOUR_CLUSTER_NAME
nri-kube-events:
  enabled: true
nri-metadata-injection:
  enabled: true
nri-prometheus:
  enabled: true
newrelic-logging:
  enabled: true
kube-state-metrics:
  enabled: true

Add the official repository:

helm repo add newrelic https://helm-charts.newrelic.com

Then, run helm upgrade:

helm upgrade --install newrelic-bundle newrelic/nri-bundle -f your-custom-values.yaml

You can find a list of all the global values in the nri-bundle's README, which also links to the values of all the subcharts.

Examples

The following example installs the nri-bundle chart, which groups multiple New Relic charts into one. Among others, nri-bundle contains the subcharts enabled in the values.yaml above: newrelic-infrastructure, nri-kube-events, nri-metadata-injection, nri-prometheus, newrelic-logging, and kube-state-metrics.

Upgrading to the new version of KSM

You can find additional information on how to upgrade to the new version of kube-state-metrics (KSM) in the New Relic documentation page for Kubernetes compatibility and requirements.

Development

You can use the Helm CLI to develop a chart and add it to this repository.

  1. Clone this repository on your local machine.
  2. Add or modify the files for the desired chart.
  3. To install the chart locally, run helm install dev-chart charts/<YOUR_CHART>.
  4. Verify that the chart works as expected.
  5. Remove the installed chart with helm uninstall dev-chart.
  6. Create your pull request and follow the instructions below.

Feel free to add different values to the chart.
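As an illustration, a typical local iteration loop might look like this (a sketch; substitute your chart's directory for <YOUR_CHART>):

helm lint charts/<YOUR_CHART>
helm install dev-chart charts/<YOUR_CHART> --dry-run --debug
helm install dev-chart charts/<YOUR_CHART>
helm uninstall dev-chart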

Automated version bumps

This repository uses Renovate to automatically bump dependencies. It currently supports updating dependencies on nri-bundle whenever an individual chart gets released.

Unfortunately, Renovate does not yet support updating appVersion or image.tag entries in values.yaml.

Testing

See chart testing

Contributing

See our Contributing docs and our review guidelines

A note about vulnerabilities

As noted in our security policy, New Relic is committed to the privacy and security of our customers and their data. We believe that providing coordinated disclosure by security researchers and engaging with the security community are important means to achieve our security goals.

If you believe you have found a security vulnerability in this project or any of New Relic's products or websites, we welcome and greatly appreciate you reporting it to New Relic through HackerOne.

If you would like to contribute to this project, review these guidelines.

To all contributors, we thank you! Without your contribution, this project would not be what it is today.

Support

Should you need assistance with New Relic products, you are in good hands with several support diagnostic tools and support channels.

If the issue has been confirmed as a bug or is a feature request, file a GitHub issue.

Support Channels

Issues / Enhancement Requests

Issues and enhancement requests can be submitted in the Issues tab of this repository. Please search for and review the existing open issues before submitting a new issue.

Troubleshoot

Getting "Couldn't load repositories file" (Helm 2)

You need to initialize Helm with:

helm init

License

The project is released under version 2.0 of the Apache license.


Issues

[php-daemon]: There is no php-daemon Helm Chart

As a developer, I'd like to be able to deploy the New Relic PHP-Daemon as a chart, with vendor-specific infrastructure configuration handled by New Relic.

Is your feature request related to a problem? Please describe.

I don't really want to maintain my own custom deployment of the PHP-Daemon.

Describe the solution you'd like

I'd much prefer a simple chart install.

Describe alternatives you've considered

Obviously, running my own Helm Chart, or creating my own deployment, but I'd rather not.

[newrelic-logging] Update parsers with fluentbits parsers.conf

Is your feature request related to a problem? Please describe.

The fluentbit parsers.conf includes additional parsers for apache and nginx. I'd like to either update the chart with these, or add the ability to specify custom parsers which would be covered by #75 I believe.

Describe the solution you'd like

The parsers.conf to be updated, or the ability to specify custom parsers

Describe alternatives you've considered

  • Patching the configmap, however we use Infrastructure as Code via ArgoCD which makes this task much harder.
  • Using a forked branch. We're doing this now but this means our deployment will likely fall out of date. Would much prefer an upstream solution.

Additional context

I can make a PR for this change. I just need to know whether we'd like to include more parsers by default or omit the parsers and have users specify their own custom ones instead.
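For reference, the nginx parser shipped in the upstream fluent-bit parsers.conf looks roughly like this (reproduced from memory of the upstream file; treat it as illustrative rather than authoritative):

[PARSER]
    Name        nginx
    Format      regex
    Regex       ^(?<remote>[^ ]*) (?<host>[^ ]*) (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^\"]*?)(?: +\S*)?)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$
    Time_Key    time
    Time_Format %d/%b/%Y:%H:%M:%S %z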

[nri-metadata-injection] UPGRADE FAILED: cannot patch "newrelic-bundle-nri-metadata-injection-job" with kind Job

Bug description

When this chart updates the job version number, it attempts to patch the Job. Kubernetes doesn't allow a Job's spec to be overwritten, so we get this error:

UPGRADE FAILED: cannot patch "newrelic-bundle-nri-metadata-injection-job" with kind Job: Job.batch "newrelic-bundle-nri-metadata-injection-job" is invalid: spec.template: Invalid value: core.PodTemplateSpec{ObjectMeta:v1.ObjectMeta{Name:"", GenerateName:"", Namespace:"", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, DeletionTimestamp:(*v1.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string{"app.kubernetes.io/instance":"newrelic-bundle", "app.kubernetes.io/managed-by":"Helm", "app.kubernetes.io/name":"nri-metadata-injection", "app.kubernetes.io/version":"1.3.1", "controller-uid":"12eb587d-eb66-4eba-a576-554bf3116b30", "helm.sh/chart":"nri-metadata-injection-1.3.1", "job-name":"newrelic-bundle-nri-metadata-injection-job"}, Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ClusterName:"", ManagedFields:[]v1.ManagedFieldsEntry(nil)}, Spec:core.PodSpec{Volumes:[]core.Volume(nil), InitContainers:[]core.Container(nil), Containers:[]core.Container{core.Container{Name:"nri-metadata-injection-job", Image:"newrelic/k8s-webhook-cert-manager:1.3.1", Command:[]string{"./generate_certificate.sh"}, Args:[]string{"--service", "newrelic-bundle-nri-metadata-injection", "--webhook", "newrelic-bundle-nri-metadata-injection", "--secret", "newrelic-bundle-nri-metadata-injection", "--namespace", "kube-system"}, WorkingDir:"", Ports:[]core.ContainerPort(nil), EnvFrom:[]core.EnvFromSource(nil), Env:[]core.EnvVar(nil), Resources:core.ResourceRequirements{Limits:core.ResourceList(nil), Requests:core.ResourceList(nil)}, VolumeMounts:[]core.VolumeMount(nil), VolumeDevices:[]core.VolumeDevice(nil), LivenessProbe:(*core.Probe)(nil), ReadinessProbe:(*core.Probe)(nil), StartupProbe:(*core.Probe)(nil), Lifecycle:(*core.Lifecycle)(nil), TerminationMessagePath:"/dev/termination-log", TerminationMessagePolicy:"File", ImagePullPolicy:"IfNotPresent", SecurityContext:(*core.SecurityContext)(nil), Stdin:false, StdinOnce:false, TTY:false}}, EphemeralContainers:[]core.EphemeralContainer(nil), RestartPolicy:"Never", TerminationGracePeriodSeconds:(*int64)(0xc013cca9b0), ActiveDeadlineSeconds:(*int64)(nil), DNSPolicy:"ClusterFirst", NodeSelector:map[string]string(nil), ServiceAccountName:"newrelic-bundle-nri-metadata-injection", AutomountServiceAccountToken:(*bool)(nil), NodeName:"", SecurityContext:(*core.PodSecurityContext)(0xc0121c4c80), ImagePullSecrets:[]core.LocalObjectReference(nil), Hostname:"", Subdomain:"", Affinity:(*core.Affinity)(nil), SchedulerName:"default-scheduler", Tolerations:[]core.Toleration(nil), HostAliases:[]core.HostAlias(nil), PriorityClassName:"", Priority:(*int32)(nil), PreemptionPolicy:(*core.PreemptionPolicy)(nil), DNSConfig:(*core.PodDNSConfig)(nil), ReadinessGates:[]core.PodReadinessGate(nil), RuntimeClassName:(*string)(nil), Overhead:core.ResourceList(nil), EnableServiceLinks:(*bool)(nil), TopologySpreadConstraints:[]core.TopologySpreadConstraint(nil)}}: field is immutable

Version of Helm and Kubernetes

helm 3.4.2
kubernetes 1.18 (eks)

Which chart?

nri-metadata-injection via nri-bundle
2.1.1

What happened?

Ran helm upgrade (via helmfile):

PATH:
Z:\Build\Work\7ec7c37239a99c76\kubernetes\tools\helm\3.4.2\helm.exe

ARGS:
  0: helm (4 bytes)
  1: upgrade (7 bytes)
  2: --install (9 bytes)
  3: --reset-values (14 bytes)
  4: newrelic-bundle (15 bytes)
  5: newrelic/nri-bundle (19 bytes)
  6: --wait (6 bytes)
  7: --create-namespace (18 bytes)
  8: --namespace (11 bytes)
  9: kube-system (11 bytes)
  10: --values (8 bytes)
  11: Z:\Build\Temp\buildTmp\values617232677 (38 bytes)
  12: --history-max (13 bytes)
  13: 10 (2 bytes)

What you expected to happen?

Expected the chart to be updated and for no errors to stop our ci/cd process.

How to reproduce it?

Install via helm with one chart version, install the next point release and it should happen. If you rerun it, it may succeed (basically it just fails once).

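A generic workaround for immutable Job specs (not an official fix; the job name and namespace below are taken from the error and command above, and the values file is a placeholder) is to delete the old Job before upgrading:

kubectl delete job newrelic-bundle-nri-metadata-injection-job --namespace kube-system
helm upgrade --install newrelic-bundle newrelic/nri-bundle --namespace kube-system --values <your-values.yaml>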

[newrelic-infrastructure] integrations_config should be able to pull passwords from k8s secrets

Configuration of additional integrations can be handled by the integrations_config value for the infrastructure chart. However, many of these integrations require passwords for resources. Unfortunately, these are stored in a ConfigMap today in plain text. It would be nice if there was a way to specify a Kubernetes secret which could be referenced for the password instead.

Is your feature request related to a problem? Please describe.

The problem here is having passwords stored in configmaps and enterprise security groups do not like this.

Describe the solution you'd like

I would like to be able to specify a secret name which could be attached to the container and referenced for the secret.

Describe alternatives you've considered

I believe HashiCorp Vault is supported; however, our organization isn't quite there yet.


[nri-bundle] Windows 2004 Build Support

Is your feature request related to a problem? Please describe.
We are unable to deploy the New Relic infrastructure bundle on Windows nodes running the 2004 version.

Describe the solution you'd like

We would like to be able to reference a version of the chart that works with Windows 2004.

Describe alternatives you've considered

N/A

Additional context

It seems the current version is built with 1809 (which works on our 1909 nodes), but 1809 is not compatible with 2004.

[newrelic-logging] New FILTER "grep" rule doesn't work

Bug description

[Originally posted by @d00m178 in the deprecated kubernetes-logging repository]

I have changed the filter-kubernetes.conf part in fluent-conf.yml:

  filter-kubernetes.conf: |
    [FILTER]
        Name record_modifier
        Match *
        Record cluster_name ${CLUSTER_NAME}

    [FILTER]
        Name           kubernetes
        Match          kube.*
        Kube_URL       https://kubernetes.default.svc.cluster.local:443
        Merge_JSON_Log Off

    [FILTER]
        Name grep
        Match *
        Exclude message HealthChecker

And want to exclude log messages like:
172.18.110.109 - - [29/May/2020:13:02:35 +0000] "GET / HTTP/1.1" 200 2804 "-" "ELB-HealthChecker/2.0" "-"

But it seems it doesn't work; such log messages are still coming to the NR Logs page.
I also checked:

Exclude log HealthChecker

with the same behavior.
Please advise: do FILTER rules work with the New Relic Fluent Bit output plugin for Kubernetes?
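For reference, fluent-bit's grep filter expects Exclude KEY REGEX, where KEY must name a field that exists on the record at the point the filter runs. Since the raw container line is typically stored under the log key, a variant worth trying (a sketch, not a confirmed fix) is:

[FILTER]
    Name    grep
    Match   kube.*
    Exclude log HealthChecker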


[nri-metadata-injection] Pod and Job can't run in restrictive environment

Is your feature request related to a problem? Please describe.

When running in a cluster with restrictive pod security policies (read-only rootfs set to true, RunAsUser set to MustRunAsNonRoot), the metadata injector and its associated job fail to run.

Describe the solution you'd like

Update the nri-metadata-injection chart so Pod Security Context is configurable and add a volume/volume mount config for the k8s-webhook-cert-manager pod so that tmp can be mounted as an emptyDir volume.

Describe alternatives you've considered

You could add a Pod Security Policy to the chart, but as the pod doesn't need to run as root (as far as I can tell) and the read-only root issues can be resolved with an emptyDir mount, there is no reason to do this.

An alternative is changing the related container build for the injector to set a user. The certificate job script could be changed to use /dev/shm as the folder for creating the certs, to get around the read-only filesystem. I didn't really like this idea.
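A sketch of the proposed direction (fragments at different nesting levels of the pod spec; names are illustrative, not the chart's current values):

# pod-level security context
securityContext:
  runAsNonRoot: true
# pod-level volume so the cert script can write its temporary files
volumes:
  - name: tmp
    emptyDir: {}
# container-level mount over /tmp
volumeMounts:
  - name: tmp
    mountPath: /tmp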

newrelic-infrastructure: needs affinity values

add .Values.affinity to the Daemonset's spec.template.spec.affinity

Is your feature request related to a problem? Please describe.

We need to prevent this from getting deployed to Fargate on EKS, and the most sane way is an affinity rule to avoid Fargate's label.

Describe the solution you'd like
Add

{{- if .Values.affinity }}
affinity:
{{ toYaml .Values.affinity | indent 8 }}
{{- end }}
to the Daemonset's template like
https://github.com/helm/charts/blob/90590531b829dbc1142833a5d124cef7cc17b893/stable/prometheus-node-exporter/templates/daemonset.yaml#L80

Describe alternatives you've considered

We considered adding a standard label to all our nodes (one which wouldn't be applied to the Fargate instances) and using it with newrelic-infrastructure's nodeSelector, but that would require a redeploy of all of our clusters, and it doesn't make much sense to have a node label that applies to all nodes except Fargate.

nri-bundle YAML error in daemonset-windows.yaml on deployment

Bug description

When deploying the nri-bundle helm template I get the following error:
Error: UPGRADE FAILED: YAML parse error on nri-bundle/charts/newrelic-infrastructure/templates/daemonset-windows.yaml: error converting YAML to JSON: yaml: line 54: did not find expected key

Version of Helm and Kubernetes

Helm 3.3.4, Kubernetes 1.18.8

Which chart?

name: nri-bundle
version: 1.9.1

How to reproduce it?

Deploy with Windows infrastructure support. This was the config we were using:

newrelic-infrastructure:
  enableWindows: true
  privileged: true
  nodeSelector:
    kubernetes.io/os: linux
  windowsNodeSelector:
    kubernetes.io/os: windows

[newrelic-logging] Specify logtype via Kubernetes API object metadata

Is your feature request related to a problem? Please describe.

According to New Relic documentation, a logtype attribute can be set on logs in order to parse them in a certain way (e.g. to automatically extract structured information from Nginx logs): https://docs.newrelic.com/docs/logs/log-management/ui-data/logs-parsing-built-rules-custom-parsing#logtype

Currently there seems to be no way to specify the logtype for logs originating from containers running on Kubernetes.

Describe the solution you'd like

I would like to be able to set this logtype attribute in the object metadata via Kubernetes API (e.g. something like metadata > annotations > "newrelic.com/logtype" = "nginx").
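For illustration, the proposed annotation (the key is the reporter's suggestion, not an existing feature) would sit on the workload's metadata like this:

metadata:
  annotations:
    newrelic.com/logtype: "nginx"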

Describe alternatives you've considered

I do not see any way to achieve the result of parsing structured logs for containers running on Kubernetes with current New Relic capabilities.

Additional context

Originally asked as a question, but received no reply with a solution that would achieve the desired result: https://discuss.newrelic.com/t/kubernetes-helm-nri-bundle-logtype/120712

This might be somewhat related to #78 (they both are about using Kubernetes object metadata to influence New Relic log ingest behaviour).

[newrelic-infrastructure] doesn't reload its own ConfigMap

Bug description

newrelic-infrastructure doesn't reload its own configuration.

Version of Helm and Kubernetes

% helm version
version.BuildInfo{Version:"v3.3.4", GitCommit:"a61ce5633af99708171414353ed49547cf05013d", GitTreeState:"dirty", GoVersion:"go1.15.2"}
% kubectl version
Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.4", GitCommit:"d360454c9bcd1634cf4cc52d1867af5491dc9c5f", GitTreeState:"clean", BuildDate:"2020-11-12T01:09:16Z", GoVersion:"go1.15.4", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"16+", GitVersion:"v1.16.15-gke.4300", GitCommit:"7ed5ddc0e67cb68296994f0b754cec45450d6a64", GitTreeState:"clean", BuildDate:"2020-10-28T09:23:22Z", GoVersion:"go1.13.15b4", Compiler:"gc", Platform:"linux/amd64"}

Which chart?

% helm list | grep newrelic-infrastructure
newrelic-infrastructure	newrelic 	1       	2020-12-14 10:18:01.137575 +0300 MSK	deployed	newrelic-infrastructure-1.6.5	1.26.8

What happened?

I fed the newrelic-infrastructure chart the following config (templated with gotmpl, as I use helmfile, but it should give a basic understanding; labels and so on were substituted):

% cat newrelic-infrastructure.yaml.gotmpl
USER-SUPPLIED VALUES:
cluster: "{{ .Values.cluster_name }}"
config:
  custom_attributes:
    cluster: "{{ .Values.cluster_name }}"
licenseKey: "{{ .Values.nri_license_key }}"
integrations_config:
  - name: nri-redis.yaml
    data:
      discovery:
        command:
          exec: /var/db/newrelic-infra/nri-discovery-kubernetes
          match:
            label.app: redis
      integrations:
        - name: nri-redis
          env:
            HOSTNAME: ${discovery.ip}
            PORT: 6379
            PASSWORD: "{{ .Values.redis_pass }}"
          labels:
            env: "{{ .Values.cluster_name }}"
            role: redis
  - name: nri-rabbitmq.yaml
    data:
      discovery:
        command:
          exec: /var/db/newrelic-infra/nri-discovery-kubernetes
          match:
            label.app: rabbit
      integrations:
        - name: nri-rabbitmq
          env:
            HOSTNAME: ${discovery.ip}
            PORT: 15672
            USERNAME: "{{ .Values.rabbitmq_user }}"
            PASSWORD: "{{ .Values.rabbitmq_pass }}"
          labels:
            env: "{{ .Values.cluster_name }}"
            role: rabbitmq

Applied the chart to my cluster with these values. Went to the UI, started to see rabbitmq and redis metrics. Then decided to add a block for postgresql. Kind of:

<...>
  - name: postgresql.yaml
    data:
      integration_name: com.newrelic.postgresql
      instances:
        - name: production-postgres
          command: all_data
          arguments:
            username: "{{ .Values.postgres_username }}"
            password: "{{ .Values.postgres_password }}"
            hostname: "{{ .Values.postgres_host }}"
            port: {{ .Values.postgres_port }}
            collection_list: '["dbname"]'
            collect_db_lock_metrics: false
            timeout: {{ .Values.postgres_nri_scrape_timeout }}
          labels:
            env: "{{ .Values.cluster_name }}"
            role: postgresql

Updated the chart with these values. Went to the UI and didn't find any PostgreSQL metrics at all. Waited for an hour, without any result.
Deleted the chart completely and applied it again. Started to see PostgreSQL metrics immediately.

As all the changes were related to the ConfigMap, it's clear that newrelic-infrastructure doesn't reload its own configuration.

What you expected to happen?

Expected the config to be reloaded, and to see PostgreSQL metrics in the UI with all the other metrics.
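For what it's worth, a common Helm pattern for this problem (a sketch, not something the chart currently does; it assumes the ConfigMap template lives at templates/configmap.yaml) is a config-checksum annotation on the DaemonSet's pod template, so pods are rolled whenever the rendered ConfigMap changes:

spec:
  template:
    metadata:
      annotations:
        checksum/config: {{ include (print $.Template.BasePath "/configmap.yaml") . | sha256sum }}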

[nri-kube-events] Missing podAnnotations value

Hello.

I would like to be able to set podAnnotations for the chart nri-kube-events so I can add the annotation "cluster-autoscaler.kubernetes.io/safe-to-evict": "true". Currently emptyDir: {} is blocking Kubernetes Cluster Autoscaler from rescheduling the pods.

Describe the solution you'd like

I would like to be able to set the value podAnnotations

Additional context

Cluster Autoscaler log:

Fast evaluation: node xxx.compute.internal cannot be removed: pod with local storage present: new-relic-nri-kube-events-76cc55f5f6-crt6c
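If the chart exposed a podAnnotations value (hypothetical; it is exactly what this issue requests), usage from the nri-bundle values could look like:

nri-kube-events:
  podAnnotations:
    cluster-autoscaler.kubernetes.io/safe-to-evict: "true"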

[nri-bundle] We should document how to modify configs in chart dependencies

When deploying the nri-bundle chart it is possible to modify values for the dependency charts.

This is already possible, but I believe we should add at least a hint with some common configs to show how to do so. E.g.:

#values.yaml of nri-bundle
infrastructure:
  enabled: true
# <removed the rest for brevity>
# configure the k8s integration dependency
newrelic-infrastructure:
#  enableLinux: true
# enableWindows: false
#  verboseLog: false
#  image:
#    repository: newrelic/infrastructure-k8s
#    tag: ""
#    windowsTag: 1.21.0-windows-1809-alpha
#    pullPolicy: IfNotPresent
# configure another dependency
nri-kube-events:
  key: value

[newrelic-logging] Remove cluster domain from Kube_URL

Bug description

Configmap for newrelic-logging sets Kube_URL for fluent-bit kubernetes filter to https://kubernetes.default.svc.cluster.local:443 which doesn't work if the cluster has different domain than cluster.local.

Version of Helm and Kubernetes

helm 3.3.4
kubernetes 1.17.6

Which chart?

newrelic-logging v1.2.1

What happened?

If kubernetes cluster has domain different than cluster.local, fluent-bit cannot resolve the api controller endpoint and fails to connect to it.

What you expected to happen?

fluent-bit connects to api controller successfully.

How to reproduce it?

Create a kubernetes cluster with custom domain name and install newrelic-logging.

Anything else we need to know?

The current Kube_URL value was the default in fluent-bit 1.0 (the version currently used in the chart). Since fluent-bit 1.1, the default value of this parameter is the generic https://kubernetes.default.svc:443, which works for any cluster domain.
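A sketch of the suggested fix in the chart's generated fluent-bit config:

[FILTER]
    Name     kubernetes
    Match    kube.*
    Kube_URL https://kubernetes.default.svc:443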

[nri-kube-events] attempting to write warnings to RO filesystem

nri-kube-events's kube-events container keeps crashing every few minutes: a watch warning causes the service to try to write to /tmp, which is a read-only filesystem. I am not sure whether this is necessarily a bug in kube-events, or whether the container should just be updated to allow it to write to a tmpfs, like the side-car infra-agent container already can.

Version of Helm and Kubernetes

$ kubectl version
...
Server Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.4", GitCommit:"8d8aa39598534325ad77120c120a22b3a990b5ea", GitTreeState:"clean", BuildDate:"2020-03-12T20:55:23Z", GoVersion:"go1.13.8", Compiler:"gc", Platform:"linux/amd64"}
$ helm version
version.BuildInfo{Version:"v3.2.3", GitCommit:"8f832046e258e2cb800894579b1b3b50c2d83492", GitTreeState:"clean", GoVersion:"go1.13.12"}

Which chart?

nri-kube-events / kube-events

What happened?

When any of the watchers emit a warning, it attempts to log it to /tmp. This is a read-only filesystem which causes the container to crash and restart.

W0623 14:41:44.507228       1 reflector.go:302] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:98: watch of *v1.Event ended with: The resourceVersion for the provided watch is too old.
log: exiting because of error: log: cannot create log: open /tmp/nri-kube-events.newrelic-events-nri-kube-events-c6bfbd6ff-6bm5s.nri-kube-events.log.WARNING.20200623-144144.1: read-only file system

What you expected to happen?

kube-events warnings shouldn't cause the container to restart. This should either be made into a writable directory or kube-events should only send warnings to stdout.

How to reproduce it?

This is just a generic Rancher deployed K8s cluster on v1.17.4

nri-metadata-injection namespaceSelector wrong nesting level

Bug description

The namespaceSelector field in the created MutatingWebhookConfiguration is nested on a wrong level. It should be on the same level as the rules field, as per https://kubernetes.io/docs/reference/access-authn-authz/extensible-admission-controllers/#matching-requests-namespaceselector .

The chart wrongly generates it on the same level as a single rule. The effect is the MutatingWebhookConfiguration is created without the proper namespaceSelector element, making it impossible to restrict the injection to selected namespaces.

Version of Helm and Kubernetes

Helm: v2.14.2
Kubernetes: v1.14.10-gke.36

Which chart?

nri-metadata-injection 1.0.3

What happened?

MutatingWebhookConfiguration was created without the proper namespaceSelector configuration.

What you expected to happen?

MutatingWebhookConfiguration created with:

  namespaceSelector:
    matchLabels:
      newrelic-metadata-injection: enabled

How to reproduce it?
As Helm values, use:

webhook:
  enabled: true
nri-metadata-injection:
  injectOnlyLabeledNamespaces: true

from the bundle chart, or:

injectOnlyLabeledNamespaces: true

from the chart itself.

Anything else we need to know?

One curious thing I don't understand is how the k8s API server doesn't error out when Helm creates the MutatingWebhookConfiguration; instead it just creates it with an empty namespaceSelector. When applying the generated Helm manifest (e.g. via helm get) manually, I get an explicit API error, e.g.:

error validating data: ValidationError(MutatingWebhookConfiguration.webhooks[0].rules[0]): unknown field "namespaceSelector" in io.k8s.api.admissionregistration.v1beta1.RuleWithOperations
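For clarity, a sketch of the correct nesting (namespaceSelector as a sibling of rules; the webhook name and rule contents below are illustrative, not taken from the chart):

webhooks:
  - name: metadata-injection.newrelic.com
    namespaceSelector:
      matchLabels:
        newrelic-metadata-injection: enabled
    rules:
      - operations: ["CREATE"]
        apiGroups: [""]
        apiVersions: ["v1"]
        resources: ["pods"]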

[synthetics-minion] synthetics.privateLocationKey should be the name of a secret instead of the key itself

Is your feature request related to a problem? Please describe.

Right now, I need to pass in synthetics.privateLocationKey as a parameter to the helm chart (which then winds up as an environment variable in the StatefulSet.) I'm using ArgoCD to deploy and manage my helm deployments, and I'm using a Git repository to store my ArgoCD application manifests - this means I don't have any choice but to check my private location key into my Git repo in order for ArgoCD to deploy the helm chart. I don't feel like that's a good practice. (Additionally, the location key can be viewed in plaintext by describing the minion StatefulSet.)

Describe the solution you'd like

I would love it if we could pass in the name of a pre-created secret as the value for synthetics.privateLocationKey, instead of the key itself. The Helm chart could then refer to the secret's value and inject that into the pod's environment variables. That way, the key doesn't have to be checked into Git in order for ArgoCD to maintain helm deployments, and we can treat it like we would any other secret.
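For illustration, the StatefulSet could then inject the key from a pre-created secret like this (the secret and key names are hypothetical):

env:
  - name: MINION_PRIVATE_LOCATION_KEY
    valueFrom:
      secretKeyRef:
        name: synthetics-minion-location-key   # hypothetical pre-created secret
        key: privateLocationKey                # hypothetical key within it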

Describe alternatives you've considered

I can't think of a better way to obscure this secret so it doesn't wind up in the Git repo I use to manage ArgoCD. If there's a way to have the app manifest itself pull arguments from a secret, I haven't found it.

Additional context

For reference, this is what deploying a helm chart via ArgoCD is like:

https://argoproj.github.io/argo-cd/user-guide/helm/

[newrelic-infrastructure] support configuration of process metrics

Currently unable to configure the NRIA_ENABLE_PROCESS_METRICS environment variable for the newrelic-infrastructure agent.

Adding the NRIA_ENABLE_PROCESS_METRICS environment variable as a configuration option in the values file would allow the user to turn process metrics on or off for the newrelic-infrastructure agent. After speaking to customer support, this is not currently available; custom attributes will only be listed as tags.

[nri-metadata-injection] Helm Upgrade clears caBundle which breaks the webhook calls

Bug description

When running helm upgrade on an installed version of the webhook, Helm will set the state back to what it keeps track of. Right now, in the mutatingwebhookconfiguration template, the caBundle is set to an empty string; when Helm upgrades, it will set it back to that empty string even though the job patched it after the first install.

Version of Helm and Kubernetes

1.17.9

Which chart?

nri-metadata-injection

What happened?

MutatingWebhookConfigurations caBundle is cleared on helm upgrade.

What you expected to happen?

The caBundle is left alone since it was patched by the job at first install.

How to reproduce it?

Install the chart. Make a change that would make Helm re-apply the chart. Run helm upgrade, and the caBundle is cleared.

Anything else we need to know?

The quick fix would be to remove the caBundle property from the MutatingWebhookConfiguration template so that Helm does not manage it; then, when the cert job runs, it will patch it. Another option is to set up the job to always run on install and upgrade using a hook, which should make sure it is patched.
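A sketch of the second option, using standard Helm hook annotations on the cert job so it re-runs on every install and upgrade:

metadata:
  annotations:
    "helm.sh/hook": post-install,post-upgrade
    "helm.sh/hook-delete-policy": before-hook-creation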

nri-bundle Alias Dependencies

We should alias the chart dependencies in the nri-bundle umbrella chart. Currently our values file must look like this:

infrastructure:
  enabled: true

newrelic-infrastructure:
  enableLinux: true
  enableWindows: true

When ideally we could do this:

infrastructure:
  enabled: true
  enableLinux: true
  enableWindows: true

This should only require us to update the requirements.yaml to look like this:

dependencies:
  - name: newrelic-infrastructure
    repository: file://../newrelic-infrastructure
    condition: infrastructure.enabled
    version: 1.3.1
    alias: infrastructure

newrelic-infrastructure Windows support

Is your feature request related to a problem? Please describe.

Currently the newrelic-infrastructure Helm chart can only be run on Linux nodes. With the introduction of Windows nodes in EKS & GKE, it would be nice to have a version of this chart that can be run on Windows nodes. I see a version of the Docker newrelic/infrastructure-k8s image is now available with windows-1909 and windows-1809 variants (in alpha).

Describe the solution you'd like

A version of the newrelic-infrastructure Helm chart that can be run on Windows nodes.

Describe alternatives you've considered

I've gone down the path of patching the existing chart with Windows alternatives - ripping out things like hostNetwork, which is currently not supported in Windows. In my mind, there should really be an official New Relic version though.

Additional context

The New Relic Kubernetes integration is a helm install command away for Linux nodes, but not quite as straightforward for Windows.

[nri-kube-events] nri-kube-events chart should support custom attributes

The nri-kube-events helm chart should support custom_attributes just like the rest of the product family (newrelic-infrastructure). This feature is needed for us to monitor and set up alerts based on the custom_attributes.

Reference code for implementation:
https://github.com/newrelic/helm-charts/blob/master/charts/newrelic-infrastructure/values.yaml#L91
https://github.com/newrelic/helm-charts/blob/master/charts/newrelic-infrastructure/templates/configmap.yaml#L10
https://github.com/newrelic/helm-charts/blob/master/charts/newrelic-infrastructure/templates/daemonset.yaml#L142

[newrelic-logging] installation is bumpy

a few things:

  • The memory resources allocated are too slim; these containers kept getting OOM-killed on a cluster with low activity:

    resources:
      limits:
        cpu: 500m
        memory: 128Mi

  • On OpenShift 4.4 we get errors using the hostPath definition, then errors accessing the volumes. We had to add

    securityContext:
      privileged: true

    and

    oc adm policy add-scc-to-user privileged system:serviceaccount:my_namespace:newrelic-logging

  • To use a proxy, we had to use the "http_proxy" environment variable, and this is not documented.

[newrelic-logging] Add ability to specify custom attributes and parsers

Is your feature request related to a problem? Please describe.

I would like to be able to add additional custom attributes to logs sent to NewRelic Logs within the helm chart. It would also be nice to specify custom parsers from within the helm chart.

Describe the solution you'd like

This could be added by using a logs.customAttributes map for the chart variables, which would create records at:

filter-kubernetes.conf: |
    [FILTER]
        Name record_modifier
        Match *
        Record cluster_name ${CLUSTER_NAME}

Much like the cluster_name custom attribute, added by default.

So, for example, if we wanted to add some location info about the cluster, e.g. Cloud, Region, etc.

[FILTER]
    Name record_modifier
    Match *
    Record cluster_name ${CLUSTER_NAME}
    Record cloud aws
    Record cloud_region ap-southeast-2

Describe alternatives you've considered

Patch the ConfigMap separately, this isn't easy when using Infrastructure as Code, e.g. Terraform, which is what we are using.


[nri-metadata-injection] DigitalOcean Kubernetes: Mutating webhook with a TimeoutSeconds value greater than 29 seconds will block upgrades

Bug description

Managed DigitalOcean Kubernetes clusters cannot update automatically as long as New Relic instrumentation is installed.

Version of Helm and Kubernetes

I am using Terraform 0.13.5 with hashicorp/helm 1.3.2 ran against Kubernetes 1.18.10-do.1.

Which chart?

nri-bundle version 1.9.1.

What happened?

Trying to update my Kubernetes cluster on DigitalOcean using their UI, I see messages warning about the webhook's timeout (screenshots not included here).

These block automatic upgrades. Sadly, the links to the documentation lead into the void; their targets do not exist.

What you expected to happen?

The DigitalOcean Kubernetes linter (integrated into their web UI) should not report anything and automatic upgrades of DigitalOcean Kubernetes clusters should just work without user intervention.

How to reproduce it?

  1. Create a DigitalOcean Kubernetes cluster at version 1.18.10-do.1.
  2. Install nri-bundle 1.9.1 using Helm.
  3. Try to upgrade the Kubernetes cluster to version 1.19.3-do.1.
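As a generic stopgap (not from this issue's resolution; the webhook configuration name below is illustrative), the webhook timeout can be lowered below DigitalOcean's 29-second limit with a JSON patch:

kubectl patch mutatingwebhookconfiguration newrelic-bundle-nri-metadata-injection \
  --type=json -p='[{"op": "replace", "path": "/webhooks/0/timeoutSeconds", "value": 29}]'

Note that a subsequent helm upgrade may revert this change.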

[newrelic-logging] Add support for HTTP/HTTPS proxy

Is your feature request related to a problem? Please describe.

[Originally extracted from the issue filed by @tetsushiawano in the deprecated kubernetes-logging repository]

The New Relic Fluent Bit output plugin currently supports HTTP/HTTPS proxy. However, configuring such proxy is not available in the Helm charts and manual K8s manifests. Please add support for this.

Describe the solution you'd like

Provide configuration parameters that allow configuring the proxy using the settings supported by the New Relic output plugin (Go).

Describe alternatives you've considered

Alternatively, when the official support for New Relic is provided by Fluent Bit (in C), use the proxy settings of Fluent Bit to enable this.

Additional context

None.

[newrelic-infrastructure] Inconsistent ConfigMap names

Hi,

In https://docs.newrelic.com/docs/integrations/kubernetes-integration/link-apps-services/monitor-services-running-kubernetes#configmap

We have this:

---
apiVersion: v1
kind: ConfigMap
metadata:
  name: nri-integration-cfg
  namespace: default
data:

But in the chart you have a different name:

name: {{ template "newrelic.fullname" . }}-integrations-cfg

Either update the doc, or the chart, but there is no reason for having different names!

Thanks!

[newrelic-logging] Add support for OpenShift (OKD)

Is your feature request related to a problem? Please describe.

[Originally extracted from this feature request posted by @tetsushiawano in the deprecated kubernetes-logging repository]

Please add support for OpenShift (OKD). A proposal on how to implement this was provided for the manual K8s manifests here.

Describe the solution you'd like

A proposal on how to implement this was provided for the manual K8s manifests here.

Describe alternatives you've considered

None.

Additional context

None.

[newrelic-logging] Helm Chart has Incomplete Env Variables

Bug description

[Originally posted by @druidsbane in the newrelic-fluent-bit-output repository]

Version of Helm and Kubernetes

<Please provide this information, @druidsbane, @jasonchester>

Which chart?

newrelic-logging (current version)

What happened?

The pods were unable to get host status until I added:

- name: HOSTNAME
  valueFrom:
    fieldRef:
      apiVersion: v1
      fieldPath: metadata.name

This also happened on Azure Kubernetes Service (AKS) and was fixed by the above addition.

Can we get this added to the Helm chart and the regular deployment file?

More details on a fluent-bit issue: fluent/fluent-bit#850 (comment)

What you expected to happen?

<Please provide this information, @druidsbane, @jasonchester>

How to reproduce it?

<Please provide this information, @druidsbane, @jasonchester>

Kubernetes information as provided by @jasonchester:

{Major:"1", Minor:"16", 
GitVersion:"v1.16.7", GitCommit:"e2d9f8479783020904aba3de7499a49be6c75ebd", GitTreeState:"clean",
BuildDate:"2020-04-09T02:31:15Z", 
GoVersion:"go1.13.6", Compiler:"gc", 
Platform:"linux/amd64"}

Anything else we need to know?

<Please provide this information, @druidsbane, @jasonchester>

[newrelic-logging] Custom Fluentd config

Is your feature request related to a problem? Please describe.

[Originally posted by @shicholas in the deprecated kubernetes-logging repository]

We'd like to have two Fluentd outputs, one to New Relic and the other to our archival storage. If a custom fluentd.conf is possible today, the documentation isn't clear on how to do it. If not, I'm happy to write a PR if you point me in the right direction.


[nri-logging] 1.3.0 failing in OpenShift cluster

When I deploy version 1.3.0 in an OpenShift cluster, it fails:

coreint-test-newrelic-logging-dbq5t                    0/1     CrashLoopBackOff    2          35s

Logs:

Fluent Bit v1.6.2
* Copyright (C) 2019-2020 The Fluent Bit Authors
* Copyright (C) 2015-2018 Treasure Data
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

[2020/11/16 14:10:30] [ info] [engine] started (pid=1)
[2020/11/16 14:10:30] [ info] [storage] version=1.0.6, initializing...
[2020/11/16 14:10:30] [ info] [storage] in-memory
[2020/11/16 14:10:30] [ info] [storage] normal synchronization mode, checksum disabled, max_chunks_up=128
[2020/11/16 14:10:30] [error] [config] kubernetes: unknown configuration property 'Merge_JSON_Log'. The following properties are allowed: buffer_size, tls.debug, tls.verify, tls.vhost, merge_log, merge_parser, merge_log_key, merge_log_trim, keep_log, kube_url, kube_meta_preload_cache_dir, kube_ca_file, kube_ca_path, kube_tag_prefix, kube_token_file, labels, annotations, k8s-logging.parser, k8s-logging.exclude, use_journal, regex_parser, dummy_meta, dns_retries, and dns_wait_time.
[2020/11/16 14:10:30] [ help] try the command: /fluent-bit/bin/fluent-bit -F kubernetes -h

[2020/11/16 14:10:30] [ info] [input] pausing tail.0
[2020/11/16 14:10:30] [error] [lib] backend failed

If I deploy the same chart with version 1.3.2, it works fine.

Did we bump the app version here: #165? I believe we are using an option that is no longer present.

@bmcfeely @jodeev
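For context, newer fluent-bit versions dropped Merge_JSON_Log in favor of Merge_Log (note merge_log in the allowed-properties list of the error above), so the fix is likely along these lines (a sketch):

[FILTER]
    Name      kubernetes
    Match     kube.*
    Kube_URL  https://kubernetes.default.svc.cluster.local:443
    Merge_Log Off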

[newrelic-infrastructure] Add custom certificates for integrations

Is your feature request related to a problem? Please describe.

When an integration needs to connect to its service using a certificate, there is currently no value in the chart to add one.

Describe the solution you'd like

It would be nice to have a value to add a .pem certificate file to a secret, and to have that secret mounted in the pod so it could be used by the integration.
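A sketch of what that could look like in the rendered pod spec (all names and paths are hypothetical):

volumes:
  - name: integration-certs
    secret:
      secretName: my-integration-cert      # hypothetical secret holding the .pem
volumeMounts:
  - name: integration-certs
    mountPath: /etc/newrelic-infra/certs   # hypothetical mount path
    readOnly: true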


CI/CD currently broken: "kubernetes-charts.storage.googleapis.com" is deprecated

Currently the CI/CD is broken since the old helm stable repository has been deprecated and helm 3.4.0 exits with an error:

helm repo add ppp https://charts.helm.sh/stable
WARNING: "kubernetes-charts.storage.googleapis.com" is deprecated for "stable" and will be deleted Nov. 13, 2020.
WARNING: You should switch to "https://charts.helm.sh/stable"
"ppp" has been added to your repositories

Updating the CI and the requirements of nri-bundle is needed.

Newrelic Logging & Infrastructure: requires a /var volume mount, which is an unsupported type

Bug description

When deploying the New Relic charts to AKS using ACI (Linux), it doesn't allow you to use the /var file path as a mount.

Which chart?

newrelic-logging 1.3.0 and newrelic-infrastructure 1.26.1

What happened?

Pod stays in a pending state.

What you expected to happen?

Pod to be healthy.

How to reproduce it?

Create an AKS Cluster using aci and deploy the helm charts.

Anything else we need to know?

The supported ACI volume mount type is AzureFile.

Please refer to the following document: https://docs.microsoft.com/en-us/azure/container-instances/container-instances-volume-azure-files#deploy-container-and-mount-volume---yaml

volumes:
  - name: filesharevolume
    azureFile:
      sharename: acishare
      storageAccountName:
      storageAccountKey:

[Repolinter] Open Source Policy Issues

Repolinter Report

🤖 This issue was automatically generated by repolinter-action, developed by the Open Source and Developer Advocacy team at New Relic. This issue will be automatically updated or closed when changes are pushed. If you have any problems with this tool, please feel free to open a GitHub issue or give us a ping in #help-opensource.

This Repolinter run generated the following results:

โ— Error โŒ Fail โš ๏ธ Warn โœ… Pass Ignored Total
0 3 1 3 0 7

Fail

โŒ readme-contains-link-to-security-policy #

Doesn't contain a link to the security policy for this repository (README.md). New Relic recommends putting a link to the open source security policy for your project (https://github.com/newrelic/<repo-name>/security/policy or ../../security/policy) in the README. For an example of this, please see the "a note about vulnerabilities" section of the Open By Default repository. For more information please visit https://nerdlife.datanerd.us/new-relic/security-guidelines-for-publishing-source-code.

โŒ readme-contains-discuss-topic #

Doesn't contain a link to the appropriate discuss.newrelic.com topic (README.md). New Relic recommends directly linking your appropriate discuss.newrelic.com topic in the README, allowing developers an alternate method of getting support. For more information please visit https://nerdlife.datanerd.us/new-relic/security-guidelines-for-publishing-source-code.

โŒ code-of-conduct-should-not-exist-here #

New Relic has moved the CODE_OF_CONDUCT file to a centralized location where it is referenced automatically by every repository in the New Relic organization. Because of this change, any other CODE_OF_CONDUCT file in a repository is now redundant and should be removed. Note that you will need to adjust any links to the local CODE_OF_CONDUCT file in your documentation to point to the central file (README and CONTRIBUTING will probably have links that need updating). For more information please visit https://docs.google.com/document/d/1y644Pwi82kasNP5VPVjDV8rsmkBKclQVHFkz8pwRUtE/view. Found files. Below is a list of files or patterns that failed:

  • CODE_OF_CONDUCT.md
    • 🔨 Suggested Fix: Remove file

Warning


โš ๏ธ third-party-notices-file-exists #

A THIRD_PARTY_NOTICES.md file can be present in your repository to grant attribution to all dependencies being used by this project. This document is necessary if you are using third-party source code in your project, with the exception of code referenced outside the project's compiled/bundled binary (ex. some Java projects require modules to be pre-installed in the classpath, outside the project binary and therefore outside the scope of the THIRD_PARTY_NOTICES). Please review your project's dependencies and create a THIRD_PARTY_NOTICES.md file if necessary. For JavaScript projects, you can generate this file using the oss-cli. For more information please visit https://docs.google.com/document/d/1y644Pwi82kasNP5VPVjDV8rsmkBKclQVHFkz8pwRUtE/view. Did not find a file matching the specified patterns. Below is a list of files or patterns that failed:

  • THIRD_PARTY_NOTICES*
  • THIRD-PARTY-NOTICES*
  • THIRDPARTYNOTICES*

Passed


✅ license-file-exists

Found file (LICENSE). New Relic requires that all open source projects have an associated license contained within the project. This license must be permissive (e.g. non-viral or copyleft), and we recommend Apache 2.0 for most use cases. For more information please visit https://docs.google.com/document/d/1vML4aY_czsY0URu2yiP3xLAKYufNrKsc7o4kjuegpDw/edit.

✅ readme-file-exists

Found file (README.md). New Relic requires a README file in all projects. This README should give a general overview of the project, and should point to additional resources (security, contributing, etc.) where developers and users can learn further. For more information please visit https://github.com/newrelic/open-by-default.

✅ readme-starts-with-community-plus-header

The first 5 lines contain all of the requested patterns. (README.md). The README of a community plus project should have a community plus header at the start of the README. If you already have a community plus header and this rule is failing, your header may be out of date, and you should update your header with the suggested one below. For more information please visit https://opensource.newrelic.com/oss-category/.

[nri-kube-events] number of replicas in the deployment

Is your feature request related to a problem? Please describe.

Is there any specific reason for hardcoding the number of replicas for nri-kube-events and nri-metadata-injection to 1?

Describe the solution you'd like

There should be at least 2 pods in the deployment, for redundancy.

Describe alternatives you've considered

Make it a variable, to allow users to set it from their values.yaml.

[newrelic-logging] Add ability to tag the resources you want the logs captures (or opt-out)

Is your feature request related to a problem? Please describe.

[Originally posted by @shicholas here]

Installation and setup with this was easy and straightforward, nice work!

It'd be nice to opt in or out of New Relic collecting logs based on metadata tags we assign to our K8s resources, or to specify a K8s namespace to collect logs from. We have some third-party managed resources we do not care to collect logs from (e.g. Azure's Key Vault).


[nri-kube-events] Ensure pods start when Pod Security Policies enforce MustRunAsNonRoot

Is your feature request related to a problem? Please describe.

Unfortunately, when you try to deploy this chart on a Kubernetes cluster where the default Pod Security Policy enforces MustRunAsNonRoot, the nri-kube-events pod fails to start.

Interestingly, the chart configures the New Relic infra agent via a SecurityContext so that it runs as user 1000, but then goes on to set runAsNonRoot: false, which falls foul of the Pod Security Policy's MustRunAsNonRoot configuration.

Once this is changed to true, the infra agent starts, but the events container still fails: it has a named user, and Kubernetes can't work out whether that user is root or not since it doesn't have a numeric ID.

Really, both these containers should probably work with a MustRunAsNonRoot configuration.

Describe the solution you'd like

The nri-kube-events deployment should be configured (if it can be) to run as non-root for both containers. Changing runAsNonRoot: false to runAsNonRoot: true for the infra agent and setting a security context for the events container would likely fix these issues.
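A sketch of the suggested container security context (the UID comes from the issue's description of the infra agent):

securityContext:
  runAsNonRoot: true
  runAsUser: 1000   # numeric UID so MustRunAsNonRoot can be verified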

Describe alternatives you've considered

None

synthetics-minion: crashlooping, not enough resources

Bug description

I get an error of: "The node where job synthetic-healthcheck-2345bc066c4f453cb246b4879fa34c3e was scheduled did not have enough available resources."

Kubernetes version 1.17.11

Which chart?
repository: https://helm-charts.newrelic.com/charts
name: synthetics-minion
version: "1.0.18"

What happened?
If the node the pod is scheduled on is close to full but the pod can still fit, it is slotted onto the node, and then, I believe, when the synthetic health check runs it hits a resource issue:

 kubectl describe po newrelic-synthetics-minion-2  -nmonitoring
Name:           newrelic-synthetics-minion-2
Namespace:      monitoring
Priority:       0
Node:           aks-defaultpool-33270332-vmss000015/10.240.0.9
Start Time:     Tue, 24 Nov 2020 17:21:11 -0600
Labels:         app.kubernetes.io/instance=newrelic-synthetics-minion
                app.kubernetes.io/name=synthetics-minion
                controller-revision-hash=newrelic-synthetics-minion-fb78b49b6
                statefulset.kubernetes.io/pod-name=newrelic-synthetics-minion-2
Annotations:    <none>
Status:         Running
IP:             10.244.8.253
Controlled By:  StatefulSet/newrelic-synthetics-minion
Init Containers:
  update-mounted-subpath-permissions:
    Container ID:  docker://8521fb9b517952a3faf3df5a937c6b206c016b6e8f07dff8c67bd6a156116497
    Image:         quay.io/newrelic/synthetics-minion:3.0.28
    Image ID:      docker-pullable://quay.io/newrelic/synthetics-minion@sha256:8035c60d4d282d3c7a2007d17ed515821de3f123f867930e905e5e0c4eca763d
    Port:          <none>
    Host Port:     <none>
    Command:
      update-k8s-mounted-subpath-permissions
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Tue, 24 Nov 2020 17:21:46 -0600
      Finished:     Tue, 24 Nov 2020 17:21:46 -0600
    Ready:          True
    Restart Count:  0
    Environment:
      MINION_UID:                   2379
      MINION_GID:                   3729
      TMP_PATH:                     /tmp
      PERMANENT_DATA_STORAGE_PATH:  /var/lib/newrelic/synthetics
      CUSTOM_MODULES_PATH:          /var/lib/newrelic/synthetics/modules
      USER_DEFINED_VARIABLES_PATH:  /var/lib/newrelic/synthetics/variables
    Mounts:
      /tmp from minion-volume (rw,path="newrelic-synthetics-minion/tmp")
      /var/run/secrets/kubernetes.io/serviceaccount from newrelic-synthetics-minion-token-xppxf (ro)
Containers:
  synthetics-minion:
    Container ID:  docker://02c931329f4199fd104b39000cadbd17809a00d1ab95ec5c3180d61792306c97
    Image:         quay.io/newrelic/synthetics-minion:3.0.28
    Image ID:      docker-pullable://quay.io/newrelic/synthetics-minion@sha256:8035c60d4d282d3c7a2007d17ed515821de3f123f867930e905e5e0c4eca763d
    Ports:         8080/TCP, 8180/TCP, 5005/TCP, 65100/TCP, 65101/TCP, 65102/TCP
    Host Ports:    0/TCP, 0/TCP, 0/TCP, 0/TCP, 0/TCP, 0/TCP
    State:         Terminated
      Reason:      Error
      Message:     .fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:568)
! at io.fabric8.kubernetes.client.dsl.base.OperationSupport.assertResponseCode(OperationSupport.java:507)
! at io.fabric8.kubernetes.client.dsl.internal.core.v1.PodOperationsImpl.doGetLog(PodOperationsImpl.java:186)
! at io.fabric8.kubernetes.client.dsl.internal.core.v1.PodOperationsImpl.getLog(PodOperationsImpl.java:195)
! at io.fabric8.kubernetes.client.dsl.internal.core.v1.PodOperationsImpl.getLog(PodOperationsImpl.java:83)
! at com.newrelic.synthetics.minion.containersystemdriver.kubernetes.KubernetesHelper.getPodContainerLogs(KubernetesHelper.java:86)
! ... 22 common frames omitted

2020-11-25 00:00:59,140 - The node where job synthetic-healthcheck-93d8c8163bd145938020dc0de43aa1c2 was scheduled did not have enough available resources
2020-11-25 00:00:59,140 - Health check job synthetic-healthcheck-93d8c8163bd145938020dc0de43aa1c2 exited with 126
2020-11-25 00:00:59,140 - Runner Docker Image: synthetics-minion-runner:3.0.28 - Container System Driver (with KubernetesClient) failed completing the health check Job, as it returned with status failed.
2020-11-25 00:00:59,886 - Service/dependency 'ContainerSystemDriverHealthCheck' is not healthy 'Result{isHealthy=false, message=ContainerSystemDriver is not healthy: Runner Docker Image: synthetics-minion-runner:3.0.28 - Container System Driver (with KubernetesClient) failed completing the health check Job, as it returned with status failed., timestamp=2020-11-25T00:00:59.202Z}'
2020-11-25 00:00:59,887 -
2020-11-25 00:00:59,887 - **************************************************************************************************
2020-11-25 00:00:59,887 - ***                       MINION FAILED FIRST HEALTH CHECK (see above)                         ***
2020-11-25 00:00:59,887 - **************************************************************************************************
2020-11-25 00:00:59,887 -
2020-11-25 00:00:59,887 - *** QUITTING FORCEFULLY ***

      Exit Code:  1
      Started:    Tue, 24 Nov 2020 18:00:46 -0600
      Finished:   Tue, 24 Nov 2020 18:01:00 -0600
    Last State:   Terminated
      Reason:     Error
      Message:    .fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:568)
! at io.fabric8.kubernetes.client.dsl.base.OperationSupport.assertResponseCode(OperationSupport.java:507)
! at io.fabric8.kubernetes.client.dsl.internal.core.v1.PodOperationsImpl.doGetLog(PodOperationsImpl.java:186)
! at io.fabric8.kubernetes.client.dsl.internal.core.v1.PodOperationsImpl.getLog(PodOperationsImpl.java:195)
! at io.fabric8.kubernetes.client.dsl.internal.core.v1.PodOperationsImpl.getLog(PodOperationsImpl.java:83)
! at com.newrelic.synthetics.minion.containersystemdriver.kubernetes.KubernetesHelper.getPodContainerLogs(KubernetesHelper.java:86)
! ... 22 common frames omitted

2020-11-24 23:55:34,430 - The node where job synthetic-healthcheck-7702253108e349abac33e5d9ef95044a was scheduled did not have enough available resources
2020-11-24 23:55:34,430 - Health check job synthetic-healthcheck-7702253108e349abac33e5d9ef95044a exited with 126
2020-11-24 23:55:34,430 - Runner Docker Image: synthetics-minion-runner:3.0.28 - Container System Driver (with KubernetesClient) failed completing the health check Job, as it returned with status failed.
2020-11-24 23:55:35,412 - Service/dependency 'ContainerSystemDriverHealthCheck' is not healthy 'Result{isHealthy=false, message=ContainerSystemDriver is not healthy: Runner Docker Image: synthetics-minion-runner:3.0.28 - Container System Driver (with KubernetesClient) failed completing the health check Job, as it returned with status failed., timestamp=2020-11-24T23:55:34.444Z}'
2020-11-24 23:55:35,413 -
2020-11-24 23:55:35,413 - **************************************************************************************************
2020-11-24 23:55:35,413 - ***                       MINION FAILED FIRST HEALTH CHECK (see above)                         ***
2020-11-24 23:55:35,413 - **************************************************************************************************
2020-11-24 23:55:35,413 -
2020-11-24 23:55:35,413 - *** QUITTING FORCEFULLY ***

      Exit Code:    1
      Started:      Tue, 24 Nov 2020 17:55:22 -0600
      Finished:     Tue, 24 Nov 2020 17:55:35 -0600
    Ready:          False
    Restart Count:  12
    Limits:
      cpu:     750m
      memory:  1717986918400m
    Requests:
      cpu:     500m
      memory:  800Mi
    Liveness:  http-get http://:http/status/check delay=600s timeout=60s period=300s #success=1 #failure=3
    Environment:
      MINION_POD_NAME:              newrelic-synthetics-minion-2 (v1:metadata.name)
      MINION_POD_NAMESPACE:         monitoring (v1:metadata.namespace)
      MINION_PRIVATE_LOCATION_KEY:  NRSP-usXXXXXXX
      MINION_LOG_LEVEL:             INFO
      MINION_HEAVY_WORKERS:         2
      MINION_LIGHTWEIGHT_WORKERS:   50
    Mounts:
      /tmp from minion-volume (rw,path="newrelic-synthetics-minion/tmp")
      /var/run/secrets/kubernetes.io/serviceaccount from newrelic-synthetics-minion-token-xppxf (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  minion-volume:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  minion-volume-newrelic-synthetics-minion-2
    ReadOnly:   false
  newrelic-synthetics-minion-token-xppxf:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  newrelic-synthetics-minion-token-xppxf
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  beta.kubernetes.io/os=linux
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason                  Age                    From                                          Message
  ----     ------                  ----                   ----                                          -------
  Warning  FailedScheduling        <unknown>              default-scheduler                             error while running "VolumeBinding" filter plugin for pod "newrelic-synthetics-minion-2": pod has unbound immediate PersistentVolumeClaims
  Warning  FailedScheduling        <unknown>              default-scheduler                             error while running "VolumeBinding" filter plugin for pod "newrelic-synthetics-minion-2": pod has unbound immediate PersistentVolumeClaims
  Normal   Scheduled               <unknown>              default-scheduler                             Successfully assigned monitoring/newrelic-synthetics-minion-2 to aks-defaultpool-33270332-vmss000015
  Normal   NotTriggerScaleUp       40m                    cluster-autoscaler                            pod didn't trigger scale-up (it wouldn't fit if a new node is added):
  Normal   SuccessfulAttachVolume  39m                    attachdetach-controller                       AttachVolume.Attach succeeded for volume "pvc-cd859e31-7afe-47d0-8f3c-e5c19e355bd8"
  Normal   Pulling                 39m                    kubelet, aks-defaultpool-33270332-vmss000015  Pulling image "quay.io/newrelic/synthetics-minion:3.0.28"
  Normal   Pulled                  39m                    kubelet, aks-defaultpool-33270332-vmss000015  Successfully pulled image "quay.io/newrelic/synthetics-minion:3.0.28"
  Normal   Created                 39m                    kubelet, aks-defaultpool-33270332-vmss000015  Created container update-mounted-subpath-permissions
  Normal   Started                 39m                    kubelet, aks-defaultpool-33270332-vmss000015  Started container update-mounted-subpath-permissions
  Normal   Pulled                  37m (x5 over 39m)      kubelet, aks-defaultpool-33270332-vmss000015  Container image "quay.io/newrelic/synthetics-minion:3.0.28" already present on machine
  Normal   Created                 37m (x5 over 39m)      kubelet, aks-defaultpool-33270332-vmss000015  Created container synthetics-minion
  Normal   Started                 37m (x5 over 39m)      kubelet, aks-defaultpool-33270332-vmss000015  Started container synthetics-minion
  Warning  BackOff                 4m31s (x150 over 38m)  kubelet, aks-defaultpool-33270332-vmss000015  Back-off restarting failed container

What you expected to happen?

I expected the minion pod not to crash.

How to reproduce it?

I would say: spin up a node that is close to its resource capacity (but with just enough room for this pod to fit), then schedule the minion onto it. We're running in AKS. A sketch of this reproduction is below.
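A minimal sketch of that reproduction, assuming a do-nothing "ballast" pod is acceptable for the test (all names and sizes here are hypothetical; tune the requests to the node's allocatable capacity): pin the ballast to the target node so the minion itself still fits, but the health-check jobs it spawns cannot get the resources they request.

# Hypothetical ballast pod; pins to one node and reserves resources only.
apiVersion: v1
kind: Pod
metadata:
  name: ballast                                   # hypothetical name
spec:
  nodeName: aks-defaultpool-33270332-vmss000015   # the node under test
  containers:
    - name: pause
      image: k8s.gcr.io/pause:3.2                 # does no work; only holds the requests
      resources:
        requests:
          cpu: "3"                                # tune so the node sits near its CPU capacity
          memory: 12Gi

With the ballast applied, the minion's health-check jobs should fail as in the log above ("did not have enough available resources", exit code 126).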

Anything else we need to know?
Initially I doubled the CPU and memory that the minion requested, and its limits. That didn't correct the issue: the minion appears to spin up health-check jobs that request a similar amount of resources, so increasing the requests only made everything harder to fit on a node.

Then I set the replica count to 3 (thinking it might spread the load). The minion runs happily on the less utilized node (004), but it is crashlooping on the two nodes that are close to capacity. A sketch of these value overrides follows the node output below.

 kubectl describe node aks-defaultpool-33270332-vmss000004 | grep Allocated -A12
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource                       Requests          Limits
  --------                       --------          ------
  cpu                            2920m (75%)       9600m (248%)
  memory                         7342450944 (29%)  18432375910400m (73%)
  ephemeral-storage              0 (0%)            0 (0%)
  attachable-volumes-azure-disk  0                 0
Events:                          <none>
PS C:\Users\David.Bollman> kubectl describe node aks-defaultpool-33270332-vmss000012 | grep Allocated -A12
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource                       Requests           Limits
  --------                       --------           ------
  cpu                            3431m (88%)        15850m (410%)
  memory                         10053780608 (39%)  27078187110400m (107%)
  ephemeral-storage              0 (0%)             0 (0%)
  attachable-volumes-azure-disk  0                  0
Events:                          <none>
PS C:\Users\David.Bollman> kubectl describe node aks-defaultpool-33270332-vmss000015 | grep Allocated -A7
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource                       Requests           Limits
  --------                       --------           ------
  cpu                            3559m (92%)        23500m (608%)
  memory                         17081043584 (67%)  45003221606400m (178%)
  ephemeral-storage              0 (0%)             0 (0%)
  attachable-volumes-azure-disk  0                  0
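For reference, the overrides described above were roughly of the following shape. This is a sketch only: the key names assume the synthetics-minion chart follows the common replicaCount/resources convention, so verify them against the chart's own values.yaml.

# your-custom-values.yaml (key names assumed; verify against the chart)
replicaCount: 3            # spread minions across nodes
resources:
  requests:
    cpu: 500m
    memory: 800Mi
  limits:
    cpu: 750m
    memory: 1638Mi         # ~1.6Gi, matching the limit shown in the pod describe above

The odd-looking 1717986918400m in the describe output is Kubernetes quantity notation in thousandths of a byte; it works out to roughly 1.6Gi.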

[newrelic-logging] Outdated Fluent Bit version

Is your feature request related to a problem? Please describe.

The current latest version of the helm chart still ships Fluent Bit v1.0.3, which lacks many newer features such as multiline support in Docker_mode.

Describe the solution you'd like

Bump Fluent Bit to a newer version and keep the dependency updated over time to ensure compatibility.

Describe alternatives you've considered

Running my own image.
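If you do end up running your own image, the override might look roughly like this in the bundle's values file. This is a sketch only: it assumes the newrelic-logging chart exposes the usual image.repository and image.tag keys, so check the chart's values.yaml for the exact names.

newrelic-logging:
  enabled: true
  image:
    repository: your-registry/custom-fluent-bit-output   # hypothetical custom image
    tag: "1.6.0"                                         # a newer Fluent Bit base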

Additional context

https://docs.fluentbit.io/manual/pipeline/inputs/tail#docker_mode
