Prometheus Operator creates/configures/manages Prometheus clusters atop Kubernetes

Home Page: https://prometheus-operator.dev

License: Apache License 2.0


prometheus-operator's Introduction

Prometheus Operator


Overview

The Prometheus Operator provides Kubernetes native deployment and management of Prometheus and related monitoring components. The purpose of this project is to simplify and automate the configuration of a Prometheus based monitoring stack for Kubernetes clusters.

The Prometheus operator includes, but is not limited to, the following features:

  • Kubernetes Custom Resources: Use Kubernetes custom resources to deploy and manage Prometheus, Alertmanager, and related components.

  • Simplified Deployment Configuration: Configure the fundamentals of Prometheus like versions, persistence, retention policies, and replicas from a native Kubernetes resource.

  • Prometheus Target Configuration: Automatically generate monitoring target configurations based on familiar Kubernetes label queries; no need to learn a Prometheus specific configuration language.

For an introduction to the Prometheus Operator, see the getting started guide.
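For example, a minimal Prometheus resource covering fundamentals like version, replicas, and retention might look like the following sketch (values are illustrative; adjust them to your environment):

apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: example
spec:
  version: v2.45.0           # Prometheus version to run (illustrative)
  replicas: 2                # number of Prometheus instances
  retention: 15d             # how long to keep metrics
  serviceMonitorSelector:
    matchLabels:
      team: frontend         # pick up matching ServiceMonitors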

Project Status

The operator itself is considered production-ready. Please refer to the Custom Resource Definition (CRD) versions for the status of each CRD:

  • monitoring.coreos.com/v1: stable CRDs and API, changes are made in a backward-compatible way.
  • monitoring.coreos.com/v1beta1: unstable CRDs and API, changes may happen but the team works to avoid them. We encourage production usage for users who accept the risk of breaking changes.
  • monitoring.coreos.com/v1alpha1: unstable CRDs and API, changes can happen frequently, and we suggest avoiding their usage in mission-critical environments.

Prometheus Operator vs. kube-prometheus vs. community helm chart

Prometheus Operator

The Prometheus Operator uses Kubernetes custom resources to simplify the deployment and configuration of Prometheus, Alertmanager, and related monitoring components.

kube-prometheus

kube-prometheus provides example configurations for a complete cluster monitoring stack based on Prometheus and the Prometheus Operator. This includes deployment of multiple Prometheus and Alertmanager instances, metrics exporters such as the node_exporter for gathering node metrics, scrape target configuration linking Prometheus to various metrics endpoints, and example alerting rules for notification of potential issues in the cluster.

helm chart

The prometheus-community/kube-prometheus-stack helm chart provides a similar feature set to kube-prometheus. This chart is maintained by the Prometheus community. For more information, please see the chart's readme.

Prerequisites

Version >=0.39.0 of the Prometheus Operator requires a Kubernetes cluster of version >=1.16.0. If you are just starting out with the Prometheus Operator, it is highly recommended to use the latest version.

If you have an older version of Kubernetes and the Prometheus Operator running, we recommend upgrading Kubernetes first and then the Prometheus Operator.

CustomResourceDefinitions

A core feature of the Prometheus Operator is to monitor the Kubernetes API server for changes to specific objects and ensure that the current Prometheus deployments match these objects. The Operator acts on the following Custom Resource Definitions (CRDs):

  • Prometheus, which defines a desired Prometheus deployment.

  • PrometheusAgent, which defines a desired Prometheus deployment, but running in Agent mode.

  • Alertmanager, which defines a desired Alertmanager deployment.

  • ThanosRuler, which defines a desired Thanos Ruler deployment.

  • ServiceMonitor, which declaratively specifies how groups of Kubernetes services should be monitored. The Operator automatically generates Prometheus scrape configuration based on the current state of the objects in the API server.

  • PodMonitor, which declaratively specifies how groups of pods should be monitored. The Operator automatically generates Prometheus scrape configuration based on the current state of the objects in the API server.

  • Probe, which declaratively specifies how groups of ingresses or static targets should be monitored. The Operator automatically generates Prometheus scrape configuration based on the definition.

  • ScrapeConfig, which declaratively specifies scrape configurations to be added to Prometheus. This CustomResourceDefinition helps with scraping resources outside the Kubernetes cluster.

  • PrometheusRule, which defines a desired set of Prometheus alerting and/or recording rules. The Operator generates a rule file, which can be used by Prometheus instances.

  • AlertmanagerConfig, which declaratively specifies subsections of the Alertmanager configuration, allowing routing of alerts to custom receivers, and setting inhibit rules.

The Prometheus operator automatically detects changes in the Kubernetes API server to any of the above objects, and ensures that matching deployments and configurations are kept in sync.
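To illustrate how these resources fit together, a ServiceMonitor that selects services by label might look like this sketch (the app label and port name are assumptions):

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-app
  labels:
    team: frontend           # matched by a Prometheus serviceMonitorSelector
spec:
  selector:
    matchLabels:
      app: example-app       # services carrying this label are scraped
  endpoints:
  - port: web                # named port on the selected services
    interval: 30s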

To learn more about the CRDs introduced by the Prometheus Operator, have a look at the design page.

Dynamic Admission Control

To prevent invalid Prometheus alerting and recording rules from causing failures in a deployed Prometheus instance, an admission webhook is provided to validate PrometheusRule resources upon initial creation or update.
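For context, the webhook validates resources such as the following PrometheusRule (a minimal sketch; the alert name and expression are illustrative):

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: example-rules
spec:
  groups:
  - name: example.rules
    rules:
    - alert: HighErrorRate                                  # illustrative alert
      expr: rate(http_requests_total{code="500"}[5m]) > 1   # must parse as valid PromQL
      for: 10m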

For more information on this feature, see the user guide.

Quickstart

Note: this quickstart does not provision an entire monitoring stack; if that is what you are looking for, see the kube-prometheus project. If you want the whole stack, but have already applied the bundle.yaml, delete the bundle first (kubectl delete -f bundle.yaml).

To quickly try out just the Prometheus Operator inside a cluster, choose a release and run the following command:

kubectl create -f bundle.yaml

Note: make sure to adapt the namespace in the ClusterRoleBinding if deploying in a namespace other than the default namespace.
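For instance, if the operator is deployed into a namespace called monitoring, the ServiceAccount subject of the ClusterRoleBinding in bundle.yaml would need to point there (a sketch of the relevant fragment; the namespace value is an assumption):

subjects:
- kind: ServiceAccount
  name: prometheus-operator
  namespace: monitoring      # must match the namespace the operator is deployed into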

To run the Operator outside of a cluster:

make
scripts/run-external.sh <kubectl cluster name>

Removal

To remove the operator and Prometheus, first delete any custom resources you created in each namespace. The operator will automatically shut down and remove Prometheus and Alertmanager pods, and associated ConfigMaps.

for n in $(kubectl get namespaces -o jsonpath={..metadata.name}); do
  kubectl delete --all --namespace=$n prometheus,servicemonitor,podmonitor,alertmanager
done

After a couple of minutes you can go ahead and remove the operator itself.

kubectl delete -f bundle.yaml

The operator automatically creates services in each namespace where you created Prometheus or Alertmanager resources, and defines five custom resource definitions. You can clean these up now.

for n in $(kubectl get namespaces -o jsonpath={..metadata.name}); do
  kubectl delete --ignore-not-found --namespace=$n service prometheus-operated alertmanager-operated
done

kubectl delete --ignore-not-found customresourcedefinitions \
  prometheuses.monitoring.coreos.com \
  servicemonitors.monitoring.coreos.com \
  podmonitors.monitoring.coreos.com \
  alertmanagers.monitoring.coreos.com \
  prometheusrules.monitoring.coreos.com

Testing

See TESTING.

Contributing

See CONTRIBUTING.

Security

If you find a security vulnerability related to the Prometheus Operator, please do not report it by opening a GitHub issue; instead, send an e-mail to the maintainers of the project listed in the MAINTAINERS.md file.

Troubleshooting

Check the troubleshooting documentation for common issues and frequently asked questions (FAQ).

Acknowledgements

The prometheus-operator organization logo was created and contributed by Bianca Cheng Costanzo.


prometheus-operator's Issues

Add Grafana TPR

With Prometheus deployments becoming more dynamic, we need to catch up with deploying Grafana, its dashboards and datasources in a similarly dynamic and easy way.

@alexsomesan @fabxc

Goes with the discussion in #14 and #98.

Support nodeSelector for Prometheus TPR

I'd like to control which nodes my Prometheus instance will be scheduled onto. But, it seems that nodeSelector is not available in the Prometheus TPR, which, I assume, would be the correct place to specify it.
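A sketch of what this could look like on the Prometheus TPR, mirroring the nodeSelector field of the core Pod spec (hypothetical at the time of this issue):

apiVersion: monitoring.coreos.com/v1alpha1
kind: Prometheus
metadata:
  name: prometheus-k8s
spec:
  version: v1.4.1
  nodeSelector:              # proposed field: schedule only onto matching nodes
    role: monitoring         # illustrative node label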

Unable to connect to web UI for Prometheus after creating service

I'm running minikube v0.14.0 and was following this tutorial here: https://coreos.com/blog/the-prometheus-operator.html, but after creating the prometheus TPR and service using kubectl create -f https://coreos.com/operators/prometheus/latest/prometheus-k8s.yaml in the Managed Deployments step, I was unable to connect to the web UI. Running minikube service prometheus-k8s simply continuously outputs Waiting, endpoint for service is not ready yet before erroring out.

Any assistance would be greatly appreciated! Thank you.

e2e test operator container image

It seems like the default process should be to test the operator image that would result from the current repository HEAD. Currently it may be confusing to run make e2e and have the :latest tagged image tested by default.

@brancz

syncVersion "deadlocks" control loop

This was almost clear when implementing it, but the Prometheus operator gets stuck when syncVersion gets stuck, e.g. because there are insufficient scheduling resources.
The work queue won't be processed any longer, and hence even downscaling the replicas won't have any further effect.

To revisit: we only have to do this in the first place because PetSets do not automatically manage template updates for us, which they are intended to do in the future.

prometheus-operator Pod Error

$ kubectl apply -f manifests/prometheus-operator.yaml
$ kubectl get pod
prometheus-operator-1805596728-4ehlz 0/1 CrashLoopBackOff 20 1h
$ kubectl logs prometheus-operator-1805596728-4ehlz
communicating with server failed: Get https://10.254.0.1:443/version: x509: cannot validate certificate for 10.254.0.1 because it doesn't contain any IP SANs


Why can't the certificate be verified?
How do I create a correct certificate?
My commands:
$ openssl genrsa -out ca.key 2048
$ openssl req -x509 -new -nodes -key ca.key -subj "/CN=10.254.0.1" -days 5000 -out ca.crt
$ openssl genrsa -out server.key 2048
$ openssl req -new -key server.key -subj "/CN=10.254.0.1" -out server.csr
$ openssl x509 -req -in server.csr -CA ca.crt -CAkey ca.key -CAcreateserial -out server.crt -days 5000


Is the prometheus-operator container supposed to connect to 10.254.0.1?
How did it get that address?

Thanks!

Allow subPath configuration for Prometheus storage VolumeMount

Hi,

I'm getting this error:

Error opening memory series storage: could not detect storage version on disk, assuming version 0, need version 1 - please wipe storage or run a version of Prometheus compatible with storage version 0" source="main.go:181

when trying to mount a GlusterFS PVC (created by a StorageClass). The cause of the problem seems to be this issue: prometheus/prometheus#953 - meaning that Prometheus needs an empty directory when initializing data. The problem is that GlusterFS contains a .trashcan directory in the volume root by default.

I suppose the easiest fix would be to allow a subPath configuration in the StorageSpec config specification, or, even easier, to store the data in a fixed subfolder by default.
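For reference, the underlying Kubernetes mechanism this request maps to is a volumeMount with subPath, which mounts a subdirectory of the volume instead of its root (a sketch; names are illustrative):

containers:
- name: prometheus
  volumeMounts:
  - name: prometheus-data
    mountPath: /var/prometheus/data
    subPath: prometheus-data # mounts <volume>/prometheus-data, avoiding the
                             # .trashcan directory in the GlusterFS volume root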

Invalid label name

I'm seeing what I assume to be an incompatibility between k8s label syntax and prometheus label syntax.

time="2016-12-29T22:24:07Z" level=info msg="Starting prometheus (version=1.3.0, branch=master, revision=18254a172b1e981ed593442b2259bd63617d6aca)" source="main.go:75"
time="2016-12-29T22:24:07Z" level=info msg="Build context (go=go1.7.3, user=root@d363f050a0e0, date=20161101-17:06:27)" source="main.go:76"
time="2016-12-29T22:24:07Z" level=info msg="Loading configuration file /etc/prometheus/config/prometheus.yaml" source="main.go:247"
time="2016-12-29T22:24:07Z" level=error msg="Error loading config: couldn't load configuration (-config.file=/etc/prometheus/config/prometheus.yaml): \"__meta_kubernetes_service_label_k8s-app
\" is not a valid label name" source="main.go:149"

Seems reinforced by this: prometheus/prometheus#1178

Not sure if there is scope for working around this by escaping the - character (and others) somehow in the prometheus operator, or if this is just something to live with for now.

use client-go workqueue

The latest client-go ships the workqueue used by upstream controllers. It can be found in

pkg/util/workqueue

The current implementation of a similar workqueue, which was built before this was available, should be replaced. Also, this allows us to make use of the more advanced features that workqueue already implements, such as rate limiting.

@alexsomesan @fabxc

Config reload not working

When using external URLs the reload endpoint may differ as it can be prefixed; therefore the reload endpoints are not as expected.

@alexsomesan @fabxc

Fix is on the way, but needs testing.

ServiceMonitors unable to work

Using the latest v2.2 of the Prometheus Operator and following the instructions here https://coreos.com/blog/the-prometheus-operator.html, I'm unable to get any Targets from the ServiceMonitors displaying in the Prometheus UI. None of the logs for pods etc. seem to show any errors or unusual activity; simply that none of the ServiceMonitor targets are showing up on the Targets page.

Interestingly enough, I did not encounter this problem in v0.10 of the Prometheus Operator, only in subsequent versions up to the latest (v2.2).

Any help would be greatly appreciated! Please do let me know as well if there are any logs that would be helpful.

ConfigMap update is reset

Hi,

If I update the ConfigMap with kubectl replace, it works. After some time (minutes) the ConfigMap is reset to the factory state.

Is updating the ConfigMap not supported? Without the correct config I cannot monitor Kubernetes (and the factory ConfigMap does not work).

regards f0

Add node/kubelet monitoring

I was following the kube-prometheus guide but it seems like the operator overwrites the initial config map with the configuration for the ServiceMonitors and as a result the kubelet monitoring is lost.

The node-exporters and kube-state-metrics (also deleted by the operator) can be monitored again using ServiceMonitors, but as far as I can tell the kubelets can't, since they use the node role.
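For reference, kubelet monitoring relies on a raw scrape configuration using the node role, which ServiceMonitors cannot express (a minimal sketch):

scrape_configs:
- job_name: kubelets
  kubernetes_sd_configs:
  - role: node               # discovers one target per cluster node (the kubelet)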

Less colliding governing service name

To avoid breaking Prometheus installations already present in a cluster, we should create the governing Services for the StatefulSets with more unique names than alertmanager or prometheus, which are likely the names of the Services in place today.

I propose naming them prometheus-operated and alertmanager-operated.

@alexsomesan @fabxc

The content of the original blog post is out of date

In https://coreos.com/blog/the-prometheus-operator.html, under the "Service Monitoring" heading, the example prometheus spec uses serviceMonitors as a key instead of the actual serviceMonitorSelector key. In testing out a setup in my own environment, I based my prometheus.yaml on this key and it caused a hell of a lot of hair pulling.

The documentation here in the repo is nice and clear, and being alpha you've certainly mentioned anything is subject to change, but just in case people's entry to this pretty rad idea is through that post (as was mine), you might want to save them some frustration.

Multiple prometheus objects in the same namespace get added to the same default 'prometheus' service

Prometheus Operator seems to create default service objects named prometheus and alertmanager. If you create more than one prometheus object in the same namespace (e.g. frontend and backend, both in the monitoring namespace), then both get added to the same default service named prometheus, as seen in the related endpoints object. (It looks like the same would happen if for some reason you wanted more than one alertmanager object in the same namespace.)

This is confusing and I imagine not what was intended. (Or maybe there's a purpose I'm missing?)

I'm unclear as to the purpose of these default service objects. I'm creating my own as I need to control various aspects including annotations.

P.S. Hope I'm not creating too many issues, love what you guys are doing, hoping to help!

Prometheus and Alertmanager objects cannot have the same name

I was thinking it'd make sense to have a Prometheus, Alertmanager, (not AlertManager?) and ServiceMonitor object definition each named frontend, the three being related, yet already differentiated by their kind. However, since the Operator looks for ConfigMaps named <(prometheus name|alertmanager name)>, I assume it would break.

What about an option to specify the configMap's name/label?

Use ownerReferences

We should set ownerReferences in the ObjectMeta of controller-created objects.
This will also be necessary to not delete user-created ConfigMaps for rules and configuration on teardown.

Namespace constraints

The Prometheus TPR includes ServiceMonitors via label selection. It only selects ServiceMonitors that are in the same namespace as the Prometheus TPR. That is aligned with semantics of other Kubernetes resources and there seems little reason to diverge from that.

ServiceMonitors select Services by label selection. These label selections work across all namespaces. There are very valid use cases for this, such as meta-monitoring. In the general case, however, namespaces are expected to provide isolation, and I'd expect my setup not to be negatively affected by actions of other people in their own namespaces.

It seems like we have three use cases:

  1. Consider services from same namespace as ServiceMonitor (and transitively Prometheus)
  2. Consider services from a subset of namespaces, e.g. a prod and dev namespace
  3. Consider services from all namespaces, e.g. meta-monitoring

Currently we only have 3.

How do we want to expose the other options to users? Do we select multiple namespaces by simply providing a list of them or do we also allow selecting them by label, e.g. so that I can select all namespaces belonging to my team?
Label selectors are always more dynamic, but they come with a cost of complexity and should be compared to their likely real-world benefit. For example, Alertmanager services are added to Prometheus TPRs by simple namespace/name references rather than label selection.

I think selecting all namespaces by default, as it is now, is potentially dangerous and should only be done explicitly. My current idea looks as follows:

apiVersion: monitoring.coreos.com/v1alpha1
kind: ServiceMonitor
metadata:
  name: frontend
  labels:
    tier: frontend
spec:
  # Simple list of namespaces to select services from.
  namespaces: <string array>
  # Explicit enabling of all namespaces.
  allNamespace: <bool>
  # Dynamic selection could be added later relatively seamlessly
  # if there's an actual need?
  namespaceSelector: <label selector>
  # The above are mutually exclusive. If none is provided, default to just selecting within
  # ServiceMonitors namespace?
  
  # Should this be `serviceSelector` then for explicitness?
  # Other Kubernetes resources stick with just `selector` where reasonable.
  selector:
    # label selector of services
  endpoints:
    # ...

This is merely one idea. Would be great if anyone can think of something smarter and can verify what I said against their real-world experience.

@matthiasr @grobie @brancz @brian-brazil

panic when no resources are set

When no resources are set when creating a Prometheus TPR, the operator panics because it tries to set a resource request on a nil map.

Logs:

panic: assignment to entry in nil map

goroutine 21 [running]:
panic(0xea0f00, 0xc42046e480)
        /usr/local/Cellar/go/1.7.1/libexec/src/runtime/panic.go:500 +0x1a1
github.com/coreos/prometheus-operator/pkg/operator.makePetSet(0xc42042c4f0, 0xa, 0xc4203ae8c0, 0x1e, 0xc42042c500, 0xf, 0x0, 0x0, 0xc42042c510, 0xa, ...)
        /Users/fredericbranczyk/go/src/github.com/coreos/prometheus-operator/pkg/operator/petset.go:46 +0x8ec
github.com/coreos/prometheus-operator/pkg/operator.(*Operator).reconcile(0xc4200855f0, 0xc4200576d0, 0xc420027f90, 0x1058b01)
        /Users/fredericbranczyk/go/src/github.com/coreos/prometheus-operator/pkg/operator/operator.go:421 +0xfcc
github.com/coreos/prometheus-operator/pkg/operator.(*Operator).worker(0xc4200855f0)
        /Users/fredericbranczyk/go/src/github.com/coreos/prometheus-operator/pkg/operator/operator.go:286 +0x82
created by github.com/coreos/prometheus-operator/pkg/operator.(*Operator).Run
        /Users/fredericbranczyk/go/src/github.com/coreos/prometheus-operator/pkg/operator/operator.go:103 +0xa0

Should be a simple fix: just instantiate the map if it is nil.

@fabxc

Simplify monitor selector

Currently we allow including ServiceMonitors by a list of label selectors (kind of how we allow lists of everything in the Prometheus config). Example:

apiVersion: monitoring.coreos.com/v1alpha1
kind: Prometheus
metadata:
  name: prometheus-frontend
  labels:
    prometheus: frontend
spec:
  version: v1.3.1
  serviceMonitors:
  - selector:
      matchLabels:
        tier: frontend

While the most flexible approach, it adds complexity and thus goes against the main goal of the operator. It also does not seem aligned with selector occurrences elsewhere in Kubernetes.

I think 99% of use cases will be covered by

  • a simple single-label inclusion like team=X or tier=Y
  • include all ServiceMonitors in this namespace

The first would be sufficiently covered by a single label selector like:

apiVersion: monitoring.coreos.com/v1alpha1
kind: Prometheus
metadata:
  name: prometheus-frontend
  labels:
    prometheus: frontend
spec:
  version: v1.3.0
  serviceMonitors:
    matchLabels:
      tier: frontend

I'd imagine for any use case that could not be covered by label selectors, one would just expect the ServiceMonitors labelling scheme to be adjusted to do so. Monitoring setups generally seem static enough to do it this way.

This of course only works if we are positive that we won't need any additional options on how to include service monitors. (Like overwriting some of the attributes defined in there, which I don't expect, as people should own both resources.)

For the second, do we need an explicit option for that? Do we just expect labelling to solve that (exists-relation can include monitors just based on a standard label existing, regardless of value)?
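An exists-relation using the standard label selector syntax would look like this sketch (the label key is an assumption):

serviceMonitors:
  matchExpressions:
  - key: monitored           # illustrative standard label
    operator: Exists         # matches monitors carrying the label, regardless of value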

Thoughts?

@brancz @brian-brazil @grobie @matthiasr

set omitempty accordingly

Many fields on the TPR are optional but have not been marked with omitempty; therefore, when requesting a resource, a lot of bloat is returned even for fields that are actually blank. omitempty should be set on all optional fields.

@alexsomesan @fabxc

Revisit operator-defined labels

Currently the operator defines various labels on components it creates.
These labels are all namespaced as prometheus.coreos.com/. Based on how other controllers work, the namespacing generally seems only to be used in annotations and is rather clumsy for labels.

Revisit dropping the namespace and replacing it with simple labels.
The namespace will be invalid anyway per #10.

Pass Prometheus object's labels through to StatefulSets? (and thus Pods)

Currently Prometheus Operator provides no way to add custom labels to pods created by StatefulSets (created by Prometheuses). StatefulSets already pass their metadata labels through to the pods they create, so Prometheus Operator would only need to read a Prometheus object's labels and pass them through to the StatefulSets it creates.

I believe this is the relevant code.

Also, it wasn't clear to me in the docs that the Operator forces the label prometheus: <prometheus object name>, which caused me some confusion for a time until I read the code.

Issue creating "prometheus" container on cluster

Hi,

I was trying out the new v0.2.0 release of the Prometheus Operator on my minikube cluster this morning with kubernetes v1.5 and everything worked beautifully. Thank you guys for all the work! However, when I tried the same on my GCE cluster, I got an error when trying to create the StatefulSet in charge of creating the prometheus pods.

My events log for the StatefulSet pod is as follows:

pet: prometheus-k8s-0

Successfully assigned prometheus-k8s-0 to <redacted>

Container image "quay.io/prometheus/prometheus:v1.4.1" already present on machine

warning
Failed to create docker container "prometheus" of pod "prometheus-k8s-0_default(<redacted>)" with error: Error response from daemon: no such file or directory

pulling image "jimmidyson/configmap-reload"

Successfully pulled image "jimmidyson/configmap-reload"

Created container with docker id <redacted>; Security:[seccomp=unconfined]

Started container with docker id <redacted>

Created container with docker id <redacted>; Security:[seccomp=unconfined]

Started container with docker id <redacted>

The configmap-reload container seems to be created fine, but the prometheus container errors out with the message above. As a result, my StatefulSet cannot be created.

The YAML file for my Prometheus TPR is as follows:

apiVersion: monitoring.coreos.com/v1alpha1
kind: Prometheus
metadata:
  name: prometheus-k8s
  labels:
    prometheus: k8s
spec:
  version: v1.4.1
  resources:
    requests:
      memory: 400Mi

In short, I'm able to get past this step on my local minikube environment, but my GCE cluster throws the error above. Any and all assistance would be greatly appreciated! Thank you.

Can't see any targets

Very new to Prometheus, and was following the steps here - https://coreos.com/blog/the-prometheus-operator.html

Shortly after Kubernetes will update the configuration in the Prometheus pod and we can see targets showing up on the "Targets" page.

The targets page is blank for me.

The only change I did to the yaml configs was to add namespace: kube-system to them.

Updates after changed pod template

Currently we just inspect the version suffix of the base image to determine whether all containers are in sync. We can change the entire pod template though, and with the new resource field in the Prometheus TPR we actually do.

So we'd need something like pod_template_hash labels like Deployments use eventually.
It's all still up in the air how the PetSet controller will provide updates later on. If it will also use that label, we will be conflicting again.

Possibly we'd just do an on-the-fly hash for now, or simply track all mutable template fields until it's figured out what the PetSet controller will do.

deployment fails with: no kind "Prometheus" is registered for version "monitoring.coreos.com/v1alpha1"

Hi, I'm trying to deploy this example, however it fails. I'm manually executing https://github.com/coreos/kube-prometheus/blob/master/hack/cluster-monitoring/deploy

and it dies at this point:

kctl get servicemonitor
No resources found.
kctl get prometheus
No resources found.
kubectl apply -f manifests/prometheus
configmap "prometheus-k8s" configured
configmap "prometheus-k8s-rules" configured
service "prometheus-k8s" configured
error: unable to decode "manifests/prometheus/prometheus-k8s.yaml": no kind "Prometheus" is registered for version "monitoring.coreos.com/v1alpha1"

However I can see the prometheus-operator running

kctl get pod
NAME                                  READY     STATUS    RESTARTS   AGE
grafana-874468113-mwisk               2/2       Running   0          7m
kube-state-metrics-3229993571-lx7t3   1/1       Running   0          31m
node-exporter-bmfqf                   1/1       Running   0          23m
node-exporter-j01fn                   1/1       Running   0          23m
node-exporter-y5w98                   1/1       Running   0          23m
prometheus-operator-479044303-s4sn2   1/1       Running   0          7m

kctl logs prometheus-operator-479044303-s4sn2
ts=2016-12-15T00:11:03Z caller=operator.go:102 component=alertmanageroperator msg="connection established" cluster-version=v1.4.4+coreos.0
ts=2016-12-15T00:11:03Z caller=operator.go:150 component=prometheusoperator msg="connection established" cluster-version=v1.4.4+coreos.0
ts=2016-12-15T00:11:03Z caller=operator.go:476 component=alertmanageroperator msg="TPR created" tpr=alertmanager.monitoring.coreos.com
ts=2016-12-15T00:11:03Z caller=operator.go:653 component=prometheusoperator msg="TPR created" tpr=service-monitor.monitoring.coreos.com
ts=2016-12-15T00:11:03Z caller=operator.go:653 component=prometheusoperator msg="TPR created" tpr=prometheus.monitoring.coreos.com
ts=2016-12-15T00:11:06Z caller=operator.go:116 component=alertmanageroperator msg="TPR API endpoints ready"
ts=2016-12-15T00:11:09Z caller=operator.go:164 component=prometheusoperator msg="TPR API endpoints ready"

I have the following api-versions

kctl api-versions
apps/v1alpha1
authentication.k8s.io/v1beta1
authorization.k8s.io/v1beta1
autoscaling/v1
batch/v1
batch/v2alpha1
certificates.k8s.io/v1alpha1
extensions/v1beta1
monitoring.coreos.com/v1alpha1
policy/v1alpha1
rbac.authorization.k8s.io/v1alpha1
storage.k8s.io/v1beta1
v1

What else am I missing?

Do I need to apply any other kind of configuration?

Thanks!

Document Google analytics usage

I know a couple of $companies with very strict policies when it comes to infrastructure talking to the internet. The Google Analytics usage should be clearly documented for that reason.

`kubectl describe <TPR>` doesn't work

Using Prometheus Operator v0.2.2:

david@machine:~/prometheus$ kubectl get servicemonitors,alertmanagers,prometheuses --all-namespaces
NAMESPACE   NAME                                    KIND
default     servicemonitors/prometheus-front-tier   ServiceMonitor.v1alpha1.monitoring.coreos.com

NAMESPACE   NAME                                    KIND
default     alertmanagers/alertmanager-front-tier   Alertmanager.v1alpha1.monitoring.coreos.com

NAMESPACE   NAME                                 KIND
default     prometheuses/prometheus-front-tier   Prometheus.v1alpha1.monitoring.coreos.com
david@machine:~/prometheus$
david@machine:~/prometheus$ kubectl describe prometheuses/prometheus-front-tier
the provided version "monitoring.coreos.com/v1alpha1" has no relevant versions: group monitoring.coreos.com has not been registered
no matches for monitoring.coreos.com/, Kind=Prometheus
david@machine:~/prometheus$
david@machine:~/prometheus$ kubectl describe alertmanagers/alertmanager-front-tier
the provided version "monitoring.coreos.com/v1alpha1" has no relevant versions: group monitoring.coreos.com has not been registered
no matches for monitoring.coreos.com/, Kind=Alertmanager
david@machine:~/prometheus$
david@machine:~/prometheus$ kubectl describe servicemonitors/prometheus-front-tier
the provided version "monitoring.coreos.com/v1alpha1" has no relevant versions: group monitoring.coreos.com has not been registered
no matches for monitoring.coreos.com/, Kind=ServiceMonitor
david@machine:~/prometheus$
david@machine:~/prometheus$ kubectl version
Client Version: version.Info{Major:"1", Minor:"5", GitVersion:"v1.5.1", GitCommit:"82450d03cb057bab0950214ef122b67c83fb11df", GitTreeState:"clean", BuildDate:"2016-12-14T00:57:05Z", GoVersion:"go1.7.4", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"5", GitVersion:"v1.5.1", GitCommit:"82450d03cb057bab0950214ef122b67c83fb11df", GitTreeState:"clean", BuildDate:"2016-12-14T00:52:01Z", GoVersion:"go1.7.4", Compiler:"gc", Platform:"linux/amd64"}
david@machine:~/prometheus$

Add PodMonitor

We should have a PodMonitor too, using the pod role in the Prometheus config. That solves monitoring pods that run the same application but only partially belong to a service, e.g. quarantined pods.
This should be easy enough and would look exactly like the service monitor, only with different selector semantics:

apiVersion: monitoring.coreos.com/v1alpha1
kind: PodMonitor
metadata:
  name: frontend
  labels:
    tier: frontend
spec:
  selector:
    # label selector as in ServiceMonitor but doesn't select services of which we discover pods
    # but directly selects pods.
  endpoints:
    # similar to ServiceMonitor, just we don't differentiate between target ports and service ports.

This now has to be included into a Prometheus TPR and would happen just like for ServiceMonitors (x-ref: #37).
Now we have two selectors in there (or two lists of selectors) to select different resource types. Another option would be to have a single selector applying to both ServiceMonitors and PodMonitors. A single selector applying to two different resource types is quite implicit and hence not too great. OTOH, as a user I don't really care about what type the monitor is.

Another option would be to consolidate both in a single Monitor.

apiVersion: monitoring.coreos.com/v1alpha1
kind: Monitor
metadata:
  name: frontend
  labels:
    tier: frontend
spec:
  pods:
    # label selector working on pods
  services:
    # label selector working on service endpoints
  endpoints:
    # similar to ServiceMonitor, just we don't differentiate between target ports and service ports.

Seems interesting at first, but I believe it will constrain us in the long run by colliding options that only apply to either, i.e. I'd like to allow specifying service blackbox probing at some point.

Thoughts on any of these options and ideas?

@matthiasr @grobie @brian-brazil @brancz

Documentation of user specific findings/questions

A list for every user of prometheus-operator who misses something from the documentation that should be documented 😉 (If a finding/question is missing, let me know and I'll add it).

  • Preferred way to run exporters on the cluster for deployed applications (which don't have prometheus metrics support) (see #114)
  • Persistence for prometheus collected data (using Kubernetes PVs) (see #170)
  • Exposing prometheus instances to the outside/"public"
  • How to secure/authenticate against exposed Prometheus/Alertmanager
  • Add Grafana to the Prometheus "stack" (see #115)
  • Getting started with Prometheus-Operator
  • Meta Monitoring
  • High availability scheme (#117)
  • RBAC requirements (#153)
  • Network policies (#156)
  • Upgrading cluster monitoring components (+ Effects on the running "system")
  • Checking if prometheus-operator is running correctly
  • alerting

Components Doc

Grafana Watcher

  • What is the use case for it? (Taken from #245)
  • Example usage? (Taken from #245)
  • How must Grafana dashboards be modified? (Taken from #245)

User Guides

  • Cluster Monitoring
  • Application Monitoring
  • Example Exporter Configs/Manifests (this is kube-prometheus, which is planned to be merged with this repository)
  • Pushing (cronjob) metrics to Pushgateway
  • Use Kubernetes Ingress to expose Prometheus

Allow specifying service port for Alertmanager

Currently we just take the first service port in the list to generate Alertmanager URLs to add as Prometheus flags.
The user should be able to specify a port name or plain number as we cannot expect any particular order in the service ports.

Of course we can often make this assumption, especially if the Alertmanagers are run via the operator.
When using service discovery in Prometheus to find AMs, we have no guarantees around order though and don't know which one is the first.
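To illustrate the ambiguity, a Service with multiple named ports gives no guarantee about which port comes first (a sketch; port names and numbers are illustrative):

apiVersion: v1
kind: Service
metadata:
  name: alertmanager-main
spec:
  ports:
  - name: mesh               # cluster gossip port; not what Prometheus should target
    port: 6783
  - name: web                # the port the generated Alertmanager URLs should use
    port: 9093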

@brancz

Enable access via `kubectl proxy`

When the Kubernetes cluster isn't running locally, it's pretty common to need to access it via the apiserver's proxy mechanism; this is particularly useful for "infrastructure" tools, as the authentication mechanism of Kubernetes is applied.

However, the Prometheus UI uses absolute URLs with what I would go so far as to call reckless abandon, and doesn't work at all behind any proxy which doesn't rewrite all requests.

It can be made to work by adding these two arguments to prometheus:

-web.external-url=http://127.0.0.1:8001/api/v1/proxy/namespaces/default/services/prometheus-k8s:web/
-web.route-prefix=/

And then the UI can be reached by running kubectl proxy and browsing to the URL above.

Auto generate docs for API

As the code is always the source of truth and the types published as part of the API should be well commented, we should be able to auto-generate at least the documentation for those objects.

Not a priority, but opening this for tracking.

@alexsomesan @fabxc

Update Grafana data sources

The operator should auto-update the Grafana data sources whenever it deploys a new Prometheus.
We can figure out later how that could possibly be restricted to certain Grafanas if there are multiple ones.

kubectl apply does not work

This may be similar to #91. Currently I have to delete and recreate Prometheus and ServiceMonitor resources with every update to the manifest.

[10:05:04][~]$kc -n monitoring apply -f servicemonitor-node-exporter.yaml
error: unable to decode "servicemonitor-node-exporter.yaml": no kind "ServiceMonitor" is registered for version "monitoring.coreos.com/v1alpha1"

alertmanager matched, that should not match

I spun up a fresh minikube

minikube start --vm-driver="virtualbox" --kubernetes-version="v1.4.3" --memory 2048 --cpus 2

Then built the operator from HEAD.

make container

Edited the deployment.yaml to use the appropriate image tag (in this case: 4f0fe2d) and created the deployment.

kubectl create -f deployment.yaml

Waited for the prometheus TPR to be ready by testing with

kubectl get prometheus

Once the TPRs were ready I created all resources in example/.

kubectl create -f example/

And to my surprise, the Alertmanager instances were discovered correctly, even though the configuration says they are called alertmanager, while the Alertmanager cluster resource name is alertmanager-main.

I'm unsure whether this is a problem in the operator or upstream. I'm actually thinking it might be upstream as this is the generated configuration, which looks like what I expected, but should not work.

alerting:
  alertmanagers:
  - kubernetes_sd_configs:
    - role: endpoints
    relabel_configs:
    - action: keep
      regex: alertmanager
      source_labels:
      - __meta_kubernetes_service_name
    - action: keep
      regex: default
      source_labels:
      - __meta_kubernetes_namespace
    - action: keep
      regex: web
      source_labels:
      - __meta_kubernetes_endpoint_port_name
    scheme: http
global:
  evaluation_interval: 30s
  scrape_interval: 30s
rule_files:
- /etc/prometheus/rules/*.rules
scrape_configs:
- job_name: default/example-app/0
  kubernetes_sd_configs:
  - role: endpoints
  relabel_configs:
  - action: keep
    regex: example-app
    source_labels:
    - __meta_kubernetes_service_label_app
  - action: keep
    regex: default
    source_labels:
    - __meta_kubernetes_namespace
  - action: keep
    regex: web
    source_labels:
    - __meta_kubernetes_endpoint_port_name
  - source_labels:
    - __meta_kubernetes_namespace
    target_label: namespace
  - action: labelmap
    regex: __meta_kubernetes_service_label_(.+)
    replacement: svc_$1
  - action: replace
    replacement: ""
    target_label: __meta_kubernetes_pod_label_pod_template_hash
  - action: labelmap
    regex: __meta_kubernetes_pod_label_(.+)
    replacement: pod_$1
  - replacement: ${1}-web
    source_labels:
    - __meta_kubernetes_service_name
    target_label: job
  scrape_interval: 30s
- job_name: default/node-exporter/0
  kubernetes_sd_configs:
  - role: endpoints
  relabel_configs:
  - action: keep
    regex: node-exporter
    source_labels:
    - __meta_kubernetes_service_label_app
  - action: keep
    regex: default
    source_labels:
    - __meta_kubernetes_namespace
  - action: keep
    regex: scrape
    source_labels:
    - __meta_kubernetes_endpoint_port_name
  - source_labels:
    - __meta_kubernetes_namespace
    target_label: namespace
  - action: labelmap
    regex: __meta_kubernetes_service_label_(.+)
    replacement: svc_$1
  - action: replace
    replacement: ""
    target_label: __meta_kubernetes_pod_label_pod_template_hash
  - action: labelmap
    regex: __meta_kubernetes_pod_label_(.+)
    replacement: pod_$1
  - replacement: ${1}-scrape
    source_labels:
    - __meta_kubernetes_service_name
    target_label: job

@fabxc
