
servicemeshinterface / smi-metrics

25 stars · 17 forks · 145 KB

Expose SMI Metrics

Home Page: https://smi-spec.io

License: Apache License 2.0

Dockerfile 0.52% Makefile 3.30% Shell 1.57% Go 92.43% Starlark 0.38% HTML 1.81%
Topics: cncf, servicemesh

smi-metrics's People

Contributors

adleong, bridgetkromhout, daxmc99, grampelberg, idvoretskyi, ihcsim, lachie83, nojnhuh, pothulapati, slack, stefanprodan


smi-metrics's Issues

Negative Values in the latency metrics.

When there are no requests for a particular component, the default latency values returned are negative, as shown here, which is undesirable.

Would zero be a better default?
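One possible fix (a hypothetical helper, not current smi-metrics code) is to clamp negative readings to zero before they are reported:

```go
package main

import "fmt"

// clampLatency treats negative latency readings (what the query returns
// when there is no traffic) as zero. Hypothetical helper, not part of
// the smi-metrics codebase.
func clampLatency(ms float64) float64 {
	if ms < 0 {
		return 0
	}
	return ms
}

func main() {
	fmt.Println(clampLatency(-1.5)) // no traffic: report 0 instead of a negative value
	fmt.Println(clampLatency(42.0)) // real readings pass through unchanged
}
```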

More documentation on the SMI APIService

Right now, the spec defines a CRD, but we only use it for output, and this repo is very similar to kubernetes-metrics. There's some confusion about the approach here, especially because it differs from the implementations of the other SMI specs.

More documentation about the use of the metrics.smi-spec.io APIService would be helpful for users.

smi-metrics fails on Kubernetes 1.19

smi-metrics fails to run on Kubernetes 1.19 clusters with the error: error trying to reach service: x509: certificate relies on legacy Common Name field, use SANs or temporarily enable Common Name matching with GODEBUG=
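For reference, a serving certificate that satisfies the new requirement must carry the service's DNS names in the SAN extension. A hedged sketch using Go's crypto/x509 follows; the DNS names passed in are assumptions about the install namespace, not values from this repo:

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"time"
)

// newServingCert self-signs a certificate whose SAN extension covers the
// given service DNS names. Go 1.15+ clients (including the Kubernetes
// 1.19 aggregator) ignore CommonName, so the DNSNames field is what matters.
func newServingCert(dnsNames []string) ([]byte, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(1),
		Subject:      pkix.Name{CommonName: dnsNames[0]},
		DNSNames:     dnsNames, // populates the SAN extension
		NotBefore:    time.Now(),
		NotAfter:     time.Now().Add(365 * 24 * time.Hour),
		KeyUsage:     x509.KeyUsageDigitalSignature,
		ExtKeyUsage:  []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth},
	}
	return x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
}

func main() {
	der, err := newServingCert([]string{
		"smi-metrics.smi-metrics.svc",
		"smi-metrics.smi-metrics.svc.cluster.local",
	})
	cert, _ := x509.ParseCertificate(der)
	fmt.Println(err == nil, len(cert.DNSNames))
}
```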

APIService doesn't run on 1.15

The v1alpha1.metrics.smi-spec.io APIService fails on Kubernetes 1.15 clusters, marked as FailedDiscoveryCheck:

  conditions:
  - lastTransitionTime: "2019-06-28T16:20:22Z"
    message: 'failing or missing response from https://10.100.82.130:443: bad status
      from https://10.100.82.130:443: 404'
    reason: FailedDiscoveryCheck
    status: "False"
    type: Available

Looks like this has something to do with the service not exposing information about itself, as the pods and services run successfully.

Move image to a real location

The images are using my personal Docker account right now; it feels like they should live somewhere more general. @slack, any good places we could put the images? I'm happy to create an SMI Docker Hub account.

Add Tests for Istio pkg

Right now there are no tests for the Istio package. To make the development process easier, adding tests would be the way to go!

Add support for Consul Connect

Consul Connect depends on Prometheus for metrics. There is an issue in the Connect repo to add Kubernetes metadata as labels, but it's still in the works; that would make this process easy.

If not, we can talk to the k8s API to get the pod metadata and query Prometheus with it.

Problem installing with default instructions

From the base of the repo I am receiving this error:

helm template chart --set adapter=linkerd | kubectl apply -f -
apiservice.apiregistration.k8s.io/v1alpha1.metrics.smi-spec.io configured
Error from server (Invalid): error when creating "STDIN": Secret "RELEASE-NAME-smi-metrics" is invalid: metadata.name: Invalid value: "RELEASE-NAME-smi-metrics": a DNS-1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*')
Error from server (Invalid): error when creating "STDIN": ConfigMap "RELEASE-NAME-smi-metrics" is invalid: metadata.name: Invalid value: "RELEASE-NAME-smi-metrics": a DNS-1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*')
Error from server (Invalid): error when creating "STDIN": ServiceAccount "RELEASE-NAME-smi-metrics" is invalid: metadata.name: Invalid value: "RELEASE-NAME-smi-metrics": a DNS-1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*')
Error from server (Invalid): error when creating "STDIN": RoleBinding.rbac.authorization.k8s.io "RELEASE-NAME-smi-metrics" is invalid: subjects[0].name: Invalid value: "RELEASE-NAME-smi-metrics": a DNS-1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*')
Error from server (Invalid): error when creating "STDIN": Service "RELEASE-NAME-smi-metrics" is invalid: metadata.name: Invalid value: "RELEASE-NAME-smi-metrics": a DNS-1035 label must consist of lower case alphanumeric characters or '-', start with an alphabetic character, and end with an alphanumeric character (e.g. 'my-name',  or 'abc-123', regex used for validation is '[a-z]([-a-z0-9]*[a-z0-9])?')
Error from server (Invalid): error when creating "STDIN": Deployment.apps "RELEASE-NAME-smi-metrics" is invalid: [metadata.name: Invalid value: "RELEASE-NAME-smi-metrics": a DNS-1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*'), spec.template.spec.serviceAccountName: Invalid value: "RELEASE-NAME-smi-metrics": a DNS-1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*')]

Are there any scripts around to update the RELEASE-NAME variable?

Template response latency queries.

Right now, the response latency queries use the same template with different percentiles, i.e. 99, 90, etc.

This can be templated so that the configuration files are more readable.
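As a sketch of the idea, a single text/template can generate all three percentile queries. The PromQL shape and names below are illustrative, loosely modeled on a Linkerd-style latency histogram, not the repo's actual queries:

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// One parameterized template instead of three near-identical queries.
var latencyTmpl = template.Must(template.New("latency").Parse(
	`histogram_quantile(0.{{.Percentile}}, sum(rate(response_latency_ms_bucket{namespace="{{.Namespace}}"}[{{.Window}}])) by (le))`))

// buildLatencyQuery renders the template for one percentile.
func buildLatencyQuery(percentile, namespace, window string) string {
	var buf bytes.Buffer
	latencyTmpl.Execute(&buf, map[string]string{
		"Percentile": percentile, "Namespace": namespace, "Window": window,
	})
	return buf.String()
}

func main() {
	// The 50/90/99 variants differ only in the percentile parameter.
	for _, p := range []string{"50", "90", "99"} {
		fmt.Println(buildLatencyQuery(p, "default", "30s"))
	}
}
```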

Add CI

Let's get Travis up and running for PRs and master (and add the requisite badges).

Bump k8s dependencies to 1.15

Now that 1.15 is out, all the k8s dependencies should be bumped. This might be an issue with deislabs/smi-sdk-go that will make this more difficult.

How to serve multiple API versions

It is desirable to serve more than one API version at a time to allow clients a deprecation window in which they can gracefully upgrade to the next API version. How should this be accomplished?

One way would be to duplicate most of the controller code so that the copy uses the v1alpha2 types and adds any v1alpha2 functionality while the original uses the v1alpha1 types. The router would then pick the v1alpha1 or v1alpha2 handler depending on the API version of the request. Users would need two APIService objects (one for each version), both pointing to the same service.

Another way would be to simply upgrade the controller code to v1alpha2 and drop support for v1alpha1. Users would then run two different releases of the smi-metrics controller, one at the v1alpha1 release and one at the v1alpha2 release. They would create two different APIService objects and point each one at the appropriate controller.

Service shouldn't 404 on `/` endpoint

Right now the APIService only serves requests at /apis/group/version and returns 404 on /.

We could do something like what Kubernetes does on / and provide information about the sub-resources users can query. That would make for a great user experience with the SMI-Metrics API!


Move to Github Actions

Now that GitHub Actions is out of beta and most projects are already using it, I think it's better if we move.

Fix Charts directory

Currently we publish the installation Helm charts as a GitHub release, but the charts in the repo are templated and thus not directly usable, causing confusion like #42.

I think the best way is to update the release workflow to also change the relevant files in the repo, so that users can use the charts directory directly when they clone the repo.

Fix make build

Right now, make build fails with fatal: no tag exactly matches. Replace the tagging and image-name functionality with environment variables that users can pass.

Support for edges at Kind level

There are common situations where you need to request edge information across a Kind within a namespace. For example, if I want to create an edges graph I need information about every deployment in the namespace. Right now, the only way to do this is to request edges for each kind found in the namespace. Instead, I propose we add a special edges endpoint.

For example, /apis/metrics.smi-spec.io/v1alpha1/namespaces/default/deployments/ could have the endpoint of an edge of /apis/metrics.smi-spec.io/v1alpha1/namespaces/default/deployments/$edges. The special character is required to avoid any overlap with a possible resource name.

Any thoughts on this proposal?
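A quick check of why a `$`-prefixed segment is collision-free: Kubernetes resource names must match the DNS-1123 subdomain regex (the same one the apiserver quotes in its validation errors), which `$edges` can never satisfy:

```go
package main

import (
	"fmt"
	"regexp"
)

// The DNS-1123 subdomain pattern Kubernetes uses to validate object
// names, anchored to the whole string.
var dns1123Subdomain = regexp.MustCompile(
	`^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$`)

func main() {
	fmt.Println(dns1123Subdomain.MatchString("linkerd-web")) // a valid resource name
	fmt.Println(dns1123Subdomain.MatchString("$edges"))      // never a valid name, safe as a sentinel
}
```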

Support Metrics from Istio

Once #3 is done, the Prometheus queries will be pluggable, and Istio also stores aggregate data in Prometheus.

Installation configuration specific to Istio (i.e. Prometheus URL, etc.) and a new ConfigMap of Prometheus queries should be enough to get metrics from Istio.

cc @grampelberg

Making Queries Configurable

Right now, the queries live in the queries.go file and are specific to Linkerd.

To make this repo support more implementations, we need a way to make the Prometheus queries pluggable. This can be solved by having each implementation provide a ConfigMap with its queries.

The corresponding ConfigMap is installed and dynamically loaded as a configuration field into Viper during initialization.
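The loading step can be sketched with the standard library. smi-metrics uses Viper over a mounted ConfigMap; this stand-in uses JSON, and the field names are assumptions, not the real schema:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Queries is a hypothetical per-implementation query set, shipped in a
// ConfigMap rather than compiled into the binary.
type Queries struct {
	SuccessCount string `json:"success_count"`
	FailureCount string `json:"failure_count"`
}

// loadQueries decodes the mounted configuration at startup.
func loadQueries(raw []byte) (Queries, error) {
	var q Queries
	err := json.Unmarshal(raw, &q)
	return q, err
}

func main() {
	raw := []byte(`{
  "success_count": "sum(increase(response_total{classification=\"success\"}[30s]))",
  "failure_count": "sum(increase(response_total{classification=\"failure\"}[30s]))"
}`)
	q, err := loadQueries(raw)
	fmt.Println(err == nil, q.SuccessCount != "", q.FailureCount != "")
}
```

Swapping Linkerd for Istio or Consul Connect then means shipping a different ConfigMap, not changing code.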

@grampelberg

Telemetry v2 for Istio

Istio has now moved to a new telemetry model, where the metrics are not yet configurable and Istio only returns some default metrics.

There are also some changes in the default metrics like fluxcd/flagger#478

This can act as a placeholder issue for the Telemetry v2 discussion.

Very large values returned from the API vs Prometheus

I have been noticing some discrepancies between the values reported by smi-metrics and those reported by Prometheus directly.

For example

kubectl get --raw /apis/metrics.smi-spec.io/v1alpha1/namespaces/linkerd/deployments/linkerd-web/ | jq
{
  "kind": "TrafficMetrics",
  "apiVersion": "metrics.smi-spec.io/v1alpha1",
  "metadata": {
    "name": "linkerd-web",
    "namespace": "linkerd",
    "selfLink": "/apis/metrics.smi-spec.io/v1alpha1/namespaces/linkerd/deployments/linkerd-web",
    "creationTimestamp": "2020-01-28T23:58:10Z"
  },
  "timestamp": "2020-01-28T23:58:10Z",
  "window": "30s",
  "resource": {
    "kind": "Deployment",
    "namespace": "linkerd",
    "name": "linkerd-web"
  },
  "edge": {
    "direction": "from",
    "resource": null
  },
  "metrics": [
    {
      "name": "p99_response_latency",
      "unit": "ms",
      "value": "195500m"
    },
    {
      "name": "p90_response_latency",
      "unit": "ms",
      "value": "155"
    },
    {
      "name": "p50_response_latency",
      "unit": "ms",
      "value": "58333m"
    },
    {
      "name": "success_count",
      "value": "25498m"
    },
    {
      "name": "failure_count",
      "value": "0"
    }
  ]
}

Debug logs from the same time

time="2020-01-29T00:00:37Z" level=debug msg="querying prometheus" query="sum(\n          increase(\n            response_total{\n              classification=\"success\",\n              namespace=~\"linkerd\",\n              deployment=~\"linkerd-web\",\n              dst_deployment=~\".+\"\n            }[30s]\n          )\n        ) by (\n          deployment,\n          dst_deployment,\n          namespace,\n          dst_namespace\n        )"
time="2020-01-29T00:00:37Z" level=debug msg="query results" query="sum(\n          increase(\n            response_total{\n              classification=\"success\",\n              namespace=~\"linkerd\",\n              deployment=~\"linkerd-web\",\n              dst_deployment=~\".+\"\n            }[30s]\n          )\n        ) by (\n          deployment,\n          dst_deployment,\n          namespace,\n          dst_namespace\n        )" result="{deployment=\"linkerd-web\", dst_deployment=\"linkerd-controller\", dst_namespace=\"linkerd\", namespace=\"linkerd\"} => 32.998350082495875 @[1580256037.374]"

Prometheus query: (screenshot of the same query run directly against Prometheus)

I'm struggling to understand what might be the cause of this. It only happens occasionally, and I was able to get reasonable values with repeated queries to the APIService.
Maybe this is why:
https://github.com/deislabs/smi-metrics/blob/4109ac7c83538ad13cb5681bbefd1aa91209a244/pkg/prometheus/client.go#L102
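One possible explanation for the odd-looking numbers: if these values are serialized as Kubernetes resource quantities, a trailing `m` means milli-units, so "195500m" is 195.5 ms and "25498m" is about 25.5 requests, the same ballpark as the ~33 in the debug log. A minimal standard-library decoder for the two forms seen above (the real parser lives in k8s.io/apimachinery/pkg/api/resource):

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// quantity decodes the plain and milli ("m"-suffixed) forms that
// Kubernetes resource.Quantity uses to serialize fractional values.
func quantity(s string) float64 {
	if strings.HasSuffix(s, "m") {
		v, _ := strconv.ParseFloat(strings.TrimSuffix(s, "m"), 64)
		return v / 1000
	}
	v, _ := strconv.ParseFloat(s, 64)
	return v
}

func main() {
	fmt.Println(quantity("195500m")) // 195.5
	fmt.Println(quantity("25498m"))  // 25.498
	fmt.Println(quantity("155"))     // 155
}
```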

Publish smi-metrics helm Chart

Currently, even though we build and release the Helm chart after each release through the same GitHub workflow, we don't publish it anywhere. Users who want to try the component have to download it and generate the template themselves. Publishing the chart somewhere would make this much easier for users.

@stefanprodan suggested using https://github.com/marketplace/actions/helm-publisher, which feels like the simplest option and does the job pretty well. I'm planning to create a gh-pages branch with the current release Helm packages and also update the CI to do this automatically whenever we perform a release. WDYT?

@michelleN @stefanprodan

Add support for Maesh

Maesh is a new service mesh from Containous, based on the Traefik proxy. It also uses Prometheus internally for metrics; adding smi-metrics support would be great!

Configurable window value on a per-query basis

A default value of 30 seconds may not be desired for every query. This should be made configurable with an optional parameter on every endpoint. For example, /apis/metrics.smi-spec.io/v1alpha1/namespaces/{Namespace}/{Kind}/{ResourceName} should support an optional query parameter ?window=30s, allowing the window to be set per request.
Additionally, the window size should also be globally configurable in the ConfigMap.
