the-monitor's People

Contributors

abrilloyaenriquez, af12066, davidmlentz, dbenamydd, echang26, eslamelhusseiny, fossamagna, guacbot, hkaj, irabinovitch, jayjaym, jhotta, johnaxel, kimroen, kurochan, mallorym, mratheist, mstbbs, nuxero, robin-norwood, toshiya-matsuda, vagelim, z0331

the-monitor's Issues

Varnish dashboard doesn't work

The dashboard shown at https://github.com/DataDog/the-monitor/blob/master/varnish/monitor_varnish_using_datadog.md is blank.

This is probably related to DataDog/dd-agent#1459

Anyone following the guide will end up with a blank dashboard and wonder whether something went wrong, even though the graphs are visible under Infrastructure -> Apps -> Varnish.

Is there a way I can fix the dashboard (".MAIN.") without having to rewrite everything?

Edit: Varnish 5.2 and datadog-agent 5.18.1

Template Elasticsearch Timeboard indexing/query latency might be wrong

The template charts for indexing latency and query latency might be wrong. They show different values than Kibana.

Take query latency, for example. The template uses the rate of fetch time and the rate of query time, like this:
[screenshot of the template query]

It might be better to use derivatives to get the same chart as Kibana, like this:
(derivative of fetch time + derivative of query time) / derivative of query total

[screenshot of the resulting chart]

The JSON for this is:

{
  "viz": "timeseries",
  "status": "done",
  "requests": [
    {
      "q": "( derivative(sum:elasticsearch.search.fetch.time{$cluster}) + derivative(sum:elasticsearch.search.query.time{$cluster}) ) * 1000 / derivative(sum:elasticsearch.search.query.total{$cluster})",
      "aggregator": "avg",
      "conditional_formats": [],
      "type": "line"
    }
  ],
  "autoscale": true
}

DataDog Agent for PostgreSQL vs Tracing Postgres Queries with ddtrace

Hi,

We monitor our Postgres database using the Datadog Agent's PostgreSQL integration, as well as by tracing the queries our Go applications generate. We use ddtrace's gorm package to trace the database queries.

When investigating the requests for a query, the values for trace.postgres.query.hits and postgresql.queries.count are returning completely different results within the same timeframe.

Is this expected? Why is this happening?

Issue with applying kubernetes deployment for AWS EKS

https://www.datadoghq.com/blog/eks-monitoring-datadog/#create-and-deploy-the-cluster-agent-manifest

Based on the provided documentation, kubectl apply -f /path/to/datadog-cluster-agent.yaml should work, but it fails with error:

kubectl apply -f /path/to/datadog-cluster-agent.yaml
service/datadog-cluster-agent created
error: unable to recognize "/path/to/datadog-cluster-agent.yaml": no matches for kind "Deployment" in version "extensions/v1beta1"

Fixes required:

  1. Update apiVersion: extensions/v1beta1 to apiVersion: apps/v1
  2. Add the following underneath spec:
selector:
    matchLabels:
      app: datadog-cluster-agent
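
For reference, a minimal sketch of how the corrected manifest header might look. Only the apiVersion and selector changes come from the fixes above; the metadata, labels, and container details are assumed placeholders, not the blog's actual manifest:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: datadog-cluster-agent
spec:
  selector:
    matchLabels:
      app: datadog-cluster-agent
  template:
    metadata:
      labels:
        app: datadog-cluster-agent
    spec:
      containers:
        - name: cluster-agent
          image: datadog/cluster-agent   # rest of the container spec as in the blog's manifest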

Please update the documentation with these fixes; otherwise, other users will run into the same issue.

Spark APIs

Is there any roadmap to add documentation for the Spark APIs as well?

Document linking scheme for external contributors

Some of our articles contain internal links to other Datadog pages and articles, e.g.

/blog/monitoring-101-collecting-data/

These links won't work from within GitHub, and they can be confusing to external contributors (see for example #68), so we should either:

  1. Standardize on full, functional URLs for source files (though this has some drawbacks for us while the posts are in development/staging)
  2. Document clearly why we're doing what we're doing with links and other quirks so that well-meaning contributors don't spend time fixing them

Incorrect explanation of acceptCount in tomcat-architecture-and-performance.md

Quote from https://www.datadoghq.com/blog/tomcat-architecture-and-performance

Upon startup, Tomcat will create threads based on the value set for minSpareThreads and increase that number based on demand, up to the number of maxThreads. If the maximum number of threads is reached, and all threads are busy, incoming requests are placed in a queue (acceptCount) to wait for the next available thread. The server will only continue to accept a certain number of concurrent connections (as determined by maxConnections). When the queue is full and the number of connections hits maxConnections, any additional incoming clients will start receiving Connection Refused errors.

The description of acceptCount is incorrect: Tomcat will continue to accept new connections until the number of concurrent connections reaches maxConnections. Once maxConnections is reached, the OS queues new connections, up to acceptCount.

https://tomcat.apache.org/tomcat-8.5-doc/config/http.html
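
For context, these attributes are all set on the HTTP connector in conf/server.xml; a minimal sketch with illustrative values (not taken from the blog post):

<Connector port="8080" protocol="HTTP/1.1"
           connectionTimeout="20000"
           minSpareThreads="10"
           maxThreads="200"
           maxConnections="8192"
           acceptCount="100" />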

Question on how to get the Swap file size.

On Docker for Mac, if we go to Preferences > Advanced, we see a Swap field that defaults to 1 GB.
Can you please tell me if there is a command or API to get the current value set for swap?
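
Not from the article, but assuming you want the value as seen by the Docker Desktop Linux VM, one way is to read /proc/meminfo from any container, since it reflects the VM's kernel:

docker run --rm alpine grep SwapTotal /proc/meminfo

SwapTotal should match the value configured under Preferences > Advanced.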

Question: How to get frontend stats by host header?

Hi, I'm looking for a way to monitor frontends by Host header in an HAProxy config where we route by Host header. For example, consider the following config:

frontend http-in
    bind *:80
    log /dev/log    len 65535 local1 info
    capture request header User-Agent len 30
    capture request header X-Request-ID len 36
    capture request header Host len 32

    # Frontend rules for host header routing
    use_backend user if { hdr(Host) -i user user.example.com  }
    use_backend login if { hdr(Host) -i login login.example.com }


backend user
    mode http
    server-template user 10 _user._tcp.service.consul resolvers consul resolve-prefer ipv4 check

backend login
    mode http
    server-template login 10 _login._tcp.service.consul resolvers consul resolve-prefer ipv4 check

Is there a way to get stats for all frontends broken down by the Host header, for example haproxy.frontend.response.4xx by header:Host?
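
One note that may help frame this: HAProxy reports counters per named frontend and backend (via the stats endpoint or socket that the Datadog check reads), so per-host numbers generally require a dedicated frontend or backend per host; the per-host backends above already give that breakdown on the backend side. A minimal sketch of a stats listener, in case one isn't already enabled (the port and URI are assumptions):

listen stats
    bind *:8404
    mode http
    stats enable
    stats uri /haproxy_stats
    stats refresh 10s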

NGINX log format renders with slashes in HTML

I was following the docs at https://www.datadoghq.com/blog/how-to-collect-nginx-metrics/#metrics-collection-nginx-logs and noticed extra slashes in the rendered output that aren't in the example.

It looks like the Markdown parser or HTML generator is inserting extra slashes before each $.

Expected:

log_format nginx '$remote_addr - $remote_user [$time_local] '
                 '"$request" $status $body_bytes_sent $request_time '
                 '"$http_referer" "$http_user_agent"';

Actual:
[screenshot of the rendered log_format with the extra slashes]
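
Presumably (reconstructed from the description above, not from the screenshot) the rendered version looks something like this, with each $ escaped:

log_format nginx '\$remote_addr - \$remote_user [\$time_local] '
                 '"\$request" \$status \$body_bytes_sent \$request_time '
                 '"\$http_referer" "\$http_user_agent"';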

Submitting a new blog for Apache APISIX's Datadog Plugin

Hello,

I am Yilin, from the Apache APISIX community. Apache APISIX is a cloud-native API gateway and a top-level project of the Apache Software Foundation. You can get more details from GitHub: https://github.com/apache/apisix.

We recently released a plugin that integrates Datadog with Apache APISIX. I think the plugin is very useful for developers and for both communities. In addition, it will help publicize Datadog and Apache APISIX and let more developers and companies know about us.

I am reaching out to see if we can have this blog posted here. What do you think?

The "Metric to watch: Volume queue length" Section in The "Part 1: Key metrics for Amazon EBS monitoring" Article

Hello All...

In the "Metric to watch: Volume queue length" Section in The "Part 1: Key metrics for Amazon EBS monitoring" article, it was mentioned that "A rule of thumb for SSD volumes is to aim for a queue length of one for every 500 IOPS available" and the source for that statement is https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/benchmark_procedures.html#UnderstandingQueueLength , which has been updated to be "we recommend that you target a queue length of 1 for every 1000 IOPS available".

So, Could you please update your article to reflect the latest changes in the documentation?

Thank you,
Ahmed

Update Kafka version references

Reword this section, please.

Despite being pre-1.0, (current version is 0.9.0.1), it is production-ready

Kafka is now 2.0

No metric for Pod & Container CPU utilization?

https://www.datadoghq.com/blog/monitoring-kubernetes-performance-metrics/ says:

Metric to watch: CPU utilization
Tracking the amount of CPU your pods are using compared to their configured requests and limits, as well as CPU utilization at the node level, will give you important insight into cluster performance.

However, it doesn't explain how to do that. The https://docs.datadoghq.com/containers/kubernetes/data_collected/ page doesn't show any metric with pod- or container-level CPU usage information. What metric should queries use to get that information?
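
For reference, a query sketch along these lines (metric and tag names are assumptions based on the Kubernetes integration, not taken from the article, and may vary by Agent version):

{
  "viz": "timeseries",
  "requests": [
    {
      "q": "avg:kubernetes.cpu.usage.total{*} by {pod_name}",
      "type": "line"
    }
  ]
}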

Way to collect network statistics not complete

Hi,
The suggested method for looking at network stats works only if there is a single process inside the container. In most cases, we might have a shell script that launches all the other required processes and then enters an infinite loop or monitors the applications it launched.
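
Assuming the guide's /proc-based approach, one mitigation is that /proc/<pid>/net/dev is scoped to the network namespace rather than to an individual process, so any PID that belongs to the container should return container-wide counters even when multiple processes are running. A sketch:

# CONTAINER_ID is the container to inspect
PID=$(docker inspect --format '{{.State.Pid}}' $CONTAINER_ID)
cat /proc/$PID/net/dev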

HAProxy integration & SSL

If HAProxy isn't terminating SSL, the metrics look a bit misleading (a large red number for 2xx).

Are there any plans to offer an alternate, SSL-passthrough-centric dashboard (connections per second, response times, etc.)?

Filtering and collecting additional Elasticsearch metrics

In the Elasticsearch integration doc, I see that a few metrics are missing from the Metrics section, for example jvm.buffer_pools.* and jvm.classes.*. Can someone let me know:

  1. Is it possible to add these metrics in elastic.d/conf.yaml?
  2. Is there a way to filter the metrics collected by the Elasticsearch integration via conf.yaml? For example, say I am not interested in elasticsearch.cgroup.cpu.stat.number_of_times_throttled and don't want it to be collected.

Agent Version - 7.21.1
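
On point 2, newer Agent versions support a metric_patterns include/exclude option on integration instances; I'm not sure it is available on Agent 7.21.1, so treat this elastic.d/conf.yaml snippet as a sketch to verify against your Agent's documentation:

instances:
  - url: http://localhost:9200
    metric_patterns:
      exclude:
        - 'elasticsearch\.cgroup\.cpu\.stat\.number_of_times_throttled'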

Time Maps

Apologies if this isn't the proper forum for this kind of feedback. After reading the time series graph 101 blog post, I wanted to mention this kind of visualization in case it's not already on your roadmap. I haven't seen it available in any of the monitoring systems I've used.

Garbage Collection Metrics in Kubernetes

Hello, I went through the post https://www.datadoghq.com/blog/monitoring-kubernetes-performance-metrics/. It is very well written and good to read.

I have a question about some Kubernetes metrics that I cannot find clearly documented. Is it possible to get garbage collector metrics? I saw there are some for Go in cAdvisor, but is there anything specific for the Java JVM?

I saw the article below:
https://www.robustperception.io/measuring-java-garbage-collection-with-prometheus/

But do you have a specific way we can do it?
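
Not covered in the article, but JVM garbage collection metrics are typically collected with the Agent's JMX integration (in Kubernetes this is usually paired with autodiscovery annotations on the pod). A minimal jmx.d/conf.yaml sketch, assuming the JVM exposes remote JMX on port 9999:

init_config:

instances:
  - host: localhost
    port: 9999   # remote JMX port exposed by the JVM (assumption)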

Monitor Backends UP / DOWN

Hello,

I have an HAProxy question.

Is it possible to monitor the number of backends currently UP / DOWN (the kind of thing you can see by opening HATOP)?

I'd quite like to see that information and have monitors against it.
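
For what it's worth, the HAProxy integration reports a haproxy.count_per_status gauge and a haproxy.backend_up service check that may cover this; a query sketch (metric and tag names are my assumptions, not confirmed by this repo):

{
  "viz": "timeseries",
  "requests": [
    {
      "q": "sum:haproxy.count_per_status{status:up}",
      "type": "line"
    }
  ]
}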

Thanks for your help.

Measure server/client timeout values?

We have set server and client timeouts, but I am not able to measure them from the UI. I want to know how many clients are timing out and when, and the same for the server.

Wildcard in a timeseries graph

Just trying to create a timeseries graph based on a wildcard match on the instance name.

{
  "requests": [
    {
      "q": "avg:system.load.1{name:content*}",
      "type": "line",
      "conditional_formats": [],
      "aggregator": "avg"
    }
  ],
  "viz": "timeseries"
}

You get the idea: I'm trying to match any server whose name starts with content, but it breaks horribly. No carnage, it just turns red. Am I missing something? My Google-fu is failing me.

Pseudo file location on CoreOS

Hi folks,

Yesterday I ran into an issue collecting Docker metrics on CoreOS v1068.6.0 using a shell script. I was trying to collect the memory usage of a container by cat-ing the file /sys/fs/cgroup/memory/system.slice/docker-$CONTAINER_ID/memory.usage_in_bytes. I kept getting the same value, and it wasn't the right one. Digging around, I found that this file has been moved to /sys/fs/cgroup/memory/init.scope/system.slice/docker-$CONTAINER_ID/memory.usage_in_bytes.

So basically, the path for the metrics in newer versions of CoreOS has changed from /sys/fs/cgroup/<METRIC>/system.slice/docker-$CONTAINER_ID/<METRIC_VALUE> to /sys/fs/cgroup/<METRIC>/init.scope/system.slice/docker-$CONTAINER_ID/<METRIC_VALUE>.

The testing was done against two CoreOS clusters: one running version 1010.6.0 and the other 1068.6.0. The new path also exists in the latest version of CoreOS (1068.9.0).
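
For scripts that need to work across these CoreOS versions, a sketch that tries both locations (paths taken from above):

for base in /sys/fs/cgroup/memory/system.slice /sys/fs/cgroup/memory/init.scope/system.slice; do
    f="$base/docker-$CONTAINER_ID/memory.usage_in_bytes"
    [ -r "$f" ] && cat "$f" && break
done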

Maybe you should update the metrics collection page

Thanks!

Not-so-good JSON log format for NGINX in this guide

You give an example JSON log format for NGINX in this guide: https://www.datadoghq.com/blog/how-to-monitor-nginx-with-datadog/#use-json-logs-for-automatic-parsing
It's quite a poor example (a revised sketch follows below), because:

  • it doesn't match the standard attributes: http_user_agent should be nested inside an http element so that it becomes http.user_agent
  • there's a collision with the reserved status attribute; that should probably be http.status_code
  • a lot of interesting fields are still missing (X-Forwarded-For, ...)
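
A rough sketch of a revised format along the lines suggested above (field names and nesting are suggestions, not taken from the guide; requires nginx 1.11.8+ for escape=json):

log_format json_datadog escape=json
  '{'
    '"remote_addr": "$remote_addr",'
    '"http": {'
      '"method": "$request_method",'
      '"status_code": $status,'
      '"user_agent": "$http_user_agent",'
      '"referer": "$http_referer",'
      '"x_forwarded_for": "$http_x_forwarded_for"'
    '},'
    '"request_time": $request_time'
  '}';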

Broken links

Great documentation. However, the links to Parts 1 and 3 are broken. Thanks.
