william-yeh / ansible-prometheus

An Ansible role that installs Prometheus, in the format for Ansible Galaxy.

Home Page: https://galaxy.ansible.com/William-Yeh/prometheus

License: MIT License

Shell 100.00%
ansible monitoring prometheus

ansible-prometheus's Issues

Node exporter step "download and untar node_exporter tarball" is always "changed"

TASK [ansible-prometheus : download and untar node_exporter tarball] ***********
changed: [hw01] => {"changed": true, "dest": "/opt/prometheus", "extract_results": {"cmd": ["/bin/tar", "--extract", "-C", "/opt/prometheus", "-z", "-f", "/tmp/ansible_IxDpGS/node_exporter-0.13.0.linux-amd64.tar.gz"], "err": "", "out": "", "rc": 0}, "gid": 1001, "group": "prometheus", "handler": "TgzArchive", "mode": "0750", "owner": "prometheus", "size": 4096, "src": "/tmp/ansible_IxDpGS/node_exporter-0.13.0.linux-amd64.tar.gz", "state": "directory", "uid": 1001}

This happens every time the role runs. The task should report "ok" once the tarball has already been extracted.
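
One possible way to make this task idempotent, sketched under assumptions: the unarchive module accepts a creates argument and skips extraction when that path already exists. The variable node_exporter_tarball_url and the extracted directory layout below are illustrative guesses, not taken from the role.

```yaml
- name: download and untar node_exporter tarball
  unarchive:
    # Hypothetical variable holding the release tarball URL.
    src: "{{ node_exporter_tarball_url }}"
    dest: /opt/prometheus
    remote_src: yes
    owner: prometheus
    group: prometheus
    # Assumed layout: the tarball unpacks into a versioned directory
    # that contains the node_exporter binary. When this path exists,
    # the task is skipped and no "changed" is reported.
    creates: /opt/prometheus/node_exporter-0.13.0.linux-amd64/node_exporter
```

The trade-off is that a partially extracted directory would also suppress re-extraction, so the creates path should point at a file that only exists after a complete unpack.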

Prometheus 0.16.0rc1 upgrade strategy

There are quite a few breaking changes in the most recent Prometheus release, 0.16.0rc1. They are core enough changes that my simple setup is now broken. Instead of updating the Prometheus version number in this repo and possibly leaving people's systems broken, we may need to come up with a different solution...

Problems with ansible_distribution_version|int conditions

I'm using Ansible 2.3.1.0, where the conditions defined in set-role-variables.yml don't work as expected.
ansible_distribution_version|int returns "0" for CentOS version 7.x.x.

This leads to wrong OS detection, so the SysV init script is uploaded instead of the systemd one.

node_exporter.service doesn't start at boot

After rebooting the machines where only node_exporter was installed through this nice role, the node_exporter.service Systemd unit does not start.

If I try to enable it manually:

sudo systemctl enable node_exporter.service
The unit files have no installation config (WantedBy, RequiredBy, Also, Alias
settings in the [Install] section, and DefaultInstance for template units).
This means they are not meant to be enabled using systemctl.
Possible reasons for having this kind of units are:
1) A unit may be statically enabled by being symlinked from another unit's
   .wants/ or .requires/ directory.
2) A unit's purpose may be to act as a helper for some other unit which has
   a requirement dependency on it.
3) A unit may be started when needed via activation (socket, path, timer,
   D-Bus, udev, scripted systemctl call, ...).
4) In case of template units, the unit is meant to be enabled with some
   instance name specified.

Starting it manually does work. But, obviously, we need to have it started at each boot! 😄

Appending this [Install] block to /lib/systemd/system/node_exporter.service should suffice:

[Install]
WantedBy=network-online.target
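
A minimal sketch of applying that change from Ansible instead of by hand, assuming the unit file path from the report. It uses multi-user.target, the conventional target for services that should start at boot, rather than the network-online.target suggested above; that substitution is an assumption, not something the report confirms.

```yaml
- name: add an [Install] section to the node_exporter unit
  ini_file:
    path: /lib/systemd/system/node_exporter.service
    section: Install
    option: WantedBy
    # Assumption: multi-user.target instead of network-online.target.
    value: multi-user.target
    no_extra_spaces: yes

- name: reload systemd, then enable and start node_exporter
  systemd:
    name: node_exporter
    daemon_reload: yes
    enabled: yes
    state: started
```

A cleaner long-term fix would be to put the [Install] section into the role's own unit template, so the file is not edited after it has been installed.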

alertmanager fails if alertmanager.conf does not exist

The role currently copies the optional user-provided alertmanager.conf after trying to start the service, which results in a failure with an error complaining that the config file was not found. Either the user-supplied conf or some default should be in place before the server starts.

An empty alertmanager.conf seems sufficient to keep it from failing, but I have not used alertmanager yet, so I am not sure whether an empty conf is an appropriate default.
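
The ordering described above could look roughly like this. It is only a sketch: the config path is an assumption, and whether an empty file is a sensible default remains the open question raised in the report.

```yaml
- name: ensure an alertmanager.conf exists before the service starts
  copy:
    content: ""
    # Assumed location; the role may keep this file elsewhere.
    dest: /etc/prometheus/alertmanager.conf
    # force: no leaves an existing (user-supplied) file untouched.
    force: no
    owner: prometheus
    group: prometheus
    mode: "0640"

- name: start alertmanager
  service:
    name: alertmanager
    state: started
```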

"with_dict expects a dict" in ansible-2.2.0.0

With Ansible 2.2, the following error occurs:

TASK [williamyeh.prometheus : copy rule files from playbook's, if any] *********
task path: /private/etc/ansible/roles/williamyeh.prometheus/tasks/install-prometheus.yml:145
fatal: [hive]: FAILED! => {
"failed": true,
"msg": "with_dict expects a dict"
}
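
The report does not identify the cause. One common trigger for this message is a loop variable that is undefined, or not a mapping, when the task runs. A hedged sketch of a guard, assuming prometheus_rule_files is a dict whose values carry src and dest keys and that rules live under /etc/prometheus/rules (both assumptions):

```yaml
- name: copy rule files from playbook's, if any
  copy:
    src: "{{ playbook_dir }}/{{ item.value.src }}"
    # Assumed destination directory for rule files.
    dest: "/etc/prometheus/rules/{{ item.value.dest }}"
  # Full Jinja2 expression with an empty-dict fallback, so the loop
  # always receives a dict even when the variable is not set.
  with_dict: "{{ prometheus_rule_files | default({}) }}"
```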

prometheus.service not starting correctly with prometheus 2.0.0

Launching prometheus.service with
systemctl start prometheus
on a host installed with Prometheus 2.0.0 fails with the following message:
prometheus: error: unknown short flag '-c'

The reason is that the command-line options changed between Prometheus 1.5 and 2.0.0:
-config.file has become --config.file
-storage.local.path has become --storage.tsdb.path
-web* options have become --web*

The /etc/init.d/prometheus file should then be edited this way:

DAEMON_OPTS="$DAEMON_OPTS --config.file=$CONFIG --storage.tsdb.path=/var/lib/prometheus"
DAEMON_OPTS="$DAEMON_OPTS --web.console.templates=/opt/prometheus/prometheus-2.0.0.linux-amd64/consoles --web.console.libraries=/opt/prometheus/prometheus-2.0.0.linux-amd64/console_libraries"
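
To keep one init script template working for both 1.x and 2.x, the flag spelling could be derived from the version. This is a sketch, assuming a variable prometheus_version that holds a string such as "2.0.0"; the fact names prometheus_flag_prefix and prometheus_storage_flag are hypothetical.

```yaml
- name: pick flag names that match the Prometheus major version
  set_fact:
    # 2.x uses double-dash long flags; 1.x uses a single dash.
    prometheus_flag_prefix: "{{ '--' if prometheus_major | int >= 2 else '-' }}"
    # The local storage path flag was renamed in 2.0.
    prometheus_storage_flag: "{{ 'storage.tsdb.path' if prometheus_major | int >= 2 else 'storage.local.path' }}"
  vars:
    # First dot-separated component of the version string.
    prometheus_major: "{{ prometheus_version.split('.')[0] }}"
```

The init script template could then build its options from these two facts instead of hard-coding either spelling.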

user/group create should use vars

Surely this is wrong:

main.yml

- name: create Prometheus group
  group: name=prometheus state=present

- name: create Prometheus user
  user:
    name: prometheus
    group: prometheus
    createhome: no
    shell: /sbin/nologin
    comment: "Prometheus User"
    state: present

should be:

- name: create Prometheus group
  group: name="{{ prometheus_group }}" state=present

- name: create Prometheus user
  user:
    name: "{{ prometheus_user }}"
    group: "{{ prometheus_group }}"
    createhome: no
    shell: /sbin/nologin
    comment: "Prometheus User"
    state: present

Prometheus not starting after Installation.

I am new to Prometheus and trying to install it using your Ansible role.

This is the playbook I used:

---
- hosts:
    - Graylog-V3
  roles:
    - {
        role: 3dparty/ansible-role-prometheus,
        prometheus_components: [ "prometheus", "alertmanager" ],
        prometheus_alertmanager_url: "http://xxx.xxx.170.174:9093/"
    }

The Ansible deployment went fine, but the process is not running on the target machine.

Checking its status gave a weird result:

systemctl status prometheus
● prometheus.service - LSB: monitoring system and time series database.
   Loaded: loaded (/etc/init.d/prometheus)
   Active: active (exited) since Sun 2017-03-05 20:34:22 GMT; 12min ago
  Process: 37023 ExecStop=/etc/init.d/prometheus stop (code=exited, status=1/FAILURE)
  Process: 37030 ExecStart=/etc/init.d/prometheus start (code=exited, status=0/SUCCESS)

Mar 05 20:34:22 sd-48238 prometheus[37030]: Starting Prometheus monitoring system -: prometheus        Please configure prometheus and then edit /etc/default/prometheus
Mar 05 20:34:22 sd-48238 prometheus[37030]: and set the "START" variable to "yes" in order to allow
Mar 05 20:34:22 sd-48238 prometheus[37030]: prometheus to start.
Mar 05 20:34:22 sd-48238 systemd[1]: Started LSB: monitoring system and time series database..
Mar 05 20:41:00 sd-48238 systemd[1]: Started LSB: monitoring system and time series database..

The PID file doesn't exist, and the prometheus process is not running.

/etc/default/prometheus contents:

# cat /etc/default/prometheus
START=yes

Prometheus config file

# cat /etc/prometheus/prometheus.yml
global:
  scrape_interval:     15s # By default, scrape targets every 15 seconds.
  evaluation_interval: 15s # By default, scrape targets every 15 seconds.
  # scrape_timeout is set to the global default (10s).

  # The labels to add to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  external_labels:
    monitor: 'master'






# A list of scrape configurations.
scrape_configs:

  - job_name: 'prometheus'
    scrape_interval: 10s
    scrape_timeout:  10s
    static_configs:
      - targets: ['localhost:9090']

  - job_name: "node"
    file_sd_configs:
      - files:
        - '/etc/prometheus/tgroups/*.json'
        - '/etc/prometheus/tgroups/*.yml'
        - '/etc/prometheus/tgroups/*.yaml'
    #static_configs:
    #- targets:
    #  - "localhost:9100"

If I start it manually with prometheus -config.file /etc/prometheus.yml it works fine.

Am I missing something?

Support other architectures than i386 and amd64

In tasks/install-compile-tools.yml, the Go compiler is only downloaded for i386 and amd64 architectures.

Additionally, the host architecture is determined by the userspace word width and not the CPU architecture.
It may be better to base the package name on ansible_architecture - in that case the playbook would fail for unknown CPU architectures. This is much better than trying to run i386 binaries on an ARM CPU, for example.
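
A sketch of that idea: map ansible_architecture to the suffix used in Go release tarball names, and let the lookup fail for anything unmapped. The fact name go_arch and the exact set of entries are assumptions for illustration.

```yaml
- name: map the CPU architecture to a Go release suffix
  set_fact:
    # An architecture missing from the map makes this expression
    # undefined, so the play stops instead of picking a wrong binary.
    go_arch: "{{ go_arch_map[ansible_architecture] }}"
  vars:
    go_arch_map:
      x86_64: amd64
      i386: "386"
      i686: "386"
      armv6l: armv6l
      armv7l: armv6l
      aarch64: arm64
```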

Old rules are not deleted

If a rules file is removed from prometheus_rule_files it is not deleted from
the remote directory on the next run.
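
One way to prune stale files, sketched under assumptions: rule files live in /etc/prometheus/rules, and prometheus_rule_files is a dict whose values each have a dest file name. Neither detail is confirmed by this report.

```yaml
- name: list rule files currently on the host
  find:
    # Assumed rules directory.
    paths: /etc/prometheus/rules
    file_type: file
  register: deployed_rules

- name: remove rule files no longer listed in prometheus_rule_files
  file:
    path: "{{ item.path }}"
    state: absent
  with_items: "{{ deployed_rules.files }}"
  # Delete only files whose name is absent from the wanted list.
  when: (item.path | basename) not in wanted_rules
  vars:
    wanted_rules: "{{ (prometheus_rule_files | default({})).values() | map(attribute='dest') | list }}"
```

Note that this also deletes any rule file placed in the directory by hand, which may or may not be desirable.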

Restart services when init files change?

Hey, thanks for putting this together.

I just had a situation where I updated the definition for
prometheus_node_exporter_opts, but after provisioning our servers I had to
manually reload the node_exporter service to apply the changes that were made
in the init script.

I believe we could handle this automatically by adding the restart node_exporter notifier to the three tasks here:
https://github.com/William-Yeh/ansible-prometheus/blob/master/tasks/install-node-exporter.yml#L94.
Is there any reason that this would be a bad idea?

(This might also apply to the equivalent tasks for the other components).
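
For illustration, the proposal amounts to something like the following. The template file name is hypothetical; the handler name follows the "restart node_exporter" notifier mentioned above.

```yaml
# In tasks/install-node-exporter.yml (sketch)
- name: install node_exporter init script
  template:
    # Hypothetical template name.
    src: node_exporter.sysvinit.j2
    dest: /etc/init.d/node_exporter
    mode: "0755"
  # Fires only when the rendered file actually changes.
  notify: restart node_exporter

# In handlers/main.yml (sketch)
- name: restart node_exporter
  service:
    name: node_exporter
    state: restarted
```

Because handlers run only on "changed" results, an unchanged init script would not trigger a restart.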

Playbook not working

Hi all,

I tried to run the playbook (test.yaml):
root@ip-10-0-0-253:~# cat test.yaml

- hosts: all
  become: True
  roles:
    - mkrakowitzer.prometheus
  vars:
    prometheus_components: [ "prometheus", "alertmanager", "node_exporter" ]
    prometheus_alertmanager_url: "http://localhost:9093/"

I got the following error while running the playbook:
root@ip-10-0-0-253:~# ansible-playbook -i "localhost," -c local test.yaml

PLAY ***************************************************************************

TASK [setup] *******************************************************************
ok: [localhost]

TASK [mkrakowitzer.prometheus : include] ***************************************
included: /etc/ansible/roles/mkrakowitzer.prometheus/tasks/install.yml for localhost

TASK [mkrakowitzer.prometheus : install deps (Ubuntu)] *************************
ok: [localhost] => (item=[u'python-pip', u'jq'])

TASK [mkrakowitzer.prometheus : install deps (RHEL)] ***************************
skipping: [localhost] => (item=[])

TASK [mkrakowitzer.prometheus : pip] *******************************************
ok: [localhost]

TASK [mkrakowitzer.prometheus : include] ***************************************
fatal: [localhost]: FAILED! => {"failed": true, "reason": "ERROR! no action detected in task\n\nThe error appears to have been in '/etc/ansible/roles/mkrakowitzer.prometheus/tasks/install-prometheus.yml': line 2, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n---\n- name: Create prometheus container\n ^ here\n"}

PLAY RECAP *********************************************************************
localhost : ok=4 changed=0 unreachable=0 failed=1

Thanks for any help !!

set-role-variables - Condition is wrongly evaluated for ansible_distribution_version

Hello,

As I also reported here, there is a problem with set-role-variables.yml.

I think the "when" condition in set-role-variables.yml is the problem:

when: ansible_distribution == "CentOS" and ansible_distribution_version|int >= 7
In my case, the fact ansible_distribution_version evaluates to 7.2.1511. Passing it through the |int filter returns "0", because a string with multiple dot separators is not a valid integer.

How to replicate:

ansible -vvv -i localhost, -c local -m debug -a "var={{ '7.2.1511' | int }}" all

I think a better variable to use in the when condition would be:

ansible_distribution_major_version

(For me, the result of this issue is that the mysql_exporter daemon is installed as a service in /etc/init.d instead of in the /etc/systemd/system folder. I'm running CentOS v7.2.1511.)
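
Applied to the condition quoted above, the suggestion might look like this. The fact name prometheus_use_systemd is hypothetical and stands in for whatever the role sets in set-role-variables.yml.

```yaml
- name: use systemd on CentOS 7 and later
  set_fact:
    # Hypothetical fact name, for illustration only.
    prometheus_use_systemd: true
  when:
    - ansible_distribution == "CentOS"
    # The major version is a plain number string such as "7",
    # so the int filter converts it without falling back to 0.
    - ansible_distribution_major_version | int >= 7
```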

Check mode failure

...
...
...
TASK [ansible-prometheus : download and untar Golang tarball] ******************
skipping: [hw01] => {"changed": false, "skip_reason": "Conditional check failed", "skipped": true}

TASK [ansible-prometheus : create Prometheus group] ****************************
changed: [hw01] => {"changed": true}

TASK [ansible-prometheus : create Prometheus user] *****************************
changed: [hw01] => {"changed": true}

TASK [ansible-prometheus : mkdir for general cases] ****************************
changed: [hw01] => (item=/opt/prometheus) => {"changed": true, "item": "/opt/prometheus"}
changed: [hw01] => (item=/etc/prometheus) => {"changed": true, "item": "/etc/prometheus"}
changed: [hw01] => (item=/var/log/prometheus) => {"changed": true, "item": "/var/log/prometheus"}
changed: [hw01] => (item=/var/run/prometheus) => {"changed": true, "item": "/var/run/prometheus"}

TASK [ansible-prometheus : set internal variables for convenience] *************
ok: [hw01] => {"ansible_facts": {"gosu_exe_url": "https://github.com/tianon/gosu/releases/download/1.10/gosu-amd64"}, "changed": false}

TASK [ansible-prometheus : set internal variables for convenience] *************
skipping: [hw01] => {"changed": false, "skip_reason": "Conditional check failed", "skipped": true}

TASK [ansible-prometheus : download gosu executable] ***************************
skipping: [hw01] => {"changed": false, "msg": "remote module (get_url) does not support check mode", "skipped": true}

TASK [ansible-prometheus : add executable permission] **************************
fatal: [hw01]: FAILED! => {"changed": false, "failed": true, "msg": "file (/usr/local/bin/gosu) is absent, cannot continue", "path": "/usr/local/bin/gosu", "state": "absent"}
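
The log suggests the permission task fails because the download was skipped in check mode, so the file never exists. One possible workaround, assuming Ansible 2.1 or later where the ansible_check_mode variable is available:

```yaml
- name: add executable permission
  file:
    path: /usr/local/bin/gosu
    mode: "0755"
  # Skip in check mode, where the preceding download did not run.
  when: not ansible_check_mode
```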

ERROR! error while evaluating conditional: "prometheus" in prometheus_components

Greetings,

I was planning on using your Ansible role to deploy node_exporter to a number of hosts, but it failed with the above-mentioned error. It fails when checking which packages (prometheus, node_exporter or alertmanager) should be installed and which tasks should be included based on the prometheus_components variable. Even though the conditional (when: '"prometheus" in prometheus_components') is exactly what the Ansible documentation describes, it fails.

I have tried Ansible 1.9.2 (CentOS 7 package) and Ansible 2.0.0 (devel a10c5ca5f5), which I just pulled from the main Ansible git repository. Ansible is run on CentOS 7.

Error message:

TASK [williamyeh.prometheus : install prometheus] ******************************
fatal: [wiki.maeh.org]: FAILED! => {"failed": true, "msg": "ERROR! The conditional check '"prometheus" in prometheus_components' failed. The error was: ERROR! error while evaluating conditional: "prometheus" in prometheus_components ({% if "prometheus" in prometheus_components %} True {% else %} False {% endif %})"}

Has this come up previously, or do you know what obvious mistake I could be making?

Regards,
Andreas

Possible to have more role maintainers to manage the PR fixes/improvements?

In the current repository, progress is hindered by a lack of PR maintenance, despite the many nice PR contributions from happy role users.
Maybe the current maintainer (@William-Yeh) can offer maintainer status to interested parties, so we can bring this role forward!
Thanks and merry Xmas!

I went through some forks to see which ones are well ahead of the official repo, including by merging some of the PRs still open here, and which could be a good new starting point:
