Giter Site home page Giter Site logo

ducgt / crawlermonitor Goto Github PK

View Code? Open in Web Editor NEW

This project forked from adrian9631/crawlermonitor

0.0 1.0 0.0 44 KB

爬虫监控及可视化 ( Prometheus and Grafana ) Building a crawler with distributed task queues (Celery) and fetching data with a reliable monitor system.

License: MIT License

Python 100.00%

crawlermonitor's Introduction

CrawlerMonitor

Introduction

  本项目在 Celery 分布式爬虫的基础上构建监控方案 Demo,在编写 Statsd + InfluxDB 方案代码进行调研过程中,转向了 Prometheus 的怀抱 ,使用 Grafana 对监控序列进行可视化,爬虫部分目前只完成对下载和解析进行简单解耦,反爬部分和代码结构优化等后续会陆续进行完善。

QuickStart

  本项目环境为 Ubuntu 16.04 LTS 以及使用 Python 3.6.5 (Anaconda), 安装执行流程默认适配上述环境,具体各部分安装配置请移步 快速安装,其中包括 RabbitMQ、MongoDB、Python 相关库、 Celery (推荐使用 4.1.1 版本)、Prometheus、Grafana 等安装配置,操作启动控制请参考 快速启动,可分为手动启动和后台运行。后续会考虑 docker 容器化。启动三个实例 worker 跑了几分钟,从 MongoDB 导出的几千条 Json 数据 此处

FlowChart

  简单画个流程图:

sentence_model

Metrics

  通过调研发现,在statsd 和 prometheus 的客户端均有对 metrics 的实现,后者在类型上支持较为丰富一点,这里使用 prometheus metrics 在 task 和 worker 层面设计监控指标,其设计如下表,并采用 worker,task,results 三个主要标签维度进行合理划分,各个类型 metric 的定义和使用详细可见 Prometheus官网 以及对应的 Python 客户端

Type Name Worker Task Results
Gauge workers_state
Counter workers_processed
Gauge workers_active
Counter tasks_counter
Summary tasks_runtime
Info tasks_info

  这里只涉及任务和工作单元层面,爬虫异常层面仍需进一步设计指标

ScreenShot

  Grafana 监控界面 Dashboard 模板

sentence_model

  获取 Dashboard 模板: 直接在 Grafana 里 import 粘贴 https://grafana.com/dashboards/9970/ 即可。

TODO-LIST

  TODO-LIST 留给自己。

crawlermonitor's People

Contributors

adrian9631 avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.