Giter Site home page Giter Site logo

cd-house's Introduction

[toc]

成都房协预售楼盘爬虫

定期从成都房协抓取发布的楼盘信息,当发现有新的楼盘发布或有楼盘信息发生更新的时候,通过Slack或微信公众号将信息推送给关注者

楼盘统计数据查看

https://cdfangxun.wiseturtles.com/

使用说明

运行项目前, 有四个环境变量需要配置

必须配置的环境变量

  • DATABASE_URL: 设置保存数据的数据库路径,比如: sqlite:////path/to/db.sqlite

可选环境变量

  • SLACK_WEBHOOK_URL: Slack的web hook url, 用于通过slack来推送房源信息

  • WECHAT_APP_ID: 微信公众号的appid, 用于通过微信公众号来推送房源信息

  • WECHAT_APP_SECRET: 微信公众号的appsecret 用于通过微信公众号来推送房源信息

运行代码

下载代码后,可以在本地配置开发环境运行项目,也可以通过docker-compose运行。

下载代码

$ git clone https://github.com/crazygit/cd-house.git

直接运行

确定本地python版本为3.6+, 开发使用的python版本为

$ python --version
Python 3.6.4

安装依赖管理工具

$ pip3 install pipenv

运行代码

# 配置保存爬取数据的数据库地址
$ export DATABASE_URL='sqlite:////data/house.sqlite'

# slack和微信公众号可以任意选择需要配置哪一种,也可以都配置,也可以都不配置
$ export SLACK_WEBHOOK_URL='xxxxxxxxxxxxxxxxxxxxx'
$ export WECHAT_APP_ID='xxxxxxxxxxxxxxxxxxxxx'
$ export WECHAT_APP_SECRET='xxxxxxxxxxxxxxxxxxxxx'

# 安装依赖
$ pipenv install --python 3.6.4

# 运行爬虫
$ pipenv run scrapy crawl cdfangxie

# 运行web服务
$ pipenv run python manage.py run

以docker的方式运行

# 配置保存爬取数据的数据库地址
$ export DATABASE_URL='sqlite:////data/house.sqlite'
# slack和微信公众号可以任意选择需要配置哪一种,也可以都配置,也可以都不配置
$ export SLACK_WEBHOOK_URL='xxxxxxxxxxxxxxxxxxxxx'
$ export WECHAT_APP_ID='xxxxxxxxxxxxxxxxxxxxx'
$ export WECHAT_APP_SECRET='xxxxxxxxxxxxxxxxxxxxx'

# 构建docker镜像
$ docker-compose build --force-rm

# 运行爬虫服务
$ docker-compose run -e DATABASE_URL=${DATABASE_URL} -e SLACK_WEBHOOK_URL=${SLACK_WEBHOOK_URL} -e WECHAT_APP_ID=${WECHAT_APP_ID} -e WECHAT_APP_SECRET=${WECHAT_APP_SECRET} crawler

注意:

可以根据自己的需要,周期性运行爬虫

Todo List

  • 自定义爬虫命令, 当有异常时,返回正常的exit_code
  • 添加微信公众号自定义菜单,方便按照区域查询在售楼盘信息
  • 集成Dash, 展示一些有用的数据图表

效果图

微信公众号效果图

Slack效果图

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.