cjwn / china_gov_website_spider

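A small exercise in scraping news from the Ministry of Culture and Tourism and the National Radio and Television Administration.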

Home Page: http://30%E5%A4%9A%E5%B2%81%E8%BF%98%E5%BE%97%E5%81%9A%E7%89%B9%E5%88%9D%E7%BA%A7%E7%9A%84%E5%B7%A5%E4%BD%9C%EF%BC%8C%E9%82%A3%E5%B0%B1%E5%86%99%E4%B8%AA%E7%88%AC%E8%99%AB%E5%90%A7.me

Python 90.23% HTML 9.77%
spider china-gov-website

china_gov_website_spider's Introduction

Scraping news from the Ministry of Culture and Tourism and the National Radio and Television Administration

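Motivation (and a rant)

My boss has me compile, every week, the commercially valuable news published by the Ministry of Culture and Tourism and the National Radio and Television Administration. That means reading every news item on both sites, and clicking them open one by one is a real waste of time. I'm already in my thirties, yet however you look at it this is an intern's job: all effort, with little real value and nothing to learn from it. So I tried using a crawler to merge all the news onto a single page. That lets me skim everything quickly, and along the way I get to learn about web crawling and how to use classes.
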
Features

Pages are parsed with XPath, the results are saved to an SQLite database, and all the news is displayed on a single web page.
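The parse-and-store step above can be sketched as follows. This is a minimal illustration, not the repository's code. It assumes a hypothetical list page where each news item is an `<li>` holding a link and a date. The markup, the XPath expressions, and the `news` table schema are all assumptions, and the real pages of the two ministry sites will need their own selectors. The sketch uses `lxml.html` (already a dependency) and the standard-library `sqlite3` module.

```python
import sqlite3

from lxml import html  # third-party: pip install lxml

# Hypothetical list-page markup; the real structure of the ministry
# sites is not documented here, so the XPath below is an assumption.
SAMPLE_LIST_PAGE = """
<html><body>
  <ul class="news-list">
    <li><a href="/xwzx/1.html">Notice A</a><span>2021-03-01</span></li>
    <li><a href="/xwzx/2.html">Notice B</a><span>2021-03-02</span></li>
  </ul>
</body></html>
"""


def parse_news_list(page_source):
    """Return (title, url, date) tuples extracted with XPath."""
    tree = html.fromstring(page_source)
    items = []
    for li in tree.xpath('//ul[@class="news-list"]/li'):
        title = li.xpath("string(./a)").strip()    # link text
        url = str(li.xpath("./a/@href")[0])        # link target
        date = li.xpath("string(./span)").strip()  # publication date
        items.append((title, url, date))
    return items


def save_news(conn, items):
    """Store items; url is UNIQUE, so re-saving an item is a no-op."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS news ("
        "id INTEGER PRIMARY KEY, title TEXT, url TEXT UNIQUE, date TEXT)"
    )
    conn.executemany(
        "INSERT OR IGNORE INTO news (title, url, date) VALUES (?, ?, ?)",
        items,
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # a real run would use a file
    save_news(conn, parse_news_list(SAMPLE_LIST_PAGE))
    for row in conn.execute("SELECT title, url, date FROM news ORDER BY date"):
        print(row)
```

The `url` column is `UNIQUE` and rows are written with `INSERT OR IGNORE`. As a result, a weekly re-run only adds items that were not stored before.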

Usage

Requirements: Python > 3.7

pip install lxml flask-sqlalchemy

(SQLAlchemy isn't actually used yet, though...)

1. Run main.py (this generates the database file).
2. Run web.py and open the address it prints in a browser (e.g. http://127.0.0.1:5000).
3. Enjoy!
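Step 2 serves the stored news through Flask. Below is a minimal sketch of such a viewer; it is not the repository's actual `web.py`. It assumes a `news` table with `title`, `url` and `date` columns, stored in a file called `news.db`. Both the file name and the `DB_PATH` config key are hypothetical.

```python
import sqlite3

from flask import Flask, render_template_string  # pip install flask

DB_PATH = "news.db"  # hypothetical; use whatever file main.py creates

PAGE = """
<h1>News</h1>
<ul>
{% for title, url, date in rows %}
  <li>{{ date }} <a href="{{ url }}">{{ title }}</a></li>
{% endfor %}
</ul>
"""

app = Flask(__name__)


@app.route("/")
def index():
    # Open a short-lived connection per request: by default a sqlite3
    # connection may only be used in the thread that created it.
    conn = sqlite3.connect(app.config.get("DB_PATH", DB_PATH))
    try:
        rows = conn.execute(
            "SELECT title, url, date FROM news ORDER BY date DESC"
        ).fetchall()
    finally:
        conn.close()
    return render_template_string(PAGE, rows=rows)
```

Saved as `web.py`, this can be started with `flask --app web run` or by calling `app.run()`; Flask listens on http://127.0.0.1:5000 by default. `render_template_string` autoescapes its output, so titles containing `<` or `&` are shown as text rather than interpreted as markup.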

TODO

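1. Improve how full articles are fetched and displayed: at present everything is shown at once rather than loaded asynchronously.
2. Try implementing some front-end features with the Vue framework.
3. While learning SQL, also fetch the data with SQLAlchemy and compare the execution time of the two approaches.
4. Learn Celery for scheduled tasks.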

Main components

DbBot: manages the database-related methods
Spyderlets: the functionality of the crawler "agents"
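Below is one possible way to divide the work between these two parts, shown as a hypothetical skeleton rather than the repository's real classes. `DbBot` owns the SQLite connection and every SQL statement. Each `Spyderlet` subclass only knows how to fetch and parse one site. The method names (`save`, `all_news`, `fetch`, `parse`, `run`), the schema and the `DemoSpyderlet` toy parser are all illustrative assumptions.

```python
import sqlite3
import urllib.request
from abc import ABC, abstractmethod


class DbBot:
    """Hypothetical wrapper that owns the SQLite connection and all SQL."""

    def __init__(self, path="news.db"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS news ("
            "source TEXT, title TEXT, url TEXT UNIQUE, date TEXT)"
        )

    def save(self, source, items):
        """Insert (title, url, date) items; duplicate urls are skipped."""
        with self.conn:  # commits on success, rolls back on error
            self.conn.executemany(
                "INSERT OR IGNORE INTO news VALUES (?, ?, ?, ?)",
                [(source, *item) for item in items],
            )

    def all_news(self):
        return self.conn.execute(
            "SELECT source, title, url, date FROM news ORDER BY date DESC"
        ).fetchall()


class Spyderlet(ABC):
    """Hypothetical base class: one crawler 'agent' per website."""

    name = "base"
    list_url = ""

    def fetch(self):
        with urllib.request.urlopen(self.list_url, timeout=10) as resp:
            return resp.read().decode("utf-8", errors="replace")

    @abstractmethod
    def parse(self, page_source):
        """Return a list of (title, url, date) tuples."""

    def run(self, db, page_source=None):
        # Accept pre-fetched HTML so parsing can be tested offline.
        if page_source is None:
            page_source = self.fetch()
        db.save(self.name, self.parse(page_source))


class DemoSpyderlet(Spyderlet):
    """Toy agent: its 'page' is plain lines of 'title|url|date'."""

    name = "demo"

    def parse(self, page_source):
        return [tuple(line.split("|")) for line in page_source.splitlines() if line]


if __name__ == "__main__":
    db = DbBot(":memory:")
    DemoSpyderlet().run(db, "A|/a.html|2021-03-01\nB|/b.html|2021-03-02\n")
    print(db.all_news())
```

With all SQL kept inside `DbBot`, supporting a new site only needs a new `Spyderlet` subclass with its own `parse`. The optional `page_source` argument to `run` lets that parser be tested without network access.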
