cjwn / china_gov_website_spider

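A small practice project that scrapes news from the Ministry of Culture and Tourism (文旅部) and the National Radio and Television Administration (广电总局)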

Home Page: http://30%E5%A4%9A%E5%B2%81%E8%BF%98%E5%BE%97%E5%81%9A%E7%89%B9%E5%88%9D%E7%BA%A7%E7%9A%84%E5%B7%A5%E4%BD%9C%EF%BC%8C%E9%82%A3%E5%B0%B1%E5%86%99%E4%B8%AA%E7%88%AC%E8%99%AB%E5%90%A7.me

Python 90.23% HTML 9.77%

spider china-gov-website

china_gov_website_spider's Introduction

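Scraping news from the Ministry of Culture and Tourism and the National Radio and Television Administration

Motivation and a bit of venting

My manager has me compile, every week, the commercially relevant announcements published by the Ministry of Culture and Tourism and the National Radio and Television Administration. Reading through every news item on both sites, clicking them open one at a time, is a real waste of time. ~~I'm already in my 30s, yet however you look at it this is intern work: tiring, with no real value or growth in it.~~ So I tried using a crawler to merge all the news onto a single page, which lets me skim it quickly and learn a bit about crawlers and classes along the way.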

Features

Parses pages with XPath, saves the results to a SQLite database, and displays all the news content on a web page.
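As a rough illustration of the XPath parsing step, here is a minimal sketch. It assumes a made-up, well-formed page layout (`SAMPLE_PAGE`) and a hypothetical helper named `parse_news`; neither comes from this repository. To stay dependency-free it uses the standard library's `xml.etree.ElementTree`, which supports only a subset of XPath, rather than lxml, which the project itself installs.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Hypothetical, well-formed stand-in for a ministry news-list page.
# The real sites' markup is not described in this README.
SAMPLE_PAGE = """
<html><body>
  <ul class="news-list">
    <li><a href="/news/1.html">First notice</a><span>2021-03-01</span></li>
    <li><a href="/news/2.html">Second notice</a><span>2021-03-02</span></li>
  </ul>
</body></html>
"""


def parse_news(page: str) -> List[Dict[str, str]]:
    """Extract title, URL and date from each list item using XPath-style queries."""
    root = ET.fromstring(page.strip())
    items = []
    # ElementTree understands this limited XPath: descendant search plus
    # an attribute predicate, then direct <li> children.
    for li in root.findall(".//ul[@class='news-list']/li"):
        link = li.find("./a")
        date = li.find("./span")
        if link is None or date is None:
            continue  # skip entries that do not match the assumed layout
        items.append(
            {
                "title": (link.text or "").strip(),
                "url": link.get("href", ""),
                "date": (date.text or "").strip(),
            }
        )
    return items


if __name__ == "__main__":
    for item in parse_news(SAMPLE_PAGE):
        print(item["date"], item["title"], item["url"])
```

Real pages are rarely well-formed XML, which is one reason to prefer lxml's HTML parser in practice; the query strings would look much the same there.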

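Usage

Requirements: Python > 3.7

pip install lxml flask-sqlalchemy

(although SQLAlchemy isn't actually used yet...)

1. Run main.py (this generates the database file)
2. Run web.py and open the address it prints in a browser (for example: http://127.0.0.1:5000)
3. Enjoy!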

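TODO

1. Optimize how the full article text is fetched and displayed; currently everything is shown at once and nothing is loaded asynchronously.
2. Try using the Vue framework for some front-end features.
3. While learning SQL statements, also fetch the data with SQLAlchemy and compare the timing of the two approaches.
4. Learn Celery for scheduled tasks.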

Main methods

DbBot: methods for managing the database
Spyderlets: functionality for the crawler "agents"
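To give a feel for what a database helper like DbBot could do, here is a hedged sketch built on the standard `sqlite3` module. The table schema, the method names `save` and `all_news`, and the de-duplication by URL are all assumptions for illustration; the README does not show the real DbBot's interface.

```python
import sqlite3
from typing import Dict, Iterable, List, Tuple


class DbBot:
    """Hypothetical sketch of a SQLite helper; not the repository's actual class."""

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        # Assumed schema: the URL doubles as the primary key so that
        # re-running the crawler does not store the same item twice.
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS news ("
            "url TEXT PRIMARY KEY, title TEXT NOT NULL, "
            "source TEXT, published TEXT)"
        )

    def save(self, items: Iterable[Dict[str, str]]) -> int:
        """Insert news items, ignoring known URLs; return how many rows were added."""
        before = self.conn.total_changes
        with self.conn:  # commits on success, rolls back on error
            self.conn.executemany(
                "INSERT OR IGNORE INTO news (url, title, source, published) "
                "VALUES (:url, :title, :source, :published)",
                list(items),
            )
        return self.conn.total_changes - before

    def all_news(self) -> List[Tuple[str, str, str, str]]:
        """Return every stored item, newest first, ready for a page template."""
        cur = self.conn.execute(
            "SELECT title, url, source, published FROM news "
            "ORDER BY published DESC"
        )
        return cur.fetchall()


if __name__ == "__main__":
    bot = DbBot()
    bot.save(
        [{"url": "/news/1.html", "title": "First notice",
          "source": "example", "published": "2021-03-01"}]
    )
    print(bot.all_news())
```

Keying on the URL is only one possible design choice; it keeps repeated weekly runs idempotent without a separate "seen" table.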
