Blogbar

http://www.blogbar.cc

个人博客之死，就是个人博客之生。

将信息的快速传递交给新兴媒介，让个人博客回归原来的位置：一种信息的雕刻与沉淀的工具。

世界太嘈杂，这里只有个人^兆赫。

Blogbar，聚合个人博客。

技术栈

Powered by Flask.
Based on Flask-Boost.
Use cron to grab data periodically.
Use feerparser to parse RSS and Atom.
Use requests and lxml to grab structured data from HTML page.

开发环境搭建

git clone https://github.com/blogbar/blogbar.git
cd blogbar
virtualenv venv
. venv/bin/activate
pip install -r requirements.txt
bower install

将db/blogbar.sql导入本地数据库。

将config/development_sample.py另存为config/development.py，并按需更新配置项。

python manage.py run

扩展

如果一个博客不提供 Feed，但是这个博客的价值又非常高（比如 Livid、王垠、Lifesinger等等），可继承爬取博客的爬虫基类 BaseSpider（位于 spiders/base.py）实现，步骤如下：

类变量赋值

在子类中对如下类变量重新赋值：

url = ""  # 网址
posts_url = ""  # 包含博文列表的网址（选填，只有当博客网址与博文列表网址不同时才需填写）
title = ""  # 博客标题
subtitle = ""  # 博客副标题（选填）
author = ""  # 博主

重载方法

重载如下 2 个方法：

get_posts：获取博文列表
get_post：获取单篇博文内容

具体使用方法见 BaseSpider 类，以及用于爬取网页内容的 lxml 库。

调试

编写过程中如需调试抓取结果，可使用 test_spider.py 提供的测试方法：

$ python test_spider.py get_posts
$ python test_spider.py get_post
$ python test_spider.py all

具体见 test_spider.py。

提交

测试通过后，可发起 pull request。

示例

以下是爬取 Livid 博客的示例代码：

# coding: utf-8
from .base import BaseSpider, get_inner_html
from datetime import datetime


class LividSpider(BaseSpider):
    url = "http://livid.v2ex.com"
    title = "Livid"
    author = "Livid"

    @staticmethod
    def get_posts(tree):
        posts = []
        for li in tree.cssselect('.posts li'):
            date_element = li.cssselect('span')[0]
            published_at = datetime.strptime(date_element.text_content(), "%d %b %Y")
            link = li.cssselect('a')[0]
            posts.append({
                'url': link.get('href'),
                'title': link.text_content(),
                'published_at': published_at
            })
        return posts

    @staticmethod
    def get_post(tree):
        content_element = tree.cssselect('div.span10')[0]
        return get_inner_html(content_element)

bryant1410 / blogbar Goto Github PK

blogbar's Introduction

Blogbar

技术栈

开发环境搭建

扩展

类变量赋值

重载方法

调试

提交

示例

blogbar's People

Contributors

Watchers

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent