blogbar's Introduction

Blogbar

http://www.blogbar.cc

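The death of the personal blog is the birth of the personal blog.

Leave the rapid transmission of information to newer media, and let the personal blog return to its original place: a tool for carving and distilling information.

The world is too noisy; here there are only personal megahertz.

Blogbar, an aggregator of personal blogs.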

##Tech stack

##Development environment setup

git clone https://github.com/blogbar/blogbar.git
cd blogbar
virtualenv venv
. venv/bin/activate
pip install -r requirements.txt
bower install

Import db/blogbar.sql into your local database.

Save config/development_sample.py as config/development.py, and update the settings as needed.

python manage.py run

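##Extending

If a blog does not provide a feed but is highly valuable (for example Livid, 王垠 (Wang Yin), Lifesinger, and so on), you can crawl it by subclassing the blog spider base class BaseSpider (located in spiders/base.py). The steps are as follows:

####Assigning class variables

Reassign the following class variables in the subclass: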

url = ""  # blog URL
posts_url = ""  # URL of the page listing the posts (optional; only needed when it differs from the blog URL)
title = ""  # blog title
subtitle = ""  # blog subtitle (optional)
author = ""  # blog author

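####Overriding methods

Override the following 2 methods:

  • get_posts: get the list of posts
  • get_post: get the content of a single post

For details, see the BaseSpider class and the lxml library, which is used to scrape page content.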

####Debugging

To debug the scraped results while writing a spider, use the test methods provided by test_spider.py:

  • $ python test_spider.py get_posts
  • $ python test_spider.py get_post
  • $ python test_spider.py all

See test_spider.py for details.

####Submitting

Once the tests pass, you can open a pull request.

####Example

The following is example code for crawling Livid's blog:

# coding: utf-8
from .base import BaseSpider, get_inner_html
from datetime import datetime


class LividSpider(BaseSpider):
    url = "http://livid.v2ex.com"
    title = "Livid"
    author = "Livid"

    @staticmethod
    def get_posts(tree):
        posts = []
        for li in tree.cssselect('.posts li'):
            date_element = li.cssselect('span')[0]
            published_at = datetime.strptime(date_element.text_content(), "%d %b %Y")
            link = li.cssselect('a')[0]
            posts.append({
                'url': link.get('href'),
                'title': link.text_content(),
                'published_at': published_at
            })
        return posts

    @staticmethod
    def get_post(tree):
        content_element = tree.cssselect('div.span10')[0]
        return get_inner_html(content_element)
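
The format string "%d %b %Y" used in get_posts above expects dates such as "06 Jan 2014" (day, abbreviated month name, year). As a minimal sketch, the parsing step can be tried in isolation; the helper name parse_list_date is hypothetical and not part of the project, and the abbreviated month names matched by %b depend on the current locale:

```python
from datetime import datetime


def parse_list_date(text):
    """Parse a date such as '06 Jan 2014' (hypothetical helper)."""
    # strip() guards against whitespace around the text taken from the page
    return datetime.strptime(text.strip(), "%d %b %Y")
```

If a blog prints its dates in another layout, only the format string needs to change.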

blogbar's People

Contributors

hustlzp

blogbar's Issues

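Use a distributed server to fetch information

Set up a RESTful service on Alibaba Cloud (阿里云) inside China. If requests on DO raises a timeout error, call the Alibaba Cloud server remotely and use its query result.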
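
The fallback described in this issue could be sketched roughly as follows. This is only an illustration under stated assumptions: it targets Python 3 and uses the standard library's urllib instead of the requests library mentioned above, and RELAY_ENDPOINT, fetch_direct, fetch_via_relay and fetch_with_fallback are all hypothetical names that do not exist in the project.

```python
import socket
import urllib.error
import urllib.parse
import urllib.request

# Hypothetical address of the relay service; not part of the project.
RELAY_ENDPOINT = "http://relay.example.com/fetch"


def fetch_direct(url, timeout=10):
    """Fetch the page directly from the current server."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def fetch_via_relay(url, timeout=10):
    """Ask the (assumed) relay service to fetch the page for us."""
    query = urllib.parse.urlencode({"url": url})
    with urllib.request.urlopen(RELAY_ENDPOINT + "?" + query, timeout=timeout) as resp:
        return resp.read()


def is_timeout(exc):
    """True for a bare socket timeout or a URLError that wraps one."""
    if isinstance(exc, socket.timeout):
        return True
    return isinstance(exc, urllib.error.URLError) and isinstance(exc.reason, socket.timeout)


def fetch_with_fallback(url, direct=fetch_direct, relay=fetch_via_relay):
    """Try the direct fetch first; use the relay only after a timeout."""
    try:
        return direct(url)
    except (socket.timeout, urllib.error.URLError) as exc:
        if not is_timeout(exc):
            raise  # other errors (HTTP 404, DNS failure, ...) are not retried
        return relay(url)
```

Passing the two fetchers in as parameters keeps the fallback logic testable without network access.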

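Importing historical posts

Support user-uploaded XML, which of course must follow a certain format (for example, the history exported by WordPress?).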
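
As a rough sketch of what such an import might look like, the snippet below reads posts from an RSS-style export in which each item carries title, link, pubDate and a content:encoded body, as WordPress export files do. It assumes Python 3 and the standard library only; the function name parse_export and the returned dictionary keys are hypothetical and merely mirror the keys used by get_posts in the spider example.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Namespace of the RSS "content" module, which holds the full post body.
CONTENT_NS = "{http://purl.org/rss/1.0/modules/content/}"


def parse_export(xml_text):
    """Return a list of post dicts from an RSS-style export (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    posts = []
    for item in root.iter("item"):
        pub_date = item.findtext("pubDate")
        posts.append({
            "url": item.findtext("link"),
            "title": item.findtext("title"),
            # pubDate uses the RFC 822 style, e.g. "Mon, 06 Jan 2014 10:00:00 +0000"
            "published_at": parsedate_to_datetime(pub_date) if pub_date else None,
            "content": item.findtext(CONTENT_NS + "encoded", default=""),
        })
    return posts
```

A real importer would also need to validate the upload and skip non-post items such as pages or attachments; that is left out here.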

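Back-end blog management system

After claiming a blog, users can edit its information themselves.
They can also view a range of statistics.
If they run into problems, they can send a ticket to the administrators.

Suggestion: describe the configuration in readme.md

Could the author describe in readme.md how to configure the program locally? For example, after installing the packages listed in requirements, how should it be set up, and how is it run? Leaving a few notes for reference would help.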
