lichunhong2010 / douban-comments-spider

This project is forked from voyage-ing/douban-comments-spider

This is a crawler for Douban comments. It collects short reviews of movies, music, and books and outputs them as a word cloud.

Python 100.00%

douban-comments-spider's Introduction

GetID_Douban.py

Gets a Douban id from the film name, music name, or book name that you provide.

Douban_id():

Called from the main function. You need to create the object yourself and pass in the arguments.

def __init__(self, name, sort='movie'):

param name: the movie, music, or book title.

param sort: the category, one of movie (movie), book (book), or music (music).

def getID(self):

Must be called manually on the object. It looks up the item by the name and category the user provided, then returns the matching id.

Mainly uses xml and regular expressions.
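The lookup described above can be sketched roughly as follows. This is not the project's code: `DoubanIdSketch`, the injected `fetch` callable, and the assumed item URL shape (`https://<sort>.douban.com/subject/<digits>/`) are all illustrative assumptions, and only the regular-expression step is shown.

```python
import re
from typing import Callable, Optional

# Assumption: an item page URL looks like https://<sort>.douban.com/subject/<digits>/
SUBJECT_RE = re.compile(r"https?://(movie|book|music)\.douban\.com/subject/(\d+)")


class DoubanIdSketch:
    """Hypothetical stand-in for Douban_id; names and behavior are assumptions."""

    def __init__(self, name: str, sort: str = "movie",
                 fetch: Optional[Callable[[str, str], str]] = None):
        if sort not in ("movie", "book", "music"):
            raise ValueError("sort must be 'movie', 'book', or 'music'")
        self.name = name
        self.sort = sort
        # fetch(name, sort) should return the search-result page as text.
        # It is injected so the sketch makes no claim about the real request.
        self._fetch = fetch

    def getID(self) -> Optional[str]:
        """Return the first subject id whose category matches self.sort."""
        if self._fetch is None:
            raise RuntimeError("no fetch function was supplied")
        page = self._fetch(self.name, self.sort)
        for match in SUBJECT_RE.finditer(page):
            if match.group(1) == self.sort:
                return match.group(2)
        return None
```

Injecting `fetch` keeps the sketch usable offline: pass a function that returns saved page text, or one that performs the actual search request.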

getComments.py

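Joins the id obtained by Douban_id() with a suburl to form the complete short-review URL, fetches the data, and saves it locally. The return value is the path of the saved file.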
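A minimal sketch of the URL assembly and the save step, under stated assumptions: the helper names `build_comments_url` and `save_comments`, the `/comments` suburl, the `start` paging parameter, and the `.txt` file layout are illustrative and may differ from what getComments.py does. The network request itself is left out.

```python
from pathlib import Path
from typing import Iterable
from urllib.parse import urlencode


def build_comments_url(subject_id: str, sort: str = "movie", start: int = 0) -> str:
    """Join the id with an assumed '/comments' suburl and a paging offset."""
    query = urlencode({"start": start})
    return f"https://{sort}.douban.com/subject/{subject_id}/comments?{query}"


def save_comments(comments: Iterable[str], out_dir: str, name: str) -> str:
    """Write one comment per line and return the path of the saved file."""
    folder = Path(out_dir)
    folder.mkdir(parents=True, exist_ok=True)  # create the category folder if needed
    path = folder / f"{name}.txt"
    path.write_text("\n".join(comments), encoding="utf-8")
    return str(path)
```

Returning the path, as the description above says the script does, lets the keyword step open the file without knowing how the folder name was chosen.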

Keywords.py

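Cleans the comments saved in the file and generates a word cloud from the extracted keywords. It uses ChineseStopWords.txt in the same folder to strip all Chinese function words; you can compile this list yourself or download one from the web. simhei.ttf is the font used for the word cloud.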
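The cleaning step can be illustrated with a small stop-word filter. This is a sketch, not the script's implementation: `load_stopwords` and `keyword_frequencies` are hypothetical names, and the input is assumed to be already segmented into words (Chinese text needs a word segmenter first, which is not shown here).

```python
import re
from collections import Counter
from typing import Iterable, Set

# Matches any letter, including CJK characters, but not digits or punctuation.
HAS_LETTER = re.compile(r"[^\W\d_]")


def load_stopwords(lines: Iterable[str]) -> Set[str]:
    """Build a stop-word set from lines such as those in ChineseStopWords.txt."""
    return {line.strip() for line in lines if line.strip()}


def keyword_frequencies(tokens: Iterable[str], stopwords: Set[str]) -> Counter:
    """Count tokens that are neither stop words nor pure punctuation/digits."""
    counts: Counter = Counter()
    for token in tokens:
        token = token.strip()
        if not token or token in stopwords:
            continue
        if not HAS_LETTER.search(token):
            continue
        counts[token] += 1
    return counts
```

The resulting counts are the kind of input a word-cloud library expects. With the `wordcloud` package, for example, `WordCloud(font_path="simhei.ttf").generate_from_frequencies(counts)` would render them with a font that supports Chinese; whether Keywords.py uses that exact call is an assumption.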

comments_infor

Where the comment data and word clouds are stored.

screenshorts

Data storage

Category directories:

[screenshot]

Films, movies, books:

[screenshot]

Saved comment file:

[screenshot]

Word cloud display:

[screenshot]

Contributors

voyage-ing
