dingmyu / weibo_analysis

A Python crawler that automatically crawls the original microblogs and pictures of a specified user, analyzes the microblogs, and displays the results as HTML charts.

Python 27.57% HTML 72.43%

weibo_analysis's Introduction

weibo_analysis

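A Python crawler that automatically fetches a specified user's original Weibo posts and images, classifies and analyzes the posts, and presents the results as HTML charts.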

First, obtain the user_id you want to crawl and your own cookie, and fill both into weibo.py.

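Run weibo.py. It generates a document of the user's original posts, named after the user_id, and a file holding all image links. It then downloads every linked image into the weibo_image folder.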

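Next, run analysis.py and enter the user_id to analyze the posts you just crawled. (Remove the first two lines of the document first; they hold the Weibo account name and profile description.)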

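The analysis covers post categories, the most frequently used emoticons and their counts, the most frequently used words and their counts, and the person names that appear in the posts together with their counts.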
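
The emoticon count described above can be sketched roughly as follows. This is a minimal illustration and not the code in analysis.py: it assumes emoticons appear in the crawled text as bracketed tokens such as [哈哈], and the name `count_emoticons` and the sample posts are hypothetical. Word counts would additionally need a Chinese word segmenter, which is left out here.

```python
import re
from collections import Counter

# Assumption: emoticons show up in crawled text as bracketed names, e.g. [哈哈].
EMOTICON_PATTERN = re.compile(r"\[([^\[\]]+)\]")


def count_emoticons(posts):
    """Count how often each bracketed emoticon occurs across a list of posts."""
    counter = Counter()
    for post in posts:
        counter.update(EMOTICON_PATTERN.findall(post))
    return counter


# Hypothetical sample posts, for illustration only.
sample_posts = ["今天天气真好[哈哈]", "下雨了[泪][泪]", "又是晴天[哈哈]"]
emoticon_counts = count_emoticons(sample_posts)
print(emoticon_counts.most_common(2))
```

`Counter.most_common(n)` returns the n most frequent items with their counts, which is the shape the pie chart input needs.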

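Finally, open xita.html (HTML5) to render the results as pie charts, which makes them easy to read at a glance. (The page calls a Google API, so it may only display through a proxy where Google services are blocked.)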

weibo_analysis's Issues

Error: print

SyntaxError: Missing parentheses in call to 'print'
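
Python 3 reports this error when it meets the Python 2 print statement, which the script appears to use (a traceback further down shows `print u'...'`). There are two usual ways out: run the script with Python 2.7, or convert the print statements to function calls. The sketch below shows the second option; the message text and values are hypothetical and only illustrate the syntax.

```python
# Works on both Python 2.7 and Python 3: makes print a function on Python 2.
from __future__ import print_function

# Hypothetical values, for illustration only.
count = 11
path = "weibo/5269080015"

# Python 2 only (a SyntaxError on Python 3):
#     print u'saved %d posts to %s' % (count, path)
# Portable form:
message = u"saved %d posts to %s" % (count, path)
print(message)
```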

error sleep

What can I do if it keeps going like this?
1 error
1 sleep
2 error
2 sleep
3 error
3 sleep
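
The issue does not state the cause. Judging from the successful log in another issue, the number appears to be the page index, so every page is failing and the script sleeps and moves on. As a sketch under that assumption, a retry helper can cap the attempts and surface the last error instead of continuing silently. `fetch_with_retry` and `flaky_fetch` are hypothetical names, not functions from weibo.py.

```python
import time


def fetch_with_retry(fetch, max_attempts=3, delay_seconds=1.0):
    """Call fetch() until it succeeds or max_attempts is reached.

    Sleeps delay_seconds between failed attempts and re-raises the last error.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except Exception as error:  # broad on purpose: the real failure type is unknown
            last_error = error
            print("%d error: %r" % (attempt, error))
            if attempt < max_attempts:
                print("%d sleep" % attempt)
                time.sleep(delay_seconds)
    raise last_error


# Hypothetical fetch that fails twice, then succeeds.
calls = {"count": 0}


def flaky_fetch():
    calls["count"] += 1
    if calls["count"] < 3:
        raise IOError("request failed")
    return "page content"


result = fetch_with_retry(flaky_fetch, delay_seconds=0)
```

Printing the caught error is the useful part: it shows whether the failures come from an expired cookie, rate limiting, or a parsing problem.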

Index out of range at line 27

user_id and cookie loaded successfully
Traceback (most recent call last):
  File "weibo.py", line 27, in <module>
    pageNum = (int)(selector.xpath('//input[@name="mp"]')[0].attrib['value'])
IndexError: list index out of range
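
The traceback shows that `selector.xpath('//input[@name="mp"]')` returned an empty list, so indexing `[0]` failed. One plausible cause, offered as an assumption and not a diagnosis, is that the fetched page has no pager input: for example a profile with a single page of posts, or a login page served because the cookie is invalid. A minimal defensive sketch follows; `parse_page_count` and `FakeInput` are hypothetical names, and `FakeInput` only stands in for an lxml element.

```python
def parse_page_count(matches, default=1):
    """Return the page count from the elements matched by //input[@name="mp"].

    Falls back to `default` when nothing matched instead of raising IndexError.
    """
    if not matches:
        return default
    return int(matches[0].attrib["value"])


class FakeInput(object):
    """Hypothetical stand-in for an lxml element; only `attrib` is modelled."""

    def __init__(self, value):
        self.attrib = {"value": value}


pages_found = parse_page_count([FakeInput("12")])
pages_missing = parse_page_count([])
```

In weibo.py the call would look like `pageNum = parse_page_count(selector.xpath('//input[@name="mp"]'))`. Whether defaulting to one page is right depends on why the element is missing, so it is worth checking the returned HTML first.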

Error on Windows. Could the author please help find the cause? Thank you for your hard work!

Error output:
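user_id and cookie loaded successfully
ready
2
Pause 1 in progress, to avoid too many requests
Pause 2 in progress, to avoid too many requests
Pause 3 in progress, to avoid too many requests
Pause 4 in progress, to avoid too many requests
1 word ok
1 picurl ok
1 sleep
2 word ok
2 picurl ok
2 sleep
Pause 5 in progress, to avoid too many requests
Text microblogs crawled
Image links crawled
This user's original microblogs contain no images
Original microblogs crawled: 11 in total, saved to D:\TEMP\weibo/5269080015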
Traceback (most recent call last):
  File "weibo.py", line 128, in <module>
    print u'微博图片爬取完毕,共%d张,保存路径%s'%(image_count - 1,image_path)
NameError: name 'image_path' is not defined

Environment:
Windows 7 64-bit Enterprise
Python 2.7.13
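
The log line just above the traceback says the user's original posts contain no images, and the failing statement then reads `image_path`. A plausible explanation, assumed here because weibo.py is not shown, is that `image_path` is only assigned inside the branch that downloads images. One fix is to report the image summary only when there is something to report. This is a minimal sketch; `summarize_images` and its parameters are hypothetical.

```python
import os


def summarize_images(image_urls, base_dir):
    """Return a summary message; the image folder path is only built when needed."""
    if not image_urls:
        return "No images found in this user's original posts"
    # Assumption: images are stored in a weibo_image folder under base_dir.
    image_path = os.path.join(base_dir, "weibo_image")
    return "Image crawl finished: %d images, saved to %s" % (len(image_urls), image_path)


empty_summary = summarize_images([], "weibo")
full_summary = summarize_images(["a.jpg", "b.jpg"], "weibo")
```

An alternative with the same effect is to assign `image_path` before the branch, so the final print can always refer to it.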

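Images cannot be crawled

Has anyone else run into this: the user has original posts with images, but no Weibo images are downloaded after the run finishes?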
