spiderclub / weibospider
:zap: A distributed crawler for Weibo, built with Celery and Requests.
License: MIT License
hi
Windows 7.
I followed the README setup all the way, but at the "run login_first" step I cannot log in.
=====================================================
PyCharm output:
F:\Anaconda\python.exe E:/pyCharm_files/weibospider-master/first_task_execution/login_first.py
F:\Anaconda\lib\site-packages\pymysql\cursors.py:166: Warning: (1366, "Incorrect string value: '\xD6\xD0\xB9\xFA\xB1\xEA...' for column 'VARIABLE_VALUE' at row 481")
result = self._query(query)
F:\Anaconda\lib\site-packages\pymysql\cursors.py:166: Warning: (1287, "'@@tx_isolation' is deprecated and will be removed in a future release. Please use '@@transaction_isolation' instead")
result = self._query(query)
2018-01-19 12:25:52 - crawler - INFO - The login task is starting...
Celery output:
E:\pyCharm_files\weibospider-master>celery -A tasks.workers -Q login_queue,user_
crawler,fans_followers,search_crawler,home_crawler worker -l info -c 1
-------------- celery@admin-PC v4.1.0 (latentcall)
Windows-7-6.1.7601-SP1 2018-01-19 12:32:58
[tasks]
. tasks.comment.crawl_comment_by_page
. tasks.comment.crawl_comment_page
. tasks.comment.execute_comment_task
. tasks.dialogue.crawl_dialogue
. tasks.dialogue.crawl_dialogue_by_comment_id
. tasks.dialogue.crawl_dialogue_by_comment_page
. tasks.dialogue.execute_dialogue_task
. tasks.downloader.download_img_task
. tasks.home.crawl_ajax_page
. tasks.home.crawl_weibo_datas
. tasks.home.execute_home_task
. tasks.login.execute_login_task
. tasks.login.login_task
. tasks.repost.crawl_repost_by_page
. tasks.repost.crawl_repost_page
. tasks.repost.execute_repost_task
. tasks.search.execute_search_task
. tasks.search.search_keyword
. tasks.user.crawl_follower_fans
. tasks.user.crawl_person_infos
. tasks.user.execute_user_task
Sorry, I'm new to this and don't fully understand it yet.
When crawling comments, if a weibo has no comments the crawler keeps retrying, even though the server has already returned a valid (empty) response. This wastes crawler resources.
I don't know how to fix it properly 😂 — on my side I just commented that check out.
While testing the weibo crawler today I hit a problem: on some CentOS 6 machines (VMs), when starting a worker with
celery -A tasks.workers worker -l info -c 1
the worker exits immediately after startup without reporting any error, and running with -l debug doesn't reveal anything useful either. Downgrading Celery to 3.1.25 made it work again.
The same code runs fine on macOS and on my bare-metal CentOS box.
This one is strange; I haven't found a good fix yet.
This is an @-mention link:
<a target="_blank" render="ext" extra-data="type=atname" href="http://weibo.com/n/%E6%A0%BC%E6%A0%BC?from=feed&loc=at" usercard="name=格格">@格格</a>
This is a repost link:
<a target="_blank" render="ext" extra-data="type=atname" href="http://weibo.com/n/%E5%AD%94%E5%B0%8F%E6%B4%9E?from=feed&loc=at" usercard="name=孔小洞">@孔小洞</a>
Apart from the leading // on reposts there seems to be no other difference 🙁, and your code doesn't distinguish them either.
After logging in I tried to crawl http://s.weibo.com/weibo/关键字&Refer=STopic_box, but the returned page doesn't seem parseable with XPath or CSS selectors, and regex offers no obvious foothold either. Can you parse it? Any pointers would be appreciated.
Looking at the required environment, the setup alone might scare off a lot of users. Any chance of publishing a Docker image?
In the SQL file, the CREATE TABLE for login_info is missing the need_验证码 field. Please update the SQL file.
The full page was fetched, but the code decides it isn't complete and retries:
if not is_complete(page):
count += 1
continue
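One hedged way out of that loop, sketched below — `is_complete` here is a stand-in for the project's real check, and the "no comments yet" markers are assumed strings, not verified against Weibo's actual markup:

```python
MAX_RETRIES = 3
EMPTY_MARKERS = ("还没有人评论", "暂无数据")  # assumed "no comments yet" hints

def is_complete(page: str) -> bool:
    # stand-in for the project's completeness check
    return "</html>" in page

def should_retry(page: str, count: int) -> bool:
    if count >= MAX_RETRIES:
        return False                      # give up eventually
    if any(marker in page for marker in EMPTY_MARKERS):
        return False                      # valid page, just no comments
    return not is_complete(page)          # retry only truly broken fetches
```

With a check like this, an empty-but-valid comment page stops the loop instead of burning requests.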
While parsing user detail info I found an issue.
Test page:
http://weibo.com/p/1005055667096018/info?mod=pedit_more
Before filing an issue, please answer the questions below. Thanks!
1. What did you do?
Crawled home-page weibos where a post's text contains emoji:
python3 first_task_execution/home_first.py
2. What did you expect?
The row is inserted into the table correctly.
3. What actually happened?
DB operation error,here are details:(pymysql.err.InternalError) (1366, "Incorrect string value: '\\xF0\\x9F\\x98\\x8E\\xE5\\xBD...' for column 'weibo_cont' at row 1") [SQL: 'INSERT INTO weibo_data (weibo_id, weibo_cont, weibo_img, weibo_video, repost_num, comment_num, praise_num, uid, is_origin, device, weibo_url, create_time, comment_crawled, repost_crawled, dialogue_crawled) VALUES (%(weibo_id)s, %(weibo_cont)s, %(weibo_img)s, %(weibo_video)s, %(repost_num)s, %(comment_num)s, %(praise_num)s, %(uid)s, %(is_origin)s, %(device)s, %(weibo_url)s, %(create_time)s, %(comment_crawled)s, %(repost_crawled)s, %(dialogue_crawled)s)'] [parameters: {'weibo_cont': '#周二型不型# 这套礼服驳头部分我们采用全真丝制作,配上极致的黑色作为底色无论任何时间场合,你都能成为一匹黑马😎彰显王者的风采~\xa0\xa0#成都西服高级定制# 更多绅士款欢迎预约到店试穿 \u200b\u200b\u200b\u200b', 'weibo_img': 'https://wx3.sinaimg.cn/thumb150/005ODqergy1fmub4fcxcgj31fi24e1kx.jpg;https://wx4.sinaimg.cn/thumb150/005ODqergy1fmub4mkpprj321w3344qp.jpg;https://wx1 ... (322 characters truncated) ... w334x42.jpg;https://wx4.sinaimg.cn/thumb150/005ODqergy1fmub4kgv4vj321w334kjl.jpg;https://wx2.sinaimg.cn/thumb150/005ODqergy1fmub4xr7btj333421w1b7.jpg', 'repost_crawled': 0, 'device': '', 'weibo_id': '4189261395361540', 'weibo_video': '', 'weibo_url': 'https://weibo.com/5328876591/FBs5ZmuMs?from=page_1006065328876591_profile&wvr=6&mod=weibotime', 'create_time': '2017-12-26 17:40', 'praise_num': 3, 'is_origin': 1, 'comment_crawled': 0, 'repost_num': 0, 'comment_num': 6, 'dialogue_crawled': 0, 'uid': '5328876591'}]
commit 5fc365b
Ubuntu 16.04
Yes, I have read the project FAQ.
PS: for the Apple-emoji-in-post problem, see:
http://blog.csdn.net/tongsh6/article/details/52292336
https://www.cnblogs.com/h--d/p/5712490.html
http://blog.csdn.net/qiaqia609/article/details/51161943
Setting the character set of weibo_data.weibo_cont to utf8mb4 (collation utf8mb4_unicode_ci) fixes it. The comment_cont column of weibo_comment needs the same change:
ALTER TABLE weibo.weibo_data MODIFY COLUMN `weibo_cont` text CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci ;
It would be better if the CREATE TABLE statements handled this from the start.
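The root cause behind the "Incorrect string value" error above: MySQL's legacy utf8 charset stores at most 3 bytes per character, while emoji are 4-byte UTF-8 sequences. A quick check in Python:

```python
# the 😎 from the failing INSERT encodes to 4 bytes in UTF-8
emoji = "\N{SMILING FACE WITH SUNGLASSES}"   # 😎
print(len(emoji.encode("utf-8")))            # 4 — too wide for MySQL's utf8

# ordinary CJK text fits in 3 bytes, which is why plain posts insert fine
print(len("中".encode("utf-8")))             # 3
```

That is why switching the column (and connection) to utf8mb4 is the standard fix.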
Because the program temporarily stores failed URLs in Redis without any follow-up processing, more and more URLs pile up there. A large task backlog also consumes a lot of memory, so the program should consider expiring and deleting stale data.
@ResolveWang The intro mentions you've crawled data on 300k users — could you share it?
Hi, when doing public-opinion crawling I crawl [content, "expand full text", comments] by region with certain keywords.
The request interval is 5-8 s, 16 threads, on a single machine.
There is heavy paging; after roughly 500-600 records Weibo starts demanding a captcha.
Question:
Can you explain this? Thanks.
1. python login_first.py
2. python user_first.py
2018-01-02 14:09:53 - crawler - INFO - the crawling url is http://weibo.com/p/1005051195242865/info?mod=pedit_more
[2018-01-02 14:09:53,646: INFO/ForkPoolWorker-1] the crawling url is http://weibo.com/p/1005051195242865/info?mod=pedit_more
2018-01-02 14:09:53 - crawler - WARNING - no cookies in cookies pool, please find out the reason
[2018-01-02 14:09:53,650: WARNING/ForkPoolWorker-1] no cookies in cookies pool, please find out the reason
(WeiboSpider)root@jian-spider:/home/ubuntu/weibospider# 2018-01-02 14:09:54 - crawler - ERROR - failed to crawl http://weibo.com/p/1005051195242865/info?mod=pedit_more,here are details:(535, b'5.7.11 the behavior of this user triggered some restrictions to this account'), stack is File "/home/ubuntu/weibospider/decorators/decorator.py", line 14, in time_limit
return func(*args, **kargs)
[2018-01-02 14:09:54,293: ERROR/ForkPoolWorker-1] failed to crawl http://weibo.com/p/1005051195242865/info?mod=pedit_more,here are details:(535, b'5.7.11 the behavior of this user triggered some restrictions to this account'), stack is File "/home/ubuntu/weibospider/decorators/decorator.py", line 14, in time_limit
return func(*args, **kargs)
[2018-01-02 14:09:54,304: ERROR/ForkPoolWorker-1] list index out of range
[2018-01-02 14:09:54,304: ERROR/ForkPoolWorker-1] list index out of range
[2018-01-02 14:09:54,305: ERROR/ForkPoolWorker-1] list index out of range
[2018-01-02 14:09:54,324: INFO/MainProcess] Received task: tasks.user.crawl_follower_fans[49a1e5cb-240c-4b0d-a767-e1664574b74e]
2018-01-02 14:09:54 - crawler - INFO - the crawling url is http://weibo.com/p/1005051195242865/follow?relate=fans&page=1#Pl_Official_HisRelation__60
[2018-01-02 14:09:54,329: INFO/ForkPoolWorker-1] the crawling url is http://weibo.com/p/1005051195242865/follow?relate=fans&page=1#Pl_Official_HisRelation__60
2018-01-02 14:09:54 - crawler - WARNING - no cookies in cookies pool, please find out the reason
[2018-01-02 14:09:54,331: WARNING/ForkPoolWorker-1] no cookies in cookies pool, please find out the reason
2018-01-02 14:09:54 - crawler - ERROR - failed to crawl http://weibo.com/p/1005051195242865/follow?relate=fans&page=1#Pl_Official_HisRelation__60,here are details:(535, b'5.7.11 the behavior of this user triggered some restrictions to this account'), stack is File "/home/ubuntu/weibospider/decorators/decorator.py", line 14, in time_limit
return func(*args, **kargs)
[2018-01-02 14:09:54,958: ERROR/ForkPoolWorker-1] failed to crawl http://weibo.com/p/1005051195242865/follow?relate=fans&page=1#Pl_Official_HisRelation__60,here are details:(535, b'5.7.11 the behavior of this user triggered some restrictions to this account'), stack is File "/home/ubuntu/weibospider/decorators/decorator.py", line 14, in time_limit
return func(*args, **kargs)
Line 84 in 7b227e5
Does nobody have questions about this project? Is my documentation that good, or my code, that nobody opens issues? Many of you prefer to reach out over WeChat, but I'd encourage opening issues instead — that way answers are collected in one place where they can help others too.
Please add a keyword field, so weibos found via keyword search can be categorized by keyword. Thanks 😊
Before filing an issue, please answer the questions below. Thanks!
1. What did you do?
(Describe your steps clearly, ideally so the problem can be reproduced.)
2. What did you expect?
3. What actually happened?
4. Which version of WeiboSpider are you using? What is your OS?
Version 1.7.2 on deepin, in a virtualenv. The password is correct.
[2018-04-17 21:44:32,720: INFO/MainProcess] Received task: tasks.login.login_task[e8e67e00-f0f2-4131-82a4-bc313dac75de]
[2018-04-17 21:44:32,827: ERROR/ForkPoolWorker-1] Task tasks.login.login_task[e8e67e00-f0f2-4131-82a4-bc313dac75de] raised unexpected: KeyError('showpin',)
Traceback (most recent call last):
File "/homen_gu/Desktop/weibospider-master/.envb/python3.5/site-packages/celery/app/trace.py", line 374, in trace_task
R = retval = fun(*args, **kwargs)
File "/homen_gu/Desktop/weibospider-master/.envb/python3.5/site-packages/celery/app/trace.py", line 629, in protected_call
return self.run(*args, **kwargs)
File "/homen_gu/Desktop/weibospider-master/tasks/login.py", line 12, in login_task
get_session(name, password)
File "/homen_gu/Desktop/weibospider-master/login/login.py", line 228, in get_session
url, yundama_obj, cid, session = do_login(name, password, proxy)
File "/homen_gu/Desktop/weibospider-master/login/login.py", line 206, in do_login
if server_data['showpin']:
KeyError: 'showpin'
2017-10-25 07:00:32 - crawler - INFO - the crawling url is http://s.weibo.com/weibo/%E6%B7%B1%E5%9C%B3%20%E5%86%B0%E9%9B%B9&scope=ori&suball=1&page=1
2017-10-25 07:00:50 - crawler - INFO - keyword 深圳 冰雹 has been crawled in this turn
I cleared Redis before this run. Reading the code, tasks/search decides a keyword round is finished in two places: (1) whether the weibo's mid already exists in the database, and (2) whether there is a next page. Check (1) seems questionable to me, because searches for different keywords can return the same weibo — though I'm not sure that reasoning is right. I changed the return to continue on my side, but the crawler still declares the round finished after crawling surprisingly few pages.
OS: Ubuntu 16.04
WeiboSpider 1.7.2
I originally wanted to run everything from scratch myself, but progress is too slow. Could you share your recent data again — the user weibo table and the personal-info table — as raw material for machine learning and NLP? (ㄒoㄒ)
The storage backend, MySQL, is easier to make fault-tolerant than the message queue, Redis; and since Redis is coupled to Celery, the broker's single point of failure is more easily and reliably addressed with a scheme Celery supports officially.
This project uses Redis Sentinel, which Celery supports natively since 4.0, so users who want to run this project in production can set up high availability this way. I've written up the main steps in this article.
If you use Redis Sentinel, the standalone-Redis parameters in the config file — redis->host and redis->port — can be left untouched; change redis->password to the password for connecting to Sentinel, and put the Sentinel cluster's parameters into the redis-sentinel section.
If you don't need high availability, simply set sentinel: '' in the redis section.
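For reference, a minimal sketch of what the redis sections described above might look like in the project's YAML config — the exact key names and layout are assumptions, so check the real config file in the repo for the actual schema:

```yaml
redis:
  host: 127.0.0.1            # left unchanged when Sentinel is used
  port: 6379                 # left unchanged when Sentinel is used
  password: sentinel_pass    # with HA on: the password for connecting to Sentinel
  sentinel: ''               # set to '' to disable high availability

redis-sentinel:              # assumed layout for the Sentinel cluster section
  - host: 192.168.1.10
    port: 26379
  - host: 192.168.1.11
    port: 26379
```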
Every time I run tasks, the following is printed to the console:
'@@tx_isolation' is deprecated and will be removed in a future release. Please use '@@transaction_isolation' instead
The old SQLAlchemy version (1.1.9) pinned in requirement.txt seems to be the culprit.
You mention in the docs that running two beats causes duplicated tasks.
But even with one beat, re-enqueueing tasks while the previous batch is still unfinished also causes duplication. With many tasks on an underpowered machine this is bound to happen.
Suggestion: move the line wb_data.set_weibo_comment_crawled(weibo_data.weibo_id) into the execute_comment_task method (I've opened a PR).
#11
PS: the code never checks whether a weibo's reposts or comments have been fully crawled.
I'm not very familiar with this project — saw it today and gave it a try. On Windows I started a single worker, and Celery seems to have a problem. I'm new to Celery; any pointers appreciated.
[2017-09-08 21:02:48,769: ERROR/MainProcess] Task handler raised error: ValueError('not enough values to unpack (expected 3, got 0)',)
Traceback (most recent call last):
  File "d:\anaconda3\lib\site-packages\billiard\pool.py", line 358, in workloop
    result = (True, prepare_result(fun(*args, **kwargs)))
  File "d:\anaconda3\lib\site-packages\celery\app\trace.py", line 525, in _fast_trace_task
    tasks, accept, hostname = _loc
ValueError: not enough values to unpack (expected 3, got 0)
Hello — when browsing PC weibo I noticed each page loads more posts dynamically. The request header shows a URL like this: http://weibo.com/p/aj/v6/mblog/mbloglist?ajwvr=6&domain=100505&from=myfollow_all&is_all=1&pagebar=0&pl_name=Pl_Official_MyProfileFeed__22&id=1005051671103241&script_uri=/amandababe&feed_type=0&page=1&pre_page=1&domain_op=100505&__rnd=1499221602230 — how is the trailing __rnd constructed? Or how did you solve the dynamic-loading problem?
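For what it's worth, the __rnd value in that sample URL (1499221602230) is 13 digits and decodes to early July 2017, which strongly suggests a millisecond Unix timestamp used as a cache-buster. A sketch under that assumption (not confirmed against Weibo's JavaScript):

```python
import time

def make_rnd() -> str:
    # assumption: __rnd is just the current time in milliseconds,
    # e.g. 1499221602230 -> 2017-07-05, matching the sample URL's era
    return str(int(time.time() * 1000))

# assumed usage: append it to the mbloglist ajax query parameters
params = {"ajwvr": 6, "pagebar": 0, "page": 1, "__rnd": make_rnd()}
```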
For example, user_first.py relies on tasks that invoke other tasks: execute_user_task calls the task crawl_person_infos, which in turn calls the task crawl_follower_fans. The question is:
git clone https://github.com/SpiderClub/weibospider.git
python first_task_execution/login_first.py   # send the login task
python first_task_execution/user_first.py    # crawl users' personal info
python first_task_execution/home_first.py    # crawl users' home-page weibos
python first_task_execution/repost_first.py  # crawl reposts
The table does exist in my database, but this error appears when crawling text data, while user data is being crawled fine. Could you advise what's wrong?
[2017-10-19 11:39:09,537: ERROR/ForkPoolWorker-3] db operation error,here are details(pymysql.err.ProgrammingError) (1146, "Table 'weibo.weibo_data' doesn't exist") [SQL: 'SELECT weibo_data.id AS weibo_data_id, weibo_data.weibo_id AS weibo_data_weibo_id, weibo_data.weibo_cont AS weibo_data_weibo_cont, weibo_data.weibo_img AS weibo_data_weibo_img, weibo_data.weibo_video AS weibo_data_weibo_video, weibo_data.repost_num AS weibo_data_repost_num, weibo_data.comment_num AS weibo_data_comment_num, weibo_data.praise_num AS weibo_data_praise_num, weibo_data.uid AS weibo_data_uid, weibo_data.is_origin AS weibo_data_is_origin, weibo_data.device AS weibo_data_device, weibo_data.weibo_url AS weibo_data_weibo_url, weibo_data.create_time AS weibo_data_create_time, weibo_data.comment_crawled AS weibo_data_comment_crawled, weibo_data.repost_crawled AS weibo_data_repost_crawled \nFROM weibo_data \nWHERE weibo_data.weibo_id = %(weibo_id_1)s \n LIMIT %(param_1)s'] [parameters: {'param_1': 1, 'weibo_id_1': '4142197962445365'}]
tasks/home.py
# only crawls origin weibo
HOME_URL = 'http://weibo.com/u/{}?is_ori=1&is_tag=0&profile_ftype=1&page={}'
what will happen when I replace the url with not origin?
Before filing an issue, please answer the questions below. Thanks!
1. What did you do?
nohup celery -A tasks.workers -Q login_queue,user_crawler,fans_followers,search_crawler,home_crawler worker -l info -c 1 &
nohup python login_first.py &
nohup celery beat -A tasks.workers -l info &
nohup python search_first.py &
2. What did you expect?
I want the program to keep running and crawl every keyword once a day.
3. What actually happened?
After crawling for a day I had about 18k records, and then an account's flag flipped from 1 to 0 in the database. I later logged into Weibo and found the account still worked, so I updated the flag back from 0 to 1 and restarted the tasks. But ps aux|grep celery no longer showed the original processes:
root 1281 0.3 2.9 180644 59444 pts/8 S 13:02 0:01 /usr/bin/python3 /usr/local/bin/celery -A tasks.workers -Q login_queue,user_crawler,fans_followers,search_crawler,home_crawler worker -l info -c 1
root 1286 4.7 3.6 275440 73800 pts/8 S 13:02 0:16 /usr/bin/python3 /usr/local/bin/celery -A tasks.workers -Q login_queue,user_crawler,fans_followers,search_crawler,home_crawler worker -l info -c 1
root 1311 0.3 2.8 188344 57460 pts/8 S 13:04 0:00 /usr/bin/python3 /usr/local/bin/celery beat -A tasks.workers -l info
but only:
root 1311 0.3 2.8 188344 57460 pts/8 S 13:04 0:00 /usr/bin/python3 /usr/local/bin/celery beat -A tasks.workers -l info
The database then paused after a few dozen more rows were inserted.
4. Which version of WeiboSpider are you using? What is your OS? Did you read the project FAQ?
I'm using the latest release; the OS is Ubuntu 14.04. I've read the docs several times but still can't figure out how to fix this.
consumer: Cannot connect to redis://**@127.0.0.1:6397/6: MISCONF Redis is configured to save RDB snapshots, but is currently not able to persist on disk. Commands that may modify the data set are disabled. Please check Redis logs for details about the RDB error.
As the title says.
While borrowing your code I found a problem
in page_parse/home.py:
def get_weibo_list(html):
    ...
    weibo_datas.append(wb_data)  # line 140
    ...
wb_data is a dict, and append stores a reference to it; if the loop keeps mutating the same wb_data, the whole list ends up holding the last weibo's content repeated.
I haven't actually used this library, so I don't know whether the bug manifests in your setup.
My fix:
weibo_datas.append(dict(wb_data))
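A minimal, self-contained demonstration of the aliasing described above (variable names mirror the report; the data is made up):

```python
wb_data = {}
weibo_datas = []
for i in range(3):
    wb_data["weibo_id"] = i          # same dict object mutated each iteration
    weibo_datas.append(wb_data)      # appends a reference, not a copy

# every element is the same object, so all show the last value
print([d["weibo_id"] for d in weibo_datas])   # [2, 2, 2]

# appending a shallow copy preserves each iteration's snapshot
wb_data, fixed = {}, []
for i in range(3):
    wb_data["weibo_id"] = i
    fixed.append(dict(wb_data))
print([d["weibo_id"] for d in fixed])         # [0, 1, 2]
```

Whether the project hits this depends on whether it reuses the same dict across loop iterations; if it builds a fresh dict per weibo, the bug does not occur.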
If an account in the pool is flagged as high-risk, the crawler should switch to different cookies instead of abandoning the task.
Weibo's comment pages have a "view conversation" button that shows the dialogue inside the comments. It's a very useful source of conversation data.
The project is configured to log in once every 20 hours, but because all tasks compete for resources, the scheduled login task may never get to run; the system then wrongly concludes there are no usable cookies and shuts the project down.
The ideal fix is task priorities: when tasks pile up, let login tasks run first. In practice, with the celery + redis combination, Celery doesn't officially provide priority tasks, so another approach is needed. I can think of two; better ideas are welcome.
1. Run login tasks on one or more dedicated nodes, so no other tasks compete with them.
2. On a node that should both log in and crawl, start two workers: one consuming only the login task, the other consuming the crawl tasks.
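Option 2 can be sketched by splitting the project's own startup command into two worker processes — the queue names come from the commands shown elsewhere in these issues; the -n worker names are illustrative:

```shell
# worker 1: consumes only the login queue, so logins never wait behind crawl tasks
celery -A tasks.workers -Q login_queue worker -l info -c 1 -n login_worker@%h &

# worker 2: consumes the crawl queues
celery -A tasks.workers -Q user_crawler,fans_followers,search_crawler,home_crawler \
  worker -l info -c 1 -n crawl_worker@%h &
```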
@ResolveWang Hi — I've recently been studying the design of your distributed-crawler project, hoping to build a crawler of my own along the same lines. I've read the wiki docs but still have a question about ordering dependencies between requests: say URL_A must be requested before URL_B, and values parsed from URL_A's response are needed to build URL_B. Do you have any suggestions? Much appreciated.
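One common pattern for this kind of dependency (not necessarily what this project does) is to have the first task return the parsed values and feed them into the second. In Celery this is the chain primitive — chain(crawl_a.s(url_a), crawl_b.s()) runs crawl_b with crawl_a's return value as its first argument. A plain-Python sketch of the same dataflow, with illustrative function names:

```python
def crawl_a(url_a: str) -> dict:
    # stand-in for fetching URL_A and parsing out the fields URL_B needs
    return {"uid": "12345", "page_id": "1005051"}

def crawl_b(parsed: dict) -> str:
    # URL_B is built from what crawl_a extracted, so it must run second
    return "http://weibo.com/p/{page_id}/info?uid={uid}".format(**parsed)

# the sequential dependency expressed as ordinary composition;
# Celery's chain(crawl_a.s(url_a), crawl_b.s()) expresses the same ordering
result = crawl_b(crawl_a("http://weibo.com/u/12345"))
print(result)   # http://weibo.com/p/1005051/info?uid=12345
```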
Is each worker automatically assigned a name? Every time a worker starts, the other workers report syncing with it, and that name is the hostname.
How do I get the current worker's own name inside my code?
I want to use it to vary the cookie-acquisition rule per worker, because under the current rule, when one account gets banned they all get banned together.
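By default Celery names a worker celery@&lt;hostname&gt; (overridable with the -n option). Two ways to get a per-worker identity in code: inside a bound task, self.request.hostname gives the worker's name; outside Celery, the machine hostname works as a stand-in:

```python
import socket

# the machine part of the default worker name celery@<hostname>
worker_host = socket.gethostname()
print(worker_host)

# inside a Celery task you could instead read the worker name itself:
# @app.task(bind=True)
# def my_task(self):
#     worker_name = self.request.hostname   # e.g. 'celery@admin-PC'
```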
The SQL schema referenced in the README is out of date.
"Before running the project, you need to create the tables in the database (see the SQL file), store your weibo accounts in the login table (weibo.login_info), and put your search keywords in the keywords table. I have pre-inserted some seed users for user crawling; to crawl other users, insert rows into the seed-users table (seed_ids)."
https://github.com/ResolveWang/weibospider/blob/master/config/sql/spider.sql
Version: v1.7.2
Steps:
1. source env.sh
2. python admin/manage.py runserver 0.0.0.0:8000
3. celery -A tasks.workers -Q login_queue,user_crawler,fans_followers,search_crawler,home_crawler,ajax_home_crawler,comment_crawler,comment_page_crawler,repost_crawler,repost_page_crawler worker -l info -c 1 &
4. python first_task_execution/login_first.py
The database and YunDaMa (the captcha-recognition service) are configured.
YunDaMa already shows one recognition record.
But running the script fails as follows:
[2017-12-31 16:51:02,851: ERROR/ForkPoolWorker-1] Task tasks.login.login_task[395b4991-9a63-4038-8dfc-3aabecf0cc8c] raised unexpected: MissingSchema("Invalid URL 'login_need_pincode': No schema supplied. Perhaps you meant http://login_need_pincode?",)
Traceback (most recent call last):
File "/home/ubuntu/weibospider/.env/lib/python3.5/site-packages/celery/app/trace.py", line 374, in trace_task
R = retval = fun(*args, **kwargs)
File "/home/ubuntu/weibospider/.env/lib/python3.5/site-packages/celery/app/trace.py", line 629, in protected_call
return self.run(*args, **kwargs)
File "/home/ubuntu/weibospider/tasks/login.py", line 12, in login_task
get_session(name, password)
File "/home/ubuntu/weibospider/login/login.py", line 229, in get_session
rs_cont = session.get(url, headers=headers)
File "/home/ubuntu/weibospider/.env/lib/python3.5/site-packages/requests/sessions.py", line 501, in get
return self.request('GET', url, **kwargs)
File "/home/ubuntu/weibospider/.env/lib/python3.5/site-packages/requests/sessions.py", line 474, in request
prep = self.prepare_request(req)
File "/home/ubuntu/weibospider/.env/lib/python3.5/site-packages/requests/sessions.py", line 407, in prepare_request
hooks=merge_hooks(request.hooks, self.hooks),
File "/home/ubuntu/weibospider/.env/lib/python3.5/site-packages/requests/models.py", line 302, in prepare
self.prepare_url(url, params)
File "/home/ubuntu/weibospider/.env/lib/python3.5/site-packages/requests/models.py", line 382, in prepare_url
raise MissingSchema(error)
requests.exceptions.MissingSchema: Invalid URL 'login_need_pincode': No schema supplied. Perhaps you meant http://login_need_pincode?
why?
The demo video link is dead — could you update it? I'm a beginner and would like to watch the demo.