xianhu / pspider
A simple, easy-to-use Python crawler framework. QQ discussion group: 597510560
Home Page: https://github.com/xianhu/PSpider
License: BSD 2-Clause "Simplified" License
def set_start_url(self, url, keys=None, priority=0, deep=0)
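url is the target to fetch.
priority is used by the priority queue and decides the order in which tasks are taken from the queue.
So what are keys and deep for?
Is deep the depth of the url being fetched? How does it work together with
self._max_deep = max_deep # default: 0, if -1, spider will not stop until all urls are fetched
and how is the deep logic implemented?
keys
In the Douban crawling example, keys[0] is used to tell whether the page being fetched is an index page or a movie detail page.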
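To make the question concrete, here is a minimal sketch of how a depth limit of this kind usually behaves. It is not PSpider's code: the crawl function and the SITE link map are hypothetical, and the only thing taken from the thread is the condition (max_deep < 0) or (deep < max_deep) together with the idea that keys is opaque user data carried along with each task.

```python
from collections import deque


def crawl(start_url, links, max_deep=0, keys=None):
    """Breadth-first walk over `links` (a dict mapping url -> list of child urls).

    `keys` is passed through unchanged with every task;
    `deep` counts how many hops a url is from the start url.
    """
    queue = deque([(start_url, keys, 0)])  # each task is (url, keys, deep)
    seen = {start_url}
    visited = []
    while queue:
        url, task_keys, deep = queue.popleft()
        visited.append((url, task_keys, deep))
        # Follow links only while the depth limit allows it;
        # a negative max_deep means "no limit".
        if max_deep < 0 or deep < max_deep:
            for child in links.get(url, []):
                if child not in seen:
                    seen.add(child)
                    queue.append((child, task_keys, deep + 1))
    return visited


# Hypothetical site used only for illustration.
SITE = {"/": ["/a", "/b"], "/a": ["/a/1"], "/b": [], "/a/1": []}
```

Under these assumptions, max_deep=0 visits only the start url, max_deep=1 also visits the pages it links to, and max_deep=-1 keeps going until no new urls are found.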
It would be good to add a license.
Also, keys={"type": "360"}
does not seem to have any effect ( web_spider.set_start_url("http://zhushou.360.cn/", priority=0, keys={"type": "360"}, deep=0)
)
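Lately my crawlers keep getting their IP banned, so I scraped a lot of proxy IPs. But proxy availability is a problem: whenever a proxy times out, expires, or gets banned, I have to switch to another one, and that part is tedious to write. I don't see a module for this in PSpider. Is it worked around some other way, or is it simply not needed yet? And why do I run into this problem every time?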
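One common way to handle this is a small proxy pool that counts failures and drops a proxy after too many of them. The sketch below is only an illustration of that idea: ProxyPool and its methods are hypothetical names, not part of PSpider, and the failure threshold is an arbitrary assumption.

```python
import random


class ProxyPool:
    """Keeps a set of proxies and evicts one after too many failures (hypothetical helper)."""

    def __init__(self, proxies, max_failures=3):
        self._failures = {proxy: 0 for proxy in proxies}
        self._max_failures = max_failures

    def get(self):
        """Return a random live proxy, or None if the pool is empty."""
        if not self._failures:
            return None
        return random.choice(list(self._failures))

    def report_failure(self, proxy):
        """Record a timeout or ban; evict the proxy once the limit is reached."""
        if proxy not in self._failures:
            return
        self._failures[proxy] += 1
        if self._failures[proxy] >= self._max_failures:
            del self._failures[proxy]

    def report_success(self, proxy):
        """Reset the failure count after a successful request."""
        if proxy in self._failures:
            self._failures[proxy] = 0

    def __len__(self):
        return len(self._failures)
```

A fetcher would call get() before each request, then report_success() or report_failure() depending on the outcome, and refill the pool when it runs low.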
Hello, I can't find this on my side. About the User-Agent: could it be defined as a static value, and could it also be made dynamic, picked at random?
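Both options are easy to support with one helper. This is a minimal sketch, not PSpider's API: make_headers and USER_AGENTS are hypothetical names, and the User-Agent strings are just sample values.

```python
import random

# Sample User-Agent strings for illustration; replace with the ones you need.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/604.1.38 (KHTML, like Gecko) Version/11.0 Safari/604.1.38",
    "Mozilla/5.0 (X11; Linux x86_64; rv:56.0) Gecko/20100101 Firefox/56.0",
]


def make_headers(random_agent=True):
    """Build request headers with either a random or a fixed User-Agent."""
    agent = random.choice(USER_AGENTS) if random_agent else USER_AGENTS[0]
    return {"User-Agent": agent}
```

The returned dict can be passed as the headers argument of whatever HTTP client the fetcher uses.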
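I don't quite understand this part of WeiBoLogin in weibo_user.py: self.cookie_jar, self.opener = None, None self.yundama = spider.YunDaMa("", "")
Calling it also fails with 'AttributeError: module 'spider' has no attribute 'YunDaMa''. I looked through spider and could not find anything related to YunDaMa, so I am not sure whether this is some intended usage or whether something is missing and it is a bug.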
2017-11-05 20:12:01,076 WARNING ThreadPool start: urls_count=1, fetcher_num=10, is_over=True
2017-11-05 20:12:06,091 WARNING ThreadPool status: running_tasks=2; fetch:[NOT=0, SUCC=8, FAIL=0, 8/(5s)]; parse:[NOT=0, SUCC=8, FAIL=0, 8/(5s)]; save:[NOT=211, SUCC=6, FAIL=0, 6/(5s)]; total_seconds=5
2017-11-05 20:12:11,107 WARNING ThreadPool status: running_tasks=2; fetch:[NOT=0, SUCC=15, FAIL=0, 7/(5s)]; parse:[NOT=0, SUCC=15, FAIL=0, 7/(5s)]; save:[NOT=407, SUCC=16, FAIL=0, 10/(5s)]; total_seconds=10
2017-11-05 20:12:16,122 WARNING ThreadPool status: running_tasks=2; fetch:[NOT=0, SUCC=19, FAIL=0, 4/(5s)]; parse:[NOT=0, SUCC=19, FAIL=0, 4/(5s)]; save:[NOT=531, SUCC=28, FAIL=0, 12/(5s)]; total_seconds=15
2017-11-05 20:12:21,138 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 2/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 2/(5s)]; save:[NOT=591, SUCC=31, FAIL=0, 3/(5s)]; total_seconds=20
2017-11-05 20:12:26,151 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=579, SUCC=43, FAIL=0, 12/(5s)]; total_seconds=25
2017-11-05 20:12:31,161 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=568, SUCC=54, FAIL=0, 11/(5s)]; total_seconds=30
2017-11-05 20:12:36,176 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=566, SUCC=56, FAIL=0, 2/(5s)]; total_seconds=35
2017-11-05 20:12:41,190 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=561, SUCC=61, FAIL=0, 5/(5s)]; total_seconds=40
2017-11-05 20:12:46,202 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=560, SUCC=62, FAIL=0, 1/(5s)]; total_seconds=45
2017-11-05 20:12:51,218 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=558, SUCC=64, FAIL=0, 2/(5s)]; total_seconds=50
2017-11-05 20:12:56,227 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=554, SUCC=68, FAIL=0, 4/(5s)]; total_seconds=55
2017-11-05 20:13:01,230 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=548, SUCC=74, FAIL=0, 6/(5s)]; total_seconds=60
2017-11-05 20:13:06,245 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=522, SUCC=100, FAIL=0, 26/(5s)]; total_seconds=65
2017-11-05 20:13:11,261 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=514, SUCC=108, FAIL=0, 8/(5s)]; total_seconds=70
2017-11-05 20:13:16,270 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=495, SUCC=127, FAIL=0, 19/(5s)]; total_seconds=75
2017-11-05 20:13:21,277 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=489, SUCC=133, FAIL=0, 6/(5s)]; total_seconds=80
2017-11-05 20:13:26,282 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=477, SUCC=145, FAIL=0, 12/(5s)]; total_seconds=85
2017-11-05 20:13:31,298 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=460, SUCC=162, FAIL=0, 17/(5s)]; total_seconds=90
2017-11-05 20:13:36,300 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=453, SUCC=169, FAIL=0, 7/(5s)]; total_seconds=95
2017-11-05 20:13:41,315 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=447, SUCC=175, FAIL=0, 6/(5s)]; total_seconds=100
2017-11-05 20:13:46,326 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=441, SUCC=181, FAIL=0, 6/(5s)]; total_seconds=105
2017-11-05 20:13:51,342 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=436, SUCC=186, FAIL=0, 5/(5s)]; total_seconds=110
2017-11-05 20:13:56,357 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=423, SUCC=199, FAIL=0, 13/(5s)]; total_seconds=115
2017-11-05 20:14:01,362 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=412, SUCC=210, FAIL=0, 11/(5s)]; total_seconds=120
2017-11-05 20:14:06,378 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=401, SUCC=221, FAIL=0, 11/(5s)]; total_seconds=125
2017-11-05 20:14:11,394 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=388, SUCC=234, FAIL=0, 13/(5s)]; total_seconds=130
2017-11-05 20:14:16,402 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=383, SUCC=239, FAIL=0, 5/(5s)]; total_seconds=135
2017-11-05 20:14:21,405 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=379, SUCC=243, FAIL=0, 4/(5s)]; total_seconds=140
2017-11-05 20:14:26,421 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=361, SUCC=261, FAIL=0, 18/(5s)]; total_seconds=145
2017-11-05 20:14:31,430 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=352, SUCC=270, FAIL=0, 9/(5s)]; total_seconds=150
2017-11-05 20:14:36,443 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=351, SUCC=271, FAIL=0, 1/(5s)]; total_seconds=155
2017-11-05 20:14:41,459 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=350, SUCC=272, FAIL=0, 1/(5s)]; total_seconds=160
2017-11-05 20:14:46,461 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=343, SUCC=279, FAIL=0, 7/(5s)]; total_seconds=165
2017-11-05 20:14:51,467 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=335, SUCC=287, FAIL=0, 8/(5s)]; total_seconds=170
2017-11-05 20:14:56,483 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=334, SUCC=288, FAIL=0, 1/(5s)]; total_seconds=175
2017-11-05 20:15:01,484 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=331, SUCC=291, FAIL=0, 3/(5s)]; total_seconds=180
2017-11-05 20:15:06,490 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=331, SUCC=291, FAIL=0, 0/(5s)]; total_seconds=185
2017-11-05 20:15:11,506 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=330, SUCC=292, FAIL=0, 1/(5s)]; total_seconds=190
2017-11-05 20:15:16,519 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=313, SUCC=309, FAIL=0, 17/(5s)]; total_seconds=195
2017-11-05 20:15:21,530 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=309, SUCC=313, FAIL=0, 4/(5s)]; total_seconds=200
2017-11-05 20:15:26,545 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=298, SUCC=324, FAIL=0, 11/(5s)]; total_seconds=205
2017-11-05 20:15:31,550 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=286, SUCC=336, FAIL=0, 12/(5s)]; total_seconds=210
2017-11-05 20:15:36,566 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=277, SUCC=345, FAIL=0, 9/(5s)]; total_seconds=215
2017-11-05 20:15:41,582 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=271, SUCC=351, FAIL=0, 6/(5s)]; total_seconds=220
2017-11-05 20:15:46,592 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=258, SUCC=364, FAIL=0, 13/(5s)]; total_seconds=225
2017-11-05 20:15:51,593 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=257, SUCC=365, FAIL=0, 1/(5s)]; total_seconds=230
2017-11-05 20:15:56,609 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=249, SUCC=373, FAIL=0, 8/(5s)]; total_seconds=235
2017-11-05 20:16:01,624 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=241, SUCC=381, FAIL=0, 8/(5s)]; total_seconds=240
2017-11-05 20:16:06,625 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=237, SUCC=385, FAIL=0, 4/(5s)]; total_seconds=245
2017-11-05 20:16:11,641 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=235, SUCC=387, FAIL=0, 2/(5s)]; total_seconds=250
2017-11-05 20:16:16,656 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=232, SUCC=390, FAIL=0, 3/(5s)]; total_seconds=255
2017-11-05 20:16:21,672 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=232, SUCC=390, FAIL=0, 0/(5s)]; total_seconds=260
2017-11-05 20:16:26,688 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=226, SUCC=396, FAIL=0, 6/(5s)]; total_seconds=265
2017-11-05 20:16:31,703 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=222, SUCC=400, FAIL=0, 4/(5s)]; total_seconds=270
2017-11-05 20:16:36,716 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=219, SUCC=403, FAIL=0, 3/(5s)]; total_seconds=275
2017-11-05 20:16:41,728 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=214, SUCC=408, FAIL=0, 5/(5s)]; total_seconds=280
2017-11-05 20:16:46,744 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=211, SUCC=411, FAIL=0, 3/(5s)]; total_seconds=285
2017-11-05 20:16:51,750 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=207, SUCC=415, FAIL=0, 4/(5s)]; total_seconds=290
2017-11-05 20:16:56,765 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=193, SUCC=429, FAIL=0, 14/(5s)]; total_seconds=295
2017-11-05 20:17:01,781 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=191, SUCC=431, FAIL=0, 2/(5s)]; total_seconds=300
2017-11-05 20:17:06,796 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=183, SUCC=439, FAIL=0, 8/(5s)]; total_seconds=305
2017-11-05 20:17:11,797 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=172, SUCC=450, FAIL=0, 11/(5s)]; total_seconds=310
2017-11-05 20:17:16,813 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=158, SUCC=464, FAIL=0, 14/(5s)]; total_seconds=315
2017-11-05 20:17:21,828 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=155, SUCC=467, FAIL=0, 3/(5s)]; total_seconds=320
2017-11-05 20:17:26,843 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=145, SUCC=477, FAIL=0, 10/(5s)]; total_seconds=325
2017-11-05 20:17:31,859 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=140, SUCC=482, FAIL=0, 5/(5s)]; total_seconds=330
2017-11-05 20:17:36,864 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=139, SUCC=483, FAIL=0, 1/(5s)]; total_seconds=335
2017-11-05 20:17:41,880 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=135, SUCC=487, FAIL=0, 4/(5s)]; total_seconds=340
2017-11-05 20:17:46,895 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=118, SUCC=504, FAIL=0, 17/(5s)]; total_seconds=345
2017-11-05 20:17:51,902 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=106, SUCC=516, FAIL=0, 12/(5s)]; total_seconds=350
2017-11-05 20:17:56,917 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=90, SUCC=532, FAIL=0, 16/(5s)]; total_seconds=355
2017-11-05 20:18:01,933 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=86, SUCC=536, FAIL=0, 4/(5s)]; total_seconds=360
2017-11-05 20:18:06,949 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=84, SUCC=538, FAIL=0, 2/(5s)]; total_seconds=365
2017-11-05 20:18:11,960 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=83, SUCC=539, FAIL=0, 1/(5s)]; total_seconds=370
2017-11-05 20:18:16,975 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=74, SUCC=548, FAIL=0, 9/(5s)]; total_seconds=375
2017-11-05 20:18:21,983 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=59, SUCC=563, FAIL=0, 15/(5s)]; total_seconds=380
2017-11-05 20:18:26,999 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=52, SUCC=570, FAIL=0, 7/(5s)]; total_seconds=385
2017-11-05 20:18:32,014 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=43, SUCC=579, FAIL=0, 9/(5s)]; total_seconds=390
2017-11-05 20:18:37,030 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=34, SUCC=588, FAIL=0, 9/(5s)]; total_seconds=395
2017-11-05 20:18:42,041 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=28, SUCC=594, FAIL=0, 6/(5s)]; total_seconds=400
2017-11-05 20:18:47,053 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=27, SUCC=595, FAIL=0, 1/(5s)]; total_seconds=405
2017-11-05 20:18:52,063 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=27, SUCC=595, FAIL=0, 0/(5s)]; total_seconds=410
2017-11-05 20:18:57,079 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=22, SUCC=600, FAIL=0, 5/(5s)]; total_seconds=416
2017-11-05 20:19:02,095 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=16, SUCC=606, FAIL=0, 6/(5s)]; total_seconds=421
2017-11-05 20:19:07,110 WARNING ThreadPool status: running_tasks=0; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=0, SUCC=623, FAIL=0, 17/(5s)]; total_seconds=426
2017-11-05 20:19:12,115 WARNING ThreadPool status: running_tasks=0; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=0, SUCC=623, FAIL=0, 0/(5s)]; total_seconds=431
2017-11-05 20:19:12,115 WARNING ThreadPool end: fetcher_num=10, is_over=True, fetch:[SUCC=21, FAIL=0]; parse[SUCC=21, FAIL=0]; save:[SUCC=623, FAIL=0]
2017-11-05 20:19:12,115 WARNING ThreadPool start: urls_count=1, fetcher_num=10, is_over=True
2017-11-05 20:19:17,129 WARNING ThreadPool status: running_tasks=2; fetch:[NOT=0, SUCC=6, FAIL=0, 6/(5s)]; parse:[NOT=0, SUCC=6, FAIL=0, 6/(5s)]; save:[NOT=147, SUCC=7, FAIL=0, 7/(5s)]; total_seconds=5
2017-11-05 20:19:22,131 WARNING ThreadPool status: running_tasks=2; fetch:[NOT=0, SUCC=11, FAIL=0, 5/(5s)]; parse:[NOT=0, SUCC=11, FAIL=0, 5/(5s)]; save:[NOT=291, SUCC=19, FAIL=0, 12/(5s)]; total_seconds=10
2017-11-05 20:19:27,143 WARNING ThreadPool status: running_tasks=2; fetch:[NOT=0, SUCC=19, FAIL=0, 8/(5s)]; parse:[NOT=0, SUCC=19, FAIL=0, 8/(5s)]; save:[NOT=531, SUCC=28, FAIL=0, 9/(5s)]; total_seconds=15
2017-11-05 20:19:32,158 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 2/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 2/(5s)]; save:[NOT=591, SUCC=31, FAIL=0, 3/(5s)]; total_seconds=20
2017-11-05 20:19:37,174 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=574, SUCC=48, FAIL=0, 17/(5s)]; total_seconds=25
2017-11-05 20:19:42,189 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=568, SUCC=54, FAIL=0, 6/(5s)]; total_seconds=30
2017-11-05 20:19:47,203 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=564, SUCC=58, FAIL=0, 4/(5s)]; total_seconds=35
2017-11-05 20:19:52,205 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=561, SUCC=61, FAIL=0, 3/(5s)]; total_seconds=40
2017-11-05 20:19:57,221 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=560, SUCC=62, FAIL=0, 1/(5s)]; total_seconds=45
2017-11-05 20:20:02,236 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=558, SUCC=64, FAIL=0, 2/(5s)]; total_seconds=50
2017-11-05 20:20:07,252 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=554, SUCC=68, FAIL=0, 4/(5s)]; total_seconds=55
2017-11-05 20:20:12,267 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=550, SUCC=72, FAIL=0, 4/(5s)]; total_seconds=60
2017-11-05 20:20:17,281 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=545, SUCC=77, FAIL=0, 5/(5s)]; total_seconds=65
2017-11-05 20:20:22,292 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=517, SUCC=105, FAIL=0, 28/(5s)]; total_seconds=70
2017-11-05 20:20:27,298 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=502, SUCC=120, FAIL=0, 15/(5s)]; total_seconds=75
2017-11-05 20:20:32,313 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=491, SUCC=131, FAIL=0, 11/(5s)]; total_seconds=80
2017-11-05 20:20:37,329 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=478, SUCC=144, FAIL=0, 13/(5s)]; total_seconds=85
2017-11-05 20:20:42,344 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=469, SUCC=153, FAIL=0, 9/(5s)]; total_seconds=90
2017-11-05 20:20:47,360 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=455, SUCC=167, FAIL=0, 14/(5s)]; total_seconds=95
2017-11-05 20:20:52,362 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=450, SUCC=172, FAIL=0, 5/(5s)]; total_seconds=100
2017-11-05 20:20:57,372 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=437, SUCC=184, FAIL=1, 13/(5s)]; total_seconds=105
2017-11-05 20:21:02,388 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=436, SUCC=185, FAIL=1, 1/(5s)]; total_seconds=110
2017-11-05 20:21:07,403 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=423, SUCC=198, FAIL=1, 13/(5s)]; total_seconds=115
2017-11-05 20:21:12,419 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=412, SUCC=209, FAIL=1, 11/(5s)]; total_seconds=120
2017-11-05 20:21:17,433 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=398, SUCC=223, FAIL=1, 14/(5s)]; total_seconds=125
2017-11-05 20:21:22,446 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=388, SUCC=233, FAIL=1, 10/(5s)]; total_seconds=130
2017-11-05 20:21:27,462 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=383, SUCC=238, FAIL=1, 5/(5s)]; total_seconds=135
2017-11-05 20:21:32,477 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=375, SUCC=246, FAIL=1, 8/(5s)]; total_seconds=140
2017-11-05 20:21:37,493 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=359, SUCC=261, FAIL=2, 16/(5s)]; total_seconds=145
2017-11-05 20:21:42,508 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=352, SUCC=268, FAIL=2, 7/(5s)]; total_seconds=150
2017-11-05 20:21:47,524 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=351, SUCC=269, FAIL=2, 1/(5s)]; total_seconds=155
2017-11-05 20:21:52,530 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=347, SUCC=273, FAIL=2, 4/(5s)]; total_seconds=160
2017-11-05 20:21:57,536 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=336, SUCC=284, FAIL=2, 11/(5s)]; total_seconds=165
2017-11-05 20:22:02,550 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=334, SUCC=286, FAIL=2, 2/(5s)]; total_seconds=170
2017-11-05 20:22:07,566 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=334, SUCC=286, FAIL=2, 0/(5s)]; total_seconds=175
2017-11-05 20:22:12,568 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=331, SUCC=289, FAIL=2, 3/(5s)]; total_seconds=180
2017-11-05 20:22:17,584 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=330, SUCC=290, FAIL=2, 1/(5s)]; total_seconds=185
2017-11-05 20:22:22,585 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=323, SUCC=297, FAIL=2, 7/(5s)]; total_seconds=190
2017-11-05 20:22:27,600 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=309, SUCC=311, FAIL=2, 14/(5s)]; total_seconds=195
2017-11-05 20:22:32,606 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=307, SUCC=313, FAIL=2, 2/(5s)]; total_seconds=200
2017-11-05 20:22:37,621 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=288, SUCC=332, FAIL=2, 19/(5s)]; total_seconds=205
2017-11-05 20:22:42,636 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=279, SUCC=341, FAIL=2, 9/(5s)]; total_seconds=210
2017-11-05 20:22:47,638 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=273, SUCC=347, FAIL=2, 6/(5s)]; total_seconds=215
2017-11-05 20:22:52,651 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=262, SUCC=358, FAIL=2, 11/(5s)]; total_seconds=220
2017-11-05 20:22:57,661 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=258, SUCC=362, FAIL=2, 4/(5s)]; total_seconds=225
2017-11-05 20:23:02,676 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=251, SUCC=369, FAIL=2, 7/(5s)]; total_seconds=230
2017-11-05 20:23:07,692 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=245, SUCC=375, FAIL=2, 6/(5s)]; total_seconds=235
2017-11-05 20:23:12,707 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=238, SUCC=382, FAIL=2, 7/(5s)]; total_seconds=240
2017-11-05 20:23:17,723 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=235, SUCC=385, FAIL=2, 3/(5s)]; total_seconds=245
2017-11-05 20:23:22,739 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=234, SUCC=386, FAIL=2, 1/(5s)]; total_seconds=250
2017-11-05 20:23:27,740 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=232, SUCC=388, FAIL=2, 2/(5s)]; total_seconds=255
2017-11-05 20:23:32,745 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=229, SUCC=391, FAIL=2, 3/(5s)]; total_seconds=260
2017-11-05 20:23:37,760 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=223, SUCC=397, FAIL=2, 6/(5s)]; total_seconds=265
2017-11-05 20:23:42,776 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=219, SUCC=401, FAIL=2, 4/(5s)]; total_seconds=270
2017-11-05 20:23:47,791 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=217, SUCC=403, FAIL=2, 2/(5s)]; total_seconds=275
2017-11-05 20:23:52,807 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=211, SUCC=409, FAIL=2, 6/(5s)]; total_seconds=280
2017-11-05 20:23:57,820 WARNING ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=209, SUCC=411, FAIL=2, 2/(5s)]; total_seconds=285
Process finished with exit code 1
I stopped it when I found that it had run twice.
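The README mostly describes how the project is structured. It would be better to write a tutorial so that other people can get started more easily.
Hello, the discussion group is already full and I can't join.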
ln -sf /usr/local/bin/python3 /usr/bin/python
ln -sf /usr/bin/python2.6 /usr/bin/python2
/usr/bin/yum
sed -i 's|/usr/bin/python|/usr/bin/python2|g' /usr/bin/yum
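Hello 笑虎 (xianhu):
I'm new to crawling. I read through your source code and think it is very well designed. I have one small question and would like to check whether my understanding is right.
My question is how the fetcher, parser, and saver threads coordinate with each other.
My understanding is that your ThreadPool has a variable called _number_dict that is shared by all the other threads. It works much like a semaphore, and every update has to be done under a lock. For example, once the fetcher has obtained a new url, the parser can do its next piece of work based on the change in this counter.
Is my understanding correct?
Thanks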
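For comparison, here is a minimal sketch of one common way to arrange such a pipeline: queues hand work from stage to stage (a stage simply blocks until something arrives), while a lock-protected dict of counters is used only for bookkeeping. Whether this matches PSpider's internals should be checked against its source; run_pipeline, the counter names, and the STOP sentinel are all assumptions made for this illustration.

```python
import queue
import threading

STOP = object()  # sentinel passed down the pipeline to shut each stage down


def run_pipeline(urls):
    """Run one fetcher, one parser and one saver thread over `urls`."""
    fetch_q, parse_q, save_q = queue.Queue(), queue.Queue(), queue.Queue()
    counters = {"fetched": 0, "parsed": 0, "saved": 0}
    lock = threading.Lock()
    saved = []

    def bump(name):
        # Every update of the shared dict happens under the lock.
        with lock:
            counters[name] += 1

    def fetcher():
        while True:
            url = fetch_q.get()  # blocks until a url is available
            if url is STOP:
                parse_q.put(STOP)
                return
            parse_q.put((url, "<html>" + url + "</html>"))  # stand-in for a real download
            bump("fetched")

    def parser():
        while True:
            task = parse_q.get()  # blocks until the fetcher has produced something
            if task is STOP:
                save_q.put(STOP)
                return
            url, html = task
            save_q.put((url, len(html)))
            bump("parsed")

    def saver():
        while True:
            item = save_q.get()
            if item is STOP:
                return
            saved.append(item)
            bump("saved")

    threads = [threading.Thread(target=stage) for stage in (fetcher, parser, saver)]
    for thread in threads:
        thread.start()
    for url in urls:
        fetch_q.put(url)
    fetch_q.put(STOP)
    for thread in threads:
        thread.join()
    return counters, saved
```

In this arrangement the parser never polls the counters to learn that work is ready; the blocking get() on its queue does that. The counters matter for status reporting and, with several fetcher threads, need the lock because more than one thread updates the same entry.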
The latest bitarray release (.8) only supports up to Python 3.3, and installing this module keeps failing with errors.
I'm a beginner and want to modify the parser, i.e. inst_parse.py, using the threads approach to scrape download links from 电影天堂 (via the test_spider() function). Here is the modified parser:
import datetime
import re

# get_url_legal is assumed to be imported already at the top of inst_parse.py

def htm_parse_2(self, priority: int, url: str, keys: object, deep: int, content: object) -> (int, list, list):
    """
    parse the content of a url, you can rewrite this function, parameters and return refer to self.working()
    """
    *_, html_text = content
    url_list, save_list = [], []
    if (self._max_deep < 0) or (deep < self._max_deep):
        if not re.compile(r"/\d{8}/").search(url):
            # the url is a list page: collect the link to each movie's download page
            a_list = re.findall(r"<a href=\"(?P<url>[\w\W]{5,}?)\" class=\"ulink\">[\w\W]+?</a>", html_text, flags=re.IGNORECASE)
            url_list = [(_url, keys, priority+1) for _url in [get_url_legal(href, url) for href in a_list]]
        else:
            # the url is a download page: extract the download link
            download_url = re.search(r"<td style=\"WORD-WRAP: break-word\"[\w\W]*?><a href=\"(?P<url>[\w\W]{5,}?)\">", html_text, flags=re.IGNORECASE)
            save_list = [(download_url.group("url").strip(), datetime.datetime.now()), ] if download_url else []
    return 1, url_list, save_list
I also changed the start url to http://www.ygdy8.net/html/gndy/oumei/list_7_12.html and left everything else unchanged, but the program ends right after it starts running. The log is:
WARNING:root:MonitorThread[monitor] start...
WARNING:root:ThreadPool set_start_url: keys=None, priority=0, deep=0, url=http://www.ygdy8.net/html/gndy/oumei/list_7_12.html
WARNING:root:ThreadPool start: fetcher_num=10, is_over=True
WARNING:root:FetchThread[fetcher-1] start...
WARNING:root:FetchThread[fetcher-2] start...
WARNING:root:FetchThread[fetcher-3] start...
WARNING:root:FetchThread[fetcher-4] start...
WARNING:root:FetchThread[fetcher-5] start...
WARNING:root:FetchThread[fetcher-6] start...
WARNING:root:FetchThread[fetcher-7] start...
WARNING:root:FetchThread[fetcher-8] start...
WARNING:root:FetchThread[fetcher-9] start...
WARNING:root:FetchThread[fetcher-10] start...
WARNING:root:ParseThread[parser] start...
WARNING:root:SaveThread[saver] start...
WARNING:root:ThreadPool status: running_tasks=0; fetch=(0, 0, 0/(5s)); parse=(0, 0, 0/(5s)); save=(0, 0, 0/(5s)); total_seconds=5
WARNING:root:FetchThread[fetcher-1] end...
WARNING:root:FetchThread[fetcher-2] end...
WARNING:root:FetchThread[fetcher-3] end...
WARNING:root:FetchThread[fetcher-4] end...
WARNING:root:FetchThread[fetcher-5] end...
WARNING:root:FetchThread[fetcher-6] end...
WARNING:root:FetchThread[fetcher-7] end...
WARNING:root:FetchThread[fetcher-8] end...
WARNING:root:FetchThread[fetcher-9] end...
WARNING:root:ParseThread[parser] end...
WARNING:root:SaveThread[saver] end...
WARNING:root:FetchThread[fetcher-10] end...
WARNING:root:ThreadPool status: running_tasks=0; fetch=(0, 0, 0/(5s)); parse=(0, 0, 0/(5s)); save=(0, 0, 0/(5s)); total_seconds=10
WARNING:root:MonitorThread[monitor] end...
WARNING:root:ThreadPool end: fetcher_num=10, is_over=True
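I have never debugged a threads program and don't know how to go about it; debug mode can't step through it one line at a time either. Where is the problem? Also, could you recommend how to debug threads-related programs? Do I need other modules, such as winpdb?
Thanks!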
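For the general question of how to inspect a threaded program without a stepping debugger, here is one standard-library option as a minimal sketch: print the current stack of every live thread to see where each one is waiting. `dump_thread_stacks` is a hypothetical helper name; `sys._current_frames()`, `threading.enumerate()`, and `traceback.format_stack()` are standard-library calls. This does not diagnose the specific log above.

```python
import sys
import threading
import traceback


def dump_thread_stacks():
    """Return a text report with the current call stack of every live thread."""
    names = {t.ident: t.name for t in threading.enumerate()}
    lines = []
    for ident, frame in sys._current_frames().items():
        lines.append("Thread %s (%s):" % (names.get(ident, "?"), ident))
        lines.extend(entry.rstrip() for entry in traceback.format_stack(frame))
    return "\n".join(lines)


if __name__ == "__main__":
    stop = threading.Event()
    worker = threading.Thread(target=stop.wait, name="fetcher-demo")
    worker.start()
    print(dump_thread_stacks())   # shows fetcher-demo blocked inside Event.wait()
    stop.set()
    worker.join()
```

Running the spider with a single fetcher thread while investigating also tends to make the log easier to follow.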
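A quick question: does the random bid strategy still work? It worked when I used it before, but today it no longer does. How is it on your side?
In what order should I read your code to understand your spider framework?
When running test_demos.py I get a 301 error. After changing the fetch method in demos_doubanmovies so that redirects are allowed, I still get a 403. What other settings can I change to scrape Douban movie data successfully?
I'm on Ubuntu16.04 with python2.7.12 and python3.5.2 installed; python points to python2.7.12 by default. Installing with python setup.py install runs into all kinds of strange problems.
With demos_doubanmovies, a single tag yields at most three hundred movies at a time. How can I scrape more movies? Thanks!
How is crawl-speed control designed in this framework? Some sites drop the connection when they are crawled too fast. Also, some sites require a CAPTCHA to get past the next-page check, for example **文书网. How does the author handle these problems?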
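On the crawl-speed half of this question, one common approach is to make every fetcher thread pass through a shared throttle before each request. This is a minimal sketch under that assumption, not something taken from PSpider; `Throttle` and `min_interval` are hypothetical names, and it does nothing for the CAPTCHA half of the question.

```python
import threading
import time


class Throttle:
    """Allow at most one call per `min_interval` seconds across all threads."""

    def __init__(self, min_interval):
        self.min_interval = min_interval
        self._lock = threading.Lock()
        self._next_allowed = 0.0

    def wait(self):
        # Reserve the next free time slot under the lock, then sleep outside it
        # so other threads can reserve their own slots meanwhile.
        with self._lock:
            now = time.monotonic()
            slot = max(now, self._next_allowed)
            self._next_allowed = slot + self.min_interval
        delay = slot - now
        if delay > 0:
            time.sleep(delay)
        return delay
```

A fetcher would call `throttle.wait()` right before issuing its request; one `Throttle` instance per target site keeps slow sites from holding back fast ones.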
like this
def image_fetch(self, url: str):
    response = requests.get(url, headers={"User-Agent": make_random_useragent()}, stream=True, timeout=(3.05, 10))
    payload = urlparse(url).path
    _left_bound_pos = payload.rfind('/')
    _right_bound_pos = payload.find('.', _left_bound_pos)
    if (payload[_right_bound_pos + 1:] == 'jpeg' or payload[_right_bound_pos + 1:] == 'jpg') and \
            response.headers['Content-Type'] == 'image/jpeg':
        _ext = '.jpeg'
    elif payload[_right_bound_pos + 1:] == 'gif' or response.headers['Content-Type'] == 'image/gif':
        _ext = '.gif'
    else:
        _ext = '.jpeg'
    return payload[_left_bound_pos + 1:_right_bound_pos] + _ext
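Hello! Earlier issues mention that detailed documentation has been written. Where can I find it? Thanks!
Would you consider writing documentation in Chinese? It would be easier to read.
Hi, wouldn't it be better to make url_list a set, since that should avoid some duplicate urls?
But the elements of a set can't be dicts (keys). Could I change url_list to a set and change keys to keys[1]? Apart from that: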
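One way around the unhashable-dict problem, as a minimal sketch: keep url_list as a list of tuples and track the already-seen urls in a separate set of strings, so keys can stay a dict. `dedupe_tasks` and `seen` are hypothetical names, and this is independent of whatever url filtering the framework already applies.

```python
def dedupe_tasks(tasks, seen=None):
    """Drop (url, keys, priority) tuples whose url has been seen before.

    `seen` is a set of url strings; pass the same set on every call to
    remember urls across pages.
    """
    seen = set() if seen is None else seen
    unique = []
    for url, keys, priority in tasks:
        if url not in seen:
            seen.add(url)
            unique.append((url, keys, priority))
    return unique
```

Only the url string goes into the set, so the first occurrence wins and its keys value is kept untouched.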
When I use redis in place of PriorityQueue, the threads cannot be stopped.
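In dangdang you mention that opening and closing the driver again for every fetch costs too much. I think so too, although I haven't actually measured it.
If the driver is to be reused, how can we guarantee that every opened driver gets closed?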
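One possible answer, as a minimal sketch rather than anything PSpider provides: give each thread its own reusable driver, record every driver that gets created, and close them all when a `with` block exits, even on error. `DriverPool` and `FakeDriver` are hypothetical; `FakeDriver` stands in for a real browser driver, and the sketch only assumes the driver has a `quit()` method, as Selenium WebDriver objects do.

```python
import threading


class FakeDriver:
    """Stand-in for a real browser driver; only quit() is assumed."""

    def __init__(self):
        self.closed = False

    def quit(self):
        self.closed = True


class DriverPool:
    """One reusable driver per thread, all closed together on exit."""

    def __init__(self, factory):
        self._factory = factory
        self._local = threading.local()
        self._all = []
        self._lock = threading.Lock()

    def get(self):
        driver = getattr(self._local, "driver", None)
        if driver is None:
            driver = self._factory()
            self._local.driver = driver
            with self._lock:
                self._all.append(driver)   # remember it so it can be closed later
        return driver

    def close_all(self):
        with self._lock:
            drivers, self._all = self._all, []
        for driver in drivers:
            try:
                driver.quit()
            except Exception:
                pass                       # keep closing the remaining drivers

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close_all()
        return False                       # do not swallow the original error
```

The pool should not be used again after `close_all()`, because threads still hold references to their closed drivers.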
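Take the code below as an example: if anything above self.pool.finish_a_task(TPEnum.URL_FETCH) goes wrong, the thread may exit abnormally and the program can never finish properly. A fairly simple fix is to add a try ... except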
def work_fetch(self):
    # ----1
    priority, url, keys, deep, critical, fetch_repeat, parse_repeat = self.pool.get_a_task(TPEnum.URL_FETCH)
    # ----2
    code, content = self.worker.working(url, keys, critical, fetch_repeat)
    # ----3
    if code > 0:
        self.pool.update_number_dict(TPEnum.URL_FETCH, +1)
        self.pool.add_a_task(TPEnum.HTM_PARSE, (priority, url, keys, deep, critical, fetch_repeat, parse_repeat, content))
    elif code == 0:
        priority += (1 if critical else 0)
        self.pool.add_a_task(TPEnum.URL_FETCH, (priority, url, keys, deep, critical, fetch_repeat+1, parse_repeat))
    else:
        pass
    # ----4
    self.pool.finish_a_task(TPEnum.URL_FETCH)
    return True
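The suggested fix can be sketched in isolation like this. It is a generic illustration of the try/finally pattern, not a patch for PSpider: `run_task_safely` and its three callables are hypothetical stand-ins for get_a_task, the fetch-and-dispatch steps, and finish_a_task.

```python
import logging


def run_task_safely(get_task, handle, finish_task):
    """Run one task so that finish_task() is always called once a task was taken.

    Returns True if handle() succeeded and False if it raised.
    """
    task = get_task()
    try:
        handle(task)
        return True
    except Exception:
        # Log the failure instead of letting it kill the worker thread.
        logging.exception("task failed: %r", task)
        return False
    finally:
        # Runs on success and on failure, so the pool's bookkeeping stays balanced.
        finish_task()
```

Putting the bookkeeping call in `finally` rather than after the `except` block means it still runs if the error handler itself raises.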
Traceback (most recent call last):
File "test.py", line 13, in <module>
from bs4 import BeautifulSoup
ModuleNotFoundError: No module named 'bs4'