chenjiandongx / async-proxy-pool
🔅 Python3 async crawler proxy pool
License: MIT License
Started Sanic, but requests to http://localhost:3289/pop fail:
{"description":"Internal Server Error","status":500,"message":"The server encountered an internal error and cannot complete your request."}
Overall, it looks like it crawls one proxy site and then moves on to the next:
```
# Run each crawler function in turn and store every proxy it yields.
for func in all_funcs:
    for proxy in func():
        redis_conn.add_proxy(proxy)
        logger.info("Crawler √ {}".format(proxy))
```
Could redis be replaced with aioredis to improve performance?
Identified IP: {'http': 'http://91.191.250.142:31059'}
Identified IP: {'http': 'http://96.9.73.80:56891'}
Identified IP: {'http': 'http://91.106.178.160:8080'}
Identified IP: {'http': 'http://95.104.54.227:42119'}
Identified IP: {'http': 'http://80.78.75.59:38253'}
Identified IP: {'http': 'http://91.239.180.202:8080'}
Identified IP: {'http': 'http://79.104.197.58:8080'}
Identified IP: {'http': 'http://78.157.254.42:3128'}
Identified IP: {'http': 'http://51.158.68.68:8811'}
Identified IP: {'http': 'http://47.89.37.177:3128'}
Identified IP: {'http': 'http://47.103.77.76:3128'}
Identified IP: {'http': 'http://46.254.217.54:53281'}
Identified IP: {'http': 'http://43.225.192.225:50878'}
Identified IP: {'http': 'http://42.115.88.71:62225'}
Identified IP: {'http': 'http://36.89.191.73:23500'}
Identified IP: {'http': 'http://36.89.183.113:34131'}
Identified IP: {'http': 'http://36.67.239.23:8080'}
Identified IP: {'http': 'http://217.145.150.19:42191'}
Identified IP: {'http': 'http://210.5.106.202:43147'}
Identified IP: {'http': 'http://207.148.68.190:80'}
Identified IP: {'http': 'http://202.52.234.236:35931'}
Identified IP: {'http': 'http://202.21.98.150:60640'}
Identified IP: {'http': 'http://201.219.213.197:999'}
Identified IP: {'http': 'http://201.217.245.229:49160'}
Identified IP: {'http': 'http://200.89.178.67:80'}
Identified IP: {'http': 'http://200.41.174.2:8080'}
Identified IP: {'http': 'http://197.159.23.174:39150'}
Identified IP: {'http': 'http://195.88.16.155:36141'}
Identified IP: {'http': 'http://195.211.162.116:47215'}
Identified IP: {'http': 'http://193.95.106.249:3128'}
Identified IP: {'http': 'http://190.109.167.9:57608'}
Identified IP: {'http': 'http://186.250.119.137:8080'}
Identified IP: {'http': 'http://186.227.119.207:6699'}
Identified IP: {'http': 'http://183.89.71.76:8080'}
Identified IP: {'http': 'http://182.52.51.59:38238'}
Identified IP: {'http': 'http://182.34.32.195:9999'}
Identified IP: {'http': 'http://182.253.234.236:8080'}
Identified IP: {'http': 'http://182.253.130.12:3128'}
Identified IP: {'http': 'http://181.209.82.154:23500'}
Identified IP: {'http': 'http://181.114.63.129:8085'}
Identified IP: {'http': 'http://180.122.180.99:9999'}
Identified IP: {'http': 'http://175.42.128.210:9999'}
Identified IP: {'http': 'http://175.10.145.183:8060'}
Identified IP: {'http': 'http://171.100.102.154:30090'}
Identified IP: {'http': 'http://167.172.25.209:3128'}
Identified IP: {'http': 'http://151.252.72.211:53281'}
Identified IP: {'http': 'http://14.232.245.221:8080'}
Identified IP: {'http': 'http://134.35.27.242:8080'}
Identified IP: {'http': 'http://124.122.167.76:8080'}
Identified IP: {'http': 'http://119.82.253.175:31500'}
Identified IP: {'http': 'http://119.15.90.38:42832'}
Identified IP: {'http': 'http://118.25.183.14:8082'}
Identified IP: {'http': 'http://118.118.234.56:80'}
Identified IP: {'http': 'http://117.197.41.54:23500'}
Identified IP: {'http': 'http://116.193.221.69:49936'}
Identified IP: {'http': 'http://115.223.3.42:80'}
Identified IP: {'http': 'http://115.178.97.190:23500'}
Identified IP: {'http': 'http://112.84.178.21:8888'}
Identified IP: {'http': 'http://106.51.84.243:8080'}
Identified IP: {'http': 'http://106.14.14.20:3128'}
Identified IP: {'http': 'http://104.244.77.254:8080'}
Identified IP: {'http': 'http://103.210.141.31:8080'}
Identified IP: {'http': 'http://101.95.115.196:8080'}
Identified IP: {'http': 'http://101.37.118.54:8888'}
Proxy error: {'https': 'https://49.76.111.75:8841'}
Connection timed out: {'https': 'https://47.110.130.152:8654'}
Proxy error: {'https': 'https://218.60.8.99:8870'}
Proxy error: {'https': 'https://218.2.226.42:8680'}
Connection timed out: {'https': 'https://180.122.224.30:9056'}
Proxy error: {'https': 'https://113.204.227.218:8971'}
Identified IP: {'http': 'http://95.38.14.3:8080'}
Identified IP: {'http': 'http://95.31.211.111:80'}
Identified IP: {'http': 'http://95.156.125.190:41870'}
Identified IP: {'http': 'http://94.50.33.89:8080'}
Identified IP: {'http': 'http://94.200.195.218:8080'}
Identified IP: {'http': 'http://94.191.32.225:8118'}
Identified IP: {'http': 'http://94.182.48.146:8080'}
Identified IP: {'http': 'http://93.91.112.247:41258'}
Identified IP: {'http': 'http://93.116.57.4:54838'}
Identified IP: {'http': 'http://89.216.48.230:44061'}
Identified IP: {'http': 'http://88.250.65.219:8080'}
Identified IP: {'http': 'http://83.2.189.66:38011'}
Identified IP: {'http': 'http://77.236.226.153:8080'}
Identified IP: {'http': 'http://74.85.157.198:8080'}
Identified IP: {'http': 'http://63.249.67.70:53281'}
Identified IP: {'http': 'http://61.19.27.201:8080'}
Identified IP: {'http': 'http://5.190.217.121:8080'}
Identified IP: {'http': 'http://49.89.223.141:9999'}
Identified IP: {'http': 'http://49.86.181.243:9999'}
Identified IP: {'http': 'http://49.86.181.192:9999'}
Identified IP: {'http': 'http://49.70.99.52:9999'}
Identified IP: {'http': 'http://49.70.7.179:9999'}
Identified IP: {'http': 'http://49.128.178.22:31257'}
Identified IP: {'http': 'http://47.94.160.143:8635'}
Identified IP: {'http': 'http://47.75.39.146:80'}
Identified IP: {'http': 'http://47.244.50.194:8081'}
Identified IP: {'http': 'http://46.20.59.243:47497'}
Identified IP: {'http': 'http://45.248.42.200:8080'}
Identified IP: {'http': 'http://43.240.5.97:31777'}
Identified IP: {'http': 'http://41.33.22.186:8080'}
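Access from the local machine works without any problem, but the server cannot be reached from outside (from outside the virtual machine).
Other applications work fine, so it should not be a network problem.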
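One possible cause, assuming the server is started with its host set to `localhost` (as the `Goin' Fast @ http://localhost:3289` startup line reported in another issue here suggests): a socket bound to a loopback address only accepts connections from the same machine, so the VM host cannot reach it. The helper below is a hypothetical illustration, not part of the project; it only checks whether a bind host is loopback-only.

```python
import ipaddress
import socket


def is_loopback_only(bind_host: str) -> bool:
    """Return True if every address bind_host resolves to is a loopback address.

    Hypothetical helper for illustration; not part of async-proxy-pool.
    """
    infos = socket.getaddrinfo(bind_host, None)
    # sockaddr[0] is the address string; drop any IPv6 scope id such as "%eth0".
    addresses = {info[4][0].split("%")[0] for info in infos}
    return all(ipaddress.ip_address(addr).is_loopback for addr in addresses)


if __name__ == "__main__":
    for host in ("127.0.0.1", "0.0.0.0"):
        print(host, "loopback only:", is_loopback_only(host))
```

If the loopback binding is indeed the cause, starting the server with `host="0.0.0.0"` (Sanic's `app.run` accepts `host` and `port` arguments) should make it reachable from outside the VM, firewall permitting.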
r = session.get('http://127.0.0.1:3289/count')
r.json()
{'count': '4200'}
It has crawled a very large number of https proxies. Could that be what is affecting this?
Test proxy: http://127.0.0.1:3289/get/10
Test site: https://zhihu.com
Number of tests: 1000
Successes: 246
Failures: 754
Success rate: 0.246
Test proxy: http://127.0.0.1:3289/get/200
Test site: https://zhihu.com
Number of tests: 1000
Successes: 237
Failures: 763
Success rate: 0.237
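The success rate in both runs is simply the number of successes divided by the number of attempts. A minimal sketch of that arithmetic, using the figures reported above; `success_rate` is an illustrative helper, not the script that produced these numbers.

```python
def success_rate(successes: int, attempts: int) -> float:
    """Fraction of attempts that succeeded; 0.0 when nothing was attempted."""
    if attempts == 0:
        return 0.0
    return successes / attempts


# Figures from the two runs reported above.
for endpoint, ok, total in (("/get/10", 246, 1000), ("/get/200", 237, 1000)):
    print(endpoint, success_rate(ok, total))
```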
Site: http://pzzqz.com/
API Doc: https://pzzqz.com/settings/profile
[INFO] Goin' Fast @ http://localhost:3289
[ERROR] Unable to start server
OSError: [Errno 99] error while attempting to bind on address ('::1', 3289, 0, 0): cannot assign requested address
"Pyhton3 async crawler proxy pool": it should be "Python".
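Hello author, we are also a professional IP proxy service provider, 极速HTTP. We give away 10,000 IPs on registration and verification (your learners are welcome to take advantage of the free trial :) ). We would like to discuss whether we could cooperate on commercial promotion. If you are interested, you can contact me on WeChat: 13982004324. Thank you (if you are not interested, sorry for the disturbance).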
Running python client.py fails with: AttributeError: 'str' object has no attribute 'items'. My environment is Windows 10 + Python 3.6.
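While testing with test_proxy, I found that when the test target is an https URL, my local IP got banned. It seems that with the proxies argument of requests, if only an http proxy is set, https requests do not go through the proxy.
It only worked after I added this line myself:
proxy['https'] = proxy['http']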
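This is consistent with how requests selects a proxy: it looks up the `proxies` dictionary by the scheme of the target URL, so a dictionary with only an `http` key leaves `https://` requests without a proxy entry. A small sketch of the workaround described above as a pure function; `with_https` is a hypothetical name and the proxy address is a placeholder.

```python
def with_https(proxy: dict) -> dict:
    """Return a copy of a requests-style proxies dict that also covers https URLs."""
    result = dict(proxy)
    # Reuse the http proxy for https targets unless one is already set.
    result.setdefault("https", proxy["http"])
    return result


proxies = with_https({"http": "http://127.0.0.1:8080"})
print(proxies)
```

The result can then be passed as `requests.get(url, proxies=proxies, timeout=10)`.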
OK, fixed.
Originally posted by @chenjiandongx in #2 (comment)
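Hello, I ran into this problem again. It only went away after I manually changed the redis version number in the requirements file and reinstalled the redis library. Please fix it.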
[2018-11-27 21:53:37 +0800] [5347] [INFO] Goin' Fast @ http://localhost:3289
[2018-11-27 21:53:37 +0800] [5347] [ERROR] Unable to start server
Traceback (most recent call last):
File "uvloop/loop.pyx", line 1106, in uvloop.loop.Loop._create_server
File "uvloop/handles/tcp.pyx", line 88, in uvloop.loop.TCPServer.bind
File "uvloop/handles/streamserver.pyx", line 89, in uvloop.loop.UVStreamServer._fatal_error
File "uvloop/handles/tcp.pyx", line 86, in uvloop.loop.TCPServer.bind
File "uvloop/handles/tcp.pyx", line 26, in uvloop.loop.__tcp_bind
OSError: [Errno 99] Cannot assign requested address
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/sanic/server.py", line 612, in serve
http_server = loop.run_until_complete(server_coroutine)
File "uvloop/loop.pyx", line 1446, in uvloop.loop.Loop.run_until_complete
File "uvloop/loop.pyx", line 1667, in create_server
File "uvloop/loop.pyx", line 1111, in uvloop.loop.Loop._create_server
OSError: [Errno 99] error while attempting to bind on address ('::1', 3289, 0, 0): cannot assign requested address
[2018-11-27 21:53:37 +0800] [5347] [INFO] Server Stopped
[root@yj007 async-proxy-pool]# python3 server.py
[2018-11-27 21:53:47 +0800] [5355] [INFO] Goin' Fast @ http://localhost:3289
[2018-11-27 21:53:47 +0800] [5355] [ERROR] Unable to start server
Traceback (most recent call last):
File "uvloop/loop.pyx", line 1106, in uvloop.loop.Loop._create_server
File "uvloop/handles/tcp.pyx", line 88, in uvloop.loop.TCPServer.bind
File "uvloop/handles/streamserver.pyx", line 89, in uvloop.loop.UVStreamServer._fatal_error
File "uvloop/handles/tcp.pyx", line 86, in uvloop.loop.TCPServer.bind
File "uvloop/handles/tcp.pyx", line 26, in uvloop.loop.__tcp_bind
OSError: [Errno 99] Cannot assign requested address
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/sanic/server.py", line 612, in serve
http_server = loop.run_until_complete(server_coroutine)
File "uvloop/loop.pyx", line 1446, in uvloop.loop.Loop.run_until_complete
File "uvloop/loop.pyx", line 1667, in create_server
File "uvloop/loop.pyx", line 1111, in uvloop.loop.Loop._create_server
OSError: [Errno 99] error while attempting to bind on address ('::1', 3289, 0, 0): cannot assign requested address
[2018-11-27 21:53:47 +0800] [5355] [INFO] Server Stopped
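Errno 99 on `('::1', 3289, 0, 0)` means the bind to the IPv6 loopback address failed, which typically happens when `localhost` resolves to `::1` but IPv6 is disabled or not configured on the machine. Below is a minimal stdlib sketch for checking which addresses can be bound; `can_bind` is a hypothetical helper, not part of the project or of Sanic.

```python
import socket


def can_bind(host: str, port: int = 0) -> bool:
    """Return True if a TCP socket can be bound to at least one address of host.

    Port 0 asks the operating system for any free port.
    """
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        # The host name could not be resolved at all.
        return False
    for family, socktype, proto, _canonname, sockaddr in infos:
        try:
            with socket.socket(family, socktype, proto) as sock:
                sock.bind(sockaddr)
            return True
        except OSError:
            # For example Errno 99 when the address is not available here.
            continue
    return False


if __name__ == "__main__":
    for host in ("127.0.0.1", "::1"):
        print(host, "bindable:", can_bind(host))
```

If `can_bind("::1")` is False while `can_bind("127.0.0.1")` is True, configuring the server to bind `127.0.0.1` or `0.0.0.0` rather than `localhost` is a plausible workaround.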
$ python3 client.py
2018-11-21 09:57:07,169 - Crawler working...
Traceback (most recent call last):
File "client.py", line 7, in <module>
run_schedule()
File "/mnt/d/repos/async-proxy-pool/async_proxy_pool/scheduler.py", line 20, in run_schedule
schedule.every(CRAWLER_RUN_CYCLE).minutes.do(crawler.run).run()
File "/home/venyo/.local/lib/python3.5/site-packages/schedule/__init__.py", line 411, in run
ret = self.job_func()
File "/mnt/d/repos/async-proxy-pool/async_proxy_pool/crawler.py", line 35, in run
redis_conn.add_proxy(proxy)
File "/mnt/d/repos/async-proxy-pool/async_proxy_pool/database.py", line 47, in add_proxy
self.redis.zadd(REDIS_KEY, proxy, score)
File "/home/venyo/.local/lib/python3.5/site-packages/redis/client.py", line 2263, in zadd
for pair in iteritems(mapping):
File "/home/venyo/.local/lib/python3.5/site-packages/redis/_compat.py", line 123, in iteritems
return iter(x.items())
AttributeError: 'str' object has no attribute 'items'
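The traceback matches the `zadd` signature change in redis-py 3.0: the method now takes a mapping of member to score (`zadd(name, mapping)`) instead of positional member and score arguments, so a proxy string passed where the mapping is expected fails on `.items()`. Here is a minimal sketch of the adjusted call, assuming redis-py 3.0 or later. `FakeRedis` is a stand-in invented here so the example runs without a Redis server, and the `add_proxy` signature and the `"proxies"` key are illustrative rather than the project's actual code.

```python
class FakeRedis:
    """Tiny stand-in that mimics the redis-py >= 3.0 zadd(name, mapping) call."""

    def __init__(self):
        self.sorted_sets = {}

    def zadd(self, name, mapping):
        # Like redis-py >= 3.0, iterate over member -> score pairs.
        members = self.sorted_sets.setdefault(name, {})
        added = 0
        for member, score in mapping.items():
            if member not in members:
                added += 1
            members[member] = score
        return added


def add_proxy(redis_client, key, proxy, score=10):
    """Store one proxy with its score using the mapping form of zadd."""
    return redis_client.zadd(key, {proxy: score})


client = FakeRedis()
add_proxy(client, "proxies", "http://127.0.0.1:8080", 10)
print(client.sorted_sets)
```

With a real client, the corresponding change to the failing line would be `self.redis.zadd(REDIS_KEY, {proxy: score})`.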
In other words, it can only validate http IPs?
Traceback (most recent call last):
File "D:/Program Files/PycharmProjects/untitled/async-proxy-pool/client.py", line 8, in <module>
run_schedule()
File "D:\Program Files\PycharmProjects\untitled\async-proxy-pool\async_proxy_pool\scheduler.py", line 20, in run_schedule
schedule.every(CRAWLER_RUN_CYCLE).minutes.do(crawler.run).run()
File "D:\Program Files\PycharmProjects\untitled\venv\lib\site-packages\schedule\__init__.py", line 466, in run
ret = self.job_func()
File "D:\Program Files\PycharmProjects\untitled\async-proxy-pool\async_proxy_pool\crawler.py", line 34, in run
redis_conn.add_proxy(proxy)
File "D:\Program Files\PycharmProjects\untitled\async-proxy-pool\async_proxy_pool\database.py", line 45, in add_proxy
self.redis.zadd(REDIS_KEY, proxy, score)
File "D:\Program Files\PycharmProjects\untitled\venv\lib\site-packages\redis\client.py", line 2320, in zadd
for pair in iteritems(mapping):
File "D:\Program Files\PycharmProjects\untitled\venv\lib\site-packages\redis\_compat.py", line 109, in iteritems
return iter(x.items())
AttributeError: 'str' object has no attribute 'items'