karmenzind / fp-server
Free proxy server, continuously crawling and providing proxies, based on Tornado and Scrapy. Build your own local proxy pool.
License: MIT License
Your operating system and Python version?
Have you followed the manual and FAQ?
Paste your console output here. It will be great if you can provide a log file.
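Tested: IP海 (IPhai) has stopped serving; every page returns 404/500.
Please update the list of source sites.
Boss, could you make it run natively on Windows instead of through Docker? Docker is far too sluggish.
[Page execution time: 7.3968110084534 s]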
PROXY_STORE_NUM: 50
fp-server_11111.log
Alibaba Cloud, CentOS 7, Redis 4.0.6
Linux version 3.10.0-693.2.2.el7.x86_64 ([email protected]) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-16) (GCC) ) #1 SMP Tue Sep 12 22:26:13 UTC 2017
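Last night I ran a Guzzle crawler through fp_server, and many of the IPs it collected were invalid.
I even raised the number in the config to 5000 and shortened the scheduling interval from 3600 to 300.
Of the 4000-plus proxies, fewer than 10% actually worked.
Then I tested directly with IPs from another pool, http://123.207.35.36:5010/get/, and the success rate reached 80%.
Doesn't this suggest that our validation isn't thorough enough?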
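One plausible reason for the gap is that a proxy which passes a single check can still be flaky. A minimal sketch, assuming a stricter checker that accepts a proxy only after it passes most of several repeated checks. `is_usable`, `urllib_fetch` and the test URL `http://httpbin.org/ip` are hypothetical and are not fp-server's actual checker.

```python
import urllib.request
from typing import Callable

# A fetch function takes a proxy URL and returns True if one test request
# through that proxy succeeded.
Fetch = Callable[[str], bool]


def urllib_fetch(proxy_url: str, test_url: str = "http://httpbin.org/ip",
                 timeout: float = 5.0) -> bool:
    """Example fetch (hypothetical test URL): a single GET through the proxy."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    opener = urllib.request.build_opener(handler)
    try:
        with opener.open(test_url, timeout=timeout) as resp:
            return resp.status == 200
    except Exception:  # timeouts, refused connections, bad responses, ...
        return False


def is_usable(proxy_url: str, fetch: Fetch = urllib_fetch,
              attempts: int = 3, min_successes: int = 2) -> bool:
    """Accept a proxy only if at least `min_successes` of `attempts` checks pass."""
    successes = 0
    for i in range(attempts):
        if fetch(proxy_url):
            successes += 1
        if successes >= min_successes:
            return True  # threshold reached, skip the remaining checks
        remaining = attempts - i - 1
        if successes + remaining < min_successes:
            return False  # threshold can no longer be reached
    return False
```

Passing `fetch` in as a parameter keeps the acceptance rule separate from the network code, so the rule can be tuned (or tested with a fake fetch) without touching real proxies.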
It never manages to crawl any proxies.
/api/status/
{"code": 0, "msg": "success", "data": {"spiders": [{"status": "stopped", "name": "coderbusy", "last_start_time": "1533775630"}, {"status": "stopped", "name": "kuaidaili", "last_start_time": "1533775630"}, {"status": "stopped", "name": "mix", "last_start_time": "1533775630"}, {"status": "stopped", "name": "data5u", "last_start_time": "1533775630"}, {"status": "stopped", "name": "xicidaili", "last_start_time": "1533775079"}, {"status": "stopped", "name": "checker", "last_start_time": "1533776171"}, {"status": "stopped", "name": "coolproxy", "last_start_time": "1533775079"}, {"status": "stopped", "name": "3464", "last_start_time": "1533775630"}, {"status": "stopped", "name": "yundaili", "last_start_time": "1533775630"}, {"status": "stopped", "name": "ip66", "last_start_time": "1533775630"}], "proxies": {"total": 0, "detail": {"http": 0, "https": 0, "transparent": 0, "anonymous": 0}}}}
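The response above shows every spider as `stopped` and `proxies.total` as 0. A small sketch for condensing such a response when troubleshooting; `summarize_status` is a hypothetical helper, and the field names are taken from the response shown.

```python
import json
from typing import Any, Dict


def summarize_status(raw: str) -> Dict[str, Any]:
    """Condense an /api/status/ response into the fields worth checking first."""
    payload = json.loads(raw)
    data = payload["data"]
    spiders = data["spiders"]
    stopped = [s["name"] for s in spiders if s["status"] == "stopped"]
    total = data["proxies"]["total"]
    return {
        "ok": payload["code"] == 0,
        "stopped_spiders": stopped,
        "total_proxies": total,
        # Every spider idle and an empty pool: nothing is crawling right now.
        "idle_and_empty": len(stopped) == len(spiders) and total == 0,
    }
```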
/api/spider/run_all/
{"code": 500, "msg": "\u670d\u52a1\u5668\u5185\u90e8\u9519\u8bef", "data": {}}
The message decodes to: internal server error (服务器内部错误).
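The `msg` field arrives as `\uXXXX` escapes because Python's `json.dumps` escapes non-ASCII characters by default (`ensure_ascii=True`). Any JSON parser turns them back into text; a minimal sketch with a hypothetical `error_message` helper:

```python
import json
from typing import Tuple


def error_message(raw: str) -> Tuple[int, str]:
    """Return (code, msg) with any \\uXXXX escapes in `msg` decoded."""
    payload = json.loads(raw)
    return payload["code"], payload["msg"]


RAW = r'{"code": 500, "msg": "\u670d\u52a1\u5668\u5185\u90e8\u9519\u8bef", "data": {}}'
code, msg = error_message(RAW)
print(code, msg)  # 500 服务器内部错误  ("internal server error")
```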
Environment: Debian 9 x64 (stretch), Python 3.6.5
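The function recuresive_update:
when old_value is a string and value is a list, it raises an error; when value is a tuple, the string old_value gets converted to a list, which loses its original meaning.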
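The report does not include the body of `recuresive_update`, so this is not a patch to it. A minimal sketch, assuming the intent is a config-style merge: recurse only when both sides are dicts and otherwise replace the old value outright, which avoids both the str/list error and the str-to-list coercion. `recursive_merge` is a hypothetical name.

```python
from typing import Any, Dict


def recursive_merge(base: Dict[str, Any], new: Dict[str, Any]) -> Dict[str, Any]:
    """Merge `new` into `base` in place and return `base`.

    Nested dicts are merged key by key. Any other value (str, list, tuple,
    number, ...) replaces the old one as-is, so a str is never coerced into
    a list and a str/list type mismatch cannot raise.
    """
    for key, value in new.items():
        old_value = base.get(key)
        if isinstance(old_value, dict) and isinstance(value, dict):
            recursive_merge(old_value, value)
        else:
            base[key] = value
    return base
```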
main.py appears to run.
The API is reachable too, but the content it returns always has count 0. Is that because main.py cannot store data in Redis?
File "/usr/local/lib/python3.7/site-packages/scrapy/middleware.py", line 34, in from_settings
mwcls = load_object(clspath)
File "/usr/local/lib/python3.7/site-packages/scrapy/utils/misc.py", line 44, in load_object
mod = import_module(module)
File "/usr/local/lib/python3.7/importlib/__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
File "<frozen importlib._bootstrap>", line 983, in _find_and_load
File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 728, in exec_module
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "/usr/local/lib/python3.7/site-packages/scrapy/extensions/telnet.py", line 12, in <module>
from twisted.conch import manhole, telnet
File "/usr/local/lib/python3.7/site-packages/twisted/conch/manhole.py", line 154
def write(self, data, async=False):
^
SyntaxError: invalid syntax
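Python 3.7 made `async` a reserved keyword, so the installed Twisted's `twisted/conch/manhole.py`, which uses `async` as a parameter name, no longer parses, and Scrapy's telnet extension fails while importing it. Running under Python 3.6 or upgrading to a Twisted release that supports Python 3.7 are the usual ways out. The sketch below only demonstrates the cause; the renamed parameter `async_` is an illustration, not necessarily Twisted's actual fix.

```python
import keyword
import sys


def compiles(source: str) -> bool:
    """Return True if `source` is valid syntax for the running interpreter."""
    try:
        compile(source, "<check>", "exec")
        return True
    except SyntaxError:
        return False


# The signature from the traceback, and the same signature with the
# parameter renamed (the new name is only an illustration).
OLD = "def write(self, data, async=False):\n    pass\n"
RENAMED = "def write(self, data, async_=False):\n    pass\n"

print(sys.version_info[:2], keyword.iskeyword("async"), compiles(OLD), compiles(RENAMED))
# On Python 3.7 or later this prints True, False, True after the version tuple.
```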
"url": "https://114.226.128.6:6666", "speed": "643.5837540626526"
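Proxies like this respond far too slowly, so I still have to filter them myself before use, which isn't very user-friendly.
Could the config file take a response-speed threshold, so that during crawling a proxy is saved only if its response speed is below that value?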
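A minimal sketch of the requested threshold, assuming a hypothetical config value `MAX_PROXY_SPEED` expressed in the same unit as the `speed` field (the output above does not state that unit). Entries without a parseable `speed` are dropped, and the survivors are returned fastest first.

```python
from typing import Dict, List

# Hypothetical setting; same unit as the "speed" field, which the source
# output does not state.
MAX_PROXY_SPEED = 10.0


def fast_enough(proxies: List[Dict[str, str]],
                max_speed: float = MAX_PROXY_SPEED) -> List[Dict[str, str]]:
    """Keep proxies whose measured speed is below `max_speed`, fastest first."""
    kept = []
    for proxy in proxies:
        try:
            speed = float(proxy["speed"])
        except (KeyError, ValueError):
            continue  # no usable measurement: drop it
        if speed < max_speed:
            kept.append((speed, proxy))
    kept.sort(key=lambda pair: pair[0])
    return [proxy for _, proxy in kept]
```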
It is very resource-hungry: running it on a bottom-tier Alibaba Cloud server froze the server outright, so I gave up.
Your operating system and Python version?
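CentOS 7
Python version: python34 --> python36
Successfully deployed the code on the server behind nginx.
The one pitfall I hit: python36-devel has to be installed, because installing the python-redis package requires it.
Thanks, boss!
Also, could the returned fields include the response latency?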
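A minimal sketch of attaching a latency figure to each returned proxy, assuming a hypothetical `latency` field in seconds (`None` when the check fails) and a `fetch` callable that performs one test request through the proxy. It measures wall-clock time with `time.perf_counter`.

```python
import time
from typing import Callable, Dict


def with_latency(proxy: Dict[str, str],
                 fetch: Callable[[str], bool]) -> Dict[str, object]:
    """Return a copy of `proxy` with a "latency" field (seconds, or None on failure)."""
    start = time.perf_counter()
    ok = fetch(proxy["url"])
    elapsed = time.perf_counter() - start
    result: Dict[str, object] = dict(proxy)  # leave the caller's dict untouched
    result["latency"] = round(elapsed, 3) if ok else None
    return result
```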
There is also a proxy source called 31代理 (31 Proxy) at http://31f.cn/.
/api/proxy/
{"code": 0, "msg": "success", "data": {"count": 1, "detail": [{"ip": "91.196.39.196", "scheme": "https", "port": "32585", "need_auth": "0", "url": "https://91.196.39.196:32585", "anonymity": "anonymous"}]}}
Could this be changed, or a simpler variant added, e.g. /api/proxy/simple/, returning:
{"code": 0, "msg": "success", "data": {"count": 1, "detail": [{"scheme": "https", "ip": "91.196.39.196:32585"}]}}
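The simplified shape requested here can be derived from the full `/api/proxy/` response by joining `ip` and `port`. A minimal sketch with a hypothetical `to_simple` function; it does not add the endpoint itself.

```python
from typing import Any, Dict


def to_simple(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Reduce an /api/proxy/ response to scheme plus "ip:port" per proxy."""
    detail = [
        {"scheme": p["scheme"], "ip": f'{p["ip"]}:{p["port"]}'}
        for p in payload["data"]["detail"]
    ]
    return {
        "code": payload["code"],
        "msg": payload["msg"],
        "data": {"count": len(detail), "detail": detail},
    }
```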
Or, simpler still, directly:
After a restart the IP pool is emptied. I don't know why; I tested several times with the same result.
After running for about a day, the IP quality became very poor and the success rate was very low.
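One way to keep a long-running pool from decaying is to keep re-checking stored proxies and evict those whose recent success rate drops. A minimal sketch, assuming a hypothetical `ProxyScoreboard` that remembers the last `window` results per proxy; the window size and threshold are arbitrary, and only proxies with a full window are judged, so new arrivals are not evicted on a single failure.

```python
from collections import deque
from typing import Deque, Dict, List


class ProxyScoreboard:
    """Track recent check results per proxy and evict consistently failing ones."""

    def __init__(self, window: int = 10, min_rate: float = 0.5) -> None:
        self.window = window
        self.min_rate = min_rate
        self.history: Dict[str, Deque[bool]] = {}

    def record(self, proxy: str, ok: bool) -> None:
        # deque(maxlen=...) drops the oldest result once the window is full.
        self.history.setdefault(proxy, deque(maxlen=self.window)).append(ok)

    def rate(self, proxy: str) -> float:
        results = self.history.get(proxy)
        return sum(results) / len(results) if results else 0.0

    def evict_bad(self) -> List[str]:
        """Remove and return proxies with a full window below `min_rate`."""
        bad = [p for p, results in self.history.items()
               if len(results) == self.window and self.rate(p) < self.min_rate]
        for p in bad:
            del self.history[p]
        return bad
```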