IPProxy

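IP proxies for web crawlers: scrapes proxy IPs from the proxy sites listed under "Data sources", then checks, cleans, stores, and updates them, and provides an interface for retrieving them.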
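As a rough illustration of the cleaning step, here is a minimal sketch, assuming the scrapers yield (IP, port) string pairs like those `re.findall` returns. The helper name `clean_proxies` is hypothetical and not part of this project; it only validates the format and removes duplicates. Checking whether a proxy actually works is a separate step.

```python
import ipaddress


def clean_proxies(pairs):
    """Drop malformed and duplicate (ip, port) pairs, keeping first-seen order.

    Hypothetical helper: assumes each scraped item is an (ip, port) pair of
    strings, as re.findall returns for a regex with two capture groups.
    """
    seen = set()
    cleaned = []
    for ip, port in pairs:
        ip, port = ip.strip(), str(port).strip()
        try:
            ipaddress.IPv4Address(ip)  # rejects addresses like "999.1.1.1"
        except ValueError:
            continue
        if not port.isdigit() or not 0 < int(port) < 65536:
            continue  # port must be a number in 1..65535
        key = (ip, int(port))
        if key not in seen:
            seen.add(key)
            cleaned.append(key)
    return cleaned
```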



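Tested so far only on Windows 10 (64-bit) with Python 3.5 and on Ubuntu Server 16.04.1 LTS (64-bit) with Python 3.5.
On machines with different specs, adjust the maximum thread count in Config.py. See the Config.py section below for details.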

How to use

See demo.py.

Util.Refresh(): the database is not updated automatically; call this function explicitly to refresh it with newly scraped proxies.
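One way to picture what a refresh involves: merge newly scraped proxies into the database, then re-check every stored entry and delete the ones that no longer work. The sketch below assumes a SQLite table named `proxy` and an `is_alive(ip, port)` callback; both are hypothetical, and the project's real storage layer and checks may differ.

```python
import sqlite3


def refresh(conn, new_pairs, is_alive):
    """Merge freshly scraped proxies into the table, then drop dead ones.

    Hypothetical sketch: the table name/schema and the is_alive(ip, port)
    callback are assumptions, not the project's actual schema or API.
    Returns the number of proxies left in the table.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS proxy ("
        "ip TEXT NOT NULL, port INTEGER NOT NULL, PRIMARY KEY (ip, port))"
    )
    # INSERT OR IGNORE skips pairs that are already stored.
    conn.executemany("INSERT OR IGNORE INTO proxy (ip, port) VALUES (?, ?)", new_pairs)
    rows = conn.execute("SELECT ip, port FROM proxy").fetchall()
    # Re-check every stored proxy, old and new, and delete the failures.
    dead = [(ip, port) for ip, port in rows if not is_alive(ip, port)]
    conn.executemany("DELETE FROM proxy WHERE ip = ? AND port = ?", dead)
    conn.commit()
    return len(rows) - len(dead)
```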

Util.Get(): returns one working proxy, in this form:
{'http': 'http://115.159.152.130:81', 'https': 'https://115.159.152.130:81'}
It can be passed straight to requests: requests.get(url, proxies=Util.Get(), headers={})
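Free proxies often stop working, so callers typically retry with a fresh one. A minimal sketch, assuming `get_proxy` returns a dict shaped like the result of `Util.Get()`; `get_with_retry` is a hypothetical helper, not part of the project, and `fetch` is passed in so any HTTP client can be used.

```python
def get_with_retry(url, get_proxy, fetch, retries=3):
    """Try url through up to `retries` proxies and return the first success.

    Hypothetical helper: get_proxy() should return a proxies dict such as
    {'http': 'http://ip:port', 'https': 'https://ip:port'}, and
    fetch(url, proxies) should perform the request.
    """
    last_error = None
    for _ in range(retries):
        proxies = get_proxy()
        try:
            return fetch(url, proxies)
        except OSError as exc:  # connection and proxy failures are OSError subclasses
            last_error = exc
    raise RuntimeError("all %d proxies failed for %s" % (retries, url)) from last_error
```

With requests, a call might look like `get_with_retry(url, Util.Get, lambda u, p: requests.get(u, proxies=p, headers={}, timeout=10))`. Catching `OSError` also covers requests' own errors, because `requests.exceptions.RequestException` subclasses `IOError`.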

Config.py:

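MaxThreads sets the maximum number of threads. If your machine is low-spec, set it to 16 or 32 and let it run slowly; if you're really confident in your machine (an i7 or Xeon, loads of RAM, a very fast network connection), you can set it to 1024.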
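To show what a thread cap bounds, here is a sketch that checks proxies in parallel with at most `MaxThreads` worker threads using `concurrent.futures.ThreadPoolExecutor`. The `check_all` name, the `is_alive` callback, and the value 32 are assumptions for illustration; the project may manage its threads differently.

```python
from concurrent.futures import ThreadPoolExecutor

MaxThreads = 32  # assumed example value; the real setting lives in Config.py


def check_all(pairs, is_alive, max_threads=MaxThreads):
    """Return the (ip, port) pairs for which is_alive(ip, port) is true.

    At most max_threads checks run at the same time, which is the kind of
    limit a MaxThreads setting imposes. is_alive is a hypothetical callback.
    """
    pairs = list(pairs)  # iterated twice below, so materialize generators
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        results = list(pool.map(lambda pair: is_alive(*pair), pairs))
    return [pair for pair, ok in zip(pairs, results) if ok]
```

A larger cap lets more slow network checks overlap, at the cost of more memory and open connections, which is why the right value depends on the machine.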
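If you know other proxy sites, add them to the Url_Regular dictionary.
Each entry maps a proxy-list URL to its regular expression. The regex must capture the IP and the port separately, e.g. [(192.168.1.1, 80), (192.168.1.1, 90),]
Only the first page of each site is scraped. To scrape later pages, add their links and regexes as well: for example, add the links for pages 1, 2, … of a site to the Url_Regular dictionary, each with its corresponding regex.
Before adding a regex, test it in the 站长工具 (webmaster tools) online regular-expression tester.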
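The rules above (one entry per page, a regex that captures the IP and the port separately) can be sketched as follows. The example.com URLs and the table-row pattern are hypothetical placeholders; each real site needs its own tested regex. With exactly two capture groups, `re.findall` returns one tuple of strings per match.

```python
import re

# Hypothetical Url_Regular entries: placeholder URLs for pages 1 and 2 of one
# site, both using an assumed pattern for rows like <td>1.2.3.4</td><td>80</td>.
ROW_REGEX = r"<td[^>]*>(\d{1,3}(?:\.\d{1,3}){3})</td>\s*<td[^>]*>(\d{1,5})</td>"
Url_Regular = {
    "http://example.com/free/1": ROW_REGEX,
    "http://example.com/free/2": ROW_REGEX,
}


def extract(html, regex):
    """Return [(ip, port), ...] from html using a regex with two capture groups."""
    pattern = re.compile(regex)
    if pattern.groups != 2:
        raise ValueError("regex must capture the IP and the port as two separate groups")
    return pattern.findall(html)
```

Running `extract` on a saved copy of a page is also a quick local way to check a new pattern before adding it.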


Data sources:

http://www.kuaidaili.com/free/
http://www.66ip.cn/
http://www.xicidaili.com/nn/
http://www.ip3366.net/free/
http://www.proxy360.cn/Region/China
http://www.mimiip.com/
http://www.data5u.com/free/index.shtml
http://www.ip181.com/
http://www.kxdaili.com/
Feel free to add any other proxy sites you know of, so everyone can share the resources.

Logical structure: [diagram not reproduced here]



Issues and pull requests are welcome. My code is pretty rough, so please go easy on me.


Contributors

zkeeer
