
nidhoggurz / 1024_dagaier_spider


This project forked from anguslkc/1024_dagaier_spider


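Crawls the images from topics on pages 1-10 of the "达盖尔的旗帜" (Daguerre's Flag) category of the Caoliu forum, using 8 threads. You need to supply your own proxy. Cross-platform. A little is a pleasure, a lot is bad for your health, and too much will turn you to ashes.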

Python 100.00%

1024_dagaier_spider's Introduction

caoliu_1024_dagaier_spider

Crawls the images from topics in the "达盖尔的旗帜" (Daguerre's Flag) category of the Caoliu forum.

https://raw.githubusercontent.com/cary-zhou/caoliu_1024_dagaier_spider/master/dagaier.zip

Running:

linux:
python ./达盖尔.py

windows:
python .\达盖尔.py

Environment setup:

windows:

pip install pyquery
pip install -U requests[socks]==2.12.0

linux:

pip install pyquery
pip install requests[socks]

Parameters to modify:

Change this to the port your own SS or SSR client listens on
proxy={"http":"socks5://127.0.0.1:1088","https":"socks5://127.0.0.1:1088"}
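The proxy mapping above can be built from the port number rather than edited by hand. The following is a minimal sketch, not the project's own code: `build_proxies` and its `remote_dns` parameter are hypothetical names introduced here. It relies on the fact that requests accepts both the `socks5` scheme (hostnames resolved locally) and the `socks5h` scheme (hostnames resolved by the proxy).

```python
def build_proxies(port, host="127.0.0.1", remote_dns=False):
    """Return a requests-style proxy mapping for a local SOCKS5 listener.

    `build_proxies` and `remote_dns` are illustrative names, not part of
    the original script.
    """
    # socks5h asks the proxy to resolve hostnames; socks5 resolves locally.
    scheme = "socks5h" if remote_dns else "socks5"
    url = "{}://{}:{}".format(scheme, host, port)
    return {"http": url, "https": url}


# Same value as the hard-coded mapping in the script:
proxy = build_proxies(1088)

# Typical use with requests (requires `pip install requests[socks]`):
#   import requests
#   response = requests.get(some_url, proxies=proxy, timeout=30)
```

If pages fail to load because the forum's hostname cannot be resolved locally, `build_proxies(1088, remote_dns=True)` may help, since name resolution then happens on the proxy side.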

Set a reasonable number of threads
work_manager=ThreadManager(8)

Change the number of topic list pages to crawl
while offset<10: # number of topic list pages
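To show how the thread count and the page limit interact, here is a rough sketch that uses the standard library's `ThreadPoolExecutor` as a stand-in for the script's `ThreadManager`, whose implementation is not shown here. The names `page_numbers` and `run_in_pool` are hypothetical, and the sketch assumes Python 3 and that pages are numbered from 1, as the project description suggests.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_PAGES = 10     # mirrors `while offset<10`
THREAD_COUNT = 8   # mirrors `ThreadManager(8)`


def page_numbers(max_pages):
    """Yield topic list page numbers 1..max_pages (assumed numbering)."""
    offset = 0
    while offset < max_pages:
        offset += 1
        yield offset


def run_in_pool(task, items, thread_count=THREAD_COUNT):
    """Run task(item) for every item on a fixed-size thread pool.

    Results come back in the same order as the input items.
    """
    with ThreadPoolExecutor(max_workers=thread_count) as pool:
        return list(pool.map(task, items))


def fake_fetch(page):
    """Placeholder for the real download step; performs no network access."""
    return "page-{}".format(page)


results = run_in_pool(fake_fetch, page_numbers(MAX_PAGES))
```

Raising the thread count speeds up downloads only until the proxy or the remote server becomes the bottleneck, which is why a moderate value such as 8 is a sensible starting point.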

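Prebuilt binary:

Archive: dagaier.zip contains an exe that can be run directly on Windows by double-clicking.
Extract the exe from the archive before running it. Do not double-click it from inside the zip manager, or you will not be able to find the downloaded files after the crawler finishes.
Then start your SSR proxy, open Options -> Local Port, and enter 1088, because the program is hard-coded to go through the proxy at socks5://127.0.0.1:1088.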
[Screenshot: SSR options with the local port set to 1088]
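Downloaded files are stored in the images folder next to the exe, with one folder per post, named after the post title.
The program occasionally reports 404 warnings. This is normal; they are only DEBUG log output.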
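The folder layout and the handling of 404 responses described above could look roughly like the sketch below. It is an illustration under assumptions, not the script's actual logic: `safe_folder_name`, `image_path`, and `should_save` are hypothetical helpers, the set of characters replaced is the one Windows forbids in file names, and Python 3 is assumed.

```python
import logging
import os
import re

log = logging.getLogger("dagaier_sketch")

# Characters that Windows does not allow in file or folder names,
# plus line breaks and tabs that may appear in scraped titles.
_INVALID = re.compile(r'[\\/:*?"<>|\r\n\t]+')


def safe_folder_name(title):
    """Turn a post title into a name usable as a folder on any platform."""
    name = _INVALID.sub("_", title).strip(" .")
    return name or "untitled"


def image_path(base_dir, title, image_url):
    """Return <base_dir>/images/<post title>/<file name>, creating the folder."""
    folder = os.path.join(base_dir, "images", safe_folder_name(title))
    os.makedirs(folder, exist_ok=True)
    # Drop any query string before taking the last path component.
    filename = os.path.basename(image_url.split("?", 1)[0]) or "image"
    return os.path.join(folder, filename)


def should_save(status_code, image_url):
    """Log and skip missing images instead of aborting the whole crawl."""
    if status_code == 404:
        log.warning("404 for %s, skipping", image_url)
        return False
    return status_code == 200
```

Sanitizing the title matters mostly on Windows, where a title containing a character such as `:` or `?` would otherwise make folder creation fail.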

