qai41 / spider-press

This project forked from iofu728/spider

🐳 High-availability IP proxy solution ⛰ Uneven Pressure Distribution System 🕷 Some website spiders

License: MIT License

Python 100.00%

spider-press's Introduction

Spider Press Man

Highly Available Proxy IP Pool, Highly Concurrent Spider, Uneven Pressure Distribution System

Key

  • Highly Available Proxy IP Pool
    • Collects proxies from free proxy websites such as Gatherproxy, Goubanjia, and xici
    • Parses the Goubanjia port data
    • Quickly verifies IP availability
    • Works with Requests to assign proxy IPs automatically, with a retry mechanism and a mechanism that writes failures to the DB
  • Netease
    • classify -> playlist id -> song_detail
    • V1: writes to a file, single-run version, no proxy, no progress-recording mechanism
    • V1.5: uses a small number of proxy IPs
    • V2: proxy IP pool, records progress, writes to MySQL
      • Optimizes DB writes with Load data / Replace INTO
  • Press Test
    • Uses the highly available proxy IP pool to imitate users
    • Puts uneven pressure on a web service
    • To do: uniform pressure
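
The proxy assignment with retry described in the list above can be sketched roughly as follows. This is a minimal illustration, not the project's code: `ProxyPool`, `fetch_with_retry`, and the injected `fetch` callable are hypothetical names, the pool lives in memory instead of a database, and the `failed` list only stands in for the "write failures to the DB" step.

```python
import random
from typing import Callable, List, Optional


class ProxyPool:
    """In-memory stand-in for the proxy pool (hypothetical; the project uses a DB)."""

    def __init__(self, proxies: List[str]) -> None:
        self.good = list(proxies)
        self.failed: List[str] = []  # stand-in for failures recorded in the DB

    def pick(self) -> Optional[str]:
        """Return a random usable proxy, or None when the pool is empty."""
        return random.choice(self.good) if self.good else None

    def mark_failed(self, proxy: str) -> None:
        """Move a proxy from the usable list to the failed list."""
        if proxy in self.good:
            self.good.remove(proxy)
            self.failed.append(proxy)


def fetch_with_retry(url: str, pool: ProxyPool,
                     fetch: Callable[[str, str], str],
                     retries: int = 3) -> Optional[str]:
    """Try up to `retries` proxies, dropping each proxy whose request raises."""
    for _ in range(retries):
        proxy = pool.pick()
        if proxy is None:
            break
        try:
            return fetch(url, proxy)
        except Exception:
            pool.mark_failed(proxy)
    return None


# With the Requests library, `fetch` could be written as (assumption, not project code):
# def fetch(url, proxy):
#     import requests
#     return requests.get(url, proxies={"http": proxy, "https": proxy}, timeout=5).text
```

Passing the fetcher in as a callable keeps the retry logic testable without network access; a real pool would also re-verify proxies periodically.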

Development

$ git clone https://github.com/iofu728/spider.git
$ cd spider
$ ipython

# Inside IPython. Netease spider:
import netease.netease_music_db
xxx = netease.netease_music_db.Get_playlist_song()

# Press test (url, host, qps, types and total must be defined by the caller):
import press.press
xxx = press.press.Press_test()
xxx.one_press_attack(url, host, qps, types, total)
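
The README does not say how `one_press_attack` spreads `total` requests at a given `qps`. One possible reading of "uneven pressure" is sketched below: the total is split into per-second bursts of random size that average roughly `qps`. The function name `uneven_schedule` and the `seed` parameter are assumptions made for illustration, not part of the project.

```python
import random
from typing import List, Optional


def uneven_schedule(total: int, qps: int, seed: Optional[int] = None) -> List[int]:
    """Split `total` requests into per-second bursts averaging roughly `qps`.

    Each burst is drawn uniformly from 0..2*qps, so some seconds are idle and
    others carry up to twice the nominal rate.
    """
    if total < 0 or qps <= 0:
        raise ValueError("total must be >= 0 and qps must be > 0")
    rng = random.Random(seed)
    schedule: List[int] = []
    remaining = total
    while remaining > 0:
        burst = min(remaining, rng.randint(0, 2 * qps))
        schedule.append(burst)
        remaining -= burst
    return schedule


if __name__ == "__main__":
    # Print the first few per-second burst sizes for 1000 requests at ~50 qps.
    print(uneven_schedule(1000, 50, seed=1)[:10])
```

A driver would then send `schedule[i]` requests during second `i`, each through a proxy from the pool. Replacing the random draw with a constant `qps` would give the uniform pressure listed as a to-do item.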

Structure

.
├── LICENSE                        // LICENSE
├── README.md                      // README
├── log                            // failure log
├── netease
│   ├── netease_music_base.py      // v1 spider
│   ├── netease_music_db.py        // v2 spider
│   ├── result.txt                 // result
│   └── table.sql                  // netease sql
├── press
│   └── press.py                   // press
├── proxy
│   ├── gatherproxy                // gatherproxy data
│   ├── getproxy.py                // proxy pool
│   └── table.sql                  // proxy sql
├── song_detail
└── utils
    ├── db.py                      // db operation
    └── utils.py                   // requests operation
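
The V2 spider is described as optimizing database writes with Replace INTO. As a rough sketch of what a batched write helper in a module like `utils/db.py` might look like, the function below builds one multi-row `REPLACE INTO` statement with `%s` placeholders. The function name, table name, and column names are hypothetical, and the actual schema in `table.sql` is not shown here.

```python
from typing import List, Sequence, Tuple


def build_replace_into(table: str, columns: Sequence[str],
                       rows: Sequence[Sequence[object]]) -> Tuple[str, List[object]]:
    """Build one multi-row REPLACE INTO statement plus its flat parameter list.

    `table` and `columns` are interpolated directly, so they must come from
    trusted code, never from user input. Row values go through placeholders.
    """
    if not rows:
        raise ValueError("rows must not be empty")
    if any(len(row) != len(columns) for row in rows):
        raise ValueError("every row must have one value per column")
    row_placeholder = "(" + ", ".join(["%s"] * len(columns)) + ")"
    sql = "REPLACE INTO {} ({}) VALUES {}".format(
        table,
        ", ".join(columns),
        ", ".join([row_placeholder] * len(rows)),
    )
    params = [value for row in rows for value in row]
    return sql, params


if __name__ == "__main__":
    # Hypothetical table and columns, for illustration only.
    sql, params = build_replace_into(
        "playlist_song", ["playlist_id", "song_id"], [(1, 10), (1, 11)]
    )
    print(sql)
    print(params)
```

The resulting pair can be passed to a DB-API cursor as `cursor.execute(sql, params)` with a driver that uses `%s` placeholders. Sending many rows in one statement saves round trips compared with one `REPLACE` per row.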

Design document

