Highly Available Proxy IP Pool, Highly Concurrent Spider, Uneven Pressure Distribution System
Highly Available Proxy IP Pool
- Collects proxies from free proxy websites such as Gatherproxy, Goubanjia, and xici
- Parses the Goubanjia port data
- Quickly verifies IP availability
- Works with Requests to assign proxy IPs automatically, with a retry mechanism and a mechanism that writes failures to the DB
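The verification and retry steps listed above can be sketched roughly as follows. This is a minimal illustration, not the code in `proxy/getproxy.py`: the names `default_fetch`, `verify_proxies`, `get_with_retry`, and the `on_failure` callback are hypothetical, and the failure callback only stands in for whatever the repository writes to its DB.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def default_fetch(url, proxy, timeout=5):
    """Fetch url through one proxy with Requests (imported lazily)."""
    import requests
    resp = requests.get(url, proxies={"http": proxy, "https": proxy}, timeout=timeout)
    resp.raise_for_status()
    return resp.text


def verify_proxies(candidates, test_url, fetch=default_fetch, workers=50):
    """Check all candidates concurrently and keep the ones that answer."""
    def is_alive(proxy):
        try:
            fetch(test_url, proxy)
            return True
        except Exception:
            return False

    with ThreadPoolExecutor(max_workers=workers) as pool:
        alive = list(pool.map(is_alive, candidates))
    return [proxy for proxy, ok in zip(candidates, alive) if ok]


def get_with_retry(url, pool, fetch=default_fetch, retries=3, on_failure=None):
    """Try up to `retries` randomly chosen proxies; report the URL if all fail."""
    if pool:
        for _ in range(retries):
            proxy = random.choice(pool)
            try:
                return fetch(url, proxy)
            except Exception:
                continue  # this proxy failed, pick another one
    if on_failure is not None:
        on_failure(url)  # hypothetical hook, e.g. insert the URL into a failure table
    return None
```

Passing `fetch` as a parameter keeps the retry logic independent of the network, so it can be exercised with a fake fetcher before pointing it at real proxies.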
Netease
- classify -> playlist id -> song_detail
- V1: writes to a file, single-run version, no proxy, no progress recording
- V1.5: uses a small number of proxy IPs
- V2: proxy IP pool, progress recording, writes to MySQL
- Optimized the DB writes: LOAD DATA / REPLACE INTO
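As a rough illustration of the `REPLACE INTO` write optimization, the sketch below builds one multi-row statement instead of issuing one statement per row. The helper `build_replace_into`, the table name `song_detail`, and the column names are assumptions for illustration only; the real schema lives in `netease/table.sql`.

```python
def build_replace_into(table, columns, rows):
    """Return (sql, params) for one multi-row REPLACE INTO statement.

    Table and column names are interpolated directly, so they must come
    from trusted code, never from user input. Values use %s placeholders,
    the parameter style of common MySQL drivers such as PyMySQL.
    """
    if not rows:
        raise ValueError("rows must not be empty")
    row_placeholder = "(" + ", ".join(["%s"] * len(columns)) + ")"
    sql = "REPLACE INTO {} ({}) VALUES {}".format(
        table,
        ", ".join(columns),
        ", ".join([row_placeholder] * len(rows)),
    )
    # Flatten the rows so the values line up with the placeholders.
    params = [value for row in rows for value in row]
    return sql, params


# Hypothetical table and columns, for illustration only.
sql, params = build_replace_into(
    "song_detail", ["song_id", "song_name"], [(1, "a"), (2, "b")]
)
```

The resulting pair would typically be passed to a driver cursor as `cursor.execute(sql, params)`. `REPLACE INTO` deletes any existing row with the same primary or unique key before inserting, which makes re-running the spider idempotent for rows it has already stored.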
Press Test
- Uses the highly available proxy IP pool to imitate real users
- Puts uneven pressure on a web service
- To do: uniform pressure
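One possible reading of "uneven" versus "uniform" pressure is the spacing between requests at a target QPS. The sketch below is only that interpretation, not the logic of `press/press.py`: the function `send_offsets` and its parameters are hypothetical, and the uneven mode is assumed here to mean exponentially distributed gaps.

```python
import random


def send_offsets(qps, total, uneven=True, seed=None):
    """Return `total` send times (seconds from start) for a target rate of `qps`.

    uneven=True  -> exponential gaps with mean 1/qps (bursty traffic)
    uneven=False -> fixed gap of exactly 1/qps (uniform traffic)
    """
    rng = random.Random(seed)
    offsets = []
    now = 0.0
    for _ in range(total):
        offsets.append(now)
        now += rng.expovariate(qps) if uneven else 1.0 / qps
    return offsets


uniform = send_offsets(qps=4, total=4, uneven=False)
bursty = send_offsets(qps=4, total=100, uneven=True, seed=1)
```

A sender would sleep until each offset and then issue one request through a proxy from the pool; switching `uneven` off corresponds to the uniform mode listed as a to-do.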
$ git clone https://github.com/iofu728/spider.git
$ cd spider
$ ipython
# netease spider (run inside IPython)
import netease.netease_music_db
xxx = netease.netease_music_db.Get_playlist_song()
# press test (run inside IPython)
import press.press
xxx = press.press.Press_test()
xxx.one_press_attack(url, host, qps, types, total)
.
├── LICENSE // LICENSE
├── README.md // README
├── log // failure log
├── netease
│   ├── netease_music_base.py // v1 spider
│   ├── netease_music_db.py // v2 spider
│   ├── result.txt // result
│   └── table.sql // netease sql
├── press
│   └── press.py // press
├── proxy
│   ├── gatherproxy // gatherproxy data
│   ├── getproxy.py // proxy pool
│   └── table.sql // proxy sql
├── song_detail
└── utils
    ├── db.py // db operation
    └── utils.py // requests operation