

butian_urls

List of vendors on the Butian (补天) public-welfare SRC platform, crawled on 2020-06-19 (vendor name, plus domain or IP).

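The main problem encountered during the crawl was that Butian appears to have introduced a new anti-crawler policy. When requests arrive too quickly, the server replies with a piece of obfuscated JS code for the client to execute and return the result. The principle is roughly this: if the client is a browser, it naturally interprets the JS and sends back the verification information, whereas ordinary code that handles the server's response cannot automatically interpret and execute JS. This is how browsers are told apart from crawler code.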
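A crawler first has to notice that it received the JS challenge rather than real data. The check below is only a rough heuristic sketch, not the detection logic used in this repository: it assumes the challenge response contains a `<script>` tag and lacks some string that a genuine response would always contain. The function name `looks_like_js_challenge` and the marker `company_name` are hypothetical and chosen purely for illustration.

```python
import re

# Matches the start of a script tag, case-insensitively.
_SCRIPT_RE = re.compile(r"<script\b", re.IGNORECASE)


def looks_like_js_challenge(body: str, expected_marker: str) -> bool:
    """Heuristically decide whether a response body is a JS challenge.

    `expected_marker` is a string the caller expects in every genuine
    response (an assumption; pick it by inspecting real responses).
    """
    if expected_marker in body:
        # The expected data is present, so treat the response as genuine.
        return False
    # No expected data, but a script tag is present: likely a challenge.
    return bool(_SCRIPT_RE.search(body))


if __name__ == "__main__":
    # Hypothetical bodies, for illustration only.
    print(looks_like_js_challenge("<html><script>var a=1;</script></html>", "company_name"))
    print(looks_like_js_challenge('{"company_name": "example"}', "company_name"))
```

A genuine page that embeds scripts but happens to lack the marker would be misclassified, so the marker should be chosen after looking at real responses.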

A workaround for this can be found online: https://blog.csdn.net/qq_36783371/article/details/90760914. Of course, you can also simply slow down:

time.sleep(xxx)
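The sleep-based approach can be sketched as follows. This is a minimal illustration under stated assumptions, not the script actually used for this crawl: `crawl_throttled`, `fetch`, `is_blocked`, the default delay, and the doubling back-off are all hypothetical choices. The `fetch` callable stands in for whatever HTTP call is used, and `is_blocked` for whatever check recognizes the JS challenge.

```python
import time
from typing import Callable, Dict, Iterable, Optional


def crawl_throttled(
    urls: Iterable[str],
    fetch: Callable[[str], str],
    is_blocked: Callable[[str], bool],
    delay: float = 2.0,
    max_retries: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Optional[str]]:
    """Fetch each URL with a pause before every request.

    If a request raises or the response looks blocked, wait twice as
    long and retry, up to `max_retries` attempts per URL. URLs that
    never succeed are recorded as None so they can be excluded later.
    """
    results: Dict[str, Optional[str]] = {}
    for url in urls:
        body: Optional[str] = None
        wait = delay
        for _ in range(max_retries):
            sleep(wait)  # throttle before every attempt
            try:
                candidate = fetch(url)
            except Exception:
                candidate = None  # timeout or other error: retry
            if candidate is not None and not is_blocked(candidate):
                body = candidate
                break
            wait *= 2  # back off before the next attempt
        results[url] = body
    return results
```

Passing `sleep` as a parameter keeps the function easy to exercise without real waiting; in normal use the default `time.sleep` applies.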

Excluding entries that timed out or raised exceptions, the result set contains 4919 entries in total, as follows:

Data sample
