
googlecommonspider

A general-purpose Google Images crawler

Installation

Requires Python 3.6.

Download Python 3.6 from python.org.

Install selenium and requests with pip:

pip3 install selenium requests
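Before moving on, a small check script can confirm that the interpreter and both packages are in place. This is a minimal sketch: `python_ok`, `missing_packages` and `MIN_PYTHON` are hypothetical names, and it assumes that "requires Python 3.6" means 3.6 or later.

```python
import importlib.util
import sys

# Assumption: "requires Python 3.6" is read as "Python 3.6 or later".
MIN_PYTHON = (3, 6)


def python_ok(version=None) -> bool:
    """Return True if the given (or running) interpreter meets MIN_PYTHON."""
    if version is None:
        version = sys.version_info
    return tuple(version[:2]) >= MIN_PYTHON


def missing_packages(names=("selenium", "requests")) -> list:
    """Return the names of packages that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Python version OK:", python_ok())
    print("Missing packages:", missing_packages() or "none")
```

Saved as, say, `check_env.py` and run with `python3 check_env.py`, it lists any package that still needs to be installed with the pip3 command above.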

Download the Chrome driver

Download ChromeDriver from chromium.org

Usage

Setting up chromedriver

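Place the downloaded Chrome driver in the same directory as main.py. On Windows, make sure the driver's file name is chromedriver.exe. On other systems, keep the driver in the same directory as the script and change "chromedriver.exe" on line 37 of main.py to the file name of the chromedriver you downloaded.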
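The file-name rule above can be expressed as a small helper that picks the driver name for the current platform and resolves it next to main.py. This is a hypothetical sketch (`chromedriver_path` is not part of main.py), assuming you kept the default file name of the ChromeDriver download on each platform.

```python
import os
import sys


def chromedriver_path(script_dir: str, platform_name: str = sys.platform) -> str:
    """Return the expected chromedriver location inside script_dir."""
    # Windows builds of ChromeDriver are named chromedriver.exe; the Linux and
    # macOS builds are named chromedriver (assumes the default name was kept).
    if platform_name.startswith("win"):
        filename = "chromedriver.exe"
    else:
        filename = "chromedriver"
    return os.path.join(script_dir, filename)
```

Inside main.py it would be called as `chromedriver_path(os.path.dirname(os.path.abspath(__file__)))`, so the lookup does not depend on the directory you launch the script from.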

Configuring the URLs to crawl

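This crawler is built around Google Image Search. First, search Google for the keyword of the images you want, for example "vegetables". Click the Images tab and Google will open the image search page for that keyword; copy the URL of that page. Then, in the url_list list on line 40 of main.py, delete the two default dicts (they are demo configurations that crawl "vegetables" and "stale vegetables" and must be replaced with the URLs you want to crawl) and paste in the URL you just copied. The parameters are explained below: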

{
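    "url": "The URL to crawl, i.e. the address you pasted in.",
    "dir": "The folder the crawled images are saved in. For example, the demo uses fresh, so its results are saved under result/fresh."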
}
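Put together, a url_list with your own searches might look like the sketch below. The entries are placeholders shaped like a Google Images search URL; paste the exact address you copied instead. `output_dir` and `REQUIRED_KEYS` are hypothetical names: the helper only checks that each dict has both keys and shows which result/<dir> folder it maps to.

```python
import os

# Placeholder entries (hypothetical): paste the address you copied from the
# Google Images results page into "url" and choose your own folder for "dir".
url_list = [
    {"url": "https://www.google.com/search?q=vegetables&tbm=isch", "dir": "vegetables"},
    {"url": "https://www.google.com/search?q=fruit&tbm=isch", "dir": "fruit"},
]

REQUIRED_KEYS = {"url", "dir"}


def output_dir(entry: dict, root: str = "result") -> str:
    """Return the folder that images for one url_list entry are saved to."""
    missing = REQUIRED_KEYS - set(entry)
    if missing:
        raise KeyError(f"url_list entry is missing keys: {sorted(missing)}")
    return os.path.join(root, entry["dir"])
```

Keeping one dict per search means each keyword's images land in their own folder, e.g. result/vegetables and result/fruit for the placeholders above.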

Configuring the proxy

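Since Google cannot be reached from mainland China without one, a proxy is essential for crawling Google Images. Line 18 of main.py sets the proxy used by requests, and line 26 sets the proxy used by selenium; fill in the proxy address that matches your machine.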
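A minimal sketch of the two proxy settings, assuming a local HTTP proxy: requests takes a `proxies` mapping keyed by URL scheme, while Chrome takes a `--proxy-server` command-line switch. `PROXY` and both helper names are hypothetical, and main.py may set these values in a different form.

```python
# Placeholder (hypothetical): replace with the proxy address on your machine.
PROXY = "http://127.0.0.1:1080"


def requests_proxies(proxy: str) -> dict:
    """Build the mapping passed to requests as proxies=..."""
    # requests chooses the proxy by the target URL's scheme, so map both
    # http and https (Google pages and images are served over https).
    return {"http": proxy, "https": proxy}


def chrome_proxy_argument(proxy: str) -> str:
    """Build Chrome's --proxy-server switch for selenium's ChromeOptions."""
    return f"--proxy-server={proxy}"
```

The results would be used as `requests.get(url, proxies=requests_proxies(PROXY))` and, for selenium, `options = webdriver.ChromeOptions()` followed by `options.add_argument(chrome_proxy_argument(PROXY))` before the options are passed to `webdriver.Chrome(options=options)`.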

Running

Once everything above is configured, you can run the crawler. Results are saved in "result/<your configured folder>" in the same directory as main.py.

Run:

python main.py
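To make the output layout concrete, here is a hypothetical helper that writes one downloaded image into result/<dir> under a base directory (main.py's folder), creating the folder on first use. `save_image` and the numeric .jpg file names are assumptions for illustration, not necessarily what main.py does.

```python
import os


def save_image(content: bytes, dir_name: str, index: int, base_dir: str = ".") -> str:
    """Write one image to <base_dir>/result/<dir_name>/<index>.jpg and return its path."""
    target_dir = os.path.join(base_dir, "result", dir_name)
    # Create result/<dir_name> on first use; later calls reuse it.
    os.makedirs(target_dir, exist_ok=True)
    path = os.path.join(target_dir, f"{index}.jpg")
    with open(path, "wb") as f:
        f.write(content)
    return path
```

Passing the folder containing main.py as `base_dir` reproduces the "result/<your configured folder>" layout described above regardless of where the script is launched from.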

Dependencies

selenium (GitHub)

requests (GitHub)

License

MIT

Contributors

vaskka
