
# girlCrawler

### A crawler for the images on http://www.girl13.com, with the following features:

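  • Crawls the image lists under every theme on the site
  • Creates a local folder for each theme
  • Downloads the crawled images into the matching theme folder
  • On repeated runs, checks whether each image file already exists; existing files are skipped and only new images are downloaded, which saves bandwidth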
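The per-theme folders and the skip-if-exists check from the last feature can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in `girl.js`: the helper names `localPathFor` and `shouldDownload` are hypothetical, and naming each file after the last segment of its image URL is an assumed convention.

```javascript
const fs = require("fs");
const path = require("path");

// Hypothetical helper: returns the local path for an image under its theme folder.
// fs.mkdirSync with { recursive: true } creates the folder if needed and does
// nothing if it already exists.
function localPathFor(rootDir, theme, imageUrl) {
  const dir = path.join(rootDir, theme);
  fs.mkdirSync(dir, { recursive: true });
  // Assumption: the file name is the last segment of the URL path.
  const fileName = path.basename(new URL(imageUrl).pathname);
  return path.join(dir, fileName);
}

// Hypothetical helper: download only when the target file is not already on disk.
function shouldDownload(filePath) {
  return !fs.existsSync(filePath);
}
```

Checking the file system on every run, rather than keeping a separate record of finished downloads, means the local folders themselves are the only state the crawler needs between runs.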

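### girlCrawler is built mainly on the following libraries:

  • Node.js - the JavaScript runtime that runs the crawler
  • cheerio - a fast, flexible, and lean implementation of core jQuery designed specifically for the server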

### Installation and startup

  1. Install Node.js.

  2. Clone the project to your machine.

     >git clone https://github.com/xuelangcxy/girlCrawler.git
    
  3. From the project's root directory, run the main script:

     >node girl.js
    

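### Known issues

  1. The download sometimes stops partway through a run. Press Ctrl+C to stop the process, then start the project again.
  2. After the download finishes, some images may not open because their file size is 0. Delete these files and run the project again.
  3. Running the project again does not re-download files that already exist.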
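The manual cleanup in issue 2 can be automated with a small script. This is a minimal sketch, assuming the images are stored as regular files inside theme subfolders under a single download directory; `findEmptyFiles` and `removeEmptyFiles` are hypothetical names and are not part of the project.

```javascript
const fs = require("fs");
const path = require("path");

// Recursively collect every regular file of size 0 under `dir`.
function findEmptyFiles(dir) {
  const result = [];
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const full = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      result.push(...findEmptyFiles(full));
    } else if (entry.isFile() && fs.statSync(full).size === 0) {
      result.push(full);
    }
  }
  return result;
}

// Delete the zero-byte files and return the list of removed paths.
function removeEmptyFiles(dir) {
  const empty = findEmptyFiles(dir);
  for (const file of empty) {
    fs.unlinkSync(file);
  }
  return empty;
}
```

For example, call `removeEmptyFiles("./images")` before running `node girl.js` again; `./images` is an assumed path, so substitute the folder where the crawler actually saves its theme folders. Because the deleted files no longer exist, the skip-if-exists check lets the next run download them again.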

### Note

The total number of images is large: in testing, the download came to roughly 350-400 MB, so take this into account before you start.

Contributors: xuelangcxy