jdbee's Introduction

JdBee

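Scrapes JD.com data using jsoup.

For learning and exchange only. Anyone who uses it for other purposes does so at their own risk!

Currently only snack-related data is scraped, because that is all that is needed right now. Other categories may be considered later.

The snack data is scraped to support further development of the vipsnacks project.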

Requirements

  • httpclient
  • jsoup
  • slf4j
  • selenium
  • phantomjs
  • WebCollector

Changelog

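  • Initialized the project; finished scraping first- and second-level categories (2017-05-24)
  • Used selenium to fetch page data and scrape third-, fourth-, and fifth-level categories (2017-05-25)
  • Multi-threaded concurrent crawling of paginated category data (2017-05-26)
  • Multi-threaded crawling of product skuid values (2017-05-28)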
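
The multi-threaded crawling of paginated category data mentioned in the entries above could look roughly like the following. This is a minimal sketch using only the Java standard library, not the project's actual code: the class name `CategoryPageCrawler`, the method `crawlPages`, and the `fetchPage` callback are all hypothetical, and the callback stands in for whatever really downloads and parses one page (for example with jsoup or selenium).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.IntFunction;

// Hypothetical helper: fans category pages out to a fixed-size thread pool.
public class CategoryPageCrawler {

    // fetchPage is an assumed callback that returns the sku ids found on one page.
    public static List<String> crawlPages(int pageCount, int threads,
                                          IntFunction<List<String>> fetchPage) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Submit one task per page (pages are numbered from 1).
            List<Future<List<String>>> futures = new ArrayList<>();
            for (int page = 1; page <= pageCount; page++) {
                final int p = page;
                Callable<List<String>> task = () -> fetchPage.apply(p);
                futures.add(pool.submit(task));
            }
            // Read the futures in submission order, so results stay in page order.
            List<String> skuIds = new ArrayList<>();
            for (Future<List<String>> future : futures) {
                skuIds.addAll(future.get());
            }
            return skuIds;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("crawl interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("page fetch failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Fake fetcher for illustration; a real one would request and parse the page.
        List<String> ids = crawlPages(3, 2, page -> Arrays.asList("sku-" + page));
        System.out.println(ids);
    }
}
```

Because the futures are read back in the order they were submitted, the combined list follows page order even though the pages are fetched concurrently. The pool size is the main tuning knob here; the value 2 in `main` is arbitrary.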

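Crawling with selenium is too slow, and it has to open a web page every time. It is usable for small amounts of data, but it cannot cope with large volumes. I am currently looking for another crawling approach.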

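  • Crawled products using WebCollector + selenium + phantomjs (2017-06-01, tested on a single category only)
  • Tested writing data to the database (2017-06-02)
  • Test-crawled one small category: 200,000 records in 21 minutes (2017-06-03)
  • Data written to the database correctly; 285,330 records crawled (2017-06-04)
  • Optimized the product-fetching code: fetching one page of products went from 19,664 ms to about 7,000 ms (2017-06-07)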
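
To put the figures in the entries above into perspective, the arithmetic can be sketched as below. The class `CrawlStats` and its method names are hypothetical and exist only for this illustration. The inputs are the numbers reported in the changelog, and the 7,000 ms value is itself approximate, so the results are rough: about 159 records per second for the 2017-06-03 test run, and a speedup of about 2.8x per page after the 2017-06-07 optimization.

```java
import java.util.Locale;

// Hypothetical helper for working out the changelog's throughput figures.
public class CrawlStats {

    // Average records crawled per second over a run of the given length.
    public static double recordsPerSecond(long records, double minutes) {
        return records / (minutes * 60.0);
    }

    // How many times faster one page is fetched after the optimization.
    public static double speedup(double beforeMs, double afterMs) {
        return beforeMs / afterMs;
    }

    public static void main(String[] args) {
        // 200,000 records in 21 minutes (2017-06-03 entry).
        System.out.println(String.format(Locale.ROOT, "%.1f records/s",
                recordsPerSecond(200_000, 21)));
        // 19,664 ms down to roughly 7,000 ms per page (2017-06-07 entry).
        System.out.println(String.format(Locale.ROOT, "%.1fx faster",
                speedup(19_664, 7_000)));
    }
}
```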

If you find this useful, a star, watch, or fork would be an encouragement to me.

jdbee's People

Contributors

handexing