scrapy-amazon (Amazon crawler)
==========================

An Amazon crawler built on Scrapy.

Targets Python 3 by default; Python 2 is untested.

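  • Crawls the mobile version of Amazon by default
  • Collects all products matching a given Amazon search keyword by default
  • Collected fields include product name, link, image URL, ASIN, product description, reviews, and more
  • Stores the scraped data in a MongoDB database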
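As an illustration of the last two points, here is a minimal sketch of an item and a MongoDB pipeline. It is not the project's actual code: the `ProductItem` fields, the `MongoPipeline` class, the `MONGO_URI` and `MONGO_DATABASE` setting names, and the `products` collection are all assumptions chosen for the example. It relies on Scrapy accepting dataclass items (Scrapy 2.2 and later) and on pymongo being installed.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class ProductItem:
    """Hypothetical item covering the fields listed above."""
    asin: str
    title: str
    url: str
    image_url: str = ""
    description: str = ""
    reviews: list = field(default_factory=list)


class MongoPipeline:
    """Sketch of an item pipeline that upserts products by ASIN."""

    def __init__(self, mongo_uri, mongo_db, collection_name="products"):
        self.mongo_uri = mongo_uri
        self.mongo_db = mongo_db
        self.collection_name = collection_name
        self.client = None
        self.collection = None

    @classmethod
    def from_crawler(cls, crawler):
        # MONGO_URI and MONGO_DATABASE are assumed setting names.
        return cls(
            crawler.settings.get("MONGO_URI", "mongodb://localhost:27017"),
            crawler.settings.get("MONGO_DATABASE", "amazon"),
        )

    def open_spider(self, spider):
        import pymongo  # imported lazily so the module loads without it

        self.client = pymongo.MongoClient(self.mongo_uri)
        self.collection = self.client[self.mongo_db][self.collection_name]

    def close_spider(self, spider):
        if self.client is not None:
            self.client.close()

    def process_item(self, item, spider):
        doc = asdict(item)
        # Upsert on ASIN so re-crawling a product updates it in place.
        self.collection.update_one(
            {"asin": doc["asin"]}, {"$set": doc}, upsert=True
        )
        return item
```

A pipeline like this would be enabled through `ITEM_PIPELINES` in `settings.py`, for example `{"myproject.pipelines.MongoPipeline": 300}`, where `myproject.pipelines` is a placeholder module path.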

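Additional features in the closed-source version

  • Supports crawling Amazon sites for different countries (Amazon US, Amazon Japan, and so on)
  • Supports routing requests through specified proxy IPs to reduce the chance of hitting Amazon's Robot Check
  • Saves crawl and publish logs to files for easy lookup
  • Integrates Baidu Translate, Youdao Translate, and Tencent Translate, with configurable languages, to produce rewritten ("pseudo-original") content
  • Cleans and rewrites the scraped data and publishes it to WordPress in one step (with a featured image)
  • Deduplicates published posts to reduce the chance of the site being penalized by search engines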
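The closed-source code is not available, so the following is only a sketch of how per-country crawling could be approached, not the author's implementation. The `MARKETPLACE_DOMAINS` mapping and `build_search_url` helper are hypothetical names, and the `k` and `page` query parameters are assumptions based on Amazon's public search URLs.

```python
from urllib.parse import urlencode

# Hypothetical mapping from a short country code to an Amazon domain.
MARKETPLACE_DOMAINS = {
    "us": "www.amazon.com",
    "jp": "www.amazon.co.jp",
}


def build_search_url(keyword, marketplace="us", page=1):
    """Build a keyword search URL for the chosen marketplace."""
    try:
        domain = MARKETPLACE_DOMAINS[marketplace]
    except KeyError:
        raise ValueError(f"unknown marketplace: {marketplace!r}") from None
    # urlencode escapes spaces and non-ASCII characters in the keyword.
    query = urlencode({"k": keyword, "page": page})
    return f"https://{domain}/s?{query}"


print(build_search_url("usb cable", "jp", page=2))
# https://www.amazon.co.jp/s?k=usb+cable&page=2
```

A spider could take the marketplace code as a spider argument and yield requests for the URLs this helper returns.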

Note: to avoid being blocked, it is recommended that you supply your own IP pool and rotate the User-Agent randomly.
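One way to follow this advice is a small downloader middleware. The sketch below is an assumption about how it could be done, not code from this project: the class name and the `USER_AGENT_POOL` and `PROXY_POOL` setting names are hypothetical. It relies on Scrapy's built-in `HttpProxyMiddleware` honoring `request.meta["proxy"]`.

```python
import random


class RandomUserAgentProxyMiddleware:
    """Pick a random User-Agent and proxy for every outgoing request."""

    def __init__(self, user_agents, proxies=None):
        self.user_agents = list(user_agents)
        self.proxies = list(proxies or [])

    @classmethod
    def from_crawler(cls, crawler):
        # USER_AGENT_POOL and PROXY_POOL are assumed, project-defined settings.
        return cls(
            user_agents=crawler.settings.getlist("USER_AGENT_POOL"),
            proxies=crawler.settings.getlist("PROXY_POOL"),
        )

    def process_request(self, request, spider):
        if self.user_agents:
            request.headers["User-Agent"] = random.choice(self.user_agents)
        if self.proxies:
            # Example value: "http://127.0.0.1:8080"
            request.meta["proxy"] = random.choice(self.proxies)
        return None  # let Scrapy continue processing the request
```

To enable it, register the class in `DOWNLOADER_MIDDLEWARES` with an order lower than the built-in User-Agent and proxy middlewares, for example `{"myproject.middlewares.RandomUserAgentProxyMiddleware": 400}`, where `myproject.middlewares` is a placeholder module path.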

Screenshots

Sample data

Contact the author

QQ: 1498066696. Replies are infrequent, so please open an issue directly.

Contributors

ofzfzs
