kekewind / 1688-selenium-spider

This project forked from resphinas/1688-selenium-spider

A 1688 crawler that searches by keyword and uses selenium to crawl a specified number of pages of product information. If you like it, please give it a star (^U^)ノ~YO

Python 100.00%

1688-selenium-spider's Introduction

1688 crawler (based on selenium)

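An Alibaba (1688) crawler. It searches by keyword and uses selenium plus a selenium log hook to crawl a specified number of pages of product information. The log hook is the highlight of the project (look up the relevant functions in the code yourself): it listens to every request, including dynamic ajax requests. The collected data includes the company name, the five item scores, the overall score, the price, all product images, and the product specifications. Sizes are not implemented yet.

Known limitation: CAPTCHAs. An IP-switching feature has already been written, but no suitable IP pool has been found so far. If you need it, uncomment the relevant code to enable it and edit the contents of ip.txt. Proxies are connected directly by http/https address plus port. Under frequent crawling, a CAPTCHA typically appears about once every ten products. The current workaround is to switch IP manually on the computer and refresh the page. For small volumes the impact is minor.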
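The log hook described above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: it assumes Chrome with the `goog:loggingPrefs` performance log enabled, and the helper names `chrome_options_with_performance_log` and `extract_responses` are hypothetical. Each performance log entry carries a JSON string holding a DevTools event, and `Network.responseReceived` events cover ajax responses as well as ordinary page loads.

```python
import json


def chrome_options_with_performance_log():
    """Build Chrome options that record DevTools network events (hypothetical helper)."""
    # Imported lazily so the parser below can be used without selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.set_capability("goog:loggingPrefs", {"performance": "ALL"})
    return options


def extract_responses(log_entries, url_keyword=""):
    """Return (url, status, request_id) for each Network.responseReceived event.

    log_entries is the list returned by driver.get_log("performance").
    url_keyword is an optional substring filter; the value to use for 1688
    ajax endpoints is not given in this README, so it is left to the caller.
    """
    found = []
    for entry in log_entries:
        try:
            message = json.loads(entry["message"])["message"]
        except (KeyError, ValueError, TypeError):
            continue  # skip malformed entries instead of aborting the crawl
        if message.get("method") != "Network.responseReceived":
            continue
        params = message.get("params", {})
        response = params.get("response", {})
        url = response.get("url", "")
        if url_keyword in url:
            found.append((url, response.get("status"), params.get("requestId")))
    return found
```

A possible usage, under the same assumptions: create the driver with `webdriver.Chrome(options=chrome_options_with_performance_log())`, load a page, then call `extract_responses(driver.get_log("performance"), url_keyword=...)`. The returned request id can then be passed to `driver.execute_cdp_cmd("Network.getResponseBody", {"requestId": request_id})` to read the ajax payload.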

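""" author: wes; createtime: 2022.03.22

Project overview: crawl 1688 keyword names and the corresponding product counts, with data for 60 products per page. For each product:
A. Title (and red title) and link
B. Five item scores and overall score
C. Repurchase rate
D. Transaction volume
E. Price
F. Company name
G. Page link
"""

""" author: wes; updatetime: 2022.04.22

Version 3 update notes: optimized the program so that it runs to completion. Possible future improvements: a login method more efficient than QR-code scanning, and the IP verification problem (IP verification currently has to be handled manually).
"""

""" Version 2 update notes: fixed the bug where an empty return/exchange experience score was stored as -1; fixed the transaction volume being displayed incorrectly; tidied up the code.
"""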
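Until the IP verification problem is solved, the IP-switching option mentioned in the introduction (proxies listed in ip.txt, connected directly by address plus port) could look something like the sketch below. The file layout is an assumption: one `host:port` entry per line, with blank lines and `#` comments ignored. The function names are hypothetical and are not taken from the repository.

```python
import itertools
from pathlib import Path


def parse_proxies(text):
    """Parse 'host:port' entries, one per line (assumed ip.txt layout)."""
    proxies = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            proxies.append(line)
    return proxies


def load_proxies(path="ip.txt"):
    """Read and parse the proxy list from disk."""
    return parse_proxies(Path(path).read_text(encoding="utf-8"))


def proxy_argument(proxy, scheme="http"):
    """Format Chrome's --proxy-server command-line switch for one proxy."""
    return f"--proxy-server={scheme}://{proxy}"


def rotate(proxies):
    """Cycle through the proxies endlessly; refuse an empty list."""
    if not proxies:
        raise ValueError("ip.txt contains no usable proxies")
    return itertools.cycle(proxies)
```

With selenium, the switch would be applied through `options.add_argument(proxy_argument(next(rotation)))` before the driver is created. Chrome reads the switch at startup, so moving to the next proxy means starting a new driver.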

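1. Project name: keyword product information crawler for 1688.com

2. Requirements analysis

A. Analyze the ajax links of the product pages and store the data found under them in {keyword}_{sort_type}.csv (the Python crawler script regenerates this file automatically on every run)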
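The storage step can be illustrated with a short sketch. It assumes one row per product with columns mirroring items A to G of the project overview. The field names, `ProductRecord`, and the helper functions are hypothetical, and the real script's column layout may differ. The file is opened in write mode, so each run regenerates it.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from pathlib import Path


@dataclass
class ProductRecord:
    """One crawled product; field names are illustrative, not the repo's."""
    title: str
    link: str
    item_scores: str
    overall_score: str
    repurchase_rate: str
    turnover: str
    price: str
    company: str
    page_url: str


def csv_name(keyword, sort_type):
    """Build the output file name in the {keyword}_{sort_type}.csv pattern."""
    return f"{keyword}_{sort_type}.csv"


def to_csv_text(records):
    """Serialize records to CSV text with a header row."""
    buffer = io.StringIO(newline="")
    names = [field.name for field in fields(ProductRecord)]
    writer = csv.DictWriter(buffer, fieldnames=names)
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buffer.getvalue()


def write_records(records, keyword, sort_type, directory="."):
    """Overwrite the CSV for this keyword and sort type, then return its path."""
    path = Path(directory) / csv_name(keyword, sort_type)
    # utf-8-sig is an assumption so that spreadsheet tools detect the
    # encoding of Chinese text; newline="" leaves line endings to csv.
    with open(path, "w", newline="", encoding="utf-8-sig") as handle:
        handle.write(to_csv_text(records))
    return path
```

Keeping serialization (`to_csv_text`) separate from file output makes the row format easy to check without touching the disk.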

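3. Main code implementation

4. Other notes: A. Run the main file directly

5. Testing: cookies expire easily. Rotating logins across multiple accounts is being considered for later; for now, login has to be done manually.

Configuration: omitted. Dependencies: see the requirements.txt file in the project directory. Open a command prompt (DOS window) in the current directory and run pip install -r requirements.txt

This code is provided only for personal reference, for exchanging ideas and discussing better approaches.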
