kangvcar / infospider

INFO-SPIDER is a crawler toolbox 🧰 that brings many data sources together in one place, designed to help users take back their own data safely and quickly; the code is open source and the process is transparent. Supported data sources include GitHub, QQ Mail, NetEase Mail, Alibaba Mail, Sina Mail, Hotmail, Outlook, JD.com, Taobao, Alipay, China Mobile, China Unicom, China Telecom, Zhihu, Bilibili, NetEase Cloud Music, QQ friends, QQ groups, WeChat Moments album generation, browser history, 12306, Cnblogs, CSDN blog, OSChina blog, and Jianshu.

Home Page: https://infospider.vercel.app

License: GNU General Public License v3.0

Python 66.67% Shell 0.06% Jupyter Notebook 2.73% HTML 14.20% CSS 0.22% JavaScript 16.12%
python3 crawl spider selenium wxpython tkinter automation hotmail chrome csdn

infospider's Introduction

InfoSpider logo


GitHub stars | GitHub repo size

A magic toolbox to take back your personal information.

👉⚡ Usage Guide ⚡ | Video Demo | English | Get the latest maintained version | TG group

🗣️ TG group: join the group

Developer's Memoir

Click to expand 👉 Developer's Memoir

Scenario 1

Xiao Ming is browsing forums in Chrome as usual when he accidentally clicks an ad and is redirected to JD.com. As he reaches to close the window, he pauses (inner voice: huh? How does JD know exactly which item I've been longing for? It's just what I need!). Since the page is open anyway, he checks the product details (inner voice: not bad at all) and decides to place an order.

Scenario 2

Xiao Bai is hooked on NetEase Cloud Music's daily recommendation playlist (inner voice: wow, how is every track in this playlist exactly my style? NetEase Cloud Music is amazing, it really gets me, I have to get the Vinyl VIP membership!), then browses Zhihu questions like "How to elegantly do XXX?", "What is it like to XXX?" and "How would you evaluate XXX?" (inner voice: hey, that's exactly the question I wanted to ask, and someone already asked it! What?! Thousands of answers?! Let's dive in!).

Scenario 3

Xiao Da keeps learning on the job, browsing the big tech communities such as Cnblogs, CSDN, OSChina, Jianshu and Juejin, and finds the homepage recommendations spot on (inner voice: these posts are great, and I didn't even have to search for them). Opening his own blog, he realizes he has now been writing for three years and his tech stack keeps growing (inner voice: why doesn't the blog backend provide any data analysis? I want to see how many posts I've written over these years and when, which posts are popular, which technologies I've spent the most time on, and whether my past creative peaks were in the evening or in the small hours. I wish the system gave me more guiding data so I could write better!).

Looking at these scenarios, you might marvel at how advancing technology has greatly improved the way we live.

But think a little deeper: every website you browse and every website you register on is recording your information and your footprints.

The chilling part is that your personal data is laid bare on the internet, and many companies profit enormously from it, for example by collecting and analyzing user data to push targeted ads and charge high advertising fees, while you, the producer of that data, get no share of the proceeds.

The Idea

Imagine a tool that could help you take back your personal information, aggregate the personal data scattered across all kinds of sites, analyze that data and give you advice, and visualize it so you can understand yourself more clearly.

Would you need such a tool? Would you like such a tool?

With that in mind, I started building INFO-SPIDER 👇👇👇

What is INFO-SPIDER

INFO-SPIDER is a crawler toolbox that brings many data sources together in one place, designed to help users take back their own data safely and quickly; the code is open source and the process is transparent. It also provides data analysis, generating chart files from your data so you can understand your information more intuitively and deeply. Currently supported data sources include GitHub, QQ Mail, NetEase Mail, Alibaba Mail, Sina Mail, Hotmail, Outlook, JD.com, Taobao, Alipay, China Mobile, China Unicom, China Telecom, Zhihu, Bilibili, NetEase Cloud Music, QQ friends, QQ groups, WeChat Moments album generation, browser history, 12306, Cnblogs, CSDN blog, OSChina blog, and Jianshu.

For detailed instructions, see the usage documentation and the video tutorial.

You can chat and learn with us on Gitter.

Features

  • Safe and reliable: the project is open source with concise code; all source is visible and it runs locally.
  • Easy to use: a GUI is provided; just click the data source you want and follow the prompts.
  • Clear structure: every data source is independent of the others and highly portable; all spider scripts live in the project's Spiders directory.
  • Rich data sources: 24+ data sources are currently supported, with more on the way.
  • Unified data format: all crawled data is stored as JSON, which makes later analysis easy.
  • Rich personal data: the project crawls as much of your personal data as it can; you can prune it later as needed.
  • Data analysis: visual analysis of your personal data is provided, currently only for some sources.
  • Rich documentation: complete usage documentation and a video tutorial are included.

Screenshot

screenshot.png

QuickStart

Install the dependencies

  1. Install Python 3 and the Chrome browser

  2. Install a ChromeDriver that matches your Chrome version (a quick sanity check is sketched below)

  3. Install the Python dependencies: pip install -r requirements.txt

If you run into trouble at this step, you can get the installation-free build of InfoSpider instead.
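A simple way to verify step 2 is to start a Selenium session and print both versions. This is only an illustrative check, assuming chromedriver is on your PATH; it is not part of InfoSpider itself.

    # Illustrative sanity check: Selenium can start Chrome, and the driver
    # version matches the browser version (chromedriver must be on PATH).
    from selenium import webdriver

    driver = webdriver.Chrome()
    caps = driver.capabilities
    print("Chrome version:      ", caps.get("browserVersion"))
    print("ChromeDriver version:", caps.get("chrome", {}).get("chromedriverVersion"))
    driver.quit()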

Run the tool

  1. Enter the tools directory

  2. Run python3 main.py

  3. In the window that opens, click the button for a data source and choose a folder to save the data when prompted

  4. Enter your username and password in the browser that pops up; crawling starts automatically and the browser closes when it finishes

  5. The downloaded data (xxx.json) and the analysis charts (xxx.html) can be found in the folder you chose (see the sketch below)
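Because everything is saved as JSON, the results are easy to inspect with a few lines of Python. A minimal sketch, assuming the save folder chosen in step 3 (the actual file names depend on the data sources you selected):

    import json
    from pathlib import Path

    # Folder chosen in step 3; replace with your own save path.
    data_dir = Path(r"C:\path\to\your\save\folder")

    for json_file in data_dir.glob("*.json"):
        with json_file.open(encoding="utf-8") as f:
            records = json.load(f)
        # Show the file name and how many top-level items it holds.
        count = len(records) if isinstance(records, (list, dict)) else 1
        print(json_file.name, count)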

Paid Service

Limited availability... take a look

  1. The latest maintained version of InfoSpider
  2. More comprehensive personal data analysis
  3. No dependencies to install; convenient and beginner-friendly
  4. A pre-packaged program that runs with a double click
  5. A hands-on guide to packaging InfoSpider yourself
  6. One-on-one technical support from the developer
  7. A free upgrade to the upcoming 2.0 release after purchase

wechat
Purchase link

Data Sources

  • GitHub
  • QQ Mail
  • NetEase Mail
  • Alibaba Mail
  • Sina Mail
  • Hotmail
  • Outlook
  • JD.com
  • Taobao
  • Alipay
  • China Mobile
  • China Unicom
  • China Telecom
  • Zhihu
  • Bilibili
  • NetEase Cloud Music
  • QQ friends (cjh0613)
  • QQ groups (cjh0613)
  • WeChat Moments album generator
  • Browser history
  • 12306
  • Cnblogs
  • CSDN blog
  • OSChina blog
  • Jianshu

Data Analysis

  • Cnblogs
  • CSDN blog
  • OSChina blog
  • Jianshu
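According to the changelog, the analysis charts are drawn with pyecharts and written as HTML files into the data directory. A minimal illustrative sketch of that kind of output; the chart type, data and file name here are made up, not taken from the project:

    # Illustrative only: render a made-up "posts per year" bar chart as HTML,
    # the same kind of artifact the analysis step drops next to the data.
    from pyecharts import options as opts
    from pyecharts.charts import Bar

    chart = (
        Bar()
        .add_xaxis(["2018", "2019", "2020"])   # example years
        .add_yaxis("posts", [12, 34, 56])      # example counts
        .set_global_opts(title_opts=opts.TitleOpts(title="Posts per year"))
    )
    chart.render("cnblogs_analysis.html")      # hypothetical output file name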

Roadmap

  • Provide a web UI so the tool works across platforms
  • Run statistical analysis on the crawled personal data
  • Apply machine learning and natural language processing for deeper analysis
  • Plot the analysis results as charts for an intuitive view
  • Add more data sources...

Visitors

Developers want to say

  1. The project tackles the pain point that personal data is scattered across many different companies, forming data silos that keep multidimensional data from being combined.
  2. The author believes the project's greatest potential lies in fusing those multiple dimensions and analyzing the personal data, maximizing the value users get from their own data.
  3. Because the project gathers data by crawling, it is time-sensitive: it needs continuous maintenance and must be updated whenever the target sites change.
  4. The project has a clear structure; every data source is independent and highly portable, and all spider scripts live in the project's Spiders directory, so they can be ported into your own programs.
  5. The current v1.0 release has only been tested on Windows with Python 3.7 and has not been adapted to other platforms.
  6. A v2.0 rewrite is planned, providing a web UI and data visualization to support multiple platforms.
  7. The INFO-SPIDER code is open source; stars are welcome.

Contributors

Sponsors

Thank you to JetBrains for providing an Open Source License for PyCharm!

This repository is updated from time to time; to get the latest maintained version, please purchase to support the project. Thank you!

Changelog

Click to expand the Changelog
  • July 10, 2020

    1. Updated the GUI layout
    2. Added the GitHub, QQ friends and QQ groups data sources
  • July 12, 2020

    1. Fixed the QQ Mail, NetEase Mail, Alibaba Mail, Sina Mail, Hotmail and Outlook data sources
    2. Added the WeChat Moments album generator
  • July 14, 2020

    1. Fixed the JD.com, Taobao, Alipay and 12306 data sources
    2. Added Chrome browsing-history support
  • July 17, 2020

    1. Fixed the China Mobile and China Unicom data sources
    2. Added the Zhihu, Bilibili and NetEase Cloud Music data sources
  • July 19, 2020

    1. Added the Cnblogs, CSDN, OSChina and Jianshu data sources
    2. Wrote the usage documentation
    3. Recorded the video tutorial
  • July 30, 2020

    1. Added data analysis for Cnblogs
    2. Charts are drawn with pyecharts and saved as HTML files in the data directory
  • August 18, 2020

    1. Fixed some bugs
    2. Updated README.md
  • September 12, 2020

    1. Replaced the project logo
  • October 20, 2020

    1. Updated all spider scripts
    2. Built the Python-embed edition of InfoSpider
    3. Updated the logo
  • November 29, 2020

    1. Updated spider scripts

License

GPL-3.0

Star History

Star History Chart

infospider's People

Contributors

0ctl0, charleshua666, dependabot[bot], hzherrr, hzqmwne, kangvcar, mydatahomes, wuzhisheng, yarnauy, zhouhaocheng


infospider's Issues

Pretty nice, but why use Tk for the GUI? I'm curious.

I just saw this project in a WeChat public-account post. It's really nice, quite complete and good-looking. Speaking as a developer, the tool itself isn't something I'd use often, but for your stated goal of collecting your own personal information it makes perfect sense. Also, there's no real need to build the GUI with tkinter; Tk sometimes crashes and isn't great to work with. My suggestion would be to build a web front end instead. Good!

Taobao and Alipay are not well supported; an exception is thrown

Bug Report

Traceback (most recent call last):
  File "main.py", line 520, in OnClick
    t = TaobaoSpider(cookie_list)
  File "E:\my_work_spaces\pycharm\Self_learn_projs\Crawler_projs\InfoSpider-master./Spiders\taobao\spider.py", line 65, in __init__
    self.path = askdirectory(title='选择信息保存文件夹')
  File "G:\py37\lib\tkinter\filedialog.py", line 428, in askdirectory
    return Directory(**options).show()
  File "G:\py37\lib\tkinter\commondialog.py", line 39, in show
    w = Frame(self.master)
  File "G:\py37\lib\tkinter\__init__.py", line 2744, in __init__
    Widget.__init__(self, master, 'frame', cnf, {}, extra)
  File "G:\py37\lib\tkinter\__init__.py", line 2299, in __init__
    (widgetName, self._w) + extra + self._options(cnf))
RuntimeError: main thread is not in main loop
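"RuntimeError: main thread is not in main loop" is a general tkinter constraint rather than anything Taobao-specific: Tk widgets, including the askdirectory dialog, must be created on the main thread, while the spider here calls it from a worker thread. A minimal sketch of the usual workaround, not the project's actual fix: ask for the folder on the main thread first, then hand the path to the worker.

    # Sketch of the usual workaround: keep all tkinter calls on the main thread,
    # then pass the chosen path to the background thread that runs the spider.
    import threading
    import tkinter as tk
    from tkinter.filedialog import askdirectory

    def crawl(save_path: str) -> None:
        ...  # placeholder for the spider work; safe to run off the main thread

    root = tk.Tk()
    root.withdraw()                                    # only the dialog is needed
    path = askdirectory(title="Choose a save folder")  # main-thread tkinter call
    root.destroy()
    threading.Thread(target=crawl, args=(path,)).start()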

Are there plans to support Pinduoduo?


GITHUB

Hello, I have other projects to consult about. Please add my QQ 3374835496 or Skype live:.cid.b409052f6258136f

Is there no Weibo data source?


No module named 'Spiders'

Running main.py throws:
Traceback (most recent call last):
  File "main.py", line 32, in <module>
    from Spiders.A12306 import main12306
ModuleNotFoundError: No module named 'Spiders'

Adding a line sys.path.append(BASE_PATH) right after print(BASE_PATH) makes it run.
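For reference, the workaround in context; the BASE_PATH definition below is an assumption for illustration, and only the appended line and the failing import come from this issue.

    # In tools/main.py (sketch): put the project root on sys.path so that
    # `from Spiders... import ...` resolves when main.py is run from tools/.
    import os
    import sys

    BASE_PATH = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))  # assumed definition
    print(BASE_PATH)
    sys.path.append(BASE_PATH)  # the one-line workaround suggested above

    from Spiders.A12306 import main12306  # the import from the traceback now resolves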

The taobao spider doesn't seem to work; does taobao_cookies.json need to be replaced?


I ran into this error while using it

Traceback (most recent call last):
  File "D:\Working\Codes\InfoSpider\tools\main.py", line 34, in <module>
    from alipay.main import ASpider
ModuleNotFoundError: No module named 'alipay.main'

Error while installing dependencies, with a temporary workaround

Installing the first dependency fails with: UnicodeDecodeError: 'gbk' codec can't decode byte 0x93 in position 2621: illegal multibyte sequence
Workaround: change the pinned version
matplotlib==3.2.0 to matplotlib==3.6.0

There is also a prompt, apparently while installing numpy, asking for Microsoft C++ 14 or later.
Workaround: install conda first, then run
conda install libpython m2w64-toolchain -c msys2

Is there no Weibo data source?


Zhihu responds with "please upgrade the client and retry"

Bug Report

{"id":"c9b28ce4b50bf0444d17d010224cb06f","url_token":"houziliaorenwu","name":"猴子","use_default_avatar":false,"avatar_url":"https://pic1.zhimg.com/v2-12ef91a3f1e91e70bd3480d755e058b1_l.jpg?source=32738c0c","avatar_url_template":"https://picx.zhimg.com/v2-12ef91a3f1e91e70bd3480d755e058b1.jpg?source=32738c0c","is_org":false,"type":"people","url":"https://www.zhihu.com/api/v4/people/houziliaorenwu","user_type":"people","headline":"公中号(猴子数据分析)著有畅销书《数据分析思维》 科普**专家","headline_render":"公中号(猴子数据分析)著有畅销书《数据分析思维》科普**专家","gender":1,"is_advertiser":false,"ip_info":"IP 属地北京","vip_info":{"is_vip":true,"vip_type":1,"rename_days":"60","widget":{"id":"13017","url":"https://pic1.zhimg.com/v2-06ff79935442c7b0b2de8bde3529de2a.jpg?source=88ceefae","night_mode_url":"https://pic1.zhimg.com/v2-7cb817a30db30272a00bc17450a2ea79.jpg?source=88ceefae"},"entrance_v2":null,"rename_frequency":3,"rename_await_days":0},"available_medals_count":0,"is_realname":true,"has_applying_column":false}

{
    "error": {
        "code": 10002,
        "message": "10002:\u8bf7\u6c42\u53c2\u6570\u5f02\u5e38\uff0c\u8bf7\u5347\u7ea7\u5ba2\u6237\u7aef\u540e\u91cd\u8bd5"
    }
}

{
    "error": {
        "code": 10002,
        "message": "10002:\u8bf7\u6c42\u53c2\u6570\u5f02\u5e38\uff0c\u8bf7\u5347\u7ea7\u5ba2\u6237\u7aef\u540e\u91cd\u8bd5"
    }
}

{
    "error": {
        "code": 10002,
        "message": "10002:\u8bf7\u6c42\u53c2\u6570\u5f02\u5e38\uff0c\u8bf7\u5347\u7ea7\u5ba2\u6237\u7aef\u540e\u91cd\u8bd5"
    }
}

<html><title>404: Not Found</title><body>404: Not Found</body></html>
{"error":{"message":"请求参数异常,请升级客户端后重试","code":10003}}

{"data": []}

Business promotion cooperation request

Hello author, we are a professional IP proxy service provider, 极速HTTP. Registering and verifying an account comes with 10,000 free IPs (which your users could use for a free trial :). We would like to discuss a possible business promotion cooperation. If you are interested, please contact me on WeChat: 13982004324. Thanks (and if not, sorry for the interruption).

Is macOS not supported?

2020-08-27 11:04:51.534 Python[1657:25291] -[wxNSApplication _setup:]: unrecognized selector sent to instance 0x7fae66c372a0

Isn't this against the law?


[Feature request] Could Renren (人人网) be supported?


how to get all user facebook id


I get this error when installing with pip

value:InfoSpider:% pip install -r requirements.txt                     <master>
Defaulting to user installation because normal site-packages is not writeable
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
ERROR: Could not find a version that satisfies the requirement matplotlib==3.2.0 (from -r requirements.txt (line 1)) (from versions: 0.86, 0.86.1, 0.86.2, 0.91.0, 0.91.1, 1.0.1, 1.1.0, 1.1.1, 1.2.0, 1.2.1, 1.3.0, 1.3.1, 1.4.0, 1.4.1rc1, 1.4.1, 1.4.2, 1.4.3, 1.5.0, 1.5.1, 1.5.2, 1.5.3, 2.0.0b1, 2.0.0b2, 2.0.0b3, 2.0.0b4, 2.0.0rc1, 2.0.0rc2, 2.0.0, 2.0.1, 2.0.2, 2.1.0rc1, 2.1.0, 2.1.1, 2.1.2, 2.2.0rc1, 2.2.0, 2.2.2, 2.2.3, 2.2.4, 2.2.5, 3.0.0rc2, 3.0.0, 3.0.1, 3.0.2, 3.0.3)
ERROR: No matching distribution found for matplotlib==3.2.0 (from -r requirements.txt (line 1))
value:InfoSpider:%                                                       <master>

Got an error while installing dependencies

lib-3.2.0-cp38-cp38-win_amd64.whl
Downloading matplotlib-3.2.0-cp38-cp38-win_amd64.whl (9.2 MB)
|██████████▌ | 3.0 MB 4.7 kB/s eta 0:22:02
ERROR: Exception:
Traceback (most recent call last):
  File "c:\users\administrator\appdata\local\programs\python\python38\lib\site-packages\pip\_vendor\urllib3\response.py", line 437, in _error_catcher
    yield
  File "c:\users\administrator\appdata\local\programs\python\python38\lib\site-packages\pip\_vendor\urllib3\response.py", line 519, in read
    data = self._fp.read(amt) if not fp_closed else b""
  File "c:\users\administrator\appdata\local\programs\python\python38\lib\site-packages\pip\_vendor\cachecontrol\filewrapper.py", line 62, in read
    data = self.__fp.read(amt)
  File "c:\users\administrator\appdata\local\programs\python\python38\lib\http\client.py", line 454, in read
    n = self.readinto(b)
  File "c:\users\administrator\appdata\local\programs\python\python38\lib\http\client.py", line 498, in readinto
    n = self.fp.readinto(b)
  File "c:\users\administrator\appdata\local\programs\python\python38\lib\socket.py", line 669, in readinto
    return self._sock.recv_into(b)
  File "c:\users\administrator\appdata\local\programs\python\python38\lib\ssl.py", line 1241, in recv_into
    return self.read(nbytes, buffer)
  File "c:\users\administrator\appdata\local\programs\python\python38\lib\ssl.py", line 1099, in read
    return self._sslobj.read(len, buffer)
socket.timeout: The read operation timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "c:\users\administrator\appdata\local\programs\python\python38\lib\site-packages\pip\_internal\cli\base_command.py", line 228, in _main
    status = self.run(options, args)
  File "c:\users\administrator\appdata\local\programs\python\python38\lib\site-packages\pip\_internal\cli\req_command.py", line 182, in wrapper
    return func(self, options, args)
  File "c:\users\administrator\appdata\local\programs\python\python38\lib\site-packages\pip\_internal\commands\install.py", line 323, in run
    requirement_set = resolver.resolve(
  File "c:\users\administrator\appdata\local\programs\python\python38\lib\site-packages\pip\_internal\resolution\legacy\resolver.py", line 183, in resolve
    discovered_reqs.extend(self._resolve_one(requirement_set, req))
  File "c:\users\administrator\appdata\local\programs\python\python38\lib\site-packages\pip\_internal\resolution\legacy\resolver.py", line 388, in _resolve_one
    abstract_dist = self._get_abstract_dist_for(req_to_install)
  File "c:\users\administrator\appdata\local\programs\python\python38\lib\site-packages\pip\_internal\resolution\legacy\resolver.py", line 340, in _get_abstract_dist_for
    abstract_dist = self.preparer.prepare_linked_requirement(req)
  File "c:\users\administrator\appdata\local\programs\python\python38\lib\site-packages\pip\_internal\operations\prepare.py", line 467, in prepare_linked_requirement
    local_file = unpack_url(
  File "c:\users\administrator\appdata\local\programs\python\python38\lib\site-packages\pip\_internal\operations\prepare.py", line 255, in unpack_url
    file = get_http_url(
  File "c:\users\administrator\appdata\local\programs\python\python38\lib\site-packages\pip\_internal\operations\prepare.py", line 129, in get_http_url
    from_path, content_type = _download_http_url(
  File "c:\users\administrator\appdata\local\programs\python\python38\lib\site-packages\pip\_internal\operations\prepare.py", line 282, in _download_http_url
    for chunk in download.chunks:
  File "c:\users\administrator\appdata\local\programs\python\python38\lib\site-packages\pip\_internal\cli\progress_bars.py", line 168, in __iter__
    for x in it:
  File "c:\users\administrator\appdata\local\programs\python\python38\lib\site-packages\pip\_internal\network\utils.py", line 64, in response_chunks
    for chunk in response.raw.stream(
  File "c:\users\administrator\appdata\local\programs\python\python38\lib\site-packages\pip\_vendor\urllib3\response.py", line 576, in stream
    data = self.read(amt=amt, decode_content=decode_content)
  File "c:\users\administrator\appdata\local\programs\python\python38\lib\site-packages\pip\_vendor\urllib3\response.py", line 541, in read
    raise IncompleteRead(self._fp_bytes_read, self.length_remaining)
  File "c:\users\administrator\appdata\local\programs\python\python38\lib\contextlib.py", line 131, in __exit__
    self.gen.throw(type, value, traceback)
  File "c:\users\administrator\appdata\local\programs\python\python38\lib\site-packages\pip\_vendor\urllib3\response.py", line 442, in _error_catcher
    raise ReadTimeoutError(self._pool, None, "Read timed out.")
pip._vendor.urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='files.pythonhosted.org', port=443): Read timed out.

Installing dependencies

Error:

Using legacy 'setup.py install' for lxml, since package 'wheel' is not installed.
Installing collected packages: lxml, pyquery, certifi, chardet, idna, requests, Pillow, wxPython, pytz, pandas, future, pypng, pyqrcode, itchat, wxpy, soupsieve, beautifulsoup4
Running setup.py install for lxml ... error
ERROR: Command errored out with exit status 1:
command: 'd:\soft\python\python38\python.exe' -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\Users\Administrator\AppData\Local\Temp\pip-install-grl1th1g\lxml\setup.py'"'"'; __file__='"'"'C:\Users\Administrator\AppData\Local\Temp\pip-install-grl1th1g\lxml\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record 'C:\Users\Administrator\AppData\Local\Temp\pip-record-ohs8ihq1\install-record.txt' --single-version-externally-managed --compile --install-headers 'd:\soft\python\python38\Include\lxml'
cwd: C:\Users\Administrator\AppData\Local\Temp\pip-install-grl1th1g\lxml
Complete output (77 lines):
Building lxml version 4.3.3.
Building without Cython.
ERROR: b"'xslt-config' \xb2\xbb\xca\xc7\xc4\xda\xb2\xbf\xbb\xf2\xcd\xe2\xb2\xbf\xc3\xfc\xc1\xee\xa3\xac\xd2\xb2\xb2\xbb\xca\xc7\xbf\xc9\xd4\xcb\xd0\xd0\xb5\xc4\xb3\xcc\xd0\xf2\r\n\xbb\xf2\xc5\xfa\xb4\xa6\xc0\xed\xce\xc4\xbc\xfe\xa1\xa3\r\n"
** make sure the development packages of libxml2 and libxslt are installed **

Using build configuration of libxslt
running install
running build
running build_py
creating build
creating build\lib.win-amd64-3.8
creating build\lib.win-amd64-3.8\lxml
copying src\lxml\builder.py -> build\lib.win-amd64-3.8\lxml
copying src\lxml\cssselect.py -> build\lib.win-amd64-3.8\lxml
copying src\lxml\doctestcompare.py -> build\lib.win-amd64-3.8\lxml
copying src\lxml\ElementInclude.py -> build\lib.win-amd64-3.8\lxml
copying src\lxml\pyclasslookup.py -> build\lib.win-amd64-3.8\lxml
copying src\lxml\sax.py -> build\lib.win-amd64-3.8\lxml
copying src\lxml\usedoctest.py -> build\lib.win-amd64-3.8\lxml
copying src\lxml\_elementpath.py -> build\lib.win-amd64-3.8\lxml
copying src\lxml\__init__.py -> build\lib.win-amd64-3.8\lxml
creating build\lib.win-amd64-3.8\lxml\includes
copying src\lxml\includes\__init__.py -> build\lib.win-amd64-3.8\lxml\includes
creating build\lib.win-amd64-3.8\lxml\html
copying src\lxml\html\builder.py -> build\lib.win-amd64-3.8\lxml\html
copying src\lxml\html\clean.py -> build\lib.win-amd64-3.8\lxml\html
copying src\lxml\html\defs.py -> build\lib.win-amd64-3.8\lxml\html
copying src\lxml\html\diff.py -> build\lib.win-amd64-3.8\lxml\html
copying src\lxml\html\ElementSoup.py -> build\lib.win-amd64-3.8\lxml\html
copying src\lxml\html\formfill.py -> build\lib.win-amd64-3.8\lxml\html
copying src\lxml\html\html5parser.py -> build\lib.win-amd64-3.8\lxml\html
copying src\lxml\html\soupparser.py -> build\lib.win-amd64-3.8\lxml\html
copying src\lxml\html\usedoctest.py -> build\lib.win-amd64-3.8\lxml\html
copying src\lxml\html\_diffcommand.py -> build\lib.win-amd64-3.8\lxml\html
copying src\lxml\html\_html5builder.py -> build\lib.win-amd64-3.8\lxml\html
copying src\lxml\html\_setmixin.py -> build\lib.win-amd64-3.8\lxml\html
copying src\lxml\html\__init__.py -> build\lib.win-amd64-3.8\lxml\html
creating build\lib.win-amd64-3.8\lxml\isoschematron
copying src\lxml\isoschematron\__init__.py -> build\lib.win-amd64-3.8\lxml\isoschematron
copying src\lxml\etree.h -> build\lib.win-amd64-3.8\lxml
copying src\lxml\etree_api.h -> build\lib.win-amd64-3.8\lxml
copying src\lxml\lxml.etree.h -> build\lib.win-amd64-3.8\lxml
copying src\lxml\lxml.etree_api.h -> build\lib.win-amd64-3.8\lxml
copying src\lxml\includes\c14n.pxd -> build\lib.win-amd64-3.8\lxml\includes
copying src\lxml\includes\config.pxd -> build\lib.win-amd64-3.8\lxml\includes
copying src\lxml\includes\dtdvalid.pxd -> build\lib.win-amd64-3.8\lxml\includes
copying src\lxml\includes\etreepublic.pxd -> build\lib.win-amd64-3.8\lxml\includes
copying src\lxml\includes\htmlparser.pxd -> build\lib.win-amd64-3.8\lxml\includes
copying src\lxml\includes\relaxng.pxd -> build\lib.win-amd64-3.8\lxml\includes
copying src\lxml\includes\schematron.pxd -> build\lib.win-amd64-3.8\lxml\includes
copying src\lxml\includes\tree.pxd -> build\lib.win-amd64-3.8\lxml\includes
copying src\lxml\includes\uri.pxd -> build\lib.win-amd64-3.8\lxml\includes
copying src\lxml\includes\xinclude.pxd -> build\lib.win-amd64-3.8\lxml\includes
copying src\lxml\includes\xmlerror.pxd -> build\lib.win-amd64-3.8\lxml\includes
copying src\lxml\includes\xmlparser.pxd -> build\lib.win-amd64-3.8\lxml\includes
copying src\lxml\includes\xmlschema.pxd -> build\lib.win-amd64-3.8\lxml\includes
copying src\lxml\includes\xpath.pxd -> build\lib.win-amd64-3.8\lxml\includes
copying src\lxml\includes\xslt.pxd -> build\lib.win-amd64-3.8\lxml\includes
copying src\lxml\includes\__init__.pxd -> build\lib.win-amd64-3.8\lxml\includes
copying src\lxml\includes\etree_defs.h -> build\lib.win-amd64-3.8\lxml\includes
copying src\lxml\includes\lxml-version.h -> build\lib.win-amd64-3.8\lxml\includes
creating build\lib.win-amd64-3.8\lxml\isoschematron\resources
creating build\lib.win-amd64-3.8\lxml\isoschematron\resources\rng
copying src\lxml\isoschematron\resources\rng\iso-schematron.rng -> build\lib.win-amd64-3.8\lxml\isoschematron\resources\rng
creating build\lib.win-amd64-3.8\lxml\isoschematron\resources\xsl
copying src\lxml\isoschematron\resources\xsl\RNG2Schtrn.xsl -> build\lib.win-amd64-3.8\lxml\isoschematron\resources\xsl
copying src\lxml\isoschematron\resources\xsl\XSD2Schtrn.xsl -> build\lib.win-amd64-3.8\lxml\isoschematron\resources\xsl
creating build\lib.win-amd64-3.8\lxml\isoschematron\resources\xsl\iso-schematron-xslt1
copying src\lxml\isoschematron\resources\xsl\iso-schematron-xslt1\iso_abstract_expand.xsl -> build\lib.win-amd64-3.8\lxml\isoschematron\resources\xsl\iso-schematron-xslt1
copying src\lxml\isoschematron\resources\xsl\iso-schematron-xslt1\iso_dsdl_include.xsl -> build\lib.win-amd64-3.8\lxml\isoschematron\resources\xsl\iso-schematron-xslt1
copying src\lxml\isoschematron\resources\xsl\iso-schematron-xslt1\iso_schematron_message.xsl -> build\lib.win-amd64-3.8\lxml\isoschematron\resources\xsl\iso-schematron-xslt1
copying src\lxml\isoschematron\resources\xsl\iso-schematron-xslt1\iso_schematron_skeleton_for_xslt1.xsl -> build\lib.win-amd64-3.8\lxml\isoschematron\resources\xsl\iso-schematron-xslt1
copying src\lxml\isoschematron\resources\xsl\iso-schematron-xslt1\iso_svrl_for_xslt1.xsl -> build\lib.win-amd64-3.8\lxml\isoschematron\resources\xsl\iso-schematron-xslt1
copying src\lxml\isoschematron\resources\xsl\iso-schematron-xslt1\readme.txt -> build\lib.win-amd64-3.8\lxml\isoschematron\resources\xsl\iso-schematron-xslt1
running build_ext
building 'lxml.etree' extension
error: Microsoft Visual C++ 14.0 is required. Get it with "Build Tools for Visual Studio": https://visualstudio.microsoft.com/downloads/
----------------------------------------

ERROR: Command errored out with exit status 1: 'd:\soft\python\python38\python.exe' -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\Users\Administrator\AppData\Local\Temp\pip-install-grl1th1g\lxml\setup.py'"'"'; __file__='"'"'C:\Users\Administrator\AppData\Local\Temp\pip-install-grl1th1g\lxml\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record 'C:\Users\Administrator\AppData\Local\Temp\pip-record-ohs8ihq1\install-record.txt' --single-version-externally-managed --compile --install-headers 'd:\soft\python\python38\Include\lxml' Check the logs for full command output.

kobicoin.com


GITHUB

Brother, could you look into whether there's a way to route the mail server through a proxy? Mainly I need a way to make Linux route all traffic through a proxy, and then switch IPs from the command line, or perhaps some other approach where the mail server switches IPs. A working solution will be paid for. If you have one, please contact QQ 3374835496, email [email protected], or Skype live:.cid.b409052f6258136f.

Looking forward to a macOS version

This is a really nice idea. Could it be extended with keyword-based information search and summarization?
I hope macOS support arrives soon; I'm sure plenty of people are waiting for it as eagerly as I am.

About the Jianshu spider

If the author added a way to fetch data for a specific article, it might improve efficiency.

Looking at the current spider code, the data comes from the user's profile page, but getting it from the article page seems harder: I can't find the corresponding network request in the dev tools.

The fields to crawl are mainly these:

  • Jianshu diamonds (简书钻)
  • Views
  • Publish time
  • Likes
  • Comments

The last two are already solvable. The first three can be found in the HTML, but a plain GET doesn't return them and they don't show up in the network panel, so they are presumably filled in by requests issued from JS. I don't have JS skills, so I can't work through that code.

I've tentatively traced the request to the _app.js file, but I don't know exactly how it is issued; somehow it manages to hide the network request.

Finally, I have my own Jianshu crawler library, JianshuResearchTools on my profile, which also uses Requests and BeautifulSoup4; feel free to take a look, and a few PRs back would be even better.

Thanks to the developer.
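For the two fields that are already obtainable from the HTML, a generic Requests + BeautifulSoup4 sketch; the URL and CSS selectors below are hypothetical placeholders that would have to be read off Jianshu's current markup, and the JS-filled fields (Jianshu diamonds, views, publish time) would instead need a headless browser or the hidden API mentioned above.

    # Generic fetch-and-parse sketch; the selectors are hypothetical placeholders.
    import requests
    from bs4 import BeautifulSoup

    ARTICLE_URL = "https://www.jianshu.com/p/<article-id>"  # placeholder article URL

    resp = requests.get(ARTICLE_URL, headers={"User-Agent": "Mozilla/5.0"}, timeout=10)
    resp.raise_for_status()
    soup = BeautifulSoup(resp.text, "html.parser")

    likes = soup.select_one(".like-count")        # hypothetical selector
    comments = soup.select_one(".comment-count")  # hypothetical selector
    print("likes:", likes.get_text(strip=True) if likes else "not found")
    print("comments:", comments.get_text(strip=True) if comments else "not found")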

ERROR: Command errored out with exit status 1

pip install -r requirements.txt
Output:

ERROR: Command errored out with exit status 1: /usr/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-pf5_kd92/wxpython/setup.py'"'"'; __file__='"'"'/tmp/pip-install-pf5_kd92/wxpython/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-gjw9u541/install-record.txt --single-version-externally-managed --user --prefix= --compile --install-headers /home/tz/.local/include/python3.8/wxPython Check the logs for full command output.

Crawling failed

systeminfo:
image

C:\Users\stsg0>python -V
Python 3.7.9

C:\Users\stsg0>pip -V
pip 20.2.3 from c:\users\stsg0\appdata\local\programs\python\python37\lib\site-packages\pip (python 3.7)

1. Clicking QQ Mail: no input dialog appears, and the bottom-right corner immediately reports that crawling failed
image
2. Clicking NetEase Mail: the console reports an error
image

ChromeDriver has already been started
image
