zhengxiaotian / geek_crawler
A scraper script for Geek Time (极客时间) courses: after you enter your account and password, it automatically saves your Geek Time column courses to local disk.
License: MIT License
Could an expert take a look at this:
/Users/bo/PycharmProjects/pythonProject/main.py[line:550] - ERROR: An error occurred during the request; error details: Traceback (most recent call last):
File "/Users/bo/PycharmProjects/pythonProject/main.py", line 547, in <module>
run(cellphone, pwd, exclude=exclude, get_comments=get_comments)
File "/Users/bo/PycharmProjects/pythonProject/main.py", line 513, in run
geek._article(aid, pro, file_type=file_type, get_comments=get_comments) # Fetch the information for a single article
File "/Users/bo/PycharmProjects/pythonProject/main.py", line 341, in _article
self.save_to_file(
File "/Users/bo/PycharmProjects/pythonProject/main.py", line 449, in save_to_file
os.mkdir(dir_path)
FileNotFoundError: [Errno 2] No such file or directory: 'A/B测试从0到1'
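The course title 'A/B测试从0到1' contains a `/`, so `os.mkdir` reads it as the nested path `A` followed by `B测试从0到1` and fails because the parent directory `A` does not exist. A minimal sketch of one way around this, assuming the directory name is built from the course title; the helper names `safe_dir_name` and `ensure_course_dir` are hypothetical and not part of geek_crawler:

```python
import os


def safe_dir_name(title, replacement="_"):
    """Replace path separators so a title maps to a single directory level."""
    for sep in ("/", "\\"):
        title = title.replace(sep, replacement)
    return title


def ensure_course_dir(base_dir, title):
    """Create the directory for a course title if needed and return its path."""
    dir_path = os.path.join(base_dir, safe_dir_name(title))
    # makedirs also creates missing parents and tolerates an existing directory.
    os.makedirs(dir_path, exist_ok=True)
    return dir_path
```

Calling `ensure_course_dir(".", "A/B测试从0到1")` would create a single directory named `A_B测试从0到1` in the current directory.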
$ python geek_crawler.py
Traceback (most recent call last):
File "geek_crawler.py", line 12, in <module>
import requests
ModuleNotFoundError: No module named 'requests'
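This error means the third-party `requests` package is not installed for the interpreter running the script; installing it with `python -m pip install requests` is the usual fix. As a hedged sketch, a small pre-flight check could report missing packages before the crawler starts. The function name `missing_modules` is hypothetical:

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(["requests"])
if missing:
    print("Missing packages, install with: python -m pip install " + " ".join(missing))
```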
How can I download the audio files? What I downloaded contains only MD files.
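I made a small change to the original code so that it downloads only the specified courses.
Change 1: reuse the existing exclude variable to hold the courses you want to download (around line 539).
# Set exclude to the courses you want to crawl
exclude = ['快速上手C++数据结构与算法']
Change 2: around line 297, change
if product.get('title', '') in self.exclude:
to
if product.get('title', '') not in self.exclude: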
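The two changes above turn exclude into an allow-list by flipping the membership test. A minimal sketch of the same idea that keeps the two behaviors separate, assuming each product is a dict with a 'title' key as in the snippet above; the function name `select_products` and the `include` parameter are hypothetical and not part of geek_crawler:

```python
def select_products(products, include=None, exclude=None):
    """Return the products to crawl.

    include: if given, only titles in this collection are kept.
    exclude: titles in this collection are always skipped.
    """
    include = set(include) if include else None
    exclude = set(exclude or [])
    selected = []
    for product in products:
        title = product.get('title', '')
        if title in exclude:
            continue  # explicitly skipped
        if include is not None and title not in include:
            continue  # an allow-list is set and this title is not on it
        selected.append(product)
    return selected


products = [{'title': '快速上手C++数据结构与算法'}, {'title': 'A/B测试从0到1'}]
wanted = select_products(products, include=['快速上手C++数据结构与算法'])
print([p['title'] for p in wanted])
```

Keeping `include` and `exclude` as separate arguments avoids overloading one variable with two opposite meanings.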
Requesting the login API:
API request parameters: {'country': 86, 'cellphone': '*******', 'password': '********', 'captcha': '', 'remember': 1, 'platform': 3, 'appid': 1, 'source': ''}
An error occurred during the request; error details: Traceback (most recent call last):
File "/usr/local/lib/python3.8/site-packages/urllib3-1.26.0.dev0-py3.8.egg/urllib3/connectionpool.py", line 686, in urlopen
self._prepare_proxy(conn)
File "/usr/local/lib/python3.8/site-packages/urllib3-1.26.0.dev0-py3.8.egg/urllib3/connectionpool.py", line 952, in _prepare_proxy
conn.connect()
File "/usr/local/lib/python3.8/site-packages/urllib3-1.26.0.dev0-py3.8.egg/urllib3/connection.py", line 389, in connect
self.sock = ssl_wrap_socket(
File "/usr/local/lib/python3.8/site-packages/urllib3-1.26.0.dev0-py3.8.egg/urllib3/util/ssl.py", line 397, in ssl_wrap_socket
ssl_sock = context.wrap_socket(sock, server_hostname=server_hostname)
File "/usr/local/Cellar/python@3.8/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/ssl.py", line 500, in wrap_socket
return self.sslsocket_class._create(
File "/usr/local/Cellar/python@3.8/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/ssl.py", line 1040, in _create
self.do_handshake()
File "/usr/local/Cellar/python@3.8/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/ssl.py", line 1309, in do_handshake
self._sslobj.do_handshake()
ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate in certificate chain (_ssl.c:1123)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/site-packages/requests-2.24.0-py3.8.egg/requests/adapters.py", line 439, in send
resp = conn.urlopen(
File "/usr/local/lib/python3.8/site-packages/urllib3-1.26.0.dev0-py3.8.egg/urllib3/connectionpool.py", line 745, in urlopen
retries = retries.increment(
File "/usr/local/lib/python3.8/site-packages/urllib3-1.26.0.dev0-py3.8.egg/urllib3/util/retry.py", line 474, in increment
raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='account.geekbang.org', port=443): Max retries exceeded with url: /account/ticket/login (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate in certificate chain (_ssl.c:1123)')))
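The root cause in this traceback is `self signed certificate in certificate chain`, raised while the connection is being set up through a proxy. That usually indicates a proxy or traffic-inspection tool re-signing HTTPS with a certificate Python does not trust. One hedged option that keeps verification enabled is to point requests at a CA bundle containing that certificate through the `REQUESTS_CA_BUNDLE` environment variable, which requests consults when `verify` is left at its default. This sketch assumes the script does not pass its own `verify` argument; the helper name `use_ca_bundle` and the example path are hypothetical:

```python
import os


def use_ca_bundle(bundle_path):
    """Make requests trust the CA certificates in bundle_path (PEM format)."""
    if not os.path.isfile(bundle_path):
        raise FileNotFoundError(f"CA bundle not found: {bundle_path}")
    os.environ["REQUESTS_CA_BUNDLE"] = os.path.abspath(bundle_path)
    return os.environ["REQUESTS_CA_BUNDLE"]


# Example (hypothetical path): call this before the crawler sends its first request.
# use_ca_bundle("/path/to/proxy-root-ca.pem")
```

Turning the proxy off for the duration of the crawl is the other common way to avoid the error.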
After running geek_crawler.py, the following appears: File "geek_crawler.py", line 102 return ';'.join([f'{k}={v}' for k, v in self._cookies.items()])
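The report does not include the error type, but an error pointing at this line is consistent with the SyntaxError that interpreters older than Python 3.6 raise on f-strings. Assuming that is the cause, running the script with Python 3.6 or later should resolve it. For illustration only, the same string can be built without an f-string; the function name `build_cookie_string` is hypothetical:

```python
def build_cookie_string(cookies):
    """Join cookie name/value pairs as 'k=v;k=v' without using an f-string."""
    return ';'.join(['{}={}'.format(k, v) for k, v in cookies.items()])


print(build_cookie_string({'sid': 'abc', 'uid': '42'}))
```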
Error message:
File "geek_crawler.py", line 483, in save_to_file
with open(file_path, 'w', encoding='utf-8') as f:
OSError: [Errno 22] Invalid argument: 'D:\0-git-time\geek_crawler-master\JavaScript核心原理解析\20 _ (0, eval)("x = 100") :一行让严格模式形同虚设的破坏性设计(上).md'
Could an expert take a look at this?
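The article title in this path contains characters such as `"` that Windows does not allow in file names, which is why `open` fails with Errno 22. A minimal sketch of a file-name sanitizer, assuming it is applied to the file name only and not to the directory part of the path; the name `sanitize_filename` is hypothetical and not part of geek_crawler:

```python
import re

# Characters Windows does not allow in file names, plus ASCII control characters.
_INVALID_CHARS = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def sanitize_filename(name, replacement='_'):
    """Replace characters that are invalid in Windows file names."""
    cleaned = _INVALID_CHARS.sub(replacement, name)
    # Windows also rejects names that end with a space or a dot.
    return cleaned.rstrip(' .') or replacement


title = '20 _ (0, eval)("x = 100") :一行让严格模式形同虚设的破坏性设计(上)'
print(sanitize_filename(title) + '.md')
```

Sanitizing the title before appending the `.md` extension keeps the extension intact.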
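Is there any handling for downloading the video content? Could you share it? Does the last item in your feature list have a concrete implementation?
I have more than 50 courses (only 3 are video courses; the rest are text), but only about 20 of them can be downloaded. What is preventing all of the courses from being downloaded?
In the main function:
Original: run(cellphone, pwd, exclude=exclude, get_comments=get_comments)
Should be changed to: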
run(cellphone, pwd, exclude=exclude, get_comments=get_comments, file_type=file_type)
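The point of this fix is that a keyword argument that is not forwarded silently falls back to the function's default. A small stand-in illustrates the effect; the default `'md'` and the value `'html'` are hypothetical, and this `run` is a placeholder, not the crawler's real function:

```python
def run(cellphone, pwd, exclude=None, get_comments=False, file_type='md'):
    """Stand-in for the crawler's run(); only reports the file type it would use."""
    return file_type


file_type = 'html'
print(run('phone', 'pwd'))                       # file_type not forwarded: default applies
print(run('phone', 'pwd', file_type=file_type))  # file_type forwarded: caller's choice applies
```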