paper-blue / geek_crawler

This project forked from zhengxiaotian/geek_crawler

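A crawler script for Geek Time (极客时间) courses: after you enter your account and password, it automatically saves your Geek Time column courses to local disk.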

License: MIT License

Python 88.55% CSS 11.45%

geek_crawler

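Geek Time recently ran a promotion that lets a company claim three free courses for each employee. Our management applied for this benefit on our behalf (if you have not received it, ask your manager to help set it up through the event page).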

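The free courses are valid for only 30 days. With a regular daytime job, there is no way to finish three courses within 30 days, so I wrote a script that automatically saves every column course visible to the account to local disk.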

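💥 This project is for learning and exchange only. Do not use it for any commercial purpose or in any way that harms the interests of others. 💥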

How to use

  1. Clone the code to your machine

    git clone git@github.com:zhengxiaotian/geek_crawler.git
  2. Run the script directly in a terminal or in PyCharm (note: the code was written for Python 3 and must be run with Python 3)

    # Install the third-party library requests before running (pip install requests)
    python geek_crawler.py
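  3. Enter your account and password (the script prompts in Chinese, first for your Geek Time account, which is your phone number, then for your password)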

    E:\geek_crawler (master -> origin)
    λ python geek_crawler.py
    请输入你的极客时间账号(手机号): *************
    请输入你的极客时间密码: ************
  4. Wait for the crawl to finish (the final log line, 正常抓取完成, means the crawl completed normally)

    2020-04-28 19:32:41,624 - geek_crawler.py[line:307] - INFO: 请求获取文章信息接口:
    2020-04-28 19:32:41,633 - geek_crawler.py[line:320] - INFO: 接口请求参数:{'id': 225554, 'include_neighbors': 'true', 'is_freelyread': 'true'}
    2020-04-28 19:32:42,047 - geek_crawler.py[line:349] - INFO: ----------------------------------------
    2020-04-28 19:32:47,131 - geek_crawler.py[line:478] - INFO: 正常抓取完成。

    [Screenshot: Snipaste_2020-04-29_08-55-08.png]

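    PS: If an API error interrupts the crawl, check the corresponding error message in the log, then simply rerun the script to continue. Articles that were fetched successfully are recorded in a local file, so they will not be fetched again.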
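The resume behavior described in the PS above can be sketched roughly as follows. This is a minimal illustration, not the code in geek_crawler.py: the names `load_finished`, `crawl`, the `fetch` callback, and the JSON record file are all hypothetical, and the log format string is only an assumption chosen to resemble the sample log lines.

```python
import json
import logging
from pathlib import Path

# Assumed format that resembles the sample log output; not taken from geek_crawler.py.
LOG_FORMAT = "%(asctime)s - %(filename)s[line:%(lineno)d] - %(levelname)s: %(message)s"
logging.basicConfig(level=logging.INFO, format=LOG_FORMAT)
log = logging.getLogger(__name__)


def load_finished(record_file: Path) -> set:
    """Return the article IDs already saved, or an empty set on the first run."""
    if not record_file.exists():
        return set()
    return set(json.loads(record_file.read_text(encoding="utf-8")))


def crawl(article_ids, fetch, record_file: Path) -> list:
    """Fetch each article at most once and persist progress after every success.

    `fetch` is a hypothetical callback that downloads and saves one article.
    If it raises, the IDs finished so far are already on disk, so a rerun
    continues from the first unfinished article.
    """
    finished = load_finished(record_file)
    fetched_now = []
    for article_id in article_ids:
        if article_id in finished:
            log.info("skip %s (already saved)", article_id)
            continue
        fetch(article_id)  # may raise and interrupt the run
        finished.add(article_id)
        record_file.write_text(json.dumps(sorted(finished)), encoding="utf-8")
        fetched_now.append(article_id)
    return fetched_now
```

Writing the record file after each article, rather than once at the end, is what makes an interrupted run safe to restart: at most one article's work is repeated.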

Results

[Screenshot: Snipaste_2020-04-29_08-44-44.png]

[Screenshot: Snipaste_2020-04-28_19-31-52.png]

Features

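  • After you enter your account and password, automatically save every column (text + audio) visible to that account to local disk;

  • Choose whether to save as Markdown or HTML documents;

  • Configure courses to exclude from fetching (for example, courses you already have);

  • Fetch courses by specified name;

  • Save each article's comments locally together with the body text;

  • Download videos and save them as MP4 files;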
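As an illustration of how the exclude list, the fetch-by-name option, and the Markdown/HTML choice might fit together, here is a small sketch. It is not the script's actual interface: `select_courses`, `article_path`, the `exclude`/`only`/`file_type` parameters, and the `{"title": ...}` course shape are hypothetical names introduced only for this example.

```python
import re
from pathlib import Path


def select_courses(courses, exclude=(), only=()):
    """Keep courses not listed in `exclude`; if `only` is given, keep just those titles."""
    excluded = set(exclude)
    wanted = set(only)
    return [
        course for course in courses
        if course["title"] not in excluded
        and (not wanted or course["title"] in wanted)
    ]


def safe_name(name: str) -> str:
    """Replace characters that are not allowed in file names on common systems."""
    return re.sub(r'[\\/:*?"<>|]', "_", name).strip()


def article_path(out_dir, course_title, article_title, file_type="md") -> Path:
    """Build the output path for one article, saved as Markdown or HTML."""
    if file_type not in ("md", "html"):
        raise ValueError("file_type must be 'md' or 'html'")
    return Path(out_dir) / safe_name(course_title) / f"{safe_name(article_title)}.{file_type}"
```

For example, `select_courses(courses, exclude=["Course A"])` would skip a course you already have, while `only=["Course B"]` would restrict the run to a single named course.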
