Lynda Video Transcripts

A crawler script that downloads the transcripts (subtitles) of Lynda videos in bulk.

Requirements

  • Node.js
  • Phantom.js 2.x

Installation

$ git clone https://github.com/riophae/lynda-video-transcripts.git
$ cd lynda-video-transcripts
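$ npm install # install dependencies
$ cp config.example.yaml config.yaml # then edit config.yaml (see Configuration)
$ npm run build # rebuild after every change to config
$ npm start # run the crawler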

Configuration

Make a copy of config.example.yaml, rename it to config.yaml, and edit it:

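  • detectNetworkCondition: whether to check the network connection at startup (yes/no)
  • userAgent: the user agent string the crawler sends; setting it to match the browser you normally use is recommended
  • captureScreenAutomatically: whether to take screenshots periodically while the crawler runs (yes/no)
  • viewportSize: the viewport size of the crawler's browser; any value works as long as it is not too small
  • username, password: your lynda.com username and password
  • courses: the list of courses to crawl
  • intervalBetweenTutorialVisits: the delay between crawling two consecutive tutorials; do not set it too short, or the site's anti-abuse measures may block you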
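To make the options above concrete, here is a minimal TypeScript sketch that sanity-checks a parsed config.yaml. Only the field names come from this README. The value types, the `{ width, height }` shape of viewportSize (borrowed from PhantomJS's `page.viewportSize`), millisecond units for intervalBetweenTutorialVisits, and the `MIN_INTERVAL_MS` threshold are all assumptions; this is not the project's own validation code.

```typescript
// Hypothetical shape of config.yaml once parsed. Field names are from the README;
// value types and the width/height shape of viewportSize are assumptions.
type YesNo = boolean | "yes" | "no";

interface RawConfig {
  detectNetworkCondition: YesNo;
  userAgent: string;
  captureScreenAutomatically: YesNo;
  viewportSize: { width: number; height: number };
  username: string;
  password: string;
  courses: unknown[];
  intervalBetweenTutorialVisits: number; // unit assumed to be milliseconds
}

// Hypothetical lower bound, used only to warn; the README just says "not too short".
const MIN_INTERVAL_MS = 5000;

// Accept either a YAML boolean or the literal strings "yes"/"no".
function toBool(value: YesNo, field: string): boolean {
  if (value === true || value === "yes") return true;
  if (value === false || value === "no") return false;
  throw new Error(`${field} must be yes or no`);
}

// Returns the problems found in the config; an empty list means it looks usable.
function checkConfig(raw: RawConfig): string[] {
  const problems: string[] = [];
  for (const field of ["detectNetworkCondition", "captureScreenAutomatically"] as const) {
    try {
      toBool(raw[field], field);
    } catch (e) {
      problems.push((e as Error).message);
    }
  }
  if (!raw.username || !raw.password) {
    problems.push("username and password are required");
  }
  if (!Array.isArray(raw.courses) || raw.courses.length === 0) {
    problems.push("courses must list at least one course");
  }
  if (raw.viewportSize.width <= 0 || raw.viewportSize.height <= 0) {
    problems.push("viewportSize must be positive");
  }
  if (raw.intervalBetweenTutorialVisits < MIN_INTERVAL_MS) {
    problems.push(`intervalBetweenTutorialVisits below ${MIN_INTERVAL_MS} ms risks being blocked`);
  }
  return problems;
}
```

Accepting both booleans and "yes"/"no" strings is deliberate: YAML 1.1 parsers read `yes`/`no` as booleans, while YAML 1.2 parsers keep them as strings.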

courses

Two formats are supported. You can give each course both an output directory and the point where crawling should start:

courses:
  - dirName: <COURSE_OUTPUT_DIR>
    startPoint: <START_POINT_URL>
  - dirName: ...
    startPoint: ...
  - dirName: ...
    startPoint: ...

Or you can give only the starting point of each course, and the program will name the output directory after the course:

courses:
  - <START_POINT_URL>
  - <ANOTHER_START_POINT_URL>
  - ...
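Both forms can be folded into one shape before crawling. The TypeScript sketch below is an illustration, not the project's code: `getCourseTitle` is a hypothetical stand-in for however the crawler reads the course name, and the character-replacement rule in `titleToDirName` is an assumption.

```typescript
// The two accepted forms of a `courses` entry.
type CourseEntry = string | { dirName: string; startPoint: string };

interface Course {
  dirName: string;
  startPoint: string;
}

// Turn a course title into a directory name by replacing characters that are
// not allowed in file names on common platforms (assumed rule).
function titleToDirName(title: string): string {
  return title.replace(/[<>:"\/\\|?*]/g, "_").trim();
}

// getCourseTitle is hypothetical: it stands in for reading the course name,
// e.g. from the page at startPoint.
function normalizeCourses(
  entries: CourseEntry[],
  getCourseTitle: (startPoint: string) => string,
): Course[] {
  return entries.map((entry) =>
    typeof entry === "string"
      ? { dirName: titleToDirName(getCourseTitle(entry)), startPoint: entry }
      : { dirName: entry.dirName, startPoint: entry.startPoint },
  );
}
```

For example, a bare URL whose course is titled "A/B: C?" would be written to the directory `A_B_ C_`.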

Internally, the crawler starts at the given starting point and fetches transcripts through the last lesson of the course.
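That loop can be sketched as follows, assuming the crawler already has the course's tutorial URLs in order. `fetchTranscript` is a hypothetical placeholder for the PhantomJS page work, and the wait uses intervalBetweenTutorialVisits (assumed to be milliseconds) between consecutive visits.

```typescript
// Select the tutorials to visit: from the start point through the last lesson.
function planVisits(tutorialUrls: string[], startPoint: string): string[] {
  const start = tutorialUrls.indexOf(startPoint);
  if (start === -1) throw new Error(`start point not found in course: ${startPoint}`);
  return tutorialUrls.slice(start);
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Visit each planned tutorial in order, pausing between two visits.
// fetchTranscript is hypothetical and stands in for the real page scraping.
async function crawlCourse(
  tutorialUrls: string[],
  startPoint: string,
  intervalBetweenTutorialVisits: number,
  fetchTranscript: (url: string) => Promise<string>,
): Promise<string[]> {
  const urls = planVisits(tutorialUrls, startPoint);
  const transcripts: string[] = [];
  for (let i = 0; i < urls.length; i++) {
    if (i > 0) await sleep(intervalBetweenTutorialVisits);
    transcripts.push(await fetchTranscript(urls[i]));
  }
  return transcripts;
}
```

Keeping the selection in a pure `planVisits` function makes the "start point to last lesson" rule easy to check on its own, separate from network timing.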

Caveats

Every run of the crawler empties the output directory (output/), so move your files elsewhere before running it again.
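One way to avoid losing transcripts is to copy output/ aside before each run. The sketch below is a hypothetical helper, not part of the project; it assumes Node.js 16.7 or later for `fs.cpSync` and a timestamped naming scheme of its own.

```typescript
import { cpSync, existsSync } from "node:fs";

// Build a timestamped backup name such as output-backup-2020-01-02T03-04-05-678Z.
function backupDirName(dir: string, now: Date): string {
  return `${dir}-backup-${now.toISOString().replace(/[:.]/g, "-")}`;
}

// Copy the output directory aside before the crawler empties it.
// Returns the backup path, or null if there was nothing to back up.
function backupOutput(dir = "output", now = new Date()): string | null {
  if (!existsSync(dir)) return null;
  const dest = backupDirName(dir, now);
  cpSync(dir, dest, { recursive: true });
  return dest;
}
```

Calling `backupOutput()` right before `npm start` leaves the previous run's files in a separate directory that the crawler does not touch.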