Lynda Video Transcripts

A crawler script that downloads the transcripts (subtitles) of Lynda videos in bulk.

Requirements

Node.js
PhantomJS 2.x

Installation

$ git clone https://github.com/riophae/lynda-video-transcripts.git
$ cd lynda-video-transcripts
$ npm install # install dependencies
$ # create and edit config.yaml (see Configuration)
$ npm run build # rebuild after every change to the config
$ npm start # run the crawler

Configuration

Make a copy of config.example.yaml, rename it to config.yaml, and edit it:

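detectNetworkCondition: whether to check the network connection at startup (yes/no)
userAgent: the user agent the crawler presents; setting it to match the browser you normally use is probably a good idea
captureScreenAutomatically: whether to take screenshots automatically at regular intervals while the crawler runs (yes/no)
viewportSize: the viewport size of the browser the crawler uses; any value works as long as it is not too small
username, password: your lynda.com username and password
courses: the list of courses to crawl
intervalBetweenTutorialVisits: the interval between fetching two consecutive tutorials; avoid setting it too short, or the crawler may be flagged by anti-abuse measures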
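You can sanity-check these keys before a run. The following is a minimal JavaScript sketch, not code from this project. It makes several assumptions:

- config.yaml has already been parsed into a plain object.
- The yes/no switches may arrive either as booleans or as the strings "yes"/"no", depending on the YAML parser.
- viewportSize is an object with width and height, the shape PhantomJS uses for page.viewportSize.
- intervalBetweenTutorialVisits is a positive number whose unit is not documented here.

checkConfig and toBool are hypothetical names.

```javascript
// Hypothetical pre-flight check for a parsed config.yaml object.
// Assumed (not from the project): yes/no switches may be booleans or the
// strings "yes"/"no", viewportSize is { width, height }, and
// intervalBetweenTutorialVisits is a positive number.

function toBool(value) {
  // Accept real booleans as well as the literal strings "yes" / "no".
  if (typeof value === "boolean") return value;
  if (value === "yes") return true;
  if (value === "no") return false;
  return undefined; // anything else is not a valid switch value
}

function checkConfig(config) {
  const problems = [];
  for (const key of ["detectNetworkCondition", "captureScreenAutomatically"]) {
    if (toBool(config[key]) === undefined) {
      problems.push(`${key} should be yes or no`);
    }
  }
  if (!config.username || !config.password) {
    problems.push("username and password are required");
  }
  if (!Array.isArray(config.courses) || config.courses.length === 0) {
    problems.push("courses must be a non-empty list");
  }
  const size = config.viewportSize;
  if (!size || !(size.width > 0) || !(size.height > 0)) {
    problems.push("viewportSize needs a positive width and height");
  }
  const interval = config.intervalBetweenTutorialVisits;
  if (typeof interval !== "number" || interval <= 0) {
    problems.push("intervalBetweenTutorialVisits must be a positive number");
  }
  return problems; // an empty array means no problems were found
}

// Example:
// checkConfig({
//   detectNetworkCondition: "yes", captureScreenAutomatically: false,
//   username: "me", password: "secret", courses: ["https://example.com/course/1"],
//   viewportSize: { width: 1280, height: 800 }, intervalBetweenTutorialVisits: 5000,
// }); // → []
```

The function returns a list of readable problems rather than throwing on the first one, so a single run reports every bad key. userAgent is left unchecked because any string is acceptable.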

`courses`

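Two forms are supported. You can specify both the output directory and the point at which crawling of each course should start: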

courses:
  - dirName: <COURSE_OUTPUT_DIR>
    startPoint: <START_POINT_URL>
  - dirName: ...
    startPoint: ...
  - dirName: ...
    startPoint: ...

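Or you can specify only the start point of each course, and the script will derive the output directory from the course name automatically: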

courses:
  - <START_POINT_URL>
  - <ANOTHER_START_POINT_URL>
  - ...
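Both forms can be reduced to the same shape before crawling. The sketch below is illustrative JavaScript, not the script's actual code. normalizeCourses and its resolveDirName callback are hypothetical names. resolveDirName stands in for however the real script turns a course name into a directory. As a choice of this sketch only, an object entry without dirName also falls back to the resolver.

```javascript
// Hypothetical: map both `courses` forms to { dirName, startPoint } objects.
// resolveDirName(url) is a caller-supplied stand-in for the script's own
// "derive the output directory from the course name" step.

function normalizeCourses(courses, resolveDirName) {
  return courses.map((entry) => {
    if (typeof entry === "string") {
      // Short form: only the start point URL is given.
      return { dirName: resolveDirName(entry), startPoint: entry };
    }
    if (entry && typeof entry.startPoint === "string") {
      // Long form: use dirName when present, otherwise fall back to the resolver.
      return {
        dirName: entry.dirName || resolveDirName(entry.startPoint),
        startPoint: entry.startPoint,
      };
    }
    throw new Error(`Unrecognised courses entry: ${JSON.stringify(entry)}`);
  });
}
```

For a quick try, a toy resolver such as `(url) => new URL(url).pathname.split("/").filter(Boolean)[0]` turns `https://example.com/some-course/1.html` into `some-course`. A real resolver would more likely read the course title from the page.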

Internally, the crawler starts fetching transcripts at the given start point and continues through to the last tutorial of the course.
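Combined with intervalBetweenTutorialVisits, that loop can be sketched as follows. This is a minimal illustration under stated assumptions, not the crawler's implementation:

- The course's tutorial URLs are already known, in playback order.
- fetchTranscript is a hypothetical async function that downloads one tutorial's transcript.
- The interval is treated as milliseconds, which is an assumption.

```javascript
// Hypothetical sketch: crawl from the start point to the end of a course,
// pausing between visits. intervalMs is assumed to be in milliseconds.

function tutorialsFromStartPoint(tutorialUrls, startPoint) {
  const index = tutorialUrls.indexOf(startPoint);
  if (index === -1) {
    throw new Error(`Start point not found in course: ${startPoint}`);
  }
  // Everything from the start point through the last tutorial, inclusive.
  return tutorialUrls.slice(index);
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function crawlCourse(tutorialUrls, startPoint, intervalMs, fetchTranscript) {
  const results = [];
  const queue = tutorialsFromStartPoint(tutorialUrls, startPoint);
  for (let i = 0; i < queue.length; i++) {
    if (i > 0) await sleep(intervalMs); // pause between two consecutive visits
    results.push(await fetchTranscript(queue[i]));
  }
  return results;
}
```

Tutorials are visited one at a time rather than in parallel, so the configured interval really is the gap between two requests, which is the point of the anti-abuse advice in Configuration.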

Caveats

Every time the crawler starts, it empties the output directory (output/), so move your files out of it before running it again.
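One way to follow this advice is to move the previous results aside before each run. Below is a minimal Node.js sketch. It assumes the default output/ directory and a backups/ folder on the same filesystem, because fs.renameSync cannot move across devices. moveOutputAside and the backups/ layout are hypothetical, not part of the project.

```javascript
// Hypothetical helper: move everything in output/ into a timestamped folder
// under backups/ so the next crawler run cannot wipe it.
const fs = require("fs");
const path = require("path");

function moveOutputAside(outputDir = "output", backupRoot = "backups") {
  if (!fs.existsSync(outputDir)) return null;
  const entries = fs.readdirSync(outputDir);
  if (entries.length === 0) return null; // nothing worth saving
  // ISO timestamp with ":" and "." replaced so it is a safe folder name.
  const stamp = new Date().toISOString().replace(/[:.]/g, "-");
  const target = path.join(backupRoot, `output-${stamp}`);
  fs.mkdirSync(target, { recursive: true });
  for (const name of entries) {
    // renameSync moves files and folders alike, but only within one filesystem.
    fs.renameSync(path.join(outputDir, name), path.join(target, name));
  }
  return target; // output/ itself is left in place, now empty
}
```

Run it before npm start, for example from a small wrapper script. It returns the folder the files were moved to, or null when output/ is missing or already empty.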
