transcript-model

JSON schema and JavaScript model classes for dealing with time-aligned transcripts of speech.

Usage

Install in your project

$ npm install --save transcript-model

Then use it in your code:

const { Transcript } = require('transcript-model');

// Define some transcript JSON
const json = {
  speakers: [{ name: 'Alice' }, { name: 'Bob' }],
  segments: [
    {
      speaker: 0,
      words: [
        { start: 0.05, end: 0.64, text: 'Hello' },
        { start: 0.7, end: 1.1, text: 'Bob!' },
      ],
    },
    {
      speaker: 1,
      words: [
        { start: 1.53, end: 1.88, text: 'Hi' },
        { start: 1.92, end: 2.33, text: 'Alice.' },
      ],
    },
  ],
};

// Instantiate a Transcript object
const transcript = Transcript.fromJson(json);

// Do something with it
console.log(
  transcript.segments
    .map(
      segment =>
        `${transcript.speakers.get(segment.speaker).name}: ${segment.words
          .map(word => word.text)
          .join(' ')}`
    )
    .join('\n')
);

// Serialise as JSON
console.log(transcript.toJson());

Try it out on RunKit.

For more examples of creating and manipulating Transcript objects, check out the source code.

CLI

A basic command line interface has been implemented to support conversion of BBC Kaldi output to the transcript JSON format.

Install

$ npm install -g transcript-model

Usage

To write to STDOUT:

$ transcript-model --kaldi path/to/transcript.json path/to/segments.json

To write to a file:

$ transcript-model --kaldi path/to/transcript.json path/to/segments.json > output.json

Author

Alex Norton

transcript-model's Issues

upgrade to webpack 2

It's not necessary to use Webpack to bundle the source files. Just use Babel to translate them as in transcript-editor.

protocol buffer serialization option

Update ajv to version 5.0.1-beta or later to prevent webpack warning

Currently, when developing with webpack, the version of the ajv dependency in use causes an annoying error message.

As noted in ajv-validator/ajv#117, this is resolved from version 5.0.1-beta onwards. The dependency should be updated to prevent the error from occurring.

Add 'start' and 'end' properties to TranscriptSegment

To find words within a specific time range, it would be useful (as an optimisation) to know the start time of the first word of a segment and the end time of the last word of a segment.
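
A minimal sketch of the idea, working on plain transcript JSON rather than the model classes. The helper names `segmentBounds` and `wordsInRange` are hypothetical and not part of transcript-model, and the sketch assumes the words in each segment are ordered by time. Knowing a segment's bounds lets a range query skip whole segments without inspecting their words:

```javascript
// Hypothetical helpers, not part of the transcript-model API.
// Assumes the words in each segment are sorted by time.

// Start time of the first word and end time of the last word in a segment.
function segmentBounds(segment) {
  const { words } = segment;
  if (words.length === 0) {
    return { start: null, end: null };
  }
  return { start: words[0].start, end: words[words.length - 1].end };
}

// Collect the words overlapping [from, to], skipping any segment
// that lies wholly outside the range.
function wordsInRange(segments, from, to) {
  const result = [];
  for (const segment of segments) {
    const { start, end } = segmentBounds(segment);
    if (start === null || end < from || start > to) continue;
    for (const word of segment.words) {
      if (word.end >= from && word.start <= to) result.push(word);
    }
  }
  return result;
}

const segments = [
  {
    speaker: 0,
    words: [
      { start: 0.05, end: 0.64, text: 'Hello' },
      { start: 0.7, end: 1.1, text: 'Bob!' },
    ],
  },
  {
    speaker: 1,
    words: [
      { start: 1.53, end: 1.88, text: 'Hi' },
      { start: 1.92, end: 2.33, text: 'Alice.' },
    ],
  },
];

console.log(wordsInRange(segments, 1.5, 2.0).map(word => word.text));
```

If the bounds were stored on the segment itself, as this issue proposes, `segmentBounds` would become a simple property read.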