ebu / ebu-tt-live-toolkit


Toolkit for supporting the EBU-TT Live specification

Home Page: http://ebu.github.io/ebu-tt-live-toolkit/

License: BSD 3-Clause "New" or "Revised" License

Makefile 0.14% Python 63.72% Batchfile 0.09% HTML 1.84% CSS 0.20% Gherkin 17.12% JavaScript 16.89%

ebu-tt python subtitles captions subtitling captioning live broadcast video

ebu-tt-live-toolkit

This is the repository for the interoperability kit of EBU-TT Live.

The kit is envisaged to contain a set of components for generating, testing and distributing subtitle documents in EBU-TT Part 3 format.

This is an open source project. Anyone is welcome to contribute to the development of the components. Please see the wiki for the list of required components, guidelines and release plan.

The project home page is at http://ebu.github.io/ebu-tt-live-toolkit/ and links to the pre-built documentation.

We have a Slack team called ebu-tt-lit for day to day communications, questions etc. Please join up!

If you would like to contribute or join the Slack team, please contact [email protected] or [email protected]

Preparing the build environment

Make sure you have Python 2.7+ and Python virtual environment capability.

If not, you can install virtualenv system-wide from your operating system's package repository or with pip:

sudo pip install virtualenv

After that creating a virtual environment should be as simple as:

virtualenv env

Let's activate it (source makes sure the current shell executes the script and assumes the environment variables that the activation script sets):

source ./env/bin/activate

To build the project you will also need node.js. Please read the instructions for your system here.

Once you have created and activated the Python virtual environment and installed node.js, you can build the package by typing make, provided you have GNU build tooling on your system.

make

Alternatively:

pip install -r requirements.txt
python setup.py develop

pyxbgen --binding-root=./ebu_tt_live/bindings -m __init__ --schema-root=./ebu_tt_live/xsd/ -r -u ebutt_all.xsd

npm install nunjucks
node_modules/nunjucks/bin/precompile ebu_tt_live/ui/user_input_producer/template/user_input_producer_template.xml > ebu_tt_live/ui/user_input_producer/template/user_input_producer_template.js

After this you should be able to launch the command line tools this Python package provides, e.g.:

ebu-dummy-encoder

Windows users

Windows and Makefiles do not get along well, so there is a make.bat file for those who would like to develop on Windows. This assumes that Python 2.7 and virtualenv are installed and on the PATH. To build the project you will also need node.js. Please read the instructions for your system here. Then run:

make

This will make sure a virtual environment is created and activated and installs all the tools into it.

After that the following command should work:

ebu-dummy-encoder

The Schema definitions XSD

The schema definitions are embedded in the Python library in the xsd1.1 subfolder. The root schema document is called ebutt_live.xsd.

The Python library

The library uses XSD schemas from the xsd1.1 subdirectory. The bindings will keep the validation sane and PyXB makes sure that updates are working as expected. Should the schema be modified a regeneration can be run and the bindings will respect the changes.

Scripts

There are several scripts that emulate different components (nodes) in the infrastructure. They can be executed individually or in combination by running ebu-run. Assuming the Makefile worked, the package is installed in a virtual environment and the virtual environment is active, the components are available by running the ebu-run script and passing it a configuration file. There are several example configuration files in ebu_tt_live/examples/config. For the complete list of scripts see docs/build/html/scripts_and_their_functions.html.

Below is a list of some of the key components.

The simple producer is the beginning of the data pipeline. It generates EBU-TT-Live documents in a timed manner. In the repository root there is a test.html file that can be used for manual testing of the producer in any websocket capable browser. To run it use ebu-run:

ebu-run --admin.conf ebu_tt_live/examples/config/simple_producer.json

The simple consumer connects to the producer or later on in the pipeline, assuming there are more components inserted.

ebu-run --admin.conf ebu_tt_live/examples/config/simple_consumer.json

The User Input producer is a web page with a user interface that allows you to send subtitle documents and view the output of a downstream node. For complete documentation see docs/build/html/user_input_producer.html.

To run a configuration of components, use a configuration file with multiple nodes defined. For example, this will create 3 nodes: a distributor that listens to the UIP and two consumers that subscribe to the distributor:

ebu-run --admin.conf ebu_tt_live/examples/config/user_input_producer_dist_consumers.json

Documentation

Go straight to the pre-built documentation for the current master branch.

The documentation framework uses the popular Sphinx documentation generator and its autodoc plugins, which give developers the flexibility of writing extra documentation interleaved with the documentation that autodoc generates automatically.

Prerequisite: Graphviz

To display the images in the documentation, you need to have Graphviz installed and make sure the dot executable is on the PATH. For example, for users of homebrew:

brew install graphviz

Generating documentation

Documentation can be generated based on the sources in the docs/source directory. After having installed the packages in requirements.txt (which is done automatically by the make command) documentation can be generated by one of the following three ways:

1 Calling setuptools

python setup.py build_sphinx

2 Running make in the docs directory, where a separate Makefile and a make.bat file provide a variety of options.

cd docs
make html

3 Calling the sphinx-build command line script that comes with sphinx. WARNING: Platform-dependent path-separators.

sphinx-build -b html docs/source/ docs/build/html

Previewing the documentation

Once Sphinx has finished and its log reports a successful build, the generated documentation can be viewed by opening docs/build/html/index.html in any web browser.

Tests

The test framework is described in CONTRIBUTING.md

How to contribute

Please refer to CONTRIBUTING.md


ebu-tt-live-toolkit's Issues

COMPONENT: Simple Part 3 consumer (Semantic validation)

As a developer/playout service provider, I want to validate an EBU-TT Live sequence against the specification so that I know whether it conforms to the specification.

Input: Single part 3 doc or sequence of part 3 documents
Output: display of text with begin and end times and validation message(s)

It requires the following steps to be completed:

(this list is not yet complete)

COMPONENT: Teletext-to-part 3

As a broadcaster, I want to convert subtitles in teletext format to EBU-TT Live documents so that I can use legacy systems.

Input: Teletext (data feed/VBI/VANC)
Output: Sequence of EBU-TT Live documents

TimecountTimingType regex is wrong

Spotted this while working on my understanding of the bindings and how to add smpte <-> timedelta conversion.

Python's regexes try the alternatives in a group from left to right and stop at the first one that matches, so with, for example, [0-9]+(h|m|s|ms):

9h is parsed normally
9m also
9s also
However, 9ms is parsed as 9m and the s is forgotten

This is really problematic because it also accepts 9mh, for example, and extracts 9m, even though 9mh is not permitted.

To solve the problem, add a $ at the end of the regex, so for timecountTimingType:

[0-9]+(\.[0-9]+)?(h|m|s|ms)$ in the xsd
(?P<numerator>[0-9]+(?:\\.[0-9]+)?)(?P<unit>h|m|s|ms)$ in bindings
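The behaviour described in this issue can be reproduced with a few lines of plain Python. This is only a sketch: the helper name parse_unit is made up for illustration and is not part of the toolkit's bindings.

```python
import re

# Unanchored pattern: alternatives are tried left to right, so "m" is
# accepted before "ms" is ever considered.
UNANCHORED = re.compile(r'(?P<numerator>[0-9]+(?:\.[0-9]+)?)(?P<unit>h|m|s|ms)')
# Anchored pattern as proposed in the issue: "$" forces backtracking until
# the whole string is consumed.
ANCHORED = re.compile(r'(?P<numerator>[0-9]+(?:\.[0-9]+)?)(?P<unit>h|m|s|ms)$')


def parse_unit(pattern, text):
    """Return the captured unit, or None when the text does not match.

    Hypothetical helper, used here only to compare the two patterns.
    """
    match = pattern.match(text)
    return match.group('unit') if match else None


print(parse_unit(UNANCHORED, '9ms'))  # m
print(parse_unit(ANCHORED, '9ms'))    # ms
print(parse_unit(UNANCHORED, '9mh'))  # m
print(parse_unit(ANCHORED, '9mh'))    # None
```

Reordering the alternatives as (h|ms|m|s) would also make 9ms parse correctly, but only the anchor rejects 9mh.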

No dur attribute if ttp:markerMode="discontinuous"

tt:dur is not allowed when a document has both ttp:timeBase="smpte" and ttp:markerMode="discontinuous"

use make for CI

To reduce maintenance.

COMPONENT: Delay node

As a playout service provider, I want to add a delay to the sequence so that I can synchronise subtitle with audio.

Input: a sequence of part 3 documents.
Output: a delayed sequence of part 3 documents.

implement a fixed delay node #231
implement a variable delay node #232
write tests for delay node #233

Badge-per-branch for Travis CI?

Currently the Travis CI badge referred to in the readme.md points to the Master branch.

This means the badge may show incorrect status on other branches (unless the readme.md is modified).

Some people have provided a pre-built hook to get around this, but it seems to be a bit fragile: http://stackoverflow.com/questions/18673694/referencing-current-branch-in-github-readme-md

Also check out: http://stackoverflow.com/questions/19810386/showing-travis-build-status-in-github-repo

COMPONENT: Fixed input producer

As a developer/tester, I want a sequence of EBU-TT Live documents generated automatically so that I can develop/test an implementation against it without the need to input text.

Input: none.
Output: a sequence of Part 3 documents.

markerMode continuous ?

From what I understand from https://git.ebu.io/ebutt/ebutt-part-3/issues/115, markerMode="continuous" is permitted with timeBase="smpte". However, this is not allowed by the spec and the schema definitions. Should it be added to the xsd files so that we have at least semantically correct smpte cases? (See for example: https://github.com/ebu/ebu-tt-live-toolkit/blob/a7a545d189e4a12d6c8632929fd019252de2e4cb/testing/bdd/templates/referenceClockIdentifier.xml )

COMPONENT: Downstream validator

As a developer/playout service provider, I want to check that an EBU-TT Live document conforms to downstream requirements so that I know it will be successfully processed.

Input: a single part 3 document; metadata instructions.
Output: validation message.

COMPONENT: SDI carrier

As a playout service provider, I want to carry EBU-TT Live documents over HD SDI so that I can pass EBU-TT Live around with video in a broadcast environment.

Input: a sequence of part 3 documents.
Output: a sequence of part 3 documents in HD SDI

COMPONENT: User input producer

As a developer/tester, I want a sequence of part 3 documents generated from text input so that I can develop/test an implementation against it.

Input: Text file provided by user.
Output: Sequence of EBU-TT Live documents.

Set up Travis CI correctly

make sure tests run
trigger on pull requests only or any commit? (maybe both for now)

Manifest availability times to use same timeBase

Create the availability times in the manifest file in the same timeBase as the documents.

As discussed in #50

Timebase semantic testing

Test semantic validation on timebase rules :

Start point of a temporal interval associated with a tt:body element.
If the timebase is "smpte" the type shall be ebuttdt:smpteTimingType .
If the timebase is "media" the type shall be ebuttdt:mediaTimingType .
If the timebase is "media" the time expression should be the offset from a syncbase of "00:00:00.0".
If the timebase is "clock" the type shall be ebuttdt:clockTimingType .
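A test for these rules could start from a simple lookup table. The sketch below only restates the mapping listed above; the names EXPECTED_TIMING_TYPE and expected_timing_type are hypothetical and do not come from the toolkit.

```python
# Mapping copied from the rules above: ttp:timeBase value -> required type
# of the time expression on tt:body.
EXPECTED_TIMING_TYPE = {
    'smpte': 'ebuttdt:smpteTimingType',
    'media': 'ebuttdt:mediaTimingType',
    'clock': 'ebuttdt:clockTimingType',
}


def expected_timing_type(time_base):
    """Return the datatype name required for the given ttp:timeBase.

    Hypothetical helper; raises ValueError for an unknown time base.
    """
    try:
        return EXPECTED_TIMING_TYPE[time_base]
    except KeyError:
        raise ValueError('Unknown ttp:timeBase: %r' % (time_base,))


print(expected_timing_type('media'))  # ebuttdt:mediaTimingType
```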

CI e-mail notifications

Add CI build notifications to team (members).

Note that it seems committers get info on their commits already anyhow:

By default, email notifications are sent to the committer and the commit author, if they are members of the repository (that is, they have push or admin permissions for public repositories, or if they have pull, push or admin permissions for private repositories).

Options for wider notifications include:

Using the Slack Travis CI integration to channel the build messages to the CI channel.
Adding additional e-mail addresses to the Travis set up

COMPONENT: Simple noise introducer

As a developer/tester, I want to consume 'noisy' sequences of part 3 documents so that I can test my implementation against a known set of scenarios.

Input: a sequence of part 3 documents; noise options.
Output: a sequence of modified part 3 documents.

Set up automatic documentation building

http://blog.gockelhut.com/2014/09/automatic-documentation-publishing-with.html

Slack notifications granularity

Check if we all agree with changing the notifications to only show:

Failures
Changes (includes the first build after fails)

COMPONENT: Reference clock

As a developer/tester, I want an external reference clock so that I can ensure documents are processed correctly.
Input: none.
Output: UTC date-time value.

COMPONENT: Archiver

As a broadcaster/access services provider, I want to archive EBU-TT Live documents so that I can reuse and distribute them after live transmission as a single Part 1 document.

Input: a sequence of EBU-TT-Live documents.
Output: a single EBU-TT part 1 document.

COMPONENT: RTP carrier

As a playout service provider, I want to carry EBU-TT Live documents using the RTP protocol so that I can use EBU-TT Live over IP networks.

Input: a sequence of part 3 documents.
Output: a sequence of part 3 documents for interfacing with RFC 3550.

COMPONENT: Distributor node

As a playout service provider, I want to distribute a sequence so that it can be processed by multiple consumers.

Input: a sequence of part 3 documents
Output: a sequence of part 3 documents available to multiple destinations

COMPONENT: XSD

As a developer/playout service provider, I want to validate EBU-TT Live documents so that I know if they are valid XML.

Input: a single part 3 document.
Output: validation message.

COMPONENT: Handover node

As a subtitle provider/broadcaster, I want to combine documents from alternating respeakers/stenographers into a single EBU-TT Live sequence.

Input: 2 or more part 3 sequences; handover options.
Output: a single part 3 sequence.

Subtasks:

Implement core functionality #363
Create configurator #368
Document handover node #364
Unittest handover node #373
BDD testing for handover node #397 (duplicate of #311)
UIP modifications to support handover use-case #374

Create text equivalent list of normative spec requirements

Generate a text format list of the normative spec requirements against which we can write tests.

django dependency removal

COMPONENT:XSD ttm:agent element defined as string, should be a complexType

The ttm:agent element is defined in metadata.xsd as xs:string, but it should be as in TTML1 §12.1.5:

<ttm:agent
  type = (person|character|group|organization|other)
  xml:id = ID
  xml:lang = string
  xml:space = (default|preserve)
  {any attribute not in default or any TT namespace}>
  Content: ttm:name*, ttm:actor?
</ttm:agent>

where

<ttm:name
  type = (full|family|given|alias|other)
  xml:id = ID
  xml:lang = string
  xml:space = (default|preserve)
  {any attribute not in default or any TT namespace}>
  Content: #PCDATA
</ttm:name>

and

<ttm:actor
  agent = IDREF
  xml:id = ID
  xml:lang = string
  xml:space = (default|preserve)
  {any attribute not in default or any TT namespace}>
  Content: EMPTY
</ttm:actor>

This issue is copied from the BBC repo: bbc#8

skip individual scenario example lines

It would be good to be able to skip individual example lines in scenarios without commenting them out, so that those scenarios are counted better.

Test referenceClockIdentifier validation

Allows the reference clock source to be identified. Permitted only when
ttp:timeBase="clock" AND ttp:clockMode="local" OR when
ttp:timeBase="smpte".
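The condition above reads as a small boolean predicate, which may be a convenient starting point for the validation tests. This is a sketch only; the function name and its arguments are assumptions made for illustration.

```python
def reference_clock_identifier_permitted(time_base, clock_mode=None):
    """Return True when ebuttp:referenceClockIdentifier may be present.

    Hypothetical helper encoding the rule quoted above:
    (timeBase == "clock" AND clockMode == "local") OR timeBase == "smpte".
    """
    if time_base == 'smpte':
        return True
    return time_base == 'clock' and clock_mode == 'local'


print(reference_clock_identifier_permitted('clock', 'local'))  # True
print(reference_clock_identifier_permitted('clock', 'utc'))    # False
print(reference_clock_identifier_permitted('smpte'))           # True
print(reference_clock_identifier_permitted('media'))           # False
```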

Consider best way to track spec coverage

For example:

as comments in the code
as comments in the spec (Word, pdf, ...)
in a separate list/dbase

As @kozmaz87 suggested, probably best to annotate the spec with the test code locations.

Differentiate empty and missing template variable

For example, if test_var = "?None?" marks a missing variable and we have the template:

{% if test_var != "?None?" %}
    <a_tag>{{ test_var }}</a_tag>
{% endif %}

then if test_var = '' the tag will be present but will contain an empty string.
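The sentinel idea can be illustrated without a template engine. The plain-Python sketch below mimics the conditional in the template; render_tag and MISSING are made-up names, and the real tests would go through the project's template renderer instead.

```python
MISSING = '?None?'  # sentinel string from the issue, meaning "variable absent"


def render_tag(test_var):
    """Mimic the template: omit the tag only for the sentinel value.

    Hypothetical stand-in for rendering the template shown above.
    """
    if test_var != MISSING:
        return '<a_tag>%s</a_tag>' % test_var
    return ''


print(repr(render_tag(MISSING)))  # ''
print(repr(render_tag('')))       # '<a_tag></a_tag>'
print(repr(render_tag('x')))      # '<a_tag>x</a_tag>'
```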

graphviz dependency not documented

I think graphviz needs to be installed in order to see the figures in the documentation correctly, but this is not clear from the README.md.

(Building sphinx gives a warning it cannot find the dot executable if the user has not installed it).

I see two options:

install graphviz automatically (not easy to do platform independently?)
document in README.md that the user needs to install graphviz and add the location of the dot executable to the PATH

COMPONENT: Switcher node

As a playout service provider, I want to switch between multiple sequences so that I can choose which sequence is output.

Input: multiple part 3 sequences; switching options.
Output: a single part 3 sequence.

COMPONENT: EBU-TT-D encoder

As a playout service provider, I want to convert EBU-TT-Live documents to EBU-TT-D documents so that I can distribute them to the end device.

Input: sequence of part 3 documents
Output: sequence of EBU-TT-D documents

Subtasks:

Add EBU-TT-D XSD 1.1 to repository and update it if necessary #177
Create bindings (move EBU-TT-3 current bindings to a sensible place) #176
Conversion initiation logic and validation #174
Create EBU-TT-D conversion classes #170

py.test is not callable

py.test is not callable, but CONTRIBUTING.md says it is

Create time metric tests (for smpte)

See #67 and #68: issue is to create tests that verify that the correct time units are used and the code does not get ms and m confused with each other.

Investigate CI plug-ins

As @kozmaz87 suggested, two Jenkins plugins would be useful:

The JUnit results plugin, which makes the test suite results easy to navigate and adds a history of recent builds, so you can see how the test metrics changed over the last X builds
The Cobertura coverage plugin, which does the same for coverage

Timedelta <-> SMPTE conversion

Conversion between XML time formats and timedelta values is done in this file. This setup allows us to do the conversions during pyxb binding loop.

Conversion for SMPTE is a bit more complicated than conversion for clock and media times. Indeed for SMPTE we need to access values of some attributes of the <tt> element, which is not easily done through pyxb at the stage of the binding where the conversion happens.
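To make the dependency on the <tt> attributes concrete, here is a minimal sketch of the arithmetic involved. It assumes a non-drop-frame hh:mm:ss:ff timecode and an integer frame rate passed in explicitly (standing in for ttp:frameRate); it ignores ttp:frameRateMultiplier and ttp:dropMode, and the function names are hypothetical rather than taken from the bindings.

```python
import re
from datetime import timedelta

SMPTE_RE = re.compile(r'^(\d{2}):(\d{2}):(\d{2}):(\d{2})$')


def smpte_to_timedelta(timecode, frame_rate):
    """Convert a non-drop-frame hh:mm:ss:ff timecode to a timedelta.

    frame_rate is an assumed integer frames-per-second value.
    """
    match = SMPTE_RE.match(timecode)
    if match is None:
        raise ValueError('Not an hh:mm:ss:ff timecode: %r' % (timecode,))
    hours, minutes, seconds, frames = (int(g) for g in match.groups())
    if frames >= frame_rate:
        raise ValueError('Frame count %d exceeds frame rate %d'
                         % (frames, frame_rate))
    return timedelta(hours=hours, minutes=minutes,
                     seconds=seconds + frames / float(frame_rate))


def timedelta_to_smpte(delta, frame_rate):
    """Convert a timedelta back to hh:mm:ss:ff, rounding to whole frames."""
    total_frames = int(round(delta.total_seconds() * frame_rate))
    total_seconds, frames = divmod(total_frames, frame_rate)
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return '%02d:%02d:%02d:%02d' % (hours, minutes, seconds, frames)


delta = smpte_to_timedelta('00:00:01:12', 25)
print(delta.total_seconds())          # 1.48
print(timedelta_to_smpte(delta, 25))  # 00:00:01:12
```

The frame rate has to come from outside the time expression itself, which is exactly why this conversion is harder to perform inside the binding step than the clock and media conversions.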

COMPONENT: WebSocket carrier

As a playout service provider, I want to carry EBU-TT Live documents using the WebSocket protocol so that I can use EBU-TT Live over TCP.

Input: a sequence of part 3 documents.
Output: a sequence of part 3 documents for RFC 6455.

Document manifest format

We need to document the manifest format. E.g.

in the code
in the wiki
in the header of the manifest file?

related to: #59

tests for smpte timebase semantic checks

Test that when timebase is set to smpte, documents are validated/rejected depending on the format of the time. Also test that the presence and correctness of all needed parameters are checked.

Unit-Test documents comparison (ComparableMixin)

Documents use the mixin ComparableMixin (defined in the project root in file utils.py). This mixin allows for correct and easy comparison of documents:

Two documents with the same sequenceIdentifier will be compared using their sequenceNumber:

document1 < document2 if document1.sequenceNumber < document2.sequenceNumber

If the documents do not have the same sequenceIdentifier, there is no comparison possible and an error is raised.
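The comparison rule can be sketched as follows. This is an illustrative stand-in, not the toolkit's ComparableMixin: the Document class, its attribute names and the choice of ValueError are all assumptions.

```python
import functools


@functools.total_ordering
class Document(object):
    """Hypothetical document that orders by sequence number within a sequence."""

    def __init__(self, sequence_identifier, sequence_number):
        self.sequence_identifier = sequence_identifier
        self.sequence_number = sequence_number

    def _check_comparable(self, other):
        # Documents from different sequences have no defined ordering.
        if self.sequence_identifier != other.sequence_identifier:
            raise ValueError('Documents belong to different sequences')

    def __eq__(self, other):
        self._check_comparable(other)
        return self.sequence_number == other.sequence_number

    def __lt__(self, other):
        self._check_comparable(other)
        return self.sequence_number < other.sequence_number


document1 = Document('sequence-a', 1)
document2 = Document('sequence-a', 2)
print(document1 < document2)  # True

try:
    document1 < Document('sequence-b', 3)
except ValueError as error:
    print(error)  # Documents belong to different sequences
```

A unit test for the real mixin would cover the same two cases: ordering within one sequence and the error raised across sequences.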

With this issue I want to address the fact that this was not tested yet, so I implemented tests to ensure that document comparison works as intended.

COMPONENT: Part 1 cued

As a playout service provider, I want to consume a cued EBU-TT Live sequence from an EBU-TT Part 1 document.

Input: a single EBU-TT Part 1 document; cueing options.
Output: a sequence of Part 3 documents released according to the cueing options.

Decide on which python versions we support

check existing code/libraries
decide
add to CONTRIBUTING.md
add to .travis.yml

Basic testing setup

Find a way to handle xml files (with templates for example)
Setup a basic test infrastructure for bdd
Write some tests

File system carriage mechanism

Refers to BBC's fork issue 18

This issue asks for the implementation of a functionality that allows the simple producer to write its output to the file system along with a manifest file with availability times.

Timeformats in documents

Time formats in documents are a bit confusing :

<tt:body tt:begin="63016289ms" tt:dur="00:00:01">

Can we choose the format used to convert timedeltas more precisely? For example here, having begin in hh:mm:ss.ms and dur in xxxxxms format would be more logical.
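As a sketch of what a selectable output format might look like, the two helpers below render the same timedelta in either style. The function names are hypothetical, and how such a choice would be wired into the bindings is left open.

```python
from datetime import timedelta


def _total_milliseconds(delta):
    """Whole milliseconds in a timedelta, using integer arithmetic only."""
    return ((delta.days * 86400 + delta.seconds) * 1000
            + delta.microseconds // 1000)


def format_full_clock(delta):
    """Render a timedelta as hh:mm:ss.mmm."""
    hours, remainder = divmod(_total_milliseconds(delta), 3600000)
    minutes, remainder = divmod(remainder, 60000)
    seconds, milliseconds = divmod(remainder, 1000)
    return '%02d:%02d:%02d.%03d' % (hours, minutes, seconds, milliseconds)


def format_milliseconds(delta):
    """Render a timedelta as a millisecond time count, e.g. 1000ms."""
    return '%dms' % _total_milliseconds(delta)


begin = timedelta(milliseconds=63016289)
dur = timedelta(seconds=1)
print(format_full_clock(begin))  # 17:30:16.289
print(format_milliseconds(dur))  # 1000ms
```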

Raised from discussion during 13/07/2016 call.

COMPONENT: Complex noise introducer

As a developer/tester, I want to control the level of 'noise' and complexity in a stream of part 3 documents so that I can test my implementation against different scenarios.

Input: a sequence of part 3 documents; options for introducing complexity into the stream.
Output: a sequence of modified part 3 documents.

COMPONENT: Simple Part 3 consumer (XML)

As a developer/tester I want a simple view of the text and times in a sequence of EBU-TT Live documents so that I can verify that they will be processed correctly.

Input: Sequence of part 3 documents; 'validate only' option.
Output: display of text with begin and end times and validation message(s)