cadubentzen / mkvdump

MKV and WebM parser CLI tool

License: Apache License 2.0

Rust 90.80% Dockerfile 0.37% Makefile 0.22% CSS 0.69% HTML 1.21% JavaScript 6.71%
mkv webm cli ebml matroska audio multimedia rust video

mkvdump's Introduction

mkvdump

A command-line tool for debugging Matroska/WebM files. It displays all internal elements of a Matroska file as JSON or YAML.

Sample YAML output
- id: EBML
  header_size: 5
  size: 36
  children:
  - id: EBMLVersion
    header_size: 3
    size: 4
    value: 1
  - id: EBMLReadVersion
    header_size: 3
    size: 4
    value: 1
  - id: EBMLMaxIDLength
    header_size: 3
    size: 4
    value: 4
  - id: EBMLMaxSizeLength
    header_size: 3
    size: 4
    value: 8
  - id: DocType
    header_size: 3
    size: 7
    value: webm
  - id: DocTypeVersion
    header_size: 3
    size: 4
    value: 2
  - id: DocTypeReadVersion
    header_size: 3
    size: 4
    value: 2
- id: Segment
  header_size: 12
  size: Unknown
  children:
  - id: Void
    header_size: 9
    size: 229
    value: null
  - id: Info
    header_size: 5
    size: 44
    children:
    - id: TimestampScale
      header_size: 4
      size: 7
      value: 1000000
    - id: MuxingApp
      header_size: 3
      size: 16
      value: Lavf58.29.100
    - id: WritingApp
      header_size: 3
      size: 16
      value: Lavf58.29.100
  - id: Tracks
    header_size: 5
    size: 101
    children:
    - id: TrackEntry
      header_size: 9
      size: 96
      children:
      - id: TrackNumber
        header_size: 2
        size: 3
        value: 1
      - id: TrackUID
        header_size: 3
        size: 4
        value: 1
      - id: FlagLacing
        header_size: 2
        size: 3
        value: 0
      - id: Language
        header_size: 4
        size: 7
        value: und
      - id: CodecID
        header_size: 2
        size: 7
        value: V_AV1
      - id: TrackType
        header_size: 2
        size: 3
        value: video
      - id: DefaultDuration
        header_size: 4
        size: 8
        value: 41708333
      - id: Video
        header_size: 9
        size: 32
        children:
        - id: PixelWidth
          header_size: 2
          size: 4
          value: 1280
        - id: PixelHeight
          header_size: 2
          size: 4
          value: 720
        - id: Colour
          header_size: 3
          size: 15
          children:
          - id: Range
            header_size: 3
            size: 4
            value: broadcast range
          - id: ChromaSitingHorz
            header_size: 3
            size: 4
            value: left collocated
          - id: ChromaSitingVert
            header_size: 3
            size: 4
            value: half
      - id: CodecPrivate
        header_size: 3
        size: 20
        value: '[81 05 0c 00 0a 0b 00 00 00 2d 4c ff b3 df ff 98 04]'
  - id: Tags
    header_size: 5
    size: 61
    children:
    - id: Tag
      header_size: 10
      size: 56
      children:
      - id: Targets
        header_size: 10
        size: 10
        children: []
      - id: SimpleTag
        header_size: 10
        size: 36
        children:
        - id: TagName
          header_size: 3
          size: 10
          value: ENCODER
        - id: TagString
          header_size: 3
          size: 16
          value: Lavf58.29.100
  - id: Cluster
    header_size: 6
    size: 2679
    children:
    - id: Timestamp
      header_size: 2
      size: 3
      value: 0
    - id: SimpleBlock
      header_size: 2
      size: 45
      value:
        track_number: 1
        timestamp: 0
        keyframe: true
    - id: SimpleBlock
      header_size: 2
      size: 59
      value:
        track_number: 1
        timestamp: 42
    - id: SimpleBlock
      header_size: 2
      size: 32
      value:
        track_number: 1
        timestamp: 83
    # ...

What's it useful for?

This tool is similar to mp4dump, but for Matroska files. It may be useful for:

  • snapshot testing: you can save mkvdump's output for a produced Matroska asset and use that in a human-readable snapshot test.
  • learning about EBML/Matroska/WebM: with this tool you can see how a Matroska file is structured. I also learned by writing the tool 😊

Getting mkvdump

Debian package

Ubuntu users (>= 20.04) can install mkvdump via the DEB package available on the releases page.

Homebrew

Linux and macOS users on x86_64 devices can install mkvdump via the Homebrew tap:

$ brew install cadubentzen/mkvdump/mkvdump

macOS users on M1 or M2 devices need to use

$ brew install --build-from-source cadubentzen/mkvdump/mkvdump

Cargo

If you have cargo-binstall installed, you can install mkvdump with

$ cargo binstall mkvdump

Otherwise, you can build and install it from source with:

$ cargo install mkvdump

Docker

To pull the latest mkvdump image from Docker Hub:

$ docker pull cadubentzen/mkvdump

A GitHub package is also available via

$ docker pull ghcr.io/cadubentzen/mkvdump

Images are multi-arch with support for linux/amd64, linux/386, linux/arm64, linux/arm/v7 and linux/arm/v6.

Running the container

Assuming a Matroska file on the host located at /host-path/sample.mkv, you can run mkvdump on it by mounting a volume:

$ docker run -v /host-path:/media cadubentzen/mkvdump /media/sample.mkv

Prebuilt binaries

Download prebuilt binaries from the releases page. There are binaries for the following targets:

  • Linux
    • statically linked with musl: x86_64, x86, aarch64, armv7l and armv6l
    • with GNU libc: x86_64 and x86 (built on Ubuntu 20.04)
  • macOS
    • x86_64 and aarch64 (>= macOS 11 Big Sur)
  • Windows
    • x86_64 and x86 with MSVC and MinGW

License

© 2022 Carlos Bentzen [email protected].

This project is licensed under either of the MIT license or the Apache License 2.0, at your option.

The SPDX license identifier for this project is MIT OR Apache-2.0.

mkvdump's Issues

Add more samples to snapshot tests

Add a few more samples to the test suite:

  • H.264, VP9 and AV1 files
  • Audio files and mixed files
  • Encrypted and Unencrypted
  • Different muxers: ffmpeg, shaka-packager (what more?)

turn repo into workspace

Currently the mkvdump binary and the library live in the same crate, so all their dependencies are mixed together, although some of them are not used by the library.

Those who would like to use the library thus pay the price of adding those dependencies as well.

The soon-to-be wasm crate to use in the website also only needs to use the library.

Children is private

Thanks for working on this! I was having issues trying to get mkv tags with symphonia and vlc, maybe because the mkv tags are a matrix rather than flat? Not sure.

In the MasterElement struct children is private so library users can't walk the tree. I didn't see any other way to get it other than serialize to json then deserialize.

Compile to WASM

It would be pretty neat to have this crate compiled to WASM, and possibly released in the future as a web app and/or extension.

fuzzy testing

Fuzz testing this crate would be really nice. It also seems relatively simple to do, because many random byte sequences actually yield valid EBML content, so the parser can reach different code paths.

Write README page

Last thing before the first release, once all things are set in place.

Use Element Paths for parsing and building the tree

Follow-up to #12. As Unknown sizes are a thing, building up the element tree could be done more elegantly by using paths specified in the XML file.

Currently, if we have a Master element of an unknown size, all elements following will be children of that Master element.

That does not work when concatenating multiple header files, or with clusters of unknown size.

Edit:
The element paths are also important for parsing, in order to recover from damaged elements.
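For context on where Unknown sizes come from, here is a minimal sketch of decoding an element data size as an EBML variable-size integer, following my reading of the EBML specification (RFC 8794). The names `ElementSize` and `parse_element_size` are hypothetical and are not taken from mkvdump's code.

```rust
/// Result of decoding an EBML element data size (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum ElementSize {
    Known(u64),
    Unknown,
}

/// Decodes a variable-size integer used as an element data size.
/// Returns the size and the number of bytes consumed, or None when the
/// input is too short or the first byte carries no length marker.
pub fn parse_element_size(input: &[u8]) -> Option<(ElementSize, usize)> {
    let first = *input.first()?;
    // Leading zero bits plus one give the total width in bytes.
    let width = first.leading_zeros() as usize + 1;
    if width > 8 || input.len() < width {
        return None;
    }
    // Clear the length marker bit and keep the remaining data bits.
    let mut value = u64::from(first) & (0xFF >> width);
    for &byte in &input[1..width] {
        value = (value << 8) | u64::from(byte);
    }
    // All data bits set to one is reserved to mean "unknown size".
    let all_ones = (1u64 << (7 * width)) - 1;
    if value == all_ones {
        Some((ElementSize::Unknown, width))
    } else {
        Some((ElementSize::Known(value), width))
    }
}
```

For example, the eight bytes 01 ff ff ff ff ff ff ff decode to an unknown size, which is the case where a tree builder needs element paths to decide where the Master element ends.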

Inline binary data with up to N bytes

Could display binary elements in the format
value: [00 0a 0b 0c]

if the value of the payload is smaller than N=64 (maybe).

  • Need to figure out how to do that with Serde YAML
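The formatting part could look roughly like the sketch below. `inline_binary` and `max_inline` are hypothetical names, and the sketch only builds the string; how to emit it through Serde YAML is still open.

```rust
/// Formats a binary payload as "[00 0a 0b 0c]" when it is smaller than
/// `max_inline` bytes; returns None for payloads at or above that limit.
pub fn inline_binary(payload: &[u8], max_inline: usize) -> Option<String> {
    if payload.len() >= max_inline {
        return None;
    }
    // Two lowercase hex digits per byte, separated by single spaces.
    let hex: Vec<String> = payload.iter().map(|b| format!("{:02x}", b)).collect();
    Some(format!("[{}]", hex.join(" ")))
}
```

A caller could fall back to printing only the payload length when the function returns None.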

remove nom

While nom is a great parser library, this project doesn't really use it fully, just take() and peek() functions.

Those functions should be easy to implement by hand and should reduce compilation time a lot.
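A minimal sketch of what hand-written replacements might look like over byte slices. These are hypothetical helpers, not the project's code; they return Option instead of nom's IResult, and `take` keeps nom's (remaining, output) ordering.

```rust
/// Splits `count` bytes off the front of `input` and returns
/// (remaining, taken), or None when fewer than `count` bytes are available.
pub fn take(input: &[u8], count: usize) -> Option<(&[u8], &[u8])> {
    if input.len() < count {
        return None;
    }
    let (taken, rest) = input.split_at(count);
    Some((rest, taken))
}

/// Returns the first `count` bytes without consuming them, or None when
/// the input is too short.
pub fn peek(input: &[u8], count: usize) -> Option<&[u8]> {
    input.get(..count)
}
```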

Generate first release

Pre-built webmdump binary in the GitHub releases.

Try to have it built with musl to maximize portability. At first, Linux only.

Parse CodecPrivate for some codecs

Would be nice to see some parsing of CodecPrivate. Maybe implement this in a separate crate and use it here.

List of codecs:

  • H.264
  • HEVC
  • VP9
  • AV1
  • Opus

Improve error handling

Right now it's quite fixed how the parsing will happen, but we could improve the error handling by defining better error return types, that can be asserted in tests.
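One possible shape for such an error type is sketched below. The `ParseError` enum, its variants, and `varint_width` are illustrative assumptions only; the point is that deriving PartialEq lets tests assert on the exact error returned.

```rust
use std::fmt;

/// Hypothetical error type; the variants are illustrative only.
#[derive(Debug, PartialEq, Eq)]
pub enum ParseError {
    /// The input ended before the value could be read.
    NeedData { missing: usize },
    /// The first byte of a variable-size integer had no length marker.
    InvalidVarint,
}

impl fmt::Display for ParseError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ParseError::NeedData { missing } => write!(f, "need {} more byte(s)", missing),
            ParseError::InvalidVarint => write!(f, "invalid variable-size integer"),
        }
    }
}

impl std::error::Error for ParseError {}

/// Reads the width in bytes of a variable-size integer from its first byte.
pub fn varint_width(input: &[u8]) -> Result<usize, ParseError> {
    match input.first().copied() {
        None => Err(ParseError::NeedData { missing: 1 }),
        Some(0) => Err(ParseError::InvalidVarint),
        Some(first) => Ok(first.leading_zeros() as usize + 1),
    }
}
```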

Implement streamed reading

Reading a GB file should not result in GB of memory utilized. The implementation could support that but the file reading is done with a single read-to-buffer call currently.

Could use VecDeque or some smarter library for buffered reading.
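As a rough illustration of bounded-memory reading with only the standard library, the sketch below feeds any `Read` source to a callback in fixed-size chunks. `for_each_chunk` is a hypothetical helper; a real parser would also need to carry over elements that straddle chunk boundaries, which this sketch does not attempt.

```rust
use std::io::{self, Read};

/// Feeds `reader` to `on_chunk` in pieces of at most `chunk_size` bytes and
/// returns the total number of bytes seen. Only one chunk is held in memory.
pub fn for_each_chunk<R: Read>(
    mut reader: R,
    chunk_size: usize,
    mut on_chunk: impl FnMut(&[u8]),
) -> io::Result<u64> {
    let mut buffer = vec![0u8; chunk_size];
    let mut total = 0u64;
    loop {
        match reader.read(&mut buffer) {
            // A zero-length read signals end of input.
            Ok(0) => return Ok(total),
            Ok(n) => {
                on_chunk(&buffer[..n]);
                total += n as u64;
            }
            // Retry reads that were interrupted before any data arrived.
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }
}
```

Passing a `std::fs::File` as the reader keeps memory use at roughly `chunk_size` bytes regardless of the file size.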

optimize memory usage

Currently we parse the input by requiring each element (including its body) to be loaded into memory. That's the whole reason why we have a buffer-size option, so that it can be increased, e.g. if we are parsing an MKV file with huge video frames.

However, since this crate is about displaying headers for those elements, it shouldn't be required to load the whole body into memory.

It's a bit tricky though, since we need to stay in sync while skipping bytes from the input source, and sometimes we need to parse part of the body (e.g. in SimpleBlock) for useful info.
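To show how little of a SimpleBlock body is needed, here is a sketch that reads only its leading fields, based on my understanding of the Matroska block layout: a variable-size track number, a signed 16-bit big-endian timestamp relative to the Cluster, and a flags byte whose top bit marks a keyframe. The struct and function names are hypothetical.

```rust
/// Leading fields of a SimpleBlock body (hypothetical type).
#[derive(Debug, PartialEq)]
pub struct SimpleBlockHeader {
    pub track_number: u64,
    /// Timestamp relative to the enclosing Cluster.
    pub timestamp: i16,
    pub keyframe: bool,
}

/// Parses the first few bytes of a SimpleBlock body; the frame data that
/// follows is never touched.
pub fn parse_simple_block_header(body: &[u8]) -> Option<SimpleBlockHeader> {
    let first = *body.first()?;
    // Width of the variable-size track number, from the leading zero bits.
    let width = first.leading_zeros() as usize + 1;
    // Need the track number, two timestamp bytes, and one flags byte.
    if width > 8 || body.len() < width + 3 {
        return None;
    }
    let mut track_number = u64::from(first) & (0xFF >> width);
    for &byte in &body[1..width] {
        track_number = (track_number << 8) | u64::from(byte);
    }
    let timestamp = i16::from_be_bytes([body[width], body[width + 1]]);
    let keyframe = (body[width + 2] & 0x80) != 0;
    Some(SimpleBlockHeader { track_number, timestamp, keyframe })
}
```

With a one-byte track number, only the first four bytes of the body are needed, so the rest could be skipped in the input source.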

Handle null terminated strings

So far, we've been parsing strings and UTF-8 always using the provided size, and sometimes \0 null characters show up at the end.
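A possible fix, sketched under the assumption that everything from the first null octet onward should be ignored (which is how I read the EBML specification's rule on terminating strings). `parse_ebml_string` is a hypothetical name.

```rust
/// Decodes a String/UTF-8 element body, ignoring the first null byte and
/// everything after it.
pub fn parse_ebml_string(raw: &[u8]) -> Result<&str, std::str::Utf8Error> {
    // Cut at the first NUL if there is one, otherwise use the whole body.
    let end = raw.iter().position(|&b| b == 0).unwrap_or(raw.len());
    std::str::from_utf8(&raw[..end])
}
```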

Implement CRC-32 validation

Master elements might have it, so we need to check that the CRC-32 matches and discard the content if not.
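The checksum itself can be computed without extra dependencies. The sketch below is a plain bitwise IEEE CRC-32; it assumes, per my reading of the EBML specification, that the stored value is little-endian and covers the Master element's data other than the CRC-32 element itself. Both function names are hypothetical.

```rust
/// Bitwise IEEE CRC-32 (reflected polynomial 0xEDB88320, initial value
/// 0xFFFFFFFF, final bitwise inversion).
pub fn crc32_ieee(data: &[u8]) -> u32 {
    let mut crc = 0xFFFF_FFFFu32;
    for &byte in data {
        crc ^= u32::from(byte);
        for _ in 0..8 {
            // All ones when the low bit is set, zero otherwise.
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}

/// Compares a stored little-endian CRC-32 value against `payload`.
/// Assumption: `payload` is the data the checksum is meant to cover.
pub fn crc_matches(stored_le: [u8; 4], payload: &[u8]) -> bool {
    u32::from_le_bytes(stored_le) == crc32_ieee(payload)
}
```

A table-driven variant would be faster, but the bitwise form is enough to validate the occasional Master element.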

fix panics in the tree module

The tree module hasn't been touched for a while, and its code is not using proper error handling yet, simply panicking in certain situations.

improve mkvparser documentation

Now that there's a separate library crate for mkvparser, I should improve its documentation.

Mainly adding an example to the landing docs.rs page

Add "no tree" mode

As we already parse the elements linearly, also offering linear output, in addition to tree mode, would make it easier to find elements.
