eyevinn / mp4ff

Library and tools for parsing and writing MP4 files including video, audio and subtitles. The focus is on fragmented files. Includes mp4ff-info, mp4ff-encrypt, mp4ff-decrypt and other tools.

License: MIT License

Go 99.83% Makefile 0.17%
mpeg-dash mp4 fmp4 avc cmaf sps pps wvtt stpp fragmented-mp4-files

mp4ff's Introduction


Package mp4ff implements MP4 media file parsing and writing for AVC and HEVC video, AAC and AC-3 audio, and stpp and wvtt subtitles. It is focused on fragmented files as used for streaming in DASH, MSS and HLS fMP4, but can also decode and encode all boxes needed for progressive MP4 files. In particular, the tool mp4ff-crop can be used to crop a progressive file.

Command Line Tools

Some useful command line tools are available in cmd.

  1. mp4ff-info prints a tree of the box hierarchy of a mp4 file with information about the boxes. The level of detail can be increased with the option -l, like -l all:1 for all boxes or -l trun:1,stss:1 for specific boxes.
  2. mp4ff-pslister extracts and displays SPS and PPS for AVC or HEVC in a mp4 or a bytestream (Annex B) file. Partial information is printed for HEVC.
  3. mp4ff-nallister lists NALUs and picture types for video in a progressive or fragmented file.
  4. mp4ff-subslister lists details of wvtt or stpp (WebVTT or TTML in ISOBMFF) subtitle samples.
  5. mp4ff-crop shortens a progressive mp4 file to a specified duration.
  6. mp4ff-encrypt encrypts a fragmented file using the cenc or cbcs Common Encryption scheme.
  7. mp4ff-decrypt decrypts a fragmented file encrypted using the cenc or cbcs Common Encryption scheme.

You can install these tools by going to their respective directories and running go install ., or install them directly from the repo with

go install github.com/Eyevinn/mp4ff/cmd/mp4ff-info@latest

Example code

Example code is available in the examples directory. The examples and their functions are:

  1. initcreator creates typical init segments (ftyp + moov) for video and audio
  2. resegmenter reads a segmented file (CMAF track) and resegments it with other segment durations using FullSample
  3. segmenter takes a progressive mp4 file and creates init and media segments from it. This tool has been extended to support generation of segments with multiple tracks as well as reading and writing mdat in lazy mode
  4. multitrack parses a fragmented file with multiple tracks
  5. combine-segs combines single-track init and media segments into multi-track segments

Library

The library has functions for parsing (called Decode) and writing (Encode) in the package mp4ff/mp4. It also contains codec-specific parsing of AVC/H.264, including complete parsing of SPS and PPS, in the package mp4ff/avc. HEVC/H.265 parsing is less complete and available in mp4ff/hevc. Supplementary Enhancement Information can be parsed and written using the package mp4ff/sei.
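For reference, the import paths follow the module layout, e.g.

import (
	"github.com/Eyevinn/mp4ff/avc" // AVC/H.264 parsing (SPS, PPS, slice headers)
	"github.com/Eyevinn/mp4ff/mp4" // box parsing (Decode) and writing (Encode)
)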

Traditional multiplexed non-fragmented mp4 files can be parsed and decoded, but the focus is on fragmented mp4 files as used in DASH, HLS, and CMAF.

Beyond single-track fragmented files, support has been added to parse and generate multi-track fragmented files, as can be seen in examples/segmenter and examples/multitrack.

The top level structure for both non-fragmented and fragmented mp4 files is mp4.File.

In a progressive (non-fragmented) mp4.File, the top level attributes Ftyp, Moov, and Mdat point to the corresponding boxes.

A fragmented mp4.File can be more or less complete, like a single init segment, one or more media segments, or a combination of both like a CMAF track which renders into a playable one-track asset. It can also have multiple tracks. For fragmented files, the following high-level attributes are used:

  • Init contains a ftyp and a moov box and provides the general metadata for a fragmented file. It corresponds to a CMAF header. It can also contain one or more sidx boxes.
  • Segments is a slice of MediaSegment. Each MediaSegment starts with an optional styp box, possibly one or more sidx boxes, and then one or more Fragments.
  • Fragment is an mp4 fragment with exactly one moof box followed by a mdat box, where the latter contains the media data. It can have one or more trun boxes containing the metadata for the samples.

All child boxes of a container box such as MoovBox are listed in the Children attribute, but the most prominent child boxes also have direct links with names, which makes it possible to write a path such as

fragment.Moof.Traf.Trun

to access the (only) trun box in a fragment with only one traf box, or

fragment.Moof.Trafs[1].Truns[1]

to get the second trun of the second traf box (provided that they exist). Care must be taken to check that none of the intermediate pointers are nil, to avoid panics.
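A minimal sketch of such defensive traversal, assuming a parsed fragmented mp4.File named parsedMp4 and the field names described above (e.g. an exported Samples slice on the trun box is assumed here):

for _, seg := range parsedMp4.Segments {
	for _, frag := range seg.Fragments {
		moof := frag.Moof
		if moof == nil || moof.Traf == nil || moof.Traf.Tfhd == nil || moof.Traf.Trun == nil {
			continue // incomplete fragment, skip instead of panicking on a nil pointer
		}
		fmt.Printf("track %d: %d samples in first trun\n",
			moof.Traf.Tfhd.TrackID, len(moof.Traf.Trun.Samples))
	}
}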

Creating new fragmented files

A typical use case is to produce a fragmented stream consisting of an init segment followed by a series of media segments.

The first step is to create the init segment. This is done in three steps as can be seen in examples/initcreator:

init := mp4.CreateEmptyInit()
init.AddEmptyTrack(timescale, mediatype, language)
init.Moov.Trak.SetHEVCDescriptor("hvc1", vpsNALUs, spsNALUs, ppsNALUs)

Here the third step fills in codec-specific parameters into the sample descriptor of the single track. Multiple tracks are also available via the slice attribute Traks instead of Trak.

The second step is to start producing media segments. They should use the timescale that was set when creating the init segment. Generally, that timescale should be chosen so that the sample durations have exact values without rounding errors.
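For example, 24 fps video with a 90000 timescale gets an exact per-frame duration, whereas a 1000 timescale would require rounding:

const videoTimescale = 90000 // ticks per second
const frameRate = 24
dur := uint32(videoTimescale / frameRate) // 3750 ticks per frame, exact
// with a timescale of 1000, the per-frame duration would be 41.67 ms and need rounding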

A media segment contains one or more fragments, where each fragment has a moof and a mdat box. If all samples are available before the segment is created, one can use a single fragment in each segment. Example code for this can be found in examples/segmenter.

A simple, but not optimal, way of creating a media segment is to first create a slice of FullSample with the data needed. The definition of mp4.FullSample is

mp4.FullSample{
 Sample: mp4.Sample{
  Flags uint32 // Flag sync sample etc
  Dur   uint32 // Sample duration in mdhd timescale
  Size  uint32 // Size of sample data
  Cto   int32  // Signed composition time offset
 },
 DecodeTime uint64 // Absolute decode time (offset + accumulated sample Dur)
 Data       []byte // Sample data
}

The mp4.Sample part is what will be written into the trun box. DecodeTime is the accumulated time on the media timeline. The DecodeTime value of the first sample of a fragment will be set as the BaseMediaDecodeTime in the tfdt box.
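A hedged sketch of building such a slice for fixed-duration audio frames (the accessUnits variable and the 1024/48000 AAC timing are assumptions for illustration; check the sync-sample flag constant against the package):

var samples []mp4.FullSample
decodeTime := uint64(0)
for _, au := range accessUnits { // accessUnits: [][]byte, one encoded frame each (assumed)
	samples = append(samples, mp4.FullSample{
		Sample: mp4.Sample{
			Flags: mp4.SyncSampleFlags, // constant name assumed; audio frames are sync samples
			Dur:   1024,                // 1024 audio samples per frame in a 48000 timescale
			Size:  uint32(len(au)),
			Cto:   0,
		},
		DecodeTime: decodeTime,
		Data:       au,
	})
	decodeTime += 1024
}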

Once a number of such full samples are available, they can be added to a media segment like

seg := mp4.NewMediaSegment()
frag := mp4.CreateFragment(uint32(segNr), mp4.DefaultTrakID)
seg.AddFragment(frag)
for _, sample := range samples {
 frag.AddFullSample(sample)
}

This segment can finally be output to an io.Writer w as

err := seg.Encode(w)

For multi-track segments, the code is a bit more involved. Please have a look at examples/segmenter to see how it is done. A more optimal way of handling media samples is to handle them lazily, as explained next.

Lazy decoding and writing of mdat data

For video and audio, the dominating part of a mp4 file is the media data which is stored in one or more mdat boxes. In some cases, for example when segmenting large progressive files, it is much more memory efficient to just read the movie or fragment data from the moov or moof box and defer the reading of the media data from the mdat box to later.

For decoding, this is supported by running mp4.DecodeFile() in lazy mode as

parsedMp4, err = mp4.DecodeFile(ifd, mp4.WithDecodeMode(mp4.DecModeLazyMdat))

In this case, the media data of the mdat box will not be read; only its size is set. To read or copy the actual data corresponding to a sample, one must calculate the corresponding byte range and either call

func (m *MdatBox) ReadData(start, size int64, rs io.ReadSeeker) ([]byte, error)

or

func (m *MdatBox) CopyData(start, size int64, rs io.ReadSeeker, w io.Writer) (nrWritten int64, err error)

Example code for this, including lazy writing of mdat, can be found in examples/segmenter with the lazy mode set.
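A hedged sketch of the lazy flow (sampleOffset and sampleSize are placeholders; in real code they must be derived from the sample tables or trun entries as described above):

ifd, err := os.Open("large.mp4")
if err != nil {
	log.Fatal(err)
}
defer ifd.Close()

parsedMp4, err := mp4.DecodeFile(ifd, mp4.WithDecodeMode(mp4.DecModeLazyMdat))
if err != nil {
	log.Fatal(err)
}

var sampleOffset, sampleSize int64 = 4096, 1500 // placeholders, not real values
data, err := parsedMp4.Mdat.ReadData(sampleOffset, sampleSize, ifd)
if err != nil {
	log.Fatal(err)
}
fmt.Printf("read %d bytes of media data\n", len(data))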

More efficient I/O using SliceReader and SliceWriter

The use of the interfaces io.Reader and io.Writer for reading and writing boxes gives a lot of flexibility, but is not optimal when it comes to memory allocation. In particular, the Read(p []byte) method needs a slice p of the proper size to read data, which leads to a lot of allocations and copying of data. In order to achieve better performance, it is advantageous to read the full top level boxes into one, or a few, slices and decode these.

To enable that mode, version 0.27 of the code introduced DecodeXSR(sr bits.SliceReader) methods for every box X, where bits.SliceReader is an interface in the package mp4ff/bits. For example, the TrunBox gets the method DecodeTrunSR(sr bits.SliceReader) in addition to its old DecodeTrun(r io.Reader) method. The bits.SliceReader interface provides methods to read all kinds of data structures from an underlying slice of bytes. It has an implementation bits.FixedSliceReader, which uses a fixed-size slice as the underlying slice, but one could consider implementing a growing version which would get its data from some external source.
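As an illustration of the pattern (method names like ReadUint32, ReadBytes, and AccError follow the bits.SliceReader interface; verify exact signatures against the package):

data, err := os.ReadFile("init.mp4")
if err != nil {
	log.Fatal(err)
}
sr := bits.NewFixedSliceReader(data)
boxSize := sr.ReadUint32()         // size field of the first box
boxType := string(sr.ReadBytes(4)) // four-character box type, e.g. "ftyp"
if sr.AccError() != nil {          // the reader accumulates errors instead of returning them per call
	log.Fatal(sr.AccError())
}
fmt.Printf("first box: %s, %d bytes\n", boxType, boxSize)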

The memory allocation and speed improvements achieved by this may vary, but should be substantial, especially compared to versions before 0.27 which used an extra io.LimitReader layer.

For further reduction of memory allocation when reading the mdat data of a progressive file, some sort of buffered reader should be used.

Benchmarks

To investigate the efficiency of the new SliceReader and SliceWriter methods, benchmarks have been done. The benchmarks are defined in the files mp4/benchmarks_test.go and mp4/benchmarks_srw_test.go. For DecodeFile, one can see a big improvement going from version 0.26 to version 0.27, which both use the io.Reader interface, and another big improvement from using the SliceReader source. The latter benchmarks are called BenchmarkDecodeFileSR but have here been given the same name for easy comparison. Note that the allocations here refer to the heap allocations done inside the benchmark loop. Outside that loop, a slice is allocated to keep the input data.

For EncodeFile, one can see that v0.27 is actually worse than v0.26 when used with the io.Writer interface. That is because the code was restructured so that all writes go via the SliceWriter layer in order to reduce code duplication. However, if instead using the SliceWriter methods directly, there is a big relative gain in allocations as can be seen in the last column.

name \ time/op             v0.26    v0.27    v0.27-srw
DecodeFile/1.m4s-16        21.9µs   6.7µs    2.6µs
DecodeFile/prog_8s.mp4-16  143µs    48µs     16µs
EncodeFile/1.m4s-16        1.70µs   2.14µs   1.50µs
EncodeFile/prog_8s.mp4-16  15.7µs   18.4µs   12.9µs

name \ alloc/op            v0.26    v0.27    v0.27-srw
DecodeFile/1.m4s-16        120kB    28kB     2kB
DecodeFile/prog_8s.mp4-16  906kB    207kB    12kB
EncodeFile/1.m4s-16        1.16kB   1.39kB   0.08kB
EncodeFile/prog_8s.mp4-16  6.84kB   8.30kB   0.05kB

name \ allocs/op           v0.26    v0.27    v0.27-srw
DecodeFile/1.m4s-16        98.0     42.0     34.0
DecodeFile/prog_8s.mp4-16  454      180      169
EncodeFile/1.m4s-16        15.0     15.0     3.0
EncodeFile/prog_8s.mp4-16  101      86       1

Box structure and interface

Most boxes have their own file named after the box, but in some cases, there may be multiple boxes that have the same content, and the code file then has a generic name like mp4/visualsampleentry.go.

The Box interface is specified in mp4/box.go. It does not include the decode (parsing) methods, which have distinct names for each box type and are dispatched via lookup tables.

The mapping for decoding dispatch is given in the table mp4.decoders for the io.Reader methods and in mp4.decodersSR for the bits.SliceReader methods.

How to implement a new box

To implement a new box fooo, the following is needed.

Create a file fooo.go and create a struct type FoooBox.

FoooBox must implement the Box interface methods:

Type()
Size()
Encode(w io.Writer)
EncodeSW(sw bits.SliceWriter)  // new in v0.27.0
Info()

It also needs its own decode function DecodeFooo, which must be added to the decoders map in box.go, and (new in v0.27.0) a DecodeFoooSR function added to the decodersSR map. For a simple example, look at the PrftBox in prft.go.

A test file fooo_test.go should also have a test using the method boxDiffAfterEncodeAndDecode to check that the box information is equal after encoding and decoding.
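A hedged sketch of such a test (the helper is internal to the mp4 package, so the test file lives in package mp4; the field values are placeholders and the helper's exact signature should be checked against an existing box test):

package mp4

import "testing"

func TestFooo(t *testing.T) {
	fooo := &FoooBox{
		// fill in representative field values for the new box here
	}
	boxDiffAfterEncodeAndDecode(t, fooo)
}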

Direct changes of attributes

Many attributes are public and can therefore be changed freely. The advantage of this is that it is possible to write code that manipulates boxes in many different ways, but one must be careful not to break links to child boxes or create inconsistent states in the boxes.

As an example, container boxes such as TrafBox have a method AddChild which adds a box to Children, its slice of children boxes, but also sets a specific member reference such as Tfdt to point to that box. If Children is manipulated directly, that link may not be valid.

Encoding modes and optimizations

For fragmented files, one can choose to either encode all boxes in a mp4.File, or only encode the ones which are included in the init and media segments. The attribute that controls this is called FragEncMode. Another attribute, EncOptimize, controls possible optimizations of the file encoding process. Currently, there is only one possible optimization, called OptimizeTrun. It can reduce the size of the TrunBox by finding and writing default values in the TfhdBox and omitting the corresponding values from the TrunBox. Note that this may change the size of all ancestor boxes of trun.
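A minimal sketch, using the attribute and constant names given above:

parsedMp4.EncOptimize = mp4.OptimizeTrun // move default values to tfhd, omit them from trun
err := parsedMp4.Encode(w)               // w is an io.Writer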

Sample Number Offset

Following the ISOBMFF standard, sample numbers and other numbers start at 1 (one-based). This applies to arguments of functions and methods. The actual storage in slices is zero-based, so sample nr 1 has index 0 in the corresponding slice.
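For example, assuming fullSamples is a slice of mp4.FullSample:

sampleNr := 1                     // first sample, one-based as in ISOBMFF
sample := fullSamples[sampleNr-1] // zero-based index into the slice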

Stability

The APIs should be fairly stable, but minor non-backwards-compatible changes may happen until version 1.

Specifications

The main specification for the MP4 file format is the ISO Base Media File Format (ISOBMFF) standard ISO/IEC 14496-12 6th edition 2020. Some boxes are specified in other standards, as should be commented in the code.

LICENSE

MIT, see LICENSE.

Some code in pkg/mp4 comes from or is based on https://github.com/jfbus/mp4, which has Copyright (c) 2015 Jean-François Bustarret.

Some code in pkg/bits comes from or is based on https://github.com/tcnksm/go-casper/tree/master/internal/bits Copyright (c) 2017 Taichi Nakashima.

ChangeLog and Versions

See CHANGELOG.md.

Support

Join our community on Slack where you can post any questions regarding any of our open source projects. Eyevinn's consulting business can also offer you:

  • Further development of this component
  • Customization and integration of this component into your platform
  • Support and maintenance agreement

Contact [email protected] if you are interested.

About Eyevinn Technology

Eyevinn Technology is an independent consultant firm specialized in video and streaming. Independent in a way that we are not commercially tied to any platform or technology vendor. As our way to innovate and push the industry forward we develop proof-of-concepts and tools. The things we learn and the code we write we share with the industry in blogs and by open sourcing the code we have written.

Want to know more about Eyevinn and what it is like to work here? Contact us at [email protected]!

mp4ff's People

Contributors

birme, campbellwmorgan, chaunguyen4392, garbbraf, imro0t, itsjamie, j1ng06, julijane, k-danil, maximk-1, mtneug, palmdalian, tanghaowillow, thejoeejoee, tmm1, tobbee


mp4ff's Issues

corrupt input (decrypt-cenc)

sample encrypted mp4: https://user-images.githubusercontent.com/25949138/173815484-d5968e8e-c683-4acc-b2ed-77caffc29ac1.mp4

decryption key: 42cf7ff3ef8ed3d91824f83e8b392880

Steps to reproduce:

  • wget "https://user-images.githubusercontent.com/25949138/173815484-d5968e8e-c683-4acc-b2ed-77caffc29ac1.mp4" -O enc.mp4
  • decrypt-cenc -k 42cf7ff3ef8ed3d91824f83e8b392880 -i enc.mp4 -o dec.mp4
  • ffprobe dec.mp4

ffprobe output

ffprobe version 5.0.1 Copyright (c) 2007-2022 the FFmpeg developers
  built with Apple clang version 13.1.6 (clang-1316.0.21.2)
  configuration: --prefix=/opt/homebrew/Cellar/ffmpeg/5.0.1 --enable-shared --enable-pthreads --enable-version3 --cc=clang --host-cflags= --host-ldflags= --enable-ffplay --enable-gnutls --enable-gpl --enable-libaom --enable-libbluray --enable-libdav1d --enable-libmp3lame --enable-libopus --enable-librav1e --enable-librist --enable-librubberband --enable-libsnappy --enable-libsrt --enable-libtesseract --enable-libtheora --enable-libvidstab --enable-libvmaf --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libxvid --enable-lzma --enable-libfontconfig --enable-libfreetype --enable-frei0r --enable-libass --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libspeex --enable-libsoxr --enable-libzmq --enable-libzimg --disable-libjack --disable-indev=jack --enable-videotoolbox --enable-neon
  libavutil      57. 17.100 / 57. 17.100
  libavcodec     59. 18.100 / 59. 18.100
  libavformat    59. 16.100 / 59. 16.100
  libavdevice    59.  4.100 / 59.  4.100
  libavfilter     8. 24.100 /  8. 24.100
  libswscale      6.  4.100 /  6.  4.100
  libswresample   4.  3.100 /  4.  3.100
  libpostproc    56.  3.100 / 56.  3.100
[h264 @ 0x11b604450] SEI type 147 size 1720 truncated at 64
[h264 @ 0x11b604450] SEI type 220 size 1112 truncated at 80
[h264 @ 0x11b604450] SEI type 229 size 488 truncated at 32
[h264 @ 0x11b604450] SEI type 147 size 1720 truncated at 63
[h264 @ 0x11b604450] SEI type 220 size 1112 truncated at 77
[h264 @ 0x11b604450] SEI type 229 size 488 truncated at 31
[h264 @ 0x11b604450] A non-intra slice in an IDR NAL unit.
[h264 @ 0x11b604450] decode_slice_header error
[h264 @ 0x11b604450] no frame!
[h264 @ 0x11b604450] SEI type 24 size 328 truncated at 72
[h264 @ 0x11b604450] non-existing PPS 2 referenced
[h264 @ 0x11b604450] SEI type 24 size 328 truncated at 71
[h264 @ 0x11b604450] non-existing PPS 2 referenced
[h264 @ 0x11b604450] decode_slice_header error
[h264 @ 0x11b604450] no frame!
[h264 @ 0x11b604450] SEI type 185 size 1784 truncated at 64
[h264 @ 0x11b604450] non-existing PPS 1 referenced
[h264 @ 0x11b604450] SEI type 185 size 1784 truncated at 62
[h264 @ 0x11b604450] non-existing PPS 1 referenced
[h264 @ 0x11b604450] decode_slice_header error
[h264 @ 0x11b604450] no frame!
[h264 @ 0x11b604450] SEI type 248 size 112 truncated at 72
[h264 @ 0x11b604450] non-existing PPS 2 referenced
[h264 @ 0x11b604450] SEI type 248 size 112 truncated at 71
[h264 @ 0x11b604450] non-existing PPS 2 referenced
[h264 @ 0x11b604450] decode_slice_header error
[h264 @ 0x11b604450] no frame!
[h264 @ 0x11b604450] SEI type 207 size 416 truncated at 72
[h264 @ 0x11b604450] SEI type 207 size 416 truncated at 70
[h264 @ 0x11b604450] reference count overflow
[h264 @ 0x11b604450] decode_slice_header error
[h264 @ 0x11b604450] no frame!
[h264 @ 0x11b604450] SEI type 204 size 1528 truncated at 64
[h264 @ 0x11b604450] non-existing PPS 3 referenced
[h264 @ 0x11b604450] SEI type 204 size 1528 truncated at 62
[h264 @ 0x11b604450] non-existing PPS 3 referenced
[h264 @ 0x11b604450] decode_slice_header error
[h264 @ 0x11b604450] no frame!
[h264 @ 0x11b604450] SEI type 60 size 840 truncated at 64
[h264 @ 0x11b604450] SEI type 60 size 840 truncated at 60
[h264 @ 0x11b604450] Missing reference picture, default is 6
[h264 @ 0x11b604450] co located POCs unavailable
[h264 @ 0x11b604450] top block unavailable for requested intra mode
[h264 @ 0x11b604450] error while decoding MB 18 0, bytestream 36841
[h264 @ 0x11b604450] concealing 8160 DC, 8160 AC, 8160 MV errors in B frame
[h264 @ 0x11b604450] number of reference frames (1+4) exceeds max (4; probably corrupt input), discarding one
[h264 @ 0x11b604450] SEI type 223 size 720 truncated at 64
[h264 @ 0x11b604450] SEI type 51 size 1664 truncated at 72
[h264 @ 0x11b604450] SEI type 139 size 472 truncated at 64
[h264 @ 0x11b604450] non-existing PPS 2 referenced
[h264 @ 0x11b604450] SEI type 60 size 712 truncated at 72
[h264 @ 0x11b604450] SEI type 16 size 1400 truncated at 64
[h264 @ 0x11b604450] non-existing PPS 5 referenced
[h264 @ 0x11b604450] SEI type 150 size 232 truncated at 40
[h264 @ 0x11b604450] illegal reordering_of_pic_nums_idc 5
[h264 @ 0x11b604450] SEI type 225 size 992 truncated at 72
[h264 @ 0x11b604450] non-existing PPS 6 referenced
[h264 @ 0x11b604450] SEI type 179 size 376 truncated at 64
[h264 @ 0x11b604450] non-existing PPS 1 referenced
[h264 @ 0x11b604450] SEI type 161 size 1920 truncated at 72
[h264 @ 0x11b604450] SEI type 32 size 1576 truncated at 64
[h264 @ 0x11b604450] non-existing PPS 6 referenced
[h264 @ 0x11b604450] SEI type 49 size 1208 truncated at 64
[h264 @ 0x11b604450] non-existing PPS 9 referenced
[h264 @ 0x11b604450] SEI type 0 size 1288 truncated at 72
[h264 @ 0x11b604450] SEI type 52 size 536 truncated at 64
[h264 @ 0x11b604450] SEI type 132 size 1880 truncated at 72
[h264 @ 0x11b604450] non-existing PPS 2 referenced
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'dec.mp4':
  Metadata:
    major_brand     : isom
    minor_version   : 1
    compatible_brands: isomavc1dash
    creation_time   : 2022-06-14T14:26:56.000000Z
  Duration: 00:00:04.00, start: 0.080000, bitrate: 2380 kb/s
  Stream #0:0[0x1](und): Video: h264 (High) (avc1 / 0x31637661), yuv420p(progressive), 1920x1080 [SAR 1:1 DAR 16:9], 2376 kb/s, 25 fps, 25 tbr, 90k tbn (default)
    Metadata:
      creation_time   : 2022-06-14T14:26:56.000000Z
      handler_name    : ETI ISO Video Media Handler
      vendor_id       : [0][0][0][0]
      encoder         : Elemental H.264


with mp4decrypt

mp4decrypt --key 1:42cf7ff3ef8ed3d91824f83e8b392880 enc.mp4 dec.mp4

ffprobe version 5.0.1 Copyright (c) 2007-2022 the FFmpeg developers
  built with Apple clang version 13.1.6 (clang-1316.0.21.2)
  configuration: --prefix=/opt/homebrew/Cellar/ffmpeg/5.0.1 --enable-shared --enable-pthreads --enable-version3 --cc=clang --host-cflags= --host-ldflags= --enable-ffplay --enable-gnutls --enable-gpl --enable-libaom --enable-libbluray --enable-libdav1d --enable-libmp3lame --enable-libopus --enable-librav1e --enable-librist --enable-librubberband --enable-libsnappy --enable-libsrt --enable-libtesseract --enable-libtheora --enable-libvidstab --enable-libvmaf --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libxvid --enable-lzma --enable-libfontconfig --enable-libfreetype --enable-frei0r --enable-libass --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libspeex --enable-libsoxr --enable-libzmq --enable-libzimg --disable-libjack --disable-indev=jack --enable-videotoolbox --enable-neon
  libavutil      57. 17.100 / 57. 17.100
  libavcodec     59. 18.100 / 59. 18.100
  libavformat    59. 16.100 / 59. 16.100
  libavdevice    59.  4.100 / 59.  4.100
  libavfilter     8. 24.100 /  8. 24.100
  libswscale      6.  4.100 /  6.  4.100
  libswresample   4.  3.100 /  4.  3.100
  libpostproc    56.  3.100 / 56.  3.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'dec.mp4':
  Metadata:
    major_brand     : isom
    minor_version   : 1
    compatible_brands: isomavc1dash
    creation_time   : 2022-06-14T14:26:56.000000Z
  Duration: 00:00:04.08, start: 0.080000, bitrate: 2338 kb/s
  Stream #0:0[0x1](und): Video: h264 (High) (avc1 / 0x31637661), yuv420p(progressive), 1920x1080 [SAR 1:1 DAR 16:9], 2329 kb/s, 25 fps, 25 tbr, 90k tbn (default)
    Metadata:
      creation_time   : 2022-06-14T14:26:56.000000Z
      handler_name    : ETI ISO Video Media Handler
      vendor_id       : [0][0][0][0]
      encoder         : Elemental H.264
    Side data:
      unknown side data type 24 (1085 bytes)

GetSampleNrAtTime fails on VBR mp4 files with last keyframe with 0 duration

Having last keyframe with 0 duration:
[stts]
...
- entry[113]: sampleCount=1 sampleDelta=95024
- entry[114]: sampleCount=1 sampleDelta=0
[stss]
- syncSampleCount: 10
...
- syncSample[10]: sampleNumber=114
[stsz]
- sample[113] size=2238
- sample[114] size=80726

GetSampleNrAtTime fails with "no matching sample found for time" for the last two sample times.

Proposed solution is to change line
if sampleStartTime < accTime+uint64(b.SampleCount[i])*timeDelta {
to
if sampleStartTime <= accTime+uint64(b.SampleCount[i])*timeDelta {

Add example code for cenc and cbcs segment encryption

The Common Encryption specification (ISO/IEC 23001-7) defines how ISOBMFF
segments should be encrypted in a standardized way.
This is used in both DASH and HLS for all codecs including H.264, HEVC, AV1, AAC, AC-3 etc.

There are two main schemes in use, cenc and cbcs, with the latter using striped encryption
with CBC.

There is example code for decryption in this repo, but it would be great to have code for encryption as well.

For video, there is a complication in that it should be partially encrypted, leaving headers (NALU + slice headers) unencrypted. This means that the code must parse the video to find out how big these headers are. There is already code for slice header parsing for AVC and HEVC in their respective directories.

Beyond the standard, the Bento4 library provides a good source for learning about this encryption.
A possible acceptance criterion is that the generated encrypted segments should be possible to decrypt with Bento4 mp4decrypt as well as with the example code in decrypt-cenc.

Bug: Matrix coefficients not set for video

All matrix coefficients in the mvhd and tkhd boxes are set to zero, but for video they should
specify a unity matrix as described in Sections 8.2.2.2 and 8.3.2.2:

template int(32)[9] matrix =
  { 0x00010000,0,0,0,0x00010000,0,0,0,0x40000000 };

This doesn't seem to cause any big problems, but ffmpeg complains:

 Side data:
  displaymatrix: rotation of nan degrees

The strange-looking values are because the first 6 values are 16.16 fixed-point values and the last 3 are 2.30 fixed-point values. The actual values are therefore really a unity matrix with a diagonal of ones.

feature request: SetType() on AudioSampleEntryBox

Good day!

Right now there is no way to set AudioSampleEntryBox.name on an already created box. This is a must-have feature to implement a CENC-compliant encryptor on top of mp4ff. As the simplest solution, we could have a SetType() method, as is already implemented for VisualSampleEntryBox.

StssBox IsSyncSample method is not thread-safe

The StssBox method IsSyncSample is not thread-safe. I ran into a fatal error (fatal error: concurrent map writes) when two goroutines work on the same *mp4.File instance and call the stss method IsSyncSample at almost the same time.
https://github.com/edgeware/mp4ff/blob/9953ef128cd4d264a44a5f1c5d5759c259b17651/mp4/stss.go#L51-L61

An easy solution could be to move the for loop to where we decode the StssBox and thereby avoid adding locks to StssBox, but this solution slows down the decode a bit.

bug: example/segment fmp4 can not be played in firefox

generate fmp4 with
./segmenter -lazy -m -d 30000 -i ./box.mp4 -o fmp4

manually edit an m3u8 file like this
#EXTM3U
#EXT-X-TARGETDURATION:30
#EXT-X-VERSION:6
#EXT-X-MEDIA-SEQUENCE:1
#EXT-X-INDEPENDENT-SEGMENTS
#EXT-X-MAP:URI="fmp4_init.mp4"
#EXTINF:43,
fmp4_media_1.m4s
#EXTINF:30.000,
fmp4_media_2.m4s
#EXTINF:30.000,
fmp4_media_3.m4s
#EXT-X-ENDLIST

use video.js to play the m3u8
try this index.html
index.html.zip

Other browsers play well, but Firefox reports an error (see the attached screenshot).

use mp4box to convert the same progressive mp4
MP4Box -dash 30000 -profile live box.mp4 -out dash.mpd --dual
It plays well in all browsers.
The difference between the example/segment output and the mp4box output is shown in the attached screenshots.

Segment parsing based only on styp box

Hi,

Now, mp4ff assumes that a segment must start with a styp box, but some content, such as DASH On-Demand or ISM, does not contain a styp box. This leads to a problem where a media file which should contain multiple segments instead contains just one segment with multiple fragments, which doesn't seem right. In these cases, we must rely on the sidx box for DASH On-Demand and the tfra box for ISM to determine the segment boundaries.

segmenter example not working

Trying to run one of the examples, segmenter, but it's not working.

aaaa@xxxx segmenter % go run main.go

command-line-arguments

./main.go:53:34: too many arguments in call to mp4.DecodeFile
./main.go:53:40: undefined: mp4.WithDecodeMode
./main.go:53:59: undefined: mp4.DecModeLazyMdat
./main.go:61:20: undefined: NewSegmenter
./main.go:65:34: undefined: getSegmentStartsFromVideo
./main.go:72:9: undefined: makeMultiTrackSegments
./main.go:75:10: undefined: makeSingleTrackSegmentsLazyWrite
./main.go:77:10: undefined: makeSingleTrackSegments

Incorrect (?) calculation of AVC slice header size

Good day!

I'm concerned about correctness of this part of code:

mp4ff/avc/slice.go

Lines 349 to 354 in 5026403

/* compute the size in bytes. Round up if not an integral number of bytes .*/
sh.Size = uint32(r.NrBytesRead())
if r.NrBitsReadInCurrentByte() > 0 {
sh.Size++
}

Let's say you read 15 bits, so pos will be 1 (as it starts at -1), but when you call NrBytesRead() it will increment this value by 1, so you get 2, which is totally valid. Then you check if any bits were read in the current byte and increment it once again, so the size will be set to 3.

I tested this version of code:

	sh.Size = uint32(r.NrBytesRead())
	if r.NrBitsReadInCurrentByte() == 0 {
		sh.Size--
	}

It works with Apple's SAMPLE-AES in Safari, as Safari ignores the senc BytesOfClearData and makes the calculation on its own.

It is possible to decrement the value of SliceHeader.Size before using it, but that is kind of misleading.

Audio stream loudness box missing

The audio stream loudness box ludt should be provided according to Apple's HLS Authoring Specification:

2.19. In fMP4 files, you SHOULD provide loudness information by way of a loudness box (’ludt’). When present, the loudness box takes precedence over any loudness information in the audio stream.

It would therefore be nice to have support for it in mp4ff.

It is defined in section 12.2.7 of ISO/IEC 14496-12.
The ludt box is a container box that contains tlou and alou boxes defined in the same section of the specification. These two are in turn both based on LoudnessBaseBox.

avc.IsIDRSample not working

Given test sample @ https://devstreaming-cdn.apple.com/videos/streaming/examples/img_bipbop_adv_example_fmp4/v2/prog_index.m3u8

$ curl -r 0-718 -o init.mp4 https://devstreaming-cdn.apple.com/videos/streaming/examples/img_bipbop_adv_example_fmp4/v2/main.mp4

$ curl -r 719-274200 -o segment.mp4 https://devstreaming-cdn.apple.com/videos/streaming/examples/img_bipbop_adv_example_fmp4/v2/main.mp4

$ cat init.mp4 segment.mp4 > test.mp4

ffmpeg confirms NAL Unit 5 present:

$ ffmpeg -i test.mp4 -map v -codec copy -bsf:v trace_headers -f null -y /dev/null 2>&1 | grep -m1 -A4 'Slice Header'
[trace_headers @ 0x600003bf04b0] Slice Header
[trace_headers @ 0x600003bf04b0] 0           forbidden_zero_bit                                          0 = 0
[trace_headers @ 0x600003bf04b0] 1           nal_ref_idc                                                11 = 3
[trace_headers @ 0x600003bf04b0] 3           nal_unit_type                                           00101 = 5
[trace_headers @ 0x600003bf04b0] 8           first_mb_in_slice                                           1 = 0

and ffprobe confirms the same:

$ ffprobe -hide_banner -loglevel quiet test.mp4 -print_format json -select_streams v -show_frames | jq '.frames[0]' | egrep '(key|pict_type)'
  "key_frame": 1,
  "pict_type": "I",

but if I extract the h264 data up to the first frame:

$ ffmpeg -i test.mp4 -map v -codec copy -frames:v 1 -f h264 -y frame.h264
$ ffmpeg -f h264 -i frame.h264 -c copy -bsf:v trace_headers -f null -y /dev/null 2>&1 | grep -A3 'Slice'
[trace_headers @ 0x600001d8c5a0] Slice Header
[trace_headers @ 0x600001d8c5a0] 0           forbidden_zero_bit                                          0 = 0
[trace_headers @ 0x600001d8c5a0] 1           nal_ref_idc                                                11 = 3
[trace_headers @ 0x600001d8c5a0] 3           nal_unit_type                                           00101 = 5

and feed into mp4ff:

$ cat test.go
package main

import (
        "fmt"
        "io/ioutil"

        "github.com/Eyevinn/mp4ff/avc"
)

func main() {
        data, _ := ioutil.ReadFile("frame.h264")
        fmt.Printf("%v\n", avc.IsIDRSample(data))
}

but the result is false:

$ go run test.go
false

Example mp4 to fmp4

Hello, I'm looking for an example of how to transform an existing mp4 into a fragmented mp4. Any ideas?

fmp4 to progressive mp4 in memory convert

I have an fMP4 with an m3u8 index stored on disk or in AWS S3.

A customer wants to download it as a progressive mp4 in a web browser.

Is there any way to convert it in memory and send it back to the downloading user?

Please also consider discontinued (interrupted) transmission.

Could you give an example?

File from Unified Streaming packager

I would like to use mp4ff-wvttlister to extract WebVTT from an mp4 packaged by the Unified Streaming packager.
It fails:

mp4ff-wvttlister test.mp4
Track 1, timescale = 1000
[vttC] size=14

  • config: "WEBVTT"

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x4afe6a]

goroutine 1 [running]:
github.com/Eyevinn/mp4ff/mp4.(*TfdtBox).BaseMediaDecodeTime(...)
/home/ubuntu/go/pkg/mod/github.com/!eyevinn/[email protected]/mp4/tfdt.go:76
github.com/Eyevinn/mp4ff/mp4.(*Fragment).GetFullSamples(0xc00006a3e0?, 0xc000016120)
/home/ubuntu/go/pkg/mod/github.com/!eyevinn/[email protected]/mp4/fragment.go:130 +0x8a
main.parseFragmentedMp4(0xc000128000?, 0xe038?, 0xffffffffffffffff)
/home/ubuntu/go/pkg/mod/github.com/!eyevinn/[email protected]/cmd/mp4ff-wvttlister/main.go:170 +0x42c
main.main()
/home/ubuntu/go/pkg/mod/github.com/!eyevinn/[email protected]/cmd/mp4ff-wvttlister/main.go:67 +0x2db

test.mp4

Is it a bug or a bad file ?
test.zip

How to merge two different tracks in a single "file"?

I would like to have a practical example how to merge two tracks in a single file.

Context:

I have some DASH segments(audio/video init, audio/video segments) and I want to merge both to a single init/segment.

I have been trying for two days to do that but I feel that I lack more fmp4 internals knowledge. If someone can help me with some insight I would appreciate it!

Thanks.

No sidx?

I have noticed there is no segment index "box" in the library. Is that on purpose?

feature request: add mp4 decrypt cmd for cenc mp4

You could reference this code: https://github.com/truedread/pymp4decrypt/blob/master/src/decrypt.py
The code does not run well, but it shows the steps.

You could use the following commands to generate a cenc DASH file.
input.mp4 includes both AAC audio and AVC video.
output.mpd is a single fragmented and encrypted DASH mp4.

ffmpeg -i input.mp4 -r 24 -g 50 output.mp4
MP4Box -crypt drm.xml output.mp4 -out output_enc.mp4
MP4Box -dash 5000 -url-template -bs-switching no -out output.mpd -rap output_enc.mp4

drm.xml

How to fragment?

I am using https://github.com/nareix/joy4 to consume RTMP FLV stream and transcode it into multiple small MP4 chunks that I want to serve as live stream to users over http. The thing is that HLS or DASH are essentially for downloading complete video files, not for streaming as such. In order to support proper streaming I need to produce fragmented mp4 files which is supported by the media source extension in web browsers. Non-fragmented MP4 files will simply fail to play.

So I am greatly interested in this library, and I wonder how I can use it to either take an existing mp4 "monolith" file and recode it into a fragmented mp4, or, better yet, take h264 packets from the FLV demuxer and write fragmented files right away.

edit:
So I just started toying with this, but I am stuck on an endless dependency tree. Is there a way out of this rabbit hole?

		vid, _ := mp4.CreateHdlr("video")
		aud, _ := mp4.CreateHdlr("audio")

		vmi := mp4.NewMinfBox()
		vmi.....
		vMed := mp4.NewMdiaBox()
		vMed.AddChild(vmi)
		vMed.AddChild(vid)

		aMed := mp4.NewMdiaBox()
		aMed.AddChild(aud)

		trackHeader := mp4.CreateTkhd()

		vTrack := mp4.NewTrakBox()
		aTrack := mp4.NewTrakBox()

		vTrack.AddChild(trackHeader)
		aTrack.AddChild(trackHeader)

		vTrack.AddChild(vMed)
		aTrack.AddChild(aMed)

		moov := mp4.NewMoovBox()
		moov.AddChild(vTrack)
		moov.AddChild(aTrack)

                file := mp4.NewFile()
		file.AddChildBox(mp4.CreateFtyp(), 0)
		file.AddChildBox(moov, 0)

and that is just the setup, I haven't written anything into it yet ...

feature request: lazy reading of mdat

Currently, the full mdat content is read into memory when decoding a File or an MdatBox.

A lazy mode which does not load the full mdat box into memory when parsing a file would be beneficial, especially for large files.

A possible way would be an option to just store offset and size for the MdatBox,
so that sample data can be read by using a ReadSeeker interface.

decryptSegment: no senc box in traf

using this file (106 MB): http://0x0.st/Hwbc.mp4

I can decrypt like this:

> packager-win-x64 --enable_raw_key_decryption `
>> --keys key_id=21b82dc2ebb24d5aa9f8631f04726650:key=602a9289bfb9b1995b75ac63f123fc86 `
>> stream=video,in=enc.mp4,output=dec.mp4
[1121/225104:INFO:demuxer.cc(89)] Demuxer::Run() on file 'enc.mp4'.
[1121/225104:INFO:demuxer.cc(155)] Initialize Demuxer for file 'enc.mp4'.
[1121/225104:WARNING:track_run_iterator.cc(699)] Seeing non-zero composition offset 834167. An EditList is probably missing.
[1121/225104:WARNING:track_run_iterator.cc(703)] Adjusting timestamps by -834167. Please file a bug to https://github.com/google/shaka-packager/issues if you do not think it is right or if you are seeing any problems.

https://github.com/shaka-project/shaka-packager/releases

but these both fail:

mp4ff-decrypt -k 602a9289bfb9b1995b75ac63f123fc86 enc.mp4 dec.mp4

mp4ff-decrypt -k 602a9289bfb9b1995b75ac63f123fc86 -init enc.mp4 `
enc.mp4 dec.mp4

result:

2023/11/21 22:48:36 decryptSegment: no senc box in traf

Bug in AVC PicTiming SEI parsing

The offset is not correctly interpreted. For example, the offset is returned as 24 (the length of the field) when it should be zero.

examples/decrypt-cenc: multiple keys

I came across a video with multiple keys:

IV:d805bba9ee360871c54f430e5de9486b key:784f6b59536e7078594f653959715a49 type:2
IV:2890e5c5feb02fb5ec3e6ea4e0758bc6 key:5062524e343151366950744d7a4b4e65 type:2
IV:dc7e67eb7d9b6474fc9914ddeddb9493 key:58526c32716a30586c6c7349466f3639 type:2
IV:a870d9a9b8abd4b8add8c7eec1ac7f8e key:703439466a586a757235576a5069754b type:2
IV:c889acba1c740bb6f28dca890c5980a2 key:4f3532724142445476307876686d7347 type:2

does this module or tool support multiple keys? I can provide video sample if need be.

Add AV1 support

With Apple's announcement of AV1 support on iPhone 15 Pro, AV1 is now relevant for both DASH and HLS.

It would therefore be great to add support for the AV1 VisualSampleEntry box av01 and the av1C codec configuration box.

Version 1.2.0 of the specification for AV1 in ISOBMFF can be downloaded from https://github.com/AOMediaCodec/av1-isobmff/releases/tag/v1.2.0

An example file by Netflix is http://download.opencontent.netflix.com.s3.amazonaws.com/AV1/Sparks/Sparks-5994fps-AV1-10bit-1920x1080-2194kbps.mp4

feature request: extend segment example to a vod example

In a VOD system, one normally converts a normal mp4 to HLS (ts or m4s) in memory and streams it to the player.
Can we extend the segment example to support HLS m4s and make a new example?

Step:

  1. generate an m3u8 file based on the mp4
  2. the player reads the m3u8 and sends an xx.m4s request to the VOD server
  3. the VOD server reads the mp4, converts the related xx.m4s in memory, and sends it to the player

support low latency hls (llhls) encryption

I am working on LL-HLS with the open source project https://github.com/bluenviron/mediamtx
It works well, however it is missing an LL-HLS encryption feature.
I wonder how to do that in LL-HLS.
LL-HLS generates a normal segment.fmp4 and a smaller fragment.fmp4.
Shall we just encrypt segment.fmp4 and fragment.fmp4 one by one?
Which encryption method shall we use?

    METHOD=SAMPLE-AES    = SAMPLE-AES-CTR   ?? ['SAMPLE-AES', 'SAMPLE-AES-CTR', 'SAMPLE-AES-CENC'];
    METHOD=AES-128       = AES-128-CBC

I see there is a decryption example at https://github.com/Eyevinn/mp4ff/tree/master/examples/decrypt-cenc
Could you provide an encryption example?

emsg parsing/writing doesn't handle non-empty message_data

The emsg parser does not treat the remaining bytes as message_data, but ignores them. That makes the size wrongly reported in mp4ff-info.

I'd suggest adding MessageData as a member of the EmsgBox struct and making sure that it can be both decoded and encoded.

typo in hevc/sps.go

Hi, guys.
I'm new here, so forgive me for not following the issue form (if one exists).

I've found a typo (I guess) and want to let you know.
And of course, I want it to be fixed as well.
The typo I found is this.

Best Regards,
Dongjin

Information on times in the SampleComplete object

Hi,
could you provide some information on what the times in the SampleComplete object mean and what type of values they are (sec, msec, nsec, ...)?

	mp4.SampleComplete{
		Sample: mp4.Sample{
			Flags uint32
			Dur   uint32 <----
			Size  uint32
			Cto   int32   <----
		},
		DecodeTime:         uint64 <----
		PresentationTime: uint64 <----
		Data                      []byte
	}

Support sample_composition_time_offset

I'm currently working with the Sintel video (https://durian.blender.org/download/), and I have transformed it to an fMP4 using Bento4. Now I'm trying to read it out using mp4ff, but I cannot get the duration when diving into the moof headers.

I can find sample_composition_time_offset, but there is no way through the API to get this value (only .Dur is supported).

Any thoughts?

Screenshot 2021-07-25 at 22 12 58

mp4ff-info error "stpp size mismatch in stsd: 62 - 64" in a webvtt fragmented mp4 input file

mp4ff-info and wvttlister report the error stpp size mismatch in stsd: 62 - 64 on a WebVTT fragmented mp4 input file. The file was created with the MP4Box command line tool, and it looks OK in other DVB readers. Is the file really broken (an MP4Box bug), or can mp4ff-info not handle this type of webvtt mp4 file?

https://refapp.hbbtv.org/videos/dashtest/test4/temp/temp-sub_eng.mp4

c:\temp> mp4ff-info.exe temp-sub_eng.mp4
2023/03/16 11:39:46 decode box "moov": decode box trak: decode box mdia: decode box minf: decode box stbl: decode box stsd: child stpp size mismatch in stsd: 62 - 64

c:\temp\mp4ff-wvttlister.exe temp-sub_eng.mp4
2023/03/16 11:43:43 decode moov: decode box trak: decode box mdia: decode box minf: decode box stbl: decode box stsd: child stpp size mismatch in stsd: 62 - 64
