
Comments (31)

barbibulle commented on August 24, 2024

Dealing with HE-AAC implicit signaling is tricky, because it requires actually parsing the audio sample data and looking for the presence of SBR and PS extension elements (which essentially requires part of an AAC decoder, at least the high-level bitstream parsing parts). This is because the MP4 structure that carries such AAC content contains no indication at all that SBR and/or PS is used until the decoder gets the data.
Also, all the profiles and recommendations that we've encountered require the use of explicit backward-compatible signaling for HE-AAC in DASH (for example, see this DASH-IF recommendation: http://dashif.org/wp-content/uploads/2015/04/DASH-IF-IOP-v3.0.pdf).
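As an illustration of these signaling modes, here is a minimal sketch (hypothetical helper, not Bento4 code) that inspects an AudioSpecificConfig (ISO/IEC 14496-3) from the sample description: the explicit modes are visible either as the top-level object type or as the 0x2b7 sync extension, while an implicitly-signaled stream is indistinguishable from plain AAC-LC at this level.

class BitReader:
    def __init__(self, data: bytes):
        self.data, self.pos = data, 0
    def read(self, n: int) -> int:
        v = 0
        for _ in range(n):
            v = (v << 1) | ((self.data[self.pos // 8] >> (7 - self.pos % 8)) & 1)
            self.pos += 1
        return v
    def bits_left(self) -> int:
        return len(self.data) * 8 - self.pos

def he_aac_signaling(asc: bytes) -> str:
    # Escape codes (audioObjectType 31, etc.) omitted for brevity.
    r = BitReader(asc)
    aot = r.read(5)                    # audioObjectType
    if r.read(4) == 15:                # samplingFrequencyIndex escape
        r.read(24)
    r.read(4)                          # channelConfiguration
    if aot in (5, 29):                 # SBR (5) or PS (29) signaled up front
        return "explicit non-backward-compatible (mp4a.40.5 / mp4a.40.29)"
    # For AAC-LC (aot == 2), scan the tail for the 0x2b7 sync extension that
    # carries the backward-compatible sbrPresentFlag (brute-force scan for
    # brevity; a real parser would walk the GASpecificConfig first).
    while r.bits_left() >= 17:
        start = r.pos
        if r.read(11) == 0x2B7 and r.read(5) == 5 and r.read(1) == 1:
            return "explicit backward-compatible (SBR present)"
        r.pos = start + 1
    return "implicit or plain AAC-LC: the ASC carries no trace of SBR/PS"

print(he_aac_signaling(bytes([0x13, 0x10])))  # AAC-LC 24kHz stereo, or implicit HE-AAC
print(he_aac_signaling(bytes([0x13, 0x10, 0x56, 0xE5, 0x98])))  # explicit backward-compatible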

If supporting such input data is really important in your case, I would certainly add a '--audio-codecs' command line parameter that would let you override the codecs strings computed from the input MP4 signaling. Let me know if this is something you'd need.

richardbushell commented on August 24, 2024

Yes, I think this is essential.

For instance, Fraunhofer's 'Application Bulletin: AAC Implementation Guidelines for Dynamic Adaptive Streaming over HTTP (DASH)' located here:
http://www.iis.fraunhofer.de/de/ff/amm/dl/whitepapers.html
Page 10: High Efficiency AAC (HE-AAC): The HE-AAC Profile is the most relevant profile for DASH and employs AAC-LC as a core codec in combination with SBR.
Page 21: Use HE-AAC as Default: The HE-AAC Profile can be seen as the default AAC Profile for DASH.

Obviously, whatever parameter and command-line method is used should allow it to be signalled correctly in the MPD and in the audio fragments/initialization segment, for correct playback in browsers.

That would be great.

Richard

barbibulle commented on August 24, 2024

I'm not saying that you shouldn't use HE-AAC. HE-AAC is perfectly fine and fully supported by Bento4. In fact, there's specific support for this in the mp4-dash.py tool, and it will set the correct @codecs attribute when the input is HE-AAC with Explicit Backward-compatible signaling or Explicit Non-backward-compatible signaling.
What I'm highlighting is that in order to stay compliant with various profiles and industry recommendations, HE-AAC should be signaled in the source MP4 tracks with Explicit Backward-compatible signaling. The Fraunhofer document you're pointing to states this requirement as well: "Another requirement for this mode of operation is that “Explicit Backward Compatible” signaling is used in the Audio Specific Config (ASC) to initialize the decoder (which is the case for DASH-AVC/264). Therefore, a content creator may offer an HE-AACv2 Adaptation Set and signal it in the MPD using @codecs=mp4a.40.29"

So what I'm recommending is that when you encode your audio tracks in HE-AAC, you configure your encoder/muxer to produce tracks with Explicit Backward-compatible signaling.
If that's really not possible (maybe you have a legacy encoder that's not capable of producing this signaling, which would be surprising but not impossible), then manually overriding the @codecs attribute with a command line argument would be the fallback, but that's much less clean than having the right input format to start with, because the result won't be compliant with the DASH/264 industry interoperability profile.
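For reference, a small sketch of the RFC 6381 @codecs values involved (mp4a.40.29 is the string from the Fraunhofer quote above; the implicit entry is exactly the problem, since such a track looks like plain AAC-LC):

CODECS = {
    "AAC-LC":                       "mp4a.40.2",
    "HE-AAC, explicit SBR":         "mp4a.40.5",
    "HE-AACv2, explicit SBR + PS":  "mp4a.40.29",
    "HE-AAC, implicit":             "mp4a.40.2",  # indistinguishable from AAC-LC
}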

richardbushell commented on August 24, 2024

Our encoder is NOT capable of producing Explicit signaling, only Implicit signaling.

If overriding the @codecs attribute with a command line argument would be the solution, are you saying this would just change the codecs text in the MPD file, without making any changes to the actual files themselves? Wouldn't the resulting MPD and segments be compliant with the DASH/264 industry interoperability profile?

Or could your command line argument write this signaling into the AAC stream itself, perhaps when using 'mp42aac' followed by 'mp4mux --audio-codecs:sbr' to put the explicit backward-compatible signaling back into an mp4 or m4a container, so that the resulting MPD and segment files are OK for DASH compliance?

barbibulle commented on August 24, 2024

Overriding the @codecs would indeed be sub-optimal, because the actual stream data would still have implicit signaling. A more robust solution would be to convert the sample description at the MP4 level, before creating the DASH MPD. This could be done by one of the command line apps (like mp4fragment, mp4mux, or a new standalone app). This would create a result that's correct from an MPD perspective, with stream data that follows the DASH/264 interop guidelines. I'll see what it would take for me to add this functionality to the code base.
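As a sketch of what that conversion amounts to at the bitstream level (hypothetical helper, not the eventual Bento4 implementation), the DecoderSpecificInfo in the audio sample description gets the ISO/IEC 14496-3 backward-compatible sync extension appended:

class BitWriter:
    def __init__(self):
        self.bits = []
    def write(self, value: int, n: int):
        self.bits += [(value >> (n - 1 - i)) & 1 for i in range(n)]
    def to_bytes(self) -> bytes:
        bits = self.bits + [0] * (-len(self.bits) % 8)      # zero-pad to a byte
        return bytes(sum(b << (7 - i) for i, b in enumerate(bits[k:k + 8]))
                     for k in range(0, len(bits), 8))

def make_explicit_backward_compatible(asc: bytes, ext_freq_index: int) -> bytes:
    # Assumes the common 2-byte AAC-LC AudioSpecificConfig (byte-aligned,
    # no escape codes), i.e. the implicit-signaling case discussed above.
    w = BitWriter()
    for byte in asc:
        w.write(byte, 8)               # copy the original (implicit) ASC
    w.write(0x2B7, 11)                 # syncExtensionType
    w.write(5, 5)                      # extensionAudioObjectType = SBR
    w.write(1, 1)                      # sbrPresentFlag
    w.write(ext_freq_index, 4)         # extensionSamplingFrequencyIndex
    return w.to_bytes()

# AAC-LC 24 kHz stereo (0x1310) -> HE-AAC with 48 kHz SBR output (index 3)
print(make_explicit_backward_compatible(bytes([0x13, 0x10]), 3).hex())  # 131056e598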

richardbushell commented on August 24, 2024

That would be fantastic, thanks for looking at this.

richardbushell commented on August 24, 2024

Hi
We have been looking at updating our encoder and have a working demo that now writes Explicit Backward-compatible Signaling of SBR directly to the MP4, which should solve the problem.
Here is the MP4 generated:
http://living.tv/mob/example.mp4
Keyframes are strictly 3840ms.
I then ran:
mp4fragment --fragment-duration 3840 example.mp4 example_f.mp4
then:
mp4-dash.py --output-dir=example --mpd-name=example.mpd --force example_f.mp4
BUT I think there must be a bug in your code, as it creates two much-smaller final audio segments. These last two audio files should clearly be written as one single segment instead, as together they still don't add up to one full segment in duration. It is writing 16 audio segments rather than 15 (which would be correct).
Could you check your code against my example at the link above, to work out where the bug is, so that it doesn't split these last two final audio segments?
Richard

barbibulle commented on August 24, 2024

The reason you're seeing two small audio segments at the end is that the audio and video durations in your input file don't match.
The video track is:
  media:
    sample count: 1369
    timescale:    25000
    duration:     1369000 (media timescale units)
    duration:     54760 (ms)

The audio track is:
  media:
    sample count: 1286
    timescale:    24000
    duration:     1316864 (media timescale units)
    duration:     54869 (ms)

So there's 109ms worth of "extra audio". When the file is fragmented with mp4fragment, the tool tries to keep matched audio and video fragments as close in duration as possible (they can't be matched exactly, because audio frames and video frames don't have equal durations).
So what happens here is that the last video fragment (25 frames) is matched with an audio fragment of 23 audio frames (the closest match). After that, there is still some leftover audio, which gets put in an "extra" audio fragment.

richardbushell commented on August 24, 2024

But 3840 ms segments is an exact multiple of both the audio frame length (AU = 2048 samples per frame for SBR = 42.6667ms) and the video frame length (40 ms each). If you run mp42aac, then mp4mux, then mp4fragment, it does create the correct 15 segments. For reference, I have also tried it with MP4Box from GPAC, and that creates 15 segments.

All the other audio segments (1-14) are exactly 3840ms, but the last two (15+16) total only about 1 second combined. These should both be put in one single segment, not split over two. Audio segment 15 is 981ms; audio segment 16 is 128ms. DASH says that segment duration should not vary by more than 50%, except for the final segment. After all, I have already specified --fragment-duration 3840 on the command line.

p.s. If you use 'mp4fragment --track audio ...' and 'mp4fragment --track video ...' first, and then run mp4-dash.py on each output separately, it correctly creates 15 audio segments (with the last one 1109ms).

So it only glitches when it processes both tracks together from a single input file containing audio and video.

barbibulle commented on August 24, 2024

What's going on here is this:

Your video track has 1369 frames and a frame rate of 25fps. With a fragment duration of 3840ms, that's 14 fragments of exactly 96 frames, plus a last fragment of 25 frames.
When you use mp4fragment with an input that contains both audio and video, it will try to create a sequence of "matched" fragments, where the amount of audio in each fragment is as close as possible to the amount of video in the matching video fragment. So for the first 14 fragments, you'll see 96 frames of video in each video fragment, and 90 frames of audio in each audio fragment. For the 15th (and last) video fragment (which is 25 frames, or 1.0 seconds), the closest match for the corresponding audio fragment is 23 audio frames (0.981 seconds). So if we look at the first 15 pairs of audio/video fragments, that's a total of (14×96 + 25) = 1369 video frames and (14×90 + 23) = 1283 audio frames. However, your audio track contains more than that: 1286 audio frames. That's 3 frames too many. If that 15th audio fragment were stuffed with all the remaining audio in the track, it would contain 26 audio frames, which would make it 1.109s long. So you'd have a mismatch.
This means that the last 3 audio frames must be discarded (or you'd need to add about 3 video frames).
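The arithmetic above, as a quick check (a Python sketch using the numbers from this thread; 2048 samples per SBR frame at 48 kHz):

FRAG_MS  = 3840
VIDEO_MS = 40            # 25 fps
AUDIO_MS = 2048 / 48     # 42.666... ms per audio frame

video_frames, audio_frames = 1369, 1286
v_per_frag = round(FRAG_MS / VIDEO_MS)        # 96 video frames per fragment
a_per_frag = round(FRAG_MS / AUDIO_MS)        # 90 audio frames per fragment
full = video_frames // v_per_frag             # 14 full fragment pairs
last_v = video_frames - full * v_per_frag     # 25 frames = 1000 ms
last_a = round(last_v * VIDEO_MS / AUDIO_MS)  # 23 frames = ~981 ms
excess = audio_frames - (full * a_per_frag + last_a)
print(full, last_v, last_a, excess)           # 14 25 23 3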
The mp4-dash.py program doesn't actually discard that last fragment, as it outputs all the media from the input file, but that 16th audio fragment will never be fetched by the DASH client, so it's just extra data sitting there, not used by anyone.

If you work with the audio and video tracks independently, then you'll indeed see 15 audio fragments, because the program won't have any video fragments to match against, so as long as the data fits in the desired fragment size (3840ms here), it will happily stuff the whole tail of the audio track into the last fragment.

It may seem, in this specific case, that since the "extra" audio being discarded is so short, it could simply be "lumped in" with the rest, but that wouldn't be correct. Imagine that instead of an excess of 3 audio frames, you had an excess of 300 audio frames. You'd have a very strange last fragment indeed.

So, in summary, what you're seeing here is normal: the audio track is a tad too long, so that last 0.109 seconds of audio get truncated, stored in a 16th fragment that's not referenced by the MPD.

richardbushell commented on August 24, 2024

OK, I personally think it should respect the fragment duration of 3840ms specified on the command line rather than the current default behaviour. As I said, this breaks the 'only one smaller final segment' rule for DASH.

If you wish to keep the current behaviour, please could you add a flag to either process the video and audio independently, or force segmenting to respect the fragment duration when the user specifies one. That way a valid MPD file can also be generated in the process, rather than having to always process the streams separately and manually stitch the MPDs together.

barbibulle commented on August 24, 2024

Actually, this doesn't break the "only one smaller final segment" rule, since the 16th audio segment here isn't part of the DASH presentation. There is a segment file generated by the tool, but it isn't referenced by the DASH MPD. So as far as a DASH client is concerned, it is as if this 16th extra segment doesn't exist. You can remove the file and it will make no difference. If you observe the requests from the DASH client, you should never see a request for this extra audio segment.
So there's no need to process the streams separately. The default output of the tool is fully compliant as it is.

richardbushell commented on August 24, 2024

But that doesn't make sense in general. There could of course be valid audio produced in a final segment. That audio wouldn't be played or heard; the track would cut off just before the end. We would never want this.

barbibulle commented on August 24, 2024

You're right that it might not be desirable to have the extra audio truncated this way. But the alternative (having some audio with no associated video at the end) can also be a problem for media players.
This is because the MPD declares the overall presentation duration, but not individual durations for the audio and video portions.
So if the MPD declared a duration D that's max(audio_duration, video_duration) instead of min(audio_duration, video_duration), the player would expect to be able to fetch D worth of video and D worth of audio, and would fail when it reaches the end of the video, because there wouldn't be D worth of video. So truncation of the longest stream is necessary here.
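Concretely, with the track durations from this file:

audio_ms, video_ms = 54869, 54760
mpd_ms = min(audio_ms, video_ms)    # declare the shorter track's duration
print(mpd_ms, audio_ms - mpd_ms)    # 54760, 109 ms of audio truncated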

richardbushell commented on August 24, 2024

You have a good point too, and I am genuinely not sure what the latest DASH guidelines state. One question though. You stated that the last video fragment (25 frames, i.e. 1 second) is matched with an audio fragment of 23 audio frames (the closest match, i.e. 0.981 seconds). But shouldn't it choose 24 audio frames? Although that would be 24ms longer than the video rather than 19ms shorter, and so not the CLOSEST match, doesn't it need to be the closest audio duration GREATER THAN OR EQUAL TO the video segment length? Otherwise the combined audio length across all segments comes out less than the overall MPD presentation duration D, in which case why wouldn't the DASH player request another audio segment, since the audio still hasn't reached D?

Or is the overall presentation duration deliberately understated, rather than being the full actual length, so that the player doesn't request another segment?

richardbushell commented on August 24, 2024

Any thoughts on the above, about ensuring the closest match for the last audio segment is equal to or greater than the video, so that the total audio length doesn't come out less than the MPD presentation duration or Period duration (therefore ensuring the player never requests an additional segment)?

barbibulle commented on August 24, 2024

The way the truncation of the excess media is done currently, you won't see the client request the excess audio fragment. I've checked with at least two DASH players; both request exactly 15 audio and video segments. The 16th 'excess' audio fragment is never requested (it would exceed the presentation duration).
I have made a local change to allow the MPD duration to have fractions of seconds (in the master branch, the MPD duration is rounded to an integral number of seconds, but there's no good reason for this). I'll commit the fix shortly in the dev branch.

richardbushell commented on August 24, 2024

Sorry for the long delay, but we're finally back on our project preparing for DASH. I note that you have allowed the MPD duration to have fractions of seconds, but you have chosen 1/100 of a second. Please make the format 1/1000 of a second. This would tie in with the segment durations stipulated in milliseconds.
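For example, a simplified sketch of the formatting we'd like (seconds-only ISO 8601 duration form, ignoring the hour/minute components):

def iso_duration_ms(ms: int) -> str:
    return f"PT{ms // 1000}.{ms % 1000:03d}S"   # 1/1000 s precision

print(iso_duration_ms(54760))   # PT54.760S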

barbibulle commented on August 24, 2024

Good point. I'll make the change.

richardbushell commented on August 24, 2024

Did you get a chance to make the change?

barbibulle commented on August 24, 2024

Yes, that's in the released version of the code and binaries.

richardbushell commented on August 24, 2024

Thanks for that!

I note that someone else posted the original query about small final audio segments here:
#19

Please could you add an optional user flag to mp4fragment (e.g. --discard) to force mp4fragment to stop once all of the video duration has been paired with audio. Programmatically it would be simple to skip the creation of the extra audio fragment (subsequently unused and not required) if this flag is set.

That would give the user complete control over how this case is handled.
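A sketch of the requested behaviour (names hypothetical, not actual mp4fragment code):

def plan_audio_fragments(video_frag_ms, audio_frame_ms, audio_frames, trim=True):
    frags, used = [], 0
    for frag_ms in video_frag_ms:          # match each video fragment
        n = min(round(frag_ms / audio_frame_ms), audio_frames - used)
        frags.append(n)
        used += n
    if not trim and used < audio_frames:
        frags.append(audio_frames - used)  # trailing stub fragment
    return frags

video = [3840] * 14 + [1000]               # the 15 video fragments (ms)
print(plan_audio_fragments(video, 2048 / 48, 1286))         # [90]*14 + [23]
print(plan_audio_fragments(video, 2048 / 48, 1286, False))  # ... plus a stub of 3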

barbibulle commented on August 24, 2024

That sounds like a good idea. I'll add the option.

richardbushell commented on August 24, 2024

Is this implemented in the latest build? If so, what's the flag/option?

richardbushell commented on August 24, 2024

Hi
Did you manage to implement this in a subsequent build?
Let me know, thanks.
Richard.

barbibulle commented on August 24, 2024

Sorry, this hasn't been implemented yet. It's definitely on the TODO list, probably for the next release next week.

richardbushell commented on August 24, 2024

Thank you, very much appreciated...

richardbushell commented on August 24, 2024

Did this make the next week's release you mentioned? Richard.

barbibulle commented on August 24, 2024

Try the --trim option of the mp4fragment tool from the latest release (1.4.3-600).
Let me know if this works for you.
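For example, reusing the command line from earlier in this thread:
mp4fragment --fragment-duration 3840 --trim example.mp4 example_f.mp4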

richardbushell commented on August 24, 2024

Thanks, only tried it on the same original test file, but it worked perfectly! You can close this thread, I'll report back only if I have a subsequent problem. Thanks again!

barbibulle commented on August 24, 2024

Great!
