When using an in point on a clip, it appears the avformat producer produces audio dela

mp3dec.c, mp3_parse_info_tag(...): <div class="highlight highlight-source-c notran

Some other things I tried: Suspected it to be caused by fastse

This post describes a way to seek within packets: <a href="http://st

Audio sometimes out-of-sync by one frame about mlt HOT 23 CLOSED

mltframework commented on September 18, 2024

Audio sometimes out-of-sync by one frame

from mlt.

Comments (23)

metellius commented on September 18, 2024

Could this be the cause of the audio splitting issue I'm trying to debug? For a split mp3 clip I'm seeing that the audio data (as returned by avcodec_decode_audio in decode_audio) in the first one or two frames after a split are zeroed (giving an audible gap in the sound).

from mlt.

ddennedy commented on September 18, 2024

Yes, if I recall correctly, I had trouble to find a context field to provide information on codec delay, and you cannot simply skip over initial silent samples as that will give a problem if that was intentional.

from mlt.

metellius commented on September 18, 2024

https://ffmpeg.org/doxygen/2.8/structAVCodecContext.html#a948993adfdfcd64b81dad1151fe50f33
is this the one?

from mlt.

metellius commented on September 18, 2024

There is also initial_padding, which is audio only and looks relevant:
https://ffmpeg.org/doxygen/2.8/structAVCodecContext.html#a8f95550ce04f236e9915516d04d3d1ab

from mlt.

metellius commented on September 18, 2024

Those were not there for ffmpeg 2.3, but are at least available for our current 2.7 release. Unfortunately both of these fields are set to zero for both wav and mp3 decoding in melt...
But I'm noticing how mp3dec prints

[mp3 @ 0x7f18f6483220] pad 576 576

while initializing, perhaps this is a clue?

from mlt.

metellius commented on September 18, 2024

mp3dec.c, mp3_parse_info_tag(...):

    /* Encoder delays */
    v= avio_rb24(s->pb);
    if(AV_RB32(version) == MKBETAG('L', 'A', 'M', 'E')
        || AV_RB32(version) == MKBETAG('L', 'a', 'v', 'f')
        || AV_RB32(version) == MKBETAG('L', 'a', 'v', 'c')
    ) {

        mp3->start_pad = v>>12;
        mp3->  end_pad = v&4095;
        st->start_skip_samples = mp3->start_pad + 528 + 1;
        if (mp3->frames) {
            st->first_discard_sample = -mp3->end_pad + 528 + 1 + mp3->frames * (int64_t)spf;
            st->last_discard_sample = mp3->frames * (int64_t)spf;
        }
        if (!st->start_time)
            st->start_time = av_rescale_q(st->start_skip_samples,
                                            (AVRational){1, c->sample_rate},
                                            st->time_base);
        av_log(s, AV_LOG_DEBUG, "pad %d %d\n", mp3->start_pad, mp3->  end_pad);
    }

from mlt.

ddennedy commented on September 18, 2024

I did some investigation. I printf-ed AVFormatContext.AVStream.skip_samples, .start_skip_samples, and .first_discard_sample. skip_samples is always 0 for me, but not the other two. Docs for start_skip_samples says "the samples are removed from packets with pts==0", which was not clear to me if API user should or ffmpeg internals will.

After some searching, I see libavformat/utils.c:read_frame_internal() copies skip_samples or start_skip_samples to some AVPacket "side data." libavformat/utils.c:avcodec_decode_audio4() reads this from the side data, appears to discard for us, and resets skip_samples. So, that appears to be a dead end.

from mlt.

metellius commented on September 18, 2024

Some other things I tried:

Suspected it to be caused by fastseek being enabled, but it didn't seem to be default on, and we don't set it.
Noticed that after seeking, seek_video flushes the stream, but seek_audio does not.Tried adding avcodec_flush_buffers(self->audio_codec[self->audio_index]), but it had no effect.
Had a look at seek_preroll, which also sounded promising, but that as well was 0.

I have been thinking there might be a way to look at our requested position, then seek, and then inspect the actual position in ffmpeg (assuming the difference might be the number whe're looking for), but haven't figured out how yet, or if it even makes sense.

from mlt.

metellius commented on September 18, 2024

This post describes a way to seek within packets:

http://stackoverflow.com/a/31859565

from mlt.

metellius commented on September 18, 2024

Here's an approach I'm considering: Similar to how int_position and req_position in decode_audio(...) is used for discarding frames, the same thing looks doable with samples. By comparing the requested timecode with the start and duration of the incoming decoded frame, we get a position inside the frame data we can use to chop off the first part of the samples. Does it sound reasonable?

from mlt.

ddennedy commented on September 18, 2024

yes

from mlt.

metellius commented on September 18, 2024

Starting to realize there's multiple things that have to be done - in addition to the partial discarding of frame samples, i'm also seeing whole frames with empty data after doing a seek (in my mp3 test case, two blank frames then the audio data starts coming in). Maybe we actually need to have it always seek a (possibly codec-specific) interval further back in order to get the decoder up and running...

from mlt.

ddennedy commented on September 18, 2024

I made some progress on this. It fixes the original reporter's problem and sometimes @metellius problem with splitting an mp3 file on the Shotcut timeline. In the related Shotcut bug, I discovered there is an additional separate problem on the timeline that does not appear if using just playlist. Here is the diff I am testing with now. The ignore variable is changed from frame units to sample count (not including channels).

diff --git a/src/modules/avformat/producer_avformat.c b/src/modules/avformat/producer_avformat.c
index bcd42d5..4d1ccc7 100644
--- a/src/modules/avformat/producer_avformat.c
+++ b/src/modules/avformat/producer_avformat.c
@@ -2097,7 +2097,7 @@ static void planar_to_interleaved( uint8_t *dest, uint8_t *src, int samples, int
 }
 #endif

-static int decode_audio( producer_avformat self, int *ignore, AVPacket pkt, int samples, double timecode, double fps )
+static int decode_audio( producer_avformat self, int64_t *ignore, AVPacket pkt, int samples, double timecode, double fps )
 {
        // Fetch the audio_format
        AVFormatContext *context = self->audio_format;
@@ -2185,12 +2185,14 @@ static int decode_audio( producer_avformat self, int *ignore, AVPacket pkt, int
                        audio_used += convert_samples;

                        // Handle ignore
-                       while ( *ignore && audio_used )
+                       if ( *ignore > 0 && audio_used )
                        {
-                               *ignore -= 1;
-                               audio_used -= audio_used > samples ? samples : audio_used;
-                               memmove( audio_buffer, &audio_buffer[ samples * channels * sizeof_sample ],
-                                                audio_used * sizeof_sample );
+                               int n = audio_used < *ignore ? audio_used : *ignore;
+//fprintf(stderr, "--> discarding %d samples, ignore = %ld\n", n, *ignore - n);
+                               *ignore -= n;
+                               audio_used -= n;
+                               memmove( audio_buffer, &audio_buffer[ n * channels * sizeof_sample ],
+                                                audio_used * channels * sizeof_sample );
                        }
                }
        }
@@ -2208,19 +2210,17 @@ static int decode_audio( producer_avformat self, int *ignore, AVPacket pkt, int
                int64_t int_position = ( int64_t )( timebase * pts * fps + 0.5 );
                int64_t req_position = ( int64_t )( timecode * fps + 0.5 );

-               mlt_log_debug( MLT_PRODUCER_SERVICE(self->parent),
+               mlt_log_verbose( MLT_PRODUCER_SERVICE(self->parent),
                        "A pkt.pts %"PRId64" pkt.dts %"PRId64" req_pos %"PRId64" cur_pos %"PRId64" pkt_pos %"PRId64"\n",
                        pkt.pts, pkt.dts, req_position, self->current_position, int_position );

-               if ( int_position > 0 )
-               {
-                       if ( int_position < req_position )
-                               // We are behind, so skip some
-                               *ignore = req_position - int_position;
-                       else if ( self->audio_index != INT_MAX && int_position > req_position + 2 )
-                               // We are ahead, so seek backwards some more
-                               seek_audio( self, req_position, timecode - 1.0 );
-               }
+               if ( pts * timebase < timecode )
+//fprintf(stderr, "--> req time %g current packet time %g\n", timecode, timebase * pts);
+                       // We are behind, so skip some
+                       *ignore = ( timecode - pts * timebase ) * fps * samples;
+               else if ( self->audio_index != INT_MAX && int_position > req_position + 2 )
+                       // We are ahead, so seek backwards some more
+                       seek_audio( self, req_position, timecode - 1.0 );
                // Cancel the find_first_pts() in seek_audio()
                if ( self->video_index == -1 && self->last_position == POSITION_INITIAL )
                        self->last_position = int_position;
@@ -2252,10 +2252,10 @@ static int producer_get_audio( mlt_frame frame, void **buffer, mlt_audio_format
                fps = mlt_properties_get_double( MLT_FRAME_PROPERTIES(frame), "producer_consumer_fps" );

        // Number of frames to ignore (for ffwd)
-       int ignore[ MAX_AUDIO_STREAMS ] = { 0 };
+       int64_t ignore[ MAX_AUDIO_STREAMS ] = { 0 };

        // Flag for paused (silence)
-       int paused = seek_audio( self, position, real_timecode );
+       int paused = seek_audio( self, position, real_timecode < 0.5 ? 0.0 : real_timecode - 0.5 );

        // Initialize ignore for all streams from the seek return value
        int i = MAX_AUDIO_STREAMS;

from mlt.

metellius commented on September 18, 2024

Nice, I test your fix when I get home. I have a script that visually shows the bug by rendering the samples around the split; I'll post the before and after images.

from mlt.

metellius commented on September 18, 2024

Here's what I used when I worked on a fix of my own:
I generated a ~2 second long sample that consists of linearly increasing values. Then I created a mlt in shotcut where I simply make a split after 1 second. This makes it easy to see when there's a bump. I have run the mlt files through melt from shotcut 16.01 and on top of origin/master + your diff.

First up is when the sample is of mp3 type, on mlt from 16.01:

You can easily see where the split is, where there are missing samples, but there is also a shift present although easier to spot in the next case:

Here is the same mlt, this time with your diff:

Now the missing samples are filled in, but there's still a disconnect.

Next I had a look at the exact same mlt, only with the mp3 sound swapped with a wav file. First up is melt 16.01:

Looks correct, you can't see the split.

Finally we have the wav-file case with your diff applied:

Notice that now there is a visible (and probably audible) disconnect at the split, so the change actually introduces a regression.

Finally, I have a 4 second long clip with music where there is a split at the 2 second mark.
Here's before the fix: https://youtu.be/nINssorxMfk
The jump is very noticable.

Here's after applying your fix:
https://youtu.be/IsKC0cUu3sI
As we've seen in the rendered waveforms, the missing samples are now back, but there is still an audible artifact at the 2 second mark.

Let me know if you're interested in any of the data/scripts I used for generating these.

from mlt.

ddennedy commented on September 18, 2024

Try to find the problem in my patch. It might be a rounding error for my use of floating point to calculate the number of samples to ignore. Maybe it can be converted to use timestamp and timebase. I am working on the other A/V sync bug now and will not return to this for a day or so.
*ignore = ( timecode - pts * timebase ) * fps * samples;

from mlt.

ddennedy commented on September 18, 2024

Yes, please share your scripts. I have trouble reproducing the problem with WAV created by melt and ladspa triangle producer, making a split with melt, and then viewing the result in audacity.

from mlt.

metellius commented on September 18, 2024

Here's the script with its mlt and data files. Running generateReport.sh (after modifying the file paths to the melt wrapper) should generate a set of pictures under a new folder called report. I have included the report that was generated on my computer, just to make sure that we're producing the same output.

http://metellius.mine.nu/meltsyncbugreport.tar.gz

You'll need sox and gnuplot installed.

from mlt.

ddennedy commented on September 18, 2024

I am getting some different results. Actually, my release pics looked drastically different from yours and my local. I made changes to your scripts to make the MLT XML output directly to WAV. I also removed transitions, set consumer real_time=-1, and converted LC_NUMERIC and commas to periods, just to simplify and remove as many variables as possible and focus on the avformat producer.
My Shotcut-16.01/melt picture linear-audio-from-wav-release.png looks like your linear-audio-from-wav-local.png:

Here is my linear-audio-from-mp3-release.png:

My release and locals look very similar to each other - even for mp3. I should mention that my local melt build is against a post FFmpeg 2.8 build from Nov. 10. Still, it shows something non-ideal. However, I might want to make a fix just for the original problem report that does not regress.

from mlt.

metellius commented on September 18, 2024

converted LC_NUMERIC and commas to periods

I have (as you might remember) had multiple issues with commas due to my norwegian locale. Are you saying this had an impact? Perhaps I should run the script again with locale set to en_US.UTF-8?

I'm interested in seeing your modified scripts so that I can run them again on my computer and check the result.

from mlt.

ddennedy commented on September 18, 2024

Changes, whose understanding are probably more important than getting a file dump:

convert locale of MLT XML and remove LC_NUMERIC
replace consumer with: <consumer real_time="-1" target="output.wav" mlt_service="avformat"/>
remove all <transition>s
change LC_ALL in generateReport.sh to C
remove call to ffmpeg in generateReport.sh
adjust makereports() calls for my environment

from mlt.

ddennedy commented on September 18, 2024

I resumed working on this due to the related shotcut issue #314. I noticed that generateReport.sh is reading out.wav, but my .mlt files are generating output.wav. When I fixed that, I reproduced the problem reported in @metellius most recent graphs above.

from mlt.

ddennedy commented on September 18, 2024

I committed a variation of the patch above that does not regress on uncompressed audio. It reduces the pop reported in Shotcut to a crackle because the mp3 image still looks a little like the second image in this thread. I have not been able to figure out a way to get a more accurate number of samples to discard than (our_time - stream_time) * stream_sample_rate.
It resolves this bug as well as I have been able to reproduce it again from my comments in the stackoverflow post.

from mlt.

Audio sometimes out-of-sync by one frame about mlt HOT 23 CLOSED

Comments (23)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent