Comments (23)
Could this be the cause of the audio splitting issue I'm trying to debug? For a split mp3 clip I'm seeing that the audio data (as returned by avcodec_decode_audio in decode_audio) in the first one or two frames after a split are zeroed (giving an audible gap in the sound).
from mlt.
Yes, if I recall correctly, I had trouble to find a context field to provide information on codec delay, and you cannot simply skip over initial silent samples as that will give a problem if that was intentional.
from mlt.
https://ffmpeg.org/doxygen/2.8/structAVCodecContext.html#a948993adfdfcd64b81dad1151fe50f33
is this the one?
from mlt.
There is also initial_padding, which is audio only and looks relevant:
https://ffmpeg.org/doxygen/2.8/structAVCodecContext.html#a8f95550ce04f236e9915516d04d3d1ab
from mlt.
Those were not there for ffmpeg 2.3, but are at least available for our current 2.7 release. Unfortunately both of these fields are set to zero for both wav and mp3 decoding in melt...
But I'm noticing how mp3dec prints
[mp3 @ 0x7f18f6483220] pad 576 576
while initializing, perhaps this is a clue?
from mlt.
mp3dec.c, mp3_parse_info_tag(...):
/* Encoder delays */
v= avio_rb24(s->pb);
if(AV_RB32(version) == MKBETAG('L', 'A', 'M', 'E')
|| AV_RB32(version) == MKBETAG('L', 'a', 'v', 'f')
|| AV_RB32(version) == MKBETAG('L', 'a', 'v', 'c')
) {
mp3->start_pad = v>>12;
mp3-> end_pad = v&4095;
st->start_skip_samples = mp3->start_pad + 528 + 1;
if (mp3->frames) {
st->first_discard_sample = -mp3->end_pad + 528 + 1 + mp3->frames * (int64_t)spf;
st->last_discard_sample = mp3->frames * (int64_t)spf;
}
if (!st->start_time)
st->start_time = av_rescale_q(st->start_skip_samples,
(AVRational){1, c->sample_rate},
st->time_base);
av_log(s, AV_LOG_DEBUG, "pad %d %d\n", mp3->start_pad, mp3-> end_pad);
}
from mlt.
I did some investigation. I printf-ed AVFormatContext.AVStream.skip_samples, .start_skip_samples, and .first_discard_sample
. skip_samples
is always 0 for me, but not the other two. Docs for start_skip_samples
says "the samples are removed from packets with pts==0", which was not clear to me if API user should or ffmpeg internals will.
After some searching, I see libavformat/utils.c:read_frame_internal()
copies skip_samples
or start_skip_samples
to some AVPacket "side data." libavformat/utils.c:avcodec_decode_audio4()
reads this from the side data, appears to discard for us, and resets skip_samples
. So, that appears to be a dead end.
from mlt.
Some other things I tried:
- Suspected it to be caused by fastseek being enabled, but it didn't seem to be default on, and we don't set it.
- Noticed that after seeking, seek_video flushes the stream, but seek_audio does not.Tried adding avcodec_flush_buffers(self->audio_codec[self->audio_index]), but it had no effect.
- Had a look at seek_preroll, which also sounded promising, but that as well was 0.
I have been thinking there might be a way to look at our requested position, then seek, and then inspect the actual position in ffmpeg (assuming the difference might be the number whe're looking for), but haven't figured out how yet, or if it even makes sense.
from mlt.
This post describes a way to seek within packets:
http://stackoverflow.com/a/31859565
from mlt.
Here's an approach I'm considering: Similar to how int_position and req_position in decode_audio(...) is used for discarding frames, the same thing looks doable with samples. By comparing the requested timecode with the start and duration of the incoming decoded frame, we get a position inside the frame data we can use to chop off the first part of the samples. Does it sound reasonable?
from mlt.
yes
from mlt.
Starting to realize there's multiple things that have to be done - in addition to the partial discarding of frame samples, i'm also seeing whole frames with empty data after doing a seek (in my mp3 test case, two blank frames then the audio data starts coming in). Maybe we actually need to have it always seek a (possibly codec-specific) interval further back in order to get the decoder up and running...
from mlt.
I made some progress on this. It fixes the original reporter's problem and sometimes @metellius problem with splitting an mp3 file on the Shotcut timeline. In the related Shotcut bug, I discovered there is an additional separate problem on the timeline that does not appear if using just playlist. Here is the diff I am testing with now. The ignore variable is changed from frame units to sample count (not including channels).
diff --git a/src/modules/avformat/producer_avformat.c b/src/modules/avformat/producer_avformat.c
index bcd42d5..4d1ccc7 100644
--- a/src/modules/avformat/producer_avformat.c
+++ b/src/modules/avformat/producer_avformat.c
@@ -2097,7 +2097,7 @@ static void planar_to_interleaved( uint8_t *dest, uint8_t *src, int samples, int
}
#endif
-static int decode_audio( producer_avformat self, int *ignore, AVPacket pkt, int samples, double timecode, double fps )
+static int decode_audio( producer_avformat self, int64_t *ignore, AVPacket pkt, int samples, double timecode, double fps )
{
// Fetch the audio_format
AVFormatContext *context = self->audio_format;
@@ -2185,12 +2185,14 @@ static int decode_audio( producer_avformat self, int *ignore, AVPacket pkt, int
audio_used += convert_samples;
// Handle ignore
- while ( *ignore && audio_used )
+ if ( *ignore > 0 && audio_used )
{
- *ignore -= 1;
- audio_used -= audio_used > samples ? samples : audio_used;
- memmove( audio_buffer, &audio_buffer[ samples * channels * sizeof_sample ],
- audio_used * sizeof_sample );
+ int n = audio_used < *ignore ? audio_used : *ignore;
+//fprintf(stderr, "--> discarding %d samples, ignore = %ld\n", n, *ignore - n);
+ *ignore -= n;
+ audio_used -= n;
+ memmove( audio_buffer, &audio_buffer[ n * channels * sizeof_sample ],
+ audio_used * channels * sizeof_sample );
}
}
}
@@ -2208,19 +2210,17 @@ static int decode_audio( producer_avformat self, int *ignore, AVPacket pkt, int
int64_t int_position = ( int64_t )( timebase * pts * fps + 0.5 );
int64_t req_position = ( int64_t )( timecode * fps + 0.5 );
- mlt_log_debug( MLT_PRODUCER_SERVICE(self->parent),
+ mlt_log_verbose( MLT_PRODUCER_SERVICE(self->parent),
"A pkt.pts %"PRId64" pkt.dts %"PRId64" req_pos %"PRId64" cur_pos %"PRId64" pkt_pos %"PRId64"\n",
pkt.pts, pkt.dts, req_position, self->current_position, int_position );
- if ( int_position > 0 )
- {
- if ( int_position < req_position )
- // We are behind, so skip some
- *ignore = req_position - int_position;
- else if ( self->audio_index != INT_MAX && int_position > req_position + 2 )
- // We are ahead, so seek backwards some more
- seek_audio( self, req_position, timecode - 1.0 );
- }
+ if ( pts * timebase < timecode )
+//fprintf(stderr, "--> req time %g current packet time %g\n", timecode, timebase * pts);
+ // We are behind, so skip some
+ *ignore = ( timecode - pts * timebase ) * fps * samples;
+ else if ( self->audio_index != INT_MAX && int_position > req_position + 2 )
+ // We are ahead, so seek backwards some more
+ seek_audio( self, req_position, timecode - 1.0 );
// Cancel the find_first_pts() in seek_audio()
if ( self->video_index == -1 && self->last_position == POSITION_INITIAL )
self->last_position = int_position;
@@ -2252,10 +2252,10 @@ static int producer_get_audio( mlt_frame frame, void **buffer, mlt_audio_format
fps = mlt_properties_get_double( MLT_FRAME_PROPERTIES(frame), "producer_consumer_fps" );
// Number of frames to ignore (for ffwd)
- int ignore[ MAX_AUDIO_STREAMS ] = { 0 };
+ int64_t ignore[ MAX_AUDIO_STREAMS ] = { 0 };
// Flag for paused (silence)
- int paused = seek_audio( self, position, real_timecode );
+ int paused = seek_audio( self, position, real_timecode < 0.5 ? 0.0 : real_timecode - 0.5 );
// Initialize ignore for all streams from the seek return value
int i = MAX_AUDIO_STREAMS;
from mlt.
Nice, I test your fix when I get home. I have a script that visually shows the bug by rendering the samples around the split; I'll post the before and after images.
from mlt.
Here's what I used when I worked on a fix of my own:
I generated a ~2 second long sample that consists of linearly increasing values. Then I created a mlt in shotcut where I simply make a split after 1 second. This makes it easy to see when there's a bump. I have run the mlt files through melt from shotcut 16.01 and on top of origin/master + your diff.
First up is when the sample is of mp3 type, on mlt from 16.01:
You can easily see where the split is, where there are missing samples, but there is also a shift present although easier to spot in the next case:
Here is the same mlt, this time with your diff:
Now the missing samples are filled in, but there's still a disconnect.
Next I had a look at the exact same mlt, only with the mp3 sound swapped with a wav file. First up is melt 16.01:
Looks correct, you can't see the split.
Finally we have the wav-file case with your diff applied:
Notice that now there is a visible (and probably audible) disconnect at the split, so the change actually introduces a regression.
Finally, I have a 4 second long clip with music where there is a split at the 2 second mark.
Here's before the fix: https://youtu.be/nINssorxMfk
The jump is very noticable.
Here's after applying your fix:
https://youtu.be/IsKC0cUu3sI
As we've seen in the rendered waveforms, the missing samples are now back, but there is still an audible artifact at the 2 second mark.
Let me know if you're interested in any of the data/scripts I used for generating these.
from mlt.
Try to find the problem in my patch. It might be a rounding error for my use of floating point to calculate the number of samples to ignore. Maybe it can be converted to use timestamp and timebase. I am working on the other A/V sync bug now and will not return to this for a day or so.
*ignore = ( timecode - pts * timebase ) * fps * samples;
from mlt.
Yes, please share your scripts. I have trouble reproducing the problem with WAV created by melt and ladspa triangle producer, making a split with melt, and then viewing the result in audacity.
from mlt.
Here's the script with its mlt and data files. Running generateReport.sh (after modifying the file paths to the melt wrapper) should generate a set of pictures under a new folder called report. I have included the report that was generated on my computer, just to make sure that we're producing the same output.
http://metellius.mine.nu/meltsyncbugreport.tar.gz
You'll need sox and gnuplot installed.
from mlt.
I am getting some different results. Actually, my release pics looked drastically different from yours and my local. I made changes to your scripts to make the MLT XML output directly to WAV. I also removed transitions, set consumer real_time=-1, and converted LC_NUMERIC and commas to periods, just to simplify and remove as many variables as possible and focus on the avformat producer.
My Shotcut-16.01/melt picture linear-audio-from-wav-release.png looks like your linear-audio-from-wav-local.png:
Here is my linear-audio-from-mp3-release.png:
My release and locals look very similar to each other - even for mp3. I should mention that my local melt build is against a post FFmpeg 2.8 build from Nov. 10. Still, it shows something non-ideal. However, I might want to make a fix just for the original problem report that does not regress.
from mlt.
converted LC_NUMERIC and commas to periods
I have (as you might remember) had multiple issues with commas due to my norwegian locale. Are you saying this had an impact? Perhaps I should run the script again with locale set to en_US.UTF-8?
I'm interested in seeing your modified scripts so that I can run them again on my computer and check the result.
from mlt.
Changes, whose understanding are probably more important than getting a file dump:
- convert locale of MLT XML and remove LC_NUMERIC
- replace consumer with:
<consumer real_time="-1" target="output.wav" mlt_service="avformat"/>
- remove all
<transition>
s - change LC_ALL in generateReport.sh to C
- remove call to ffmpeg in generateReport.sh
- adjust makereports() calls for my environment
from mlt.
I resumed working on this due to the related shotcut issue #314. I noticed that generateReport.sh is reading out.wav, but my .mlt files are generating output.wav. When I fixed that, I reproduced the problem reported in @metellius most recent graphs above.
from mlt.
I committed a variation of the patch above that does not regress on uncompressed audio. It reduces the pop reported in Shotcut to a crackle because the mp3 image still looks a little like the second image in this thread. I have not been able to figure out a way to get a more accurate number of samples to discard than (our_time - stream_time) * stream_sample_rate.
It resolves this bug as well as I have been able to reproduce it again from my comments in the stackoverflow post.
from mlt.
Related Issues (20)
- Issue when saving consumer stream in file on remote server HOT 2
- Termux: Linker cannot find symbols HOT 1
- Utilizing GPU for rendering HOT 1
- [4k 30fps] OOM after 15m while rendering hevc_nvenc profile HOT 4
- compile error on mlt-7.24.0 HOT 1
- how can i save the audio language HOT 1
- How to specify output video width and height? HOT 9
- melt.exe is creating pops in audio in clips with PCM audio in Kdenlive and ShotCut. HOT 13
- How to change the spacing of wrapped text? HOT 3
- how to use gpu decode video? HOT 1
- MLT master fails to build with ffmpeg 7.0 HOT 6
- Is there something wrong with the hash function of mlt_properties_add? HOT 2
- avformat: vbr parameter should also accept integer values HOT 1
- Audio not synchronized
- Setting lowres>=1 for codecs that don't support lowres results in broken output
- producer_avformat performs useless work when source is intra-only and consumer framerate is lower than the source's framerate HOT 3
- [producer_xml] failed to load filter "affine" HOT 2
- Rendering .mlt File to .mp4 with melt looses image transparency HOT 2
- error: use of undeclared identifiers 'free' and 'calloc' HOT 1
- audio_index=all broken with Sintel since c170481 HOT 3
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from mlt.