endolith / waveform_analysis

Functions and scripts for analyzing waveforms, primarily audio. This is currently somewhat disorganized and unfinished.

License: MIT License

Topics: frequency-estimation, noise, rms, waveform, audio-analysis, audio

waveform_analysis's Introduction

This was originally several different scripts for measuring or processing waveforms, which I had posted as separate gists.

Since they have a lot in common, I'm trying to combine them into one repository, so I'm not duplicating effort. It's a mess so far, though.

Please don't blindly trust this. If you use this and find a stupid error, please let me know. Also let me know if you use this and it works perfectly. :D

Installation

This should now be an installable package, using:

pip install git+https://github.com/endolith/waveform_analysis.git@master

I'm in the process of splitting it into callable functions and scripts that use those functions.

Waveform analyzer

Currently this displays file information and measurements like crest factor and noise (including A-weighted noise, which is not usually included by other tools).

Usage: python wave_analyzer.py "audio file.flac"

For Windows' SendTo menu: pythonw wave_analyzer_launcher.py

Requires:

  • Python 3
  • NumPy
  • SciPy

Recommended:

Optional:

  • EasyGUI (output to a window instead of the console)
  • Matplotlib (histogram of sample values)

A-weighting

Applies an A-weighting filter to a sound file stored as a NumPy array.

I was previously using the FFT filter in Adobe Audition to simulate an A-weighting filter. This worked for large signal levels, but not for low noise floors, which is what I was stupidly using it for.
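For reference, a minimal usage sketch might look like the following; the A_weight function name, its signature, and the file name are assumptions for illustration, not taken from this README:

import numpy as np
from scipy.io import wavfile

# Hypothetical usage; the actual function name/signature in this package may differ.
from waveform_analysis import A_weight

fs, signal = wavfile.read('noise_floor.wav')     # samples as a NumPy array
signal = signal / np.abs(signal).max()           # normalize so full scale = 1.0

weighted = A_weight(signal, fs)                  # apply the A-weighting filter
rms = np.sqrt(np.mean(weighted**2))
print('A-weighted level: %.1f dB' % (20 * np.log10(rms)))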

Frequency estimator

A few simple frequency estimation methods in Python

These are the methods that everyone recommends when someone asks about frequency estimation or pitch detection. (Such as here: Music - How do you analyse the fundamental frequency of a PCM or WAV sample)

None of them work well in all situations; these are "offline" methods rather than real-time, and I'm sure there are much better methods "in the literature", but here is some code for the simple approaches at least (a small sketch of the FFT-peak method appears after the list below).

  • Count zero-crossings
    • Using interpolation to find a "truer" zero-crossing gives better accuracy
    • Spline is better than linear interpolation?
  • Do FFT and find the peak
    • Using quadratic interpolation on a log-scaled spectrum to find a "truer" peak gives better accuracy
  • Do autocorrelation and find the peak
  • Calculate harmonic product spectrum and find the peak
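As a rough illustration of the FFT-peak approach with parabolic interpolation on a log-scaled spectrum (a standalone sketch, not the code from this repository):

import numpy as np

def freq_from_fft_sketch(signal, fs):
    """Estimate frequency from the FFT peak, refined by parabolic interpolation
    on the log-magnitude spectrum.  Illustrative sketch only."""
    windowed = signal * np.blackman(len(signal))
    spectrum = np.log(np.abs(np.fft.rfft(windowed)) + 1e-30)   # epsilon avoids log(0)

    i = int(np.argmax(spectrum))                   # coarse peak bin
    if 0 < i < len(spectrum) - 1:
        # Fit a parabola through the peak bin and its two neighbors
        a, b, c = spectrum[i - 1], spectrum[i], spectrum[i + 1]
        i = i + 0.5 * (a - c) / (a - 2 * b + c)    # interpolated peak position
    return fs * i / len(windowed)

# Example: a 997 Hz sine sampled at 48 kHz
fs = 48000
t = np.arange(fs) / fs
print(freq_from_fft_sketch(np.sin(2 * np.pi * 997 * t), fs))   # ~997.0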

THD+N calculator

Measures the total harmonic distortion plus noise (THD+N) for a given input signal, by guessing the fundamental frequency (finding the peak in the FFT), and notching it out in the frequency domain.

Example of usage, with a 997 Hz full-scale sine wave generated by Adobe Audition at 96 kHz, showing the 16-bit quantization distortion:

> python thd_analyzer.py "../997 Hz from 32-bit to 16-bit no dither.wav"
Analyzing "../997 Hz from 32-bit to 16-bit no dither.wav"...
Frequency: 996.999998 Hz
THD+N:      0.0012% or -98.1 dB
A-weighted: 0.0006% or -104.1 dB(A)

(Theoretical SNR of a full-scale 16-bit sine is 1.761 + 6.02⋅16 = 98.09 dB, so a measured THD+N of −98.1 dB seems right.)

According to the never-wrong Wikipedia:

  • THD is the fundamental alone vs the harmonics alone. The definition is ambiguous.
  • THD+N is the entire signal (not just the fundamental) vs the entire signal with the fundamental notched out. (For low distortion, the difference between the entire signal and the fundamental alone is negligible.)

I'm not sure how much of the surrounding region of the peak to throw away. Should the "skirt" around the fundamental be considered part of the peak or part of the noise (jitter)? Probably the way to match other test equipment is to notch out a fixed bandwidth around the fundamental, but is that really ideal?

Previously it found the nearest local minima and threw away everything between them. That works for some cases, but on peaks with wider "skirts" it gets stuck at any notches. Now it uses a fixed width of ±10% of the fundamental, which works better.
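In code, that approach might be sketched roughly like this (an illustrative standalone sketch using the ±10% fixed-width notch described above, not the repository's thd.py):

import numpy as np

def thdn_sketch(signal, fs):
    """THD+N sketch: total signal vs. signal with the fundamental notched out."""
    signal = signal - np.mean(signal)                  # remove DC
    windowed = signal * np.blackman(len(signal))
    spectrum = np.fft.rfft(windowed)
    freqs = np.fft.rfftfreq(len(windowed), d=1 / fs)

    f0 = freqs[np.argmax(np.abs(spectrum))]            # guess the fundamental

    total = np.sqrt(np.sum(np.abs(spectrum) ** 2))     # overall RMS (Parseval, up to a constant)
    notched = spectrum.copy()
    notched[(freqs > 0.9 * f0) & (freqs < 1.1 * f0)] = 0   # fixed ±10% notch
    residual = np.sqrt(np.sum(np.abs(notched) ** 2))

    ratio = residual / total
    return f0, ratio, 20 * np.log10(ratio)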

For comparison, the Audio Precision manual states:

Bandreject Amplitude Function / Bandreject Response:

  • typically –3 dB at 0.725 fo & 1.38 fo
  • –20 dB at fo ±10%
  • –40 dB at fo ±2.5%

So this is Q factor 1.53 or 0.93 octaves? 2nd-order?

Adobe Audition with default dither, full-scale 997 Hz sine wave:

 8-bit    -49.8 dB
16-bit    -93.3 dB
32-bit   -153.8 dB

Art Ludwig's Sound Files:

File                        Claimed  Measured  (dB)
Reference           (440Hz) 0.0%     0.0022%   -93.3
Single-ended triode (440SE) 5.0%     5.06%     -25.9
Solid state         (440SS) 0.5%     0.51%     -45.8

Comparing a test device measured on an Audio Precision System One (22 kHz filtered) vs. recorded into my 96 kHz 24-bit sound card and measured with this script:

Frequency   AP THD+N    Script THD+N
40          1.00%       1.04%
100         0.15%       0.19%
100         0.15%       0.14%
140         0.15%       0.17%
440         0.056%      0.057%
961         0.062%      0.067%
1021        0.080%      0.082%
1440        0.042%      0.041%
1483        0.15%       0.15%
4440        0.048%      0.046%
9974        7.1%        7.8%
10036       0.051%      0.068%
10723       8.2%        9.3%
13640       12.2%       16.8%
19998       20.2%       56.3%  (nasty intermodulation distortion)
20044       0.22%       0.30%

(This was done with the local-minima method; it should be redone with the ±10% method.)

So it's mostly accurate. Mostly.

To do

  • Guess the type of waveform and do different measurements in different situations? Noise vs sine vs whatever
  • Frequency estimation
    • Ideally: frequency with ±% accuracy - better accuracy for longer files
  • py2exe compilation for Windoze
  • Web page describing it
    • Screenshots compared to Audition analysis results
  • Histogram of sample values
    • ("matplotlib not installed... skipping histogram") hist()
  • everything that Audition does?
    • histogram of dB values
    • number of possibly clipped samples
    • max/min sample values
    • peak amplitude
    • min RMS, max RMS, average RMS for chunks of 100 ms or so
      • This is more a scientific measurement tool for engineering than a musical tool. Peak and trough RMS and RMS histogram are not as important? Include them anyway!
    • actual bit depth
      • Identify if it is 8-bit samples encoded with 16 bits, for instance, like Audition does. Also like JACK bitmeter does?
  • THD
  • Real-time analysis of sound card input?
  • Calculate intersample peaks (see the oversampling sketch after this list)
    • "If you want to see something really bad on the oversampled meter - try a sequence of maximum and minimum values that goes like this: "1010101101010" - notice that the alternating 1's and 0's suddenly change direction in the middle. The results depends on the filter being used in the reconstruction, with the intersample peak easily exceeding 10dB!"
  • Extract frequency response plot if the input is a sweep ;)
    • Probably should just make a separate script for each function like this, and this one can be a noise analysis script
  • Signal to noise and dynamic range from test file
    • Same guts as THD script, just input −60 dBFS waveform and compare to maximum value instead of fundamental peak
  • Both normalizations of 468:
    • ITU-R 468: 0 dB at 1 kHz, true quasi-peak meter, for professional equipment
    • ITU-R ARM: 0 dB at 2 kHz, average-response meter, for commercial equipment
  • there may be an error in peak calculation?
  • test with crazy files like 1 MHz sampling rate, 3-bit, etc.
  • Bug: high frequencies of A-weighting roll off too quickly at lower sampling rates
    • make freq-response graphs at different signal levels and different sampling frequencies
  • What I've been calling "dBFS" is probably better referred to as "dBov"?
  • THD() should use a flat-top window for improved accuracy.
  • The bilinear transform makes the weighting filters inaccurate at high frequencies. Use an FIR frequency-sampling method instead? Or upsample the signal.
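For the intersample-peaks item above, a rough estimate can be made by oversampling and taking the maximum; this sketch uses a polyphase resampler as the reconstruction filter, which will not exactly match any particular meter:

import numpy as np
from scipy.signal import resample_poly

def intersample_peak_estimate(signal, oversample=8):
    """Oversample and take the peak; the result depends on the reconstruction filter."""
    reconstructed = resample_poly(signal, oversample, 1)
    return np.max(np.abs(reconstructed))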

Done

waveform_analysis's People

Contributors

endolith, joostvdoorn

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

waveform_analysis's Issues

Have weighting curves been validated?

The README mentions that some parts of this repo have not been validated. This seems to be the best Python resource for A- and C-weighting of digital audio, and it would be useful if someone could confirm that the implementation is correct.

Incorrect definition of dBFS

My previous understanding was that "dBFS" is ambiguous and a full-scale sine wave can be either 0 dBFS or -3 dBFS depending on convention, and I've been using the -3 dBFS convention because it makes sense that RMS level should be -3 dB from peak level.

However, I've looked at all the standards and there's no ambiguity in the standards. A full-scale sine wave is:

  • 0 dBFS (RMS (or peak?))
  • −3 dBov (RMS only)
  • 0 dBTP (peak only)

So I should change all the measurements to either use the unit dBov or to be increased by 3 dB and use dBFS.

It doesn't look like "peak dBFS" is even a legit unit?

AES17 doesn't really say:

amplitude expressed as a level in decibels relative to full-scale amplitude (20 times the common logarithm of the amplitude over the full-scale amplitude)

but implies that it's an RMS measurement with this note:

Because the definition of full scale is based on a sine wave, it will be possible with square-wave test signals to read as much as + 3,01 dB FS.

AES-6id-2000 says RMS specifically:

Digital signal rms amplitude expressed as a level in decibels relative to full-scale amplitude

IEC 61606 specifically says RMS(signal)/RMS(FS sine), and doesn't mention peak:

the amplitude of any signal can be defined in dB FS as 20 times the common logarithm of the ratio of the r.m.s. amplitude of the signal to that of ... a 997 Hz sinusoid whose peak positive sample just reaches positive digital full-scale

ITU P.381 mentions RMS as the reference, but doesn't specifically say the signal is RMS:

All signal levels stated in this section are relative to decibels relative to full scale (dBFS), where 0 dBFS represents the root mean square (RMS) level of a full-scale sinusoidal.

ITU P.382 is the same:

All output signal levels stated in this section are relative to decibels relative to full scale (dBFS), where 0 dBFS represents the root mean square (RMS) level of a full-scale sinusoidal signal.

However, Rane says:

Therefore all signal levels expressed in terms of dBFS are peak levels

But that's not a standard...
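As a quick numerical sanity check of the conventions discussed above (illustrative sketch only):

import numpy as np

fs = 48000
t = np.arange(fs) / fs
sine = np.sin(2 * np.pi * 997 * t)      # full-scale sine, peak = 1.0
square = np.sign(sine)                   # full-scale square, peak = 1.0

rms = lambda x: np.sqrt(np.mean(x ** 2))
ref = rms(sine)                          # AES17-style reference: RMS of a full-scale sine

print(20 * np.log10(rms(sine) / ref))        # 0.00  -> 0 dBFS
print(20 * np.log10(rms(square) / ref))      # ~+3.01 -> the AES17 square-wave note
print(20 * np.log10(rms(sine) / 1.0))        # -3.01 -> the dBov-style (RMS re: peak) convention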

matched z-transform vs bilinear

Hi,

I have seen your discussion on a gist about replacing the bilinear transform with matched-z. I was surprised that it's not in scipy.

I needed a simple module to get digital coefficients for A- and C-weighting filters, and created just that based on your repo, using the matched-z transform.

You might find some of the fragments useful! I didn't send a PR as I had a different objective.

https://github.com/berndporr/sound_weighting_filters

Best,
/Bernd
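For reference, the matched-z transform itself can be sketched as follows; this is a standalone illustration, not code from either repository, and the 1 kHz gain renormalization is an assumption that suits weighting filters:

import numpy as np
from scipy.signal import freqs_zpk, freqz_zpk

def matched_z(z, p, k, fs, f_ref=1000.0):
    """Map analog zeros/poles to the z-plane via z = exp(s/fs), then renormalize
    the gain so the analog and digital responses agree at f_ref."""
    zd = np.exp(np.asarray(z) / fs)
    pd = np.exp(np.asarray(p) / fs)

    _, h_analog = freqs_zpk(z, p, k, worN=[2 * np.pi * f_ref])    # rad/s
    _, h_digital = freqz_zpk(zd, pd, 1.0, worN=[f_ref], fs=fs)    # Hz
    kd = abs(h_analog[0]) / abs(h_digital[0])
    return zd, pd, kd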

copyright unknown

What is your copyright policy? It would be nice if your code were MIT or BSD licensed; you would need to add a comment at the top of the files like the one below (copied from the BSD 3-Clause license, http://opensource.org/licenses/BSD-3-Clause), or even better without clause (2).

'''
Copyright (c) <year>, <copyright holder>
All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
  2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
  3. Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

'''

ValueError: Invalid number of FFT data points (0) specified.

I get a ValueError when I try to call freq_from_hps on filtered segments of this audio file. I segment and filter my signal like this before calling freq_from_hps on every segment:

def get_pitches(filename, sr=44100):
  # Get onset times.
  onset_frames = get_onset_frames(filename, sr)
  # Filter the signal.
  filtered_y = bandpass_filter(y, sr, 80., 4000.)
  # Get pitches.
  hps_pitches = detect_pitch(filtered_y, sr, onset_frames, 'hps')

def detect_pitch(y, sr, onset_frames, method='stft', stft_offset=10, fmin=80, fmax=4000):
  pitches, magnitudes = librosa.piptrack(y=y, 
    sr=sr, fmin=fmin, fmax=fmax)
  ...
  elif method == 'hps':
    slices = segment_signal(y, sr, onset_frames)
    for segment in slices:
      pitch = freq_from_hps(segment, sr)
      result_pitches.append(pitch)
  ...
  return result_pitches

def bandpass_filter(y, sr, lowcut, highcut):
  # Setup parameters.
  nyquist_rate = sr / 2.
  filter_order = 3
  normalized_low = lowcut / nyquist_rate
  normalized_high = highcut / nyquist_rate

  sos = butter(filter_order, [normalized_low, normalized_high],
    btype='bandpass', output='sos')
  
  y = sosfilt(sos, y)
  return y

def segment_signal(y, sr, onset_frames):
  offset_end = int(librosa.time_to_samples(0.2, sr))

  slices = np.array([y[i : i + offset_end] for i
    in librosa.frames_to_samples(onset_frames)])

  return slices

I specifically get this error:

/home/pavlos55/guitartab/app/frequency_estimator.py:107: RuntimeWarning: divide by zero encountered in log
  X = log(abs(rfft(windowed)))
/home/pavlos55/guitartab/.env/local/lib/python2.7/site-packages/numpy/core/fromnumeric.py:2889: RuntimeWarning: Mean of empty slice.
  out=out, **kwargs)
/home/pavlos55/guitartab/.env/local/lib/python2.7/site-packages/numpy/core/_methods.py:80: RuntimeWarning: invalid value encountered in double_scalars
  ret = ret.dtype.type(ret / rcount)
127.0.0.1 - - [30/May/2017 18:51:09] "POST / HTTP/1.1" 500 -
Traceback (most recent call last):
  File "/home/pavlos55/guitartab/.env/lib/python2.7/site-packages/flask/app.py", line 1997, in __call__
    return self.wsgi_app(environ, start_response)
  File "/home/pavlos55/guitartab/.env/lib/python2.7/site-packages/flask/app.py", line 1985, in wsgi_app
    response = self.handle_exception(e)
  File "/home/pavlos55/guitartab/.env/lib/python2.7/site-packages/flask/app.py", line 1540, in handle_exception
    reraise(exc_type, exc_value, tb)
  File "/home/pavlos55/guitartab/.env/lib/python2.7/site-packages/flask/app.py", line 1982, in wsgi_app
    response = self.full_dispatch_request()
  File "/home/pavlos55/guitartab/.env/lib/python2.7/site-packages/flask/app.py", line 1614, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/home/pavlos55/guitartab/.env/lib/python2.7/site-packages/flask/app.py", line 1517, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/home/pavlos55/guitartab/.env/lib/python2.7/site-packages/flask/app.py", line 1612, in full_dispatch_request
    rv = self.dispatch_request()
  File "/home/pavlos55/guitartab/.env/lib/python2.7/site-packages/flask/app.py", line 1598, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/home/pavlos55/guitartab/app/app.py", line 39, in index
    pitches = handle_file(file)
  File "/home/pavlos55/guitartab/app/app.py", line 53, in handle_file
    pitches = mir.transcribe(filepath)
  File "/home/pavlos55/guitartab/app/mir.py", line 49, in transcribe
    hps_pitches = detect_pitch(filtered_y, sr, onset_frames, 'hps')
  File "/home/pavlos55/guitartab/app/mir.py", line 106, in detect_pitch
    pitch = freq_from_hps(segment, sr)
  File "/home/pavlos55/guitartab/app/frequency_estimator.py", line 107, in freq_from_hps
    X = log(abs(rfft(windowed)))
  File "/home/pavlos55/guitartab/.env/lib/python2.7/site-packages/numpy/fft/fftpack.py", line 372, in rfft
    _real_fft_cache)
  File "/home/pavlos55/guitartab/.env/lib/python2.7/site-packages/numpy/fft/fftpack.py", line 56, in _raw_fft
    % n)
ValueError: Invalid number of FFT data points (0) specified.

Any ideas?

HPS method estimation error

I am using freq_from_hps as described here to detect pitch in monophonic guitar samples.

I noticed that the results I get often have a slight estimation error, which is always +10-15 Hz. For example, when I want to detect the pitch of this audio file (see the end of the post for the correct pitches), I get:

[['F4'], ['F#4'], ['G4'], ['G#4'], ['A4'], ['A4'], ['A#4'], ['B4'], ['C5'], ['C#5'], ['D5'], ['D#5'], ['E5'], ['F5'], ['F#5'], ['G5'], ['G#5'], ['A5'], ['A#5'], ['B5'], ['C6'], ['C#6'], ['F4']]

When I increase each of these pitches by 10 Hz, I get:

[['E4'], ['F4'], ['F#4'], ['G4'], ['G#4'], ['A4'], ['A#4'], ['B4'], ['C5'], ['C#5'], ['D5'], ['D#5'], ['E5'], ['F5'], ['F#5'], ['G5'], ['G#5'], ['A5'], ['A#5'], ['B5'], ['C6'], ['C#6'], ['E4']]

Which is actually 100% correct.

This basically happens for almost every note below G#4 (and above E2). What could be the issue here? Any ideas why there is such a fixed estimation error?

Furthermore, as I am designing this application specifically for the guitar, do you have any recommendations/tips as to how I could optimise this method?

Thanks.

Pass objects to analysis functions with FFT cached?

Many of these functions calculate the FFT internally. It would be faster to calculate it once and "attach" it to the signal object in a way it could be used by all the functions without re-calculating.
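One possible shape for this (a hypothetical sketch; the class and attribute names are invented for illustration):

import numpy as np
from functools import cached_property   # Python 3.8+

class Signal:
    """Hypothetical wrapper so analysis functions can share one cached FFT."""
    def __init__(self, samples, fs):
        self.samples = np.asarray(samples, dtype=float)
        self.fs = fs

    @cached_property
    def spectrum(self):
        # Computed once on first access, then reused by every analysis function.
        return np.fft.rfft(self.samples)

# Any analysis function that needs the spectrum just reads sig.spectrum;
# only the first access actually runs the FFT.
sig = Signal(np.random.randn(48000), 48000)
peak_bin = int(np.argmax(np.abs(sig.spectrum)))    # computes and caches the FFT
total_energy = np.sum(np.abs(sig.spectrum) ** 2)   # reuses the cached result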

audacity plugin

It would be great to have this turned into an Audacity plugin.

Should removing the DC component (subtracting the average) happen after applying the window?

I noticed that in the current code (https://github.com/endolith/waveform_analysis/blob/master/waveform_analysis/thd.py#L61) the DC component is removed before applying the window. The calculated DC offset would be different after the window has been applied, causing the resulting FFT to have a rather large DC component in some cases.

For example, I have a 1kHz -3dBFS sine tone 32-bit wave file at 48kHz sample rate. I feed only the first 16K samples to the code (obviously the last sine cycle won't be complete, and that's why window functions are used). The result I got is a THD+N of -77.8dB while I expect it to be much better. If I increase the FFT points to 1M point, the THD+N gets improved to <-100dB. By inspecting the noise spectrum in the 16K point case, I noticed that the DC bin has a -77.8dB magnitude, explaining the rather poor THD+N. If I move the subtraction to after the windowing (windowed -= mean(windowed)), the THD+N immediately gets improved to -189.7dB, and the performance is consistent across different FFT sizes between 16K and 2M. (while in the previous cases, performances improve with larger FFT size).

These make sense to me however I am not sure if my modifications are correct or not. Do you have any idea about this?
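A small reproduction of the effect being described (illustrative numbers, not the exact file from the report):

import numpy as np

fs, n = 48000, 16384
x = 10 ** (-3 / 20) * np.sin(2 * np.pi * 1000 * np.arange(n) / fs)   # -3 dBFS, non-integer number of cycles
w = np.blackman(n)

before = (x - x.mean()) * w      # current order: subtract mean, then window
after = x * w
after -= after.mean()            # proposed order: window, then subtract mean

print(abs(np.fft.rfft(before)[0]))   # non-negligible DC bin
print(abs(np.fft.rfft(after)[0]))    # essentially zero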

Errors (IndexError) on silent waves (degenerate pathological cases)

  File "/home/jimbo1qaz/miniconda3/envs/wavetable/lib/python3.6/site-packages/waveform_analysis/freq_estimation.py", line 92, in freq_from_autocorr
    start = find(d > 0)[0]
IndexError: index 0 is out of bounds for axis 0 with size 0

If freq_from_autocorr is given a silent signal, d contains only zeros.

I feel the best solution is to just return some arbitrary value without crashing.


I don't remember if it happens in practice, but a while back I found that parabolic(f, 0) compares to f[-1] (which is wrong but doesn't crash), and parabolic(f, len(f)-1) compares to f[len(f)], which raises IndexError.
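One possible guard for the silent-signal case (a sketch of the suggestion above, not code from this repository):

import numpy as np

def first_rising_index(d):
    """Return the index of the first positive difference, or None for a
    silent/degenerate signal instead of raising IndexError."""
    candidates = np.flatnonzero(d > 0)
    if candidates.size == 0:
        return None          # caller can then return NaN or 0 Hz for silence
    return int(candidates[0])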

Signal-to-noise or dynamic range measurement

Same guts as THDN function, except the numerator is the maximum possible signal, not the amplitude of the fundamental.

Research definitions of SNR and DNR and put the definition you use in the readme. One is with -60 dBFS sine wave notched out, etc.

Calculate THD

It should be possible to calculate THD from the peaks in the spectrum, as opposed to THD+N, which includes noise too. It should read the same even if you add noise to a given wave.
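A rough sketch of THD from spectral peaks (illustrative only; the bin ranges and window choice are arbitrary):

import numpy as np

def thd_sketch(signal, n_harmonics=5):
    """THD from spectral peaks (harmonics only, no noise):
    RMS sum of harmonic amplitudes divided by the fundamental amplitude."""
    windowed = signal * np.blackman(len(signal))
    spectrum = np.abs(np.fft.rfft(windowed))

    k = int(np.argmax(spectrum))           # fundamental bin
    fundamental = spectrum[k]

    harmonics = []
    for h in range(2, 2 + n_harmonics):
        centre = k * h                      # expected bin of the h-th harmonic
        if centre >= len(spectrum):
            break
        lo, hi = max(centre - 2, 0), min(centre + 3, len(spectrum))
        harmonics.append(spectrum[lo:hi].max())   # take the local peak near the harmonic

    return np.sqrt(np.sum(np.square(harmonics))) / fundamental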

This is ugly

I've got common.py with load() and analyze_channels(), but then wave_analyzer.analyze() does the same thing.

I want everything wrapped in try: so it can run without any command-line window and just pop up the results using easygui (or show exceptions in easygui if they happen), but then everything ends up wrapped in one big try statement, which is ugly. Maybe a separate file that launches the rest?

Having each analysis in its own function is nice, but if multiple functions are doing the same FFT on the same signal, wouldn't it be better to keep the previously-computed FFT to save time? So I should pass an object to the analysis functions instead, and if they compute FFT they should attach FFT results to the object as a cache?

Then I've still got the standalone gists for each analysis function that should be merged into this and killed off.

Suggestions welcome

how to use thdcalculator.py?

When I entered: python3 thdcalculator.py mytestfile.wav, nothing happened.

Then I used: python3 thd_analyzer_launcher.py mytestfile.wav, and got this error:

Exception:
name 'analyze_channels' is not defined

Sorry for being a noob; how do I use the scripts properly? Thanks.
