
pambox's Introduction

Python Auditory Modeling Toolbox


pambox is a Python toolbox to facilitate the development of auditory models, with a focus on speech intelligibility prediction models.

The project is maintained by @AlexChabotL.

pambox provides a consistent API for speech intelligibility models, inspired by Scikit-learn, to facilitate comparisons across models.


Dependencies

pambox is tested to work under Python 2.7 and Python 3.4 (thanks to six). Only Mac OS X (10.9) has been tested thoroughly.

The main dependencies are:

Lower versions of these packages are likely to work as well but have not been thoroughly tested.

pyaudio is required if you want to use the audio module.

For running tests, you will need pytest and pytest-cov.

Install

Right now, pambox is only available through GitHub. It should be available via pip soon. To install pambox from source:

git clone https://github.com/achabotl/pambox.git
cd pambox
python setup.py install

If you need more details, see the [Installation](https://github.com/achabotl/pambox/wiki/Installation) page on the wiki.

Contributing

You can check out the latest source and install it for development with:

git clone https://github.com/achabotl/pambox.git
cd pambox
python setup.py develop

To run tests (you will need pytest), from the root pambox folder, type:

python setup.py test

License

pambox is licensed under the New BSD License (3-clause BSD license).

pambox's People

Contributors

achabotl, jfsantos


pambox's Issues

utils.rms and utils.setdbspl fail with some signal sizes

If the signal is of shape 2xN, for example, utils.rms and utils.setdbspl raise a ValueError because of incompatible shapes. The issue is that both functions divide by, or subtract, the mean, and that operation does not follow NumPy's broadcasting rules for such shapes.
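
A minimal sketch of one possible fix, assuming the computation runs along the last axis; keepdims=True keeps the mean broadcastable against a 2xN signal (the implementation below is illustrative, not the current utils.rms):

import numpy as np

def rms(x, axis=-1):
    # Remove the mean along the chosen axis; keepdims=True makes it
    # broadcast correctly against signals of shape (n_channels, N).
    x = np.asarray(x, dtype=float)
    x = x - x.mean(axis=axis, keepdims=True)
    return np.sqrt(np.mean(x ** 2, axis=axis))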

Refactor Sepsm and MrSepsm

The predict function should be broken down for more modularity, so that it does not need to be duplicated for the mr-sEPSM. All the calls inside predict should be at the same level of abstraction.

Implement STI model

International Electrotechnical Commission (2003). IEC 60268-16:2003, Sound system equipment, Part 16: Objective rating of speech intelligibility by speech transmission index.

noctave_filtering should calculate the boundaries for each center frequency

The boundaries of each filter should be calculated independently, rather than assuming that the input frequencies are spaced according to the filter width.

Right now, if we input two center frequencies that are not spaced according to the width parameter, e.g. [63, 1000], the boundaries are [56.12661924, 70.71510904, 1122.46204831]. There should be 4 boundaries (see the sketch below).
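
A sketch of the per-center-frequency calculation, assuming width is the fraction of an octave (e.g. width=3 for third-octave bands); the function name and signature are illustrative, not the actual noctave_filtering API:

import numpy as np

def octave_band_edges(center_frequencies, width=3):
    # Each 1/width-octave band extends half a band below and above its
    # center frequency, independently of the other bands.
    fc = np.asarray(center_frequencies, dtype=float)
    half_band = 2 ** (1.0 / (2 * width))
    return fc / half_band, fc * half_band

For [63, 1000] Hz this yields two edges per band, four in total, instead of three shared edges.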

The sEPSM does a double compensation for filter bandwidth

When finding the bands above threshold in the sEPSM, there is a factor of 0.231 to compensate for the filter bandwidth. This factor is unnecessary because the diffuse-field hearing thresholds used for the comparison are already adjusted for the filter bandwidths.

The factor should be removed. Hopefully, that will not affect the predictions too much.

Standardize the return values of the intelligibility models

Each intelligibility model returns a different type of prediction value. Sometimes it is an intelligibility percentage directly, but more often than not it is some model-specific value that has to be transformed to intelligibility. A model can also return internal intermediate values, such as envelope powers, level spectra, etc. It would be great if the output of the models were standardized so that the models can be used interchangeably.
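
A hedged sketch of one possible standardized structure; the keys below are purely illustrative and not an existing pambox convention:

res = {
    'p': {'snr_env': 12.3},       # model-specific prediction value(s)
    'intelligibility': None,      # percent correct, when the model provides it
    'internals': {                # optional intermediate representations
        'envelope_powers': None,
        'excitation_patterns': None,
    },
}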

Implement the STOI intelligibility model

Taal, C. H., Hendriks, R. C., Heusdens, R., and Jensen, J. (2010). A short-time objective intelligibility measure for time-frequency weighted noisy speech. Proc. IEEE ICASSP 2010, 4214-4217.

Modulation filtering stage produces different output from "butter" function

I noticed that the time-domain output of the modulation filterbank differs from the output obtained by creating the coefficients with scipy.signal.butter and filtering the signal with scipy.signal.filtfilt.

Apparently, there is an extra "-1" factor when creating the frequency vector that should not be there. If it is removed, the output of the mod_filterbank function matches the butter-based version.

It would probably make sense to use butter, since we are using Butterworth filters anyway, instead of our own implementation. Additionally, because of the way the modulation filtering is currently done, the shape of the filter depends on the length of the input signal, because the length affects the resolution of the frequency vector.
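
A minimal sketch of the butter-based alternative, using scipy.signal directly; the function name, cutoff frequencies, and filter order are illustrative:

from scipy import signal

def modulation_filter(envelope, fs, lo, hi, order=1):
    # Design the band-pass Butterworth filter from explicit cutoff frequencies,
    # so the filter shape no longer depends on the length of the input signal.
    b, a = signal.butter(order, [lo / (fs / 2.0), hi / (fs / 2.0)], btype='band')
    # Zero-phase filtering, forward and backward.
    return signal.filtfilt(b, a, envelope)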

Experiment needs a way to set levels before or after the processing

I often run into the issue that the speech and noise levels should be set before the distortion, for example when the SNR is defined at the source rather than at the ears in an experiment with signals in space. Right now I have to do very inelegant overrides of the preprocessing function.

Adding an extra argument to the Experiment class, something like adjust_levels_before_processing, would really simplify things (see the sketch below).

Additionally, that would solve the problem of adjusting the levels of binaural signals when HRTFs are applied: if the levels are adjusted before the distortion processing, the signals should, in principle, still be binaural.
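
A sketch of the control flow such an option could enable inside Experiment.preprocessing; the adjust_levels_before_processing attribute and the apply_distortion helper are hypothetical names:

def preprocessing(self, target, masker, snr):
    # Hypothetical: pick the order of level adjustment and distortion.
    if self.adjust_levels_before_processing:
        target, masker = self.adjust_levels(target, masker, snr)
        target, masker = self.apply_distortion(target, masker)
    else:
        target, masker = self.apply_distortion(target, masker)
        target, masker = self.adjust_levels(target, masker, snr)
    return target, masker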

Not possible to apply processing to the mixture of target and maskers

In Experiment.processing, the distortion is applied to the target and the masker independently. That is a problem for non-linear distortions, such as spectral subtraction, which require both the noisy signal and the noise alone.

I see two approaches for "fixing" that:

  1. If required, the user subclasses Experiment and replaces the preprocessing method with one that applies the distortion however they want.
  2. Add an additional option to Experiment that defines the behavior inside the preprocessing method. That would require changing the level-adjustment behavior as well; in fact, the level adjustment would have to be done before the distortion is applied.

Right now, I'd say we should stick to option 1 (sketched below).
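
A minimal sketch of option 1, assuming Experiment lives in pambox.speech; spectral_subtraction is a hypothetical stand-in for whatever non-linear distortion is needed:

from pambox import speech

class MixtureExperiment(speech.Experiment):
    def preprocessing(self, target, masker, *args, **kwargs):
        # Apply the distortion to the noisy mixture, using the masker alone
        # as the noise estimate, rather than processing each signal separately.
        mixture = target + masker
        processed = spectral_subtraction(mixture, masker)  # hypothetical helper
        return processed, masker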

SRT conversion should allow for model-specific criteria

Experiment.srts_from_df (which is a bad name) should allow the srt_at argument to take model-specific values. For example, if a model outputs SII values, then the SRT shouldn't be at 50% but at some value between 0 and 1. The current API is

srts_from_df(df, col='Intelligibility', srt_at=50)

We should allow for model-specific criteria, like {('Model', 'Output'): criterion}. This has the downside that if the default srt_at should not be 50, then all models must be part of the dictionary. Therefore, adding another keyword argument might be a better idea.

srts_from_df(df, col='Intelligibility', srt_at=50, model_srts=None)
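
A hedged usage example; the model_srts keyword and the mapping format are suggestions only:

srts = srts_from_df(
    df,
    col='Intelligibility',
    srt_at=50,                            # default criterion, in percent
    model_srts={('Sii', 'Output'): 0.5},  # hypothetical model-specific criterion
)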

Add an option to Experiment class to save complete data frame to HDF5

Since it's possible to save a pandas DataFrame directly to HDF5, it would be a good idea to offer that option when running an experiment. I think the default should be "off", because the resulting files would be too big, but it would certainly be useful for debugging, making plots of the internal representations, etc.
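
The underlying pandas call could be as simple as the sketch below; the file name and key are illustrative, and HDF5 output requires the optional PyTables dependency:

import pandas as pd

df = pd.DataFrame({'SNR': [-6, -3, 0], 'Intelligibility': [20.0, 50.0, 80.0]})
df.to_hdf('experiment_results.h5', key='results', mode='w')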

Pick a "reference level" for the toolbox

Should pick a "reference level" for signals. For example, should a signal with an RMS value of 1 correspond to 0 dB, 100 dB, or something else?

We could use a physical standard too, where an RMS of 20e-6 corresponds to 0 dB, i.e.

level = 20 * log10(rms / 20e-6)
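
In code, with the 20 µPa reference, the conversion could look like this sketch (illustrative only, not the current pambox convention):

import numpy as np

def dbspl(signal, reference=20e-6):
    # Level in dB SPL relative to 20 micropascals.
    rms = np.sqrt(np.mean(np.square(signal)))
    return 20 * np.log10(rms / reference)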

Level adjustment should work for binaural signals too

In speech.Experiment.adjust_levels, it should be possible to adjust the levels correctly even if the signals are binaural. A simple way to do this is:

average_level = np.mean(utils.dbspl(signal))

The average level would therefore always be a single number, regardless of whether the signal has one, two, or more channels.
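
A minimal sketch, assuming utils.dbspl returns one level per channel; the helper name average_dbspl is illustrative:

import numpy as np
from pambox import utils

def average_dbspl(signal):
    # Collapse per-channel levels to a single number so mono, binaural,
    # and multi-channel signals are all handled the same way.
    return np.mean(utils.dbspl(signal))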
