
pambox's Introduction

Python Auditory Modeling Toolbox


pambox is a Python toolbox to facilitate the development of auditory models, with a focus on speech intelligibility prediction models.

The project is maintained by @AlexChabotL.

pambox provides a consistent API for speech intelligibility models, inspired by Scikit-learn, to facilitate comparisons across models.


Dependencies

pambox is tested to work under Python 2.7 and Python 3.4 (thanks to six). Only Mac OS X (10.9) has been tested thoroughly.

The main dependencies are:

Lower versions of these packages are likely to work as well but have not been thoroughly tested.

pyaudio is required if you want to use the audio module.

For running tests, you will need pytest and pytest-cov.

Install

Right now, pambox is only available through GitHub. It should be available via pip soon. To install pambox from source:

git clone https://github.com/achabotl/pambox.git
cd pambox
python setup.py install

If you need more details, see the [Installation](https://github.com/achabotl/pambox/wiki/Installation) page on the wiki.

Contributing

You can check out the latest source and install it for development with:

git clone https://github.com/achabotl/pambox.git
cd pambox
python setup.py develop

To run tests (you will need pytest), from the root pambox folder, type:

python setup.py test

License

pambox is licensed under the New BSD License (3-clause BSD license).

pambox's People

Contributors

achabotl, jfsantos


pambox's Issues

utils.rms and utils.setdbspl fail with some signal sizes

If the signal is of shape 2xN, for example, utils.rms and utils.setdbspl raise a ValueError because of incompatible shapes. The issue is that both functions divide by, or subtract, the mean, and that operation does not follow NumPy's broadcasting rules for such shapes.
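
A minimal sketch of one possible fix, assuming the computation runs along the last axis; keepdims=True keeps the mean broadcastable against a 2xN signal (the implementation below is illustrative, not the current utils.rms):

import numpy as np

def rms(x, axis=-1):
    # Remove the mean along the chosen axis; keepdims=True makes it
    # broadcast correctly against signals of shape (n_channels, N).
    x = np.asarray(x, dtype=float)
    x = x - x.mean(axis=axis, keepdims=True)
    return np.sqrt(np.mean(x ** 2, axis=axis))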

Refactor Sepsm and MrSepsm

The predict function should be broken down for more modularity, so that it does not need to be duplicated for the mr-sEPSM. All the calls inside predict should be at the same level of abstraction.

Implement STI model

International Electrotechnical Commission (2003). IEC 60268-16:2003, Sound system equipment, Part 16: Objective rating of speech intelligibility by speech transmission index.

noctave_filtering should calculate the boundaries for each center frequency

The boundaries of each filter should be calculated independently, rather than assuming that the input frequencies are spaced according to the filter width.

Right now, if we input two center frequencies that are not spaced according to the width parameter, e.g. [63, 1000], the boundaries are [56.12661924, 70.71510904, 1122.46204831]. There should be 4 boundaries (see the sketch below).
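
A sketch of the per-center-frequency calculation, assuming width is the fraction of an octave (e.g. width=3 for third-octave bands); the function name and signature are illustrative, not the actual noctave_filtering API:

import numpy as np

def octave_band_edges(center_frequencies, width=3):
    # Each 1/width-octave band extends half a band below and above its
    # center frequency, independently of the other bands.
    fc = np.asarray(center_frequencies, dtype=float)
    half_band = 2 ** (1.0 / (2 * width))
    return fc / half_band, fc * half_band

For [63, 1000] Hz this yields two edges per band, four in total, instead of three shared edges.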

The sEPSM does a double compensation for filter bandwidth

When finding the bands above threshold in the sEPSM, there is a factor of 0.231 to compensate for the filter bandwidth. This factor is unnecessary because the diffuse-field hearing thresholds used for the comparison are already adjusted for the filter bandwidths.

The factor should be removed. Hopefully, that will not affect the predictions too much.

Standardize the return values of the intelligibility models

Each intelligibility model returns a different type of prediction value. Sometimes it is an intelligibility percentage directly, but more often than not it is some model-specific value that has to be transformed to intelligibility. A model can also return internal intermediate values, such as envelope powers, level spectra, etc. It would be great if the output of the models were standardized so that the models can be used interchangeably.
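
A hedged sketch of one possible standardized structure; the keys below are purely illustrative and not an existing pambox convention:

res = {
    'p': {'snr_env': 12.3},       # model-specific prediction value(s)
    'intelligibility': None,      # percent correct, when the model provides it
    'internals': {                # optional intermediate representations
        'envelope_powers': None,
        'excitation_patterns': None,
    },
}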

Implement the STOI intelligibility model

Taal, C. H., Hendriks, R. C., Heusdens, R., and Jensen, J. (2010). A short-time objective intelligibility measure for time-frequency weighted noisy speech. Proc. IEEE ICASSP 2010, 4214-4217.

Modulation filtering stage produces different output from "butter" function

I noticed that the time-domain output of the modulation filterbank differs from the output obtained by creating the coefficients with scipy.signal.butter and filtering the signal with scipy.signal.filtfilt.

Apparently, there is an extra "-1" factor when creating the frequency vector that should not be there. If it is removed, the output of the mod_filterbank function matches the butter-based version.

It would probably make sense to use butter, since we are using Butterworth filters anyway, instead of our own implementation. Additionally, because of the way the modulation filtering is currently done, the shape of the filter depends on the length of the input signal, because the length affects the resolution of the frequency vector.
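
A minimal sketch of the butter-based alternative, using scipy.signal directly; the function name, cutoff frequencies, and filter order are illustrative:

from scipy import signal

def modulation_filter(envelope, fs, lo, hi, order=1):
    # Design the band-pass Butterworth filter from explicit cutoff frequencies,
    # so the filter shape no longer depends on the length of the input signal.
    b, a = signal.butter(order, [lo / (fs / 2.0), hi / (fs / 2.0)], btype='band')
    # Zero-phase filtering, forward and backward.
    return signal.filtfilt(b, a, envelope)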

Experiment needs a way to set levels before or after the processing

I often run into the issue that the speech and noise levels should be set before the distortion, for example when the SNR is defined at the source rather than at the ears in an experiment with signals in space. Right now I have to do very inelegant overrides of the preprocessing function.

Adding an extra argument to the Experiment class, something like adjust_levels_before_processing, would really simplify things (see the sketch below).

Additionally, that would solve the problem of adjusting the levels of binaural signals when HRTFs are applied: if the levels are adjusted before the distortion processing, the signals should, in principle, still be binaural.
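
A sketch of the control flow such an option could enable inside Experiment.preprocessing; the adjust_levels_before_processing attribute and the apply_distortion helper are hypothetical names:

def preprocessing(self, target, masker, snr):
    # Hypothetical: pick the order of level adjustment and distortion.
    if self.adjust_levels_before_processing:
        target, masker = self.adjust_levels(target, masker, snr)
        target, masker = self.apply_distortion(target, masker)
    else:
        target, masker = self.apply_distortion(target, masker)
        target, masker = self.adjust_levels(target, masker, snr)
    return target, masker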

Not possible to apply processing to the mixture of target and maskers

In Experiment.processing, the distortion is applied to the target and the masker independently. That is a problem for non-linear distortions, such as spectral subtraction, which require both the noisy signal and the noise alone.

I see two approaches for "fixing" that:

  1. If required, the user subclasses Experiment and replaces the preprocessing method with one that applies the distortion however they want.
  2. Add an additional option to Experiment that defines the behavior inside the preprocessing method. That would require changing the level-adjustment behavior as well; in fact, the level adjustment would have to be done before the distortion is applied.

Right now, I'd say we should stick to option 1 (sketched below).
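
A minimal sketch of option 1, assuming Experiment lives in pambox.speech; spectral_subtraction is a hypothetical stand-in for whatever non-linear distortion is needed:

from pambox import speech

class MixtureExperiment(speech.Experiment):
    def preprocessing(self, target, masker, *args, **kwargs):
        # Apply the distortion to the noisy mixture, using the masker alone
        # as the noise estimate, rather than processing each signal separately.
        mixture = target + masker
        processed = spectral_subtraction(mixture, masker)  # hypothetical helper
        return processed, masker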

SRT conversion should allow for model-specific criteria

Experiment.srts_from_df (which is a bad name) should allow the srt_at argument to take model-specific values. For example, if a model outputs SII values, then the SRT shouldn't be at 50% but at some value between 0 and 1. The current API is

srts_from_df(df, col='Intelligibility', srt_at=50)

We should allow for model-specific criteria, like {('Model', 'Output'): criterion}. This has the downside that if the default srt_at should not be 50, then all models must be part of the dictionary. Therefore, adding another keyword argument might be a better idea.

srts_from_df(df, col='Intelligibility', srt_at=50, model_srts=None)
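
A hedged usage example; the model_srts keyword and the mapping format are suggestions only:

srts = srts_from_df(
    df,
    col='Intelligibility',
    srt_at=50,                            # default criterion, in percent
    model_srts={('Sii', 'Output'): 0.5},  # hypothetical model-specific criterion
)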

Add an option to Experiment class to save complete data frame to HDF5

Since it's possible to save a pandas DataFrame directly to HDF5, it would be a good idea to offer that option when running an experiment. I think the default should be "off", because the resulting files would be too big, but it would certainly be useful for debugging, making plots of the internal representations, etc.
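
The underlying pandas call could be as simple as the sketch below; the file name and key are illustrative, and HDF5 output requires the optional PyTables dependency:

import pandas as pd

df = pd.DataFrame({'SNR': [-6, -3, 0], 'Intelligibility': [20.0, 50.0, 80.0]})
df.to_hdf('experiment_results.h5', key='results', mode='w')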

Pick a "reference level" for the toolbox

Should pick a "reference level" for signals. For example, should a signal with an RMS value of 1 correspond to 0 dB, 100 dB, or something else?

We could use a physical standard too, where an RMS of 20e-6 corresponds to 0 dB, i.e.

level = 20 * log10(rms / 20e-6)
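
In code, with the 20 µPa reference, the conversion could look like this sketch (illustrative only, not the current pambox convention):

import numpy as np

def dbspl(signal, reference=20e-6):
    # Level in dB SPL relative to 20 micropascals.
    rms = np.sqrt(np.mean(np.square(signal)))
    return 20 * np.log10(rms / reference)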

Level adjustment should work for binaural signals too

In speech.Experiment.adjust_levels, it should be possible to adjust the levels correctly even if the signals are binaural. A simple way to do this is:

average_level = np.mean(utils.dbspl(signal))

The average level would therefore always be a single number, regardless of whether the signal has one, two, or more channels.
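
A minimal sketch, assuming utils.dbspl returns one level per channel; the helper name average_dbspl is illustrative:

import numpy as np
from pambox import utils

def average_dbspl(signal):
    # Collapse per-channel levels to a single number so mono, binaural,
    # and multi-channel signals are all handled the same way.
    return np.mean(utils.dbspl(signal))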
