LatentAudio

This repository collects code and data for disentangling the latent space of the sound-event recognition model Yamnet into materials and actions. The data consists of 60K one-second sound snippets, each annotated with the material and action involved in producing the sound. Yamnet is a 14-layer convolutional neural network that maps a sound's spectrogram to 521 common auditory event classes.

Installation

This code has been tested on Windows, Linux, and Mac. x86 architectures were set up successfully, but ARM architectures were found to have too many version conflicts. If you are using an Apple-silicon Mac, you are therefore advised to switch to another machine or to an environment such as Google Colab. You will also need to use a Python 3.9.x or 3.10.x version. It is recommended to have only the basic Python packages installed, in order to prevent version conflicts with the packages installed here. In your terminal, change to the root directory of the downloaded repository and execute the line below. In Colab, start a new code cell with a percent sign (%) and paste the line after it. Then restart your code editor if you are using your local machine, or the notebook's kernel if you are in Colab.

pip install .
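Before installing, it can help to confirm that the interpreter is one of the supported versions. The check below is a convenience sketch, not part of the repository; the function name and the supported-version set are spelled out from the constraint stated above.

```python
import sys

# LatentAudio is reported to work on Python 3.9.x and 3.10.x on x86 machines.
# This guard (an illustrative sketch, not repository code) fails fast
# before `pip install .` is attempted on an unsupported interpreter.
SUPPORTED = {(3, 9), (3, 10)}

def python_is_supported(version_info=sys.version_info):
    """Return True if the (major, minor) version is one the repo supports."""
    return tuple(version_info[:2]) in SUPPORTED

print("Python", sys.version.split()[0], "supported:", python_is_supported())
```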

Pre-processing

The pre-processing code first passes the sounds through Yamnet to obtain the latent representation at each layer. It then uses principal component analysis (PCA) to reduce the dimensionality of these representations. Researchers interested in verifying or adjusting this part of the code can use the Preprocess.ipynb file as a starting point. This notebook walks the user through downloading the GitHub repository, installing the dependencies, running the pre-processing scripts, and saving the results. It can, for instance, be opened in Google Colab.
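The repository's own scripts perform this step; the snippet below is only a minimal sketch of the dimensionality reduction, using scikit-learn's `PCA` on a synthetic stand-in for one layer's latent representations. The array shapes, snippet count, and component count are illustrative assumptions, not the values used in the repository.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Stand-in for one layer's latent representations:
# 100 sound snippets, each with a 1024-dimensional latent vector.
latent = rng.normal(size=(100, 1024))

# Reduce each representation to its top 64 principal components,
# mirroring the per-layer dimensionality reduction described above.
pca = PCA(n_components=64)
reduced = pca.fit_transform(latent)

print(reduced.shape)  # (100, 64)
```

In the actual pipeline this would be repeated once per Yamnet layer, with the fitted PCA saved alongside the reduced data so it can be inverted later.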

Processing

Researchers interested in verifying or adjusting the actual analysis can skip the pre-processing and use the pre-processed data stored in this GitHub repository. The main analysis is demonstrated in Process.ipynb. It involves classifying the sounds into materials and actions at each of Yamnet's latent layers, as well as disentangling the latent space using an invertible flow model. The latent representations of sounds are then perturbed, and systematic changes in Yamnet's output are demonstrated as a result. The notebook provides statistical tests and figures for these analyses.
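As a rough illustration of the per-layer classification idea (not the notebook's actual code; the data, labels, layer names, and classifier choice below are all placeholder assumptions), one could fit a separate linear classifier on each layer's reduced representations and compare accuracies across layers:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Placeholder reduced representations for two of Yamnet's layers
# (50 snippets, 64 PCA dimensions each) and placeholder material labels.
layers = {"layer_1": rng.normal(size=(50, 64)),
          "layer_2": rng.normal(size=(50, 64))}
materials = rng.integers(0, 3, size=50)  # e.g. wood / metal / glass

# Fit and score one classifier per layer; the layer at which accuracy
# rises indicates where material information becomes linearly decodable.
for name, X in layers.items():
    acc = cross_val_score(LogisticRegression(max_iter=1000),
                          X, materials, cv=5).mean()
    print(f"{name}: mean accuracy = {acc:.2f}")
```

With the random placeholder data above, accuracy stays near chance; on the real per-layer representations, the analysis compares such scores across layers for both the material and the action labels.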

Contributors

timhenry1995
