Giter Site home page Giter Site logo

raudioanalysis's Introduction

raudioanalysis

Alternate to the pyaudioanalysis python package in R. This R module is an alternaticve to the pyaudioanalysis library in python. It is still in the development stage. Up to this point, feature extraction has been implemented in R. Various fearures have been extracted taking into consideration the various types of categories that audio can be categorized into. The followings features were extracted:

● Zero Crossing Rate

● Energy

● Entropy of Energy

● Spectral Centroid

● Spectral Spread

● Spectral Entropy

● Spectral Flux

● Spectral Rolloff

● MFCCs

● Chroma Vector

We selected the features on the basis of information obtained from existing python packages like Pyaudioanalysis and from academic literature that was available regarding audio classification.

When successive samples of the audio input have different algebraic signs then zero crossing rate occurs. Measure of frequency content of a signal is known as the arte of zero crossings. It is the number of times the amplitude of the audio input passes through the value of zero in a given frame of time, in simple words how frequently it changes its sign from positive to negative or vice versa. Also analysis suggest that high crossing rate implies speech signal is unvoiced and low crossing rate implies signal is voiced.

Entropy of energy is the entropy of sub-frames' normalized energies. It can be interpreted as a measure of abrupt changes. Speech signals appear more unordered as compared to music considering the observations of spectrograms and signals.

Spectral Centroid is used to determine where does the centre of mass of the given spectrum lies. It is closely related to the brightness of a sound. Mathematically it is computed by taking the weighted average of of all the frequencies in the given signal which in turn are computed with the help of fourier transform by using the magnitudes as weights. Sometimes spectral centroid is used in reference with the median of the input because both of them are measures of central tendency. So at times both of them display similar behaviour in some of the situations. Higher centroid values indicate much brighter textures having high rising frequencies. Centroid also gives shape to sharpness of sound which is again related to high frequency of sound.

A technique used for transmitting telecommunication or radio signals is known as Spectral Spread. Higher the value of spectral spread, more distributed the spectrum is on both sides if the centroid whereas lower values implies that the spectrum is highly confined near the centroid.

Spectral Entropy is a quantitative analysis of spectral disorder of a given input audio. Mainly it is used to determine voiced and silence regions of speech. It finds use in speech recognition because of its this discriminatory property. Entropy can also be used to capture peakness and formants of a distribution.

The spectral flux is defined as the squared difference between the normalized magnitudes of successive spectral distributions that correspond to successive signal frames. Flux is an important feature for the separation of music from speech. This feature is specifically used to regulate the timbre of the audio signal.

Spectral Rolloff is referred to as the critical frequency below which eighty five percent of magnitude distribution of the input is concentrated. Similar to that of spectral centroid it is a measure of spectral shape which yields values for frequencies in high ranges. So it can be concluded that there is a strong correlation between spectral centroid and spectral rolloff.

MFCCs are a compact representation of the spectrum of an audio signal taking into account the nonlinear human perception of pitch, as described by the mel scale. The Short Time Fourier Transform (STFT) coefficients of each frame are grouped into a set of 40 coefficients. MFCC is calculated using a set of 40 weighting curves that in turn simulate the frequency perception of the human hearing system. Next step in the process is to take logarithm of the coefficients. Post that decorrelation is done by doing discrete cosine transform (DCT). Normally the five first coefficients are taken as features. Small musical units are detected the combination of chords and MFCCs. Chroma vector refers to a set of twelve pitch classes represented as a vector. Distribution of energy is calculated for every Chroma vector and as a result we get an updated audio signal of twelve dimensional Chroma distribution vector.

The code is divided into various files based on their application. Up to this point, the feature extraction file is complete, while the classification file is in progress.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.