Automated Audio Content Analysis Using Convolutional Deep Neural Networks
River is a prototype built as part of my MSc thesis. It analyses pre-classified audio samples from radio broadcasts, uses them to train a Convolutional Neural Network, and then predicts the classes of further samples from radio broadcasts. The aim of the research is to determine whether Convolutional Neural Networks are an appropriate method of classifying audio.
River uses Google's TensorFlow open-source machine learning library and Librosa to analyse audio signals.
sudo pip install --upgrade virtualenv
virtualenv --system-site-packages ~/tensorflow
pip3 install --upgrade matplotlib
pip3 install --upgrade tensorflow
pip3 install --upgrade librosa
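If the install succeeds, a quick sanity check along these lines (not part of River itself) confirms the three libraries import cleanly inside the virtualenv:

```python
# Sanity check (illustrative, not part of River): confirm the libraries
# installed above import correctly and print their versions.
import tensorflow as tf
import librosa
import matplotlib

print("TensorFlow:", tf.__version__)
print("Librosa:", librosa.__version__)
print("Matplotlib:", matplotlib.__version__)
```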
River analyses audio files in WAVE (.wav) format with filenames like yyyymmdd-aaaa-bb.wav, where yyyymmdd is a date, aaaa is an integer index, and bb is the zero-indexed class number for the audio sample. Only the class number really matters, so at a minimum the files should be in the format n-n-n.wav, with the last n being the class. The files should be split into a training set and a validation set, usually at an 80/20 ratio. This can be done by sampling the files:
To sample at an 80/20 split, use a stride of 5, which will move every 5th file to the target directory. This creates an even distribution of files for the 20% split; the remainder is the 80%.
python3 sample_files.py --src=/path/to/wav/input/files/ --stride=5 --dest=/path/to/wav/output/files/
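For reference, the stride idea can be sketched in a few lines of Python. This is an illustration of the technique only, not River's actual sample_files.py, and the paths are placeholders:

```python
# Illustrative stride-based sampling (not River's actual sample_files.py):
# move every Nth WAV file from src to dest. With stride=5 the dest
# directory receives 20% of the files and the remaining 80% stay in src.
import shutil
from pathlib import Path

def sample_files(src, dest, stride):
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    # Sort the filenames so the selection is deterministic and evenly spaced.
    for i, wav in enumerate(sorted(Path(src).glob("*.wav"))):
        if i % stride == 0:
            shutil.move(str(wav), str(dest / wav.name))

sample_files("/path/to/wav/input/files/", "/path/to/wav/output/files/", stride=5)
```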
Activate the virtual environment:
source ~/tensorflow/bin/activate
Plot the waveform or spectrogram of one or more audio files:
python3 plot.py --type={wave|spec} --file=/path/to/file1.wav --file=/path/to/file2.wav
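plot.py is part of the repository; purely as an illustration of what such a plot involves, a waveform and spectrogram can be drawn with Librosa and Matplotlib roughly like this (a sketch, not River's code):

```python
# Illustrative waveform and spectrogram plots (a sketch, not River's plot.py).
import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np

y, sr = librosa.load("/path/to/file1.wav", sr=None)  # keep the native sample rate

fig, (ax_wave, ax_spec) = plt.subplots(2, 1, figsize=(10, 6))

# Waveform: amplitude against time.
ax_wave.plot(np.arange(len(y)) / sr, y)
ax_wave.set(title="Waveform", xlabel="Time (s)", ylabel="Amplitude")

# Spectrogram: STFT magnitude converted to decibels.
D = librosa.amplitude_to_db(np.abs(librosa.stft(y)), ref=np.max)
img = librosa.display.specshow(D, sr=sr, x_axis="time", y_axis="hz", ax=ax_spec)
ax_spec.set(title="Spectrogram")
fig.colorbar(img, ax=ax_spec, format="%+2.0f dB")

plt.tight_layout()
plt.show()
```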
Extract features from the files found in the train and valid subdirectories. This creates training and validation datasets in the data directory:
python3 extract.py --dir=/path/to/audio
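The features River actually extracts are defined in extract.py; as a sketch of the general approach, per-file features (MFCCs here, which is an assumption) and a label parsed from the filename could be gathered like this:

```python
# Illustrative feature extraction (a sketch; extract.py defines River's
# actual features). Computes mean MFCCs per file and reads the class
# label from the n-n-n.wav filename convention described above.
import numpy as np
import librosa
from pathlib import Path

def extract_features(wav_path, n_mfcc=20):
    y, sr = librosa.load(wav_path, sr=None)
    # Summarise each MFCC coefficient over time with its mean.
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).mean(axis=1)

def label_from_name(wav_path):
    # The last dash-separated field before ".wav" is the class number.
    return int(Path(wav_path).stem.split("-")[-1])

features, labels = [], []
for wav in sorted(Path("/path/to/audio/train").glob("*.wav")):
    features.append(extract_features(wav))
    labels.append(label_from_name(wav))

X, y = np.array(features), np.array(labels)
```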
Use the training and validation datasets from the data directory to train a convolutional neural network. The model is saved in the model directory.
epochs - number of training iterations
batch - number of examples to supply in each training epoch
sample - how often the current cost is returned to the console
python3 train.py --epochs=2000 --batch=50 --sample=10
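train.py defines River's actual network; for orientation only, a small convolutional classifier over spectrogram-like inputs could be expressed with TensorFlow's Keras API along these lines (input shape, layer sizes, and class count are all assumptions):

```python
# Illustrative CNN (a sketch; River's real model lives in train.py).
import tensorflow as tf

NUM_CLASSES = 4              # assumption: set to the real number of classes
INPUT_SHAPE = (128, 128, 1)  # assumption: a spectrogram patch per example

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, (3, 3), activation="relu", input_shape=INPUT_SHAPE),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Conv2D(64, (3, 3), activation="relu"),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# x_train/y_train and x_valid/y_valid would come from the datasets in data/:
# model.fit(x_train, y_train, epochs=2000, batch_size=50,
#           validation_data=(x_valid, y_valid))
```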
Logs created during training are written to the log subdirectory; these can be visualised with TensorBoard.
Run TensorBoard:
tensorboard --logdir=log/
View results at http://localhost:6006/
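As a reference for how such logs come into being, scalar summaries of the kind TensorBoard plots can be written with TensorFlow's summary API; the sketch below uses the TF 2.x API and placeholder values, which may not match what train.py does:

```python
# Illustrative TF 2.x summary writing (train.py produces River's actual
# logs). TensorBoard reads the resulting event files from log/.
import tensorflow as tf

writer = tf.summary.create_file_writer("log/")
with writer.as_default():
    for step in range(100):
        cost = 1.0 / (step + 1)  # placeholder value for illustration
        tf.summary.scalar("cost", cost, step=step)
writer.flush()
```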
To use the model built during the training process to classify audio samples:
python3 audition.py --file=/path/to/filename.wav
...which should yield a class prediction with a confidence score. The filename here does not have to follow any particular format, as the file is only encoded as features and does not need a label.
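audition.py implements the real pipeline; conceptually, classification amounts to extracting the same features used at training time and running them through the saved model, roughly like this (the model path, format, and feature choice are assumptions):

```python
# Illustrative classification (a sketch; audition.py is River's real
# implementation). Assumes a Keras-format model under model/ and the
# same MFCC features as the extraction sketch above.
import numpy as np
import librosa
import tensorflow as tf

def extract_features(wav_path, n_mfcc=20):
    # Must match the feature extraction used at training time.
    y, sr = librosa.load(wav_path, sr=None)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).mean(axis=1)

model = tf.keras.models.load_model("model/river.h5")  # hypothetical path
features = extract_features("/path/to/filename.wav")
probs = model.predict(features[np.newaxis, :])[0]      # a batch of one

predicted = int(np.argmax(probs))
print(f"class {predicted} (confidence {probs[predicted]:.2f})")
```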
When finished, deactivate the virtual environment:
deactivate
To suppress TensorFlow logging, use tf.sh to run the Python code.
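tf.sh ships with the repository and its exact contents are not shown here; the underlying mechanism is TensorFlow's TF_CPP_MIN_LOG_LEVEL environment variable, which can equally be set from Python before the import:

```python
# Suppress TensorFlow's C++ log output by setting TF_CPP_MIN_LOG_LEVEL
# before importing tensorflow (presumably what tf.sh wraps).
# 0 = all messages, 1 = hide INFO, 2 = hide INFO and WARNING, 3 = errors only.
import os
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "2"

import tensorflow as tf  # must be imported after the variable is set
```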