Updated June 2019

I have abandoned my previous idea for now: it is too intractable to pursue before I learn more about machine learning.
Current objective: creating audio samples by training GANs to synthesize spectrograms that can be converted to sound.

Objectives

Create a small dataset of focused audio samples (for example snare drum sounds). I have been told that creating these kinds of audio samples is not really worth pursuing, as any drum sound can be trivialised to a simple knocking sound. But I would like to think that expert listeners, sound engineers, sound designers, musicians and music producers would disagree. Certain sounds have a lot of inherent quality (the low rumble of a kick drum, for example) that I don't think has quite been achieved yet, and that is worth exploring at the moment.
Train a GAN on the spectrograms of these audio samples. The spectrograms need a good resolution for the conversion back to audio to give good results.
Find a good technique for the reverse transform from spectrogram to audio. There are many techniques at the moment, and which one will be used is still up for consideration (Griffin-Lim, Deep Griffin-Lim Iteration, or a GAN-based approach).
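To make the last two objectives concrete, here is a minimal sketch of the Griffin-Lim idea: keep the known magnitudes, start from random phase, and alternate between going back to audio and re-estimating the phase from that audio. This is only an illustration under assumptions that are not part of this project: a tiny frame size (16) and hop (4), a synthetic test tone instead of a drum sample, and a slow pure-Python DFT so that the block has no dependencies. All function names here are made up for the example; a real pipeline would use an FFT library.

```python
import cmath
import math
import random

def hann(n):
    # Periodic Hann window of length n.
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]

def dft(x):
    # Naive O(n^2) discrete Fourier transform (fine for tiny frames).
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft_real(spec):
    # Inverse DFT, keeping only the real part of each sample.
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]

def stft(signal, n_fft, hop):
    # Windowed frames -> list of complex spectra (one per frame).
    window = hann(n_fft)
    return [dft([signal[s + i] * window[i] for i in range(n_fft)])
            for s in range(0, len(signal) - n_fft + 1, hop)]

def istft(frames, n_fft, hop, length):
    # Least-squares inverse: windowed overlap-add, normalized by summed window energy.
    window = hann(n_fft)
    out = [0.0] * length
    norm = [0.0] * length
    for j, frame in enumerate(frames):
        samples = idft_real(frame)
        for i in range(n_fft):
            out[j * hop + i] += samples[i] * window[i]
            norm[j * hop + i] += window[i] ** 2
    return [o / w if w > 1e-8 else 0.0 for o, w in zip(out, norm)]

def magnitude(frames):
    return [[abs(c) for c in frame] for frame in frames]

def griffin_lim(mag, n_fft, hop, length, n_iter=30, seed=0):
    # Start from the target magnitudes with random phase.
    rng = random.Random(seed)
    spec = [[m * cmath.exp(1j * rng.uniform(-math.pi, math.pi)) for m in frame]
            for frame in mag]
    for _ in range(n_iter):
        # Go to audio and back, then keep the new phase but restore the target magnitude.
        estimate = stft(istft(spec, n_fft, hop, length), n_fft, hop)
        spec = [[m * (e / abs(e) if abs(e) > 1e-12 else 1.0) for m, e in zip(mf, ef)]
                for mf, ef in zip(mag, estimate)]
    return istft(spec, n_fft, hop, length)

def spectral_error(audio, mag, n_fft, hop):
    # Squared distance between the audio's magnitude spectrogram and the target.
    est = magnitude(stft(audio, n_fft, hop))
    return sum((a - b) ** 2 for fa, fb in zip(est, mag) for a, b in zip(fa, fb))

# Assumed toy setup: a short sine tone stands in for a real audio sample.
n_fft, hop, length = 16, 4, 64
tone = [math.sin(2 * math.pi * 3 * t / 16) for t in range(length)]
target = magnitude(stft(tone, n_fft, hop))
rebuilt = griffin_lim(target, n_fft, hop, length)
```

The frame size and hop are also where the "resolution" question shows up: longer frames give finer frequency resolution but coarser time resolution, which matters for short, transient sounds like drum hits.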

Master Thesis

Generate musical content from text or video
This repo will be: a storage space for all relevant material, links and research papers that I find; a "diary" for my ideas and thought process; as well as a notebook to report my research advancement, progress and new things that I learn and discover along the way.

Ideas to explore:

Idea 5/2/2019: Create a musical concept graph? We could create a graph that represents the emotional content of the text or video
Idea 5/3/2019: I need to have some input data set to train on. Maybe I can create some program (like the video to music program that I am making), to create music in some structured way. If I could generate and produce a bunch of pieces, that I will still revise later on, then I could create a meaningful data set to train on with an RNN or Wavenet. Could this be useful? Maybe. Probably not.
Idea 5/7/2019: What my research basically boils down to is creating a musical sequence from other types of sequences, such as text or video, such that the input is translated in a meaningful way. Why is this relevant? Because not a lot of research has gone into this yet, and it could be a useful media application for musicians and those who work in fields tangentially related to music.
5/7/2019: Assume we create a model that is capable of generating a sequence of musical notes from some arbitrary input sequence. How would we train it and evaluate its results? One way could be crowdsourcing: have the results dynamically generated on a webpage and let people evaluate them, since after all the human ear is still the best evaluator. For example, we give the evaluator a set of tags (such as "happy" or "sad") to classify the output of the model, or let them enter a value between 0 and 10 to rate how well the model did. I still have to think about how to implement these specifics.
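As a starting point for the crowdsourced evaluation idea, here is a rough sketch of how the collected responses could be aggregated per generated sample. The response format (a dict with `sample_id`, `score` and `tag`), the function name and the sample IDs are all assumptions made up for this example; nothing here is implemented yet.

```python
from collections import Counter
from statistics import mean

def summarize_ratings(responses):
    """Group listener responses by sample and compute a mean score and the most common tag."""
    by_sample = {}
    for response in responses:
        if not 0 <= response["score"] <= 10:
            raise ValueError("score must be between 0 and 10")
        by_sample.setdefault(response["sample_id"], []).append(response)

    summary = {}
    for sample_id, group in by_sample.items():
        tags = Counter(r["tag"] for r in group)
        summary[sample_id] = {
            "mean_score": mean(r["score"] for r in group),
            "top_tag": tags.most_common(1)[0][0],
            "n_responses": len(group),
        }
    return summary

# Hypothetical responses from three listeners.
responses = [
    {"sample_id": "piece_01", "score": 8, "tag": "happy"},
    {"sample_id": "piece_01", "score": 6, "tag": "happy"},
    {"sample_id": "piece_02", "score": 2, "tag": "sad"},
]
summary = summarize_ratings(responses)
```

A mean score is the simplest choice; with few listeners per piece, a median or a minimum number of responses per sample might be more robust.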

Tools:

TensorFlow: crucial package for creating NN models
PyTorch: a replacement for NumPy that uses the power of GPUs. Install from here, check whether you have a CUDA-enabled GPU here, and here is a tutorial on how to use PyTorch for deep learning
TensorBoard: visualizes the training process, a nice feature of TensorFlow
JupyterLab
Theano is a Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently.

Plotting Libraries:

This article presents a couple

Audio Datasets that aren't speech:

NSynth Dataset: audio sample dataset of musical notes, an order of magnitude larger than comparable public datasets (300k samples)
Macaulay Library: library that has 513,285 animal call recordings + labeled spreadsheets for all of these recordings
Splice can easily be used to create a small dataset of drum sounds, in addition to the free sample packs on LANDR and CYMATICS:
1. Splice needs a subscription, but after that you can use the samples for whatever purpose
2. LANDR has lots of free sample packs
3. CYMATICS has more free sample packs

Concepts:

Mu-Law Quantization: helps reduce the dynamic range of a waveform before it is quantized
Companding Transformation
Cross Entropy Explained
Difference Between Entropy and Cross-Entropy
Best Video for understanding the Fourier Transform
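The mu-law quantization and companding entries above can be sketched in a few lines. This is a minimal illustration, assuming samples normalized to [-1, 1] and mu = 255 (256 levels, as commonly used for 8-bit audio); the function names are my own and not from any library.

```python
import math

def mu_law_encode(x, mu=255):
    """Compress a sample in [-1, 1] and quantize it to an integer in [0, mu]."""
    x = max(-1.0, min(1.0, x))  # clip to the valid range
    compressed = math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)
    return int(math.floor((compressed + 1) / 2 * mu + 0.5))

def mu_law_decode(q, mu=255):
    """Map an integer in [0, mu] back to an approximate sample in [-1, 1]."""
    compressed = 2 * q / mu - 1
    return math.copysign(math.expm1(abs(compressed) * math.log1p(mu)) / mu, compressed)

# Round trip of a single sample: the result is close to, but not exactly, 0.5.
restored = mu_law_decode(mu_law_encode(0.5))
```

The logarithm spends more of the available levels on quiet samples than a linear quantizer would, which is why the dynamic range can be squeezed into so few bits without the quiet parts turning into noise.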

ahmadmoussa / thesis-proposal

