- I abandoned my previous idea because it is too intractable for now, at least until I learn more about machine learning.
- Current objective: create audio samples by training GANs to synthesize spectrograms that can be converted to sound.
- Create a small dataset of focused audio samples (for example, snare drum sounds). I have been told that creating these kinds of audio samples is not really worth pursuing, as any drum sound can be trivialised as a simple knocking sound. But I would like to think that expert listeners, sound engineers, sound designers, musicians and music producers would disagree. Certain sounds have a lot of inherent quality (the low rumble of a kick drum, for example) that I don't think has quite been achieved yet, and that is worth exploring at the moment.
- Train a GAN on the spectrograms of these audio samples. The spectrograms will need a high enough resolution for the conversion back to audio to give good results.
- Find a good technique for the inverse transform from spectrogram to audio. Several techniques exist, and which one to use is still undecided (Griffin-Lim, Deep Griffin-Lim Iteration, or a GAN-based approach).
- Generate musical content from text or video
- This repo will be: a storage space for all relevant material, links and research papers that I find; a "diary" for my ideas and thought process; as well as a notebook to report my research advancement, progress and new things that I learn and discover along the way.
- Idea 5/2/2019: Create a musical concept graph? We could create a graph that represents the emotional content of the text or video
- Idea 5/3/2019: I need to have some input data set to train on. Maybe I can create some program (like the video to music program that I am making), to create music in some structured way. If I could generate and produce a bunch of pieces, that I will still revise later on, then I could create a meaningful data set to train on with an RNN or Wavenet. Could this be useful? Maybe. Probably not.
- Idea 5/7/2019: What my research boils down to is creating a musical sequence from other types of sequences, such as text or video, so that the input is translated in a meaningful way. Why is this relevant? Because not much research has gone into this yet, and it could be a useful media application for musicians and for those who work in a field tangentially related to music.
- 5/7/2019: Assume we create a model that is capable of generating a sequence of musical notes from some arbitrary input sequence. How would we train it and evaluate its results? One way could be crowdsourcing: have the results dynamically generated on a webpage and let people evaluate them, since after all the human ear is still the best evaluator. For example, we could give the evaluator a set of tags (such as "happy" or "sad") to classify the output of the model, or let them enter a value between 0 and 10 to rate how well the model did. I still have to think about how to implement these specifics.
- TensorFlow: crucial package for creating NN models
- PyTorch: a replacement for NumPy that uses the power of GPUs. Install from here, check whether you have a CUDA-enabled GPU here, and here is a tutorial on how to use PyTorch for deep learning
- TensorBoard: visualizes the training process, a nice feature of TensorFlow
- Theano: a Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently
- NSynth Dataset, audio sample dataset of musical notes, an order of magnitude larger than comparable public datasets (300k samples)
- Macaulay Library, library that has 513,285 animal call recordings plus labeled spreadsheets for all of these recordings
- Splice can easily be used to create a small dataset for drum sounds in addition to free sample packs on LANDR and CYMATICS
- Mu-Law Quantization, helps reduce the dynamic range of a waveform
- Companding Transformation
- Cross Entropy Explained
- Difference Between Entropy and Cross-Entropy
- Best Video for understanding the Fourier Transform
- Kullback-Leibler Divergence Explained in Detail, Count Bayesie
- Kullback-Leibler divergence overview, Medium article
- Video explaining information entropy
- Course on information entropy MIT
- This guy has a very interesting blog
- Why use the log probability for gradient descent instead of just the probability, very interesting!
- Thought Vectors Explained
- Skip or residual connections
- Conditioning and FiLM conditioning
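
To make the mu-law quantization entry above more concrete, here is a minimal sketch of the idea, assuming samples normalized to [-1, 1] and the common choice of mu = 255 (256 levels). The function names `mu_law_encode` and `mu_law_decode` are my own, not taken from any library. The logarithmic companding curve spends more quantization levels on quiet samples, which is how it reduces the effective dynamic range.

```python
import math


def mu_law_encode(x, mu=255):
    """Compand a sample in [-1, 1] and quantize it to an integer in [0, mu]."""
    x = max(-1.0, min(1.0, x))  # clip to the valid input range
    # Logarithmic compression: sign(x) * ln(1 + mu*|x|) / ln(1 + mu)
    compressed = math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)
    # Map [-1, 1] to [0, mu] and round to the nearest integer level
    return int(math.floor((compressed + 1) / 2 * mu + 0.5))


def mu_law_decode(q, mu=255):
    """Map an integer level in [0, mu] back to an approximate sample in [-1, 1]."""
    compressed = 2 * q / mu - 1
    # Inverse of the compression curve: sign(y) * ((1 + mu)^|y| - 1) / mu
    return math.copysign(math.expm1(abs(compressed) * math.log1p(mu)) / mu, compressed)


if __name__ == "__main__":
    # Quiet samples are reconstructed with a much smaller absolute error than loud ones.
    for sample in (0.001, 0.01, 0.1, 0.5, 0.9):
        level = mu_law_encode(sample)
        restored = mu_law_decode(level)
        print(f"x={sample:<6} level={level:<4} restored={restored:.5f}")
```

The round trip is lossy by design: the reconstruction error grows with the amplitude of the sample, which tends to be less audible than a uniform error of the same size.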