thesis-proposal's Introduction

Updated June 2019

  • I abandoned my previous idea because it is too intractable for now, at least until I learn more about machine learning.
  • Current objective: creating audio samples by training GANs to synthesize spectrograms that can be converted to sound.

Objectives

  • Create a small dataset of focused audio samples (for example, snare drum sounds). I have been told that generating these kinds of audio samples is not really worth pursuing, since any drum sound can be compared and trivialised to a simple knocking sound. But I would like to think that expert listeners, sound engineers, sound designers, musicians and music producers would disagree. Certain sounds have a lot of inherent quality (the low rumble of a kick drum, for example) that I don't think has quite been achieved yet and that is worth exploring.
  • Train a GAN on the spectrograms of these audio samples. The spectrograms need a sufficiently high resolution for the conversion back to audio to give good results.
  • Find a good technique for the inverse transform from spectrogram to audio. Several techniques exist, and which one will be used is still under consideration (Griffin-Lim, Deep Griffin-Lim Iteration, a GAN-based approach).
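
A minimal sketch of the plain Griffin-Lim option mentioned in the last objective, assuming NumPy is available. The frame length, hop size, iteration count and the decaying test tone are illustrative assumptions, not choices made in this proposal, and the helper names (`stft`, `istft`, `griffin_lim`, `spectral_error`) are hypothetical. It takes only a magnitude spectrogram, starts from random phase, and alternates between resynthesis and re-analysis to find a phase that is consistent with the magnitudes.

```python
import numpy as np

N_FFT = 512                           # assumed frame length
HOP = 128                             # assumed hop size (75% overlap)
WINDOW = np.hanning(N_FFT + 1)[:-1]   # periodic Hann window


def stft(signal):
    """Complex STFT with shape (n_frames, N_FFT // 2 + 1)."""
    n_frames = 1 + (len(signal) - N_FFT) // HOP
    frames = np.stack(
        [signal[i * HOP:i * HOP + N_FFT] * WINDOW for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=1)


def istft(spec, length):
    """Least-squares inverse STFT via windowed overlap-add."""
    frames = np.fft.irfft(spec, n=N_FFT, axis=1) * WINDOW
    out = np.zeros(length)
    norm = np.zeros(length)
    for i, frame in enumerate(frames):
        out[i * HOP:i * HOP + N_FFT] += frame
        norm[i * HOP:i * HOP + N_FFT] += WINDOW ** 2
    # Samples that no window covers keep the value 0.
    return out / np.maximum(norm, 1e-10)


def griffin_lim(magnitude, length, n_iter=50, seed=0):
    """Estimate audio whose STFT magnitude approximates `magnitude`."""
    rng = np.random.default_rng(seed)
    phase = np.exp(2j * np.pi * rng.random(magnitude.shape))  # random start
    for _ in range(n_iter):
        audio = istft(magnitude * phase, length)
        # Keep the target magnitudes, adopt the phase of the re-analysis.
        phase = np.exp(1j * np.angle(stft(audio)))
    return istft(magnitude * phase, length)


def spectral_error(audio, magnitude):
    """Relative distance between the STFT magnitude of `audio` and the target."""
    diff = np.abs(stft(audio)) - magnitude
    return np.linalg.norm(diff) / np.linalg.norm(magnitude)


# Hypothetical stand-in for a drum hit: a decaying 220 Hz tone, 0.25 s at 16 kHz.
t = np.arange(4000) / 16000.0
tone = np.exp(-12.0 * t) * np.sin(2 * np.pi * 220.0 * t)
target = np.abs(stft(tone))
recon = griffin_lim(target, len(tone))
print(spectral_error(recon, target))
```

In the intended pipeline, `target` would come from the GAN output instead of from a real recording. The first and last few samples are poorly conditioned in this sketch because the signal is not padded, so a real implementation would pad the signal or use an existing routine such as `librosa.griffinlim`.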

Master Thesis

  • Generate musical content from text or video
  • This repo will be: a storage space for all relevant material, links and research papers that I find; a "diary" for my ideas and thought process; as well as a notebook to report my research advancement, progress and new things that I learn and discover along the way.

Ideas to explore:

  • Idea 5/2/2019: Create a musical concept graph? We could create a graph that represents the emotional content of the text or video
  • Idea 5/3/2019: I need to have some input data set to train on. Maybe I can create some program (like the video to music program that I am making), to create music in some structured way. If I could generate and produce a bunch of pieces, that I will still revise later on, then I could create a meaningful data set to train on with an RNN or Wavenet. Could this be useful? Maybe. Probably not.
  • Idea 5/7/2019: So basically what my research boils down to is creating a musical sequence from other types of sequences, such as text or video, so that the input is translated in a meaningful way. Why is this relevant? Because not a lot of research has gone into this yet, and it could be a useful media application for musicians and for those who work in a field tangentially related to music.
  • 5/7/2019: Assume we create a model that is capable of generating a sequence of musical notes from some arbitrary input sequence. How would we train it and evaluate its results? One way of doing this could be crowdsourcing: have the results dynamically generated on a webpage and let people evaluate them, since after all the human ear is still the best evaluator. For example, we could give the evaluator a set of tags (such as "happy" or "sad") to classify the output of the model, or let them enter a value between 0 and 10 to rate how well the model did. I still have to think about how to implement these specifics.
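
As a rough illustration of the crowdsourced evaluation idea above, the sketch below aggregates listener responses per generated clip. The response schema (`clip`, `tag`, `score`) and the function name `summarize_ratings` are assumptions made for this example; the proposal does not fix any of these details yet.

```python
from collections import Counter
from statistics import mean


def summarize_ratings(responses):
    """Aggregate listener responses per clip.

    `responses` is a list of dicts with the hypothetical keys
    'clip' (identifier), 'tag' (e.g. "happy") and 'score' (0 to 10).
    """
    by_clip = {}
    for response in responses:
        if not 0 <= response["score"] <= 10:
            raise ValueError("score must be between 0 and 10")
        by_clip.setdefault(response["clip"], []).append(response)

    summary = {}
    for clip, rows in by_clip.items():
        tags = Counter(row["tag"] for row in rows)
        summary[clip] = {
            "mean_score": mean(row["score"] for row in rows),
            "top_tag": tags.most_common(1)[0][0],  # most frequent tag
            "n_ratings": len(rows),
        }
    return summary


responses = [
    {"clip": "clip_a", "tag": "happy", "score": 8},
    {"clip": "clip_a", "tag": "happy", "score": 6},
    {"clip": "clip_a", "tag": "sad", "score": 4},
    {"clip": "clip_b", "tag": "sad", "score": 9},
]
print(summarize_ratings(responses))
```

A mean score and a majority tag are only the simplest possible summary; with few raters per clip it would also be worth looking at how much the raters disagree.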

Tools:

  • TensorFlow: crucial package for creating NN models

  • PyTorch: a replacement for NumPy that uses the power of GPUs. Install from here and check if you have a CUDA-enabled GPU here; here is a tutorial on how to use PyTorch with deep learning

  • TensorBoard: visualizes the training process, a nice feature of TensorFlow

  • JupyterLab

  • Theano is a Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently.
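
For the CUDA check mentioned in the PyTorch entry, a small sketch along these lines can be used once PyTorch is installed. The function name `describe_device` and its return strings are made up for this example; `torch.cuda.is_available()` is the actual PyTorch call.

```python
import importlib.util


def describe_device():
    """Report whether PyTorch is installed and whether it can see a CUDA GPU."""
    if importlib.util.find_spec("torch") is None:
        return "torch not installed"
    import torch  # imported lazily so the check also runs without PyTorch

    return "cuda" if torch.cuda.is_available() else "cpu"


print(describe_device())
```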

Plotting Libraries:

Audio Datasets that aren't speech:

  • NSynth Dataset, an audio sample dataset of musical notes, an order of magnitude larger than comparable public datasets (about 300k samples)
  • Macaulay Library, a library that has 513,285 animal call recordings plus labeled spreadsheets for all of these recordings
  • Splice can easily be used to create a small dataset for drum sounds in addition to free sample packs on LANDR and CYMATICS
    1. Splice needs a subscription, but after that you can use the samples for whatever purpose
    2. LANDR lots of free sample packs
    3. CYMATICS more free sample packs

Concepts:

Probability Concepts for machine learning:

Auto-encoders:

GANs:

Other relevant material:

Optimization techniques:

Convolutional networks:

Recurrent Neural Networks

Resources:
