jaedukseo / wav2rec

This project forked from tariqahassan/wav2rec

Self-supervised neural network for music recommendations.

Home Page: https://TariqAHassan.github.io/wav2rec/

License: MIT License

wav2rec's Introduction



Overview

Wav2Rec is a library for music recommendation based on recent advances in self-supervised neural networks.

Installation

pip install git+https://github.com/TariqAHassan/wav2rec@main

Requires Python 3.7+

How it Works

Wav2Rec is built on recently developed techniques for self-supervised learning, whereby rich representations can be learned from data without explicit labels. In particular, Wav2Rec leverages the simple siamese (or SimSiam) neural network architecture proposed by Chen and He (2020), which is trained with the objective of maximizing the similarity between two augmentations of the same image.
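
As a rough illustration of that objective, the sketch below computes the symmetric negative cosine similarity described by Chen and He (2020) using NumPy. The function names are hypothetical and are not part of Wav2Rec. NumPy has no automatic differentiation, so the stop-gradient that SimSiam applies to the encoder outputs during training is only noted in a comment.

```python
import numpy as np


def negative_cosine(p, z):
    """Mean negative cosine similarity between matching rows of p and z."""
    p = p / np.linalg.norm(p, axis=1, keepdims=True)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    return -np.mean(np.sum(p * z, axis=1))


def simsiam_loss(p1, z1, p2, z2):
    """Symmetric SimSiam-style loss for two augmented views.

    p1, p2: predictor outputs for view 1 and view 2.
    z1, z2: encoder (projection) outputs for view 1 and view 2.
    In a real training loop z1 and z2 would be treated as constants
    (stop-gradient); that step is not representable in plain NumPy.
    """
    return 0.5 * negative_cosine(p1, z2) + 0.5 * negative_cosine(p2, z1)


rng = np.random.default_rng(0)
view = rng.normal(size=(4, 8))          # stand-in for a batch of embeddings
loss_identical = simsiam_loss(view, view, view, view)
```

The loss is bounded between -1 and 1, and reaches -1 when the two views map to embeddings that point in the same direction.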

To adapt SimSiam to work with audio, Wav2Rec introduces two modifications. First, raw audio waveforms are converted into (mel)spectrograms, which can be seen as a form of image. This adaptation allows the use of standard image model encoders, such as ResNet50 or Vision Transformer (see audionets.py). Second, while spectrograms can be seen as a form of image, their statistical properties are quite different from those found in natural images. For instance, because spectrograms have a temporal structure, flipping along the temporal dimension is not a coherent augmentation to perform. Thus, only augmentations which respect the unique statistical properties of spectrograms have been used (see transforms.py).
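
To make the second point concrete, here is a minimal sketch of two augmentations that leave the time axis in its original order: a random contiguous crop in time and a random mask over mel bands. These are illustrative assumptions only. The function names are hypothetical, and the augmentations actually used by Wav2Rec are the ones defined in transforms.py, which may differ.

```python
import numpy as np


def random_time_crop(spec, width, rng):
    """Return a contiguous window of `width` frames; temporal order is kept."""
    n_frames = spec.shape[1]
    start = rng.integers(0, n_frames - width + 1)
    return spec[:, start:start + width]


def frequency_mask(spec, max_bands, rng):
    """Zero out a random contiguous block of at most `max_bands` mel bands."""
    out = spec.copy()
    bands = rng.integers(0, max_bands + 1)
    start = rng.integers(0, spec.shape[0] - bands + 1)
    out[start:start + bands, :] = 0.0
    return out


def two_views(spec, width, max_bands, rng):
    """Build two independently augmented views of one spectrogram."""
    return tuple(
        frequency_mask(random_time_crop(spec, width, rng), max_bands, rng)
        for _ in range(2)
    )


rng = np.random.default_rng(0)
spec = rng.random((64, 100))            # stand-in: 64 mel bands x 100 frames
view_a, view_b = two_views(spec, width=50, max_bands=8, rng=rng)
```

Neither function reverses or shuffles frames, so both views remain plausible excerpts of the same recording.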

Once trained, music recommendation is simply a matter of performing nearest neighbour search on the projections obtained from the model.
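
The search step can be sketched as a brute-force cosine-similarity lookup over a matrix of projections. This is an assumption-laden illustration, not Wav2Rec's implementation: `nearest_neighbours` is a hypothetical helper, and the toy 2D vectors stand in for real model projections.

```python
import numpy as np


def nearest_neighbours(query, projections, n=3):
    """Return (similarities, indices) of the n rows most similar to query."""
    q = query / np.linalg.norm(query)
    p = projections / np.linalg.norm(projections, axis=1, keepdims=True)
    sims = p @ q                        # cosine similarity to every row
    order = np.argsort(-sims)[:n]       # indices sorted by descending similarity
    return sims[order], order


# Toy projections: one row per track in the library.
library = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]])
query = np.array([1.0, 0.1])
sims, idx = nearest_neighbours(query, library, n=3)
```

For large libraries an approximate nearest neighbour index would usually replace the full matrix product, at the cost of exactness.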

Quick Start

Training

The Wav2RecNet model, which underlies Wav2Rec() (below), can be trained using any audio dataset. For an example of training the model using the FMA dataset see experiments/fma/train.ipynb.

Inference

The Wav2Rec() class along with a Wav2RecDataset() dataset can be used to generate recommendations of similar music.

from pathlib import Path
from wav2rec import Wav2Rec, Wav2RecDataset

MUSIC_PATH = Path("music")  # directory of music
MODEL_PATH = Path("checkpoints/my_trained_model.ckpt")  # trained model

my_dataset = Wav2RecDataset(MUSIC_PATH, ext="mp3").scan()

model = Wav2Rec(MODEL_PATH)
model.fit(my_dataset)

Once fit, we can load a sample piece of audio

waveform = my_dataset.load_audio(Path("my_song.mp3"))

and get some recommendations for similar music.

metrics, paths = model.recommend(waveform, n=3)

Above, metrics is a 2D array which stores the similarity metrics (cosine similarity by default) between waveform and each recommendation. The paths object is also a 2D array, but it contains the paths to the recommended music files.
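
Because both results are 2D, it can be convenient to pair each score with its path. The helper below is a hypothetical sketch, not part of Wav2Rec. It assumes each row corresponds to one query and that the two arrays are aligned element by element. The values shown are placeholders, not real model output.

```python
from pathlib import Path


def pair_recommendations(metrics, paths):
    """Pair each similarity score with its file path, one list per query row."""
    return [
        [(Path(p), float(m)) for m, p in zip(metric_row, path_row)]
        for metric_row, path_row in zip(metrics, paths)
    ]


# Placeholder values standing in for the output of model.recommend(waveform, n=3).
example_metrics = [[0.91, 0.88, 0.85]]
example_paths = [["music/a.mp3", "music/b.mp3", "music/c.mp3"]]

pairs = pair_recommendations(example_metrics, example_paths)
for path, score in pairs[0]:
    print(f"{score:.2f}  {path}")
```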

Note: To get an intuition for the representations that will underlie these recommendations, check out experiments/fma/inference.ipynb.

Documentation

Documentation can be found at https://TariqAHassan.github.io/wav2rec/.

References

Papers

@misc{grill2020bootstrap,
    title = {Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning},
    author = {Jean-Bastien Grill and Florian Strub and Florent Altché and Corentin Tallec and Pierre H. Richemond and Elena Buchatskaya and Carl Doersch and Bernardo Avila Pires and Zhaohan Daniel Guo and Mohammad Gheshlaghi Azar and Bilal Piot and Koray Kavukcuoglu and Rémi Munos and Michal Valko},
    year = {2020},
    eprint = {2006.07733},
    archivePrefix = {arXiv},
    primaryClass = {cs.LG}
}
@misc{chen2020exploring,
    title = {Exploring Simple Siamese Representation Learning},
    author = {Xinlei Chen and Kaiming He},
    year = {2020},
    eprint = {2011.10566},
    archivePrefix = {arXiv},
    primaryClass = {cs.CV}
}

Software

Research Ideas

There are a lot of interesting ways self-supervised learning could be used with music data. Below, I have listed a few ideas that may be worth exploring.

  • As part of preprocessing, use a model to remove or dampen vocals. This would likely have the effect of shifting recommendation away from any given artist and towards the genre the artist works in.
  • As part of preprocessing, use a model to isolate the vocals. This would likely have the opposite effect of the change above, pushing the model towards artist matching and away from genre recommendations.
  • Use this technique to construct feature vectors which can be used to condition a music GAN (see Nistal, Lattner & Richard (2021) for similar ideas).
