
songeraclassifier's Introduction

As part of CS224N here at Stanford, I began learning about the many uses of deep learning in natural language processing. For the course project, I decided to try to classify music genre using lyrics alone, which has typically been a tough problem in the music information retrieval (MIR) field. By the end of the course I was so invested in the project that I kept working on it, and I eventually published the research at ISMIR 2017, held in Suzhou, China.

To begin the project, I took inspiration from the paper by Yang et al., which uses a Hierarchical Attention Network (HAN) to classify documents. Like documents, lyrics have a hierarchical structure: words form lines, lines form sections (verse, chorus, ...), and sections form the whole song. The attention mechanism also lets us extract and visualise where the network is applying its weights.
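Before any model sees the lyrics, they have to be split into this word/line/section hierarchy. The sketch below shows one way to do that, assuming (our assumption, not necessarily how the LyricFind data is marked up) that sections are separated by blank lines, that lines are newline-delimited, and that words are whitespace-delimited after lower-casing. The function name `split_lyrics` is hypothetical, and punctuation handling is deliberately omitted.

```python
from typing import List


def split_lyrics(lyrics: str) -> List[List[List[str]]]:
    """Split raw lyrics into sections -> lines -> words.

    Assumption: one or more blank lines end a section; real lyric data may
    mark sections differently (e.g. with [Chorus] tags).
    """
    song = []
    section = []
    for raw_line in lyrics.splitlines():
        words = raw_line.lower().split()  # naive whitespace tokenizer
        if words:
            section.append(words)
        elif section:  # a blank line closes the current section
            song.append(section)
            section = []
    if section:  # flush the final section
        song.append(section)
    return song


happy_birthday = """Happy birthday to you
Happy birthday to you

Happy birthday dear friend
Happy birthday to you"""

song = split_lyrics(happy_birthday)
print(len(song))     # → 2
print(song[1][0])    # → ['happy', 'birthday', 'dear', 'friend']
```

Each nesting level of the returned list corresponds to one layer of the HAN, so the same structure can be fed bottom-up through the word, line and section encoders.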

Using intact lyrics, each song can be split into these layers. At each layer we apply a bidirectional recurrent neural network (RNN) to obtain hidden state representations. The attention mechanism then forms a weighted sum of that layer's hidden representations; for example, a weighted sum of the word hidden representation vectors forms the line vector. This vector is passed up to the next layer, and the process repeats until we end up with a single vector summarising the whole song, which we classify with a softmax layer.
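Written out for a single layer, the attention step follows the formulation in Yang et al. Here h_t is the bidirectional RNN hidden state for the t-th of the layer's T inputs, W and b are learned weights, c is a learned context vector, and v is the final song vector. The symbol names are ours, and the exact parameterisation in this repo may differ:

```latex
\begin{aligned}
u_t      &= \tanh\left(W h_t + b\right), \\
\alpha_t &= \frac{\exp\left(u_t^\top c\right)}{\sum_{t'=1}^{T} \exp\left(u_{t'}^\top c\right)}, \\
s        &= \sum_{t=1}^{T} \alpha_t \, h_t, \\
p        &= \operatorname{softmax}\left(W_c v + b_c\right).
\end{aligned}
```

The weights alpha_t sum to one, so s is a convex combination of the hidden states; s becomes an input to the layer above, and the alpha_t values are what the attention visualisations display.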

The structure of the network is shown below for the example song 'Happy Birthday'.

Figure: HAN architecture
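To make the bottom-up flow concrete, here is a toy forward pass in plain Python. It is a sketch under heavy assumptions, not this repo's TensorFlow model: the bidirectional RNN is omitted (word embeddings are used directly as if they were hidden states), all embeddings and weights are random and untrained, the dimension is tiny, and the genre list and helper names (`attention_pool`, `classify`, etc.) are hypothetical. It only shows attention pooling being applied at the word, line and section levels, followed by a softmax over genres.

```python
import math
import random


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(vectors, W, b, context):
    """Score each vector via tanh(W h + b) . context, then return the softmax-weighted sum."""
    scores = []
    for h in vectors:
        u = [math.tanh(sum(w_ij * h_j for w_ij, h_j in zip(row, h)) + b_i)
             for row, b_i in zip(W, b)]
        scores.append(sum(u_i * c_i for u_i, c_i in zip(u, context)))
    alphas = softmax(scores)
    dim = len(vectors[0])
    pooled = [sum(a * h[k] for a, h in zip(alphas, vectors)) for k in range(dim)]
    return pooled, alphas


def random_params(dim, rng):
    W = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(dim)]
    b = [0.0] * dim
    context = [rng.uniform(-0.5, 0.5) for _ in range(dim)]
    return W, b, context


DIM = 4
GENRES = ["country", "hip-hop/rap", "rock"]  # hypothetical label set
rng = random.Random(0)
embeddings = {}  # hypothetical lookup table: one random vector per word


def embed(word):
    if word not in embeddings:
        embeddings[word] = [rng.uniform(-1.0, 1.0) for _ in range(DIM)]
    return embeddings[word]


# One set of (untrained) attention parameters per layer, plus an output layer.
word_att, line_att, section_att = (random_params(DIM, rng) for _ in range(3))
W_out = [[rng.uniform(-0.5, 0.5) for _ in range(DIM)] for _ in GENRES]


def classify(song):
    """song is nested as sections -> lines -> words."""
    section_vecs = []
    for section in song:
        line_vecs = [attention_pool([embed(w) for w in line], *word_att)[0]
                     for line in section]
        section_vecs.append(attention_pool(line_vecs, *line_att)[0])
    song_vec, _ = attention_pool(section_vecs, *section_att)
    logits = [sum(w * v for w, v in zip(row, song_vec)) for row in W_out]
    return dict(zip(GENRES, softmax(logits)))


happy_birthday = [
    [["happy", "birthday", "to", "you"], ["happy", "birthday", "to", "you"]],
    [["happy", "birthday", "dear", "friend"], ["happy", "birthday", "to", "you"]],
]
probs = classify(happy_birthday)
print(probs)  # untrained weights, so these probabilities carry no meaning
```

In a real HAN, each `attention_pool` call would receive the outputs of that layer's bidirectional RNN rather than raw embeddings, and every parameter would be learned end to end.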

The HAN model was written in TensorFlow, using Keras as the high-level API wherever possible. This great blog post by Richard Liao really helped guide me, since I was just starting out with TensorFlow at the time.

I was incredibly lucky to be provided with intact song lyrics by LyricFind, without which this project would not have been possible. After pre-processing and tokenizing the lyrics, we trained the HAN and tested its performance. The HAN was compared against several baseline models and performed well relative to previous research, although the LSTM outperformed it in the 20-genre case. The results are shown below.

Figure: HAN results

HN is the HAN without its attention mechanism. HAN-L is the HAN with layers at the word and line levels. HAN-S is the HAN with layers at the word and section levels.
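One way to read the two-level variants is as different groupings of the same nested lyrics. The sketch below assumes, based on our reading of the descriptions above, that the full HAN uses word, line and section levels and that each variant simply drops one level of grouping: HAN-L ignores section boundaries, and HAN-S ignores line boundaries within a section. The function `regroup` is hypothetical and not taken from this repo.

```python
def regroup(song, variant):
    """Reshape a sections -> lines -> words song for a two-level HAN variant.

    "HAN-L": words -> lines -> song (section boundaries dropped).
    "HAN-S": words -> sections -> song (line boundaries dropped).
    """
    if variant == "HAN-L":
        return [line for section in song for line in section]
    if variant == "HAN-S":
        return [[word for line in section for word in line] for section in song]
    raise ValueError(f"unknown variant: {variant!r}")


song = [[["happy", "birthday"], ["to", "you"]], [["dear", "friend"]]]
print(regroup(song, "HAN-L"))  # → [['happy', 'birthday'], ['to', 'you'], ['dear', 'friend']]
print(regroup(song, "HAN-S"))  # → [['happy', 'birthday', 'to', 'you'], ['dear', 'friend']]
```

Either output has the same two-level shape (groups of words), so the same word-level and group-level encoders can be reused for both variants.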

As is evident, classifying by lyrics alone remains a hard task! It's difficult to compare against previous research: studies use different numbers of genres and different genre lists, and there is no real standardisation of genres. However, we believe these scores were among the best reported!

The benefit of the attention mechanism is that we can feed in lyrics and visualise where the network applies heavy weights. In other words, we can see which words or lines the network deems important for classifying the genre. Below are some examples where the network correctly predicts the genre.

Figures: HAN results (three examples)
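The visualisations are essentially heatmaps of the attention weights. As a minimal text-only stand-in, the sketch below renders each word of one line next to a bar proportional to its weight and lists the most heavily weighted words. The helper names `render` and `top_weighted` are hypothetical, and the weights are made-up toy values for illustration, not output from the trained HAN; in practice they would be the word-level attention weights for that line.

```python
def top_weighted(words, weights, k=3):
    """Return the k (word, weight) pairs with the largest attention weights."""
    if len(words) != len(weights):
        raise ValueError("words and weights must have the same length")
    ranked = sorted(zip(words, weights), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


def render(words, weights, width=20):
    """One row per word, with a '#' bar scaled so the heaviest word gets `width` marks."""
    peak = max(weights)
    rows = []
    for word, weight in zip(words, weights):
        bar = "#" * round(width * weight / peak) if peak > 0 else ""
        rows.append(f"{word:>10} {weight:.2f} {bar}")
    return "\n".join(rows)


# Toy weights for illustration only; not produced by the trained model.
words = ["happy", "birthday", "dear", "friend"]
weights = [0.10, 0.20, 0.45, 0.25]
print(render(words, weights))
print(top_weighted(words, weights, k=2))  # → [('dear', 0.45), ('friend', 0.25)]
```

Scaling bars relative to the largest weight, rather than to 1.0, keeps the contrast visible even when attention is spread fairly evenly across a long line.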

We can see heavy weights on classic country words such as 'baby', 'driveway' and 'ai'. Similarly, for the hip-hop/rap song we see misspellings and mild cuss words heavily weighted, and when stronger cuss words are used the HAN weights them heavily too. One interesting pattern we noticed in rock was the heavy weighting of second-person pronouns such as 'you' and 'your'. This contrasts with the weighting of first-person pronouns such as 'me' and 'mine' in hip-hop/rap!

Of course, the network didn't get it right every time. Below are some examples where the HAN classifies the genre incorrectly, although in each case you can see why it made that choice.

Figures: HAN results (three examples)

Classification by lyrics alone is always going to be tricky, with vague genre boundaries and cover songs complicating matters. However, in combination with audio or symbolic data, we believe the HAN could help boost genre classification accuracy.

All the code is available in this repo, although I apologise in advance that it isn't formatted very cleanly.

For all the gory details you can read the full paper here.

Please get in touch if you have any questions!

Contributors

alextsaptsinos, gazlaws-dev
