albertsuarez / searchly

Home Page: https://asuarez.dev/searchly

License: MIT License


SearchLy


🎶 Song similarity search API based on lyrics

This API is no longer publicly deployed. To use it, run your own deployment.

Contents

  1. Motivation
  2. Requirements
  3. Recommendations
  4. Usage
  5. Run tests
  6. Development
    1. Development mode
    2. Logging
    3. Scripts
    4. How to add a new test
  7. Authors
  8. License

Motivation

This project was built to create an API for searching similarities between songs based on their lyrics. There are a lot of songs in the industry, and most of them are about the same topics. With SearchLy, I wanted to estimate how similar two songs are based on the meaning of their lyrics.

SearchLy uses a database of 150k songs from AZLyrics, collected with this scraper, which is updated periodically. Using word2vec and NMSLIB, it was then possible to create an index in which you can search for similar songs with the k-nearest neighbors (KNN) algorithm. To get a visual picture of this index, check the NMSLIB visualization tool.
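The idea can be illustrated with a minimal sketch. It assumes that each song is represented by the average of the word2vec vectors of its lyric words and that similarity is measured with cosine distance; neither detail is confirmed by this README. The tiny hand-written vectors and song names are hypothetical stand-ins for a trained word2vec model, and an exact brute-force search replaces NMSLIB's approximate index, which is what keeps KNN queries fast over 150k songs.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical toy "word2vec" vectors; a trained model would have many more
# dimensions and a vocabulary learned from the lyrics database.
WORD_VECTORS: Dict[str, List[float]] = {
    "love": [0.9, 0.1, 0.0],
    "heart": [0.8, 0.2, 0.1],
    "night": [0.1, 0.9, 0.2],
    "dance": [0.2, 0.8, 0.3],
    "rain": [0.0, 0.3, 0.9],
}


def song_vector(lyrics: str) -> List[float]:
    """Assumed song embedding: the mean of the vectors of known words."""
    words = [w for w in lyrics.lower().split() if w in WORD_VECTORS]
    if not words:
        raise ValueError("no known words in lyrics")
    dim = len(next(iter(WORD_VECTORS.values())))
    return [sum(WORD_VECTORS[w][i] for w in words) / len(words) for i in range(dim)]


def cosine_distance(a: List[float], b: List[float]) -> float:
    """1 - cosine similarity: 0 for identical directions, larger when less similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def knn(query: List[float], index: Dict[str, List[float]], k: int) -> List[Tuple[str, float]]:
    """Exact k-nearest neighbours by brute force (NMSLIB approximates this step)."""
    scored = [(name, cosine_distance(query, vec)) for name, vec in index.items()]
    return sorted(scored, key=lambda item: item[1])[:k]


# Hypothetical lyrics for three songs.
songs = {
    "Song A": "love heart love",
    "Song B": "night dance night",
    "Song C": "rain night",
}
index = {name: song_vector(text) for name, text in songs.items()}
print(knn(song_vector("heart love"), index, k=2))
```

Querying with lyrics about "heart" and "love" ranks "Song A" first, because its averaged vector points in almost the same direction as the query's.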

The API was available here along with its documentation, and it could be tested on this website demo.

Note: I am currently using a DigitalOcean micro-instance to deploy the API, so you should expect poor performance. However, if this API becomes popular, I will deploy it on a bigger instance.

Screenshots: Input from song, Input from content, Result

Requirements

  1. Python 3.7+
  2. docker-ce (as provided by docker package repos)
  3. docker-compose (as provided by PyPI)

Recommendations

Using virtualenv is recommended for package and runtime isolation.

Usage

To run the API, please execute the following commands from the root directory:

  1. Set up a virtual environment

  2. Install dependencies

    pip3 install -r requirements.lock
  3. Initialize the database (if it is not initialized yet)

    source db/deploy.sh
  4. Run the server as a Docker container with docker-compose

    docker-compose up -d --build

    or as a Python module (after enabling Development mode)

    python3 -m src.searchly

Run tests

  1. Run SearchLy locally with Development mode enabled.

  2. Run tests

    python3 -m unittest discover -v
    

Development

Development mode

To enable Development mode, edit src/searchly/__init__.py and switch the DEVELOPMENT_MODE flag from False to True:

# DEVELOPMENT_MODE = False
DEVELOPMENT_MODE = True

Logging

To follow the logs of the whole stack in real time, the following command is recommended:

docker-compose logs -f

Scripts

The module src/searchly/scripts contains a set of scripts that create and build the index needed for searching similarities between song lyrics. Development mode must be enabled to use the scripts.

  1. Fill database (fill_database.py): given a zip file produced by the AZLyrics scraper (found on this repository), fills the database with all of its data.
  2. Train (train.py): given the data in the database, extracts the features from the song lyrics and trains a word2vec model. The results will be saved in the data folder.
  3. Build (build.py): given the trained word2vec model, builds an NMSLIB index that enables searches through the API. The index file will be saved in the data folder.
  4. Extract maximum distance (extract_maximum_distance.py): given the trained word2vec model and the built index, searches across the whole database to find the maximum distance between two points. This is needed to compute a similarity percentage instead of returning a raw distance in the API response. The result will be saved in the data folder.
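A sketch of why the maximum distance matters, assuming the percentage is a linear rescaling, similarity = (1 - d / d_max) * 100, where d_max is the largest distance between any two songs. The formula, the Euclidean metric, the function names, and the brute-force O(n²) pair scan are all assumptions for illustration; extract_maximum_distance.py may use a different metric or compute d_max differently.

```python
import itertools
import math
from typing import List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed distance metric between two song vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def maximum_distance(vectors: List[Sequence[float]]) -> float:
    """Largest pairwise distance, found by checking every pair (O(n^2))."""
    return max(euclidean(a, b) for a, b in itertools.combinations(vectors, 2))


def similarity_percentage(distance: float, max_distance: float) -> float:
    """Map a raw distance to 0-100 %, where distance 0 gives 100 %."""
    if max_distance <= 0:
        return 100.0
    clipped = min(distance, max_distance)  # guard against unseen, larger distances
    return (1.0 - clipped / max_distance) * 100.0


# Hypothetical 2-D song vectors.
vectors = [[0.0, 0.0], [3.0, 4.0], [1.0, 1.0]]
d_max = maximum_distance(vectors)  # 5.0, between [0, 0] and [3, 4]
print(similarity_percentage(euclidean(vectors[0], vectors[2]), d_max))
```

Storing d_max once, instead of recomputing it per request, keeps the API response cheap: each query only needs its KNN distances and one division.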

How to add a new test

Create a new Python file called test_*.py in test.searchly with the following structure:

import unittest


class NewTest(unittest.TestCase):
    
    def test_v0(self):
        expected = 5
        result = 2 + 3
        self.assertEqual(expected, result)

# ...

if __name__ == '__main__':
    unittest.main()

Authors

License

MIT © SearchLy
