Seinfeld Fingerprinting

CS591 Computational Audio: Final Project

Running the program

After installing dependencies:

  • Runs on Python 2.7 on CSA2.BU.EDU server
  • Uses a MySQL DB (use createtable.sql)
  • Run Main.py
  • Put sample in same directory
  • Type 1 to fingerprint and 2 to recognize

Example Run:

How it works

The program consists of two main functions:

  1. Fingerprinting an episode
  2. Recognizing a sample

Fingerprinting

I followed the methodologies taught in class, reviewed the fingerprinting PowerPoint, and used research papers that explain the Shazam method. The standard approach is to take the spectrogram from the Discrete Fourier Transform (DFT) and then apply a peak picking algorithm (finding local maxima). We find the local maxima in each window, using a window size of 4096 and an overlap of 2048. We then create a hash and store it alongside the episode ID. This is used during recognition, when we apply the same fingerprinting algorithm to the sample and then match the hashes. To create a unique hash we take the local maxima and the delta between adjacent maxima. We then store these hashes in the database.

Raw spectrogram from Seinfeld Episode (21min): The Frogger (Season 9 Episode 18)

(via Audacity)

I first take the raw data and apply a low pass filter with a frequency threshold of 5,000 Hz. By analyzing this spectrogram, I was able to determine how to eliminate noise. After replaying many segments and playing with filters I determined the useful data, in TV recognition, was between 20 and 5,000 Hz. I found most dialogue was in this range, as well as a few extraneous sounds. Unfortunately, unhelpful signals such as laugh tracks remained after the low pass filter.

Here we can see how the low pass filter changes the spectrogram: frequencies below 5,000 Hz stand out and anything above is removed. This should help with accuracy and efficiency.
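The README does not say which filter design was used, so the following is only a minimal sketch of one common option: a windowed-sinc FIR low pass filter with a 5,000 Hz cutoff, written with the standard library only. The function names `lowpass_fir` and `apply_fir`, the 101-tap length, the Hamming window and the 44,100 Hz sample rate are all assumptions made for illustration.

```python
import math

def lowpass_fir(cutoff_hz, sample_rate, num_taps=101):
    """Design windowed-sinc low pass FIR coefficients (Hamming window)."""
    fc = cutoff_hz / float(sample_rate)  # normalised cutoff, cycles per sample
    mid = (num_taps - 1) / 2.0           # centre of the (symmetric) filter
    taps = []
    for n in range(num_taps):
        x = n - mid
        # Ideal low pass impulse response, with the x == 0 limit handled.
        ideal = 2 * fc if x == 0 else math.sin(2 * math.pi * fc * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]     # normalise to unity gain at 0 Hz

def apply_fir(samples, taps):
    """Convolve samples with taps, keeping only fully overlapped outputs."""
    n_taps = len(taps)
    return [
        sum(taps[j] * samples[i + n_taps - 1 - j] for j in range(n_taps))
        for i in range(len(samples) - n_taps + 1)
    ]

# Assumed sample rate; the real value depends on the recording.
SAMPLE_RATE = 44100
taps = lowpass_fir(5000, SAMPLE_RATE)
```

A 1,000 Hz tone passes through `apply_fir` almost unchanged, while a 15,000 Hz tone is strongly attenuated. This pure-Python loop is slow on a full episode; it is meant to show the idea rather than to be used as is.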

When creating the spectrum, I used a window size of 4096 and an overlap of 2048.
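With a window of 4096 samples and an overlap of 2048, each window starts 4096 - 2048 = 2048 samples after the previous one. A small sketch of that arithmetic (the helper name `frame_starts` is hypothetical, and dropping a trailing partial window is an assumption):

```python
def frame_starts(num_samples, window_size=4096, overlap=2048):
    """Return the start index of every full analysis window."""
    hop = window_size - overlap  # samples to advance between windows
    if num_samples < window_size:
        return []                # not enough audio for a single window
    return list(range(0, num_samples - window_size + 1, hop))

# 10,240 samples give four full windows, each sharing half its
# samples with the next one.
print(frame_starts(10240))  # [0, 2048, 4096, 6144]
```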

Next, I need to find a strong metric in each window, which will help me “fingerprint” each episode. The industry uses local maxima, and there are a few ways to find them. Running the DFT on each window (taking the overlap into account) returns a 2D array: one list of frequency magnitudes per window. To find the local maximum I would take the frequency with the highest magnitude in each list.

It’s easier and more practical to use a method from the scipy library, which can find the highest peak very quickly.
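To make the idea concrete without depending on scipy, here is a dependency-free sketch of picking the strongest frequency in one window. It is not the scipy call the project uses. The function name `strongest_frequency` and the 44,100 Hz sample rate are assumptions; the 20 to 5,000 Hz band comes from the filter discussion above.

```python
def strongest_frequency(magnitudes, sample_rate=44100, window_size=4096,
                        min_hz=20.0, max_hz=5000.0):
    """Return (frequency_hz, magnitude) of the largest bin inside the band.

    magnitudes holds one DFT magnitude per bin for a single window,
    where bin k corresponds to k * sample_rate / window_size Hz.
    Returns None when no bin falls inside the band.
    """
    bin_hz = float(sample_rate) / window_size  # width of one DFT bin in Hz
    best = None
    for k, mag in enumerate(magnitudes):
        freq = k * bin_hz
        if freq < min_hz or freq > max_hz:
            continue  # ignore bins outside the useful band
        if best is None or mag > best[1]:
            best = (freq, mag)
    return best

# One window with a strong in-band bin and a stronger out-of-band bin.
mags = [0.0] * 2049
mags[100] = 5.0    # about 1,077 Hz, inside the band
mags[1000] = 9.0   # about 10,767 Hz, outside the band
peak = strongest_frequency(mags)
```

Here `peak` is the bin at index 100, because the louder bin at index 1000 lies above 5,000 Hz and is skipped.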

Now we turn the correctly picked local maxima into hashes, so we can easily recall them. For hashing, I used the Dejavu library, which hashes the frequency and time difference between two adjacent peaks. Hashing frequencies alone would cause many collisions, so we also hash the deltas between adjacent peaks.

Hash( peak frequencies, delta of adjacent peaks)
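The formula above can be sketched as follows. This is an illustration in the spirit of that scheme, not the Dejavu source: the SHA-1 digest, the "|" separator, the 20-character truncation and the function names are assumptions chosen for the example.

```python
import hashlib

def make_hash(freq_a, freq_b, time_delta):
    """Hash two adjacent peak frequencies and the time delta between them."""
    key = "{}|{}|{}".format(freq_a, freq_b, time_delta)
    # Truncating the digest keeps the stored hash short (assumed length).
    return hashlib.sha1(key.encode("utf-8")).hexdigest()[:20]

def hashes_from_peaks(peaks):
    """Turn time-sorted (time_index, frequency) peaks into (hash, offset) pairs.

    Each hash pairs a peak with the next one; the offset is the time
    index of the first peak in the pair.
    """
    out = []
    for (t1, f1), (t2, f2) in zip(peaks, peaks[1:]):
        out.append((make_hash(f1, f2, t2 - t1), t1))
    return out

# Three hypothetical peaks produce two hashes.
pairs = hashes_from_peaks([(0, 440), (3, 880), (7, 660)])
```

Including the delta means that two pairs with the same frequencies but different spacing produce different hashes, which is what reduces collisions.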

We then insert all the hashes into the database. I created my own MySQL DB for efficient reads and writes. With only 13 episodes we already have 42,000 fingerprints.

DB Schema:

  • Episodes (episode_id, episode_name)
  • Fingerprints (hash, episode_id, offset)

Relate fingerprints.episode_id = episodes.episode_id
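As a rough illustration of the schema above, the sketch below builds the two tables in an in-memory SQLite database. SQLite stands in for MySQL only so the example runs without a server; the column types, the index, the helper names and the placeholder hash are assumptions, and the project's real definitions live in createtable.sql.

```python
import sqlite3

def build_demo_db():
    """Create the two tables in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript('''
        CREATE TABLE episodes (
            episode_id   INTEGER PRIMARY KEY,
            episode_name TEXT NOT NULL
        );
        CREATE TABLE fingerprints (
            hash       TEXT NOT NULL,
            episode_id INTEGER NOT NULL REFERENCES episodes(episode_id),
            "offset"   INTEGER NOT NULL
        );
        -- Recognition looks rows up by hash, so index that column.
        CREATE INDEX idx_fingerprints_hash ON fingerprints(hash);
    ''')
    return conn

def episodes_for_hash(conn, fingerprint_hash):
    """Return (episode_id, episode_name, offset) rows matching a hash."""
    rows = conn.execute(
        'SELECT e.episode_id, e.episode_name, f."offset" '
        'FROM fingerprints f '
        'JOIN episodes e ON f.episode_id = e.episode_id '
        'WHERE f.hash = ?',
        (fingerprint_hash,),
    )
    return rows.fetchall()

conn = build_demo_db()
conn.execute("INSERT INTO episodes VALUES (?, ?)", (13, "The Alternate Side"))
# "placeholder-hash" and offset 42 are made-up values for the demo.
conn.execute("INSERT INTO fingerprints VALUES (?, ?, ?)",
             ("placeholder-hash", 13, 42))
matches = episodes_for_hash(conn, "placeholder-hash")
```

The join on episode_id is what lets a matched hash be turned back into an episode name.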

Recognizing

Input -> Raw Data -> Run Low Pass Filter -> Peak Picking -> Hash -> Find Matches -> Determine episode with largest presence in the matches.

Essentially, we run the input sample through the same algorithm used for fingerprinting, but instead of inserting the fingerprints into the database we look for matches. Using the episode id from each match we can determine the episode name.
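The final step, choosing the episode with the largest presence in the matches, can be sketched as a simple vote count. In this sketch a plain dictionary stands in for the database lookup, and the function name `best_episode` and the sample hashes are hypothetical.

```python
from collections import Counter

def best_episode(sample_hashes, lookup):
    """Return (episode_id, counts) for the episode matched most often.

    lookup maps a hash to the list of episode ids that contain it.
    episode_id is None when no sample hash matches anything.
    """
    counts = Counter()
    for h in sample_hashes:
        for episode_id in lookup.get(h, []):
            counts[episode_id] += 1  # one vote per matching fingerprint
    if not counts:
        return None, counts
    episode_id, _ = counts.most_common(1)[0]
    return episode_id, counts

# Made-up hashes: episode 13 collects three votes, episode 4 only one.
lookup = {"a": [13], "b": [13, 4], "c": [4], "d": [13]}
winner, counts = best_episode(["a", "b", "d", "zz"], lookup)
```

Incorrect matches still add votes for other episodes, so the result relies on the true episode collecting clearly more votes than any other.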

Example of recognition:

Although we correctly identify “The Alternate Side”, we also found many incorrect matches. The highest match count came from id 13, so we return that episode. This sample was recorded on my computer’s mic from YouTube, which accounts for the added noise.
