Giter Site home page Giter Site logo

shivam4162 / chrome-extension-spam-notifier Goto Github PK

View Code? Open in Web Editor NEW

This project forked from aliasvishnu/chrome-extension-spam-notifier

0.0 1.0 0.0 54.74 MB

[Incomplete] A chrome extension that tells you if a mail you're currently drafting is going to be classified as spam or not.

Jupyter Notebook 85.74% JavaScript 11.54% HTML 2.72%

chrome-extension-spam-notifier's Introduction

Chrome extension to warn you if your mail seems like spam.

My goal was to build a decent Naive Bayes classifier, port it to Javascript and offer a Chrome extension for Gmail. However, I don't have experience with Chrome extensions and am having less time. If you would like to build it, I can help you with the classifier.

Requirements

  • Jupyter notebook
  • Python 2.7
  • Numpy

Usage

  • Check Classifier/SpamHam.ipynb for classifier code.
  • Extract Dataset.pickle and load spam and ham from it.
  • Don't run the 2nd cell, it's for demonstration only.

Features

  • Performed data cleaning.
  • Naive Bayes written from scratch.
  • Bag of words model.
  • Laplace smoothing to handle unknown words.
  • Hyperparameter search using holdout (validation) set.

Results

  • Overall test accuracy - 97%
  • [Important] False positive rate - 0.5% (Only 5 mails falsely classified as spam for every ~2000 mails).

Observations

  • Naive bayes is simple but awesome.
  • Comments on failure cases -
Mail contents Comments
['subject', 'si', 'back', 'si', 'back', 'folsom'] Mail too short classified as spam.
['subject', 'congratulations', 'congratulations', 'expanded', 'role', 'hope', 'means', 'get', 'lots', 'money', 'fewer', 'hours'] Highly likely that it is spam, unless sender knows the person
['subject', 'please', 'note', 'new', 'email', 'address', 'effective', 'today', 'please', 'send', 'future', 'correspondence', 'staceykn', '@', 'yahoo', 'com', 'thanks'] Legit failure.
['subject', 'registration', 'confirmation', 'spinner', 'com', 'thank', 'joining', 'spinner', 'com', 'web', 's', 'largest', 'source', 'free', 'streaming', 'music', 'just', 'wanted', 'confirm', 'registration', 'spinner', 'now', 'complete', 'access', 'spinner', 's', 'professionally', 'programmed', 'music', 'channels', 'entire', 'spinner', 'com', 'website', 'just', 'remind', 'player', 'website', '(', 'importantly', ')', 'music', 'totally', 'free', 'user', 'name', 'junglo', 'omitted', 'password', 'privacy', 'please', 'hang', 'email', 'can', 'easily', 'retrieve', 'user', 'name', 'forget', 'forget', 'password', '?', 'enter', 'user', 'name', 'email', 'address', 'password', 'retrieval', 'page', 'll', 'email', 'can', 'also', 'change', 'email', 'address', 'profile', 'page', 'http', 'www', 'spinner', 'com', 'profile', 'download', 'installation', 'assistance', 'visit', 'help', 'pages', 'http', 'www', 'spinner', 'com', 'help', 'download', 'questions', 'problems', 'please', 'email', 'us', 'feedback', '@', 'spinner', 'com', 'users', 'listen', 'million', 'songs', 'week', 'almost', 'every', 'genre', 'music', 'can', 'imagine', 'also', 'learn', 'everything', 'hear', 'clicking', 'player', 'get', 'bios', 'every', 'artist', 'hear', 'clicking', '"', 'artist', 'info', '"', 'purchase', 'cds', 'clicking', '"', 'buy', 'cd', '"', 'click', '"', 'rate', 'song', '"', 'let', 'djs', 'know', 'think', 'song', 'hear', 'll', 'use', 'feedback', 'determine', 'song', 'play', 'click', '"', 'channel', '"', 'see', 'week', 's', 'top', 'songs', 'channel', 're', 'playing', 'ranked', 'order', 'popularity', 'even', 'great', 'info', 'channels', 'music', 'visit', 'spinner', 'com', 'can', 'find', 'channels', 'play', 'favorite', 'artists', 'download', 'great', 'free', 'songs', 'take', 'part', 'cool', 'promotions', 'contests', 'thanks', 'registering', 'like', 'hear', 'tell', 'friend', 'spinner', 'enjoy', 'music', 'spinner', 'crew', 'received', 'email', 'error', 'can', 'unsubscribe', 'service', 'http', 'www', 'spinner', 'com', 'unsubscribe'] Legit failure. Contains a lot of 'password' and 'free' and 'download'
['subject', 'registration', 'confirmation', 'yahoo', 'account', 'information', 'help', 'reply', 'message', 'request', 'account', 'click', 'registration', 'confirmed', 'welcome', 'yahoo', 'message', 'confirms', 'new', 'account', 'yahoo', '?', 'yahoo', 'id', 'mpfarmer', 'email', 'address', 'dfarmer', '@', 'enron', 'com', '?', 'email', 'address', 'listed', 'incorrect', 'changes', 'future', '?', 'please', 'keep', 'email', 'contact', 'current', 'can', 'help', 'case', 'can', 't', 'access', 'yahoo', 'account', 'update', 'email', 'address', 'first', 'sign', 'yahoo', 'yahoo', 'personalized', 'service', 'click', '"', 'account', 'info', '"', 'top', 'page', '(', '"', 'account', 'info', '"', 'available', 'please', 'click', '"', 'options', '"', 'first', '"', 'account', 'info', '"', ')', 'field', 'need', 'change', 'called', '"', 'alternate', 'email', 'address', '"', '?', 'can', 'new', 'yahoo', 'id', '?', 'yahoo', 'mail', 'yahoo', 'mail', 'get', 'yahoo', 'com', 'email', 'address', 'yahoo', 'auctions', 'yahoo', 'auctions', 'buy', 'sell', 'free', '[', 'image', ']', 'yahoo', 'chat', 'chat', 'people', 'world', 'yahoo', 'calendar', 'yahoo', 'calendar', 'keep', 'organized', 'yahoo', 'yahoo', 'customize', 'way', 'use', 'internet', 'yahoo', 'games', 'yahoo', 'games', 'new', 'classic', 'online', 'games', 'yahoo', 'travel', 'yahoo', 'travel', 'reservations', 'tickets', 'discounts', 'information', 'yahoo', 'shopping', 'yahoo', 'shopping', 'search', 'favorite', 'products', 'brands', 'compare', 'prices', 'yahoo', 'messenger', 'yahoo', 'messenger', 'get', 'notified', 'new', 'mail', 'send', 'instant', 'messages', 'show', 'thank', 'signing', 'yahoo', '?', 'didn', 't', 'request', 'account', '?', 'please', 'click', 'didn', 't', 'request', 'longer', 'wish', 'prompted', 'deletion', 'key', 'please', 'cut', 'paste', 'following', 'text', 'box', 'requested', 'deletion', 'page', 'also', 'reply', 'email', '(', 'make', 'sure', 'copy', 'entire', 'email', 'reply', ')', 'remove', 'subject', 'line', '[', ']'] Legit failure.
['subject', 'amazon', 'com', 'order', '(', '#', ')', 'greetings', 'amazon', 'com', 'thought', 'd', 'like', 'know', 'shipped', 'items', 'today', 'completes', 'order', 'thanks', 'shopping', 'amazon', 'com', 'hope', 'see', 'soon', 'can', 'track', 'status', 'order', 'orders', 'online', 'visiting', '"', 'account', '"', 'page', 'http', 'www', 'amazon', 'com', 'account', 'can', 'track', 'order', 'shipment', 'status', 'review', 'estimated', 'delivery', 'dates', 'cancel', 'unshipped', 'items', 'return', 'items', 'much', 'following', 'items', 'included', 'shipment', 'qty', 'item', 'price', 'shipped', 'subtotal', 'dewalt', 'dw', 'k', 'volt', '"', '$', '$', 'item', 'subtotal', '$', 'shipping', '&', 'handling', '$', 'total', '$', 'paid', 'via', 'mastercard', '$', 'paid', 'via', 'gift', 'certificate', 'usd', '(', 'privacy', 'reasons', 'recipient', 's', 'address', 'shown', ')', 've', 'explored', 'links', '"', 'account', '"', 'page', 'still', 'need', 'get', 'touch', 'us', 'order', 'e', 'mail', 'us', 'orders', '@', 'amazon', 'com', 'next', 'visit', 'web', 'site', 'come', 'see', 's', 'new', 'clicking', 'link', 'right', 'hand', 'side', 'home', 'page', 'visiting', 'url', 'cs', '', 'ae', '', 'nfy', 've', 'selected', 'assortment', 'new', 'releases', 'recommendations', 'informative', 'articles', 'think', 'appeal', 'thank', 'shopping', 'amazon', 'com', 'amazon', 'com', 'earth', 's', 'biggest', 'selection', 'orders', '@', 'amazon', 'com', 'http', 'www', 'amazon', 'com']
['subject', 'singles', 'debate', 'hey', 'check', 'dude', 'smashing', 'pumkins', 'song', 'drown', 'can', 't', 'find', 'cd', 'whaz', '?', '?'] Seriously, how is this not spam?
['subject', 'valpak', 'com', '"', 'print', 'later', '"', 'link', 'link', 'print', 'ready', 'valpak', 'com', 'savings', 'page', 'requested', 'link', 'will', 'remain', 'active', 'next', 'hours', 'may', 'print', 'convenience', 'click', 'print', 'saved', 'couponsshare', 'good', 'savings', 'sense', 'forward', 'link', 'friend', 'can', 'print', 'save'] Legit failure.

References

chrome-extension-spam-notifier's People

Contributors

aliasvishnu avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.