spam-detector's Introduction

SPAM / NOT_SPAM Data Analysis

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib as matplot
import matplotlib.image as mpimg
import seaborn as sns
%matplotlib inline
from wordcloud import WordCloud

import scattertext as st
import spacy
from IPython.display import HTML

from gensim.models import word2vec
from scattertext import SampleCorpora, word_similarity_explorer_gensim, Word2VecFromParsedCorpus
from scattertext.CorpusFromParsedDocuments import CorpusFromParsedDocuments
dataset = pd.read_json("dataset.json")
dataset.head()
   body                                               label     subject
0  hello , offer fantastic 100 % free access most...  SPAM      Subject: re : free !\n
1  * * * * * * * * * * * * * * * * * * * * * * * ...  SPAM      Subject: bulk email profit\n
2  stock invest interest , please carefully revie...  SPAM      Subject: possible + 900 % stock investment ret...
3  syntax project innovationskolleg " formal mode...  NOT_SPAM  Subject: minus workshop split constituent\n
4  multidisciplinary periodical : call comment * ...  NOT_SPAM  Subject: multidisciplinary periodical : call c...
dataset.shape
(702, 3)
dataset.isnull().any()
body       False
label      False
subject    False
dtype: bool
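fig, axs = plt.subplots(ncols=1, figsize=(12, 6))
# Pass the column by name; recent seaborn releases no longer accept the data as a bare positional argument.
g = sns.countplot(x="label", data=dataset, ax=axs)
plt.tight_layout()
plt.show()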

[Figure: count plot of messages per label]
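
The count plot shows the class balance visually; the exact tallies and shares can be handy as well. The following is a minimal sketch, not part of the original notebook: the `label_counts` helper and the sample list are hypothetical, and it assumes the labels are plain strings as in the `label` column above. With pandas, `dataset["label"].value_counts()` should give the same counts.

```python
from collections import Counter


def label_counts(labels):
    """Return {label: (count, share_of_total)} for an iterable of labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: (n, n / total) for label, n in counts.items()}


# Hypothetical sample; in the notebook this would be list(dataset["label"]).
sample = ["SPAM", "SPAM", "NOT_SPAM", "NOT_SPAM", "NOT_SPAM"]
summary = label_counts(sample)
for label, (n, share) in summary.items():
    print(f"{label}: {n} ({share:.0%})")
```

A strongly skewed split would be worth knowing about before training any classifier on this data.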

# Assumes a recent spaCy with the en_core_web_sm model installed; spacy.en.English() only exists in old releases.
nlp = spacy.load("en_core_web_sm")
corpus = st.CorpusFromPandas(dataset, category_col='label', text_col='body', nlp=nlp).build()
html = st.produce_scattertext_explorer(corpus, category='SPAM', category_name='SPAM',
                                       not_category_name='NOT_SPAM', width_in_pixels=1000)
with open("Convention-Visualization.html", 'wb') as f:
    f.write(html.encode('utf-8'))

# The notebook server crashes when loading the HTML file inline, so the file was rendered
# in a browser and a snapshot of it is shown here, for visualization purposes only.
img = mpimg.imread('Convention-Visualization.png')
matplot.rcParams['figure.figsize'] = (30.0, 10.0)
plt.imshow(img)
plt.show()

[Figure: snapshot of the scattertext explorer, Convention-Visualization.png]

dataset_spam = dataset.loc[dataset.label == 'SPAM',['body']]
dataset_not_spam = dataset.loc[dataset.label == 'NOT_SPAM',['body']]
dataset_spam.head()
   body
0  hello , offer fantastic 100 % free access most...
1  * * * * * * * * * * * * * * * * * * * * * * * ...
2  stock invest interest , please carefully revie...
5  locate anyone anywhere usa * * * * * * * old f...
6  hope n't object complete stranger mail , belie...
dataset_not_spam.head()
    body
 3  syntax project innovationskolleg " formal mode...
 4  multidisciplinary periodical : call comment * ...
 7  inform untimely death jochem schindler , prof ...
 9  week ago , post query language moo site . rece...
10  cycorp seek enthusiastic , highly-motivate mul...
wordcloud_spam = WordCloud(max_font_size=40).generate(' '.join(list(dataset_spam['body'])))
plt.figure()
plt.imshow(wordcloud_spam, interpolation="bilinear")
plt.axis("off")
plt.show()

[Figure: word cloud of SPAM message bodies]

wordcloud_not_spam = WordCloud(max_font_size=40).generate(' '.join(list(dataset_not_spam['body'])))
plt.figure()
plt.imshow(wordcloud_not_spam, interpolation="bilinear")
plt.axis("off")
plt.show()

[Figure: word cloud of NOT_SPAM message bodies]
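
Word clouds give a quick impression but are hard to compare precisely. As a rough numeric complement, token frequencies can be counted directly. This is a sketch under assumptions, not the notebook's method: the `top_tokens` helper and the sample bodies are hypothetical, tokens are split on whitespace only, and the results may not match the clouds exactly because `WordCloud` applies its own stopword filtering and tokenization.

```python
from collections import Counter


def top_tokens(texts, n=10):
    """Return the n most frequent purely alphabetic tokens across texts."""
    counter = Counter()
    for text in texts:
        # Lowercase, split on whitespace, and skip punctuation and numbers.
        counter.update(tok for tok in text.lower().split() if tok.isalpha())
    return counter.most_common(n)


# Hypothetical sample; in the notebook this would be dataset_spam["body"].
sample_bodies = ["free offer free access", "offer free stock"]
top = top_tokens(sample_bodies, n=2)
print(top)
```

Running the same helper on `dataset_spam["body"]` and `dataset_not_spam["body"]` would allow a side-by-side comparison of the most common words in each class.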

spam-detector's People

Contributors

mahendrathapa
