Contains the .py and .txt files for Udacity's Machine Learning Feature Selection mini-project on the Enron email dataset.
-
parse_out_email_text.py: A .py file that parses an email, removes its header metadata and stems the remaining words.
-
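A minimal sketch of the kind of parsing parse_out_email_text.py performs. It assumes, as in the Enron maildir files, that the header block ends with an "X-FileName:" line. The name parse_out_text is illustrative and may not match the function in this repository. The sketch uses NLTK's SnowballStemmer when NLTK is installed and otherwise falls back to no stemming, so words are only reduced to their stems when NLTK is present.

```python
import string

try:
    # Use NLTK's English Snowball stemmer if it is installed.
    from nltk.stem.snowball import SnowballStemmer
    _stem = SnowballStemmer("english").stem
except ImportError:
    # Assumption: without NLTK, skip stemming so the sketch still runs.
    def _stem(word):
        return word


def parse_out_text(raw_email):
    """Return the stemmed body text of one raw Enron email (hypothetical helper).

    Assumes the header block ends with an "X-FileName:" line; everything
    up to and including that line is treated as metadata and dropped.
    """
    _, marker, after = raw_email.partition("X-FileName:")
    if not marker:
        return ""  # no header marker found, so there is no body to parse
    # Skip the rest of the X-FileName line (the file name itself).
    body = after.partition("\n")[2]
    # Remove punctuation, then lowercase and stem each word.
    body = body.translate(str.maketrans("", "", string.punctuation))
    return " ".join(_stem(word.lower()) for word in body.split())
```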
vectorize_text.py: A .py file that extracts features from the emails and prepares them for classification.
-
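The labelling step of vectorize_text.py can be sketched roughly as follows. build_dataset is a hypothetical helper. It assumes the emails have already been parsed into text, that label 0 means Sara and 1 means Chris, and that the listed signature words are the ones to strip, so check them against your copy of the script. Unlike a plain str.replace, it removes whole tokens only, so a word such as "chrissy" is left untouched.

```python
# Assumption: words that would reveal the author directly; adjust to your script.
SIGNATURE_WORDS = ["sara", "shackleton", "chris", "germani"]


def build_dataset(sara_emails, chris_emails, signature_words=SIGNATURE_WORDS):
    """Return (word_data, from_data) for two lists of parsed email texts.

    Labels follow the assumed convention 0 = Sara, 1 = Chris.
    """
    blocked = set(signature_words)
    word_data, from_data = [], []
    for label, emails in ((0, sara_emails), (1, chris_emails)):
        for text in emails:
            # Keep every token except the signature words.
            kept = [word for word in text.split() if word not in blocked]
            word_data.append(" ".join(kept))
            from_data.append(label)
    return word_data, from_data
```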
find_signture.py: A .py file that creates the training and testing sets, trains the classifier, evaluates the model's accuracy and identifies the most predictive feature.
-
from_chris.txt: A text file required for running vectorize_text.py.
-
from_sara.txt: A text file required for running vectorize_text.py.
-
Running these files requires the Enron email dataset; a download link is given below.
-
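One way to turn the lines of from_sara.txt or from_chris.txt into file paths inside the dataset is sketched below. It assumes each non-empty line is an email path relative to the folder that contains the extracted maildir/ directory. email_paths is a hypothetical helper that accepts any iterable of lines, so an open file handle can be passed straight in.

```python
from pathlib import Path


def email_paths(lines, dataset_root):
    """Turn lines from a list file such as from_sara.txt into email paths.

    Assumption: each non-empty line is a path relative to dataset_root.
    """
    root = Path(dataset_root)
    # strip() removes the trailing newline; blank lines are skipped.
    return [root / line.strip() for line in lines if line.strip()]
```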
All other necessary files are in Udacity's GitHub repository: https://github.com/udacity/ud120-projects
-
Instructions for getting started can be found here: https://www.udacity.com/course/viewer#!/c-ud120/l-2254358555/m-2959448580
-
The complete email dataset can be downloaded from https://www.cs.cmu.edu/~./enron/enron_mail_20150507.tgz. If you have already run startup.py from the tools folder of the cloned Udacity repository, you don't need to download it from this link.
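If you download the archive manually instead of running startup.py, you still need to unpack it before the scripts can read it. Below is a standard-library sketch. The helper names and the choice of dest_dir are assumptions, so extract the archive to wherever your scripts expect the maildir/ folder. The filter="data" argument is only passed on Python versions that support tarfile extraction filters.

```python
import tarfile
import urllib.request
from pathlib import Path

# URL taken from this README.
ENRON_URL = "https://www.cs.cmu.edu/~./enron/enron_mail_20150507.tgz"


def extract_archive(archive_path, dest_dir):
    """Unpack a .tgz archive into dest_dir and return dest_dir as a Path."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            # Newer Python versions: reject unsafe archive members.
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    return dest


def download_enron(dest_dir, url=ENRON_URL):
    """Download the (large) Enron tarball into dest_dir, then unpack it there."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    archive = dest / url.rsplit("/", 1)[-1]
    if not archive.exists():  # skip the download if the file is already present
        urllib.request.urlretrieve(url, archive)
    return extract_archive(archive, dest)
```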