mynameisvinn / emailparser

85 stars, 4 watchers, 17 forks, 18 KB

remove signature blocks from emails

License: MIT License

Python 60.36% Jupyter Notebook 39.64%
python natural-language-processing nlp email-parsing signature-blocks email-parser

emailparser's People

Contributors

mozammilwy, mynameisvinn, toshiro92

emailparser's Issues

ZeroDivisionError


ZeroDivisionError                         Traceback (most recent call last)
in
      1 file = r"C:\Users\Sashankh\axaxaxax.txt"
----> 2 convert(file)

in convert(fname, threshold)
     11     fn = fname.split(".")
     12     new_fname = fn[0] + "_clean." + fn[1]
---> 13     _generate_text(sentences, new_fname)
     14 
     15 def _read_email(fname):

in _generate_text(sentences, fname, threshold)
     43     with open(fname, "w") as new_file:
     44         for sentence in sentences:
---> 45             if _prob_block(sentence, tagger) < threshold:
     46                 new_file.write(sentence)
     47 

in _prob_block(sentence, pos_tagger)
     61     doc = pos_tagger(sentence)
     62     verb_count = np.sum([token.pos_ != "VERB" for token in doc])
---> 63     return float(verb_count) / len(doc)

ZeroDivisionError: float division by zero
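The traceback shows `_prob_block` dividing by `len(doc)`. That value is zero when spaCy tokenizes an empty sentence, which is most likely a blank line in the input file. Below is a minimal sketch of a guard. It uses a hypothetical helper, `non_verb_ratio`, that takes a list of coarse POS strings (for example `[token.pos_ for token in doc]`) instead of a spaCy `Doc`, so it runs without spaCy installed.

```python
from typing import Sequence


def non_verb_ratio(pos_tags: Sequence[str]) -> float:
    """Fraction of tokens whose coarse POS tag is not "VERB".

    Hypothetical stand-in for _prob_block: it takes POS strings
    (e.g. [token.pos_ for token in doc]) rather than a spaCy Doc.
    """
    if not pos_tags:
        # Empty sentence (e.g. a blank line): there are no tokens, so return
        # 0.0 instead of dividing by zero. 0.0 < threshold keeps the line.
        return 0.0
    non_verbs = sum(tag != "VERB" for tag in pos_tags)
    return non_verbs / len(pos_tags)


print(non_verb_ratio([]))                                 # 0.0
print(non_verb_ratio(["PRON", "VERB", "NOUN", "PUNCT"]))  # 0.75
```

Returning 0.0 for an empty sentence places it below any positive threshold, so blank lines are preserved rather than stripped. Skipping empty sentences before tagging them would be an equally reasonable choice.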

Module not working even for the sample email

Hi, I was trying to use your code in Python 3.6, and it gives the following error:

Sentence boundaries unset. You can add the 'sentencizer' component to the pipeline with: nlp.add_pipe(nlp.create_pipe('sentencizer')) Alternatively, add the dependency parser, or set sentence boundaries by setting doc[i].sent_start

I'm not sure how to fix this. Can you please help?

Misinterpreting headers

This may work for some emails, but taking the ratio of non-verbs to total words in a sentence will not always work. For example, an HTML email may have headers like:

Tent Setup Guide Newsletter

This strategy fails on this example. The heading contains no verbs, so 100% of its tokens are non-verbs, and it is removed as if it were part of a signature block. As a result, the strategy strips useful content such as headers from the email.
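Here is a worked example of the failure described above. It makes two assumptions. First, the POS tags are hand-assigned in spaCy style, and a real tagger may label these words differently. Second, it uses the default `threshold=0.9` seen in `generate_text`, where a line is written only when its non-verb ratio is below the threshold.

```python
def non_verb_ratio(pos_tags):
    # Fraction of tokens that are not tagged "VERB".
    return sum(tag != "VERB" for tag in pos_tags) / len(pos_tags)


def is_dropped(pos_tags, threshold=0.9):
    # Mirrors the rule "write the sentence only if ratio < threshold":
    # anything at or above the threshold is treated as signature block.
    return non_verb_ratio(pos_tags) >= threshold


# Assumed tags for "Tent Setup Guide Newsletter": four proper nouns, no verb.
header = ["PROPN", "PROPN", "PROPN", "PROPN"]
# Assumed tags for "We updated the guide.": one verb among five tokens.
body = ["PRON", "VERB", "DET", "NOUN", "PUNCT"]

print(non_verb_ratio(header), is_dropped(header))  # 1.0 True
print(non_verb_ratio(body), is_dropped(body))      # 0.8 False
```

The header contains no verbs, so it scores 1.0. A typical signature line, such as a name followed by a job title, gets the same score, which means the ratio alone cannot tell the two apart.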

TypeError: a bytes-like object is required, not 'str'

Hi vin,

Nice code! Very useful!!

I'm trying to re-run your code on your example, but I get the following error message. I hope you can help me with it. Thanks a lot!


---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-6-5e07bfb80a1e> in <module>()
      1 pos_tagger = English()
      2 msg_raw = read_email('emails/test0.txt')
----> 3 sentences = corpus2sentences(msg_raw)

~\Desktop\EmailParser-master\Parser.py in corpus2sentences(corpus)
     12     """split corpus into a list of sentences.
     13     """
---> 14     return corpus.strip().split('\n')
     15 
     16 def generate_text(sentences, pos_parser, fname, threshold=0.9):

TypeError: a bytes-like object is required, not 'str'

Thanks,
S
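The traceback suggests that `read_email` returned `bytes` (for instance because the file was opened in binary mode, though that is an assumption). Meanwhile, `corpus2sentences` splits on the `str` `'\n'`, and mixing the two types raises the error. Below is a minimal sketch of a tolerant replacement for `corpus2sentences`. It assumes the email is UTF-8 encoded, and `errors="replace"` substitutes undecodable bytes instead of raising.

```python
def corpus2sentences(corpus):
    """Split an email body into lines, accepting either bytes or str.

    Sketch only: decoding as UTF-8 is an assumption about the input file.
    """
    if isinstance(corpus, bytes):
        # Convert bytes to str so that splitting on "\n" (a str) works.
        corpus = corpus.decode("utf-8", errors="replace")
    return corpus.strip().split("\n")


raw = b"Hi team,\nThe build is green.\n\nBest,\nSam\n"
print(corpus2sentences(raw))
# ['Hi team,', 'The build is green.', '', 'Best,', 'Sam']
```

The blank line in the example becomes an empty string in the result, so any later per-sentence scoring needs to cope with sentences that have zero tokens.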

OSError: [E050] Can't find model 'en_core_web_sm'. It doesn't seem to be a shortcut link, a Python package or a valid path to a data directory.

I also get this error when running the example file:

---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
in
----> 1 convert(fname) # a copy of test0, without signature block

~\EmailParser-master\Parser.py in convert(fname, threshold)
     11     fn = fname.split(".")
     12     new_fname = fn[0] + "_clean." + fn[1]
---> 13     _generate_text(sentences, new_fname)
     14 
     15 def _read_email(fname):

~\EmailParser-master\Parser.py in _generate_text(sentences, fname, threshold)
     41     Lower thresholds will result in more false positives.
     42     """
---> 43     tagger = spacy.load('en_core_web_sm')
     44 
     45     with open(fname, "w") as new_file:

~\Anaconda3\lib\site-packages\spacy\__init__.py in load(name, **overrides)
     28     if depr_path not in (True, False, None):
     29         warnings.warn(Warnings.W001.format(path=depr_path), DeprecationWarning)
---> 30     return util.load_model(name, **overrides)
     31 
     32 

~\Anaconda3\lib\site-packages\spacy\util.py in load_model(name, **overrides)
    173     elif hasattr(name, "exists"): # Path or Path-like to model data
    174         return load_model_from_path(name, **overrides)
--> 175     raise IOError(Errors.E050.format(name=name))
    176 
    177 

OSError: [E050] Can't find model 'en_core_web_sm'. It doesn't seem to be a shortcut link, a Python package or a valid path to a data directory.
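spaCy resolves `'en_core_web_sm'` as an installed Python package (or, in spaCy 2.x, as a shortcut link or data path). E050 therefore usually means the model was never downloaded into the active environment, and `python -m spacy download en_core_web_sm` installs it. Below is a minimal sketch of a pre-flight check that uses only the standard library. The helper name `model_installed` is hypothetical, and the check covers only the package route, not shortcut links or paths.

```python
import importlib.util

MODEL = "en_core_web_sm"


def model_installed(name: str) -> bool:
    # True if a top-level package called `name` is importable here.
    # spaCy pipelines such as en_core_web_sm ship as regular Python packages.
    return importlib.util.find_spec(name) is not None


if model_installed(MODEL):
    print(f"{MODEL} found; spacy.load({MODEL!r}) should be able to import it")
else:
    print(f"{MODEL} missing; run: python -m spacy download {MODEL}")
```

If the download succeeds but the error persists in Jupyter, the notebook kernel may be using a different interpreter than the shell. Running this check inside the notebook shows which environment the kernel actually sees.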