email-extractor's Introduction

Extract emails from rtf, txt, text, doc, docx, and PDF files.

  • Install Python3 and Pip3
  • pip3 install -r requirements.txt
  • python extract_emails.py --help

Note:

To read files with a doc extension you need an extra dependency:

  • On Windows, install pypiwin32
  • On Linux or Mac, install LibreOffice

pypiwin32 is a Windows-only Python module, so ignore its install error on Linux-based operating systems.

Options

  • --dir: absolute path of the directory/folder to scan; defaults to the current folder
  • --file: scan only one file
  • --ext: restrict scanning to the given file extensions; defaults to all supported extensions
  • --dst: output file name; by default the results are printed to the console

NOTE: Use a different output file for each run; otherwise the existing results will be overwritten.
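
The options above map naturally onto Python's argparse module. The following is only a sketch of how such a command line could be declared, not the actual code of extract_emails.py: the function name build_parser and the help strings are assumptions, and the real script may define its options differently.

```python
import argparse
import os


def build_parser():
    """Build a parser for the four documented options (hypothetical layout)."""
    parser = argparse.ArgumentParser(description="Extract emails from files")
    # Directory to scan; the README says the default is the current folder.
    parser.add_argument("--dir", default=os.getcwd(), help="directory to scan")
    # Scan a single file instead of a directory.
    parser.add_argument("--file", default=None, help="scan only this file")
    # Zero or more extensions; None stands for "all supported extensions".
    parser.add_argument("--ext", nargs="*", default=None, help="extensions to scan")
    # Output file; None stands for "print to the console".
    parser.add_argument("--dst", default=None, help="output file name")
    return parser


# Parse an explicit argument list instead of sys.argv, for illustration.
args = build_parser().parse_args(["--dir=XYZ", "--ext", "pdf", "doc"])
print(args.dir, args.ext, args.dst)
```

Using nargs="*" for --ext is what allows several extensions to follow a single flag, as in "--ext pdf doc".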

Usage

Extract emails from a specific file xyz.pdf

python extract_emails.py --file=xyz.pdf --dst=emails.txt
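
Once a file's text has been read, pulling out the addresses is typically a regular-expression search. The sketch below illustrates that step under stated assumptions: the pattern EMAIL_RE and the helper find_emails are hypothetical and are not taken from extract_emails.py, and the pattern is deliberately simple rather than a full RFC 5322 matcher.

```python
import re

# Simplified pattern: local part, "@", domain, and a top-level domain of
# at least two letters. It will miss some valid but unusual addresses.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_emails(text):
    """Return unique addresses from text, in order of first appearance."""
    seen = set()
    result = []
    for match in EMAIL_RE.findall(text):
        key = match.lower()  # compare case-insensitively
        if key not in seen:
            seen.add(key)
            result.append(match)
    return result


print(find_emails("Write to alice@example.com or bob@example.org."))
```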

Extract emails from all files in a folder/directory XYZ

python extract_emails.py --dir=XYZ --dst=emails.txt

While scanning a folder/directory you can also restrict the file extensions. For example, to scan only pdf files:

python extract_emails.py --dir=XYZ --dst=emails.txt --ext pdf

Scan directory but only parse doc and pdf files

python extract_emails.py --dir=XYZ --dst=emails.txt --ext pdf doc
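
The extension filter used in the directory examples above could be implemented along the following lines. This is a sketch under assumptions: the helper files_to_scan and the SUPPORTED tuple are hypothetical names, the list of extensions is copied from the introduction, and whether the real script descends into subfolders is not stated, so the recursive walk here is a guess.

```python
import os

# Extensions listed in the introduction (assumed to be the full set).
SUPPORTED = ("rtf", "txt", "text", "doc", "docx", "pdf")


def files_to_scan(directory, extensions=None):
    """Return paths under directory whose extension is in the wanted set."""
    wanted = {e.lower().lstrip(".") for e in (extensions or SUPPORTED)}
    matches = []
    # os.walk also visits subfolders; this is an assumption about behavior.
    for root, _dirs, names in os.walk(directory):
        for name in sorted(names):
            ext = os.path.splitext(name)[1].lower().lstrip(".")
            if ext in wanted:
                matches.append(os.path.join(root, name))
    return matches


# Files in the current folder that "--ext pdf doc" would select.
print(files_to_scan(".", ["pdf", "doc"]))
```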

email-extractor's People

Contributors

sonu-zomato, sonus21

email-extractor's Issues

Add exact version of the dependencies

I have noticed that PyPDF2 no longer supports a command in its latest version, so it is best practice to pin the dependency versions in requirements.txt!
Cheers.
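
One way to produce pinned entries is to read the versions from an environment where the script is known to work. A minimal sketch using the standard importlib.metadata module; the helper name pinned_lines is hypothetical, and which packages actually belong in requirements.txt is for the maintainer to decide.

```python
from importlib import metadata


def pinned_lines(packages, lookup=metadata.version):
    """Return 'name==version' lines for the installed packages."""
    lines = []
    for name in packages:
        try:
            lines.append(f"{name}=={lookup(name)}")
        except metadata.PackageNotFoundError:
            # Leave a comment instead of guessing a version.
            lines.append(f"# {name} is not installed")
    return lines


# The output depends on what is installed in the current environment.
print("\n".join(pinned_lines(["PyPDF2", "six"])))
```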

Fixes PyPDF2

To get it running on my machine, I had to make these adjustments:

  • Fork

  • Add the six module, as reported in #1

Please add six module to requirements.txt

I was getting this error:

Traceback (most recent call last):
  File "E:\Downloads\email-extractor-master\extract_emails.py", line 9, in <module>
    from striprtf import striprtf
  File "E:\Downloads\email-extractor-master\striprtf.py", line 20, in <module>
    from six import unichr
ModuleNotFoundError: No module named 'six'

Running 'pip install six' fixes it.
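
As an alternative to installing six, the import shown in the traceback could be given a fallback. This is only a sketch and assumes that the script runs on Python 3 and that unichr is the only name taken from six; on Python 3, six.unichr is simply the built-in chr.

```python
try:
    # Preferred: use six when it is installed.
    from six import unichr
except ImportError:
    # Assumption: Python 3, where chr() already handles Unicode code points.
    unichr = chr

print(unichr(0x41))
```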
