bjherger / resumeparser

A framework to parse resumes, extract contact & other information, and check for required terms

Python 100.00%

resumeparser's Introduction

ResumeParser

A utility to make handling many resumes easier by automatically pulling contact information, required skills and custom text fields. These results are then surfaced as a convenient summary CSV.

Quick Start Guide

# Install requirements
pip install -r requirements.txt

# Retrieve language model from spacy
python -m spacy download en

# Run code (with default configurations)
cd bin/
python main.py

# Review output
open ../data/output/resume_summary.csv

Getting started

Repo structure

bin/main.py: Code entry point
confs/confs.yaml.template: Configuration file template
data/input/example_resumes: Example resumes, which are parsed w/ default configurations
data/output/resume_summary.csv: Results from parsing example resumes

Python Environment

Python code in this repo uses packages that are not part of the standard library. To make sure you have all of the required packages, use pip to install from requirements.txt. For more details, see the pip documentation.

Configuration file

This program uses a configuration file to set program parameters. You can run it with the default parameters to view sample output, but you'll probably want to create a config file and modify it to get the most value from this program.

# Create configuration file from template
cp confs/confs.yaml.template confs/confs.yaml

# Modify confs to match your needs
open confs/confs.yaml

The configuration file has a few parameters you can tweak:

resume_directory: A directory containing resumes you'd like to parse
summary_output_directory: Where to place the .csv file, summarizing your resumes
data_schema_dir: The directory to store table schema. This is mostly for development purposes
skills: A YAML list of skills. Each element in this list can be either a string (e.g. skill1 or machine learning) or a list of aliases for the same skill (e.g. [skill2_alias_A, skill2_alias_B] or [ml, machine learning, machine-learning])
universities: A YAML list of universities you'd like to search for

Contact

Feel free to contact me at 13herger <at> gmail <dot> com. If you're interested in projects like this, check out my website and blog


resumeparser's Issues

TypeError: Argument 'string' has incorrect type (expected unicode, got str)

Hi there,

Is this an install issue or a bug:

INFO:root:Archiving data set schema(s) for step name: extract
INFO:root:Working data_set: observations
INFO:root:End extract
INFO:root:Begin transform
Traceback (most recent call last):
File "main.py", line 111, in <module>
main()
File "main.py", line 39, in main
observations, nlp = transform(observations, nlp)
File "main.py", line 81, in transform
observations['candidate_name'] = observations['text'].apply(lambda x:
File "/home/user/.local/lib/python2.7/site-packages/pandas/core/series.py", line 3591, in apply
mapped = lib.map_infer(values, f, convert=convert_dtype)
File "pandas/_libs/lib.pyx", line 2217, in pandas._libs.lib.map_infer
File "main.py", line 82, in <lambda>
field_extraction.candidate_name_extractor(x, nlp))
File "/home/user/Developer/ResumeParser-master/bin/field_extraction.py", line 13, in candidate_name_extractor
doc = nlp(input_string)
File "/home/user/.local/lib/python2.7/site-packages/spacy/language.py", line 427, in __call__
doc = self.make_doc(text)
File "/home/user/.local/lib/python2.7/site-packages/spacy/language.py", line 453, in make_doc
return self.tokenizer(text)
TypeError: Argument 'string' has incorrect type (expected unicode, got str)

Thanks
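
Under Python 2.7, str is a byte string, and the traceback shows spaCy's tokenizer rejecting it because it expects unicode. One possible workaround, sketched here on the assumption that the resume text is UTF-8 encoded, is to decode the text before it reaches nlp. ensure_text is a hypothetical helper, not a function in this repository.

```python
def ensure_text(value, encoding="utf-8"):
    """Return `value` as a text (unicode) string.

    Hypothetical helper. Byte strings are decoded with `encoding` (assumed
    UTF-8); undecodable bytes become U+FFFD rather than raising.
    """
    if isinstance(value, bytes):  # on Python 2.7, `bytes` is an alias of `str`
        return value.decode(encoding, "replace")
    return value


name_text = ensure_text(b"Ren\xc3\xa9e Example")
```

In candidate_name_extractor this would mean calling nlp(ensure_text(input_string)) in place of nlp(input_string).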

what's the cons/pros compare with another tool antonydeepak/ResumeParser?

https://github.com/antonydeepak/ResumeParser

python main.py throwing errors

python main.py
Traceback (most recent call last):
File "main.py", line 20, in <module>
from bin import field_extraction
File "/home/lance/Downloads/ResumeParser-master/bin/field_extraction.py", line 3, in <module>
from gensim.utils import simple_preprocess
File "/home/lance/.local/lib/python2.7/site-packages/gensim/__init__.py", line 5, in <module>
from gensim import parsing, corpora, matutils, interfaces, models, similarities, summarization, utils # noqa:F401
File "/home/lance/.local/lib/python2.7/site-packages/gensim/parsing/__init__.py", line 4, in <module>
from .preprocessing import (remove_stopwords, strip_punctuation, strip_punctuation2, # noqa:F401
File "/home/lance/.local/lib/python2.7/site-packages/gensim/parsing/preprocessing.py", line 42, in <module>
from gensim import utils
File "/home/lance/.local/lib/python2.7/site-packages/gensim/utils.py", line 45, in <module>
from smart_open import smart_open
File "/home/lance/.local/lib/python2.7/site-packages/smart_open/__init__.py", line 28, in <module>
from .smart_open_lib import open, parse_uri, smart_open, register_compressor
File "/home/lance/.local/lib/python2.7/site-packages/smart_open/smart_open_lib.py", line 24, in <module>
import urllib.parse
ImportError: No module named parse

Project lacks CI/CD

Testing on Py3.X, Py2.7, windows and linux should be sufficient. Could use CircleCI, and this config as a template.

README should be up to date

README should be up to date. Known issues:

README references requirements.txt, not environment.yaml
Contact section email not formatted correctly
Anaconda environment does not have descriptive name

Getting errors when setting up latest version of Resume Parser?

I have installed Anaconda, and when I try to create the environment from the environment.yml file, I type:
conda env create -f environment.yml
and get this error:
resolve package not found -openssl 1.0.2h 2 conda.

By the way you guys have done a great job, Thank you :)

requirements.txt / environment.yaml are out of sync

The following resources are out of sync, and should be updated:

requirements.txt
environment.yaml
README.md

requirements.txt is the most up to date / correct.

After running once or twice get into this error: File Not Found: ../confs/config.yaml.template

I keep getting this error after running the program 2 or 3 times:
File Not Found Error: Error 2: No Such File or Directory: ../confs/config.yaml.template
However, this file is present in the repository I downloaded. I have been facing this problem for 3 to 4 hours now. It arises out of nowhere after the program has run perfectly for some time.

Issue while creating virtual environment.

Unicode encoding error

I just cloned the repository and installed the environment, yet I hit this issue as soon as I ran main.py:
File "main.py", line 112, in <module>
main()
File "main.py", line 43, in main
load(observations, nlp)
File "main.py", line 105, in load
observations.to_csv(path_or_buf=output_path, index_label='index')
File "/usr/lib64/python2.7/site-packages/pandas/core/generic.py", line 3020, in to_csv
formatter.save()
File "/usr/lib64/python2.7/site-packages/pandas/io/formats/csvs.py", line 172, in save
self._save()
File "/usr/lib64/python2.7/site-packages/pandas/io/formats/csvs.py", line 288, in _save
self._save_chunk(start_i, end_i)
File "/usr/lib64/python2.7/site-packages/pandas/io/formats/csvs.py", line 315, in _save_chunk
self.cols, self.writer)
File "pandas/_libs/writers.pyx", line 75, in pandas._libs.writers.write_csv_rows
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 1: ordinal not in range(128)
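
The traceback shows to_csv falling back to the ASCII codec on Python 2.7 and failing on the character é. pandas' to_csv accepts an encoding argument, so passing encoding='utf-8' in the load step is one likely fix. The sketch below illustrates the same idea using only the Python 3 standard library; the function name and the sample row are made up for illustration.

```python
import csv
import io
import os
import tempfile


def write_summary(path, header, rows):
    """Write rows to a CSV file with an explicit UTF-8 encoding."""
    # newline="" lets the csv module control line endings itself.
    with io.open(path, "w", encoding="utf-8", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)


# Illustrative data containing a non-ASCII character.
output_path = os.path.join(tempfile.mkdtemp(), "resume_summary.csv")
write_summary(output_path, ["index", "candidate_name"], [[0, "Ren\u00e9e Example"]])

# Read the file back with the same encoding to confirm the round trip.
with io.open(output_path, encoding="utf-8", newline="") as handle:
    written_rows = list(csv.reader(handle))
```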

"Can you please describe how to extract work segment"?

Hi bjherger,

Thank you so much; I appreciate your work.

I ran your code and modified it locally to fit my requirements. It works fine for some cases. I am testing it with different PDF layouts, and it mostly works, but now I am stuck on a problem.
My pdf_to_text_list is below:

'b"Software Developer',
 'Software Developer - Coulsdon Sixth Form College',
 'London',
 'Work Experience',
 'Software Developer',
 'Coulsdon Sixth Form College - Coulsdon, Surrey, UK',
 'June 2015 to Present',
 'Roles and Responsibilities: ',
 '\\xe2\\x80\\xa2 Core developer to build the college information database and add on new features to the web applications. ',
 '\\xe2\\x80\\xa2 Develops several Groovy Restful API endpoints in spring and tests it in swagger-ui. ',

Now I am trying to get the work_segment from the above text, but it gives me only the result below:
['Work Experience', 'Software Developer']

It does not reach the 'June 2015 to Present' line, even though neither this line nor any of its words is in the education keywords, skill keywords, project keywords, other keywords or education degree keywords.

Can you please help me with this?
Why does it not take that particular sentence, and why does it break out of the while loop?

Thanks in advance.

Environment.yml is over specified

The project only really needs textract, spacy and pandas. Should remove everything else

Building requirements fails

Hi,

I installed Python 2.7 because in 3 the build fails.

Unfortunately it also fails in 2.7 with these errors

ERROR: Command errored out with exit status 1:
command: /usr/bin/python /usr/local/lib/python2.7/dist-packages/pip/_vendor/pep517/_in_process.py build_wheel /tmp/tmpvQx73l
cwd: /tmp/pip-install-0pWXCD/murmurhash
Complete output (26 lines):
running bdist_wheel
running build
running build_py
creating build
creating build/lib.linux-x86_64-2.7
creating build/lib.linux-x86_64-2.7/murmurhash
copying murmurhash/about.py -> build/lib.linux-x86_64-2.7/murmurhash
copying murmurhash/init.py -> build/lib.linux-x86_64-2.7/murmurhash
creating build/lib.linux-x86_64-2.7/murmurhash/tests
copying murmurhash/tests/test_import.py -> build/lib.linux-x86_64-2.7/murmurhash/tests
copying murmurhash/tests/test_against_mmh3.py -> build/lib.linux-x86_64-2.7/murmurhash/tests
copying murmurhash/tests/init.py -> build/lib.linux-x86_64-2.7/murmurhash/tests
copying murmurhash/mrmr.pyx -> build/lib.linux-x86_64-2.7/murmurhash
copying murmurhash/init.pxd -> build/lib.linux-x86_64-2.7/murmurhash
copying murmurhash/mrmr.pxd -> build/lib.linux-x86_64-2.7/murmurhash
creating build/lib.linux-x86_64-2.7/murmurhash/include
creating build/lib.linux-x86_64-2.7/murmurhash/include/murmurhash
copying murmurhash/include/murmurhash/MurmurHash3.h -> build/lib.linux-x86_64-2.7/murmurhash/include/murmurhash
copying murmurhash/include/murmurhash/MurmurHash2.h -> build/lib.linux-x86_64-2.7/murmurhash/include/murmurhash
running build_ext
building 'murmurhash.mrmr' extension
creating build/temp.linux-x86_64-2.7
creating build/temp.linux-x86_64-2.7/murmurhash
x86_64-linux-gnu-gcc -pthread -fno-strict-aliasing -Wdate-time -D_FORTIFY_SOURCE=2 -g -fdebug-prefix-map=/build/python2.7-U5f0ID/python2.7-2.7.18=. -fstack-protector-strong -Wformat -Werror=format-security -fPIC -I/usr/include/python2.7 -I/tmp/pip-install-0pWXCD/murmurhash/murmurhash/include -I/usr/include/python2.7 -c murmurhash/mrmr.cpp -o build/temp.linux-x86_64-2.7/murmurhash/mrmr.o -O3 -Wno-strict-prototypes -Wno-unused-function
unable to execute 'x86_64-linux-gnu-gcc': No such file or directory
error: command 'x86_64-linux-gnu-gcc' failed with exit status 1

ERROR: Failed building wheel for murmurhash
Building wheel for srsly (PEP 517) ... error
ERROR: Command errored out with exit status 1:
command: /usr/bin/python /usr/local/lib/python2.7/dist-packages/pip/_vendor/pep517/_in_process.py build_wheel /tmp/tmp6qx4HF
cwd: /tmp/pip-install-0pWXCD/srsly
Complete output (76 lines):
running bdist_wheel
running build
running build_py
creating build
creating build/lib.linux-x86_64-2.7
creating build/lib.linux-x86_64-2.7/srsly
copying srsly/_pickle_api.py -> build/lib.linux-x86_64-2.7/srsly
copying srsly/about.py -> build/lib.linux-x86_64-2.7/srsly
copying srsly/_json_api.py -> build/lib.linux-x86_64-2.7/srsly
copying srsly/init.py -> build/lib.linux-x86_64-2.7/srsly
copying srsly/util.py -> build/lib.linux-x86_64-2.7/srsly
copying srsly/_msgpack_api.py -> build/lib.linux-x86_64-2.7/srsly
creating build/lib.linux-x86_64-2.7/srsly/cloudpickle
copying srsly/cloudpickle/cloudpickle.py -> build/lib.linux-x86_64-2.7/srsly/cloudpickle
copying srsly/cloudpickle/init.py -> build/lib.linux-x86_64-2.7/srsly/cloudpickle
creating build/lib.linux-x86_64-2.7/srsly/ujson
copying srsly/ujson/init.py -> build/lib.linux-x86_64-2.7/srsly/ujson
creating build/lib.linux-x86_64-2.7/srsly/msgpack
copying srsly/msgpack/_version.py -> build/lib.linux-x86_64-2.7/srsly/msgpack
copying srsly/msgpack/_ext_type.py -> build/lib.linux-x86_64-2.7/srsly/msgpack
copying srsly/msgpack/exceptions.py -> build/lib.linux-x86_64-2.7/srsly/msgpack
copying srsly/msgpack/_msgpack_numpy.py -> build/lib.linux-x86_64-2.7/srsly/msgpack
copying srsly/msgpack/init.py -> build/lib.linux-x86_64-2.7/srsly/msgpack
copying srsly/msgpack/util.py -> build/lib.linux-x86_64-2.7/srsly/msgpack
creating build/lib.linux-x86_64-2.7/srsly/tests
copying srsly/tests/test_pickle_api.py -> build/lib.linux-x86_64-2.7/srsly/tests
copying srsly/tests/test_msgpack_api.py -> build/lib.linux-x86_64-2.7/srsly/tests
copying srsly/tests/init.py -> build/lib.linux-x86_64-2.7/srsly/tests
copying srsly/tests/test_json_api.py -> build/lib.linux-x86_64-2.7/srsly/tests
creating build/lib.linux-x86_64-2.7/srsly/tests/cloudpickle
copying srsly/tests/cloudpickle/testutils.py -> build/lib.linux-x86_64-2.7/srsly/tests/cloudpickle
copying srsly/tests/cloudpickle/init.py -> build/lib.linux-x86_64-2.7/srsly/tests/cloudpickle
copying srsly/tests/cloudpickle/cloudpickle_file_test.py -> build/lib.linux-x86_64-2.7/srsly/tests/cloudpickle
creating build/lib.linux-x86_64-2.7/srsly/tests/ujson
copying srsly/tests/ujson/test_ujson.py -> build/lib.linux-x86_64-2.7/srsly/tests/ujson
copying srsly/tests/ujson/init.py -> build/lib.linux-x86_64-2.7/srsly/tests/ujson
creating build/lib.linux-x86_64-2.7/srsly/tests/msgpack
copying srsly/tests/msgpack/test_pack.py -> build/lib.linux-x86_64-2.7/srsly/tests/msgpack
copying srsly/tests/msgpack/test_buffer.py -> build/lib.linux-x86_64-2.7/srsly/tests/msgpack
copying srsly/tests/msgpack/test_numpy.py -> build/lib.linux-x86_64-2.7/srsly/tests/msgpack
copying srsly/tests/msgpack/test_extension.py -> build/lib.linux-x86_64-2.7/srsly/tests/msgpack
copying srsly/tests/msgpack/test_newspec.py -> build/lib.linux-x86_64-2.7/srsly/tests/msgpack
copying srsly/tests/msgpack/test_unpack.py -> build/lib.linux-x86_64-2.7/srsly/tests/msgpack
copying srsly/tests/msgpack/test_memoryview.py -> build/lib.linux-x86_64-2.7/srsly/tests/msgpack
copying srsly/tests/msgpack/test_limits.py -> build/lib.linux-x86_64-2.7/srsly/tests/msgpack
copying srsly/tests/msgpack/test_read_size.py -> build/lib.linux-x86_64-2.7/srsly/tests/msgpack
copying srsly/tests/msgpack/test_except.py -> build/lib.linux-x86_64-2.7/srsly/tests/msgpack
copying srsly/tests/msgpack/test_subtype.py -> build/lib.linux-x86_64-2.7/srsly/tests/msgpack
copying srsly/tests/msgpack/test_format.py -> build/lib.linux-x86_64-2.7/srsly/tests/msgpack
copying srsly/tests/msgpack/init.py -> build/lib.linux-x86_64-2.7/srsly/tests/msgpack
copying srsly/tests/msgpack/test_seq.py -> build/lib.linux-x86_64-2.7/srsly/tests/msgpack
copying srsly/tests/msgpack/test_case.py -> build/lib.linux-x86_64-2.7/srsly/tests/msgpack
copying srsly/tests/msgpack/test_stricttype.py -> build/lib.linux-x86_64-2.7/srsly/tests/msgpack
copying srsly/tests/msgpack/test_sequnpack.py -> build/lib.linux-x86_64-2.7/srsly/tests/msgpack
copying srsly/ujson/JSONtoObj.c -> build/lib.linux-x86_64-2.7/srsly/ujson
copying srsly/ujson/objToJSON.c -> build/lib.linux-x86_64-2.7/srsly/ujson
copying srsly/ujson/ujson.c -> build/lib.linux-x86_64-2.7/srsly/ujson
copying srsly/ujson/version.h -> build/lib.linux-x86_64-2.7/srsly/ujson
copying srsly/ujson/py_defines.h -> build/lib.linux-x86_64-2.7/srsly/ujson
copying srsly/msgpack/_unpacker.pyx -> build/lib.linux-x86_64-2.7/srsly/msgpack
copying srsly/msgpack/packer.pyx -> build/lib.linux-x86_64-2.7/srsly/msgpack
copying srsly/msgpack/unpack.h -> build/lib.linux-x86_64-2.7/srsly/msgpack
copying srsly/msgpack/pack_template.h -> build/lib.linux-x86_64-2.7/srsly/msgpack
copying srsly/msgpack/unpack_define.h -> build/lib.linux-x86_64-2.7/srsly/msgpack
copying srsly/msgpack/pack.h -> build/lib.linux-x86_64-2.7/srsly/msgpack
copying srsly/msgpack/sysdep.h -> build/lib.linux-x86_64-2.7/srsly/msgpack
copying srsly/msgpack/unpack_template.h -> build/lib.linux-x86_64-2.7/srsly/msgpack
copying srsly/msgpack/buff_converter.h -> build/lib.linux-x86_64-2.7/srsly/msgpack
running build_ext
building 'srsly.msgpack.unpacker' extension
creating build/temp.linux-x86_64-2.7
creating build/temp.linux-x86_64-2.7/srsly
creating build/temp.linux-x86_64-2.7/srsly/msgpack
x86_64-linux-gnu-gcc -pthread -fno-strict-aliasing -Wdate-time -D_FORTIFY_SOURCE=2 -g -fdebug-prefix-map=/build/python2.7-U5f0ID/python2.7-2.7.18=. -fstack-protector-strong -Wformat -Werror=format-security -fPIC -D__LITTLE_ENDIAN=1 -Isrsly/msgpack -I/usr/include/python2.7 -I. -I/tmp/pip-install-0pWXCD/srsly/include -I/usr/include/python2.7 -c srsly/msgpack/_unpacker.cpp -o build/temp.linux-x86_64-2.7/srsly/msgpack/_unpacker.o -O2 -Wno-strict-prototypes -Wno-unused-function
unable to execute 'x86_64-linux-gnu-gcc': No such file or directory
error: command 'x86_64-linux-gnu-gcc' failed with exit status 1

ERROR: Failed building wheel for srsly
Building wheel for cymem (PEP 517) ... error
ERROR: Command errored out with exit status 1:
command: /usr/bin/python /usr/local/lib/python2.7/dist-packages/pip/_vendor/pep517/_in_process.py build_wheel /tmp/tmpXNmkZC
cwd: /tmp/pip-install-0pWXCD/cymem
Complete output (21 lines):
running bdist_wheel
running build
running build_py
creating build
creating build/lib.linux-x86_64-2.7
creating build/lib.linux-x86_64-2.7/cymem
copying cymem/about.py -> build/lib.linux-x86_64-2.7/cymem
copying cymem/init.py -> build/lib.linux-x86_64-2.7/cymem
package init file 'cymem/tests/init.py' not found (or not a regular file)
creating build/lib.linux-x86_64-2.7/cymem/tests
copying cymem/tests/test_import.py -> build/lib.linux-x86_64-2.7/cymem/tests
copying cymem/cymem.pyx -> build/lib.linux-x86_64-2.7/cymem
copying cymem/init.pxd -> build/lib.linux-x86_64-2.7/cymem
copying cymem/cymem.pxd -> build/lib.linux-x86_64-2.7/cymem
running build_ext
building 'cymem.cymem' extension
creating build/temp.linux-x86_64-2.7
creating build/temp.linux-x86_64-2.7/cymem
x86_64-linux-gnu-gcc -pthread -fno-strict-aliasing -Wdate-time -D_FORTIFY_SOURCE=2 -g -fdebug-prefix-map=/build/python2.7-U5f0ID/python2.7-2.7.18=. -fstack-protector-strong -Wformat -Werror=format-security -fPIC -I/usr/include/python2.7 -I/usr/include/python2.7 -c cymem/cymem.cpp -o build/temp.linux-x86_64-2.7/cymem/cymem.o -O3 -Wno-strict-prototypes -Wno-unused-function
unable to execute 'x86_64-linux-gnu-gcc': No such file or directory
error: command 'x86_64-linux-gnu-gcc' failed with exit status 1

ERROR: Failed building wheel for cymem
Building wheel for wasabi (setup.py) ... done
Created wheel for wasabi: filename=wasabi-0.8.2-py2-none-any.whl size=23854 sha256=506b103cb21740ca634120e92daf5ca5d8c05f760ab6e6284b4fa0e6a4fa1723
Stored in directory: /home/lance/.cache/pip/wheels/e5/0d/73/60652093a6c3b50a2c65cd76ff115ef2e2dec3e923003a8ad4
Building wheel for pathlib (setup.py) ... done
Created wheel for pathlib: filename=pathlib-1.0.1-py2-none-any.whl size=14347 sha256=41182051f92aa48e962bbc560c258ceafae8e2f436aa67914f6966f53e3ee79f
Stored in directory: /home/lance/.cache/pip/wheels/46/37/4f/332bcea757140ff34e14dec7be65931f544c7ac94eb671ae9f
Building wheel for functools32 (setup.py) ... done
Created wheel for functools32: filename=functools32-3.2.3.post2-py2-none-any.whl size=14556 sha256=dad93770ee1ce525099f7bac51841217d0dd1759568e3add91bf29b2890d51b3
Stored in directory: /home/lance/.cache/pip/wheels/c2/ea/a3/25af52265fad6418a74df0b8d9ca8b89e0b3735dbd4d0d3794
Successfully built PyYAML smart-open wasabi pathlib functools32
Failed to build murmurhash srsly cymem
ERROR: Could not build wheels for murmurhash, srsly, cymem which use PEP 517 and cannot be installed directly

Can you help me fetch the name separately?

Hi, amazing job, but it would be helpful if I could fetch the name from the resume.

only PDF files are supported

Hi Brendan,

I was able to resolve all the dependency and environmental issues, and the program is executing well.
One limitation I found is that it works only with PDF files; MS Word files are not supported.
Can you provide a solution for this?

Thanks
Rishi

can't find model 'en'

getting this error when launching main.py
guess something in confs?
IOError: [E050] Can't find model 'en'. It doesn't seem to be a shortcut link, a Python package or a valid path to a data directory.

Textract Dependency Issues

I installed Anaconda and followed the install instructions to the letter. When I run python main.py I get the following error:

File "/Users/Scott/anaconda2/envs/resume/lib/python2.7/site-packages/textract/parsers/utils.py", line 101, in run
    ' '.join(args), pipe.returncode, stdout, stderr,
textract.exceptions.ShellError: The command `pdf2txt.py ../data/input/example_resumes/Brendan_Herger_Resume.pdf` failed because the executable
`pdf2txt.py` is not installed on your system.

pdf2txt is a dependency of textract, but I notice that it's commented out of the requirements.txt in 1.6.1. I tried to manually install it, but it didn't resolve the issue.

[Feature Request] Can you provide API calls?

Hey @bjherger , I have been using your old Resume Parser itself(previous version).
Can you help me by providing REST API calls?

Getting Package errors

Traceback (most recent call last):
File "G:\ResumeParser-master\bin\main.py", line 111, in
main()
File "G:\ResumeParser-master\bin\main.py", line 33, in main
observations = extract()
File "G:\ResumeParser-master\bin\main.py", line 68, in extract
observations['text'] = observations['file_path'].apply(lib.convert_pdf)
File "C:\Users\Admin\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\core\series.py", line 3591, in apply
mapped = lib.map_infer(values, f, convert=convert_dtype)
File "pandas_libs\lib.pyx", line 2217, in pandas._libs.lib.map_infer
File "G:\ResumeParser-master\bin\lib.py", line 140, in convert_pdf
return open(output_filepath).read()
File "C:\Users\Admin\AppData\Local\Programs\Python\Python37\lib\encodings\cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 481: character maps to <undefined>

view.encoding()
Traceback (most recent call last):
File "<pyshell#0>", line 1, in
view.encoding()
NameError: name 'view' is not defined

For single alphabet skills such as C or R it shows every doc/pdf has that skill

For single-letter skills such as R or C, the output isn't as expected: every resume is reported as having the skill. This looks like it is caused by the regex search, which finds the letters R and C in all documents. Maybe the approach should be revisited, and instead of a regex search we could look at NLP techniques such as NER to identify these skills.
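
Short of switching to NER, a stricter regex may already remove most of these false positives. The sketch below is an assumption about how such matching could look, not the repository's current code: skill_pattern and has_skill are hypothetical names, single-letter skills are matched case-sensitively, and lookarounds are used in place of \b so that C does not match inside C++ or C#.

```python
import re

# Characters that may be part of a skill token (letters, digits, '+', '#').
_TOKEN_CHARS = "A-Za-z0-9+#"


def skill_pattern(skill):
    """Compile a regex that matches `skill` only as a standalone token."""
    # Single letters such as "R" or "C" are matched case-sensitively;
    # longer skills are matched case-insensitively.
    flags = 0 if len(skill) == 1 else re.IGNORECASE
    return re.compile(
        "(?<![" + _TOKEN_CHARS + "])" + re.escape(skill) + "(?![" + _TOKEN_CHARS + "])",
        flags,
    )


def has_skill(text, skill):
    """Return True if `skill` appears in `text` as a standalone token."""
    return skill_pattern(skill).search(text) is not None


resume_text = "Experienced in R, Python and C++"
found = {skill: has_skill(resume_text, skill) for skill in ["R", "C", "C++", "python"]}
```

This still treats any isolated capital R or C as a hit (for example in "R&D"), so it narrows the problem without fully solving it.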

Python 3 support

Unicode Decoding Error

I am getting the following error when trying out the code on a resume in English.

Error
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 481: character maps to <undefined>

Traceback
Traceback (most recent call last):
File "main.py", line 116, in
main()
File "main.py", line 33, in main
observations = extract()
File "main.py", line 68, in extract
observations['text'] = observations['file_path'].apply(lib.convert_pdf)
File "C:\Users\pgovindaraju\Desktop\Python_Projects\Talent-Acquisition\venv\lib\site-packages\pandas\core\series.py", line 3591, in apply
mapped = lib.map_infer(values, f, convert=convert_dtype)
File "pandas_libs\lib.pyx", line 2217, in pandas._libs.lib.map_infer
File "C:\Users\pgovindaraju\Desktop\Python_Projects\Talent-Acquisition\ResumeParser\bin\lib.py", line 140, in convert_pdf
return open(output_filepath).read()
File "C:\Users\pgovindaraju\Desktop\Python_Projects\Talent-Acquisition\venv\lib\encodings\cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 481: character maps to <undefined>

Syntax error with Resumechecker.py

Resumes not able to be parsed

Hey there, I'm running ResumeParser on Ubuntu 18.04.1 in a VirtualBox and I am running into issues when trying to parse through a set of nine resumes that I have obtained.

My environment sets up fine and the code runs, but when I look at the output .csv file, I'm finding that only three of the resumes are actually able to be parsed, while the rest have 'NOT FOUND' in the text field and blanks for all the skills that I have defined to be extracted in my configuration .yaml file.

When I try the sample resumes provided in the repository, the code parses them perfectly and is able to find relevant text, even with my specified configurations. The code seems to break when I try to use a different set of resumes.

I'm wondering if this is an issue with the resumes that I am using or if perhaps I can change configurations or something else to make the code parse them? Let me know your thoughts. Thanks!

Extracting cvs not able to complete

UnicodeDecoding Error

UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 480: character maps to <undefined>

Does anyone know what causes this?
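
The cp1252.py frame in the tracebacks reported for this error suggests that open(output_filepath).read() is decoding with the Windows default code page, in which byte 0x81 is undefined. A possible fix, assuming the extracted text is UTF-8 (the source does not confirm the encoding), is to open the file with an explicit encoding. read_text is a hypothetical helper, not a function in this repository.

```python
import io
import os
import tempfile


def read_text(path, encoding="utf-8"):
    """Read a text file with an explicit encoding instead of the platform default."""
    # errors="replace" turns undecodable bytes into U+FFFD instead of raising.
    with io.open(path, encoding=encoding, errors="replace") as handle:
        return handle.read()


# Demonstration file containing valid UTF-8 plus a stray 0x81 byte.
sample_path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with io.open(sample_path, "wb") as handle:
    handle.write(b"caf\xc3\xa9 \x81")

sample_text = read_text(sample_path)
```

Replacing undecodable bytes loses a character here and there, which is usually acceptable for keyword matching; use errors="strict" to surface such files instead.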

Sample Resumes Incorrect Output

For the sample resumes provided, why are we getting outputs such as ['Vrije', 'MIT'] in the universities field, even though they are not included in the resume, but only in the config file?

WindowsError when launching for first time

C:\Users\Acer\Anaconda3\envs\myenv\lib\site-packages\gensim\utils.py:1197: UserWarning: detected Windows; aliasing chunkize to chunkize_serial
warnings.warn("detected Windows; aliasing chunkize to chunkize_serial")
INFO:root:Begin extract
C:\Users\Acer\Documents\PFE\ResumeParser\bin\lib.py:25: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
CONFS = yaml.load(open(confs_path))
INFO:root:Found 5 candidate files
INFO:root:Subset candidate files to extensions w/ available parsers. 5 files remain
Traceback (most recent call last):
File "main.py", line 111, in
main()
File "main.py", line 33, in main
observations = extract()
File "main.py", line 68, in extract
observations['text'] = observations['file_path'].apply(lib.convert_pdf)
File "C:\Users\Acer\Anaconda3\envs\myenv\lib\site-packages\pandas\core\series.py", line 3591, in apply
mapped = lib.map_infer(values, f, convert=convert_dtype)
File "pandas/_libs/lib.pyx", line 2217, in pandas._libs.lib.map_infer
File "C:\Users\Acer\Documents\PFE\ResumeParser\bin\lib.py", line 130, in convert_pdf
stdout=subprocess.PIPE, stderr=subprocess.PIPE,
File "C:\Users\Acer\Anaconda3\envs\myenv\lib\subprocess.py", line 394, in init
errread, errwrite)
File "C:\Users\Acer\Anaconda3\envs\myenv\lib\subprocess.py", line 644, in _execute_child
startupinfo)
WindowsError: [Error 193] %1 is not a valid Win32 application

Getting this error when launching the main.py ..

Can't successfully run requirements.txt

/tmp/pip-install-l5j75cwk/pandas/.eggs/numpy-1.20.1-py3.8-linux-x86_64.egg/numpy/core/include/numpy/npy_1_7_deprecated_api.h:17:2: warning: #warning "Using deprecated NumPy API, disable it with " "#define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-Wcpp]
17 | #warning "Using deprecated NumPy API, disable it with "
| ^
pandas/_libs/algos.c: In function ‘__Pyx_modinit_type_init_code’:
pandas/_libs/algos.c:158842:3: warning: ‘tp_print’ is deprecated [-Wdeprecated-declarations]
158842 | __pyx_scope_struct____Pyx_CFunc_object__ndarrayobject_to_py.tp_print = 0;
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /usr/include/python3.8/object.h:746,
from /usr/include/python3.8/pytime.h:6,
from /usr/include/python3.8/Python.h:85,
from pandas/_libs/algos.c:38:
/usr/include/python3.8/cpython/object.h:260:30: note: declared here
260 | Py_DEPRECATED(3.8) int (tp_print)(PyObject , FILE , int);
| ^~
pandas/_libs/algos.c:158850:3: warning: ‘tp_print’ is deprecated [-Wdeprecated-declarations]
158850 | __pyx_type___pyx_array.tp_print = 0;
| ^~~~~~~~~~~~~~~~
In file included from /usr/include/python3.8/object.h:746,
from /usr/include/python3.8/pytime.h:6,
from /usr/include/python3.8/Python.h:85,
from pandas/_libs/algos.c:38:
/usr/include/python3.8/cpython/object.h:260:30: note: declared here
260 | Py_DEPRECATED(3.8) int (tp_print)(PyObject , FILE , int);
| ^~
pandas/_libs/algos.c:158855:3: warning: ‘tp_print’ is deprecated [-Wdeprecated-declarations]
158855 | __pyx_type___pyx_MemviewEnum.tp_print = 0;
| ^~~~~~~~~~~~~~~~~~~~~~
In file included from /usr/include/python3.8/object.h:746,
from /usr/include/python3.8/pytime.h:6,
from /usr/include/python3.8/Python.h:85,
from pandas/_libs/algos.c:38:
/usr/include/python3.8/cpython/object.h:260:30: note: declared here
260 | Py_DEPRECATED(3.8) int (tp_print)(PyObject , FILE , int);
| ^~
pandas/_libs/algos.c:158870:3: warning: ‘tp_print’ is deprecated [-Wdeprecated-declarations]
158870 | __pyx_type___pyx_memoryview.tp_print = 0;
| ^~~~~~~~~~~~~~~~~~~~~
In file included from /usr/include/python3.8/object.h:746,
from /usr/include/python3.8/pytime.h:6,
from /usr/include/python3.8/Python.h:85,
from pandas/_libs/algos.c:38:
/usr/include/python3.8/cpython/object.h:260:30: note: declared here
260 | Py_DEPRECATED(3.8) int (tp_print)(PyObject , FILE , int);
| ^~
pandas/_libs/algos.c:158883:3: warning: ‘tp_print’ is deprecated [-Wdeprecated-declarations]
158883 | __pyx_type___pyx_memoryviewslice.tp_print = 0;
| ^~~~~~~~~~~~~~~~~~~~~~~~
In file included from /usr/include/python3.8/object.h:746,
from /usr/include/python3.8/pytime.h:6,
from /usr/include/python3.8/Python.h:85,
from pandas/_libs/algos.c:38:
/usr/include/python3.8/cpython/object.h:260:30: note: declared here
260 | Py_DEPRECATED(3.8) int (tp_print)(PyObject , FILE *, int);
| ^~
x86_64-linux-gnu-gcc: fatal error: Killed signal terminated program cc1
compilation terminated.
error: command 'x86_64-linux-gnu-gcc' failed with exit status 1

ERROR: Failed building wheel for pandas

Failed to build pandas

Traceback (most recent call last):
File "", line 1, in
File "/tmp/pip-install-l5j75cwk/pandas/setup.py", line 730, in
setup(name=DISTNAME,
File "/usr/lib/python3/dist-packages/setuptools/init.py", line 144, in setup
return distutils.core.setup(**attrs)
File "/usr/lib/python3.8/distutils/core.py", line 148, in setup
dist.run_commands()
File "/usr/lib/python3.8/distutils/dist.py", line 966, in run_commands
self.run_command(cmd)
File "/usr/lib/python3.8/distutils/dist.py", line 985, in run_command
cmd_obj.run()
File "/usr/lib/python3/dist-packages/setuptools/command/install.py", line 61, in run
return orig.install.run(self)
File "/usr/lib/python3.8/distutils/command/install.py", line 589, in run
self.run_command('build')
File "/usr/lib/python3.8/distutils/cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "/usr/lib/python3.8/distutils/dist.py", line 985, in run_command
cmd_obj.run()
File "/usr/lib/python3.8/distutils/command/build.py", line 135, in run
self.run_command(cmd_name)
File "/usr/lib/python3.8/distutils/cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "/usr/lib/python3.8/distutils/dist.py", line 985, in run_command
cmd_obj.run()
File "/usr/lib/python3.8/distutils/command/build_ext.py", line 340, in run
self.build_extensions()
File "/tmp/pip-install-l5j75cwk/pandas/setup.py", line 372, in build_extensions
self.check_cython_extensions(self.extensions)
File "/tmp/pip-install-l5j75cwk/pandas/setup.py", line 366, in check_cython_extensions
raise Exception("""Cython-generated file '{src}' not found.
Exception: Cython-generated file 'pandas/_libs/algos.c' not found.
Cython is required to compile pandas from a development branch.
Please install Cython or download a release package of pandas.

----------------------------------------

ERROR: Command errored out with exit status 1: /usr/bin/python3 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-l5j75cwk/pandas/setup.py'"'"'; __file__='"'"'/tmp/pip-install-l5j75cwk/pandas/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-vg31y4zu/install-record.txt --single-version-externally-managed --compile --install-headers /usr/local/include/python3.8/pandas
Check the logs for full command output.

undefined symbol: PyFPE_jbuf

Traceback (most recent call last):
File "main.py", line 13, in <module>
import spacy
File "/user/.conda/envs/resume/lib/python2.7/site-packages/spacy/__init__.py", line 5, in <module>
from .deprecated import resolve_model_name
File "/user/.conda/envs/resume/lib/python2.7/site-packages/spacy/deprecated.py", line 8, in <module>
from .cli import download
File "/user/.conda/envs/resume/lib/python2.7/site-packages/spacy/cli/__init__.py", line 5, in <module>
from .train import train, train_config
File "/user/.conda/envs/resume/lib/python2.7/site-packages/spacy/cli/train.py", line 8, in <module>
from ..scorer import Scorer
File "/user/.conda/envs/resume/lib/python2.7/site-packages/spacy/scorer.py", line 4, in <module>
from .gold import tags_to_entities
File "spacy/morphology.pxd", line 25, in init spacy.gold (spacy/gold.cpp:23505)
File "spacy/vocab.pxd", line 27, in init spacy.morphology (spacy/morphology.cpp:10713)
File ".env/lib/python2.7/site-packages/preshed/counter.pxd", line 13, in init spacy.vocab (spacy/vocab.cpp:19474)
ImportError: /user/.local/lib/python2.7/site-packages/preshed/counter.so: undefined symbol: PyFPE_jbuf
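
One detail in the traceback stands out: the failing `counter.so` is loaded from `/user/.local/...`, the per-user site-packages directory, while the rest of spaCy comes from the `resume` conda environment. Mixing compiled extensions built against different Python builds is a common cause of `undefined symbol` errors, though that is an assumption about this report and not a confirmed diagnosis. A minimal sketch for checking where a module was loaded from (the helper name `is_from_user_site` is hypothetical):

```python
import os
import site


def is_from_user_site(module_file, user_site=None):
    """Return True if module_file lives under the per-user site-packages directory."""
    if user_site is None:
        # site.getusersitepackages() reports the per-user directory for this interpreter.
        user_site = site.getusersitepackages()
    module_path = os.path.normpath(os.path.abspath(module_file))
    user_path = os.path.normpath(os.path.abspath(user_site))
    # Compare with a trailing separator so ".../site-packages2" does not match ".../site-packages".
    return module_path.startswith(user_path + os.sep)


if __name__ == "__main__":
    # Paths copied from the traceback above; user_site is passed explicitly
    # so the check does not depend on the machine running this sketch.
    print(is_from_user_site(
        "/user/.local/lib/python2.7/site-packages/preshed/counter.so",
        user_site="/user/.local/lib/python2.7/site-packages",
    ))
```

If a compiled package does turn out to come from the per-user directory, running the interpreter with `python -s` or setting `PYTHONNOUSERSITE=1` keeps that directory off `sys.path`.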

NoPackagesFoundError: Packages missing in current win-64 channels

Hi bjherger,

First of all, huge thanks for ResumeParser!

Unfortunately, when I try to create the environment from environment.yml on 64-bit Windows 10 with Anaconda for Python 2.7, I get the following error.

Could you please help me out here? I tried different Conda and Python versions, but nothing worked.

Thanks a lot!

`Using Anaconda API: https://api.anaconda.org
Fetching package metadata .........
Solving package specifications: .
NoPackagesFoundError: Packages missing in current win-64 channels:
  - python 2.7.12 1
  - readline 6.2 2
  - sqlite 3.13.0 0
  - tk 8.5.18 0
  - zlib 1.2.8 3
`
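
The packages listed in the error are pinned to exact versions and build numbers, and build numbers are typically platform-specific. Assuming the `environment.yml` lists its dependencies in conda's `name=version=build` form (an assumption, since the file is not shown here), a sketch that loosens the pins by dropping the build string might look like this. The function name `strip_build` is hypothetical.

```python
def strip_build(spec):
    """Drop the build string from a conda spec such as 'python=2.7.12=1'."""
    parts = spec.strip().split("=")
    # Keep at most name and version; a bare name such as 'pip' is returned unchanged.
    return "=".join(parts[:2])


if __name__ == "__main__":
    # The five packages named in the error message, written as name=version=build.
    pinned = [
        "python=2.7.12=1",
        "readline=6.2=2",
        "sqlite=3.13.0=0",
        "tk=8.5.18=0",
        "zlib=1.2.8=3",
    ]
    for spec in pinned:
        print(strip_build(spec))
```

Some packages may still be unavailable on win-64 even without the build pin, in which case they would need to be removed from the dependency list by hand.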

:Error occurred during regex search

It runs, but I get multiple lines reading ":Error occurred during regex search" and the output is wrong.

License?

Under what license is this code released?

Running into a type mismatch error

Hi, thanks for this package. I was able to resolve the dependencies and get it to run, but then ran into this:

INFO:root:Begin transform
Traceback (most recent call last):
File "main.py", line 111, in <module>
main()
File "main.py", line 39, in main
observations, nlp = transform(observations, nlp)
File "main.py", line 81, in transform
observations['candidate_name'] = observations['text'].apply(lambda x:
File "/Library/Python/2.7/site-packages/pandas/core/series.py", line 3591, in apply
mapped = lib.map_infer(values, f, convert=convert_dtype)
File "pandas/_libs/lib.pyx", line 2217, in pandas._libs.lib.map_infer
File "main.py", line 82, in <lambda>
field_extraction.candidate_name_extractor(x, nlp))
File "/Users/vladin/Downloads/ResumeParser-master/bin/field_extraction.py", line 13, in candidate_name_extractor
doc = nlp(input_string)
File "/Library/Python/2.7/site-packages/spacy/language.py", line 320, in __call__
doc = self.make_doc(text)
File "/Library/Python/2.7/site-packages/spacy/language.py", line 293, in <lambda>
self.make_doc = lambda text: self.tokenizer(text)
TypeError: Argument 'string' has incorrect type (expected unicode, got str)
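
The final line suggests that, under Python 2, the resume text reached spaCy as a byte string (`str`) and not as `unicode`. A minimal sketch of a conversion step, assuming the text is UTF-8 encoded (the helper `to_text` is hypothetical and not part of ResumeParser):

```python
def to_text(value, encoding="utf-8"):
    """Return value as a text string, decoding byte strings if needed."""
    # On Python 2, bytes is an alias for str, so this branch catches plain str values.
    if isinstance(value, bytes):
        # Undecodable bytes are replaced rather than raising UnicodeDecodeError.
        return value.decode(encoding, "replace")
    return value


if __name__ == "__main__":
    print(to_text(b"Jane Doe"))
    print(to_text(u"Jane Doe"))
```

In `main.py` this could plausibly be applied to the `text` column before the name extractor runs, for example `observations['text'].apply(to_text)`.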

Unable to run the project

I get this error when I try to run this command:

source activate resume

Traceback (most recent call last):
File "/anaconda2/lib/python2.7/site-packages/conda/gateways/logging.py", line 64, in emit
msg = self.format(record)
File "/anaconda2/lib/python2.7/logging/__init__.py", line 734, in format
return fmt.format(record)
File "/anaconda2/lib/python2.7/logging/__init__.py", line 465, in format
record.message = record.getMessage()
File "/anaconda2/lib/python2.7/logging/__init__.py", line 329, in getMessage
msg = msg % self.args
File "/anaconda2/lib/python2.7/site-packages/conda/__init__.py", line 43, in __repr__
return '%s: %s' % (self.__class__.__name__, text_type(self))
File "/anaconda2/lib/python2.7/site-packages/conda/__init__.py", line 47, in __str__
return text_type(self.message % self._kwargs)
ValueError: unsupported format character '{' (0x7b) at index 528
Logged from file exceptions.py, line 770