jobcatcher's Introduction

JobCatcher

JobCatcher is a daemon that retrieves job offers from the feeds of multiple job boards and generates custom RSS feeds and HTML reports for you. It is decentralized software meant to run on your own server.

JobCatcher comes with a filter feature, so you can filter company names with blacklists or whitelists.
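As a rough illustration of blacklist/whitelist filtering, here is a minimal sketch. It is not JobCatcher's actual code: the `filter_offers` function, the `company` key and the rule "whitelist wins over blacklist" are all assumptions made for the example.

```python
def filter_offers(offers, blacklist=(), whitelist=()):
    """Split offers into (kept, rejected) by company name.

    Hypothetical rule: a whitelisted company is always kept, a blacklisted
    one is rejected, and anything else is kept. Matching ignores case.
    """
    black = {name.strip().lower() for name in blacklist}
    white = {name.strip().lower() for name in whitelist}
    kept, rejected = [], []
    for offer in offers:
        # 'company' is an assumed key, not a documented JobCatcher field.
        company = offer["company"].strip().lower()
        if company in white or company not in black:
            kept.append(offer)
        else:
            rejected.append(offer)
    return kept, rejected
```

For instance, `filter_offers(offers, blacklist=["Spam Corp"])` would return every offer except those published by "Spam Corp" in the first list.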

Think of it as an RSS feed reader with filtering.

I would like to release this software under the GPLv2 license, but I still need to check that it is compatible with the dependencies I have chosen.

Work in Progress

The project is still in active development and many features remain to be implemented. It is written in Python, and this is the first time I have used Python on a non-trivial project, so my code is probably not very pythonic ... yet. Feel free to help, point out mistakes, or suggest improvements.

Dependencies

python-html2text, python-requests, python-beautifulsoup

Usage (mainly development options for now)

--all              sync the blacklist, fetch the offers and generate the reports
--feeds            download all the feeds in the config
--feed=JOBBOARD    download only the feed from JOBBOARD in the config
--pages            download all the pages in the config
--page=JOBBOARD    download only the pages from JOBBOARD in the config
--inserts          insert all pages into offers
--insert=JOBBOARD  insert JOBBOARD pages into offers
--moves            move data to offers
--move=JOBBOARD    move JOBBOARD data to offers
--clean=JOBBOARD   clean offers from the JOBBOARD source
--report           generate a full report
--version          output version information and exit

Reports are generated into the local "www" directory.

For now I run jobcatcher.py -s from a crontab entry that I set up by hand, but this should change soon.

List of supported Job Boards

  • Apec.fr (France)
  • Cadreonline (France)
  • Eures (Europe)
  • PoleEmploi (France)
  • Progressive Recruitment (France)
  • RegionsJob
  • CentreJob (France)
  • NordJob (France)
  • PacaJob (France)
  • RhoneJob (France)
  • EstJob (France)
  • OuestJob (France)
  • SudOuestJob (France)
  • ParisJob (France)

TODO

  • Lolix.org (France)
  • Linux.com (Int.)
  • L'eXpress-Board (France)
  • Remixjobs.com (France)

Installation

Debian, Ubuntu

# Install packages
apt-get update
apt-get install sqlite3 python-pip git 
pip install virtualenv virtualenvwrapper


# Configure virtualenvwrapper
cat << EOF >> ~/.bashrc
export WORKON_HOME=$HOME/.virtualenvs
export PROJECT_HOME=$HOME/Devel
source /usr/local/bin/virtualenvwrapper.sh
EOF
source ~/.bashrc

# Prepare jobcatcher environment
mkvirtualenv --no-site-packages -p /usr/bin/python2.7 jobcatcher
add2virtualenv /opt/JobCatcher

# Install jobcatcher project
cd /opt
git clone -b unstable https://github.com/badele/JobCatcher.git
cd JobCatcher
pip install -r requirements.txt

Usage

Edit config.py, then run:

workon jobcatcher
python jobcatcher.py --all

Help us add new job boards to JobCatcher! :)

jobcatcher's Issues

Dynamic report: empty pages

It seems that when the last dynamic page contains exactly offers_per_page offers, an empty page is added after it.
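One plausible cause (an assumption, since the report code is not shown here) is a page count computed as `total // offers_per_page + 1`, which adds a page even when the division is exact. A minimal sketch of a ceiling division that avoids the extra page; `page_count` is a hypothetical helper name:

```python
def page_count(total_offers, offers_per_page):
    """Number of pages needed to show total_offers, with no trailing empty page."""
    if offers_per_page <= 0:
        raise ValueError("offers_per_page must be positive")
    # Ceiling division: 60 offers at 30 per page give 2 pages, not 3.
    return (total_offers + offers_per_page - 1) // offers_per_page
```

With 30 offers per page, 60 offers give 2 pages and 61 offers give 3.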

Add 'report' keyword to test.py for configs

$ python test.py
ERROR: test_jobcatcher (__main__.TestPackages)

Execute the jobcatcher functions

Traceback (most recent call last):
  File "test.py", line 157, in test_jobcatcher
    jobcatcher.executeall(configs)
  File "/home/yscialom/Projets/JobCatcher/jobcatcher.py", line 530, in executeall
    generatereport(conf)
  File "/home/yscialom/Projets/JobCatcher/jobcatcher.py", line 535, in generatereport
    r.generate()
  File "/home/yscialom/Projets/JobCatcher/jobcatcher.py", line 221, in generate
    self.generateReport(True)
  File "/home/yscialom/Projets/JobCatcher/jobcatcher.py", line 323, in generateReport
    self.header(report)
  File "/home/yscialom/Projets/JobCatcher/jobcatcher.py", line 239, in header
    if self.configs['report']['dynamic']:
KeyError: 'report'
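Besides adding the 'report' key to the test configuration, the lookup itself could be made tolerant of a missing section. This is only a sketch of that defensive alternative, assuming the configs object is a plain dict; `report_option` is a hypothetical helper, not an existing JobCatcher function.

```python
def report_option(configs, name, default=False):
    """Read configs['report'][name], falling back to default if either key is missing."""
    return configs.get("report", {}).get(name, default)
```

The check from the traceback would then read `if report_option(self.configs, 'dynamic'):`.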

--clean option doesn't work anymore

(jobcatcher)yoann@thinkpad:~/dev/apec$ python jobcatcher.py --clean=Apec
Traceback (most recent call last):
  File "jobcatcher.py", line 1241, in <module>
    clean(configs, options.clean)
  File "jobcatcher.py", line 1095, in clean
    utilities.db_checkandcreate(conf)
  File "/home/yoann/dev/apec/utilities.py", line 183, in db_checkandcreate
    if not db_istableexists(configs, 'offers'):
  File "/home/yoann/dev/apec/utilities.py", line 189, in db_istableexists
    conn = lite.connect(configs['database'])
TypeError: 'Config' object has no attribute '__getitem__'
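The error suggests that `configs` is now a `Config` object while `utilities.py` still subscripts it like a dict. One possible fix, sketched below under that assumption, is to let the class accept both styles. The `Config` class here is a hypothetical stand-in; the real one may look quite different.

```python
class Config(object):
    """Hypothetical stand-in for JobCatcher's Config class."""

    def __init__(self, **values):
        for key, value in values.items():
            setattr(self, key, value)

    def __getitem__(self, key):
        # Lets older code keep writing configs['database'].
        try:
            return getattr(self, key)
        except AttributeError:
            raise KeyError(key)
```

The other option is to update the callers to use attribute access (`configs.database`) everywhere.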

Requests connection errors (Connection reset by peer) crash JobCatcher

(jobcatcher)yoann@yoann:~/dev/perso/JobCatcher$ python jobcatcher.py --all
Download http://www.centrejob.com/fr/rss/flux.aspx?&fonction=10
Traceback (most recent call last):
  File "jobcatcher.py", line 879, in <module>
    executeall(configs)
  File "jobcatcher.py", line 650, in executeall
    downloadfeeds(conf)
  File "jobcatcher.py", line 679, in downloadfeeds
    downloadfeed(conf, jobboardname)
  File "jobcatcher.py", line 672, in downloadfeed
    plugin.downloadFeeds(feeds)
  File "/home/yoann/dev/perso/JobCatcher/jobcatcher.py", line 120, in downloadFeeds
    self.downloadFeed(feed, interval, forcedownload)
  File "/home/yoann/dev/perso/JobCatcher/jobcatcher.py", line 113, in downloadFeed
    utilities.downloadFile(feed['url'], datas, saveto, True, interval)
  File "/home/yoann/dev/perso/JobCatcher/utilities.py", line 151, in downloadFile
    r = download(url, datas)
  File "/home/yoann/dev/perso/JobCatcher/utilities.py", line 125, in download
    r = requests.get(url)
  File "/home/yoann/.virtualenvs/jobcatcher/local/lib/python2.7/site-packages/requests/api.py", line 55, in get
    return request('get', url, **kwargs)
  File "/home/yoann/.virtualenvs/jobcatcher/local/lib/python2.7/site-packages/requests/api.py", line 44, in request
    return session.request(method=method, url=url, **kwargs)
  File "/home/yoann/.virtualenvs/jobcatcher/local/lib/python2.7/site-packages/requests/sessions.py", line 361, in request
    resp = self.send(prep, **send_kwargs)
  File "/home/yoann/.virtualenvs/jobcatcher/local/lib/python2.7/site-packages/requests/sessions.py", line 464, in send
    r = adapter.send(request, **kwargs)
  File "/home/yoann/.virtualenvs/jobcatcher/local/lib/python2.7/site-packages/requests/adapters.py", line 356, in send
    raise ConnectionError(e)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='www.centrejob.com', port=80): Max retries exceeded with url: /fr/rss/flux.aspx?&fonction=10 (Caused by <class 'socket.error'>: [Errno 104] Connection reset by peer)
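A minimal sketch of how one failing feed could be skipped instead of stopping the whole run. `download_all` and its `fetch` parameter are hypothetical names for this example; in JobCatcher `fetch` would presumably be `requests.get`. As far as I know, the exceptions raised by requests derive from `IOError`, which is why that is the type caught here.

```python
def download_all(urls, fetch):
    """Fetch every URL, recording failures instead of raising.

    fetch is any callable taking a URL (for example requests.get).
    Returns (results, failures), both keyed by URL.
    """
    results, failures = {}, {}
    for url in urls:
        try:
            results[url] = fetch(url)
        except IOError as exc:
            # Connection reset, DNS failure, timeout... keep going.
            failures[url] = exc
    return results, failures
```

The caller can then log `failures` and retry those URLs on the next run.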

Add a dynamic pagination system

  • yoannsculo: "I have collected 6481 offers in my database. It usually takes a while (a few seconds) to load the entire page. With the JavaScript, Firefox tells me the jQuery scripts are not responding. I don't know how we could solve this: either we use a paging system or we find another way to lighten the page. If you want, I can send you my database."

  • yscialom: "Since the final aim of JobCatcher is to display fresh job offers to the user, a database as big as 6481 entries seems a bit overkill. Don't we want to automatically delete old offers (say, older than two months)? In addition, I'm willing to add a dynamic pagination system. I'm glad to know, though, that my code isn't working properly for big reports; I'll work on that."
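The automatic clean-up suggested above could look roughly like this. It is a sketch only: the `offers` table name appears in the tracebacks, but the `date_pub` column and the idea that it stores a Unix timestamp are assumptions.

```python
import sqlite3
import time


def delete_old_offers(conn, max_age_days=60, now=None):
    """Delete offers older than max_age_days and return how many were removed.

    Assumes a hypothetical 'date_pub' column holding a Unix timestamp.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    cur = conn.execute("DELETE FROM offers WHERE date_pub < ?", (cutoff,))
    conn.commit()
    return cur.rowcount
```

Run once per sync, this would keep the report to roughly the last two months of offers.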

jobcatcher -s crashes when a job provider website is not (X)HTML compliant?

See attached log:
$ ./jobcatcher.py -s
Fetching CENTREJOB
main date Sat, 12 Oct 2013 00:00:00 +0200
Downloading http://www.centrejob.com/clients/offres_chartees/offre_chartee_modele.aspx?numoffre=77148&de=consultation#xtor=RSS-105217407
Processing 77148
Downloading http://www.centrejob.com/clients/offres_chartees/offre_chartee_modele.aspx?numoffre=77143&de=consultation#xtor=RSS-105217407
Processing 77143
Fetching PROGRESSIVE
Traceback (most recent call last):
  File "./jobcatcher.py", line 220, in <module>
    bot.run()
  File "./jobcatcher.py", line 135, in run
    item.fetch()
  File "/home/yscialom/Etudes/recherche-emploi/JobCatcher/jobboards/Progressive.py", line 103, in fetch
    self.fetch_url(url)
  File "/home/yscialom/Etudes/recherche-emploi/JobCatcher/jobboards/Progressive.py", line 41, in fetch_url
    xmldoc = minidom.parseString( content )
  File "/usr/lib/python2.7/xml/dom/minidom.py", line 1931, in parseString
    return expatbuilder.parseString(string)
  File "/usr/lib/python2.7/xml/dom/expatbuilder.py", line 940, in parseString
    return builder.parseString(string)
  File "/usr/lib/python2.7/xml/dom/expatbuilder.py", line 223, in parseString
    parser.Parse(string, True)
xml.parsers.expat.ExpatError: mismatched tag: line 58, column 6
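A minimal sketch of how the parse step could survive a malformed document: catch `ExpatError` around `minidom.parseString` and skip that page. `parse_feed` is a hypothetical wrapper name, not a function from `Progressive.py`.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def parse_feed(content):
    """Return the parsed document, or None if the content is not well-formed XML."""
    try:
        return minidom.parseString(content)
    except ExpatError as exc:
        print("Skipping malformed document: %s" % exc)
        return None
```

A more tolerant HTML parser (BeautifulSoup is already listed in the dependencies) may be a better long-term answer for pages that are not strict XHTML.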

Dynamic report does not work on google chrome

Google Chrome (tested with version 30) does not accept the following syntax:

attribute: function(arg = default_value) {
   ...
}

It may also refuse any sort of default value for a function argument.

--p2pinit error: undefined getjobboardlist

(jobcatcher)yoann@thinkpad:~/dev/apec$ python jobcatcher.py --p2pinit
Traceback (most recent call last):
  File "jobcatcher.py", line 1307, in <module>
    p.initcache()
  File "jobcatcher.py", line 421, in initcache
    plugins = getjobboardlist(self.configs)
NameError: global name 'getjobboardlist' is not defined

[Report] 'NA' Salary filter seems to filter offers without 'NA' strings

For example, I can see an offer with the 'SMIC' string when NA filtering is disabled. Once I enable it, the SMIC offer disappears. The same happens with other strings such as 'Negociable', 'A determiner', 'selon CCN51', ... which should not disappear until I add them to the filter :P
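The behaviour I would expect from the NA filter can be sketched as an exact match rather than "anything that is not a number". `is_salary_unknown` is a hypothetical helper, and treating an empty string as unknown is an assumption.

```python
def is_salary_unknown(salary):
    """True only when the salary field is missing, empty, or exactly 'NA'."""
    return salary is None or salary.strip().upper() in ("", "NA")
```

With this rule, 'SMIC', 'Negociable' or 'selon CCN51' are left alone when only NA filtering is enabled.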

Use random user agent

To avoid being blocked by job board sites, use a random user agent when downloading pages.
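A minimal sketch of the idea, assuming the downloads go through a function that accepts a headers dict (as `requests.get(url, headers=...)` does). The user agent strings below are illustrative examples only, and `random_headers` is a hypothetical helper name.

```python
import random

# Illustrative values; a real list should be kept up to date.
USER_AGENTS = [
    "Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Firefox/24.0",
    "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:24.0) Gecko/20100101 Firefox/24.0",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:24.0) Gecko/20100101 Firefox/24.0",
]


def random_headers():
    """Build HTTP headers with a user agent picked at random from USER_AGENTS."""
    return {"User-Agent": random.choice(USER_AGENTS)}
```

Each download would then pass `headers=random_headers()` to the HTTP call.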

Jobcatcher is blocked when there is an error in a Jobboard class

Traceback (most recent call last):
  File "jobcatcher.py", line 1261, in <module>
    executeall(configs, selecteduser)
  File "jobcatcher.py", line 1000, in executeall
    downloadfeeds(conf, selecteduser)
  File "jobcatcher.py", line 1032, in downloadfeeds
    plugin = utilities.loadJobBoard(jobboardname, conf)
  File "/home/yoann/dev/perso/JobCatcher/utilities.py", line 175, in loadJobBoard
    'jobboards.%s' % jobboardname
  File "/usr/lib/python2.7/importlib/__init__.py", line 37, in import_module
    __import__(name)
ImportError: No module named cadresonline

It should not be blocking.
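A sketch of a non-blocking loader: import each job board module on its own and record the ones that fail, so the others still run. It reuses the `'jobboards.%s'` naming seen in the traceback; `load_jobboards` itself is a hypothetical helper, and it only covers import failures, not errors raised later by a plugin.

```python
import importlib


def load_jobboards(names, package="jobboards"):
    """Import package.<name> for each name; return (loaded, failed) dicts."""
    loaded, failed = {}, {}
    for name in names:
        try:
            loaded[name] = importlib.import_module("%s.%s" % (package, name))
        except ImportError as exc:
            # A missing or misnamed plugin should not stop the others.
            failed[name] = exc
    return loaded, failed
```

The caller can print `failed` as warnings and carry on with the modules in `loaded`.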

Eures - strftime error in analyzePage

Error in unstable (57e3d23) with jobcatcher -a

==================================
Eures
==================================
Download http://ec.europa.eu/eures/eures-searchengine/servlet/./ShowJvServlet?lg=FR&pesId=62&uniqueJvId=004GKRV&nnImport=false 
Download http://ec.europa.eu/eures/eures-searchengine/servlet/./ShowJvServlet?lg=FR&pesId=62&uniqueJvId=004GPYD&nnImport=false 
Download http://ec.europa.eu/eures/eures-searchengine/servlet/./ShowJvServlet?lg=FR&pesId=62&uniqueJvId=007PFFX&nnImport=false 
Download http://ec.europa.eu/eures/eures-searchengine/servlet/./ShowJvServlet?lg=FR&pesId=62&uniqueJvId=007WGKK&nnImport=false 
Download http://ec.europa.eu/eures/eures-searchengine/servlet/./ShowJvServlet?lg=FR&pesId=62&uniqueJvId=008RSZH&nnImport=false 
Download http://ec.europa.eu/eures/eures-searchengine/servlet/./ShowJvServlet?lg=FR&pesId=62&uniqueJvId=008TBSP&nnImport=false 
Download http://ec.europa.eu/eures/eures-searchengine/servlet/./ShowJvServlet?lg=FR&pesId=62&uniqueJvId=009XWRX&nnImport=false 
Download http://ec.europa.eu/eures/eures-searchengine/servlet/./ShowJvServlet?lg=FR&pesId=62&uniqueJvId=009XWRZ&nnImport=false 
Download http://ec.europa.eu/eures/eures-searchengine/servlet/./ShowJvServlet?lg=FR&pesId=62&uniqueJvId=009XWSG&nnImport=false 
Download http://ec.europa.eu/eures/eures-searchengine/servlet/./ShowJvServlet?lg=FR&pesId=62&uniqueJvId=009XWYB&nnImport=false 
Traceback (most recent call last):
  File "./jobcatcher.py", line 638, in <module>
    executeall(configs)
  File "./jobcatcher.py", line 544, in executeall
    pagesinsert(conf)
  File "./jobcatcher.py", line 574, in pagesinsert
    fd.analyzesPages()
  File "./jobcatcher.py", line 79, in analyzesPages
    plugin.analyzePage(content.url, content.page)
  File "/home/yoann/dev/apec/jobboards/Eures.py", line 81, in analyzePage
    "%d/%m/%Y").strftime('%s')
TypeError: must be string, not None
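The message suggests that `strptime` received `None`, probably because the date could not be found in the page. A minimal sketch of a guarded conversion; `parse_pubdate` is a hypothetical helper, and `time.mktime` is swapped in for `strftime('%s')`, which is not available on every platform.

```python
import time
from datetime import datetime


def parse_pubdate(text, fmt="%d/%m/%Y"):
    """Convert a date string to a Unix timestamp, or None if it is missing or invalid."""
    if not text:
        return None
    try:
        parsed = datetime.strptime(text.strip(), fmt)
    except ValueError:
        return None
    # Interpreted in local time, like strftime('%s') on glibc systems.
    return int(time.mktime(parsed.timetuple()))
```

The caller then has to decide what to do with an offer that has no date, for example skip it or fall back to the download time.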

[Report] Create popups with content of the offers

When we click on an offer, it would be nice to display a popup with :

  • a small openstreetmap square map with a location information
  • the whole offer information and content
  • a link to the jobboard

[Report] Numbers of offers in the menu are not correct

Counter values differ from the number of displayed offers (without any filter)

For instance, my counters show:

    All 575 offers
    538 filtered offers (93.57%)
    37 blacklisted offers (6.43%)

I have 30 offers / page

  • All offers -> 17 pages with 17 offers left on the 18th (total of 527 offers)
  • filtered offers -> 16 pages with 11 offers left on the 17th (total of 491 offers)
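The totals quoted above can be double-checked with a few lines of arithmetic. This is just a worked check of the numbers in this report; `displayed_total` is a throwaway helper.

```python
def displayed_total(full_pages, offers_per_page, offers_on_last_page):
    """Number of offers shown across all pages of a paginated report."""
    return full_pages * offers_per_page + offers_on_last_page


all_shown = displayed_total(17, 30, 17)       # 17 * 30 + 17
filtered_shown = displayed_total(16, 30, 11)  # 16 * 30 + 11

# Gap between each counter and what is actually displayed.
print(575 - all_shown)
print(538 - filtered_shown)
```

So the "All" counter is 48 offers higher than what is displayed, and the "filtered" counter is 47 higher.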

Can't call jobcatcher from www

$ pwd
.../JobCatcher/www
$ ../jobcatcher --all
Traceback (most recent call last):
  File "../jobcatcher.py", line 905, in <module>
    executeall(configs)
  File "../jobcatcher.py", line 675, in executeall
    initblacklist(conf)
  File "../jobcatcher.py", line 691, in initblacklist
    utilities.blocklist_load(conf)
  File "/home/yscialom/Projets/JobCatcher/utilities.py", line 279, in blocklist_load
    fp = open('blacklist_company.txt', 'r')
IOError: [Errno 2] No such file or directory: 'blacklist_company.txt'
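The file is opened with a path relative to the current directory, so it is only found when JobCatcher is started from its own folder. A minimal sketch of resolving the path against the module's location instead; `resource_path` is a hypothetical helper name.

```python
import os


def resource_path(filename, module_file):
    """Return the absolute path of filename, located next to module_file."""
    base_dir = os.path.dirname(os.path.abspath(module_file))
    return os.path.join(base_dir, filename)
```

In `utilities.py` the call would become `open(resource_path('blacklist_company.txt', __file__), 'r')`, which works whatever the working directory is.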
