better-job-finder

Project Goal

To develop a "bring your own model" job search platform where you can label and train your own personalized job recommendation model.

Background

A single job title often covers several different underlying roles, so the title alone cannot guarantee that a posting matches the role you are looking for.

For example, the job title "Data Scientist" is generic and covers several distinct roles. Data science can be broken down into three main roles: data engineer, data analyst, and ML engineer. On job posting websites, results for the particular data science role you want are buried in noise from the other roles. A recommendation model based on your own preferences can help reduce that noise in the job search space.

Demo

Installation
Usage
Initial data collection
Recommendation Model
Data Streaming
Job Finder Platform
Monetization Capability

Installation

Step 1: Clone Repo

git clone https://github.com/Taher-Dohadwala/better-job-finder.git

Step 2: Run setup script
Sets up the directory structure, creates a virtual environment (venv), and installs dependencies.

bash setup.sh

Step 3: Download the starter model and place it in the models/recommendation directory
Google Drive

Usage

To run the Job Finder Platform app

streamlit run job_finder_platform.py

To fine-tune model with new labeled data

python training.py

Initial data collection

The first attempts to scrape data from job posting sites failed: scraping too aggressively led to being blocked by CAPTCHAs.

Project development then continued with three datasets that had already been scraped and posted on Kaggle.

These datasets were explored and combined to form the initial training dataset for the language model.

Recommendation Model

The model is built with the 🤗 Hugging Face Transformers library combined with TensorFlow.
The recommendation model is the bert-base-uncased Transformer model.

With helper scripts to aid the labeling process, 300 job descriptions were manually labeled as "Interesting" or "Not interesting".

The training script was then moved to Google Colab, where the Transformer model was fine-tuned for sequence classification for 20 epochs.
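The fine-tuning step can be sketched roughly as follows. This is a minimal sketch, not the project's actual training.py: the function names, the label-to-id mapping, the learning rate, batch size, and maximum sequence length are all assumptions. Only the bert-base-uncased checkpoint, the two labels, TensorFlow, and the 20 epochs come from the description above.

```python
# Hypothetical mapping from the two labels to class ids.
LABEL_TO_ID = {"Not interesting": 0, "Interesting": 1}


def encode_labels(labels):
    """Convert label strings into integer class ids."""
    return [LABEL_TO_ID[label] for label in labels]


def fine_tune(descriptions, labels, epochs=20):
    """Fine-tune bert-base-uncased for two-class sequence classification."""
    # Heavy dependencies are imported lazily so encode_labels works without them.
    import tensorflow as tf
    from transformers import AutoTokenizer, TFAutoModelForSequenceClassification

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = TFAutoModelForSequenceClassification.from_pretrained(
        "bert-base-uncased", num_labels=len(LABEL_TO_ID)
    )

    # Tokenize the job descriptions; max_length=256 is an assumed value.
    encodings = tokenizer(
        list(descriptions),
        truncation=True,
        padding=True,
        max_length=256,
        return_tensors="tf",
    )
    targets = tf.constant(encode_labels(labels))

    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=5e-5),  # assumed value
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=["accuracy"],
    )
    # batch_size=8 is an assumed value.
    model.fit(dict(encodings), targets, epochs=epochs, batch_size=8)
    return tokenizer, model
```

With only 300 labeled examples, 20 epochs can overfit, so holding out a small validation split is worth considering when adapting this sketch.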

Data Streaming

The Job Finder Platform is designed to stream job postings from multiple sources.

The DataStreamer object uses the DataSource interface so that new data sources can be added easily.
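One way such an interface could look is sketched below. The method names (fetch, add_source, stream) and the InMemorySource class are hypothetical and chosen for illustration; the repository's actual DataSource and DataStreamer signatures may differ.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterator, List


class DataSource(ABC):
    """Interface every job source implements (hypothetical signature)."""

    @abstractmethod
    def fetch(self, query: str, location: str) -> Iterator[Dict[str, str]]:
        """Yield job postings as dictionaries."""


class InMemorySource(DataSource):
    """Stand-in source for illustration; a real source would scrape a site."""

    def __init__(self, name: str, postings: List[Dict[str, str]]):
        self.name = name
        self.postings = postings

    def fetch(self, query, location):
        # The location is ignored here; only the title is matched.
        for posting in self.postings:
            if query.lower() in posting["title"].lower():
                yield {**posting, "source": self.name}


class DataStreamer:
    """Streams postings from every registered source in turn."""

    def __init__(self):
        self.sources: List[DataSource] = []

    def add_source(self, source: DataSource) -> None:
        self.sources.append(source)

    def stream(self, query, location=""):
        for source in self.sources:
            yield from source.fetch(query, location)


streamer = DataStreamer()
streamer.add_source(InMemorySource("site_a", [{"title": "Data Engineer"}]))
streamer.add_source(
    InMemorySource("site_b", [{"title": "ML Engineer"}, {"title": "Data Analyst"}])
)
results = list(streamer.stream("data"))
```

Adding a new job site then only requires a new DataSource subclass; the streamer itself does not change.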

Currently, only Indeed.com is scraped for job postings. The main limitation of data streaming is that a random sleep between scrapes is necessary to avoid being blocked by a CAPTCHA.
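The random sleep between scrapes can be sketched as below. The function name polite_scrape, the fetch callback, and the 2 to 6 second delay range are assumptions made for illustration, not values taken from the project.

```python
import random
import time


def polite_scrape(urls, fetch, min_delay=2.0, max_delay=6.0, sleep=time.sleep):
    """Fetch each URL, pausing a random interval between requests."""
    pages = []
    for index, url in enumerate(urls):
        if index > 0:
            # Randomized pause so requests do not arrive at a fixed rhythm.
            sleep(random.uniform(min_delay, max_delay))
        pages.append(fetch(url))
    return pages
```

The sleep function is passed in as a parameter so the delay can be replaced during testing without actually waiting.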

Job Finder Platform

Streamlit is used for rapid UI prototyping. The Job Finder Platform contains two pages.

The first is the manual search and label page. Users can expand and read the full job description and mark it as Interesting or Not interesting. At the bottom of the page is a button to save the labels, which can later be used to fine-tune the Transformer model.
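Saving the labels could be as simple as appending rows to a CSV file. This is a sketch under assumptions: the save_labels name, the CSV format, and the column names are hypothetical, and the app may store labels differently.

```python
import csv
import os


def save_labels(path, labeled_jobs):
    """Append (description, label) rows to a CSV file, writing a header once."""
    is_new = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        if is_new:
            writer.writerow(["description", "label"])
        writer.writerows(labeled_jobs)
```

Appending rather than overwriting lets labels from several sessions accumulate into one training file.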

The second is the recommended jobs page. Users can enter job searches, and the recommendation model returns only the highest-confidence jobs.
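Filtering by confidence can be sketched as a softmax over the classifier's logits followed by a threshold. The 0.8 threshold, the function names, and the assumption that class index 1 means "Interesting" are illustrative choices, not details confirmed by the project.

```python
import math


def softmax(logits):
    """Turn raw classifier logits into probabilities."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


def recommend(jobs, logits_per_job, threshold=0.8, interesting_index=1):
    """Return (confidence, job) pairs at or above the threshold, best first."""
    scored = []
    for job, logits in zip(jobs, logits_per_job):
        confidence = softmax(logits)[interesting_index]
        if confidence >= threshold:
            scored.append((confidence, job))
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

Raising the threshold trades fewer results for higher precision, which suits a page meant to show only the strongest matches.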

Monetization Capability

Selling each user's personalized dataset of job searches, locations, and the job results they marked as interesting or not interesting.

How does this make money?

Selling data to job sites gives them another dimension for characterizing each person. This can lead to them providing better job results that end up being applied to.
