The goal of this project is to develop a "bring your own model" job search platform where you can label job postings and train your own personalized job recommendation model.
Job titles often cover several different underlying roles, so the title alone cannot guarantee that a posting matches the role you are looking for.
For example, the job title "Data Scientist" is generic and covers several distinct roles. Data science can be broken down into three main roles: data engineer, data analyst, and ML engineer. On job posting websites, results for the particular data science role you want are mixed with noise from the other roles. A recommendation model based on your own preferences can help reduce that noise in the job search space.
- Installation
- Usage
- Initial data collection
- Recommendation Model
- Data Streaming
- Job Finder Platform
- Monetization Capability
Step 1: Clone Repo
git clone https://github.com/Taher-Dohadwala/better-job-finder.git
Step 2: Run setup script
Setup directory structure, create virtual env (venv), and install dependencies
bash setup.sh
Step 3: Download starter model, place into models/recommendation directory
Google Drive
To run the Job Finder Platform app
streamlit run job_finder_platform.py
To fine-tune model with new labeled data
python training.py
The first attempts to scrape data from job posting sites failed: scraping too aggressively led to being blocked by CAPTCHA.
Project development then continued with data scraped and posted on Kaggle:
- 10000 Data Scientist Job Postings from the USA
- Data Science Job Posting on Glassdoor
- Data Scientist Jobs
These 3 datasets were explored and combined to form the initial training dataset for our language model.
Utilized the 🤗 Hugging Face Transformers framework combined with TensorFlow.
The recommendation model is the bert-base-uncased Transformer model.
Using helper scripts to aid the labeling process, 300 job descriptions were manually labeled as "Interesting" or "Not interesting".
The training script was then moved to Google Colab, where the Transformer model was fine-tuned for sequence classification for 20 epochs.
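As a rough illustration of the fine-tuning step described above, here is a minimal sketch. It is not the project's training.py. The function names, learning rate, batch size, and maximum sequence length are assumptions chosen for illustration, and it assumes a transformers release that still ships the TensorFlow model classes.

```python
from typing import List, Sequence

# Mapping from the two labels used during manual labeling to class ids.
# The id assignment is an assumption made for this sketch.
LABEL_TO_ID = {"Not interesting": 0, "Interesting": 1}


def encode_labels(labels: Sequence[str]) -> List[int]:
    """Convert label strings into integer class ids."""
    return [LABEL_TO_ID[label] for label in labels]


def fine_tune(texts: Sequence[str], labels: Sequence[str],
              epochs: int = 20, batch_size: int = 8):
    """Fine-tune bert-base-uncased for two-class sequence classification.

    Hypothetical helper: hyperparameters other than the 20 epochs are guesses.
    """
    # Imported here so encode_labels works without the heavy dependencies.
    import tensorflow as tf
    from transformers import AutoTokenizer, TFAutoModelForSequenceClassification

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = TFAutoModelForSequenceClassification.from_pretrained(
        "bert-base-uncased", num_labels=2
    )

    # Job descriptions can be long; truncate to BERT's 512-token limit.
    encodings = tokenizer(
        list(texts), truncation=True, padding=True,
        max_length=512, return_tensors="tf",
    )

    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=2e-5),
        # The model outputs raw logits, so the loss applies softmax itself.
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=["accuracy"],
    )
    model.fit(
        dict(encodings), tf.constant(encode_labels(labels)),
        epochs=epochs, batch_size=batch_size,
    )
    return tokenizer, model
```

With only 300 labeled examples, 20 epochs may overfit, so holding out a small validation split would be a reasonable addition.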
The Job Finding platform streams job postings from multiple sources.
The DataStreamer object uses the DataSource interface so that new data sources can be added easily.
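One way such an interface could look is sketched below. Only the names DataStreamer and DataSource come from this project. The method names (fetch, add_source, stream), the JobPosting fields, and the InMemorySource stand-in are hypothetical and may differ from the actual code.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Iterator, List, Tuple


@dataclass
class JobPosting:
    """Minimal posting record; the fields are assumptions for this sketch."""
    title: str
    description: str
    source: str


class DataSource(ABC):
    """Interface each job source implements (method name is hypothetical)."""

    @abstractmethod
    def fetch(self, query: str, location: str) -> Iterator[JobPosting]:
        """Yield postings matching the search."""


class InMemorySource(DataSource):
    """Stand-in source for illustration; a real one would scrape a site."""

    def __init__(self, name: str, postings: List[Tuple[str, str]]):
        self.name = name
        self._postings = postings

    def fetch(self, query: str, location: str) -> Iterator[JobPosting]:
        # Location is ignored here; only a simple title match is applied.
        for title, description in self._postings:
            if query.lower() in title.lower():
                yield JobPosting(title, description, self.name)


class DataStreamer:
    """Streams postings from every registered source in turn."""

    def __init__(self, sources: List[DataSource]):
        self.sources = list(sources)

    def add_source(self, source: DataSource) -> None:
        self.sources.append(source)

    def stream(self, query: str, location: str) -> Iterator[JobPosting]:
        for source in self.sources:
            yield from source.fetch(query, location)
```

Under this design, supporting a new job site only requires writing another DataSource subclass and registering it with add_source.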
Currently only Indeed.com is scraped for job postings. The main limitation of data streaming is that a random sleep between scrapes is necessary to avoid being blocked by CAPTCHA.
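The random sleep between scrapes can be sketched as a small wrapper around whatever function fetches a page. The function name throttled and the delay bounds are assumptions for illustration, not values taken from the project.

```python
import random
import time
from typing import Callable, Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def throttled(
    items: Iterable[T],
    fetch: Callable[[T], R],
    min_delay: float = 2.0,   # assumed lower bound, in seconds
    max_delay: float = 6.0,   # assumed upper bound, in seconds
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> Iterator[R]:
    """Call fetch on each item, pausing a random interval between calls."""
    rng = rng or random.Random()
    for index, item in enumerate(items):
        if index > 0:
            # Randomized pauses make the request pattern less regular.
            sleep(rng.uniform(min_delay, max_delay))
        yield fetch(item)
```

Passing sleep and rng as parameters keeps the wrapper easy to test without real waiting. Longer delays lower the chance of being blocked but slow down streaming.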
Utilized Streamlit for rapid UI prototyping. The Job Finder Platform contains two pages.
The first is the manual search and label page. Users can expand and read the full job description, then mark it as Interesting or Not interesting. At the bottom of the page is a button to save the labels, which can later be used to fine-tune the Transformer model.
The second is the search and recommendation page, which shows only recommended jobs. Users enter a job search, and the recommendation model returns only the jobs it scores with the highest confidence.
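Returning only the highest confidence jobs could be done with a simple threshold filter, sketched below. The function name recommend, the score callback, and the 0.9 cutoff are assumptions. In the real app the score would come from the fine-tuned model's probability for the "Interesting" class.

```python
from typing import Callable, Iterable, List, Tuple


def recommend(
    postings: Iterable[str],
    score: Callable[[str], float],
    threshold: float = 0.9,  # assumed cutoff; tune to taste
) -> List[Tuple[float, str]]:
    """Keep postings scored at or above the threshold, best first."""
    scored = [(score(text), text) for text in postings]
    kept = [(s, text) for s, text in scored if s >= threshold]
    # Sort by score only, highest confidence first.
    return sorted(kept, key=lambda pair: pair[0], reverse=True)
```

A higher threshold shows fewer, more confident recommendations, while a lower one surfaces more jobs at the cost of extra noise.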
Selling each user's personalized dataset of job searches, locations, and the job results they marked as interesting or not interesting.
Selling this data to job sites gives them another dimension for characterizing each person. This can lead to them providing better job results that users end up applying to.