taher-dohadwala / better-job-finder


Bring your own model job search platform, where you can label and train your own personalized recommendation model.

License: MIT License

Jupyter Notebook 34.69% Python 64.12% Shell 1.19%

better-job-finder

Python Streamlit TensorFlow Bert

Project Goal

To develop a "bring your own model" job search platform where you can label and train your own personalized job recommendation model.


Background

The same job title often covers several different underlying roles, so a title alone cannot guarantee that a posting matches the role you are looking for.

For example, the job title "Data Scientist" is generic and spans at least three main roles: data engineer, data analyst, and ML engineer. On job posting websites, the particular data science role you want is buried in the noise of the other roles. A recommendation model trained on your own preferences can help reduce that noise in the job search space.


Installation

Step 1: Clone Repo

git clone https://github.com/Taher-Dohadwala/better-job-finder.git

Step 2: Run setup script
Sets up the directory structure, creates a virtual environment (venv), and installs dependencies

bash setup.sh

Step 3: Download the starter model (Google Drive) and place it in the models/recommendation directory


Usage

To run the Job Finder Platform app

streamlit run job_finder_platform.py

To fine-tune model with new labeled data

python training.py

Initial data collection

The first attempts to scrape data from job posting sites failed: scraping too aggressively led to being blocked by captchas.

Development then continued with three job posting datasets that had already been scraped and published on Kaggle.

These three datasets were explored and combined to form the initial training dataset for our language model.


Recommendation Model

Built with the 🤗 Hugging Face Transformers library combined with TensorFlow.
The recommendation model is the bert-base-uncased Transformer model.

Using helper scripts to aid the labeling process, 300 job descriptions were manually labeled as "Interesting" or "Not interesting".

The training script was then moved to Google Colab, where the Transformer model was fine-tuned for sequence classification for 20 epochs.
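The fine-tuning step described above might look roughly like the sketch below. It is not the project's training.py. The function names, learning rate, batch size, and maximum sequence length are assumptions made for illustration; only the model name, the two labels, and the 20 epochs come from this README.

```python
# Assumed mapping from the two label strings to integer class ids.
LABEL_TO_ID = {"Not interesting": 0, "Interesting": 1}


def encode_labels(labels):
    """Map the two label strings from the labeling step to integer class ids."""
    return [LABEL_TO_ID[label] for label in labels]


def fine_tune(descriptions, labels, epochs=20, learning_rate=2e-5):
    """Fine-tune bert-base-uncased as a two-class sequence classifier.

    Heavy imports are kept inside the function so this module can be
    imported without TensorFlow or transformers installed.
    """
    import tensorflow as tf
    from transformers import AutoTokenizer, TFAutoModelForSequenceClassification

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = TFAutoModelForSequenceClassification.from_pretrained(
        "bert-base-uncased", num_labels=2
    )

    # Tokenize every description to a fixed length (512 is BERT's maximum).
    encodings = tokenizer(
        list(descriptions),
        truncation=True,
        padding="max_length",
        max_length=512,
        return_tensors="tf",
    )

    # The model outputs raw logits, so the loss is told to expect logits.
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=["accuracy"],
    )
    model.fit(
        dict(encodings),
        tf.constant(encode_labels(labels)),
        epochs=epochs,
        batch_size=8,  # assumed value, small enough for a Colab GPU
    )
    return tokenizer, model
```

With only 300 labeled examples, 20 epochs can overfit, so holding out part of the data for validation would be a reasonable addition to a sketch like this.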


Data Streaming

The Job Finding platform streams job postings from multiple sources.

The DataStreamer object relies on the DataSource interface, which makes it easy to add new data sources.
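One way such an interface could be laid out is sketched below. Only the names DataStreamer and DataSource come from this README. The method names, the posting fields, and the InMemorySource class are hypothetical and may differ from the repository's code.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterator, List


class DataSource(ABC):
    """Hypothetical interface: one implementation per job site."""

    @abstractmethod
    def search(self, query: str, location: str) -> Iterator[Dict[str, str]]:
        """Yield postings as dicts with 'title' and 'description' keys."""


class InMemorySource(DataSource):
    """Stand-in source used for illustration; it does no scraping."""

    def __init__(self, postings: List[Dict[str, str]]) -> None:
        self._postings = postings

    def search(self, query: str, location: str) -> Iterator[Dict[str, str]]:
        # Location is ignored here; a real source would pass it to the site.
        for posting in self._postings:
            if query.lower() in posting["title"].lower():
                yield posting


class DataStreamer:
    """Streams postings from every registered source in turn."""

    def __init__(self, sources: List[DataSource]) -> None:
        self._sources = list(sources)

    def add_source(self, source: DataSource) -> None:
        self._sources.append(source)

    def stream(self, query: str, location: str) -> Iterator[Dict[str, str]]:
        for source in self._sources:
            yield from source.search(query, location)
```

Under this layout, supporting another job site means writing one more DataSource subclass and registering it with add_source. The streamer itself does not change.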

Currently only Indeed.com is scraped for job postings. The main limitation of data streaming is that a random sleep between scrapes is necessary to avoid being blocked by a captcha.
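The random sleep between scrapes can be sketched as follows. The function names and the 2 to 6 second range are assumptions for illustration, not values taken from the project. The fetch callable stands in for whatever performs the actual request.

```python
import random
import time


def polite_delays(n_requests, min_seconds=2.0, max_seconds=6.0, rng=random):
    """Return one random delay per request, drawn uniformly from the range."""
    return [rng.uniform(min_seconds, max_seconds) for _ in range(n_requests)]


def scrape_with_delays(urls, fetch, sleep=time.sleep, **delay_kwargs):
    """Call fetch(url) for each URL, sleeping a random interval between calls."""
    results = []
    delays = polite_delays(len(urls), **delay_kwargs)
    for index, (url, delay) in enumerate(zip(urls, delays)):
        if index > 0:
            sleep(delay)  # no wait before the first request
        results.append(fetch(url))
    return results
```

Passing sleep as a parameter keeps the function easy to test, since a test can record the delays instead of actually waiting.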


Job Finder Platform

Utilized Streamlit for rapid UI prototyping. The Job Finder Platform contains two pages.

The first is the manual search and label page. Users can expand and read the full job description and mark it as Interesting or Not interesting. At the bottom of the page is a button that saves the labels, which can later be used to fine-tune the Transformer model.
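Saving the labels for later fine-tuning could be as simple as appending them to a CSV file. The sketch below assumes a two-column layout (description, label) and hypothetical function names. The storage format the platform actually uses is not stated in this README.

```python
import csv
from pathlib import Path

FIELDS = ["description", "label"]
VALID_LABELS = {"Interesting", "Not interesting"}


def save_labels(rows, path):
    """Append (description, label) pairs to a CSV file, writing a header once."""
    rows = list(rows)
    # Validate everything first so a bad label never leaves a partial write.
    for _, label in rows:
        if label not in VALID_LABELS:
            raise ValueError(f"unknown label: {label!r}")

    path = Path(path)
    write_header = not path.exists() or path.stat().st_size == 0
    with path.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        if write_header:
            writer.writeheader()
        for description, label in rows:
            writer.writerow({"description": description, "label": label})


def load_labels(path):
    """Read the saved pairs back as a list of (description, label) tuples."""
    with Path(path).open(newline="", encoding="utf-8") as handle:
        return [(row["description"], row["label"]) for row in csv.DictReader(handle)]
```

Appending rather than overwriting lets labels from several sessions accumulate into one training file.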

The second is the recommendation page, which shows only recommended jobs. Users enter a job search and the recommendation model returns only the highest-confidence postings.
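Returning only the highest-confidence jobs amounts to turning the classifier's logits into probabilities and keeping the postings above a cutoff. The sketch below illustrates that idea. The 0.9 threshold, the function names, and the assumption that class index 1 means "Interesting" are illustrative choices, not details confirmed by the project.

```python
import math


def softmax(logits):
    """Convert raw classifier logits into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


def recommend(postings, logits_per_posting, threshold=0.9, interesting_index=1):
    """Keep postings whose 'Interesting' probability meets the threshold.

    Results are sorted from most to least confident.
    """
    scored = []
    for posting, logits in zip(postings, logits_per_posting):
        confidence = softmax(logits)[interesting_index]
        if confidence >= threshold:
            scored.append((confidence, posting))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [posting for _, posting in scored]
```

A higher threshold shows fewer but more reliable recommendations, while a lower one surfaces more postings at the cost of extra noise.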


Monetization Capability

Selling each user's personalized dataset: their job searches, locations, and the job results they marked as interesting or not interesting.

How does this make money?

Selling this data to job sites gives them another dimension for characterizing each person. That can lead to better job results, which users are more likely to end up applying to.

