Powered by: data mining, web automation, deep learning, TensorFlow, Google's Universal Sentence Encoder
Do you find job hunting in the modern world as tedious as I do? Fear not: here is an AI job finder for you! The AI learns your job preferences, automatically browses Indeed every day, and sends you personalized recommendations for new postings. Say goodbye to hours of your life wasted on busywork, and spend that minimal time sending out resumes instead. Focus on what's important in life. Automate away the chores.
In development, this project is divided into 3 parts.
- Data mining and labeling
  - Uses the Python Selenium API to scrape Indeed.com for data
  - Positive examples are labeled by hand, so this dataset is small
  - Negative examples are mined automatically in bulk, so this dataset is huge
- Train, evaluate, and export the deep learning model
  - Uses the tf.keras API to construct a sequential model
  - Stacks Google's Universal Sentence Encoder module with a DNN classifier on top
  - Uses class weights to compensate for a highly unbalanced dataset
  - Trains to maximize val_auc (AUC: area under the curve)
  - Further optimizes the neural net architecture with Keras Tuner and the Hyperband algorithm to fine-tune hyperparameters
  - Evaluates with a confusion matrix, true/false positives/negatives, precision, and recall
  - Exports the trained model to the SavedModel format
  - A dedicated computer hosts the model as a TensorFlow server
- Indeed crawler
  - Uses Selenium headless mode for background automation tasks
  - Runs daily to find new jobs in the area around the human user
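The class weighting mentioned above can be sketched as follows. This is a minimal standalone version of the common "balanced" heuristic (the same one scikit-learn uses); the 900/100 split is made up for illustration:

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class by total / (n_classes * class_count), so the
    rare positive class contributes as much to the loss as the
    abundant negative class."""
    counts = Counter(labels)
    total = len(labels)
    return {c: total / (len(counts) * n) for c, n in counts.items()}

# 900 negative (0) vs. 100 positive (1) examples, as in a highly
# unbalanced scrape (counts here are illustrative)
weights = balanced_class_weights([0] * 900 + [1] * 100)
```

The resulting dict can be passed directly to `model.fit(..., class_weight=weights)` in tf.keras.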
In production, the task is shared between 2 computers or virtual machines.
- Computer A performs daily web crawling on Indeed.com and collects new jobs in the area.
- Computer B acts as the TensorFlow model server.
- A collects input data and sends it to B.
- B performs one batched feed-forward pass through its neural net and sends the output vector to A.
- A interprets the output vector, compiles a report of its findings, and sends it to the human user.
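The A-to-B exchange can be sketched as below. The endpoint shape (port 8501, `:predict`, an `"instances"` list) follows TensorFlow Serving's documented REST defaults; the host, model name, job titles, and scores are placeholders:

```python
import json

def build_predict_request(job_texts):
    """JSON body A would POST to B's TensorFlow Serving endpoint,
    e.g. http://<computer_b>:8501/v1/models/<model_name>:predict
    (host and model name are placeholders)."""
    return json.dumps({"instances": job_texts})

def rank_jobs(jobs, probabilities, threshold=0.5):
    """Pair each job with its score from B's output vector, drop
    anything below the threshold, and sort best-first for the report."""
    scored = [(p, job) for p, job in zip(probabilities, jobs) if p >= threshold]
    return sorted(scored, reverse=True)

# Illustrative titles and scores, not real model output
jobs = ["ML Engineer", "Cashier", "Data Scientist"]
ranked = rank_jobs(jobs, [0.91, 0.12, 0.97])
```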
A sample email report. Links are provided for easy access to each job page, ranked by predicted probability.
Model metrics when evaluated at probability_threshold = 0.5.
Model architecture. The majority of the model is frozen; the Universal Sentence Encoder module comes from tf_hub via transfer learning.
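For reference, the precision and recall in the metrics above are derived from the confusion-matrix counts at the chosen threshold. A minimal sketch, with made-up counts rather than the model's actual numbers:

```python
def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP): of the jobs flagged positive, how
    many were right. Recall = TP / (TP + FN): of the truly good jobs,
    how many were found."""
    return tp / (tp + fp), tp / (tp + fn)

# Hypothetical counts at probability_threshold = 0.5
precision, recall = precision_recall(tp=8, fp=2, fn=4)
```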