fake-jobs

Kaggle competition: a classification problem on imbalanced tabular data. Part of the grading for the 'Machine Learning' course of the Data Science MSc at the UB (2022-23).

Description

Are you able to spot fake ads?

This dataset contains the textual descriptions of job ads, together with contextual and metadata information. The goal is to identify which ads are fake.

Features

  • job_id: ID of the ad
  • title: title of the ad
  • location: name of the location
  • department: name of the department in the company where the candidate will be hired
  • salary_range: range of salary…
  • company_profile: description of the company
  • description: description of the job
  • requirements: list of mandatory requirements for applying
  • benefits: additional benefits offered with the job
  • telecommuting: True if telework is available
  • has_company_logo: True if the ad shows the company logo
  • has_questions: True if screening questions are present
  • employment_type: categorical, the required dedication of the position (full-time, part-time, …)
  • required_experience: categorical, title of the required experience level (entry level, …)
  • required_education: categorical, required education level
  • industry: categorical, type of industry (telecom, automotive, …)
  • function: categorical, summary of the job function (sales, it, consulting, engineering, …)
  • requireddoughnutscomsumption: normalized average amount of doughnuts the employee is expected to consume every day
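
The features above fall into free-text, binary, categorical and numeric groups. The sketch below shows one possible way to turn a raw row (a dict keyed by the column names listed above) into model-ready fields. The grouping, the `<missing>` placeholder, the accepted boolean spellings ("t", "true", …) and the 0.0 imputation are all assumptions for illustration, not necessarily what `inference.py` does.

```python
# Hypothetical grouping of the columns listed above; the real preprocessing
# in this repository may differ.
TEXT_COLUMNS = ["title", "company_profile", "description", "requirements", "benefits"]
BINARY_COLUMNS = ["telecommuting", "has_company_logo", "has_questions"]
CATEGORICAL_COLUMNS = [
    "location", "department", "salary_range", "employment_type",
    "required_experience", "required_education", "industry", "function",
]
NUMERIC_COLUMNS = ["requireddoughnutscomsumption"]
MISSING = "<missing>"  # assumed placeholder for empty categorical cells


def is_missing(value) -> bool:
    """True for None, NaN (NaN != NaN) and blank strings."""
    if value is None or value != value:
        return True
    return isinstance(value, str) and not value.strip()


def to_flag(value) -> int:
    """Map a boolean-like cell to 0/1; the accepted spellings are an assumption."""
    if is_missing(value):
        return 0
    if isinstance(value, str):
        return int(value.strip().lower() in {"1", "t", "true", "yes"})
    return int(bool(value))


def row_to_features(row: dict) -> dict:
    """Split one raw ad (column name -> value) into text, flags, categories, numbers."""
    features = {
        # Concatenate the non-empty free-text fields into a single document.
        "text": " ".join(str(row[c]) for c in TEXT_COLUMNS if not is_missing(row.get(c))),
    }
    features.update({c: to_flag(row.get(c)) for c in BINARY_COLUMNS})
    features.update(
        {c: MISSING if is_missing(row.get(c)) else str(row[c]) for c in CATEGORICAL_COLUMNS}
    )
    for c in NUMERIC_COLUMNS:
        value = row.get(c)
        # Placeholder imputation with 0.0 for missing values (assumption).
        features[c] = 0.0 if is_missing(value) else float(value)
    return features


# Toy row; columns absent from the dict are treated as missing.
example = {
    "title": "Data Analyst",
    "description": "Work from home",
    "telecommuting": "t",
    "has_questions": True,
    "employment_type": "Full-time",
    "requireddoughnutscomsumption": "0.5",
}
print(row_to_features(example))
```

Keeping the text fields together makes it easy to feed them to a text vectorizer, while the flags and categories can be encoded separately.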

Labels

  • fraudulent: the target variable to predict (0: non-fraudulent, 1: fraudulent)
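
Since the labels are imbalanced (far fewer fraudulent ads), one common mitigation is to weight each class inversely to its frequency. Below is a minimal sketch of the "balanced" heuristic that scikit-learn applies for `class_weight="balanced"`; whether this repository uses class weighting, resampling or another strategy is not stated here.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight class c by n_samples / (n_classes * count_c).

    Same heuristic as scikit-learn's class_weight="balanced": the rarer the
    class (here, fraudulent ads), the larger its weight.
    """
    counts = Counter(labels)
    n_samples = sum(counts.values())
    n_classes = len(counts)
    return {c: n_samples / (n_classes * n) for c, n in sorted(counts.items())}


# Toy labels: 9 genuine ads, 1 fraudulent ad.
print(balanced_class_weights([0] * 9 + [1]))  # ≈ {0: 0.556, 1: 5.0}
```

With these weights each class contributes the same total weight (n_samples / n_classes) to the loss. scikit-learn estimators such as `LogisticRegression` accept `class_weight="balanced"` directly.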

Evaluation

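The evaluation metric for this competition is the Mean F1-Score. The F1 score, commonly used in information retrieval, measures accuracy using precision and recall: it is their harmonic mean, F1 = 2pr / (p + r), where precision p = tp / (tp + fp) and recall r = tp / (tp + fn).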
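
To make the metric concrete, here is a from-scratch F1 computation for the fraudulent class (label 1). It should agree with `sklearn.metrics.f1_score(y_true, y_pred)` under its default `average="binary"`; how the competition's "Mean F1-Score" averages across samples or classes is not specified here, so treat this as an approximation of the leaderboard metric.

```python
def f1_binary(y_true, y_pred, positive=1):
    """F1 score of the `positive` class from paired true/predicted labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0  # no true positives: F1 is taken as 0 here
    return 2 * precision * recall / (precision + recall)


# tp = 2, fp = 1, fn = 1 -> precision = recall = 2/3 -> F1 = 2/3
print(f1_binary([1, 0, 1, 1, 0], [1, 0, 0, 1, 1]))
```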

The F1 metric weights recall and precision equally, and a good retrieval algorithm will maximize both precision and recall simultaneously. Thus, moderately good performance on both will be favored over extremely good performance on one and poor performance on the other.
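
The trade-off can be checked numerically: with precision 0.95 and recall 0.10 the F1 score is only about 0.18, whereas 0.60 for both gives 0.60. For a probabilistic classifier this also suggests tuning the decision threshold for F1 rather than defaulting to 0.5. The sketch below scans candidate thresholds; the labels and scores are made-up toy values, the helper names are hypothetical, and in practice the threshold should be chosen on a validation split, not on the test data.

```python
def f1_from_pr(precision, recall):
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def f1_at_threshold(y_true, scores, threshold):
    """F1 of class 1 when predicting 1 for every score >= threshold."""
    tp = sum(1 for t, s in zip(y_true, scores) if t == 1 and s >= threshold)
    fp = sum(1 for t, s in zip(y_true, scores) if t == 0 and s >= threshold)
    fn = sum(1 for t, s in zip(y_true, scores) if t == 1 and s < threshold)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return f1_from_pr(precision, recall)


def best_threshold(y_true, scores):
    """Return (threshold, F1) maximizing F1; candidates are the observed scores."""
    candidates = sorted(set(scores))
    return max(
        ((t, f1_at_threshold(y_true, scores, t)) for t in candidates),
        key=lambda pair: pair[1],
    )


# Lopsided vs. balanced precision/recall.
print(round(f1_from_pr(0.95, 0.10), 3))  # 0.181
print(round(f1_from_pr(0.60, 0.60), 3))  # 0.6

# Toy validation labels and classifier scores (made up for illustration).
y_val = [0, 0, 0, 0, 1, 0, 1, 1]
scores = [0.05, 0.10, 0.20, 0.30, 0.35, 0.40, 0.70, 0.90]
print(best_threshold(y_val, scores))  # (0.35, 0.857...)
```

On the toy data the best threshold (0.35) yields precision 0.75 and recall 1.0, which beats the higher-precision but lower-recall cut at 0.70 (F1 0.8).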

Submission Format

For every ad in the dataset, the submission file should contain two columns: Id and Category. Id is the ID of the data sample (not the ad's job_id), and Category is the predicted label, an integer with value 0 or 1.

The file should contain a header and have the following format:

Id,Category
1,1
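
A minimal sketch of writing predictions in this format with Python's standard `csv` module. The `write_submission` helper and its arguments are hypothetical; the caller supplies the sample Ids explicitly, since the numbering scheme is not stated beyond the example row above.

```python
import csv
import io


def write_submission(ids, predictions, out):
    """Write an `Id,Category` CSV (with header) to a text file-like object `out`."""
    ids = list(ids)
    predictions = list(predictions)
    if len(ids) != len(predictions):
        raise ValueError("ids and predictions must have the same length")
    if any(p not in (0, 1) for p in predictions):
        raise ValueError("Category must be a hard 0/1 label, not a probability")
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Id", "Category"])
    writer.writerows((i, int(p)) for i, p in zip(ids, predictions))


buffer = io.StringIO()
write_submission([1, 2, 3], [1, 0, 0], buffer)
print(buffer.getvalue())
```

To write to disk, open the file with `newline=""` (e.g. `open("submission.csv", "w", newline="")`, file name hypothetical) and pass the handle as `out`.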

Run the scripts

Requisites

Create a dedicated conda environment to run the scripts:

$ conda env create -f environment/env.yml
$ conda activate fake-jobs

Execute

With the environment activated, move into the package folder and run:

python inference.py

Contributors

gcastro-98