
headhunter-cv-analysis's Introduction

HeadHunter CV data analysis

Overview

This project analyses a dataset of CVs from the job search platform HeadHunter.

The analysis steps include:

  1. Exploring data
  2. Transforming data and engineering useful features
  3. Visualizing resulting data and looking for dependencies
  4. Cleaning data by removing duplicates and outliers using the three-sigma rule and a logarithmic Z-score

The outcome of the project is a dataset prepared for use as part of a training sample for ML models such as linear or logistic regression, for example to predict a competitive salary for a candidate with specific experience and job preferences.

Datasets

  • Пол -> Gender
  • возраст -> Age
  • ЗП -> Salary (desired salary in RUB)
  • Ищет работу на должность -> Desired position
  • Город -> City
  • переезд -> Relocation (readiness to relocate)
  • командировки -> Travel (readiness to travel)
  • Занятость -> Availability (full-time, part-time etc.)
  • График -> Schedule (workdays schedule)
  • Опыт работы -> Experience
  • Последнее/нынешнее место работы -> Last/current company
  • Последняя/нынешняя должность -> Last/current position
  • Образование и ВУЗ -> Level of education and university
  • Обновление резюме -> CV last updated on
  • Авто -> Car
  • currency
  • per (time interval of the measurement - e.g. 'D' is a day)
  • date
  • time
  • close (closing price in RUB)
  • vol (trading volume)
  • proportion (how many units of the currency the close price covers. E.g. if close for USD $= 120$ and proportion $= 2$, then the USD<>RUB rate is $120 / 2 = 60$)
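
The arithmetic behind the close and proportion columns can be sketched as follows. This is a minimal illustration rather than the project's own code: the function names `rub_per_unit` and `to_rub` are hypothetical, and using the rate to convert a salary quoted in a foreign currency is an assumption about how the two datasets are combined.

```python
def rub_per_unit(close: float, proportion: float) -> float:
    """Return the RUB price of a single unit of the currency."""
    if proportion <= 0:
        raise ValueError("proportion must be positive")
    return close / proportion


def to_rub(amount: float, close: float, proportion: float) -> float:
    """Convert an amount quoted in a foreign currency to RUB (assumed use case)."""
    return amount * rub_per_unit(close, proportion)


# The example from the column description: close = 120, proportion = 2.
rate = rub_per_unit(close=120, proportion=2)          # 60.0 RUB per unit
salary_rub = to_rub(1_000, close=120, proportion=2)   # 60000.0 RUB
```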

Data visualization examples

Median salary (RUB) expectation by level of education

salary_by_edu

Candidates with higher education expect the highest salary, while graduates of general and special schools expect the lowest. In between are candidates who have not yet finished their higher education.
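
A grouped median like the one plotted above can be computed with a few lines of standard-library Python. The records and the field names `education` and `salary` below are made up for illustration and are not taken from the dataset; the helper `median_by` is likewise hypothetical.

```python
from collections import defaultdict
from statistics import median

# Hypothetical records, not taken from the dataset.
rows = [
    {"education": "higher", "salary": 90_000},
    {"education": "higher", "salary": 70_000},
    {"education": "incomplete higher", "salary": 50_000},
    {"education": "incomplete higher", "salary": 60_000},
    {"education": "secondary", "salary": 40_000},
]


def median_by(rows, key, value):
    """Group rows by `key` and return the median of `value` per group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[value])
    return {group: median(values) for group, values in groups.items()}


medians = median_by(rows, "education", "salary")
for level, salary in sorted(medians.items(), key=lambda item: item[1], reverse=True):
    print(f"{level}: {salary:,.0f} RUB")
```

The median is used here, as in the charts, because it is less sensitive than the mean to a few very high salary expectations.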

Salary (RUB) expectation by city

salary_by_city

Candidates in Moscow expect the highest paycheck, while expectations in other cities are distributed similarly.

Median salary (RUB) expectation by readiness to relocate & travel

salary_by_reloc_trave

The highest expected salary is amongst candidates who are willing to relocate but not willing to travel for business trips. The second highest are professionals who are ready to do both, with the lowest expected salary observed in the segment of those who aren't ready to relocate or travel.

Median salary (RUB) expectation by age and education level

salary_age_edu

Amongst candidates with a higher education degree, salary expectations appear to grow with age, which is quite logical.

At the same time, professionals between 18 and 22 years old hold similar salary expectations regardless of their education level. That can potentially be explained by their focus on getting experience rather than a high paycheck.

Work experience VS age

xp_age

We can clearly see 7 outliers whose work experience is greater than or equal to their age (impossible).
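
The consistency check behind this plot can be sketched as a simple filter. The records below are invented, and the field names and the assumption that both age and experience are expressed in years are illustrative only; the actual dataset may store experience in a different unit.

```python
# Hypothetical records; both fields are assumed to be in years.
candidates = [
    {"id": 1, "age": 30, "experience_years": 8},
    {"id": 2, "age": 25, "experience_years": 25},  # equal to age: impossible
    {"id": 3, "age": 41, "experience_years": 45},  # exceeds age: impossible
    {"id": 4, "age": 52, "experience_years": 30},
]


def is_impossible(row):
    """A candidate cannot have worked for as long as, or longer than, they have lived."""
    return row["experience_years"] >= row["age"]


outliers = [row for row in candidates if is_impossible(row)]
cleaned = [row for row in candidates if not is_impossible(row)]

print("flagged ids:", [row["id"] for row in outliers])
```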

Data cleaning example

Removing outliers by age

log_distro_age

Since the age distribution had a roughly logarithmic shape, I log-scaled this feature and used the 3-sigma rule to find potential outliers.

After that, I ran a Z-score analysis to verify and remove the outliers by their age.
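
One way to combine the two steps described above is to compute Z-scores on the log-scaled values and drop anything beyond three standard deviations. The sketch below assumes the natural logarithm, the population standard deviation, and a threshold of 3; the ages are made up, and the function names are hypothetical rather than the project's own.

```python
import math
from statistics import mean, pstdev


def log_z_scores(values):
    """Z-score of each value after natural-log scaling."""
    logs = [math.log(v) for v in values]
    mu = mean(logs)
    sigma = pstdev(logs)  # population standard deviation (an assumption)
    if sigma == 0:
        return [0.0 for _ in logs]
    return [(x - mu) / sigma for x in logs]


def split_outliers(values, threshold=3.0):
    """Return (kept, dropped) using the 3-sigma rule on log-scaled values."""
    scores = log_z_scores(values)
    kept = [v for v, z in zip(values, scores) if abs(z) <= threshold]
    dropped = [v for v, z in zip(values, scores) if abs(z) > threshold]
    return kept, dropped


# Hypothetical ages with one implausible entry.
ages = [22, 24, 25, 26, 27, 28, 29, 30, 30, 31, 32,
        33, 34, 35, 36, 38, 40, 42, 45, 50, 100]

kept, dropped = split_outliers(ages)
print("dropped:", dropped)
```

Log scaling compresses the long right tail, so the 3-sigma band is less distorted by the skew than it would be on the raw ages.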

Tech stack

Language & version

Data analysis

Data visualization

headhunter-cv-analysis's People

Contributors

gettergit
