Use the given New York City Current Job Postings data set.
Focus on applying the data analytics concepts you have learned, and share your findings on the following aspects:
- a) What are the highest-paid skills in the US market?
- b) Which job categories involve the niche skills identified above?
- c) Using clustering, visually depict the different salary ranges by job category and years of experience.
The submission should consist of:
- a) A Python script or Jupyter notebook containing all the code for the proposed solution. Write all the code in a single file, with proper comments. Do not include the data file in the zipped submission.
- b) A Word document answering the three sub-questions above, based on the analysis you have carried out.
- a. Obtain a structure for the data using Python – 1 mark
- b. Create the required schema to read the data into rows and columns – 1 mark
- c. The schema must be normalized, and field types must be appropriate for the available fields. Use a proper data model, e.g. – 8 marks

Select the appropriate features (columns), parse them, clean them up if required, and convert them to the required categories:
- a. Identify the required variables
- b. Explain the reason for selecting each of the variables above
- c. Describe the text parsing applied to the required fields
- a. Missing values exist in the following columns – 5 marks
- b. Special characters in some columns need to be handled – 5 marks
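For the structure and schema criteria, one possible approach is to declare column types up front instead of letting pandas infer them, and then move repeated values such as agency names into a lookup table. This is a minimal sketch under assumptions: the column names are assumed to match the NYC Jobs CSV export, `SAMPLE_CSV` is made-up data standing in for the real file, and `load_postings` and `split_agencies` are hypothetical helper names.

```python
import io

import pandas as pd

# Column types for the fields used later. The names are ASSUMED to match the
# NYC Jobs CSV export; adjust them to the actual header row.
SCHEMA = {
    "Job ID": "int64",
    "Agency": "category",
    "Business Title": "string",
    "Job Category": "string",
    "Salary Range From": "float64",
    "Salary Range To": "float64",
    "Salary Frequency": "category",
    "Minimum Qual Requirements": "string",
    "Preferred Skills": "string",
}
DATE_COLUMNS = ["Posting Date"]

# Hypothetical two-row sample standing in for the real file.
SAMPLE_CSV = """Job ID,Agency,Business Title,Job Category,Salary Range From,Salary Range To,Salary Frequency,Minimum Qual Requirements,Preferred Skills,Posting Date
101,DEPT OF HEALTH,Data Analyst,Health,60000,80000,Annual,"A baccalaureate degree and 2 years of experience","Python, SQL",2019-11-01
102,DEPT OF FINANCE,Auditor,Finance,55000,70000,Annual,"4 years of experience",Excel,2019-10-15
"""


def load_postings(source):
    """Read only the needed columns, with explicit types and parsed dates."""
    return pd.read_csv(
        source,
        usecols=list(SCHEMA) + DATE_COLUMNS,
        dtype=SCHEMA,
        parse_dates=DATE_COLUMNS,
    )


def split_agencies(df):
    """Normalize agency names into a lookup table keyed by an integer ID."""
    categories = list(df["Agency"].cat.categories)
    agencies = pd.DataFrame({"Agency ID": range(len(categories)), "Agency": categories})
    # Replace the repeated agency text in the postings table with its integer code.
    postings = df.drop(columns="Agency").assign(**{"Agency ID": df["Agency"].cat.codes})
    return postings, agencies


postings, agencies = split_agencies(load_postings(io.StringIO(SAMPLE_CSV)))
print(agencies)
```

On the real data, `load_postings` would be called with the actual file path instead of the in-memory sample. Declaring `category` for low-cardinality text columns such as the agency reduces memory use and keeps later group-by steps cheap.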
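For the text-parsing criterion, the data has no numeric years-of-experience column, so the figure has to be extracted from free text. The sketch below assumes the requirement text lives in a column such as `Minimum Qual Requirements` and that phrases look like "2 years" or "four (4) years". When a posting lists several alternatives, it takes the smallest figure, on the assumption that this is the minimum requirement. `parse_years`, `NUMBER_WORDS`, and `YEARS_PATTERN` are hypothetical names.

```python
import re

import pandas as pd

# Spelled-out numbers that often appear in qualification text ("four (4) years").
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
                "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10}

# Matches "2 years", "1 year", "four years" and "four (4) years";
# the single capturing group holds the leading number or number word.
YEARS_PATTERN = re.compile(
    r"\b(\d+|" + "|".join(NUMBER_WORDS) + r")\s*(?:\(\d+\)\s*)?years?\b",
    re.IGNORECASE,
)


def parse_years(text):
    """Return the smallest number of years mentioned in the text, or NaN if none is found."""
    if pd.isna(text):
        return float("nan")
    values = [
        int(token) if token.isdigit() else NUMBER_WORDS[token.lower()]
        for token in YEARS_PATTERN.findall(str(text))
    ]
    return float(min(values)) if values else float("nan")


quals = pd.Series([
    "A baccalaureate degree and four (4) years of experience",
    "2 years of experience, or 1 year with a master's degree",
    None,
])
print(quals.map(parse_years).tolist())
```

It would be applied column-wise, for example `df["Years of Experience"] = df["Minimum Qual Requirements"].map(parse_years)`. A pattern like this also picks up unrelated phrases such as "18 years of age", so the parsed values are worth spot-checking.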
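For the missing-value and special-character criteria, a sketch that normalizes Unicode, maps typographic quotes and dashes to plain ASCII, replaces other stray symbols with spaces, and then fills or drops missing values. The policies here are assumptions rather than requirements of the brief: rows with no salary at all are dropped, a missing salary bound is filled from the other bound, and a missing job category becomes `"Unknown"`. Column names are assumed, and `clean_text`, `handle_missing`, and the `RAW` sample are hypothetical.

```python
import re
import unicodedata

import numpy as np
import pandas as pd

# Map typographic single quotes and dashes to plain ASCII before filtering.
PUNCTUATION_MAP = str.maketrans({"\u2018": "'", "\u2019": "'", "\u2013": "-", "\u2014": "-"})
# Anything that is not a word character, whitespace or common punctuation is noise.
# "+" and "#" are kept so that skills such as C++ and C# survive.
SPECIAL_CHARS = re.compile(r"[^\w\s.,;:()&/%$'+#-]")
TEXT_COLUMNS = ["Job Category", "Preferred Skills", "Minimum Qual Requirements"]


def clean_text(value):
    """Normalize one text cell; cells that end up empty become NaN."""
    if pd.isna(value):
        return np.nan
    # NFKC turns compatibility characters (e.g. non-breaking spaces) into plain ones.
    text = unicodedata.normalize("NFKC", str(value)).translate(PUNCTUATION_MAP)
    text = SPECIAL_CHARS.sub(" ", text)
    text = re.sub(r"\s+", " ", text).strip()
    return text or np.nan


def handle_missing(df):
    """Drop rows with no salary, then clean text and fill the remaining gaps."""
    df = df.dropna(subset=["Salary Range From", "Salary Range To"], how="all").copy()
    for col in TEXT_COLUMNS:
        df[col] = df[col].map(clean_text)
    df["Salary Range From"] = df["Salary Range From"].fillna(df["Salary Range To"])
    df["Salary Range To"] = df["Salary Range To"].fillna(df["Salary Range From"])
    df["Job Category"] = df["Job Category"].fillna("Unknown")
    df["Preferred Skills"] = df["Preferred Skills"].fillna("")
    return df


# Hypothetical messy sample: a non-breaking space, a curly apostrophe, gaps.
RAW = pd.DataFrame({
    "Job Category": ["Health\u00a0 ", None, "Finance"],
    "Preferred Skills": ["Python\u2019s pandas; SQL", "Excel", None],
    "Minimum Qual Requirements": ["2 years", "", "3 years"],
    "Salary Range From": [60000.0, np.nan, np.nan],
    "Salary Range To": [80000.0, 70000.0, np.nan],
})
cleaned = handle_missing(RAW)
print(cleaned)
```

Counting gaps first with `df.isna().sum()` shows which columns need a policy at all; the fill values above are one reasonable choice, and dropping rows is only safe for columns the analysis cannot do without.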
- a. What are the highest-paid skills in the US market? – 20 marks
  - i. Python code that queries the top 10 skills with their salary ranges – 15 marks
  - ii. Depicting the results with graphs is encouraged – 5 marks
- b. Which job categories involve the niche skills identified above? – 20 marks
  - i. Python code that queries and depicts the top 10 job categories involving the skills in the result set above – 10 marks
  - ii. A plotted graph – 10 marks
- c. Using clustering, visually depict the different salary ranges by job category and years of experience – 10 marks
  - i. Plot all three variables (job category, salary, and years of experience)
  - ii. The graph must be readable and understandable
  - iii. Choice of graph type
  - iv. Use of colour
  - v. Use of legends and labels
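For question (a), one way to rank skills is to tag each posting with the skills it mentions, explode to one row per posting and skill, and aggregate salaries per skill. This sketch assumes a hand-picked keyword list (`SKILL_KEYWORDS`, hypothetical) matched against the `Preferred Skills` text. It ranks skills by the median of each posting's salary midpoint and reports the range as the lowest `Salary Range From` and highest `Salary Range To`; other ranking metrics are equally defensible. `extract_skills`, `top_paid_skills`, and the sample data are illustrative.

```python
import re

import pandas as pd

# Hypothetical keyword list; in practice it would be built from the skills text itself.
SKILL_KEYWORDS = ["python", "sql", "excel", "tableau", "java", "sas",
                  "gis", "autocad", "power bi", "project management"]


def extract_skills(text):
    """Return the keywords that appear as whole words in a free-text skills field."""
    text = str(text).lower()
    return [kw for kw in SKILL_KEYWORDS if re.search(r"\b" + re.escape(kw) + r"\b", text)]


def top_paid_skills(df, n=10):
    """Top-n skills by median salary midpoint, with the overall salary range per skill."""
    tagged = df.assign(
        Skill=df["Preferred Skills"].fillna("").map(extract_skills),
        Mid=(df["Salary Range From"] + df["Salary Range To"]) / 2,
    )
    # One row per (posting, skill); postings with no matched skill become NaN and are dropped.
    exploded = tagged.explode("Skill").dropna(subset=["Skill"])
    summary = exploded.groupby("Skill").agg(
        postings=("Job ID", "count"),
        salary_from=("Salary Range From", "min"),
        salary_to=("Salary Range To", "max"),
        median_mid=("Mid", "median"),
    )
    return summary.sort_values("median_mid", ascending=False).head(n)


SAMPLE = pd.DataFrame({
    "Job ID": [1, 2, 3, 4],
    "Preferred Skills": ["Python and SQL", "Excel", "SQL, Tableau", None],
    "Salary Range From": [90000, 50000, 70000, 40000],
    "Salary Range To": [120000, 60000, 90000, 45000],
})
result = top_paid_skills(SAMPLE)
print(result)
```

With matplotlib installed, `result["median_mid"].plot.barh()` gives a quick chart for part (ii). If the data mixes annual, hourly, and daily rates (check the salary frequency column), convert salaries to a common frequency before aggregating.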
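For question (b), a sketch that takes the top skills from question (a) and counts how many postings in each job category mention at least one of them, plus a category-by-skill matrix for a more detailed graph. It assumes a `Skills` column already holding each posting's list of matched skills; the sample data and the `top_categories_for_skills` and `skill_category_matrix` names are hypothetical.

```python
import pandas as pd


def top_categories_for_skills(df, skills, n=10):
    """Count postings per job category that mention at least one of the given skills."""
    exploded = df[["Job ID", "Job Category", "Skills"]].explode("Skills")
    matched = exploded[exploded["Skills"].isin(skills)]
    # A posting that matches several top skills is still counted once per category.
    counts = (
        matched.drop_duplicates(["Job ID", "Job Category"])
        .groupby("Job Category")["Job ID"]
        .count()
        .sort_values(ascending=False, kind="stable")
    )
    return counts.head(n)


def skill_category_matrix(df, skills):
    """Cross-tabulate job categories against the matched top skills."""
    exploded = df[["Job Category", "Skills"]].explode("Skills")
    matched = exploded[exploded["Skills"].isin(skills)]
    return pd.crosstab(matched["Job Category"], matched["Skills"])


SAMPLE = pd.DataFrame({
    "Job ID": [1, 2, 3, 4, 5],
    "Job Category": ["Technology", "Technology", "Finance", "Health", "Finance"],
    "Skills": [["python", "sql"], ["sql"], ["excel"], [], ["sql"]],
})
TOP_SKILLS = ["python", "sql"]

counts = top_categories_for_skills(SAMPLE, TOP_SKILLS)
matrix = skill_category_matrix(SAMPLE, TOP_SKILLS)
print(counts)
print(matrix)
```

`counts.plot.barh()` covers the required graph, and `matrix.plot.barh(stacked=True)` shows which skills drive each category.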
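For question (c), a k-means sketch on standardized years of experience and salary. To stay dependency-free, it implements Lloyd's algorithm directly in NumPy with a deterministic start (initial centroids spread evenly across the salary ordering); in practice `sklearn.cluster.KMeans`, which uses k-means++ initialization by default, is the usual choice. It assumes `Years of Experience` and an annualized `Salary` column have already been derived. `k=3`, the sample data, and the `kmeans` and `salary_clusters` names are illustrative.

```python
import numpy as np
import pandas as pd


def kmeans(X, centroids, n_iter=100):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid update."""
    for _ in range(n_iter):
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Keep the old centroid if a cluster ends up empty.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(len(centroids))
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids


def salary_clusters(df, k=3):
    """Cluster postings into salary/experience bands and summarize each band."""
    data = df.dropna(subset=["Years of Experience", "Salary"]).copy()
    features = data[["Years of Experience", "Salary"]].to_numpy(dtype=float)
    std = features.std(axis=0)
    std[std == 0] = 1.0  # avoid dividing by zero for constant columns
    scaled = (features - features.mean(axis=0)) / std
    # Deterministic start: k points spread evenly across the salary ordering.
    picks = np.argsort(scaled[:, 1])[np.linspace(0, len(scaled) - 1, k).astype(int)]
    labels, _ = kmeans(scaled, scaled[picks])
    # Renumber clusters so that 0 is the lowest-paid band.
    mean_salary = data.groupby(labels)["Salary"].mean()
    rank = {old: new for new, old in enumerate(mean_salary.sort_values().index)}
    data["Cluster"] = [rank[label] for label in labels]
    summary = data.groupby("Cluster").agg(
        postings=("Salary", "count"),
        salary_min=("Salary", "min"),
        salary_max=("Salary", "max"),
        years_min=("Years of Experience", "min"),
        years_max=("Years of Experience", "max"),
    )
    return data, summary


SAMPLE = pd.DataFrame({
    "Job Category": ["Clerical"] * 3 + ["Finance"] * 3 + ["Technology"] * 3,
    "Years of Experience": [1, 1, 2, 5, 6, 5, 10, 12, 11],
    "Salary": [50000, 52000, 51000, 80000, 82000, 81000, 130000, 135000, 128000],
})
clustered, summary = salary_clusters(SAMPLE, k=3)
print(summary)
```

For the graph, a scatter plot of years of experience (x) against salary (y), coloured by `Cluster`, with one marker style or facet per job category, shows all three variables at once; axis labels and a legend for the cluster colours cover the labelling criteria. `pd.crosstab(clustered["Cluster"], clustered["Job Category"])` summarizes how categories spread across the salary bands.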