Exploring and Transforming JSON Schemas - Lab

Introduction

In this lab, you'll practice exploring a JSON file whose structure and schema are unknown to you. We will provide you with limited information, and you will explore the dataset to answer the specified question.

Objectives

You will be able to:

  • Use the JSON module to load and parse JSON documents
  • Explore and extract data using unknown JSON schemas
  • Convert JSON to a pandas dataframe

Your Task: Create a Bar Graph of the Top 10 States with the Highest Asthma Rates for Adults Age 18+

The information you need to create this graph is located in disease_data.json. It contains both data and metadata.

You are given the following codebook/data dictionary:

  • The actual data values are associated with the key 'DataValue'
  • The state names are associated with the key 'LocationDesc'
  • To filter to the appropriate records, make sure:
    • The 'Question' is 'Current asthma prevalence among adults aged >= 18 years'
    • The 'StratificationCategoryID1' is 'OVERALL'
    • The 'DataValueTypeID' is 'CRDPREV'
    • The 'LocationDesc' is not 'United States'

The provided JSON file contains both data and metadata, and you will need to parse the metadata in order to understand the meanings of the values in the data.

No further information about the structure of this file is provided.

Load the JSON File

Load the data from the file disease_data.json into a variable data.

# Your code here 
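As a minimal sketch of the loading step, the standard json module can parse an open file object with json.load. To stay runnable anywhere, the example below first writes a tiny stand-in file; the file name sample.json, the 'meta' key, and all of its contents are assumptions for illustration, not the contents of disease_data.json.

```python
import json
import os
import tempfile

# Hypothetical stand-in contents; the real disease_data.json is much larger
sample = {"meta": {"note": "made-up metadata"}, "data": [[1, "a"], [2, "b"]]}

# Write the stand-in to a temporary file so the example is self-contained
path = os.path.join(tempfile.mkdtemp(), "sample.json")
with open(path, "w") as f:
    json.dump(sample, f)

# json.load parses an open file object into Python dicts, lists, strings, etc.
with open(path) as f:
    loaded = json.load(f)

print(type(loaded))
```

In the lab itself, the same with open(...) / json.load(...) pattern would be pointed at disease_data.json and the result assigned to data.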

Explore the Overall Structure

What is the overall data type of data?

# Your code here

What are the keys?

# Your code here

What are the data types associated with those keys?

# Your code here (data)
# Your code here (metadata)

Perform additional exploration to understand the contents of these values. For dictionaries, what are their keys? For lists, what is the length, and what does the first element look like?

# Your code here (add additional cells as needed)
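One way to approach this kind of exploration is a small helper that summarizes a value depending on its type. The sketch below runs on a made-up nested structure; the helper name describe and the keys 'meta', 'view', and 'id' are hypothetical and may not match the real file.

```python
# Made-up nested structure standing in for a parsed JSON document
toy = {
    "meta": {"view": {"id": "abc", "columns": [{"name": "x"}]}},
    "data": [[1, "a"], [2, "b"], [3, "c"]],
}

def describe(value):
    """Return a short summary of a parsed JSON value."""
    if isinstance(value, dict):
        return f"dict with keys {list(value.keys())}"
    if isinstance(value, list):
        first = value[0] if value else None
        return f"list of length {len(value)}, first element {first!r}"
    return f"{type(value).__name__}: {value!r}"

# Summarize each top-level value, then drill down one level at a time
for key, value in toy.items():
    print(key, "->", describe(value))
```

Calling describe repeatedly on nested values (for example describe(toy["meta"]["view"])) lets you walk down an unfamiliar document without printing all of it at once.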

As you likely identified, we have a list of lists forming the 'data'. In order to make sense of that list of lists, we need to find the meaning of each index, i.e. the names of the columns.

Identify the Column Names

Look through the metadata to find the names of the columns, and assign them to the variable column_names. This should be a list of strings. (If you just get the values associated with the 'columns' key, you will have a list of dictionaries, not a list of strings.)

# Your code here (add additional cells as needed)
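Turning a list of dictionaries into a list of strings is typically a one-line list comprehension. The sketch below uses made-up column metadata; the assumption that each dictionary stores its label under a 'name' key, and the other keys shown, should be checked against the real metadata.

```python
# Hypothetical column metadata: a list of dicts, one per column
toy_columns = [
    {"id": 1, "name": "LocationDesc", "dataTypeName": "text"},
    {"id": 2, "name": "Question", "dataTypeName": "text"},
    {"id": 3, "name": "DataValue", "dataTypeName": "text"},
]

# Pull one field out of each dict to get a flat list of strings
toy_column_names = [column["name"] for column in toy_columns]

print(toy_column_names)
```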

The following code checks that you have the correct column names:

# Run this cell without changes

# 42 total columns
assert len(column_names) == 42

# Each name should be a string, not a dict
assert type(column_names[0]) == str and type(column_names[-1]) == str

# Check that we have some specific strings
assert "DataValue" in column_names
assert "LocationDesc" in column_names
assert "Question" in column_names
assert "StratificationCategoryID1" in column_names
assert "DataValueTypeID" in column_names

Filter Rows Based on Columns

Recall that we only want to include records where:

  • The 'Question' is 'Current asthma prevalence among adults aged >= 18 years'
  • The 'StratificationCategoryID1' is 'OVERALL'
  • The 'DataValueTypeID' is 'CRDPREV'
  • The 'LocationDesc' is not 'United States'

Combining knowledge of the data and metadata, filter out the rows of data that are not relevant.

(You may find the pandas library useful here.)

# Your code here (add additional cells as needed)

You should have 54 records after filtering.
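The four filter conditions above can be sketched in plain Python by pairing each row with the column names and keeping only the matching records. Everything in the toy rows below (the "State A" style names, the 'OTHER' codes, and the numbers) is made up for illustration and is not taken from disease_data.json.

```python
# Column names and rows are made-up stand-ins, not values from the real file
toy_column_names = [
    "LocationDesc", "Question", "StratificationCategoryID1",
    "DataValueTypeID", "DataValue",
]
asthma = "Current asthma prevalence among adults aged >= 18 years"
toy_rows = [
    ["State A", asthma, "OVERALL", "CRDPREV", "9.5"],
    ["State B", asthma, "OTHER", "CRDPREV", "7.1"],           # wrong stratification
    ["State C", "Some other question", "OVERALL", "CRDPREV", "3.2"],
    ["United States", asthma, "OVERALL", "CRDPREV", "9.0"],   # national total
    ["State D", asthma, "OVERALL", "OTHER", "8.8"],           # wrong value type
]

# Pair each row's values with the column names so fields can be read by name
toy_records = [dict(zip(toy_column_names, row)) for row in toy_rows]

# Keep only the records that satisfy all four conditions
toy_filtered = [
    r for r in toy_records
    if r["Question"] == asthma
    and r["StratificationCategoryID1"] == "OVERALL"
    and r["DataValueTypeID"] == "CRDPREV"
    and r["LocationDesc"] != "United States"
]
print(len(toy_filtered))
```

With pandas, building pd.DataFrame(rows, columns=column_names) and combining boolean masks with & would express the same filter.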

Extract the Attributes Required for Plotting

For each record, the only information we actually need for the graph is the 'DataValue' and 'LocationDesc'. Create a list of records that only contains these two attributes.

Also, make sure that the data values are numbers, not strings.

# Your code here (create additional cells as needed)
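A minimal sketch of this step, assuming the filtered records are dictionaries keyed by column name and that the data values arrive as numeric strings. The records and the 'Other' key below are made up.

```python
# Made-up filtered records; 'Other' stands in for the columns we do not need
toy_filtered = [
    {"LocationDesc": "State A", "DataValue": "9.5", "Other": "ignored"},
    {"LocationDesc": "State B", "DataValue": "11.2", "Other": "ignored"},
]

# Keep only the two plotting attributes and convert the value to a float
toy_plot_records = [
    {"LocationDesc": r["LocationDesc"], "DataValue": float(r["DataValue"])}
    for r in toy_filtered
]
print(toy_plot_records)
```

The conversion matters because strings sort lexicographically: as text, "9.5" compares greater than "11.2", which would scramble the ranking in the next step.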

Find Top 10 States

Sort by 'DataValue' in descending order, so the highest rates come first, and limit to the first 10 records.

# Your code here (add additional cells as needed)
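As a sketch, assuming the records are dictionaries with a numeric 'DataValue', the built-in sorted function with reverse=True followed by a slice gives the ten largest. The twelve toy records below are generated, not real data.

```python
# Twelve made-up records with values 1.0 through 12.0
toy_records = [
    {"LocationDesc": f"State {i}", "DataValue": float(i)} for i in range(1, 13)
]

# Sort from highest to lowest value, then keep the first 10
toy_top_10 = sorted(toy_records, key=lambda r: r["DataValue"], reverse=True)[:10]

print([r["DataValue"] for r in toy_top_10])
```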

Separate the Names and Values for Plotting

Assign the names of the top 10 states to a list-like variable names, and the associated values to a list-like variable values. Then the plotting code below should work correctly to make the desired bar graph.
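One possible shape for this split, assuming the top records are dictionaries as in the earlier steps, is a pair of list comprehensions. The records and the toy_ variable names below are illustrative only.

```python
# Made-up top records, already sorted from highest to lowest
toy_top = [
    {"LocationDesc": "State A", "DataValue": 12.0},
    {"LocationDesc": "State B", "DataValue": 11.5},
    {"LocationDesc": "State C", "DataValue": 10.9},
]

# Two parallel lists: position i in each refers to the same record
toy_names = [r["LocationDesc"] for r in toy_top]
toy_values = [r["DataValue"] for r in toy_top]

print(toy_names, toy_values)
```

If the records live in a pandas DataFrame instead, the corresponding columns are already list-like and can be passed to the plotting code directly.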

# Replace None with appropriate code

names = None
values = None

# Run this cell without changes

import matplotlib.pyplot as plt
fig, ax = plt.subplots()

ax.barh(names[::-1], values[::-1]) # Values inverted so highest is at top
ax.set_title('Adult Asthma Rates by State in 2016')
ax.set_xlabel('Percent 18+ with Asthma');

Summary

Well done! In this lab you got some extended practice exploring the structure of JSON files and visualizing data!
