data-science-essentials's Introduction

Data Science Essentials

These files contain the lab steps and slides from the Microsoft Data Science Essentials course. To attend the full course, sign up for free on edX.

data-science-essentials's People

Contributors

graememalcolm

data-science-essentials's Issues

Lab 2 missing notebook

After dragging the Automobile price data (Raw) dataset from Samples to the experiment canvas, I am instructed to click the output of the dataset. The dropdown menu does not show Open in a new Notebook, only the other three options (Download, Visualize, and Generate Data Access Code). Has the Open in a new Notebook functionality been removed or relocated? Is there a workaround?

Command to show plot missing in Lab 2 Instructions

In the PDF https://github.com/MicrosoftLearning/Data-Science-Essentials/blob/master/Labs/DAT203.1X%20Lab%202%20-%20Summary%20Statistics.pdf
the section Visualize Summary Statistics for Price, in the Python experiment, defines the function plotstats, which does not display the plot. This is because it is missing a plt.show() line.

It should be:

def plotstats(df, col):
    import matplotlib.pyplot as plt
    ## Set up for plotting two charts, one above the other
    plt.clf()
    fig, ax = plt.subplots(2, 1, figsize = (12,8))
    ## First a box plot
    df.dropna().boxplot(col, ax = ax[0], vert=False,
                        return_type='dict')
    ## Plot the histogram
    temp = df[col].values  ## .values replaces as_matrix(), removed in pandas 1.0
    ax[1].hist(temp, bins = 30, alpha = 0.7)
    plt.ylabel('Number of Cars')
    plt.xlabel(col)
    plt.show()  ## Display the figure; this is the line missing from the PDF
    return [col]

Missing notebook for Module 4

The Module 4 - Data Exploration notebook is missing from the handout pack. Any chance of it being added? Thanks.
There is Visualizing data and, in the Demos folder, the DataFrames and DataVisualization notebooks.

Point shape mismatch for gas/diesel cars in Lab 5

In the Lab 5 instructions, the following Python code is given to visualize outliers:

def auto_scatter_outlier(df, plot_cols):
    import matplotlib.pyplot as plt
    outlier = [0,0,1,1] # vector of outlier indicators
    fuel = ['gas','diesel','gas','diesel']  # vector of fuel types
    color = ['DarkBlue','DarkBlue','Red','Red'] # vector of color choices for plot
    marker = ['x','o','o','x'] # vector of shape choices for plot
    for col in plot_cols: # loop over the columns
        fig = plt.figure(figsize=(6, 6))
        ax = fig.gca()
        ## Loop over the zip of the four vectors, subset the data, and
        ## create the plot using the aesthetics provided
        for o, f, c, m in zip(outlier, fuel, color, marker):
            ## .loc replaces .ix, which was removed in pandas 1.0
            temp = df.loc[(df['outlier'] == o) & (df['fueltype'] == f)]
            if temp.shape[0] > 0:
                temp.plot(kind = 'scatter', x = col, y = 'lnprice',
                          ax = ax, color = c, marker = m)
        ax.set_title('Scatter plot of lnprice vs. ' + col)
        fig.savefig('scatter_' + col + '.png')
    return plot_cols

It's located in the "Visualize outliers in Python" section, step 6.

The typo is in the following lines:

fuel = ['gas','diesel','gas','diesel']  # vector of fuel types
...
marker = ['x','o','o','x'] # vector of shape choices for plot

The point shape is meant to distinguish gas cars from diesel cars. According to the code, however, it is X for gas cars and O for diesel cars when the samples are not outliers, and O for gas cars and X for diesel cars when the samples are outliers.
The same error can be seen in the "Demo: Finding outliers" lecture video.
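
The mismatch can also be confirmed mechanically. The following is a minimal sketch, not part of the lab materials: it copies the fuel and marker vectors from the listing above and reports every fuel type that is drawn with more than one marker. The helper names markers_per_fuel and inconsistent_fuels are hypothetical, introduced only for this illustration.

```python
from collections import defaultdict

# Vectors copied from the Lab 5 listing quoted above
fuel = ['gas', 'diesel', 'gas', 'diesel']
marker = ['x', 'o', 'o', 'x']


def markers_per_fuel(fuel, marker):
    """Map each fuel type to the set of markers paired with it."""
    seen = defaultdict(set)
    for f, m in zip(fuel, marker):
        seen[f].add(m)
    return dict(seen)


def inconsistent_fuels(fuel, marker):
    """Return the sorted fuel types that are paired with more than one marker."""
    pairs = markers_per_fuel(fuel, marker)
    return sorted(f for f, ms in pairs.items() if len(ms) > 1)


# Both fuel types are reported, because each one gets 'x' and 'o'
print(inconsistent_fuels(fuel, marker))
```

An empty result would mean that the shape encodes the fuel type consistently.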

Changing the code to

fuel = ['gas','diesel','gas','diesel']  # vector of fuel types
...
marker = ['x','o','x','o'] # vector of shape choices for plot

gives an lnprice vs. citympg plot that is consistent with the other gas/diesel plots shown in the lectures: diesel cars are grouped together and located above gas cars (more efficient and pricier).
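
One way to avoid this kind of mismatch is to derive the four vectors from two lookups instead of typing them by hand. The sketch below is only an illustration under that assumption and is not code from the lab: COLOR_BY_OUTLIER, MARKER_BY_FUEL, and build_aesthetics are hypothetical names, and the color and marker values are the ones used in the listings above.

```python
from itertools import product

# Hypothetical lookups: color encodes outlier status, marker encodes fuel type
COLOR_BY_OUTLIER = {0: 'DarkBlue', 1: 'Red'}
MARKER_BY_FUEL = {'gas': 'x', 'diesel': 'o'}


def build_aesthetics():
    """Return one (outlier, fuel, color, marker) tuple per combination."""
    return [
        (o, f, COLOR_BY_OUTLIER[o], MARKER_BY_FUEL[f])
        for o, f in product(COLOR_BY_OUTLIER, MARKER_BY_FUEL)
    ]


aesthetics = build_aesthetics()
for row in aesthetics:
    print(row)
```

The resulting tuples could be iterated in place of zip(outlier, fuel, color, marker) in auto_scatter_outlier. Because the marker is looked up from the fuel type alone, it cannot differ between outliers and non-outliers.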
