Data Science Essentials
These files contain the lab steps and slides from the Microsoft Data Science Essentials course. To attend the full course, sign up for free on edX.
Course files for the Microsoft Data Science Essentials course
License: MIT License
After dragging the Automobile price data (Raw) dataset from Samples to the experiment canvas, I was instructed to click the dataset's output, but the dropdown menu does not show Open in a new Notebook, only the other three options (Download, Visualize and Generate Data Access Code). Has the Open in a new Notebook functionality been removed or relocated? Is there a workaround?
In the PDF https://github.com/MicrosoftLearning/Data-Science-Essentials/blob/master/Labs/DAT203.1X%20Lab%202%20-%20Summary%20Statistics.pdf the section Visualize Summary Statistics for Price, under the Python experiment with the function plotstats, does not display the plot/graph. This is because it is missing a plt.show() line.
It should be:
def plotstats(df, col):
    import matplotlib.pyplot as plt
    ## Set up for plotting two charts, one above the other
    plt.clf()
    fig, ax = plt.subplots(2, 1, figsize = (12,8))
    ## First a box plot
    df.dropna().boxplot(col, ax = ax[0], vert=False,
                        return_type='dict')
    ## Plot the histogram
    ## (.values replaces as_matrix(), which was removed in pandas 1.0)
    temp = df[col].values
    ax[1].hist(temp, bins = 30, alpha = 0.7)
    plt.ylabel('Number of Cars')
    plt.xlabel(col)
    ## Without this call the figure is never displayed
    plt.show()
    return [col]
Galton published his famous paper in 1885, showing that the highs of children regressed to the mean of the population
should read
Galton published his famous paper in 1885, showing that the heights of children regressed to the mean of the population
Module 4 - Data Exploration notebook missing from the handout pack. Any chance of it being added? Thanks.
There's Visualizing data, and in the Demos folder, DataFrames and DataVisualization notebooks.
In Lab 5 instructions the following Python code is given to visualize outliers:
def auto_scatter_outlier(df, plot_cols):
    import matplotlib.pyplot as plt
    outlier = [0,0,1,1] # Vector of outlier indicators
    fuel = ['gas','diesel','gas','diesel'] # vector of fuel types
    color = ['DarkBlue','DarkBlue','Red','Red'] # vector of color choices for plot
    marker = ['x','o','o','x'] # vector of shape choices for plot
    for col in plot_cols: # loop over the columns
        fig = plt.figure(figsize=(6, 6))
        ax = fig.gca()
        ## Loop over the zip of the four vectors and subset the data and
        ## create the plot using the aesthetics provided
        for o, f, c, m in zip(outlier, fuel, color, marker):
            ## (.loc replaces the .ix indexer, which was removed in pandas 1.0)
            temp = df.loc[(df['outlier'] == o) & (df['fueltype'] == f)]
            if temp.shape[0] > 0:
                temp.plot(kind = 'scatter', x = col, y = 'lnprice' ,
                          ax = ax, color = c, marker = m)
        ax.set_title('Scatter plot of lnprice vs. ' + col)
        fig.savefig('scatter_' + col + '.png')
    return plot_cols
It's located in the "Visualize outliers in Python" section, step 6.
The typo is in the following lines:
fuel = ['gas','diesel','gas','diesel'] # vector of fuel types
...
marker = ['x','o','o','x'] # vector of shape choices for plot
The point shape has to be different for gas and diesel cars, but, according to the code, it is X for gas cars and O for diesel cars when the samples are not outliers, and O for gas cars and X for diesel cars when the samples are outliers.
The same error can be seen in the "Demo: Finding outliers" lecture video.
Changing the code to
fuel = ['gas','diesel','gas','diesel'] # vector of fuel types
...
marker = ['x','o','x','o'] # vector of shape choices for plot
gives the lnprice vs. citympg plot that corresponds to the other gas/diesel plots shown in the lectures: diesel cars are grouped together and located above gas cars (more efficient and pricier).
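One way to avoid this kind of mismatch is to derive the four parallel vectors from two lookup tables, so that the marker depends only on the fuel type and the color only on the outlier flag. The following is a minimal sketch, not part of the lab code; the names MARKER_BY_FUEL, COLOR_BY_OUTLIER and build_aesthetics are hypothetical and introduced here only for illustration.

```python
from itertools import product

# Hypothetical lookup tables (not in the lab code):
# marker shape depends only on fuel type, color only on the outlier flag.
MARKER_BY_FUEL = {'gas': 'x', 'diesel': 'o'}
COLOR_BY_OUTLIER = {0: 'DarkBlue', 1: 'Red'}

def build_aesthetics():
    """Return (outlier, fuel, color, marker) tuples for every combination."""
    return [(o, f, COLOR_BY_OUTLIER[o], MARKER_BY_FUEL[f])
            for o, f in product(COLOR_BY_OUTLIER, MARKER_BY_FUEL)]

# Each tuple can be unpacked the same way as the zip() in the lab code.
for o, f, c, m in build_aesthetics():
    print(o, f, c, m)
```

The resulting tuples could replace zip(outlier, fuel, color, marker) in the inner loop of auto_scatter_outlier, and they reproduce the corrected vectors above: x for gas and o for diesel, regardless of the outlier flag.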