The dot product is a crucial mathematical operation that we'll be using in many algorithms going forward.
It is defined as the sum of the products of the corresponding elements of two vectors.
Mathematically:
$a = [a_1, a_2, \dots, a_n]$
$b = [b_1, b_2, \dots, b_n]$
$a \bullet b = \sum_{i=1}^{n} a_ib_i = a_1b_1 + a_2b_2 + \dots + a_nb_n$
import numpy as np
a = np.array(range(5))
b = np.array(range(5,10))
print('a :', a)
print('b :', b)
a : [0 1 2 3 4]
b : [5 6 7 8 9]
def dot_product(a, b):
    # Your code goes here
    return None
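One way the stub above could be completed (a sketch, not the only valid solution): multiply the corresponding elements of the two arrays and sum the products.

```python
import numpy as np

def dot_product(a, b):
    # Elementwise products a_i * b_i, summed
    return np.sum(a * b)

a = np.array(range(5))
b = np.array(range(5, 10))
print(dot_product(a, b))  # 0*5 + 1*6 + 2*7 + 3*8 + 4*9 = 80
```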
Great! The dot product of a and b can also be calculated using matrix multiplication.
Recall that the product of a $1 \times n$ row vector and an $n \times 1$ column vector is exactly the sum $\sum_{i=1}^{n} a_ib_i$.
Write a second function that calculates the dot product of a and b using this alternative calculation.
def dot_product2(a, b):
    # Your code goes here
    return None
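A possible solution using the matrix-multiplication route: for two 1-D NumPy arrays, `np.matmul` (equivalently the `@` operator) returns exactly the sum of the elementwise products.

```python
import numpy as np

def dot_product2(a, b):
    # Matrix multiplication of two 1-D arrays yields the dot product
    return np.matmul(a, b)

a = np.array(range(5))
b = np.array(range(5, 10))
print(dot_product2(a, b))  # 80, matching dot_product above
```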
Soon, we're going to expand our simple linear regression into a more generalized linear regression involving multiple variables. Instead of predicting a movie's gross domestic sales from its budget alone, we'll incorporate additional variables, such as ratings and reviews, to improve our predictions.
When doing this, we will have a matrix of data where each column is a specific feature, such as the budget or the IMDb review score, while each row is an observation: one of the movies in our dataset.
For example:
import pandas as pd
x = pd.read_excel('movie_data_detailed_with_ols.xlsx')
x = x[['budget', 'imdbRating','Metascore', 'imdbVotes']]
x.head()
|   | budget | imdbRating | Metascore | imdbVotes |
|---|---|---|---|---|
| 0 | 13000000 | 6.8 | 48 | 206513 |
| 1 | 45658735 | 0.0 | 0 | 0 |
| 2 | 20000000 | 8.1 | 96 | 537525 |
| 3 | 61000000 | 6.7 | 55 | 173726 |
| 4 | 40000000 | 7.5 | 62 | 74170 |
x = np.array(x)
x
array([[1.3000000e+07, 6.8000000e+00, 4.8000000e+01, 2.0651300e+05],
[4.5658735e+07, 0.0000000e+00, 0.0000000e+00, 0.0000000e+00],
[2.0000000e+07, 8.1000000e+00, 9.6000000e+01, 5.3752500e+05],
[6.1000000e+07, 6.7000000e+00, 5.5000000e+01, 1.7372600e+05],
[4.0000000e+07, 7.5000000e+00, 6.2000000e+01, 7.4170000e+04],
[2.2500000e+08, 6.3000000e+00, 2.8000000e+01, 1.2876600e+05],
[9.2000000e+07, 5.3000000e+00, 2.8000000e+01, 1.8058500e+05],
[1.2000000e+07, 7.8000000e+00, 5.5000000e+01, 2.4008700e+05],
[1.3000000e+07, 5.7000000e+00, 4.8000000e+01, 3.0576000e+04],
[1.3000000e+08, 4.9000000e+00, 3.3000000e+01, 1.7436500e+05],
[4.0000000e+07, 7.3000000e+00, 9.0000000e+01, 3.9839000e+05],
[2.5000000e+07, 7.2000000e+00, 5.8000000e+01, 7.5884000e+04],
[5.0000000e+07, 6.2000000e+00, 5.2000000e+01, 7.6001000e+04],
[1.8000000e+07, 7.3000000e+00, 7.8000000e+01, 1.7098600e+05],
[5.5000000e+07, 7.8000000e+00, 8.3000000e+01, 3.6824400e+05],
[3.0000000e+07, 7.4000000e+00, 8.5000000e+01, 1.4232800e+05],
[7.8000000e+07, 6.4000000e+00, 5.9000000e+01, 7.5138000e+04],
[7.6000000e+07, 7.4000000e+00, 6.2000000e+01, 3.2466400e+05],
[5.5000000e+06, 6.6000000e+00, 6.6000000e+01, 2.0894800e+05],
[1.2000000e+08, 6.6000000e+00, 6.1000000e+01, 3.7813100e+05],
[1.1000000e+08, 0.0000000e+00, 0.0000000e+00, 0.0000000e+00],
[1.0000000e+08, 6.7000000e+00, 5.2000000e+01, 9.2389000e+04],
[4.0000000e+07, 5.9000000e+00, 3.5000000e+01, 2.2430000e+04],
[7.0000000e+07, 6.7000000e+00, 4.9000000e+01, 1.9876700e+05],
[1.7000000e+07, 6.5000000e+00, 5.7000000e+01, 1.3994000e+05],
[1.6000000e+08, 0.0000000e+00, 0.0000000e+00, 0.0000000e+00],
[1.5000000e+08, 7.5000000e+00, 7.4000000e+01, 4.8355500e+05],
[1.4000000e+08, 5.8000000e+00, 4.1000000e+01, 1.5821000e+05],
[6.0000000e+07, 6.7000000e+00, 4.0000000e+01, 1.8884600e+05],
[3.0000000e+07, 7.1000000e+00, 0.0000000e+00, 0.0000000e+00]])
3. Write a function that predicts a vector of model predictions $\hat{y}$ given a matrix of data x, and a vector of coefficient weights w.
Mathematically: $\hat{y} = xw$
def poly_regress_predict(x, w):
    # Your code goes here
    return y_hat
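One way this could be implemented: each prediction is the dot product of a row of x with the weight vector w, which is exactly the matrix-vector product $xw$. The demo values below are hypothetical, not taken from the movie data.

```python
import numpy as np

def poly_regress_predict(x, w):
    # y_hat[i] is the dot product of row i of x with w, i.e. y_hat = x w
    y_hat = np.dot(x, w)
    return y_hat

# Small hypothetical example: 2 observations, 2 features
x_demo = np.array([[1.0, 2.0], [3.0, 4.0]])
w_demo = np.array([0.5, 0.5])
print(poly_regress_predict(x_demo, w_demo))  # [1.5 3.5]
```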