dataquestio / solutions

Solutions for projects.

Jupyter Notebook 99.91% Python 0.09%


solutions's Issues

Small typo

Just a typo with the variable you meant to specify:

"There are two columns, and auto, which are numeric values with extra characters being stored as text. We'll clean and convert these."

The variable auto should be odometer

What parameter did you consider to know if it is an album?

About Mission191Solutions.ipynb:
It seems that every album_id is non-null and is treated as a real album.
We have sales/listens/plays of 2 tracks by the same artist, which is not a proper album.
Could you please explain the last query?

Error: n() should only be called in a data context

Freq_2015 <- dataset_2015 %>%
  group_by(year, fandango = Fandango_Stars) %>%
  summarize(Freq = n())

Mission350Solutions

Hello,
Note: This is for those who intend to download the database from the original source (see documentation) and use it with Jupyter on localhost.
As of today, I found a possible bug in the solution provided for opening the App Store data set: the data set keeps the index label and index numbers in the ios_header and ios variables. Later, calling the explore_data function leads to an inconsistent number of rows and columns.

This is the solution I implemented (feel free to adjust it to a more Pythonic style):

# The App Store data set
from csv import reader

opened_file = open('AppleStore.csv', encoding='utf8')
read_file = reader(opened_file)
ios = list(read_file)
# ios_header = ios[0]  # generates errors: keeps the index label
ios_header = ios[0][1:]  # fixed: drop the index column label
# ios = ios[1:]  # generates errors: keeps the index numbers
ios = [row[1:] for row in ios][1:]  # fixed: drop the index column, then the header row

Can you tell me the issue with my code?

def member_english(a_string):
    for character in a_string:
        if ord(character) > 127:
            return False
        else:
            return True

test_app_names = ['Instagram', '爱奇艺PPS -《欢乐颂2》电视剧热播', 'Docs To Go™ Free Office Suite', 'Instachat 😜']
for app_name in test_app_names:
    eng_or_non_eng = member_english(app_name)
    print(app_name + " is in English : ", eng_or_non_eng)

It is checking just the first character. What's the difference between putting "return True" inside the loop with an else statement and putting it outside the loop? If I put it outside, it works fine. I'm getting a little confused here.
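
A minimal sketch of the difference, using the hypothetical name is_english for the corrected function. With "else: return True" inside the loop, the function returns during the first iteration whichever branch is taken, so only the first character is ever inspected. Moving "return True" after the loop means it is reached only once every character has passed the check.

```python
def is_english(a_string):
    """Return True only if every character is in the ASCII range (0-127)."""
    for character in a_string:
        if ord(character) > 127:
            # A single non-ASCII character is enough to decide.
            return False
    # Reached only after the loop has inspected every character.
    return True


test_app_names = ['Instagram', '爱奇艺PPS -《欢乐颂2》电视剧热播',
                  'Docs To Go™ Free Office Suite', 'Instachat 😜']
for app_name in test_app_names:
    print(app_name, "is in English:", is_english(app_name))
```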

Wrong results in the "Customers and Products Analysis Using SQL" Solution

The solution for calculating low stock (in screen 4) is wrongly sorted in ascending order.
The code:

SELECT productCode, 
       ROUND(SUM(quantityOrdered) * 1.0 / (SELECT quantityInStock
                                             FROM products p
                                            WHERE od.productCode = p.productCode), 2) AS low_stock
  FROM orderdetails od
 GROUP BY productCode
 ORDER BY low_stock
 LIMIT 10;

wrongly assumes that a higher restocking priority is given to products with a lower "low_stock" value.
The code clearly shows that a lower value corresponds to a lower quantityOrdered and/or a higher quantityInStock. This goes against the purpose stated in the project's instructions.

The code can be corrected by simply sorting in descending order.
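
The effect of the proposed change can be checked with a small sketch that runs the query with ORDER BY low_stock DESC from Python's sqlite3 module. The table contents below are made-up toy rows (an assumption, not the project's data); only the table and column names come from the query in the issue.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (productCode TEXT PRIMARY KEY, quantityInStock INTEGER);
    CREATE TABLE orderdetails (productCode TEXT, quantityOrdered INTEGER);
    -- toy rows, invented for illustration
    INSERT INTO products VALUES ('A', 100), ('B', 10), ('C', 1000);
    INSERT INTO orderdetails VALUES ('A', 50), ('B', 40), ('B', 30), ('C', 10);
""")

query = """
SELECT productCode,
       ROUND(SUM(quantityOrdered) * 1.0 / (SELECT quantityInStock
                                             FROM products p
                                            WHERE od.productCode = p.productCode), 2) AS low_stock
  FROM orderdetails od
 GROUP BY productCode
 ORDER BY low_stock DESC
 LIMIT 10;
"""
rows = conn.execute(query).fetchall()
for product_code, low_stock in rows:
    print(product_code, low_stock)
```

With these toy rows, product B (70 units ordered against 10 in stock) comes first, which is the product most in need of restocking.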

Couple of issues

Couple of questions

In [8]:
import sqlite3
import pandas as pd

conn = sqlite3.connect("factbook.db")
cursor = conn.cursor()
cursor.execute(q1).fetchall()

The q1 is missing?

Also, you have two q7 queries, and the second q7 is identical to q6.

What is wrong with it?

I have been running the code of ln15, but in my notebook it detects all apps as English (True) despite ™ or any such character. Can anyone explain why?

plt.axhspan Mission188Solution.ipynb

To display the graphs correctly with the red-colored zone from 0.3 to 0.6, the plt.axhspan line needs to come just before plt.show().

float(app[3]) creates an error

because of values such as '19M'...

Using float(app[2]) instead works for this purpose.

I keep getting an error for this and not sure what I'm doing wrong. The rest of my code up until here works fine.

variables storing values from functions as none type

I am trying to assign the value returned from the function to a variable, but when I print the variable it shows None, even though the function displays the data. Here's my code:

prime_genre = display_table(ios_final, -5)
print("prime_genre : ", prime_genre)
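
A function that only prints and has no return statement hands None back to the caller, which would explain the output described here. The sketch below is a hypothetical variant named display_table_with_return (not the notebook's function) that prints the table and also returns it; the sample rows are invented.

```python
def display_table_with_return(dataset, index):
    """Print a frequency table for one column and also return it."""
    counts = {}
    for row in dataset:
        value = row[index]
        counts[value] = counts.get(value, 0) + 1
    table_sorted = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    for key, count in table_sorted:
        print(key, ':', count)
    # Without this return, the caller would receive None.
    return table_sorted


ios_sample = [['a', 'Games'], ['b', 'Games'], ['c', 'Music']]  # hypothetical rows
prime_genre = display_table_with_return(ios_sample, -1)
print("prime_genre : ", prime_genre)
```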

Question related to ln[8] or screen 5

Hi, I am a bit confused about the condition ("if name in review_max and review_max[name] < n_reviews:") used to extract the app with the highest number of reviews. I am unable to understand the second part of the condition, after the "and". Even if I remove that part, the output remains exactly the same, so what difference does it make? What does it actually add to my code?
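
A small sketch of what the second half of the condition does, assuming the body of the if branch assigns n_reviews to the dictionary entry. The rows are invented and deliberately ordered so that the largest review count is not the last one seen. The number of keys is the same either way, which may be why the output looked identical, but the stored values differ.

```python
apps = [['Instagram', 100.0], ['Instagram', 300.0], ['Instagram', 200.0]]  # hypothetical rows

with_check = {}
without_check = {}
for name, n_reviews in apps:
    # Full condition: overwrite only when the new count is larger.
    if name in with_check and with_check[name] < n_reviews:
        with_check[name] = n_reviews
    elif name not in with_check:
        with_check[name] = n_reviews

    # Second half removed: every duplicate overwrites the stored value.
    if name in without_check:
        without_check[name] = n_reviews
    elif name not in without_check:
        without_check[name] = n_reviews

print(with_check)     # keeps the maximum
print(without_check)  # keeps whatever came last
```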

A mistake detected

I notice that there is a small mistake in the input cell 21, the last foreign key constraint in C1.

Getting Syntax Error when creating View combining Customer/Invoice

I'd like to combine customer and invoice in a view to make the Selecting Albums to Purchase project easier to answer, however, I am getting a syntax error when creating a view via the following code. Any thoughts? thanks :)

CREATE VIEW usa_customer AS
SELECT c.country, i.customer_id, i.invoice_id FROM customer c
WHERE c.country = "USA"
INNER JOIN invoice i ON c.customer_id = i.customer_id

SELECT * from usa_customer;
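
In SQL, JOIN clauses belong to the FROM clause and must come before WHERE, and separate statements need a terminating semicolon. The sketch below runs a reordered version of the view through Python's sqlite3 module against toy tables. The rows are invented and the tables are reduced to the columns used in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, country TEXT);
    CREATE TABLE invoice (invoice_id INTEGER PRIMARY KEY, customer_id INTEGER);
    -- toy rows, invented for illustration
    INSERT INTO customer VALUES (1, 'USA'), (2, 'Canada');
    INSERT INTO invoice VALUES (10, 1), (11, 2), (12, 1);

    CREATE VIEW usa_customer AS
    SELECT c.country, i.customer_id, i.invoice_id
      FROM customer c
     INNER JOIN invoice i ON c.customer_id = i.customer_id
     WHERE c.country = 'USA';
""")

rows = conn.execute("SELECT * FROM usa_customer ORDER BY invoice_id;").fetchall()
print(rows)
```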

Problem with Guided Prison Break: Fetch_year function

Hello, I'm at mission 610 and I have a problem when I try to replace the full date with just the year:

This is the code I'm using.

for row in data:
    date = fetch_year(row[0])
    row[0] = date

This is the error I get...

TypeError                                 Traceback (most recent call last)
<ipython-input-10-8a078f6cee5c> in <module>
      1 for row in data:
----> 2     date = fetch_year(row[0])
      3     row[0] = date

~/notebook/helper.py in fetch_year(date_string)
     10 
     11 def fetch_year(date_string):
---> 12     return int(re.findall("\d{4}", date_string)[0])
     13 
     14 def barplot(list_of_2_element_list):

/dataquest/system/env/python3/lib/python3.8/re.py in findall(pattern, string, flags)
    237 
    238     Empty matches are included in the result."""
--> 239     return _compile(pattern, flags).findall(string)
    240 
    241 def finditer(pattern, string, flags=0):

TypeError: expected string or bytes-like object
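
The traceback shows re.findall receiving something that is not a string. One possible cause (an assumption, since the earlier cells are not shown) is that the loop has already run once, so row[0] now holds an int and the second run passes that int to fetch_year. A sketch that makes the loop safe to re-run, using invented rows:

```python
import re


def fetch_year(date_string):
    # Same logic as the helper in the traceback, with a raw-string pattern.
    return int(re.findall(r"\d{4}", date_string)[0])


data = [["15 August 1971", "x"], ["1986-12-19", "y"]]  # hypothetical rows

for _ in range(2):  # running the cell twice no longer fails
    for row in data:
        if isinstance(row[0], str):
            row[0] = fetch_year(row[0])

print(data)
```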

Error Message: String out of Range

I'm not sure what I am doing wrong. The message says that the string is out of range, but I'm not sure how that's possible.

Mission350: ios data contains duplicate

Hi,
Based on the Kaggle discussion, it seems that there are two duplicated entries ('Mannequin Challenge', 'VR Roller Coaster') in the data set.
Here is what I did to remove the duplicates:

iOS non-English apps not getting removed?

I'm having a hard time recreating the part of the tutorial where the non-English apps get removed from the iOS data set. The solution works totally fine for the Android apps, but when I copy/paste it exactly, no iOS apps get removed...

Also obviously I can't figure out how to do the indentation when I post this question.

android_english = []
ios_english = []

for app in android_clean:
name = app[0]
if is_english(name):
android_english.append(app)

for app in ios[1:]:
name = app[1]
if max_three_non_english(name):
ios_english.append(app)

print(len(android_english))
print(len(ios_english))

Found something in Mission350

Following the guide in Mission350Solutions.ipynb (I really appreciate this rookie-friendly notebook), I have found something about duplicated apps. 🤒

Firstly, it's not proper to judge duplicated apps just by their names. For example, there are two apps of different types (game and family) both named Solitaire, but you will lose this information if you judge duplication only by the app's name.
Secondly, a certain app named Cardiac diagnosis (heart rate, arrhythmia) has both a free version and a paid version. If you handle it the same way as Solitaire, you will lose either the free entry or the paid entry, which will affect your analysis of free apps.

Not sure if I made myself clear. Once again, I reallyyyyy appreciate your brilliant work!!!

Can You Explain this code snippet to me ?

def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)  # Can you explain this to me (with some results printed)?
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse=True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
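
A short sketch of what the tuple line does, with printed intermediate results. The frequency values are invented; freq_table is replaced here by a literal dictionary. Python compares tuples element by element, so placing the value first makes sorted() order the entries by frequency rather than alphabetically by key.

```python
table = {'Games': 58.16, 'Entertainment': 7.88, 'Photo & Video': 4.97}  # hypothetical percentages

table_display = []
for key in table:
    key_val_as_tuple = (table[key], key)  # (value, key), e.g. (58.16, 'Games')
    table_display.append(key_val_as_tuple)

print(table_display)  # list of (value, key) tuples

# Tuples sort by their first element, so this sorts by frequency, largest first.
table_sorted = sorted(table_display, reverse=True)
for entry in table_sorted:
    print(entry[1], ':', entry[0])  # key : value
```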

library(purrr) Error in library(purrr) : there is no package called ‘purrr’

I am failing to load the purrr package. How do I go about fixing that?

ERROR: Line magic function `%mathplotlib` not found.

I'm using the Jupyter environment provided by Dataquest and I get the error
"ERROR: Line magic function %mathplotlib not found". I searched online, and the suggested solutions are variants of installing a more recent version of Python. But since Jupyter resides on your server, I can't do that. Can you please suggest a solution? Thanks.

Typo mistake

In the sentence below from Mission350Solutions.ipynb:
"We add the current row (app) to the android_clean list, and the app name (name) to the already_cleaned list if"

Did you mean "already_added"? I cannot find the already_cleaned variable.

Thanks

Duy

What I am doing wrong

What am I doing wrong? I get this output:

ValueError                                Traceback (most recent call last)
in ()
      2 for app in google_data:
      3     name = app[0]
----> 4     n_reviews = float(app[3])
      5     if name in reviews_max and reviews_max[name]<n_reviews_max:
      6         reviews_max[name]+=n_reviews

ValueError: could not convert string to float: 'Reviews'

when I run this code:

reviews_max = {}
for app in google_data:
    name = app[0]
    n_reviews = float(app[3])
    if name in reviews_max and reviews_max[name]<n_reviews_max:
        reviews_max[name]+=n_reviews
    elif name not in reviews_max:
        reviews_max[name]=n_reviews
print(reviews_max)
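
The message shows float() being applied to the string 'Reviews', which is the column name, so the loop appears to be processing the header row. A sketch that skips the first row is shown below. It also compares against n_reviews (n_reviews_max is not defined anywhere in the snippet) and assigns rather than adds, so the dictionary keeps the maximum. The rows are invented stand-ins for the real data.

```python
google_data = [
    ['App', 'Category', 'Rating', 'Reviews'],      # header row
    ['Instagram', 'SOCIAL', '4.5', '66577313'],    # hypothetical rows
    ['Instagram', 'SOCIAL', '4.5', '66577446'],
    ['Slack', 'BUSINESS', '4.4', '51507'],
]

reviews_max = {}
for app in google_data[1:]:  # skip the header row
    name = app[0]
    n_reviews = float(app[3])
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews  # keep the larger count
    elif name not in reviews_max:
        reviews_max[name] = n_reviews

print(reviews_max)
```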

Mission218Solution - Improvement suggestion

In the mission's guideline, the following is advised:

Manually create a dictionary, mapping, that maps each key from race_counts to the population count of the race from census.

This is because our race_counts keys are different from the data coming in from census, and the Asian/Pacific Islander race is represented in two groups in census.

Therefore, mapping is created like this in the solution notebook:

mapping = {
    "Asian/Pacific Islander": 15159516 + 674625,
    "Native American/Native Alaskan": 3739506,
    "Black": 40250635,
    "Hispanic": 44618105,
    "White": 197318956
}

race_per_hundredk = {}
for k,v in race_counts.items():
    race_per_hundredk[k] = (v / mapping[k]) * 100000

race_per_hundredk

Improvement suggestion

I think manual data entry is very error-prone. Therefore, I suggest the following:

# map "race" to "race headers", not actual numbers in census
mapping  = {
    'Asian/Pacific Islander': ['Race Alone - Native Hawaiian', 'Other Pacific Islander'],
    'Black': ['Race Alone - Black or African American'],
    'Hispanic': ['Race Alone - Hispanic'], 
    'Native American/Native Alaskan': ['Race Alone - American Indian and Alaska Native'],
    'White': ['Race Alone - White'],
}

race_per_hundredk = {}

# better naming for variables
for race, death_count in race_counts.items():
    # iterate through headers list in census
    for i ,header in enumerate(census[0]):
        if header in mapping[race]:
            race_per_hundredk.setdefault(race, 0)
            # get matching value from data list in census
            race_per_hundredk[race] += race_counts[race] / int(census[1][i]) * 100000
            
print(race_per_hundredk)

This may be less readable, but I think we should avoid manual entry at almost all costs when doing data analysis.
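
One caveat with the header-based loop above: when a race maps to two census columns, adding count / population for each column is not the same as dividing the count by the combined population (a/b1 + a/b2 differs from a/(b1 + b2)). A sketch that sums the populations first is shown below. The header strings, the pairing of numbers to headers, and the death counts are assumptions made for illustration; only the three population figures come from the mapping in the solution notebook.

```python
# Two-row stand-in for the census data: headers, then values (assumed layout).
census = [
    ['Race Alone - Asian',
     'Race Alone - Native Hawaiian and Other Pacific Islander',
     'Race Alone - White'],
    ['15159516', '674625', '197318956'],
]
race_counts = {'Asian/Pacific Islander': 1326, 'White': 66237}  # hypothetical counts

mapping = {
    'Asian/Pacific Islander': ['Race Alone - Asian',
                               'Race Alone - Native Hawaiian and Other Pacific Islander'],
    'White': ['Race Alone - White'],
}

headers, values = census
population = dict(zip(headers, (int(v) for v in values)))

race_per_hundredk = {}
for race, death_count in race_counts.items():
    # Combine all matching census columns before computing the rate.
    total_population = sum(population[header] for header in mapping[race])
    race_per_hundredk[race] = death_count / total_population * 100000

print(race_per_hundredk)
```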

Error: Mission244Solutions Digits Classifier

There is an issue with the solution guide for the digits classifier: from sklearn.neural_network import MLPClassifier is needed to run the neural network properly.

How to display the result of app profile recommendation for android as a dictionary and find category with highest average.

Here's my code:

diff_categories = freq_table(android_final, 1)
for category in diff_categories:
    total = 0
    len_category = 0
    for app in android_final:
        apps_category = app[1]
        if apps_category == category:
            n_installs = app[5]
            n_installs = n_installs.replace('+','')
            n_installs = n_installs.replace(',','')

            total += float(n_installs)
            len_category += 1
    avg_category = total / len_category
    print(category, " : ", avg_category)
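
One way to get the averages into a dictionary and pick the top category is sketched below. It accumulates totals and counts in a single pass instead of relying on freq_table, then uses max() with the dictionary's get method as the key. The rows are invented; the column positions (category at index 1, installs at index 5) follow the code in the question.

```python
android_final = [
    ['App A', 'COMMUNICATION', '', '', '', '1,000,000+'],
    ['App B', 'COMMUNICATION', '', '', '', '500,000+'],
    ['App C', 'GAME', '', '', '', '100,000+'],
]  # hypothetical rows

totals = {}
counts = {}
for app in android_final:
    category = app[1]
    n_installs = float(app[5].replace('+', '').replace(',', ''))
    totals[category] = totals.get(category, 0) + n_installs
    counts[category] = counts.get(category, 0) + 1

# Dictionary of category -> average number of installs.
avg_installs = {category: totals[category] / counts[category] for category in totals}

# Category whose average is highest.
top_category = max(avg_installs, key=avg_installs.get)
print(avg_installs)
print(top_category, ':', avg_installs[top_category])
```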

Time data '8' does not match format error

I keep getting this same error, even after copying and pasting the code from the solutions.

ValueError: time data '8' does not match format '%m/%d/%Y %H:%M'

I am guessing that there is an entry that should be deleted to fix this problem.

Can anyone help with this issue?

What percent of majors are predominantly male? Predominantly female?

In mission 146, Visualizing Earnings Based on College Major:
How do I determine the percentage based on the histograms?

Using sort(positive_tested_top_3), it follows this Bangladesh, United Kingdom, United States as the top 3

solutions/Mission505Solutions.Rmd

Line 148 in 3823959

    
           positive_tested_top_3 <- c("United Kingdom" = 0.11, "United States" = 0.10, "Turkey" = 0.08)

Data cleaning issue # Mission 294 solutions

There is no need to convert strings on these columns:

price
odometer

Kindly update that

Using Function gives ' Object of Type None type has no len()'.

I am trying to use functions to solve this, but I am getting the error stated above. Please help. Here's my code:

android_final = []
ios_final = []

def free_apps(dataset, index):
    for app in dataset:
        price = app[index]
        if (price == 0) and (dataset == English_only_apps_android):
            return android_final.append(app)
        elif (price == 0) and (dataset == English_only_apps_ios):
            return ios_final.append(app)

android_useful = free_apps(English_only_apps_android, 7)
ios_useful = free_apps(English_only_apps_ios, 4)

def datasize():
    length_useful_android = len(android_useful)
    length_useful_ios = len(ios_useful)
    print("Length of Useful Data from the Android Dataset : ", length_useful_android)
    print("\nLength of Useful Data from the IOS Dataset : ", length_useful_ios)

datasize()
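
Two things in the code above lead to None: list.append always returns None, so "return android_final.append(app)" returns None, and a function whose loop never reaches a return statement also returns None. The return inside the loop would additionally stop at the first matching row. A sketch that builds a list and returns it once, after the loop, is shown below. The free_values parameter and the sample rows are assumptions; they reflect the guess that prices are stored as strings, in which case comparing with the integer 0 would never match.

```python
def free_apps(dataset, index, free_values=('0', '0.0')):
    """Return the rows whose price column marks the app as free."""
    free = []
    for app in dataset:
        price = app[index]
        if price in free_values:
            free.append(app)  # append mutates the list and returns None
    return free  # return the list itself, once, after the loop


english_android = [['App A', '0'], ['App B', '$4.99']]  # hypothetical rows
english_ios = [['App C', '0.0'], ['App D', '1.99']]     # hypothetical rows

android_final = free_apps(english_android, 1)
ios_final = free_apps(english_ios, 1)
print(len(android_final), len(ios_final))
```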

Data cleaning naming issue #Mission294

In the Mission 294 solution, we are supposed to clean and rename these columns:

price
odometer

In your notebook you have instructed us to clean and rename these columns, which is wrong:

price
auto

Kindly change that to avoid inconveniencing other students while learning. Thank you.

Correction for line 38.

tafe_survey_updated = tafe_survey_updated.rename({
'Record ID': 'id',
'CESSATION YEAR': 'cease_date',
'Reason for ceasing employment': 'separationtype',
'Gender.     What is your Gender?': 'gender',
'CurrentAge.     Current Age': 'age',
'Employment Type.     Employment Type': 'employment_status',
'Classification.     Classification': 'position',
'LengthofServiceOverall. Overall Length of Service at Institute (in years)': 'institute_service',
'LengthofServiceCurrent. Length of Service at current workplace (in years)': 'role_service'
}, axis=1)
tafe_survey_updated.columns

If your code doesn't work, try this one.

Addition to Mission 155 Solution

The solution file doesn't extract the optimal value of k for each model. Below is code for extracting the optimal k value:

import operator

two_feat_k_value = {}
three_feat_k_value = {}
four_feat_k_value = {}
five_feat_k_value = {}
six_feat_k_value = {}

dict_ = k_rmse_results.copy()

for k, v in dict_.items():
    for key,val in v.items():
        if k == '2 best features':
            two_feat_k_value[key] = val
        elif k == '3 best features':
            three_feat_k_value[key] = val
        elif k == '4 best features':
            four_feat_k_value[key] = val
        elif k == '5 best features':
            five_feat_k_value[key] = val
        else:
            six_feat_k_value[key] = val
            
print('Optimal k-values:')
# min(...) returns the (k, rmse) pair with the lowest RMSE; index 0 is the k value
print('two best features: {}'.format(min(two_feat_k_value.items(), key=operator.itemgetter(1))[0]))
print('three best features: {}'.format(min(three_feat_k_value.items(), key=operator.itemgetter(1))[0]))
print('four best features: {}'.format(min(four_feat_k_value.items(), key=operator.itemgetter(1))[0]))
print('five best features: {}'.format(min(five_feat_k_value.items(), key=operator.itemgetter(1))[0]))
print('six best features: {}'.format(min(six_feat_k_value.items(), key=operator.itemgetter(1))[0]))
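
A more compact sketch of the same idea, assuming k_rmse_results maps each feature-set label to a dictionary of k to RMSE (the numbers below are invented). In a (k, rmse) pair, index 0 is the k value and index 1 is the RMSE, so taking the minimum over the keys with the dictionary's get method as the sort key returns the k with the lowest RMSE directly.

```python
k_rmse_results = {
    '2 best features': {1: 2790.1, 2: 2700.7, 3: 3003.7},
    '3 best features': {1: 3399.8, 2: 3497.2, 3: 3333.6},
}  # hypothetical RMSE values

# For each feature set, pick the k whose RMSE is smallest.
optimal_k = {
    name: min(k_to_rmse, key=k_to_rmse.get)
    for name, k_to_rmse in k_rmse_results.items()
}

print('Optimal k-values:')
for name, k in optimal_k.items():
    print('{}: k={} (RMSE {})'.format(name, k, k_rmse_results[name][k]))
```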

Mission350: Solution mistake?

When under_100_m = [] is defined, the aim is to keep only communication apps with fewer than 100 million installs. However, the code in the solution keeps all apps with fewer than 100 million installs. Below is a suggestion for how the code might look:

under_100_m = []

for app in android_final:
    n_installs = app[5]
    n_installs = n_installs.replace(',', '')
    n_installs = n_installs.replace('+', '')
    if float(n_installs) < 100000000 and (app[1] == 'COMMUNICATION'):
        under_100_m.append(float(n_installs))
        
sum(under_100_m) / len(under_100_m)

Error in (function (classes, fdef, mtable) : unable to find an inherited method for function ‘dbWriteTable’ for signature ‘"SQLiteConnection", "character", "function"’

Hi,

I don't know what is wrong with my code. Can anybody help me? :)

title: "Designing and Creating a Database (Intermediate SQL in R): Guided Project Solutions"
output: html_notebook

Download packages

library(tidyverse)
library(RSQLite)
library(DBI)

Importing to SQLite

conn <- dbConnect(SQLite(), "mlb.db")
dbWriteTable(conn = conn, name = "person_codes", 
             value = person, row.names = FALSE, header = TRUE)
dbWriteTable(conn = conn, name = "team_codes", 
             value = team, row.names = FALSE, header = TRUE)
dbWriteTable(conn = conn, name = "park_codes", 
             value = park, row.names = FALSE, header = TRUE)

It returns: Error in (function (classes, fdef, mtable) : unable to find an inherited method for function ‘dbWriteTable’ for signature ‘"SQLiteConnection", "character", "function"’

Missing 469

Hi,

I could not find the guided project for https://app.dataquest.io/m/469/guided-project%3A-popular-data-science-questions/2/stack-exchange

https://github.com/dataquestio/solutions/blob/master/Mission469Solutions.ipynb is not found

Thanks,

Hacker News walkthrough

Hey guys, on page 4 of the Hacker News walkthrough I ran into a problem. When following the walkthrough and trying to convert the comment average, I got an error about changing a str to an int. I think this is due to the steps on page 3, where you add the title to an empty list but never pull the comment count along with it. Attached you will find my quick solution; I hope this helps.