dataquestio / solutions
Solutions for projects.
There is no need to convert strings on these columns:
price
odometer
Kindly update that
Hi,
Download packages
library(tidyverse)
library(RSQLite)
library(DBI)
Importing to SQLite
conn <- dbConnect(SQLite(), "mlb.db")
dbWriteTable(conn = conn, name = "person_codes",
             value = person, row.names = FALSE, header = TRUE)
dbWriteTable(conn = conn, name = "team_codes",
             value = team, row.names = FALSE, header = TRUE)
dbWriteTable(conn = conn, name = "park_codes",
             value = park, row.names = FALSE, header = TRUE)
It returns: Error in (function (classes, fdef, mtable) : unable to find an inherited method for function ‘dbWriteTable’ for signature ‘"SQLiteConnection", "character", "function"’
I'm having a hard time recreating the part of the tutorial where the non-English apps get removed from the iOS data set. The solution works fine for the Android apps, but when I copy and paste it exactly, no iOS apps get removed...
Also obviously I can't figure out how to do the indentation when I post this question.
android_english = []
ios_english = []
for app in android_clean:
    name = app[0]
    if is_english(name):
        android_english.append(app)
for app in ios[1:]:
    name = app[1]
    if max_three_non_english(name):
        ios_english.append(app)
print(len(android_english))
print(len(ios_english))
Issue with the solution guide for the digits classifier: `from sklearn.neural_network import MLPClassifier` is needed to run the neural network properly.
On mission 146, Visualizing Earnings Based on College Major:
How do I determine the percentage based on the histograms?
In the sentence below from Mission350Solutions.ipynb:
"We add the current row (app) to the android_clean list, and the app name (name) to the already_cleaned list if"
Did you mean "already_added"? I cannot find the already_cleaned variable.
Thanks
Duy
Here's my code:
`diff_categories = freq_table(android_final, 1)
for category in diff_categories:
    total = 0
    len_category = 0
    for app in android_final:
        apps_category = app[1]
        if apps_category == category:
            n_installs = app[5]
            n_installs = n_installs.replace('+','')
            n_installs = n_installs.replace(',','')
            total += float(n_installs)
            len_category += 1
    avg_category = total / len_category
    print(category, " : ", avg_category)
`
def member_english(a_string):
    for character in a_string:
        if ord(character) > 127:
            return False
        else:
            return True

test_app_names = ['Instagram', '爱奇艺PPS -《欢乐颂2》电视剧热播', 'Docs To Go™ Free Office Suite', 'Instachat 😜']
for app_name in test_app_names:
    eng_or_non_eng = member_english(app_name)
    print(app_name + " is in English : ", eng_or_non_eng)
It is checking just the first character. What's the difference between putting `return True` inside the loop with an else statement and putting it outside the loop? If I put it outside, it works fine. I'm getting a little confused here.
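The difference comes down to when `return` runs: a `return` inside the loop body ends the function on the first character, whichever branch is taken. The sketch below contrasts the two placements; `is_ascii_only` and `first_char_only` are hypothetical names chosen for illustration, not names from the solution notebook.

```python
def is_ascii_only(a_string):
    """Return False as soon as any character is outside the ASCII range."""
    for character in a_string:
        if ord(character) > 127:
            return False
    # Reached only after every character has been checked.
    return True


def first_char_only(a_string):
    """Same check, but with `return True` in an else branch inside the loop."""
    for character in a_string:
        if ord(character) > 127:
            return False
        else:
            # Runs on the very first ASCII character, so the loop never continues.
            return True


print(is_ascii_only('Instachat 😜'))    # False: the emoji is found at the end
print(first_char_only('Instachat 😜'))  # True: only 'I' was ever inspected
```

With `return True` after the loop, the function answers "did every character pass?"; with it inside the loop, it answers "did the first character pass?".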
I'm using the Jupyter notebook provided by Dataquest and I get the error "ERROR: Line magic function %mathplotlib not found". I searched online and the solutions are variants of installing a more recent version of Python. But since Jupyter resides on your server, I can't do that. Can you please suggest a solution? Thanks.
Hello,
Note: This is for those who intend to download the data set from the original source (see documentation) and use it in a local Jupyter notebook.
As of today, I found a possible bug in the solution provided to open the App Store data set: the file carries an extra index column, so its label and numbers were kept in the ios_header and ios variables. Later, calling the explore_data function printed an inconsistent number of rows and columns.
This is the solution I implemented (feel free to adjust it to a more Pythonic style):
# The App Store data set
from csv import reader

opened_file = open('AppleStore.csv', encoding='utf8')
read_file = reader(opened_file)
ios = list(read_file)
#ios_header = ios[0] # Generates errors
ios_header = ios[0][1:] # Fixed: drop the index column label
#ios = ios[1:] # Generates errors
ios = [row[1:] for row in ios][1:] # Fixed with a list comprehension: drop the index column, then the header row
Hey guys, on page 4 of the Hacker News walkthrough I ran into a problem. When following the walkthrough and trying to compute the comment average, I got an error about converting a str to an int. I think this is due to the steps on page 3, where you add the title to an empty list but never pull the comment count with it. Attached you will find my quick solution; I hope this helps.
I keep getting this same error, even after copying and pasting the code from the solutions.
ValueError: time data '8' does not match format '%m/%d/%Y %H:%M'
I am guessing that there is an entry that should be deleted to fix this problem.
Can anyone help with this issue?
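The message says that the string handed to `strptime` was `'8'`, so either one row is malformed or the wrong column is being read. A minimal sketch for locating such rows follows; the sample rows and the `DATE_INDEX` position are assumptions for illustration and are not taken from the project data.

```python
from datetime import datetime

DATE_FORMAT = "%m/%d/%Y %H:%M"
DATE_INDEX = 2  # assumed position of the date column

# Hypothetical sample rows; the second one has a non-date value in the date column.
rows = [
    ["12224879", "Interactive Dynamic Video", "8/4/2016 11:52"],
    ["10975351", "How to Use Open Source", "8"],
]


def find_bad_dates(rows, date_index, fmt=DATE_FORMAT):
    """Return (position, row) pairs whose date field does not match fmt."""
    bad = []
    for position, row in enumerate(rows):
        try:
            datetime.strptime(row[date_index], fmt)
        except (ValueError, IndexError):
            bad.append((position, row))
    return bad


bad_rows = find_bad_dates(rows, DATE_INDEX)
print(bad_rows)
```

If every row is reported, the column index is the more likely culprit; if only a few are, those entries can be inspected or skipped.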
I notice that there is a small mistake in the input cell 21, the last foreign key constraint in C1.
Hi,
I could not find the guided project for https://app.dataquest.io/m/469/guided-project%3A-popular-data-science-questions/2/stack-exchange
https://github.com/dataquestio/solutions/blob/master/Mission469Solutions.ipynb is not found
Thanks,
Freq_2015 <- dataset_2015 %>%
  group_by(year, fandango = Fandango_Stars) %>%
  summarize(Freq = n())
The solution file didn't extract the optimal value for k in each model. Below is the code for the extraction of optimal k value:
import operator

two_feat_k_value = {}
three_feat_k_value = {}
four_feat_k_value = {}
five_feat_k_value = {}
six_feat_k_value = {}
dict_ = k_rmse_results.copy()
for k, v in dict_.items():
    for key, val in v.items():
        if k == '2 best features':
            two_feat_k_value[key] = val
        elif k == '3 best features':
            three_feat_k_value[key] = val
        elif k == '4 best features':
            four_feat_k_value[key] = val
        elif k == '5 best features':
            five_feat_k_value[key] = val
        else:
            six_feat_k_value[key] = val
print('Optimal k-values:')
# min(...) returns the (k, rmse) pair with the lowest RMSE; index 0 is k, index 1 is the RMSE itself
print('two best features: {}'.format(min(two_feat_k_value.items(), key=operator.itemgetter(1))[0]))
print('three best features: {}'.format(min(three_feat_k_value.items(), key=operator.itemgetter(1))[0]))
print('four best features: {}'.format(min(four_feat_k_value.items(), key=operator.itemgetter(1))[0]))
print('five best features: {}'.format(min(five_feat_k_value.items(), key=operator.itemgetter(1))[0]))
print('six best features: {}'.format(min(six_feat_k_value.items(), key=operator.itemgetter(1))[0]))
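The same extraction can be sketched more compactly with a dictionary comprehension. This assumes `k_rmse_results` maps each model label to a `{k: rmse}` dictionary, which is how the loop above treats it; the numbers here are made up for illustration.

```python
# Hypothetical results in the assumed shape: {model label: {k: rmse}}
k_rmse_results = {
    '2 best features': {1: 4.2, 2: 3.9, 3: 4.0},
    '3 best features': {1: 3.8, 2: 3.5, 3: 3.6},
}

# For each model, pick the k whose RMSE is lowest.
optimal_k = {
    label: min(rmse_by_k, key=rmse_by_k.get)
    for label, rmse_by_k in k_rmse_results.items()
}

print(optimal_k)  # {'2 best features': 2, '3 best features': 2}
```

Passing `rmse_by_k.get` as the key makes `min` compare RMSE values while returning the k itself.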
I am trying to use functions to solve this, but I am getting the above-stated error. Please help. Here's my code:
`android_final = []
ios_final = []
def free_apps(dataset , index):
    for app in dataset:
        price = app[index]
        if (price == 0) and (dataset == English_only_apps_android):
            return android_final.append(app)
        elif (price == 0) and (dataset == English_only_apps_ios):
            return ios_final.append(app)
android_useful = free_apps(English_only_apps_android,7)
ios_useful = free_apps(English_only_apps_ios,4)
def datasize():
    length_useful_android = len(android_useful)
    length_useful_ios = len(ios_useful)
    print("Length of Useful Data from the Android Dataset : ", length_useful_android)
    print("\nLength of Useful Data from the IOS Dataset : ", length_useful_ios)
datasize()
`
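In the code above, `list.append` returns `None`, and a `return` inside the loop ends the function at the first match, so `free_apps` always hands back `None` and `len(None)` raises a TypeError. One way to restructure it is sketched below. The sample rows are hypothetical, and the assumption that prices are stored as strings such as '0', '0.0' or '$4.99' should be checked against the actual data.

```python
def free_apps(dataset, price_index):
    """Return the rows whose price column marks the app as free."""
    free = []
    for app in dataset:
        price = app[price_index]
        # Assumption: prices are strings, so compare against string values.
        if price in ('0', '0.0'):
            free.append(app)
    # Return once, after the whole data set has been scanned.
    return free


# Hypothetical rows in the form [name, price]
android_sample = [['App A', '0'], ['App B', '$4.99'], ['App C', '0']]
ios_sample = [['App D', '0.0'], ['App E', '1.99']]

android_final = free_apps(android_sample, 1)
ios_final = free_apps(ios_sample, 1)

print(len(android_final), len(ios_final))  # 2 1
```

Because the function builds and returns its own list, it no longer needs to know which data set it was given.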
tafe_survey_updated = tafe_survey_updated.rename({
    'Record ID': 'id',
    'CESSATION YEAR': 'cease_date',
    'Reason for ceasing employment': 'separationtype',
    'Gender. What is your Gender?': 'gender',
    'CurrentAge. Current Age': 'age',
    'Employment Type. Employment Type': 'employment_status',
    'Classification. Classification': 'position',
    'LengthofServiceOverall. Overall Length of Service at Institute (in years)': 'institute_service',
    'LengthofServiceCurrent. Length of Service at current workplace (in years)': 'role_service'
}, axis=1)
tafe_survey_updated.columns
If your code doesn't work, try this one.
Hello, I'm at mission 610 and I have a problem when I try to replace the full date with just the year:
This is the code I'm using.
for row in data:
    date = fetch_year(row[0])
    row[0] = date
This is the error I get...
TypeError                                 Traceback (most recent call last)
<ipython-input-10-8a078f6cee5c> in <module>
      1 for row in data:
----> 2     date = fetch_year(row[0])
      3     row[0] = date

~/notebook/helper.py in fetch_year(date_string)
     10
     11 def fetch_year(date_string):
---> 12     return int(re.findall("\d{4}", date_string)[0])
     13
     14 def barplot(list_of_2_element_list):

/dataquest/system/env/python3/lib/python3.8/re.py in findall(pattern, string, flags)
    237
    238     Empty matches are included in the result."""
--> 239     return _compile(pattern, flags).findall(string)
    240
    241 def finditer(pattern, string, flags=0):

TypeError: expected string or bytes-like object
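The traceback shows that `re.findall` received something that is not a string. One possible cause, which the post does not confirm, is that the loop had already run once, so `row[0]` now holds an integer year. The sketch below uses a hypothetical wrapper, `fetch_year_safe`, and made-up rows to show a version that tolerates being run twice.

```python
import re


def fetch_year_safe(value):
    """Hypothetical wrapper: return the year whether value is a date string or already an int."""
    if isinstance(value, int):
        return value  # already converted on an earlier run
    return int(re.findall(r"\d{4}", value)[0])


# Hypothetical rows with the date in the first column
data = [["2016-08-04", "x"], ["2015-01-02", "y"]]

for _ in range(2):  # simulate running the notebook cell twice
    for row in data:
        row[0] = fetch_year_safe(row[0])

print(data)  # [[2016, 'x'], [2015, 'y']]
```

Printing `type(row[0])` just before the failing call is a quick way to confirm or rule out this explanation.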
`
def display_table(dataset,index):
    table = freq_table(dataset,index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key) # Can you explain this to me(With some results printed)
        table_display.append(key_val_as_tuple)
    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
`
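The tuple puts the value first because `sorted` compares tuples element by element, starting with the first. A small sketch with printed intermediate results follows; the frequency table here is made up for illustration and stands in for the output of `freq_table`.

```python
# Hypothetical frequency table: {category: percentage}
table = {'Entertainment': 7.88, 'Games': 58.16, 'Photo & Video': 4.97}

table_display = []
for key in table:
    # (value, key): the value comes first so that sorting orders by value.
    table_display.append((table[key], key))

print(table_display)
# [(7.88, 'Entertainment'), (58.16, 'Games'), (4.97, 'Photo & Video')]

table_sorted = sorted(table_display, reverse=True)
print(table_sorted)
# [(58.16, 'Games'), (7.88, 'Entertainment'), (4.97, 'Photo & Video')]

for entry in table_sorted:
    print(entry[1], ':', entry[0])
```

Had the tuples been built as `(key, table[key])`, the same call would have sorted alphabetically by category name instead.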
I am failing to load the purrr package. How do I go about fixing that?
Couple of questions
In [8]:
import sqlite3
import pandas as pd
conn = sqlite3.connect("factbook.db")
cursor = conn.cursor()
cursor.execute(q1).fetchall()
The q1 is missing?
Also, you have two q7 queries, and the second q7 is identical to q6.
I'd like to combine customer and invoice in a view to make the Selecting Albums to Purchase project easier to answer, however, I am getting a syntax error when creating a view via the following code. Any thoughts? thanks :)
CREATE VIEW usa_customer AS
SELECT c.country, i.customer_id, i.invoice_id FROM customer c
WHERE c.country = "USA"
INNER JOIN invoice i ON c.customer_id = i.customer_id
SELECT * from usa_customer;
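In SQL, the `JOIN` clause belongs to `FROM` and has to come before `WHERE`, and the `CREATE VIEW` statement needs its own terminating semicolon before the next `SELECT`. A minimal sketch using Python's `sqlite3` module and an in-memory database follows; the two tiny tables are hypothetical stand-ins that only carry the columns the view uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, country TEXT);
CREATE TABLE invoice (invoice_id INTEGER PRIMARY KEY, customer_id INTEGER);
INSERT INTO customer VALUES (1, 'USA'), (2, 'Canada');
INSERT INTO invoice VALUES (10, 1), (11, 2), (12, 1);

-- JOIN first, then WHERE; the statement ends with a semicolon.
CREATE VIEW usa_customer AS
SELECT c.country, i.customer_id, i.invoice_id
  FROM customer c
 INNER JOIN invoice i ON c.customer_id = i.customer_id
 WHERE c.country = 'USA';
""")

rows = conn.execute("SELECT * FROM usa_customer ORDER BY invoice_id").fetchall()
print(rows)  # [('USA', 1, 10), ('USA', 1, 12)]
```

Single quotes are used for the string literal 'USA' because double quotes denote identifiers in standard SQL.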
In the mission's guideline, the following is advised:
Manually create a dictionary, mapping, that maps each key from race_counts to the population count of the race from census.
This is because our race_counts keys are different from the data coming in from census, and the Asian/Pacific Islander race is represented in two groups in census.
Therefore, mapping is created like this in the solution notebook:
mapping = {
    "Asian/Pacific Islander": 15159516 + 674625,
    "Native American/Native Alaskan": 3739506,
    "Black": 40250635,
    "Hispanic": 44618105,
    "White": 197318956
}
race_per_hundredk = {}
for k,v in race_counts.items():
    race_per_hundredk[k] = (v / mapping[k]) * 100000
race_per_hundredk
I think manual data entry is very error-prone. Therefore, I suggest the following:
# map "race" to "race headers", not actual numbers in census
mapping = {
    'Asian/Pacific Islander': ['Race Alone - Native Hawaiian', 'Other Pacific Islander'],
    'Black': ['Race Alone - Black or African American'],
    'Hispanic': ['Race Alone - Hispanic'],
    'Native American/Native Alaskan': ['Race Alone - American Indian and Alaska Native'],
    'White': ['Race Alone - White'],
}
race_per_hundredk = {}
# better naming for variables
for race, death_count in race_counts.items():
    # add up the census population over every header mapped to this race
    population = 0
    for i, header in enumerate(census[0]):
        if header in mapping[race]:
            # get matching value from data list in census
            population += int(census[1][i])
    # divide once by the combined population; summing per-header rates would overstate the result
    if population:
        race_per_hundredk[race] = death_count / population * 100000
print(race_per_hundredk)
This may be less readable, but I think we should avoid manual entry at almost all costs when dealing with data analysis.
solutions/Mission505Solutions.Rmd
Line 148 in 3823959
About the Mission191Solutions.ipynb
It seems that all album_id is not null and is a real album.
We have sales/listen/plays in 2 tracks of the same artist, which is not a proper album.
Could you please explain the last query?
Just a typo with the variable you meant to specify:
"There are two columns, and auto, which are numeric values with extra characters being stored as text. We'll clean and convert these."
The variable auto should be odometer
Following the guide in Mission350Solutions.ipynb (I really appreciate this rookie-friendly notebook), I have found something about duplicated apps. 🤒
Firstly, it's not appropriate to identify duplicated apps by their names alone. For example, there are two apps from different categories (game and family) both named Solitaire, but you will lose this information if you judge duplication only by the app's name.
Secondly, an app named Cardiac diagnosis (heart rate, arrhythmia) has both a free version and a paid version. If you treat it the same way as Solitaire, you will lose either the free entry or the paid entry, which will influence your analysis of free apps.
Not sure if I have made myself clear. Once again, I really appreciate your brilliant work!
The solution for calculating low stock (in screen 4) is wrongly sorted in ascending order.
The code:
SELECT productCode,
ROUND(SUM(quantityOrdered) * 1.0 / (SELECT quantityInStock
FROM products p
WHERE od.productCode = p.productCode), 2) AS low_stock
FROM orderdetails od
GROUP BY productCode
ORDER BY low_stock
LIMIT 10;
wrongly assumes that products with a lower "low_stock" value have a higher priority for restocking.
The code shows that a lower value corresponds to a lower quantityOrdered and/or a higher quantityInStock, which goes against the purpose stated in the project's instructions.
A simple correction is to sort in descending order (ORDER BY low_stock DESC).
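The effect of the sort direction can be checked on a toy data set. The sketch below runs the query with a descending sort through Python's `sqlite3` module; the two tables contain made-up rows and only the columns the query touches.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (productCode TEXT PRIMARY KEY, quantityInStock INTEGER);
CREATE TABLE orderdetails (productCode TEXT, quantityOrdered INTEGER);
-- Hypothetical data: product B is ordered heavily relative to its stock.
INSERT INTO products VALUES ('A', 100), ('B', 10);
INSERT INTO orderdetails VALUES ('A', 20), ('A', 30), ('B', 40);
""")

query = """
SELECT productCode,
       ROUND(SUM(quantityOrdered) * 1.0 / (SELECT quantityInStock
                                             FROM products p
                                            WHERE od.productCode = p.productCode), 2) AS low_stock
  FROM orderdetails od
 GROUP BY productCode
 ORDER BY low_stock DESC
 LIMIT 10;
"""

rows = conn.execute(query).fetchall()
print(rows)  # [('B', 4.0), ('A', 0.5)]
```

Product B, whose orders are four times its stock, now appears first, which matches the idea that a higher ratio signals a more urgent restock.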
In the solution for mission 294, we are supposed to clean and rename these columns:
price
odometer
In your notebook, you have instructed us to clean and rename these columns, which is WRONG:
price
auto
Kindly change that to avoid inconveniencing other students while learning. Thank you.
I have been running the code from cell In [15], but in my notebook it reports every app as English (True), even when the name contains ™ or a similar character. Can anyone explain why?
When defining under_100_m = [], the aim is to keep only communication applications with fewer than 100 million installs. However, the code in the solution keeps all applications with fewer than 100 million installs. Below is a suggestion for how the code might look:
under_100_m = []
for app in android_final:
    n_installs = app[5]
    n_installs = n_installs.replace(',', '')
    n_installs = n_installs.replace('+', '')
    if float(n_installs) < 100000000 and (app[1] == 'COMMUNICATION'):
        under_100_m.append(float(n_installs))
sum(under_100_m) / len(under_100_m)
Hi,
Based on the Kaggle discussion, it seems that there are two duplicate entries ('Mannequin Challenge', 'VR Roller Coaster') in the data set.
Here is what I did to remove the duplicates:
Hi, I am a bit confused about the condition ("if name in review_max and review_max[name] < n_reviews: ") used to extract the app with the highest number of reviews. I don't understand the second part of the condition, after the and. Even if I remove that part, the output remains exactly the same, so what difference does it make? What does it actually do in my code?
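Whether dropping the second part changes the output depends on the order of the duplicate rows. Without the comparison, the last entry seen for a name wins; with it, the largest one wins. The sketch below makes the difference visible with hypothetical rows in which the larger review count comes first; `keep_max` and `keep_last` are illustrative names.

```python
# Hypothetical rows in the form [name, n_reviews]; the bigger count appears first.
rows = [['Instagram', '250'], ['Instagram', '100'], ['Slack', '40']]


def keep_max(rows):
    """With the comparison: store a value only if it beats the one already kept."""
    reviews_max = {}
    for name, raw_reviews in rows:
        n_reviews = float(raw_reviews)
        if name in reviews_max and reviews_max[name] < n_reviews:
            reviews_max[name] = n_reviews
        elif name not in reviews_max:
            reviews_max[name] = n_reviews
    return reviews_max


def keep_last(rows):
    """Without the comparison: every duplicate overwrites the previous value."""
    reviews_max = {}
    for name, raw_reviews in rows:
        reviews_max[name] = float(raw_reviews)
    return reviews_max


print(keep_max(rows))   # {'Instagram': 250.0, 'Slack': 40.0}
print(keep_last(rows))  # {'Instagram': 100.0, 'Slack': 40.0}
```

If the two versions agree on a given data set, that only means the last duplicate of each name happened to have the highest count there.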
because of e.g. '19M'...
using float(app[2]) instead for the purpose
What am I doing wrong? I get this output:
ValueError                                Traceback (most recent call last)
in ()
      2 for app in google_data:
      3     name = app[0]
----> 4     n_reviews = float(app[3])
      5     if name in reviews_max and reviews_max[name]<n_reviews_max:
      6         reviews_max[name]+=n_reviews

ValueError: could not convert string to float: 'Reviews'
when I run this code:
reviews_max = {}
for app in google_data:
    name = app[0]
    n_reviews = float(app[3])
    if name in reviews_max and reviews_max[name]<n_reviews_max:
        reviews_max[name]+=n_reviews
    elif name not in reviews_max:
        reviews_max[name]=n_reviews
print(reviews_max)
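The traceback shows `float()` receiving the string 'Reviews', which suggests the loop is also visiting the header row. Two further details in the snippet look unintended: `n_reviews_max` is never defined, and `+=` adds review counts together instead of keeping the largest. A sketch with these three points adjusted follows, using a small hypothetical `google_data` list whose first row is the header.

```python
# Hypothetical sample; the first row is the header.
google_data = [
    ['App', 'Category', 'Rating', 'Reviews'],
    ['Instagram', 'SOCIAL', '4.5', '100'],
    ['Instagram', 'SOCIAL', '4.5', '250'],
    ['Slack', 'BUSINESS', '4.4', '40'],
]

reviews_max = {}
for app in google_data[1:]:  # skip the header row
    name = app[0]
    n_reviews = float(app[3])
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews  # replace, do not add
    elif name not in reviews_max:
        reviews_max[name] = n_reviews

print(reviews_max)  # {'Instagram': 250.0, 'Slack': 40.0}
```

If the header has already been separated from the data earlier in the notebook, the `[1:]` slice should be dropped so that the first app is not skipped.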