musclehub_submission's Issues

Executes correctly, but want to point out a technicality

df['AB_Test_Group'] = df['Fitness_Test'].apply(lambda x: 'B' if x == None else 'A')

It's good practice when checking for None to use is or pd.isnull() instead of ==, and is not or pd.notnull() instead of !=.

The reason is that None is a singleton object, so the reliable check is identity (is) rather than equality (==). A class can override == to return anything it likes, and missing values in pandas often show up as NaN, which is not equal to None (or even to itself). In plain Python, x == None usually gives the same answer as x is None, which is why your version works. You can read up on it here. Your code executes correctly as written, so no worries here.
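
To make the distinction concrete, here is a minimal sketch in plain Python. The AlwaysEqual class, the is_missing helper and the sample values are made up for illustration and are not from the submission. pandas is left out so the snippet runs anywhere; is_missing only approximates what pd.isnull() does for scalar values.

```python
import math


class AlwaysEqual:
    """Hypothetical class whose == claims equality with everything."""

    def __eq__(self, other):
        return True


def is_missing(value):
    """Rough stand-in for pd.isnull() on scalars: True for None or a float NaN."""
    return value is None or (isinstance(value, float) and math.isnan(value))


obj = AlwaysEqual()
equality_says_none = (obj == None)  # noqa: E711 - the overridden __eq__ answers True
identity_says_none = (obj is None)  # identity cannot be overridden

# Made-up column values: a date string, a real None, and a NaN
values = ['2017-07-03', None, float('nan')]
groups_with_eq = ['B' if v == None else 'A' for v in values]  # noqa: E711
groups_with_check = ['B' if is_missing(v) else 'A' for v in values]

print(equality_says_none, identity_says_none)
print(groups_with_eq)     # the NaN row is not caught by == None
print(groups_with_check)  # both missing rows are caught
```

The == version only misbehaves on the NaN row and on the contrived class, which is why it is easy to miss in a dataset where missing entries happen to be real None objects.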

Well written SQL query!

from
(select first_name, last_name, gender, email, visit_date
from visits
where visit_date >='7-1-17') a
left join
(select first_name, last_name, email, fitness_test_date
from fitness_tests) b
on a.first_name = b.first_name
and a.last_name = b.last_name
and a.email = b.email
left join
(select first_name, last_name, email, application_date
from applications) c
on a.first_name = c.first_name
and a.last_name = c.last_name
and a.email = c.email
left join
(select first_name, last_name, email, purchase_date
from purchases) d
on a.first_name = d.first_name
and a.last_name = d.last_name
and a.email = d.email;
''')

Very nice SQL query! You make good use of subqueries and aliases here.

Good use of logic to interpret p-vals

print('p-value: {:.6f} | {:.6f} - '.format(p_val, chi_p_val)),
if p_val > 0.05 or chi_p_val > 0.05:
    print('null hypothesis cannot be rejected.')
else:
    print('null hypothesis can be rejected!\nSignificant difference from untreated group (A).')

Smart use of if/else logic to interpret the results of your statistical tests.
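
The same decision rule can be wrapped in a small helper so it is reusable across tests. This is only a sketch: the interpret function name and the p-values passed to it are invented for illustration, and the 0.05 threshold is taken from the snippet above.

```python
ALPHA = 0.05  # significance threshold used in the reviewed snippet


def interpret(p_values, alpha=ALPHA):
    """Reject the null only if no test's p-value exceeds alpha."""
    if any(p > alpha for p in p_values):
        return 'null hypothesis cannot be rejected.'
    return 'null hypothesis can be rejected!'


# Made-up p-values, not results from the project
print(interpret([0.0012, 0.0009]))
print(interpret([0.0300, 0.2000]))
```

Requiring every test to pass is the conservative choice, and it mirrors the or in the original condition.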

Summary

AMAZING WORK!! Your code is impeccable and your presentation (plus the video!) communicates a firm understanding of the data and how it translates to this real-world problem.

All dataframes, graphs and statistical significance tests were constructed and run correctly in your code. Well done! You went above and beyond with the customization of your graphs. The final line graph and Sankey graph are brilliant ways of visualizing the funnel (I agree, much better than bar graphs). Plus, the inclusion of binomial tests and logic to interpret your p-value results demonstrates a thorough understanding of the statistical concepts.

In the slides and video, I appreciate the thorough descriptions of the problem and the steps you took to solve it. You really translated and condensed the process in your own words, demonstrating your complete understanding of the project. Your inclusion of actual individuals from the dataset as qualitative examples is a great touch. Including additional statistics like lift makes your argument more convincing and would be much appreciated by a client.
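
For readers unfamiliar with lift, here is a rough sketch of the arithmetic, assuming lift means the relative change in conversion rate of one group over a baseline group. The relative_lift function and all counts below are hypothetical and are not the project's numbers.

```python
def relative_lift(baseline_conversions, baseline_total,
                  variant_conversions, variant_total):
    """Relative change in conversion rate of the variant over the baseline."""
    baseline_rate = baseline_conversions / baseline_total
    variant_rate = variant_conversions / variant_total
    return (variant_rate - baseline_rate) / baseline_rate


# Hypothetical counts: group A (fitness test) as baseline, group B as variant
lift = relative_lift(250, 2500, 325, 2500)
print('Lift of B over A: {:.1%}'.format(lift))
```

A client usually finds "B converts 30% better than A" easier to act on than a p-value alone, which is why the two are worth reporting together.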

All in all, fantastic job! To be honest, this is one of the best and most thorough projects I have reviewed. I like your passion! You have a great analytical future ahead of you!

Great customization

ax = plt.subplot()
plt.bar(
    range(app_pivot.shape[0]),
    app_pivot['Percent with Application']*100,
    yerr = None,
    capsize=5,
    color = tableau20[5],
    alpha = 1)
plt.title('Percentage of Visitors Who Apply', fontsize=16, y = 1.1)
ax.set_xticks(range(app_pivot.shape[0]))
ax.set_yticks(range(0,int(app_pivot['Percent with Application'].max()*100 + 10),5))
ax.set_xticklabels(['Fitness Test', 'No Fitness Test'])
ax.set_ylabel('Applied (%)', fontsize=14)
ax.set_xlabel('Test Group', fontsize=14)
ax.get_xaxis().tick_bottom()
ax.get_yaxis().tick_left()
# Hide the top and right spines for a cleaner look
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)
plt.xticks(fontsize=14, rotation=0)
plt.yticks(fontsize=14)
plt.show()

Great work customizing your graphs. You have clearly gone through the documentation and are familiar with the options available to alter your graphs. My favorites here are the removal of the right and top spines, the color of your bars and the adjusted font size.

You know what you're doing

tableau20 = [(31, 119, 180), (174, 199, 232), (255, 127, 14), (255, 187, 120),
             (44, 160, 44), (152, 223, 138), (214, 39, 40), (255, 152, 150),
             (148, 103, 189), (197, 176, 213), (140, 86, 75), (196, 156, 148),
             (227, 119, 194), (247, 182, 210), (127, 127, 127), (199, 199, 199),
             (188, 189, 34), (219, 219, 141), (23, 190, 207), (158, 218, 229)]
# Scale the 0-255 RGB values to the 0-1 range matplotlib expects
for i in range(len(tableau20)):
    r, g, b = tableau20[i]
    tableau20[i] = (r / 255., g / 255., b / 255.)
angle = 0
plt.figure(dpi=300, facecolor='w')
_, texts = plt.pie(ab_counts['Counts'],
                   labels=['A: {:.1f}%'.format(
                               ab_counts.loc[0,'Counts'] * 100.0 / ab_counts['Counts'].sum() ),
                           'B: {:.1f}%'.format(
                               ab_counts.loc[1,'Counts'] * 100.0 / ab_counts['Counts'].sum() )
                           ],
                   labeldistance = 0.88,
                   colors = tableau20[::4],
                   wedgeprops = { 'linewidth' : 7, 'edgecolor' : 'white' },
                   startangle = angle
                   )
for text in texts:
    text.set_color('white')
    text.set_fontweight('bold')
    text.set_rotation(angle)
    text.set_horizontalalignment('center')
plt.axis('equal')
p = plt.gcf()
# A white circle over the centre turns the pie into a donut chart
p.gca().add_artist(plt.Circle( (0,0), 0.8, color='white'))
plt.title('Members of A/B Categories', y = 0.462, fontweight='bold', fontsize = 12)
plt.subplots_adjust(right=0.9)
plt.tight_layout()
# Save before show(): with some backends the figure is empty once it has been shown
plt.savefig('ab_test_pie_chart.png', dpi=600)
plt.show()
plt.close()

Very advanced! You clearly know how to make customized charts beyond what our course teaches. Great work!

Wow

a = [
app_pivot[app_pivot['AB_Test_Group'] == 'A']['Total'].values[0],
app_pivot[app_pivot['AB_Test_Group'] == 'A']['Application'].values[0],
final_member_pivot[final_member_pivot['AB_Test_Group'] == 'A']['Member'].values[0],
]
b = [
app_pivot[app_pivot['AB_Test_Group'] == 'B']['Total'].values[0],
app_pivot[app_pivot['AB_Test_Group'] == 'B']['Application'].values[0],
final_member_pivot[final_member_pivot['AB_Test_Group'] == 'B']['Member'].values[0],
]
a,b
# In[31]:
ax = plt.subplot()
plt.plot(
range(len(a)),
a,
color = tableau20[0],
alpha = 1,
marker = 'o',
label = 'Fitness Test')
plt.plot(
range(len(b)),
b,
color = tableau20[4],
alpha = 1,
linestyle= '--',
marker = 'o',
label = 'No Fitness Test')
plt.title('Visitors in Each Test Group', fontsize=16, y = 1.1)
ax.set_xticks(range(len(a)))
ax.set_yticks(range(0,int(max(max(a,b))*1.10),int(round(max(max(a,b))*.1,-2))))
ax.set_xticklabels(['Visitors in Group','Applications', 'Members'])
ax.set_ylabel('Number', fontsize=14)
ax.set_xlabel('Step in Funnel', fontsize=14, fontweight = 'bold')
ax.get_xaxis().tick_bottom()
ax.get_yaxis().tick_left()
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)
plt.xticks(fontsize=14, rotation=0)
plt.yticks(fontsize=14)
plt.legend()
plt.show()
# In[36]:
a_flow = [app_pivot[app_pivot['AB_Test_Group'] == 'A']['Total'].values[0],
-app_pivot[app_pivot['AB_Test_Group'] == 'A']['No Application'].values[0],
-member_pivot[member_pivot['AB_Test_Group'] == 'A']['Not Member'].values[0],
-final_member_pivot[final_member_pivot['AB_Test_Group'] == 'A']\
['Member'].values[0]]
#a_flow = [float(x) * 100 / a_flow[0] for x in a_flow]
b_flow = [app_pivot[app_pivot['AB_Test_Group'] == 'B']['Total'].values[0],
-member_pivot[member_pivot['AB_Test_Group'] == 'B']['Not Member'].values[0],
-app_pivot[app_pivot['AB_Test_Group'] == 'B']['No Application'].values[0],
-final_member_pivot[final_member_pivot['AB_Test_Group'] == 'B']\
['Member'].values[0]]
#b_flow = [float(x) * 100/ b_flow[0] for x in b_flow]
a_flow, b_flow
# In[34]:
from matplotlib.sankey import Sankey
# In[42]:
labels = ['', 'Did Not Apply', 'Applied but\nDid Not\nBecome Member' , 'Became\nMember']
labels2 = ['', 'Applied but\nDid Not\nBecome Member', 'Did Not Apply', 'Became\nMember']
plt.close()
fig = plt.figure(figsize=(7,7.2), dpi=600, frameon=False)
ax = fig.add_subplot(2, 1, 1, xticks=[], yticks=[], frameon=False,)
ax.set_title("Flow Diagram of MuscleHub Visitors",y = 1.1, fontweight='bold')
ax2 = fig.add_subplot(2,1,2, xticks=[], yticks=[], frameon=False,)
sankey = Sankey(ax=ax, scale=0.0005, offset=0.65, head_angle=120,
format = '%d', unit=' visitors', gap=1,
radius=0.05, shoulder=0.05,)
sankey.add(flows = a_flow,
labels = labels,
orientations= [0, 1, 1, 0],
pathlengths=[0, 1, 2.5, 1],
trunklength=5.,
facecolor=tableau20[0],
label="Fitness Test"
).finish()
sankey2 = Sankey(ax=ax2, scale=0.0005, offset=0.65, head_angle=120,
format = '%d', unit=' visitors', gap=1,
radius=0.04, shoulder=0.05,)
sankey2.add(flows = b_flow,
labels = labels2,
orientations= [0, -1, -1, 0],
pathlengths=[0, 2.5, 1, 1],
trunklength=5.,
facecolor=tableau20[4],
label="No Fitness Test"
).finish()
ax.text(0.25, 0.25, 'Fitness Test', horizontalalignment='center',
verticalalignment='center', transform=ax.transAxes, fontweight='bold')
ax2.text(0.25, -0.3, 'No Fitness Test', horizontalalignment='center',
verticalalignment='center', transform=ax.transAxes, fontweight='bold')
plt.tight_layout()
plt.show()

Just wow

Additional statistical test and inputs from tables

p_val = binom_test(x = [app_pivot.loc[1,'Application'], app_pivot.loc[1,'No Application']],
p = app_pivot.loc[0,'Percent with Application'], alternative='greater')
chi_p_val= chi2_contingency([app_pivot.loc[0,['Application','No Application']],
app_pivot.loc[1,['Application','No Application']]])[1]

I appreciate that you generate your input values and contingency tables directly from the app_pivot data rather than inputting manually. Plus, bonus points for including the additional binomial test.
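
For anyone curious what the one-sided binomial test computes, here is a standard-library sketch of the exact upper-tail probability, fed from a table rather than typed-in numbers. The binom_p_greater function, the pivot dictionary and its counts are hypothetical stand-ins for app_pivot, not the project's data. Note also that newer SciPy releases expose this test as scipy.stats.binomtest rather than binom_test.

```python
from math import comb


def binom_p_greater(successes, trials, p_null):
    """Exact P(X >= successes) for X ~ Binomial(trials, p_null)."""
    return sum(
        comb(trials, k) * p_null ** k * (1 - p_null) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Hypothetical pivot rows (not the project's numbers), keyed like app_pivot
pivot = {
    'A': {'Application': 10, 'No Application': 90},
    'B': {'Application': 18, 'No Application': 82},
}

# Derive every input from the table instead of hard-coding it
rate_a = pivot['A']['Application'] / sum(pivot['A'].values())
trials_b = sum(pivot['B'].values())
p_val = binom_p_greater(pivot['B']['Application'], trials_b, rate_a)
print('p-value: {:.6f}'.format(p_val))
```

Because the inputs are read from the table, rerunning the analysis on refreshed data updates the test automatically.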

Nice summary statements

print('Rows, Columns: {}'.format(df.shape))
print('Columns: {}'.format(', '.join(df.columns)))

Did we ever cover string replacement fields with .format()? Anyway, good work! str.format() is a clean, modern way to pass data into strings before printing them to the terminal, and it is preferred over the older %-style formatting. You are clearly bringing in outside knowledge or did some good research!
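
A short sketch of the replacement-field syntax, for reference. The shape tuple and column list are made-up stand-ins for df.shape and df.columns. The last line shows an f-string, which is available on Python 3.6 and later and is not used in the submission.

```python
shape = (5004, 9)  # made-up (rows, columns) pair, like df.shape
columns = ['first_name', 'last_name', 'email']  # made-up subset of columns

line1 = 'Rows, Columns: {}'.format(shape)          # {} takes the next argument
line2 = 'Columns: {}'.format(', '.join(columns))   # join the names first
line3 = 'p-value: {:.6f}'.format(0.00123456)       # six decimal places
line4 = f'Rows: {shape[0]:,}'                      # f-string, thousands separator

for line in (line1, line2, line3, line4):
    print(line)
```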
