The musclehub_submission from kaumaron

Executes correctly, but want to point out a technicality

Line 177 in dafbc96

    
    df['AB_Test_Group'] = df['Fitness_Test'].apply(lambda x: 'B' if x == None else 'A')

It's good practice when working with None to use is or pd.isnull() instead of ==, and is not or pd.notnull() instead of !=.

The reason for this is that None is a singleton object representing the absence of a value, so the idiomatic check is an identity test (is None) rather than an equality test. Python still evaluates x == None to True when x is None, which is why your code works, but == can be overridden by a class's __eq__ method, and it does not catch NaN, which pandas often uses for missing data. You can read up on it here. Your code executes correctly as written, so no worries here.
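As a minimal sketch of the difference, the snippet below labels a few sample values both ways. The helper name label_group and the sample values are assumptions made for illustration; only the 'A'/'B' labelling rule comes from the submission.

```python
import pandas as pd


def label_group(fitness_test):
    """Return 'B' when no fitness test is recorded, otherwise 'A'."""
    # pd.isnull() is True for both None and NaN.
    return 'B' if pd.isnull(fitness_test) else 'A'


# Hypothetical sample values: a date string, None, and NaN.
values = ['2017-07-03', None, float('nan')]

# Equality check, as in the submission: NaN is not equal to None,
# so a NaN entry would be labelled 'A'.
with_eq = ['B' if x == None else 'A' for x in values]

# Null-aware check: both None and NaN are labelled 'B'.
with_isnull = [label_group(x) for x in values]

print(with_eq)
print(with_isnull)
```

On a DataFrame column the same helper could be passed to apply, for example df['Fitness_Test'].apply(label_group), assuming the column holds the fitness test dates.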

Well written SQL query!

musclehub_submission/musclehub.py

Lines 120 to 152 in dafbc96

    
    from
    (select first_name, last_name, gender, email, visit_date
    from visits
    where visit_date >='7-1-17') a
    left join
    (select first_name, last_name, email, fitness_test_date
    from fitness_tests) b
    on a.first_name = b.first_name
        and a.last_name = b.last_name
        and a.email = b.email
    left join
    (select first_name, last_name, email, application_date
    from applications) c
    on a.first_name = c.first_name
        and a.last_name = c.last_name
        and a.email = c.email
    left join
    (select first_name, last_name, email, purchase_date
    from purchases) d
    on a.first_name = d.first_name
        and a.last_name = d.last_name
        and a.email = d.email;
    ''')

Very nice SQL query! You make good use of subqueries and aliases here.
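To illustrate why the left joins matter here, the following sketch runs a reduced version of the pattern against an in-memory database. It assumes Python's standard sqlite3 module and two made-up visitors; the table and column names follow the query above, but the rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript('''
    create table visits (first_name text, last_name text, email text);
    create table fitness_tests (first_name text, last_name text, email text,
                                fitness_test_date text);
    -- Hypothetical rows, not taken from the MuscleHub data.
    insert into visits values ('Kim', 'Walter', 'kw@example.com');
    insert into visits values ('Tom', 'Webster', 'tw@example.com');
    insert into fitness_tests values ('Kim', 'Walter', 'kw@example.com', '7-3-17');
''')

# Aliased subqueries joined on name and email. The left join keeps every
# visitor, filling fitness_test_date with NULL when there is no match.
rows = conn.execute('''
    select a.first_name, a.last_name, b.fitness_test_date
    from
    (select first_name, last_name, email from visits) a
    left join
    (select first_name, last_name, email, fitness_test_date
    from fitness_tests) b
    on a.first_name = b.first_name
        and a.last_name = b.last_name
        and a.email = b.email
    order by a.first_name
''').fetchall()
conn.close()

for row in rows:
    print(row)
```

The visitor without a fitness test comes back with None in the last column, which is the value the A/B labelling step later checks for.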

Much appreciated!

musclehub_submission/musclehub.py

Line 168 in dafbc96

get_ipython().magic('matplotlib inline')

Good use of the "magic line" to generate graphs within the notebook. Another way of executing this is: %matplotlib inline

Good use of logic to interpret p-vals

musclehub_submission/musclehub.py

Lines 324 to 328 in dafbc96

    
    print('p-value: {:.6f} | {:.6f} - '.format(p_val, chi_p_val)),
    if p_val > 0.05 or chi_p_val > 0.05:
        print('null hypothesis cannot be rejected.')
    else:
        print('null hypothesis can be rejected!\nSignficant difference from untreated group (A).')

Smart use of if/else logic to interpret the results of your statistical tests.
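The decision rule in this snippet can be sketched as a small reusable helper. The function name interpret_p_values and the alpha parameter are illustrative assumptions, not part of the submission; the 0.05 threshold and the messages follow the code above.

```python
def interpret_p_values(p_values, alpha=0.05):
    """Reject the null hypothesis only if every test is significant."""
    # Mirrors the `or` in the submission: a single p-value above the
    # threshold is enough to keep the null hypothesis.
    if any(p > alpha for p in p_values):
        return 'null hypothesis cannot be rejected.'
    return 'null hypothesis can be rejected!'


# Hypothetical p-values for illustration only.
print(interpret_p_values([0.01, 0.20]))
print(interpret_p_values([0.001, 0.002]))
```

Requiring both tests to agree is the conservative choice, since it avoids claiming significance when the two tests disagree.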

Summary

AMAZING WORK!! Your code is impeccable and your presentation (plus the video!) communicates a firm understanding of the data and how it translates to this real-world problem.

All dataframes, graphs and statistical significance tests were constructed and run correctly in your code. Well done! You went above and beyond in customizing your graphs. The final line graph and Sankey graph are brilliant ways of visualizing the funnel (I agree, much better than bar graphs). Plus, the inclusion of binomial tests and logic to interpret your p-value results demonstrates a thorough understanding of the statistical concepts.

In the slides and video, I appreciate the thorough descriptions of the problem and the steps you took to solve it. You really translated and condensed the process in your own words, demonstrating your complete understanding of the project. Your inclusion of actual individuals from the dataset under qualitative examples is a great touch. Including additional statistics like lift makes your argument more convincing and would be much appreciated by a client.

All in all, fantastic job! To be honest, this is one of the best and most thorough projects I have reviewed. I like your passion! You have a great analytical future ahead of you!

Great customization

musclehub_submission/musclehub.py

Lines 454 to 474 in dafbc96

    
    ax = plt.subplot()
    plt.bar(
        range(app_pivot.shape[0]),
        app_pivot['Percent with Application']*100,
        yerr = None,
        capsize=5,
        color = tableau20[5],
        alpha = 1)
    plt.title('Percentage of Visitors Who Apply', fontsize=16, y = 1.1)
    ax.set_xticks(range(app_pivot.shape[0]))
    ax.set_yticks(range(0,int(app_pivot['Percent with Application'].max()*100 + 10),5))
    ax.set_xticklabels(['Fitness Test', 'No Fitness Test'])
    ax.set_ylabel('Applied (%)', fontsize=14)
    ax.set_xlabel('Test Group', fontsize=14)
    ax.get_xaxis().tick_bottom()
    ax.get_yaxis().tick_left()
    ax.spines["top"].set_visible(False)
    ax.spines["right"].set_visible(False)
    plt.xticks(fontsize=14, rotation=0)
    plt.yticks(fontsize=14)
    plt.show

Great work customizing your graphs. You have clearly gone through the documentation and are familiar with the options available to alter your graphs. My favorites here are the removal of the right and top spines, the color of your bars and the adjusted font sizes.

Good sanity check!

musclehub_submission/musclehub.py

Line 37 in dafbc96

df.head()

Good job printing to the terminal as a sanity check and to make sure you understand the data!

You know what you're doing

musclehub_submission/musclehub.py

Lines 203 to 242 in dafbc96

    
    tableau20 = [(31, 119, 180), (174, 199, 232), (255, 127, 14), (255, 187, 120),
                (44, 160, 44), (152, 223, 138), (214, 39, 40), (255, 152, 150),
                (148, 103, 189), (197, 176, 213), (140, 86, 75), (196, 156, 148),
                (227, 119, 194), (247, 182, 210), (127, 127, 127), (199, 199, 199),
                (188, 189, 34), (219, 219, 141), (23, 190, 207), (158, 218, 229)]
    for i in range(len(tableau20)):
       r, g, b = tableau20[i]
       tableau20[i] = (r / 255., g / 255., b / 255.)
    angle = 0
    plt.figure(dpi=300, facecolor='w')
    _, texts = plt.pie(ab_counts['Counts'],
           labels=['A: {:.1f}%'.format(
                       ab_counts.loc[0,'Counts'] * 100.0 / ab_counts['Counts'].sum() ),
                   'B: {:.1f}%'.format(
                       ab_counts.loc[1,'Counts'] * 100.0 / ab_counts['Counts'].sum() )
                  ],
           labeldistance = 0.88,
           colors = tableau20[::4],
           wedgeprops = { 'linewidth' : 7, 'edgecolor' : 'white' },
           startangle = angle
          )
    for text in texts:
       text.set_color('white')
       text.set_fontweight('bold')
       text.set_rotation(angle)
       text.set_horizontalalignment('center')
    plt.axis('equal')
    p = plt.gcf()
    p.gca().add_artist(plt.Circle( (0,0), 0.8, color='white'))
    plt.title('Members of A/B Categories', y = 0.462, fontweight='bold', fontsize = 12)
    plt.subplots_adjust(right=0.9)
    plt.tight_layout()
    plt.show()
    plt.savefig('ab_test_pie_chart.png', dpi=600)
    plt.close()

Very advanced! You clearly know how to make customized charts beyond what our course teaches. Great work!

Wow

musclehub_submission/musclehub.py

Lines 531 to 647 in dafbc96

    
    a = [
        app_pivot[app_pivot['AB_Test_Group'] == 'A']['Total'].values[0],
        app_pivot[app_pivot['AB_Test_Group'] == 'A']['Application'].values[0],
        final_member_pivot[final_member_pivot['AB_Test_Group'] == 'A']['Member'].values[0],
    ]
    b = [
        app_pivot[app_pivot['AB_Test_Group'] == 'B']['Total'].values[0],
        app_pivot[app_pivot['AB_Test_Group'] == 'B']['Application'].values[0],
        final_member_pivot[final_member_pivot['AB_Test_Group'] == 'B']['Member'].values[0],
    ]
    a,b

    # In[31]:

    ax = plt.subplot()
    plt.plot(
        range(len(a)),
        a,
        color = tableau20[0],
        alpha = 1,
        marker = 'o',
        label = 'Fitness Test')
    plt.plot(
        range(len(b)),
        b,
        color = tableau20[4],
        alpha = 1,
        linestyle= '--',
        marker = 'o',
        label = 'No Fitness Test')
    plt.title('Visitors in Each Test Group', fontsize=16, y = 1.1)
    ax.set_xticks(range(len(a)))
    ax.set_yticks(range(0,int(max(max(a,b))*1.10),int(round(max(max(a,b))*.1,-2))))
    ax.set_xticklabels(['Vistors in Group','Applications', 'Members'])
    ax.set_ylabel('Number', fontsize=14)
    ax.set_xlabel('Step in Funnel', fontsize=14, fontweight = 'bold')
    ax.get_xaxis().tick_bottom()
    ax.get_yaxis().tick_left()
    ax.spines["top"].set_visible(False)
    ax.spines["right"].set_visible(False)
    plt.xticks(fontsize=14, rotation=0)
    plt.yticks(fontsize=14)
    plt.legend()
    plt.show

    # In[36]:

    a_flow = [app_pivot[app_pivot['AB_Test_Group'] == 'A']['Total'].values[0],
             -app_pivot[app_pivot['AB_Test_Group'] == 'A']['No Application'].values[0],
             -member_pivot[member_pivot['AB_Test_Group'] == 'A']['Not Member'].values[0],
             -final_member_pivot[final_member_pivot['AB_Test_Group'] == 'A']\
              ['Member'].values[0]]
    #a_flow = [float(x) * 100 / a_flow[0] for x  in a_flow]
    b_flow = [app_pivot[app_pivot['AB_Test_Group'] == 'B']['Total'].values[0],
             -member_pivot[member_pivot['AB_Test_Group'] == 'B']['Not Member'].values[0],
             -app_pivot[app_pivot['AB_Test_Group'] == 'B']['No Application'].values[0],
             -final_member_pivot[final_member_pivot['AB_Test_Group'] == 'B']\
              ['Member'].values[0]]
    #b_flow = [float(x) * 100/ b_flow[0] for x  in b_flow]
    a_flow, b_flow

    # In[34]:

    from matplotlib.sankey import Sankey

    # In[42]:

    labels = ['', 'Did Not Apply', 'Applied but\nDid Not\nBecome Member' , 'Became\nMember']
    labels2 = ['', 'Applied but\nDid Not\nBecome Member', 'Did Not Apply', 'Became\nMember']
    plt.close()
    fig = plt.figure(figsize=(7,7.2), dpi=600, frameon=False)
    ax = fig.add_subplot(2, 1, 1, xticks=[], yticks=[], frameon=False,)
    ax.set_title("Flow Diagram of MuscleHub Visitors",y = 1.1, fontweight='bold')
    ax2 = fig.add_subplot(2,1,2, xticks=[], yticks=[], frameon=False,)
    sankey = Sankey(ax=ax, scale=0.0005, offset=0.65, head_angle=120,
                    format = '%d', unit=' visitors', gap=1,
                    radius=0.05, shoulder=0.05,)
    sankey.add(flows = a_flow,
                labels = labels,
                orientations= [0, 1, 1, 0],
                pathlengths=[0, 1, 2.5, 1],
                trunklength=5.,
                facecolor=tableau20[0],
                label="Fitness Test"
              ).finish()
    sankey2 = Sankey(ax=ax2, scale=0.0005, offset=0.65, head_angle=120,
                    format = '%d', unit=' visitors', gap=1,
                    radius=0.04, shoulder=0.05,)
    sankey2.add(flows = b_flow,
                labels = labels2,
                orientations= [0, -1, -1, 0],
                pathlengths=[0, 2.5, 1, 1],
                trunklength=5.,
                facecolor=tableau20[4],
                label="No Fitness Test"
              ).finish()
    ax.text(0.25, 0.25, 'Fitness Test', horizontalalignment='center',
            verticalalignment='center', transform=ax.transAxes, fontweight='bold')
    ax2.text(0.25, -0.3, 'No Fitness Test', horizontalalignment='center',
            verticalalignment='center', transform=ax.transAxes, fontweight='bold')
    plt.tight_layout()
    plt.show()

Just wow

Getting more detail, I like it!

musclehub_submission/musclehub.py

Lines 113 to 114 in dafbc96

    
    a.gender as Gender,
    a.email as email,

You're pulling in additional columns that we did not specify. I wonder what you're going to use them for! Excited to see your analysis!

Additional statistical test and inputs from tables

musclehub_submission/musclehub.py

Lines 316 to 321 in dafbc96

    
    p_val = binom_test(x = [app_pivot.loc[1,'Application'], app_pivot.loc[1,'No Application']],
               p = app_pivot.loc[0,'Percent with Application'], alternative='greater')
    chi_p_val= chi2_contingency([app_pivot.loc[0,['Application','No Application']],
                     app_pivot.loc[1,['Application','No Application']]])[1]

I appreciate that you generate your input values and contingency tables directly from the app_pivot data rather than entering them manually. Plus, bonus points for including the additional binomial test.
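For readers who want to try the same idea, here is a rough sketch that builds both tests from a pivot-style table. It assumes pandas and a SciPy version that provides binomtest (newer SciPy releases use it in place of binom_test); the counts are hypothetical and are not the MuscleHub results.

```python
import pandas as pd
from scipy.stats import binomtest, chi2_contingency

# Hypothetical counts shaped like the app_pivot table in the submission.
app_pivot = pd.DataFrame({
    'AB_Test_Group': ['A', 'B'],
    'Application': [250, 325],
    'No Application': [2250, 2175],
})
app_pivot['Total'] = app_pivot['Application'] + app_pivot['No Application']
app_pivot['Percent with Application'] = (
    app_pivot['Application'] / app_pivot['Total'])

# Chi-square test on the 2x2 contingency table taken from the pivot.
table = app_pivot[['Application', 'No Application']].values
chi_p_val = chi2_contingency(table)[1]

# One-sided binomial test: is group B's application rate greater than
# the rate observed in group A?
p_val = binomtest(
    k=int(app_pivot.loc[1, 'Application']),
    n=int(app_pivot.loc[1, 'Total']),
    p=float(app_pivot.loc[0, 'Percent with Application']),
    alternative='greater',
).pvalue

print('p-value: {:.6f} | {:.6f}'.format(p_val, chi_p_val))
```

Reading the inputs from the table keeps the tests in sync with the data if the pivot is ever regenerated.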

Nice summary statements

musclehub_submission/musclehub.py

Lines 154 to 155 in dafbc96

    
    print('Rows, Columns: {}'.format(df.shape))
    print('Columns: {}'.format(', '.join(df.columns)))

Did we ever cover string replacement with .format()? Either way, good work! This is a modern way to pass data into strings before printing them to the terminal. You are clearly bringing in outside knowledge or did some good research!
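As a short illustration of the technique, the sketch below formats a few values with str.format() and, for comparison, with an f-string (available from Python 3.6). The shape, column names and p-value are made-up stand-ins for df.shape, df.columns and a test result.

```python
# Hypothetical stand-ins for df.shape, df.columns and a p-value.
shape = (5004, 8)
columns = ['first_name', 'last_name', 'email']
p_val = 0.000123

# Positional replacement fields, as in the submission.
rows_line = 'Rows, Columns: {}'.format(shape)
cols_line = 'Columns: {}'.format(', '.join(columns))

# A format spec after the colon controls the number of decimals.
p_line = 'p-value: {:.6f}'.format(p_val)

# The same format spec works inside an f-string.
p_line_fstring = f'p-value: {p_val:.6f}'

print(rows_line)
print(cols_line)
print(p_line)
```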
