Different Types of Join Example for Practice.
This is practice notebook you can fork star and use this notebook for your use. For More Read this Article: https://www.analyticsvidhya.com/blog/2020/02/joins-in-pandas-master-the-different-types-of-joins-in-python/
product = pd .DataFrame ({
'Product_ID' :[101 ,102 ,103 ,104 ,105 ,106 ,107 ],
'Product_name' :['Watch' ,'Bag' ,'Shoes' ,'Smartphone' ,'Books' ,'Oil' ,'Laptop' ],
'Category' :['Fashion' ,'Fashion' ,'Fashion' ,'Electronics' ,'Study' ,'Grocery' ,'Electronics' ],
'Price' :[299.0 ,1350.50 ,2999.0 ,14999.0 ,145.0 ,110.0 ,79999.0 ],
'Seller_City' :['Delhi' ,'Mumbai' ,'Chennai' ,'Kolkata' ,'Delhi' ,'Chennai' ,'Bengalore' ]
})
Product_ID
Product_name
Category
Price
Seller_City
0
101
Watch
Fashion
299.0
Delhi
1
102
Bag
Fashion
1350.5
Mumbai
2
103
Shoes
Fashion
2999.0
Chennai
3
104
Smartphone
Electronics
14999.0
Kolkata
4
105
Books
Study
145.0
Delhi
5
106
Oil
Grocery
110.0
Chennai
6
107
Laptop
Electronics
79999.0
Bengalore
Inner join is the most common type of join you’ll be working with. It returns a dataframe with only those rows that have common characteristics.
customer = pd .DataFrame ({
'id' :[1 ,2 ,3 ,4 ,5 ,6 ,7 ,8 ,9 ],
'name' :['Olivia' ,'Aditya' ,'Cory' ,'Isabell' ,'Dominic' ,'Tyler' ,'Samuel' ,'Daniel' ,'Jeremy' ],
'age' :[20 ,25 ,15 ,10 ,30 ,65 ,35 ,18 ,23 ],
'Product_ID' :[101 ,0 ,106 ,0 ,103 ,104 ,0 ,0 ,107 ],
'Purchased_Product' :['Watch' ,'NA' ,'Oil' ,'NA' ,'Shoes' ,'Smartphone' ,'NA' ,'NA' ,'Laptop' ],
'City' :['Mumbai' ,'Delhi' ,'Bangalore' ,'Chennai' ,'Chennai' ,'Delhi' ,'Kolkata' ,'Delhi' ,'Mumbai' ]
})
customer
id
name
age
Product_ID
Purchased_Product
City
0
1
Olivia
20
101
Watch
Mumbai
1
2
Aditya
25
0
NA
Delhi
2
3
Cory
15
106
Oil
Bangalore
3
4
Isabell
10
0
NA
Chennai
4
5
Dominic
30
103
Shoes
Chennai
5
6
Tyler
65
104
Smartphone
Delhi
6
7
Samuel
35
0
NA
Kolkata
7
8
Daniel
18
0
NA
Delhi
8
9
Jeremy
23
107
Laptop
Mumbai
pd .merge (product ,customer ,on = 'Product_ID' )
Product_ID
Product_name
Category
Price
Seller_City
id
name
age
Purchased_Product
City
0
101
Watch
Fashion
299.0
Delhi
1
Olivia
20
Watch
Mumbai
1
103
Shoes
Fashion
2999.0
Chennai
5
Dominic
30
Shoes
Chennai
2
104
Smartphone
Electronics
14999.0
Kolkata
6
Tyler
65
Smartphone
Delhi
3
106
Oil
Grocery
110.0
Chennai
3
Cory
15
Oil
Bangalore
4
107
Laptop
Electronics
79999.0
Bengalore
9
Jeremy
23
Laptop
Mumbai
pd .merge (product ,customer ,left_on = 'Product_name' ,right_on = 'Purchased_Product' )
Product_ID_x
Product_name
Category
Price
Seller_City
id
name
age
Product_ID_y
Purchased_Product
City
0
101
Watch
Fashion
299.0
Delhi
1
Olivia
20
101
Watch
Mumbai
1
103
Shoes
Fashion
2999.0
Chennai
5
Dominic
30
103
Shoes
Chennai
2
104
Smartphone
Electronics
14999.0
Kolkata
6
Tyler
65
104
Smartphone
Delhi
3
106
Oil
Grocery
110.0
Chennai
3
Cory
15
106
Oil
Bangalore
4
107
Laptop
Electronics
79999.0
Bengalore
9
Jeremy
23
107
Laptop
Mumbai
pd .merge (product ,customer ,how = 'inner' ,left_on = ['Product_ID' ,'Seller_City' ],right_on = ['Product_ID' ,'City' ])
Product_ID
Product_name
Category
Price
Seller_City
id
name
age
Purchased_Product
City
0
103
Shoes
Fashion
2999.0
Chennai
5
Dominic
30
Shoes
Chennai
Full Join, also known as Full Outer Join, returns all those records which either have a match in the left or right dataframe.
pd .merge (product ,customer ,on = 'Product_ID' ,how = 'outer' )
Product_ID
Product_name
Category
Price
Seller_City
id
name
age
Purchased_Product
City
0
101
Watch
Fashion
299.0
Delhi
1.0
Olivia
20.0
Watch
Mumbai
1
102
Bag
Fashion
1350.5
Mumbai
NaN
NaN
NaN
NaN
NaN
2
103
Shoes
Fashion
2999.0
Chennai
5.0
Dominic
30.0
Shoes
Chennai
3
104
Smartphone
Electronics
14999.0
Kolkata
6.0
Tyler
65.0
Smartphone
Delhi
4
105
Books
Study
145.0
Delhi
NaN
NaN
NaN
NaN
NaN
5
106
Oil
Grocery
110.0
Chennai
3.0
Cory
15.0
Oil
Bangalore
6
107
Laptop
Electronics
79999.0
Bengalore
9.0
Jeremy
23.0
Laptop
Mumbai
7
0
NaN
NaN
NaN
NaN
2.0
Aditya
25.0
NA
Delhi
8
0
NaN
NaN
NaN
NaN
4.0
Isabell
10.0
NA
Chennai
9
0
NaN
NaN
NaN
NaN
7.0
Samuel
35.0
NA
Kolkata
10
0
NaN
NaN
NaN
NaN
8.0
Daniel
18.0
NA
Delhi
pd .merge (product ,customer ,on = 'Product_ID' ,how = 'outer' ,indicator = True )
Product_ID
Product_name
Category
Price
Seller_City
id
name
age
Purchased_Product
City
_merge
0
101
Watch
Fashion
299.0
Delhi
1.0
Olivia
20.0
Watch
Mumbai
both
1
102
Bag
Fashion
1350.5
Mumbai
NaN
NaN
NaN
NaN
NaN
left_only
2
103
Shoes
Fashion
2999.0
Chennai
5.0
Dominic
30.0
Shoes
Chennai
both
3
104
Smartphone
Electronics
14999.0
Kolkata
6.0
Tyler
65.0
Smartphone
Delhi
both
4
105
Books
Study
145.0
Delhi
NaN
NaN
NaN
NaN
NaN
left_only
5
106
Oil
Grocery
110.0
Chennai
3.0
Cory
15.0
Oil
Bangalore
both
6
107
Laptop
Electronics
79999.0
Bengalore
9.0
Jeremy
23.0
Laptop
Mumbai
both
7
0
NaN
NaN
NaN
NaN
2.0
Aditya
25.0
NA
Delhi
right_only
8
0
NaN
NaN
NaN
NaN
4.0
Isabell
10.0
NA
Chennai
right_only
9
0
NaN
NaN
NaN
NaN
7.0
Samuel
35.0
NA
Kolkata
right_only
10
0
NaN
NaN
NaN
NaN
8.0
Daniel
18.0
NA
Delhi
right_only
Left join, also known as Left Outer Join, returns a dataframe containing all the rows of the left dataframe.
pd .merge (product ,customer ,on = 'Product_ID' ,how = 'left' , indicator = True )
Product_ID
Product_name
Category
Price
Seller_City
id
name
age
Purchased_Product
City
_merge
0
101
Watch
Fashion
299.0
Delhi
1.0
Olivia
20.0
Watch
Mumbai
both
1
102
Bag
Fashion
1350.5
Mumbai
NaN
NaN
NaN
NaN
NaN
left_only
2
103
Shoes
Fashion
2999.0
Chennai
5.0
Dominic
30.0
Shoes
Chennai
both
3
104
Smartphone
Electronics
14999.0
Kolkata
6.0
Tyler
65.0
Smartphone
Delhi
both
4
105
Books
Study
145.0
Delhi
NaN
NaN
NaN
NaN
NaN
left_only
5
106
Oil
Grocery
110.0
Chennai
3.0
Cory
15.0
Oil
Bangalore
both
6
107
Laptop
Electronics
79999.0
Bengalore
9.0
Jeremy
23.0
Laptop
Mumbai
both
Right join, also known as Right Outer Join, is similar to the Left Outer Join. The only difference is that all the rows of the right dataframe are taken as it is and only those of the left dataframe that are common in both.
pd .merge (product ,customer ,on = 'Product_ID' ,how = 'right' , indicator = True )
Product_ID
Product_name
Category
Price
Seller_City
id
name
age
Purchased_Product
City
_merge
0
101
Watch
Fashion
299.0
Delhi
1
Olivia
20
Watch
Mumbai
both
1
103
Shoes
Fashion
2999.0
Chennai
5
Dominic
30
Shoes
Chennai
both
2
104
Smartphone
Electronics
14999.0
Kolkata
6
Tyler
65
Smartphone
Delhi
both
3
106
Oil
Grocery
110.0
Chennai
3
Cory
15
Oil
Bangalore
both
4
107
Laptop
Electronics
79999.0
Bengalore
9
Jeremy
23
Laptop
Mumbai
both
5
0
NaN
NaN
NaN
NaN
2
Aditya
25
NA
Delhi
right_only
6
0
NaN
NaN
NaN
NaN
4
Isabell
10
NA
Chennai
right_only
7
0
NaN
NaN
NaN
NaN
7
Samuel
35
NA
Kolkata
right_only
8
0
NaN
NaN
NaN
NaN
8
Daniel
18
NA
Delhi
right_only
Handling Redundancy/Duplicates in Joins
product_dup = pd .DataFrame ({'Product_ID' :[101 ,102 ,103 ,104 ,105 ,106 ,107 ,103 ,107 ],
'Product_name' :['Watch' ,'Bag' ,'Shoes' ,'Smartphone' ,'Books' ,'Oil' ,'Laptop' ,'Shoes' ,'Laptop' ],
'Category' :['Fashion' ,'Fashion' ,'Fashion' ,'Electronics' ,'Study' ,'Grocery' ,'Electronics' ,'Fashion' ,'Electronics' ],
'Price' :[299.0 ,1350.50 ,2999.0 ,14999.0 ,145.0 ,110.0 ,79999.0 ,2999.0 ,79999.0 ],
'Seller_City' :['Delhi' ,'Mumbai' ,'Chennai' ,'Kolkata' ,'Delhi' ,'Chennai' ,'Bengalore' ,'Chennai' ,'Bengalore' ]})
Product_ID
Product_name
Category
Price
Seller_City
0
101
Watch
Fashion
299.0
Delhi
1
102
Bag
Fashion
1350.5
Mumbai
2
103
Shoes
Fashion
2999.0
Chennai
3
104
Smartphone
Electronics
14999.0
Kolkata
4
105
Books
Study
145.0
Delhi
5
106
Oil
Grocery
110.0
Chennai
6
107
Laptop
Electronics
79999.0
Bengalore
7
103
Shoes
Fashion
2999.0
Chennai
8
107
Laptop
Electronics
79999.0
Bengalore
pd .merge (product_dup ,customer ,how = 'inner' ,on = 'Product_ID' )
Product_ID
Product_name
Category
Price
Seller_City
id
name
age
Purchased_Product
City
0
101
Watch
Fashion
299.0
Delhi
1
Olivia
20
Watch
Mumbai
1
103
Shoes
Fashion
2999.0
Chennai
5
Dominic
30
Shoes
Chennai
2
103
Shoes
Fashion
2999.0
Chennai
5
Dominic
30
Shoes
Chennai
3
104
Smartphone
Electronics
14999.0
Kolkata
6
Tyler
65
Smartphone
Delhi
4
106
Oil
Grocery
110.0
Chennai
3
Cory
15
Oil
Bangalore
5
107
Laptop
Electronics
79999.0
Bengalore
9
Jeremy
23
Laptop
Mumbai
6
107
Laptop
Electronics
79999.0
Bengalore
9
Jeremy
23
Laptop
Mumbai
pd .merge (product_dup .drop_duplicates (),customer ,how = 'inner' ,on = 'Product_ID' )
Product_ID
Product_name
Category
Price
Seller_City
id
name
age
Purchased_Product
City
0
101
Watch
Fashion
299.0
Delhi
1
Olivia
20
Watch
Mumbai
1
103
Shoes
Fashion
2999.0
Chennai
5
Dominic
30
Shoes
Chennai
2
104
Smartphone
Electronics
14999.0
Kolkata
6
Tyler
65
Smartphone
Delhi
3
106
Oil
Grocery
110.0
Chennai
3
Cory
15
Oil
Bangalore
4
107
Laptop
Electronics
79999.0
Bengalore
9
Jeremy
23
Laptop
Mumbai
pd .merge (product_dup ,customer ,how = 'inner' ,on = 'Product_ID' ,validate = 'many_to_many' )
Product_ID
Product_name
Category
Price
Seller_City
id
name
age
Purchased_Product
City
0
101
Watch
Fashion
299.0
Delhi
1
Olivia
20
Watch
Mumbai
1
103
Shoes
Fashion
2999.0
Chennai
5
Dominic
30
Shoes
Chennai
2
103
Shoes
Fashion
2999.0
Chennai
5
Dominic
30
Shoes
Chennai
3
104
Smartphone
Electronics
14999.0
Kolkata
6
Tyler
65
Smartphone
Delhi
4
106
Oil
Grocery
110.0
Chennai
3
Cory
15
Oil
Bangalore
5
107
Laptop
Electronics
79999.0
Bengalore
9
Jeremy
23
Laptop
Mumbai
6
107
Laptop
Electronics
79999.0
Bengalore
9
Jeremy
23
Laptop
Mumbai