Giter Site home page Giter Site logo

muskanmahajan37 / pandas-joins Goto Github PK

View Code? Open in Web Editor NEW

This project forked from ashishpatel26/pandas-joins

0.0 0.0 0.0 20 KB

This is practice notebook you can fork star and use this notebook for your use. For More Read this Article: https://www.analyticsvidhya.com/blog/2020/02/joins-in-pandas-master-the-different-types-of-joins-in-python/

Jupyter Notebook 100.00%

pandas-joins's Introduction

Different Types of Join Example for Practice.

This is practice notebook you can fork star and use this notebook for your use. For More Read this Article: https://www.analyticsvidhya.com/blog/2020/02/joins-in-pandas-master-the-different-types-of-joins-in-python/

import pandas as pd
product=pd.DataFrame({
    'Product_ID':[101,102,103,104,105,106,107],
    'Product_name':['Watch','Bag','Shoes','Smartphone','Books','Oil','Laptop'],
    'Category':['Fashion','Fashion','Fashion','Electronics','Study','Grocery','Electronics'],
    'Price':[299.0,1350.50,2999.0,14999.0,145.0,110.0,79999.0],
    'Seller_City':['Delhi','Mumbai','Chennai','Kolkata','Delhi','Chennai','Bengalore']
})
product
Product_ID Product_name Category Price Seller_City
0 101 Watch Fashion 299.0 Delhi
1 102 Bag Fashion 1350.5 Mumbai
2 103 Shoes Fashion 2999.0 Chennai
3 104 Smartphone Electronics 14999.0 Kolkata
4 105 Books Study 145.0 Delhi
5 106 Oil Grocery 110.0 Chennai
6 107 Laptop Electronics 79999.0 Bengalore

Inner Join

Inner join is the most common type of join you’ll be working with. It returns a dataframe with only those rows that have common characteristics.

customer=pd.DataFrame({
    'id':[1,2,3,4,5,6,7,8,9],
    'name':['Olivia','Aditya','Cory','Isabell','Dominic','Tyler','Samuel','Daniel','Jeremy'],
    'age':[20,25,15,10,30,65,35,18,23],
    'Product_ID':[101,0,106,0,103,104,0,0,107],
    'Purchased_Product':['Watch','NA','Oil','NA','Shoes','Smartphone','NA','NA','Laptop'],
    'City':['Mumbai','Delhi','Bangalore','Chennai','Chennai','Delhi','Kolkata','Delhi','Mumbai']
})
customer
id name age Product_ID Purchased_Product City
0 1 Olivia 20 101 Watch Mumbai
1 2 Aditya 25 0 NA Delhi
2 3 Cory 15 106 Oil Bangalore
3 4 Isabell 10 0 NA Chennai
4 5 Dominic 30 103 Shoes Chennai
5 6 Tyler 65 104 Smartphone Delhi
6 7 Samuel 35 0 NA Kolkata
7 8 Daniel 18 0 NA Delhi
8 9 Jeremy 23 107 Laptop Mumbai
pd.merge(product,customer,on='Product_ID')
Product_ID Product_name Category Price Seller_City id name age Purchased_Product City
0 101 Watch Fashion 299.0 Delhi 1 Olivia 20 Watch Mumbai
1 103 Shoes Fashion 2999.0 Chennai 5 Dominic 30 Shoes Chennai
2 104 Smartphone Electronics 14999.0 Kolkata 6 Tyler 65 Smartphone Delhi
3 106 Oil Grocery 110.0 Chennai 3 Cory 15 Oil Bangalore
4 107 Laptop Electronics 79999.0 Bengalore 9 Jeremy 23 Laptop Mumbai
pd.merge(product,customer,left_on='Product_name',right_on='Purchased_Product')
Product_ID_x Product_name Category Price Seller_City id name age Product_ID_y Purchased_Product City
0 101 Watch Fashion 299.0 Delhi 1 Olivia 20 101 Watch Mumbai
1 103 Shoes Fashion 2999.0 Chennai 5 Dominic 30 103 Shoes Chennai
2 104 Smartphone Electronics 14999.0 Kolkata 6 Tyler 65 104 Smartphone Delhi
3 106 Oil Grocery 110.0 Chennai 3 Cory 15 106 Oil Bangalore
4 107 Laptop Electronics 79999.0 Bengalore 9 Jeremy 23 107 Laptop Mumbai
pd.merge(product,customer,how='inner',left_on=['Product_ID','Seller_City'],right_on=['Product_ID','City'])
Product_ID Product_name Category Price Seller_City id name age Purchased_Product City
0 103 Shoes Fashion 2999.0 Chennai 5 Dominic 30 Shoes Chennai

Full Join in Pandas

Full Join, also known as Full Outer Join, returns all those records which either have a match in the left or right dataframe.

pd.merge(product,customer,on='Product_ID',how='outer')
Product_ID Product_name Category Price Seller_City id name age Purchased_Product City
0 101 Watch Fashion 299.0 Delhi 1.0 Olivia 20.0 Watch Mumbai
1 102 Bag Fashion 1350.5 Mumbai NaN NaN NaN NaN NaN
2 103 Shoes Fashion 2999.0 Chennai 5.0 Dominic 30.0 Shoes Chennai
3 104 Smartphone Electronics 14999.0 Kolkata 6.0 Tyler 65.0 Smartphone Delhi
4 105 Books Study 145.0 Delhi NaN NaN NaN NaN NaN
5 106 Oil Grocery 110.0 Chennai 3.0 Cory 15.0 Oil Bangalore
6 107 Laptop Electronics 79999.0 Bengalore 9.0 Jeremy 23.0 Laptop Mumbai
7 0 NaN NaN NaN NaN 2.0 Aditya 25.0 NA Delhi
8 0 NaN NaN NaN NaN 4.0 Isabell 10.0 NA Chennai
9 0 NaN NaN NaN NaN 7.0 Samuel 35.0 NA Kolkata
10 0 NaN NaN NaN NaN 8.0 Daniel 18.0 NA Delhi
pd.merge(product,customer,on='Product_ID',how='outer',indicator=True)
Product_ID Product_name Category Price Seller_City id name age Purchased_Product City _merge
0 101 Watch Fashion 299.0 Delhi 1.0 Olivia 20.0 Watch Mumbai both
1 102 Bag Fashion 1350.5 Mumbai NaN NaN NaN NaN NaN left_only
2 103 Shoes Fashion 2999.0 Chennai 5.0 Dominic 30.0 Shoes Chennai both
3 104 Smartphone Electronics 14999.0 Kolkata 6.0 Tyler 65.0 Smartphone Delhi both
4 105 Books Study 145.0 Delhi NaN NaN NaN NaN NaN left_only
5 106 Oil Grocery 110.0 Chennai 3.0 Cory 15.0 Oil Bangalore both
6 107 Laptop Electronics 79999.0 Bengalore 9.0 Jeremy 23.0 Laptop Mumbai both
7 0 NaN NaN NaN NaN 2.0 Aditya 25.0 NA Delhi right_only
8 0 NaN NaN NaN NaN 4.0 Isabell 10.0 NA Chennai right_only
9 0 NaN NaN NaN NaN 7.0 Samuel 35.0 NA Kolkata right_only
10 0 NaN NaN NaN NaN 8.0 Daniel 18.0 NA Delhi right_only

Left Join in Pandas

Left join, also known as Left Outer Join, returns a dataframe containing all the rows of the left dataframe.

pd.merge(product,customer,on='Product_ID',how='left', indicator=True)
Product_ID Product_name Category Price Seller_City id name age Purchased_Product City _merge
0 101 Watch Fashion 299.0 Delhi 1.0 Olivia 20.0 Watch Mumbai both
1 102 Bag Fashion 1350.5 Mumbai NaN NaN NaN NaN NaN left_only
2 103 Shoes Fashion 2999.0 Chennai 5.0 Dominic 30.0 Shoes Chennai both
3 104 Smartphone Electronics 14999.0 Kolkata 6.0 Tyler 65.0 Smartphone Delhi both
4 105 Books Study 145.0 Delhi NaN NaN NaN NaN NaN left_only
5 106 Oil Grocery 110.0 Chennai 3.0 Cory 15.0 Oil Bangalore both
6 107 Laptop Electronics 79999.0 Bengalore 9.0 Jeremy 23.0 Laptop Mumbai both

Right Join in Pandas

Right join, also known as Right Outer Join, is similar to the Left Outer Join. The only difference is that all the rows of the right dataframe are taken as it is and only those of the left dataframe that are common in both.

pd.merge(product,customer,on='Product_ID',how='right', indicator= True)
Product_ID Product_name Category Price Seller_City id name age Purchased_Product City _merge
0 101 Watch Fashion 299.0 Delhi 1 Olivia 20 Watch Mumbai both
1 103 Shoes Fashion 2999.0 Chennai 5 Dominic 30 Shoes Chennai both
2 104 Smartphone Electronics 14999.0 Kolkata 6 Tyler 65 Smartphone Delhi both
3 106 Oil Grocery 110.0 Chennai 3 Cory 15 Oil Bangalore both
4 107 Laptop Electronics 79999.0 Bengalore 9 Jeremy 23 Laptop Mumbai both
5 0 NaN NaN NaN NaN 2 Aditya 25 NA Delhi right_only
6 0 NaN NaN NaN NaN 4 Isabell 10 NA Chennai right_only
7 0 NaN NaN NaN NaN 7 Samuel 35 NA Kolkata right_only
8 0 NaN NaN NaN NaN 8 Daniel 18 NA Delhi right_only

Handling Redundancy/Duplicates in Joins

product_dup=pd.DataFrame({'Product_ID':[101,102,103,104,105,106,107,103,107],
                          'Product_name':['Watch','Bag','Shoes','Smartphone','Books','Oil','Laptop','Shoes','Laptop'],
                          'Category':['Fashion','Fashion','Fashion','Electronics','Study','Grocery','Electronics','Fashion','Electronics'],
                          'Price':[299.0,1350.50,2999.0,14999.0,145.0,110.0,79999.0,2999.0,79999.0],
                          'Seller_City':['Delhi','Mumbai','Chennai','Kolkata','Delhi','Chennai','Bengalore','Chennai','Bengalore']})
product_dup
Product_ID Product_name Category Price Seller_City
0 101 Watch Fashion 299.0 Delhi
1 102 Bag Fashion 1350.5 Mumbai
2 103 Shoes Fashion 2999.0 Chennai
3 104 Smartphone Electronics 14999.0 Kolkata
4 105 Books Study 145.0 Delhi
5 106 Oil Grocery 110.0 Chennai
6 107 Laptop Electronics 79999.0 Bengalore
7 103 Shoes Fashion 2999.0 Chennai
8 107 Laptop Electronics 79999.0 Bengalore
pd.merge(product_dup,customer,how='inner',on='Product_ID')
Product_ID Product_name Category Price Seller_City id name age Purchased_Product City
0 101 Watch Fashion 299.0 Delhi 1 Olivia 20 Watch Mumbai
1 103 Shoes Fashion 2999.0 Chennai 5 Dominic 30 Shoes Chennai
2 103 Shoes Fashion 2999.0 Chennai 5 Dominic 30 Shoes Chennai
3 104 Smartphone Electronics 14999.0 Kolkata 6 Tyler 65 Smartphone Delhi
4 106 Oil Grocery 110.0 Chennai 3 Cory 15 Oil Bangalore
5 107 Laptop Electronics 79999.0 Bengalore 9 Jeremy 23 Laptop Mumbai
6 107 Laptop Electronics 79999.0 Bengalore 9 Jeremy 23 Laptop Mumbai
pd.merge(product_dup.drop_duplicates(),customer,how='inner',on='Product_ID')
Product_ID Product_name Category Price Seller_City id name age Purchased_Product City
0 101 Watch Fashion 299.0 Delhi 1 Olivia 20 Watch Mumbai
1 103 Shoes Fashion 2999.0 Chennai 5 Dominic 30 Shoes Chennai
2 104 Smartphone Electronics 14999.0 Kolkata 6 Tyler 65 Smartphone Delhi
3 106 Oil Grocery 110.0 Chennai 3 Cory 15 Oil Bangalore
4 107 Laptop Electronics 79999.0 Bengalore 9 Jeremy 23 Laptop Mumbai
pd.merge(product_dup,customer,how='inner',on='Product_ID',validate='many_to_many')
Product_ID Product_name Category Price Seller_City id name age Purchased_Product City
0 101 Watch Fashion 299.0 Delhi 1 Olivia 20 Watch Mumbai
1 103 Shoes Fashion 2999.0 Chennai 5 Dominic 30 Shoes Chennai
2 103 Shoes Fashion 2999.0 Chennai 5 Dominic 30 Shoes Chennai
3 104 Smartphone Electronics 14999.0 Kolkata 6 Tyler 65 Smartphone Delhi
4 106 Oil Grocery 110.0 Chennai 3 Cory 15 Oil Bangalore
5 107 Laptop Electronics 79999.0 Bengalore 9 Jeremy 23 Laptop Mumbai
6 107 Laptop Electronics 79999.0 Bengalore 9 Jeremy 23 Laptop Mumbai

pandas-joins's People

Contributors

ashishpatel26 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.