vinitsr7 / instagram-followee-prediction

Given a directed social graph, predict missing edges in order to recommend friends, connections, or followers.

Home Page: https://www.kaggle.com/c/FacebookRecruiting

License: MIT License

Jupyter Notebook 100.00%
graph networkx networkx-graph machine-learning random-forest-classifier feature-extraction featurization eda hits-algorithm page-rank

instagram-followee-prediction's Introduction

Social Network Graph Link Prediction: Facebook Challenge

Problem statement:

Given a directed social graph, predict missing links in order to recommend friends, connections, or followers (link prediction in a graph).

Data Overview

The dataset comes from Facebook's recruiting challenge on Kaggle: https://www.kaggle.com/c/FacebookRecruiting
The data contains two columns, source and destination, which together give the edge pairs of the directed graph.
- Data columns (total 2 columns):
- source_node int64
- destination_node int64
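
As a minimal sketch of how such a two-column edge list could be read into a directed graph, the snippet below builds follower and followee adjacency maps with the standard library only. The inline sample data and the function name `load_directed_graph` are assumptions made for illustration; the notebooks in this repository use networkx for the actual graph.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows; the real data would come from the Kaggle CSV file.
SAMPLE_CSV = """source_node,destination_node
1,2
1,3
2,3
3,1
4,1
"""

def load_directed_graph(handle):
    """Return (followees, followers) adjacency maps from a two-column CSV."""
    followees = defaultdict(set)  # u -> nodes that u follows (out-neighbours)
    followers = defaultdict(set)  # v -> nodes that follow v (in-neighbours)
    for row in csv.DictReader(handle):
        u = int(row["source_node"])
        v = int(row["destination_node"])
        followees[u].add(v)
        followers[v].add(u)
    return followees, followers

followees, followers = load_directed_graph(io.StringIO(SAMPLE_CSV))
nodes = set(followees) | set(followers)
num_edges = sum(len(targets) for targets in followees.values())
print(f"{len(nodes)} nodes, {num_edges} edges")
```

Keeping both maps makes the later follower and followee features a matter of simple set operations.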

Mapping the problem to supervised learning problem:

  • Map this to a binary classification task, where 0 means the edge is absent and 1 means the edge is present.
    We then need to featurize each pair of vertices (u_i, u_j) so that the features help predict the presence or absence of an edge.
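
The dataset only lists edges that exist (label 1), so examples with label 0 have to be generated. One possible way, shown here as a sketch and not necessarily the procedure used in the notebooks, is to sample random node pairs that are not connected. The function name `sample_missing_edges` and the toy edge list are hypothetical.

```python
import random

def sample_missing_edges(edges, num_samples, seed=0):
    """Draw ordered node pairs (u, v), u != v, that are not edges in the graph."""
    rng = random.Random(seed)
    edge_set = set(edges)
    nodes = sorted({n for edge in edges for n in edge})
    # Number of ordered pairs without self-loops that are not already edges
    # (assumes the edge list itself contains no self-loops).
    max_possible = len(nodes) * (len(nodes) - 1) - len(edge_set)
    if num_samples > max_possible:
        raise ValueError("not enough missing pairs to sample from")
    negatives = set()
    while len(negatives) < num_samples:
        u, v = rng.sample(nodes, 2)  # two distinct nodes in random order
        if (u, v) not in edge_set:
            negatives.add((u, v))
    return sorted(negatives)

edges = [(1, 2), (1, 3), (2, 3), (3, 1), (4, 1)]
negatives = sample_missing_edges(edges, num_samples=len(edges))

# Balanced binary dataset: existing edges are 1, sampled missing edges are 0.
pairs = edges + negatives
labels = [1] * len(edges) + [0] * len(negatives)
```

Sampling as many negatives as positives gives a balanced dataset; other ratios are equally possible.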

Performance metric for supervised learning:

  • Both precision and recall are important, hence the F1 score is a good choice
  • Confusion matrix
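
To make the two metrics concrete, here is a small standard-library sketch that counts the confusion-matrix cells and derives F1 from them. The helper names and the toy labels are made up for illustration; in practice `sklearn.metrics` offers `confusion_matrix` and `f1_score` for the same purpose.

```python
def confusion_counts(y_true, y_pred):
    """Count the four confusion-matrix cells for binary labels (0 or 1)."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            counts["tp"] += 1
        elif truth == 0 and pred == 1:
            counts["fp"] += 1
        elif truth == 1 and pred == 0:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts

def f1_from_labels(y_true, y_pred):
    """F1 is the harmonic mean of precision and recall."""
    c = confusion_counts(y_true, y_pred)
    precision = c["tp"] / (c["tp"] + c["fp"]) if c["tp"] + c["fp"] else 0.0
    recall = c["tp"] / (c["tp"] + c["fn"]) if c["tp"] + c["fn"] else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Toy example: 1 = edge present, 0 = edge absent.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
counts = confusion_counts(y_true, y_pred)
f1 = f1_from_labels(y_true, y_pred)
```

In this toy case precision and recall are both 2/3, so F1 is 2/3 as well.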

Requirements:

  • Python3
  • numpy
  • pandas
  • matplotlib
  • seaborn
  • networkx
    NetworkX is a Python package for the creation, manipulation, and study of the structure, dynamics, and functions of complex networks. We use it to build and analyze the graph.

How To Use?

  1. Run the FB_EDA file, which contains the exploratory data analysis and the train/test split.
  2. Run FB_Featurization, which computes all the features.
  3. Run FB_Models, which trains a Random Forest Classifier.

Featurization:

We create these features for both train and test data points:

  1. jaccard_followers
  2. jaccard_followees
  3. cosine_followers
  4. cosine_followees
  5. num_followers_s
  6. num_followees_s
  7. num_followers_d
  8. num_followees_d
  9. inter_followers
  10. inter_followees
  11. Adamic/Adar index
  12. is following back
  13. belongs to the same weakly connected component
  14. shortest path between source and destination
  15. Weight Features
    • weight of incoming edges
    • weight of outgoing edges
    • weight of incoming edges + weight of outgoing edges
    • weight of incoming edges * weight of outgoing edges
    • 2*weight of incoming edges + weight of outgoing edges
    • weight of incoming edges + 2*weight of outgoing edges
  16. PageRank of source
  17. PageRank of dest
  18. katz of source
  19. katz of dest
  20. hubs of source
  21. hubs of dest
  22. authorities of source
  23. authorities of dest
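
Several of the pair-level features in this list reduce to set operations on followers and followees. The sketch below is one plausible reading of them, not the notebook's exact code: the function names, the toy edge list, and the choice to weight common followees by their follower count in the Adamic/Adar sum are all assumptions.

```python
import math

def build_adjacency(edges):
    """Return (followees, followers) dicts of sets for a directed edge list."""
    followees, followers = {}, {}
    for u, v in edges:
        followees.setdefault(u, set()).add(v)
        followers.setdefault(v, set()).add(u)
    return followees, followers

def jaccard(a, b):
    """|A and B| / |A or B|, or 0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cosine(a, b):
    """Otsuka-Ochiai coefficient: |A and B| / sqrt(|A| * |B|)."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))

def pair_features(followees, followers, s, d):
    """Features for a candidate edge from source s to destination d."""
    out_s, out_d = followees.get(s, set()), followees.get(d, set())
    in_s, in_d = followers.get(s, set()), followers.get(d, set())
    # Assumed Adamic/Adar variant: sum 1/log(#followers) over common followees.
    adar = sum(
        1.0 / math.log(len(followers[z]))
        for z in out_s & out_d
        if len(followers[z]) > 1  # log(1) == 0 would divide by zero
    )
    return {
        "jaccard_followers": jaccard(in_s, in_d),
        "jaccard_followees": jaccard(out_s, out_d),
        "cosine_followers": cosine(in_s, in_d),
        "cosine_followees": cosine(out_s, out_d),
        "num_followers_s": len(in_s),
        "num_followees_s": len(out_s),
        "num_followers_d": len(in_d),
        "num_followees_d": len(out_d),
        "inter_followers": len(in_s & in_d),
        "inter_followees": len(out_s & out_d),
        "adar_index": adar,
        "follows_back": int(s in out_d),  # 1 if d already follows s
    }

edges = [(1, 2), (1, 3), (2, 3), (3, 1), (4, 1), (4, 3)]
followees, followers = build_adjacency(edges)
features = pair_features(followees, followers, s=1, d=4)
```

For the pair (1, 4), node 3 is the only common followee, so `inter_followees` is 1 and the Adamic/Adar term is 1/log(3).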

Reference to Features:

  1. Jaccard Index: http://www.statisticshowto.com/jaccard-index/
  2. Cosine Similarity (Otsuka-Ochiai coefficient): https://en.wikipedia.org/wiki/Cosine_similarity
  3. PageRank: https://networkx.github.io/documentation/networkx1.10/reference/generated/networkx.algorithms.link_analysis.pagerank_alg.pagerank.html
  4. Shortest Path: https://stackoverflow.com/questions/9430027/networkx-shortest-path-length
  5. Weakly Connected Components: https://www.quora.com/What-are-strongly-and-weakly-connected-components
  6. Adamic/Adar Index: https://en.wikipedia.org/wiki/Adamic/Adar_index
  7. Katz Centrality: https://www.geeksforgeeks.org/katz-centrality-centrality-measure/
  8. HITS Score (Hubs and Authorities): https://en.wikipedia.org/wiki/HITS_algorithm
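
For readers who want to see what three of the graph-level features referenced here compute, the following is a dependency-free sketch of shortest path length, weakly connected components, and PageRank. The function names, the toy graph, and the parameter defaults are illustrative assumptions; networkx provides `shortest_path_length`, `weakly_connected_components`, and `pagerank` for real use.

```python
from collections import deque

def shortest_path_length(followees, source, target):
    """Breadth-first search along edge direction; -1 if target is unreachable."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in followees.get(node, ()):
            if nxt == target:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return -1

def weak_component_ids(nodes, edges):
    """Label every node with a component id, ignoring edge direction."""
    undirected = {n: set() for n in nodes}
    for u, v in edges:
        undirected[u].add(v)
        undirected[v].add(u)
    component = {}
    for start in nodes:
        if start in component:
            continue
        component[start] = start
        stack = [start]
        while stack:
            node = stack.pop()
            for nxt in undirected[node]:
                if nxt not in component:
                    component[nxt] = start
                    stack.append(nxt)
    return component

def pagerank(nodes, followees, damping=0.85, iterations=100):
    """Power iteration; rank of nodes without out-edges is spread uniformly."""
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        dangling = sum(rank[node] for node in nodes if not followees.get(node))
        new_rank = {node: (1 - damping) / n + damping * dangling / n for node in nodes}
        for node in nodes:
            targets = followees.get(node, ())
            for target in targets:
                new_rank[target] += damping * rank[node] / len(targets)
        rank = new_rank
    return rank

edges = [(1, 2), (2, 3), (3, 1), (4, 1), (5, 6)]
nodes = sorted({n for edge in edges for n in edge})
followees = {}
for u, v in edges:
    followees.setdefault(u, set()).add(v)

component = weak_component_ids(nodes, edges)
ranks = pagerank(nodes, followees)
same_component = int(component[1] == component[4])
```

A return value of -1 from `shortest_path_length` is just one convention for "no path"; any sentinel works as long as train and test use the same one.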
