Comments (2)
I think we should start more simply and just assess the performance of various metrics on B to predict the holdout portion of A. Here are possible metrics we can assess:
- the prior probability of edge existence (based on B-derived XSwapped networks)
- random walk edge predictions on homogeneous networks
- Jaccard similarity on homogeneous and bipartite networks
Does this sound like a reasonable starting point? This way we can see on different networks how predictive the prior probability of edge existence is compared to other simple metrics that do require access to actual edges.
It would also be interesting to find actual predictions other people created (that we don't have to implement) and compare their performance to the prior edge probability.
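As a rough illustration of the first metric, here is a minimal sketch of an edge prior estimated from degree-preserving permutations of an undirected network. The helper names (`xswap_permute`, `edge_prior`, `degrees`), the toy edge list, and the choice of ten attempted swaps per edge are assumptions for illustration only; this is not the xswap package's API.

```python
import random
from collections import Counter


def degrees(edges):
    """Degree of every node in an undirected edge list."""
    counts = Counter()
    for a, b in edges:
        counts[a] += 1
        counts[b] += 1
    return counts


def xswap_permute(edges, n_swaps, rng):
    """Degree-preserving shuffle: repeatedly swap endpoints of two edges."""
    edges = [tuple(sorted(e)) for e in edges]
    edge_set = set(edges)
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        a, b = edges[i]
        c, d = edges[j]
        # Randomly orient the second edge so both rewirings are reachable.
        if rng.random() < 0.5:
            c, d = d, c
        # Proposed swap: (a, b), (c, d) -> (a, d), (c, b)
        new1, new2 = tuple(sorted((a, d))), tuple(sorted((c, b)))
        # Reject swaps that would create a self-loop or a duplicate edge.
        if a == d or c == b or new1 in edge_set or new2 in edge_set:
            continue
        edge_set -= {edges[i], edges[j]}
        edge_set |= {new1, new2}
        edges[i], edges[j] = new1, new2
    return edges


def edge_prior(edges, n_perms, seed=0):
    """Fraction of permuted networks in which each node pair has an edge."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_perms):
        # Assumption: 10 attempted swaps per edge is enough mixing for a toy graph.
        counts.update(xswap_permute(edges, 10 * len(edges), rng))
    return {pair: c / n_perms for pair, c in counts.items()}


# Toy network (made up for illustration).
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (3, 4), (4, 5)]
prior = edge_prior(edges, n_perms=200)
```

Node pairs that never receive an edge in any permutation are simply absent from `prior` and can be treated as having a prior of zero.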
from xswap-analysis.
Do you include DWPC in "random walk", or do you mean something even simpler: just the nodes you can reach in a certain number of steps, including loops? For the time being I am going with DWPCs, but the other would be easy to compare.
Using an example homogeneous network, I set up the following prediction task.
- Drop 20% of network edges
- Compute features using the pruned network
- Generate 1000 permutations of the pruned network
- Compute features using each permuted network and average each feature to get "priors" for each feature (not doing degree grouping for now)
- Construct train/test data:
  - Training data: sample 70% of the node pairs where an edge was pruned, plus an equal number of node pairs with no edge in either the original or the pruned network
  - Test data: the remaining 30% of pruned-edge node pairs, plus an equal number of node pairs with no edge
- Scale data using min/max scaling and predict existence of edge in original network using logistic regression
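The pruning and train/test construction above could be sketched roughly as follows. The function names, the toy 20-node network, and fitting the min/max scaler on the training values only are assumptions on my part; the permutation and logistic regression steps are left out.

```python
import random
from itertools import combinations


def build_task(nodes, edges, drop_frac=0.2, train_frac=0.7, seed=0):
    """Prune edges and build balanced train/test sets of (node pair, label)."""
    rng = random.Random(seed)
    original = {tuple(sorted(e)) for e in edges}
    shuffled = sorted(original)
    rng.shuffle(shuffled)
    n_drop = int(round(drop_frac * len(shuffled)))
    pruned_pairs = shuffled[:n_drop]          # positives: edges that were removed
    pruned_network = set(shuffled[n_drop:])   # features are computed on this
    # Negatives: pairs with no edge in the original (and hence the pruned) network.
    non_edges = [p for p in combinations(sorted(nodes), 2) if p not in original]
    negatives = rng.sample(non_edges, n_drop)
    n_train = int(round(train_frac * n_drop))
    train = [(p, 1) for p in pruned_pairs[:n_train]]
    train += [(p, 0) for p in negatives[:n_train]]
    test = [(p, 1) for p in pruned_pairs[n_train:]]
    test += [(p, 0) for p in negatives[n_train:]]
    return pruned_network, train, test


def min_max_scale(train_values, test_values):
    """Scale one feature to [0, 1] using the training range (an assumption)."""
    lo, hi = min(train_values), max(train_values)
    span = (hi - lo) or 1.0  # avoid division by zero for constant features
    scale = lambda values: [(v - lo) / span for v in values]
    return scale(train_values), scale(test_values)


# Toy network: a 20-node ring plus chords (made up for illustration).
nodes = list(range(20))
edges = [(i, (i + 1) % 20) for i in nodes] + [(i, (i + 5) % 20) for i in nodes]
pruned_network, train, test = build_task(nodes, edges)
```

Because the positives and negatives are drawn in equal numbers, both the training and the test set are balanced 50/50 by construction.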
I computed the following features for each of the node pairs:
- `dwpc_2`: DWPC on the two-step metapath `PiPiP` (protein-interacts-protein-interacts-protein)
- `dwpc_3`: DWPC on the three-step metapath `PiPiPiP`
- `jaccard`: Jaccard coefficient. For networks, this is computed using the neighboring node sets: the magnitude of the intersection divided by the magnitude of the union.
- `cn`: Common neighbors, i.e. the number of shared neighbors. In the context of the previous bullet, this is the magnitude of the neighbor set intersection.
- `edge_prior`: Fraction of permutations in which an edge exists between the two nodes
- `dwpc_2_prior`: Mean of `dwpc_2` computed on each permuted network
- `dwpc_3_prior`: Mean of `dwpc_3` computed on each permuted network
- `jaccard_prior`: Mean of `jaccard` computed on each permuted network
- `cn_prior`: Mean of `cn` computed on each permuted network
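For concreteness, here is a minimal sketch of three of these features on an undirected homogeneous network. The function names and the toy edge list are hypothetical, and the DWPC damping exponent of 0.5 is an assumption (the value actually used is not stated here). The `dwpc_2` function follows the usual definition: each two-step path contributes the product of the degrees along it, raised to the negative damping exponent.

```python
def build_adjacency(edges):
    """Map each node to the set of its neighbors."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def common_neighbors(adj, u, v):
    """cn: size of the neighbor set intersection."""
    return len(adj.get(u, set()) & adj.get(v, set()))


def jaccard(adj, u, v):
    """jaccard: |intersection| / |union| of the neighbor sets."""
    nu, nv = adj.get(u, set()), adj.get(v, set())
    union = nu | nv
    return len(nu & nv) / len(union) if union else 0.0


def dwpc_2(adj, u, v, damping=0.5):
    """Degree-weighted path count over two-step paths u - mid - v."""
    nu, nv = adj.get(u, set()), adj.get(v, set())
    total = 0.0
    for mid in nu & nv:
        # Degrees met along the path: deg(u), deg(mid) twice, deg(v).
        path_degree_product = len(nu) * len(adj[mid]) ** 2 * len(nv)
        total += path_degree_product ** -damping
    return total


# Toy network (made up for illustration).
edges = [("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"), ("b", "e")]
adj = build_adjacency(edges)
features = {
    "cn": common_neighbors(adj, "a", "b"),
    "jaccard": jaccard(adj, "a", "b"),
    "dwpc_2": dwpc_2(adj, "a", "b"),
}
```

The corresponding `*_prior` features would then be the mean of each function's output over the permuted networks for the same node pair.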
Below are the results of this prediction task using various features:

Features | F1 | Average precision | AUROC |
---|---|---|---|
`dwpc_2`, `dwpc_3` | 0.5540 | 0.7503 | 0.7469 |
`jaccard` | 0.3719 | 0.6059 | 0.6136 |
`jaccard`, `cn` | 0.3806 | 0.6123 | 0.6140 |
`jaccard`, `cn`, `dwpc_2`, `dwpc_3` | 0.5349 | 0.7486 | 0.7461 |
`edge_prior` | 0.5834 | 0.7490 | 0.7216 |
`dwpc_2_prior`, `dwpc_3_prior` | 0.6585 | 0.7808 | 0.7050 |
`jaccard_prior`, `cn_prior` | 0.5613 | 0.7082 | 0.6684 |
`edge_prior`, `jaccard_prior`, `cn_prior`, `dwpc_2_prior`, `dwpc_3_prior` | 0.6594 | 0.7839 | 0.7157 |
`jaccard`, `cn`, `dwpc_2`, `dwpc_3`, `edge_prior`, `jaccard_prior`, `cn_prior`, `dwpc_2_prior`, `dwpc_3_prior` | 0.6965 | 0.8217 | 0.7473 |
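To make two of the reported metrics concrete, here is a small sketch of AUROC and average precision computed from labels and predicted scores. This is not necessarily how the numbers in the table were produced; the function names and example values are made up, and this simple average precision does not treat tied scores specially.

```python
def auroc(labels, scores):
    """Probability that a random positive outscores a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    # Ties between a positive and a negative count as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision values at the rank of each positive."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits


# Made-up example: three positives and three negatives.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
example_auroc = auroc(labels, scores)
example_ap = average_precision(labels, scores)
```

On a balanced test set like the one used here, an uninformative score gives an AUROC of about 0.5 and an average precision of about 0.5, which is the baseline to read the table against.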
I plotted some curves to look a little closer at the results. Some of these curves look pretty strange to me, and I'm not sure what would cause this behavior. Note that every task had a 50/50 split in true outcomes (i.e. 50% of node pairs had an edge that was removed during pruning).
- `jaccard`, `cn` was actually less informative than `jaccard_prior`, `cn_prior`