tianbian95 / bigcn

Source Codes: Rumor Detection on Social Media with Bi-Directional Graph Convolutional Networks--AAAI 2020

Python 99.07% Shell 0.93%
rumor-detection graph-convolutional-networks social-network-analysis

bigcn's Introduction

Paper for the released source code:

Tian Bian, Xi Xiao, Tingyang Xu, Peilin Zhao, Wenbing Huang, Yu Rong, Junzhou Huang. Rumor Detection on Social Media with Bi-Directional Graph Convolutional Networks. AAAI 2020.

Datasets:

The datasets used in the experiments were based on the three publicly available Weibo and Twitter datasets released by Ma et al. (2016) and Ma et al. (2017):

Jing Ma, Wei Gao, Prasenjit Mitra, Sejeong Kwon, Bernard J Jansen, Kam-Fai Wong, and Meeyoung Cha. Detecting rumors from microblogs with recurrent neural networks. In Proceedings of IJCAI 2016.

Jing Ma, Wei Gao, Kam-Fai Wong. Detect Rumors in Microblog Posts Using Propagation Structure via Kernel Learning. ACL 2017.

In the 'data' folder we provide the pre-processed data files used for our experiments. The raw datasets can be downloaded from https://www.dropbox.com/s/46r50ctrfa0ur1o/rumdect.zip?dl=0 and https://www.dropbox.com/s/7ewzdrbelpmrnxu/rumdetect2017.zip?dl=0, respectively.

The Weibo data file 'weibotree.txt' is in a tab-separated column format, where each row corresponds to a weibo. Consecutive columns correspond to the following pieces of information:
1: root-id -- a unique identifier describing the tree (weiboid of the root);
2: index-of-parent-weibo -- an index number of the parent weibo for the current weibo;
3: index-of-the-current-weibo -- an index number of the current weibo;
4: list-of-index-and-counts -- the rest of the line contains space-separated index-count pairs, where an index-count pair is in the format "index:count", e.g., "index1:count1 index2:count2" (extracted from the "text" field in the JSON format of the raw Weibo datasets).
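As a rough illustration of the row layout described above, the sketch below parses a single line of 'weibotree.txt' in Python. The function name `parse_tree_line` is hypothetical and not part of the repository. The parent and current indices are kept as strings because the description does not say how the root's parent is encoded.

```python
def parse_tree_line(line):
    """Split one tab-separated row into its four documented columns."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3:
        raise ValueError("expected at least 3 tab-separated columns")
    root_id, parent_index, current_index = fields[0], fields[1], fields[2]
    # Column 4 holds space-separated "index:count" pairs; it may be empty.
    pairs = fields[3] if len(fields) > 3 else ""
    counts = {}
    for pair in pairs.split():
        index, count = pair.split(":")
        counts[int(index)] = int(count)
    return root_id, parent_index, current_index, counts


# Made-up example row for illustration, not taken from the dataset.
row = "1001\t1\t2\t5:2 17:1"
print(parse_tree_line(row))
```

The returned dictionary maps each index to its count, which is a convenient starting point for building a sparse feature vector per weibo.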

A detailed description of the Twitter data file 'data.TD_RvNN.vol_5000.txt' can be found at RvNN.

Dependencies:

python==3.5.2
numpy==1.18.1
torch==1.4.0
torch_scatter==1.4.0
torch_sparse==0.4.3
torch_cluster==1.4.5
torch_geometric==1.3.2
tqdm==4.40.0
joblib==0.14.1
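Since several of the issues reported for this code look like version mismatches, it can help to compare the installed packages against the pins above. The following is a minimal sketch, assuming Python 3.8 or newer for `importlib.metadata` (newer than the pinned interpreter). The helper name `find_mismatches` is hypothetical. The comparison is an exact string match, so a build suffix in the installed version would be reported as a mismatch.

```python
from importlib import metadata  # standard library in Python 3.8+

# Pins copied from the dependency list above.
PINNED = {
    "numpy": "1.18.1",
    "torch": "1.4.0",
    "torch_scatter": "1.4.0",
    "torch_sparse": "0.4.3",
    "torch_cluster": "1.4.5",
    "torch_geometric": "1.3.2",
    "tqdm": "4.40.0",
    "joblib": "0.14.1",
}


def find_mismatches(pinned=PINNED, get_version=metadata.version):
    """Return {name: (pinned, installed_or_None)} for every differing package."""
    mismatches = {}
    for name, wanted in pinned.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, installed) in find_mismatches().items():
        print(f"{name}: pinned {wanted}, installed {installed}")
```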

Make sure that cuda/bin, cuda/include and cuda/lib64 are in your $PATH, $CPATH and $LD_LIBRARY_PATH respectively before the installation, e.g.:

$ echo $PATH
>>> /usr/local/cuda/bin:...

$ echo $CPATH
>>> /usr/local/cuda/include:...

and

$ echo $LD_LIBRARY_PATH
>>> /usr/local/cuda/lib64

on Linux or

$ echo $DYLD_LIBRARY_PATH
>>> /usr/local/cuda/lib

on macOS.
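The same check can be scripted. The sketch below is an assumption-laden convenience, not part of the repository: the helper name `missing_cuda_entries` is hypothetical, and it uses a simple substring match on the Linux variable names listed above, so a versioned install directory such as one with a version suffix in its name would not be recognised.

```python
import os

# Variable name -> path fragment expected in one of its entries (Linux names).
EXPECTED = {
    "PATH": "cuda/bin",
    "CPATH": "cuda/include",
    "LD_LIBRARY_PATH": "cuda/lib64",
}


def missing_cuda_entries(env=None, expected=EXPECTED):
    """Return the variables that contain no entry with the expected fragment."""
    env = os.environ if env is None else env
    missing = []
    for var, fragment in expected.items():
        entries = env.get(var, "").split(os.pathsep)
        if not any(fragment in entry for entry in entries):
            missing.append(var)
    return missing


if __name__ == "__main__":
    print(missing_cuda_entries())
```

On macOS, a dictionary with `DYLD_LIBRARY_PATH` and `cuda/lib` can be passed as `expected` instead.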

Reproduce the experimental results:

Run script

$ sh main.sh

and choose "model/Weibo/BiGCN_Weibo.py" for the BiGCN model on the Weibo dataset or "model/Twitter/BiGCN_Twitter.py" for the Twitter15/Twitter16 datasets.

In "main.sh", two arguments need to be specified: the dataset name and the number of iterations, in that order. E.g.,

python ./model/Twitter/BiGCN_Twitter.py Twitter15 100

will reproduce the experimental results of the BiGCN model on the Twitter15 dataset, averaged over 100 iterations of 5-fold cross-validation.
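For readers adapting the scripts, the two positional arguments can be read along the following lines. This is only a sketch of the calling convention shown above: the function name `parse_cli` and the error messages are hypothetical, and the repository's scripts may read their arguments differently.

```python
import sys


def parse_cli(argv):
    """Return (datasetname, iterations) from a list shaped like sys.argv."""
    if len(argv) != 3:
        raise SystemExit("usage: python BiGCN_Twitter.py <datasetname> <iterations>")
    datasetname = argv[1]          # e.g. "Twitter15"
    iterations = int(argv[2])      # e.g. 100
    if iterations <= 0:
        raise SystemExit("iterations must be a positive integer")
    return datasetname, iterations


# Example call with an explicit argument list instead of sys.argv.
print(parse_cli(["BiGCN_Twitter.py", "Twitter15", "100"]))
```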

If you find this code useful, please let us know and cite our paper.
If you have any questions, please contact Tian at: bt18 at mails dot tsinghua dot edu dot cn.

bigcn's Issues

.npz file not found error

While trying to run "getTwittergraph.py" file I get the error:
No such file or directory: '/BiGCN/Process/data/Twitter15graph/731166399389962242.npz'
How can I fix this?

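Early stopping

Early stopping here is performed on the test set of the cross-validation, isn't it? Is that not a problem? In addition, gradient updates are still being performed on the test set.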

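Problem running the code

In getweibograph.py, the function loadeid() takes id as a parameter, but id is never defined afterwards. As a result, id cannot be resolved in np.savez(), and the file cannot be saved.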

Getting freeze_support() error

Hello I'm getting the following error when I run the command 'python ./model/Twitter/BiGCN_Weibo.py 100'
I've tested it on Colab, PyCharm, and by using WindowsShell

loading weibo label:
4664
2351 2313
reading Weibo tree
tree no: 4659
loading train set
train no: 3618
loading test set
test no: 909
0%| | 0/227 [00:00<?, ?it/s]loading weibo label:
4664
2351 2313
reading Weibo tree
tree no: 4659
loading train set
0%| | 0/227 [00:00<?, ?it/s]Traceback (most recent call last):
File "<string>", line 1, in <module>
File "C:\Users\User\AppData\Local\Programs\Python\Python38\lib\multiprocessing\spawn.py", line 116, in spawn_main
exitcode = _main(fd, parent_sentinel)
File "C:\Users\User\AppData\Local\Programs\Python\Python38\lib\multiprocessing\spawn.py", line 125, in _main
prepare(preparation_data)
File "C:\Users\User\AppData\Local\Programs\Python\Python38\lib\multiprocessing\spawn.py", line 236, in prepare
_fixup_main_from_path(data['init_main_from_path'])
File "C:\Users\User\AppData\Local\Programs\Python\Python38\lib\multiprocessing\spawn.py", line 287, in _fixup_main_from_path
main_content = runpy.run_path(main_path,
File "C:\Users\User\AppData\Local\Programs\Python\Python38\lib\runpy.py", line 263, in run_path
return _run_module_code(code, init_globals, run_name,
File "C:\Users\User\AppData\Local\Programs\Python\Python38\lib\runpy.py", line 96, in _run_module_code
_run_code(code, mod_globals, init_globals,
File "C:\Users\User\AppData\Local\Programs\Python\Python38\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "D:\BiGCN-master\BiGCN-master\model\Weibo\BiGCN_Weibo.py", line 216, in <module>
train_losses, val_losses, train_accs, val_accs, accs_0, acc1_0, pre1_0, rec1_0, F1_0, acc2_0, pre2_0, rec2_0, F2_0 = train_GCN(treeDic,
File "D:\BiGCN-master\BiGCN-master\model\Weibo\BiGCN_Weibo.py", line 116, in train_GCN
for Batch_data in tqdm_train_loader:
File "D:\BiGCN-master\BiGCN-master\venv\lib\site-packages\tqdm\std.py", line 1097, in __iter__
for obj in iterable:
train no: 3626
loading test set
test no: 901
File "D:\BiGCN-master\BiGCN-master\venv\lib\site-packages\torch\utils\data\dataloader.py", line 359, in __iter__
return self._get_iterator()
File "D:\BiGCN-master\BiGCN-master\venv\lib\site-packages\torch\utils\data\dataloader.py", line 305, in _get_iterator
return _MultiProcessingDataLoaderIter(self)
File "D:\BiGCN-master\BiGCN-master\venv\lib\site-packages\torch\utils\data\dataloader.py", line 918, in __init__
w.start()
File "C:\Users\User\AppData\Local\Programs\Python\Python38\lib\multiprocessing\process.py", line 121, in start
self._popen = self._Popen(self)
File "C:\Users\User\AppData\Local\Programs\Python\Python38\lib\multiprocessing\context.py", line 224, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "C:\Users\User\AppData\Local\Programs\Python\Python38\lib\multiprocessing\context.py", line 326, in _Popen
return Popen(process_obj)
File "C:\Users\User\AppData\Local\Programs\Python\Python38\lib\multiprocessing\popen_spawn_win32.py", line 45, in __init__
prep_data = spawn.get_preparation_data(process_obj._name)
File "C:\Users\User\AppData\Local\Programs\Python\Python38\lib\multiprocessing\spawn.py", line 154, in get_preparation_data
_check_not_importing_main()
File "C:\Users\User\AppData\Local\Programs\Python\Python38\lib\multiprocessing\spawn.py", line 134, in _check_not_importing_main
raise RuntimeError('''
RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.

    This probably means that you are not using fork to start your
    child processes and you have forgotten to use the proper idiom
    in the main module:

        if __name__ == '__main__':
            freeze_support()
            ...

    The "freeze_support()" line can be omitted if the program
    is not going to be frozen to produce an executable.

0%| | 0/227 [00:00<?, ?it/s]
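The idiom named in the error message can be sketched as follows. This is a generic standard-library example, not the repository's training script: `square` and `run` are hypothetical stand-ins, and the point is only that code which starts worker processes is reached through the `if __name__ == "__main__":` guard rather than at module level.

```python
import multiprocessing as mp


def square(x):
    # Stand-in for the per-item work done in a worker process.
    return x * x


def run(values, num_workers=0):
    """Process values in this process (0 workers) or in spawned workers."""
    if num_workers == 0:
        return [square(v) for v in values]
    ctx = mp.get_context("spawn")  # the start method used on Windows
    with ctx.Pool(num_workers) as pool:
        return pool.map(square, values)


if __name__ == "__main__":
    # Only the process started by the user reaches this block; spawned
    # children re-import the module under a different name and skip it.
    mp.freeze_support()  # only has an effect in frozen Windows executables
    print(run([1, 2, 3], num_workers=0))  # raise num_workers to use spawned workers
```

In the traceback above, the call to train_GCN sits at module level (line 216, in `<module>`), so each spawned DataLoader worker re-executes it. Moving that call under the guard is the usual remedy. Assuming the loaders are created with a `num_workers` argument, setting it to 0 avoids worker processes altogether.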

Early rumor detection

How do I set the time for early rumor detection? How should the data be divided at the time point, or how do I read only the data before that point?

Cannot reproduce the performance reported in the paper

Hi, I ran your code with 'python ./model/Twitter/BiGCN_Twitter.py Twitter15 10' to get the average experimental results over 10 iterations of the BiGCN model on Twitter15 (running 100 iterations takes too much time).
I got the result 'Total_Test_Accuracy: 0.8582|NR F1: 0.8364|FR F1: 0.8537|TR F1: 0.9099|UR F1: 0.8220', which shows a large gap compared with the results reported in your paper, '0.886 0.891 0.860 0.930 0.864'.

How to deal with this issue?

RuntimeError: scatter_add() expected at most 5 argument(s) but received 6 argument(s). Declaration: scatter_add(Tensor src, Tensor index, int dim=-1, Tensor? out=None, int? dim_size=None) -> (Tensor)
0%| | 0/10 [00:01<?, ?it/s]

What are the labels of responsive/retweet posts?

Hi. Thank you for your sharing.

In your paper Rumor Detection on Social Media with Bi-Directional Graph Convolutional Networks, it is mentioned that GCN is applied on a graph, which uses source tweet r and its responsive/retweet posts as nodes. But in the dataset Twitter15/16, only source tweet has a label. What are the labels of these responsive/retweet posts? Maybe their labels are consistent with their source tweet?

Many thanks.

cross validation

In each iteration, load5foldData is called to get 5 different datasets. Each dataset, consisting of the training set and the testing set, is used to train and test the model.
But those 5 datasets are generated from the same data. Thus, the data in fold0_x_test to test the model may be in fold1_x_train and it's used to train the model again, which means the model could learn from the fold$n$_x_test in each iteration. The whole data is used for training but there is no other separate testing set to evaluate the model, resulting in significantly higher accuracy.
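To make the overlap described in this issue concrete, here is a minimal sketch of a 5-fold split. It is not the repository's load5foldData; the function `k_fold_splits` and the toy ids are hypothetical. Within one fold the training and test ids are disjoint, but the test ids of one fold do appear in the training set of every other fold. Whether that matters depends on whether the model is re-initialised for each fold, which this sketch does not address.

```python
def k_fold_splits(ids, k=5):
    """Yield (train_ids, test_ids) pairs; every id is tested exactly once."""
    folds = [ids[i::k] for i in range(k)]
    for i in range(k):
        test_ids = folds[i]
        train_ids = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train_ids, test_ids


ids = list(range(10))  # toy sample ids
splits = list(k_fold_splits(ids))
train0, test0 = splits[0]
train1, test1 = splits[1]

# Overlap between training and test ids inside fold 0.
print(set(train0) & set(test0))
# Whether fold 0's test ids are all contained in fold 1's training ids.
print(set(test0) <= set(train1))
```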

The parameter

Hi, I have gone through your code and found that you only optimize the parameters of BUGCN. May I know why you do not optimize the parameters of TUGCN?

Regarding the preprocessed dataset doubt

There are "index: count" pairs in each row of the preprocessed data file. What do they signify? Is it that the tweet text has been tokenized, a vocabulary/dictionary has been built from all the tokens, and each row lists the token index: count pairs from that vocabulary? Could you please provide more details on how you preprocessed the dataset?

I have trouble running your code (CUDA out of memory error)

I have tried reducing the batch_size to 2, but I can only run a few epochs. I also tried adding th.cuda.empty_cache() to your code.
I ran this: python ./model/Weibo/BiGCN_Weibo.py 100

and my environment is:
GPU : RTX3080.
pytorch 1.4.0
CUDA10.1

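Overall accuracy for binary and four-class classification

When computing the accuracy of each class, true negatives (TN) are taken into account, but when computing the overall accuracy they are not, so the overall accuracy is lower than the average of the per-class accuracies.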
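The gap between the two accuracy definitions raised in this issue can be reproduced on toy data. The sketch below is not the repository's evaluation code: the function names and the labels are made up, and the per-class accuracy is taken to be (TP + TN) / N as the issue describes.

```python
def overall_accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def per_class_accuracy(y_true, y_pred, label):
    """One-vs-rest accuracy for a label: (TP + TN) / N."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    tn = sum(t != label and p != label for t, p in zip(y_true, y_pred))
    return (tp + tn) / len(y_true)


# Toy four-class example with two misclassified samples out of eight.
y_true = [0, 0, 1, 1, 2, 2, 3, 3]
y_pred = [0, 1, 1, 1, 2, 3, 3, 3]

print(overall_accuracy(y_true, y_pred))
print([per_class_accuracy(y_true, y_pred, c) for c in range(4)])
```

On this toy data the overall accuracy is 0.75 while every per-class accuracy is 0.875, because each misclassified sample still counts as a true negative for the two classes it does not involve.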
