tianbian95 / bigcn

Source Codes: Rumor Detection on Social Media with Bi-Directional Graph Convolutional Networks--AAAI 2020

Python 99.07% Shell 0.93%

rumor-detection graph-convolutional-networks social-network-analysis

bigcn's Introduction

Paper of the source codes released:

Tian Bian, Xi Xiao, Tingyang Xu, Peilin Zhao, Wenbing Huang, Yu Rong, Junzhou Huang. Rumor Detection on Social Media with Bi-Directional Graph Convolutional Networks. AAAI 2020.

Datasets:

The datasets used in the experiments were based on the three publicly available Weibo and Twitter datasets released by Ma et al. (2016) and Ma et al. (2017):

Jing Ma, Wei Gao, Prasenjit Mitra, Sejeong Kwon, Bernard J Jansen, Kam-Fai Wong, and Meeyoung Cha. Detecting rumors from microblogs with recurrent neural networks. In Proceedings of IJCAI 2016.

Jing Ma, Wei Gao, Kam-Fai Wong. Detect Rumors in Microblog Posts Using Propagation Structure via Kernel Learning. ACL 2017.

In the 'data' folder we provide the pre-processed data files used for our experiments. The raw datasets can be downloaded from https://www.dropbox.com/s/46r50ctrfa0ur1o/rumdect.zip?dl=0 and https://www.dropbox.com/s/7ewzdrbelpmrnxu/rumdetect2017.zip?dl=0, respectively.

The Weibo datafile 'weibotree.txt' is in a tab-separated column format, where each row corresponds to a weibo. Consecutive columns correspond to the following pieces of information:
1: root-id -- a unique identifier describing the tree (weiboid of the root);
2: index-of-parent-weibo -- an index number of the parent weibo for the current weibo;
3: index-of-the-current-weibo -- an index number of the current weibo;
4: list-of-index-and-counts -- the rest of the line contains space-separated index-count pairs, where an index-count pair is in the format "index:count", e.g., "index1:count1 index2:count2" (extracted from the "text" field in the JSON format of the raw Weibo datasets).
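
The column layout above can be read with a few lines of Python. This is a minimal sketch, not code from the repository: the function name parse_tree_line and the sample row are made up for illustration, and it assumes that counts are integers and that the parent index is kept as the raw string found in the file.

```python
def parse_tree_line(line):
    """Split one tab-separated row of the tree file into its four fields."""
    parts = line.rstrip("\n").split("\t")
    root_id, parent_idx, current_idx = parts[0], parts[1], parts[2]
    # The fourth column holds space-separated "index:count" pairs.
    pairs = parts[3] if len(parts) > 3 else ""
    counts = {}
    for pair in pairs.split():
        index, count = pair.split(":")
        counts[int(index)] = int(count)  # assumption: counts are integers
    return root_id, parent_idx, current_idx, counts


# Hypothetical row, written only to match the documented layout.
sample_row = "12345\tNone\t1\t3:2 17:1 42:5"
print(parse_tree_line(sample_row))
```

The indices are kept as strings here so that whatever marker the file uses for a root without a parent passes through unchanged.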

A detailed description of the Twitter datafile 'data.TD_RvNN.vol_5000.txt' can be found at RvNN.

Dependencies:

python==3.5.2
numpy==1.18.1
torch==1.4.0
torch_scatter==1.4.0
torch_sparse==0.4.3
torch_cluster==1.4.5
torch_geometric==1.3.2
tqdm==4.40.0
joblib==0.14.1

Make sure that cuda/bin, cuda/include and cuda/lib64 are in your $PATH, $CPATH and $LD_LIBRARY_PATH respectively before the installation, e.g.:

$ echo $PATH
>>> /usr/local/cuda/bin:...

$ echo $CPATH
>>> /usr/local/cuda/include:...

and

$ echo $LD_LIBRARY_PATH
>>> /usr/local/cuda/lib64

on Linux or

$ echo $DYLD_LIBRARY_PATH
>>> /usr/local/cuda/lib

on macOS.
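
The same check can be scripted. The sketch below is only an illustration: the helper dir_in_search_path is a made-up name, and /usr/local/cuda is assumed as the install location because it is the one shown in the examples above. Adjust both the location and the variable names (for example DYLD_LIBRARY_PATH with lib on macOS) to your system.

```python
import os


def dir_in_search_path(directory, search_path):
    """Return True if `directory` is one entry of a PATH-style string."""
    wanted = os.path.normpath(directory)
    entries = [entry for entry in search_path.split(os.pathsep) if entry]
    return any(os.path.normpath(entry) == wanted for entry in entries)


CUDA_HOME = "/usr/local/cuda"  # assumption: default install location
checks = {"PATH": "bin", "CPATH": "include", "LD_LIBRARY_PATH": "lib64"}

for variable, subdir in sorted(checks.items()):
    expected = CUDA_HOME + "/" + subdir
    found = dir_in_search_path(expected, os.environ.get(variable, ""))
    print("{}: {} {}".format(variable, expected, "found" if found else "missing"))
```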

Reproduce the experimental results:

Run the script

$ sh main.sh

and choose "model/Weibo/BiGCN_Weibo.py" for the BiGCN model on the Weibo dataset or "model/Twitter/BiGCN_Twitter.py" for the Twitter15/Twitter16 datasets.

In "main.sh", two arguments need to be specified: the dataset name and the number of iterations. E.g.,

python ./model/Twitter/BiGCN_Twitter.py Twitter15 100

will reproduce the average experimental results over 100 iterations of the BiGCN model on the Twitter15 dataset with 5-fold cross-validation.
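
To illustrate how two positional arguments like these are typically read, here is a hedged sketch. It is not the repository's code: parse_args is a hypothetical helper. It shows that both arguments must be present, since indexing sys.argv[2] when only one argument was given raises IndexError.

```python
import sys


def parse_args(argv):
    """Return (datasetname, iterations) from a sys.argv-style list."""
    if len(argv) < 3:
        # Fail with a readable message instead of an IndexError.
        raise SystemExit("usage: {} <datasetname> <iterations>".format(argv[0]))
    datasetname = argv[1]
    iterations = int(argv[2])  # number of repeated runs to average over
    return datasetname, iterations


# Simulated command line; in a real script this would be parse_args(sys.argv).
print(parse_args(["BiGCN_Twitter.py", "Twitter15", "100"]))
```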

If you find this code useful, please let us know and cite our paper.
If you have any questions, please contact Tian at: bt18 at mails dot tsinghua dot edu dot cn.


bigcn's Issues

.npz file not found error

While trying to run the "getTwittergraph.py" file, I get the error:
No such file or directory: '/BiGCN/Process/data/Twitter15graph/731166399389962242.npz'
How can I fix this?

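Early stopping

Early stopping here is done on the test set of the cross-validation, right? Is that OK? Also, gradient updates are still being performed on the test set.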

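The code fails to run.

In getweibograph.py, the function loadeid() takes id as a parameter, but id is never defined afterwards. As a result, id cannot be resolved in np.savez(), so the file cannot be saved.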

How to conduct early rumor detection?

Could you please release the source code for early rumor detection? Thank you!

sys.argv[2] *** IndexError: list index out of range

sys.argv[2]
*** IndexError: list index out of range

Getting freeze_support() error

Hello, I'm getting the following error when I run the command 'python ./model/Twitter/BiGCN_Weibo.py 100'.
I've tested it on Colab, in PyCharm, and using WindowsShell.

loading weibo label:
4664
2351 2313
reading Weibo tree
tree no: 4659
loading train set
train no: 3618
loading test set
test no: 909
0%| | 0/227 [00:00<?, ?it/s]loading weibo label:
4664
2351 2313
reading Weibo tree
tree no: 4659
loading train set
0%| | 0/227 [00:00<?, ?it/s]Traceback (most recent call last):
File "", line 1, in
File "C:\Users\User\AppData\Local\Programs\Python\Python38\lib\multiprocessing\spawn.py", line 116, in spawn_main
exitcode = _main(fd, parent_sentinel)
File "C:\Users\User\AppData\Local\Programs\Python\Python38\lib\multiprocessing\spawn.py", line 125, in _main
prepare(preparation_data)
File "C:\Users\User\AppData\Local\Programs\Python\Python38\lib\multiprocessing\spawn.py", line 236, in prepare
_fixup_main_from_path(data['init_main_from_path'])
File "C:\Users\User\AppData\Local\Programs\Python\Python38\lib\multiprocessing\spawn.py", line 287, in _fixup_main_from_path
main_content = runpy.run_path(main_path,
File "C:\Users\User\AppData\Local\Programs\Python\Python38\lib\runpy.py", line 263, in run_path
return _run_module_code(code, init_globals, run_name,
File "C:\Users\User\AppData\Local\Programs\Python\Python38\lib\runpy.py", line 96, in _run_module_code
_run_code(code, mod_globals, init_globals,
File "C:\Users\User\AppData\Local\Programs\Python\Python38\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "D:\BiGCN-master\BiGCN-master\model\Weibo\BiGCN_Weibo.py", line 216, in
train_losses, val_losses, train_accs, val_accs, accs_0, acc1_0, pre1_0, rec1_0, F1_0, acc2_0, pre2_0, rec2_0, F2_0 = train_GCN(treeDic,
File "D:\BiGCN-master\BiGCN-master\model\Weibo\BiGCN_Weibo.py", line 116, in train_GCN
for Batch_data in tqdm_train_loader:
File "D:\BiGCN-master\BiGCN-master\venv\lib\site-packages\tqdm\std.py", line 1097, in iter
for obj in iterable:
train no: 3626
loading test set
test no: 901
File "D:\BiGCN-master\BiGCN-master\venv\lib\site-packages\torch\utils\data\dataloader.py", line 359, in iter
return self._get_iterator()
File "D:\BiGCN-master\BiGCN-master\venv\lib\site-packages\torch\utils\data\dataloader.py", line 305, in _get_iterator
return _MultiProcessingDataLoaderIter(self)
File "D:\BiGCN-master\BiGCN-master\venv\lib\site-packages\torch\utils\data\dataloader.py", line 918, in init
w.start()
File "C:\Users\User\AppData\Local\Programs\Python\Python38\lib\multiprocessing\process.py", line 121, in start
self._popen = self._Popen(self)
File "C:\Users\User\AppData\Local\Programs\Python\Python38\lib\multiprocessing\context.py", line 224, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "C:\Users\User\AppData\Local\Programs\Python\Python38\lib\multiprocessing\context.py", line 326, in _Popen
return Popen(process_obj)
File "C:\Users\User\AppData\Local\Programs\Python\Python38\lib\multiprocessing\popen_spawn_win32.py", line 45, in init
prep_data = spawn.get_preparation_data(process_obj._name)
File "C:\Users\User\AppData\Local\Programs\Python\Python38\lib\multiprocessing\spawn.py", line 154, in get_preparation_data
_check_not_importing_main()
File "C:\Users\User\AppData\Local\Programs\Python\Python38\lib\multiprocessing\spawn.py", line 134, in _check_not_importing_main
raise RuntimeError('''
RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.
    This probably means that you are not using fork to start your
    child processes and you have forgotten to use the proper idiom
    in the main module:

        if __name__ == '__main__':
            freeze_support()
            ...

    The "freeze_support()" line can be omitted if the program
    is not going to be frozen to produce an executable.
0%| | 0/227 [00:00<?, ?it/s]
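
For reference, the idiom named in the error message looks like the sketch below. It uses only the standard multiprocessing module, and square and run_in_workers are illustrative names, not functions from this repository. In a script that iterates over a DataLoader with worker processes at module level, the equivalent change would be to move that code under the same guard. Setting the DataLoader's num_workers to 0 avoids worker processes altogether.

```python
import multiprocessing as mp


def square(x):
    # Stand-in for the per-item work done inside a worker process.
    return x * x


def run_in_workers(values, workers=2):
    # Worker processes are started here. With the "spawn" start method
    # (the default on Windows) each worker re-imports the main module,
    # so this function must only be reached from guarded code.
    with mp.Pool(processes=workers) as pool:
        return pool.map(square, values)


if __name__ == "__main__":
    mp.freeze_support()  # only needed for frozen Windows executables
    print(run_in_workers([1, 2, 3]))
```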

Run this on Google Colab

Early rumor detection

How to set the time for early rumor detection? How to divide the data before the time point, or how to read the data before the static point?

Cannot reproduce the performance reported in the paper

Hi, I ran your code with 'python ./model/Twitter/BiGCN_Twitter.py Twitter15 10' to get the average experimental results over 10 iterations of the BiGCN model on Twitter15 (running 100 iterations takes too much time).
I got the result 'Total_Test_Accuracy: 0.8582|NR F1: 0.8364|FR F1: 0.8537|TR F1: 0.9099|UR F1: 0.8220', which shows a big gap compared with the results reported in your paper '0.886 0.891 0.860 0.930 0.864'.

How to deal with this issue?

RuntimeError: scatter_add() expected at most 5 argument(s) but received 6 argument(s). Declaration: scatter_add(Tensor src, Tensor index, int dim=-1, Tensor? out=None, int? dim_size=None) -> (Tensor)
0%| | 0/10 [00:01<?, ?it/s]

What are the labels of responsive/retweet posts?

Hi. Thank you for your sharing.

In your paper Rumor Detection on Social Media with Bi-Directional Graph Convolutional Networks, it is mentioned that GCN is applied on a graph that uses the source tweet r and its responsive/retweet posts as nodes. But in the Twitter15/16 datasets, only the source tweet has a label. What are the labels of these responsive/retweet posts? Maybe their labels are consistent with their source tweet?

Many thanks.

What does each column in lable_All mean?

I don't understand the meaning of the data in the fifth, sixth, and seventh columns of the lable_All file.

cross validation

In each iteration, load5foldData is called to get 5 different datasets. Each dataset, consisting of a training set and a testing set, is used to train and test the model.
But those 5 datasets are generated from the same data. Thus, the data in fold0_x_test used to test the model may also be in fold1_x_train, where it is used to train the model again, which means the model could learn from fold$n$_x_test in each iteration. The whole dataset is used for training, but there is no separate testing set to evaluate the model, resulting in significantly higher accuracy.
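
The overlap described here is a property of any k-fold split and can be reproduced in isolation. The sketch below is not the repository's load5foldData; five_fold_indices is a hypothetical helper over plain integer indices. It shows that the train and test parts of one fold are disjoint, while the test items of fold 0 do appear in the training part of fold 1. Whether that amounts to leakage depends on whether the model is re-initialised before each fold is trained.

```python
import random


def five_fold_indices(n_samples, seed=0):
    """Return five (train, test) index splits over range(n_samples)."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::5] for i in range(5)]
    splits = []
    for k in range(5):
        test = folds[k]
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        splits.append((train, test))
    return splits


splits = five_fold_indices(20)
train0, test0 = splits[0]
train1, test1 = splits[1]
print(set(train0) & set(test0))   # no shared items inside fold 0
print(set(test0) <= set(train1))  # fold-0 test items are in fold-1 training
```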

Can GCN be used on directed graphs?

A question about the code: is this a bug, or am I misunderstanding something?

The parameter

Hi, I have gone through your code and found that you only optimize the parameters of BUGCN. May I know the reason why you do not optimize the parameters of TUGCN?

Regarding the preprocessed dataset doubt

There are "index:count" pairs in each row of the preprocessed data file. What do they signify? Is it something like this: the tweet text has been tokenized, a vocabulary/dictionary has been built from all the tokens, and each row holds the index:count pairs of the tokens in that post? Could you please provide more details on how you preprocessed the dataset?
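
One reading that is consistent with the README's description of the "index:count" column is a bag-of-words encoding. The sketch below illustrates that reading only. It is an assumption, not the authors' confirmed preprocessing: the vocabulary and the helper encode_counts are made up.

```python
from collections import Counter


def encode_counts(tokens, vocabulary):
    """Encode a token list as space-separated "index:count" pairs."""
    # Tokens missing from the vocabulary are dropped (an assumption).
    counts = Counter(vocabulary[t] for t in tokens if t in vocabulary)
    return " ".join("{}:{}".format(i, c) for i, c in sorted(counts.items()))


# Hypothetical three-word vocabulary mapping each token to its index.
vocabulary = {"rumor": 0, "news": 1, "fake": 2}
print(encode_counts(["fake", "news", "fake", "unknown"], vocabulary))
```

Under this reading, each index refers to a word in a fixed vocabulary and each count is how often that word occurs in the post.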