CGSum

Code and dataset for the AAAI 2021 paper: Enhancing Scientific Papers Summarization with Citation Graph


PYROUGE Installation

We recommend using the following commands to install the PYROUGE environment:

sudo apt-get install libxml-perl libxml-dom-perl
pip install git+https://github.com/bheinzerling/pyrouge
export PYROUGE_HOME_DIR=the/path/to/RELEASE-1.5.5
pyrouge_set_rouge_path $PYROUGE_HOME_DIR
chmod +x $PYROUGE_HOME_DIR/ROUGE-1.5.5.pl

You can refer to https://github.com/andersjo/pyrouge/tree/master/tools/ROUGE-1.5.5 for RELEASE-1.5.5. Remember to build the WordNet 2.0 exception database instead of the 1.6 one in RELEASE-1.5.5/data:

cd $PYROUGE_HOME_DIR/data/WordNet-2.0-Exceptions/
./buildExeptionDB.pl . exc WordNet-2.0.exc.db
cd ../
ln -s WordNet-2.0-Exceptions/WordNet-2.0.exc.db WordNet-2.0.exc.db
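
Before running an evaluation, it can help to confirm that the directory layout produced by the commands above is in place. The following is a minimal sketch using only the Python standard library; the helper name check_rouge_home is hypothetical and not part of this repository or of pyrouge. It only checks that the files exist, not that ROUGE itself runs.

```python
import os
from pathlib import Path


def check_rouge_home(rouge_home):
    """Return a list of problems found under rouge_home (empty if none).

    Hypothetical helper: it mirrors the shell steps above and is not
    part of pyrouge or of this repository.
    """
    home = Path(rouge_home)
    problems = []

    # The script that `chmod +x` is applied to.
    script = home / "ROUGE-1.5.5.pl"
    if not script.is_file():
        problems.append(f"missing script: {script}")
    elif not os.access(script, os.X_OK):
        problems.append(f"script is not executable: {script}")

    # The link created by `ln -s`; exists() is False for a dangling symlink.
    db = home / "data" / "WordNet-2.0.exc.db"
    if not db.exists():
        problems.append(f"missing or dangling WordNet database: {db}")

    return problems


if __name__ == "__main__":
    rouge_home = os.environ.get("PYROUGE_HOME_DIR", "")
    issues = check_rouge_home(rouge_home) if rouge_home else ["PYROUGE_HOME_DIR is not set"]
    print("\n".join(issues) if issues else "ROUGE layout looks fine")
```

An empty result only means the expected files are present; a failing Perl dependency would still surface when ROUGE is first invoked.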

DataSet SSN

The whole dataset and its corresponding citation relationships can be downloaded through this link.

Example of our dataset:

{
  "paper_id": "102498304", # unique id of this paper
  "title": "Weak Galerkin finite element method for Poisson’s ...", # title of this paper
  "abstract": "in this paper , the weak galerkin finite element method for second order eilliptc   problems employing polygonal or  ...", # human written abstract
  "text": [
    ["The weak galerkin finite element method using triangulated meshes was proposed by .."],
    ["Let @inlineform1 be a partition of the domain Ω consisting of polygons in two dimensional"],
    ...
  ], # body text
  "section_names": ["Introduction", " Shape Regularity", ...], # section names corresponding to the sections in "text"
  "domain": "Mathematic" # class label
}
...
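
As a rough illustration of how the fields fit together, the sketch below parses one record and pairs each section name with its body text. It assumes that text[i] belongs to section_names[i] and that each entry of text is a list of strings, as the example suggests. The record is a shortened copy of the example with the comments removed so that it is valid JSON, and the function name sections is hypothetical, not taken from dataloader.py.

```python
import json

# Shortened copy of the example record (comments removed so it parses as JSON).
RAW = """
{
  "paper_id": "102498304",
  "title": "Weak Galerkin finite element method for Poisson's ...",
  "abstract": "in this paper , the weak galerkin finite element method ...",
  "text": [
    ["The weak galerkin finite element method using triangulated meshes was proposed by .."],
    ["Let @inlineform1 be a partition of the domain consisting of polygons in two dimensional"]
  ],
  "section_names": ["Introduction", " Shape Regularity"],
  "domain": "Mathematic"
}
"""


def sections(record):
    """Pair each section name with its body, joined into a single string.

    Assumes record["text"][i] is the body of record["section_names"][i].
    """
    pairs = []
    for name, body in zip(record["section_names"], record["text"]):
        # Section names may carry stray whitespace, as in " Shape Regularity".
        pairs.append((name.strip(), " ".join(body)))
    return pairs


record = json.loads(RAW)
for name, body in sections(record):
    print(f"{name}: {len(body.split())} words")
```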

You can download our preprocessed dataset, which can be loaded directly by dataloader.py, via SSN (inductive) and SSN (transductive). Note that we divide the dataset in two ways. In the transductive division, most neighbors of the papers in the test set come from the training set. In real cases, however, the test papers may come from a new graph that has nothing to do with the papers used for training. We therefore introduce SSN (inductive), which splits the whole citation graph into three independent subgraphs: training, validation and test graphs. Our preprocessed datasets are chunked to 500 words; to get a full document, retrieve it from the whole dataset by paper_id.
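
The difference between the two divisions can be sketched on a toy citation graph. This is an illustration only, not the procedure used to build SSN: the function names, the edge representation as (paper_id, cited_paper_id) pairs, and the toy ids are all assumptions. In the inductive case every edge that crosses two splits is dropped, which leaves three independent subgraphs. In the transductive case the graph is kept whole, so test papers can still have neighbors in the training set.

```python
def split_edges(edges, split_of, setting):
    """Filter citation edges according to the chosen division (illustrative).

    edges    : iterable of (paper_id, cited_paper_id) pairs
    split_of : dict mapping paper_id -> "train" | "val" | "test"
    setting  : "inductive" or "transductive"
    """
    if setting not in ("inductive", "transductive"):
        raise ValueError(f"unknown setting: {setting}")
    kept = []
    for src, dst in edges:
        if setting == "inductive" and split_of[src] != split_of[dst]:
            # Independent subgraphs: drop edges that cross splits.
            continue
        kept.append((src, dst))
    return kept


def train_neighbor_ratio(edges, split_of):
    """Fraction of test papers' neighbors that belong to the training set."""
    total = from_train = 0
    for src, dst in edges:
        # Treat the citation graph as undirected when counting neighbors.
        for paper, neighbor in ((src, dst), (dst, src)):
            if split_of[paper] == "test":
                total += 1
                from_train += split_of[neighbor] == "train"
    return from_train / total if total else 0.0


# Toy graph with hypothetical ids.
split_of = {"p1": "train", "p2": "train", "p3": "test", "p4": "test"}
edges = [("p1", "p2"), ("p3", "p1"), ("p3", "p4")]

inductive = split_edges(edges, split_of, "inductive")
transductive = split_edges(edges, split_of, "transductive")
print(inductive)
print(train_neighbor_ratio(inductive, split_of))
print(train_neighbor_ratio(transductive, split_of))
```

After the inductive filter, no test paper has a training neighbor, which is the property the inductive division is meant to guarantee.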

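You can also download our dataset via Baidu Cloud: full SSN dataset and citation relationships (extraction code: v4u8)

SSN inductive (extraction code: gk4j)

SSN transductive (extraction code: 17kw)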

Requirements for running our code

Train and Test

The hyperparameters in the train.py/test.py scripts have been set to their defaults. We also provide examples of how to run our code in train.sh and test.sh. You can train/test our model using the following commands:

  • training
python train_CGSum.py  --visible_gpu 0  --model_dir  save_models/CGSum_1hop  --dataset_dir  SSN/inductive --setting inductive --n_hop 1
python train_CGSum.py  --visible_gpu 0  --model_dir  save_models/CGSum_1hop  --dataset_dir  SSN/transductive --setting transductive --n_hop 1
  • testing
python test_CGSum.py  --visible_gpu 0  --model_dir  save_models/CGSum_1hop  --model_name CGSum_inductive_1hopNbrs.pt --setting inductive  --decode_dir decode_path  --result_dir results --n_hop 1  --min_dec_steps 130
python test_CGSum.py  --visible_gpu 0  --model_dir  save_models/CGSum_1hop  --model_name CGSum_transductive_1hopNbrs.pt --setting transductive  --decode_dir decode_path  --result_dir results --n_hop 1  --min_dec_steps 140
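
The two test commands differ only in the setting, the model file name, and --min_dec_steps (130 for inductive, 140 for transductive). When scripting several runs, the argument list can be built programmatically, as in the sketch below. The helper build_test_command is hypothetical, all flags and values are copied from the commands above, and the file-name pattern for n_hop values other than 1 is an extrapolation from the 1-hop names.

```python
import shlex

# --min_dec_steps values taken from the two test commands above.
MIN_DEC_STEPS = {"inductive": 130, "transductive": 140}


def build_test_command(setting, n_hop=1, gpu=0):
    """Return the argument list for one run of test_CGSum.py (hypothetical helper)."""
    if setting not in MIN_DEC_STEPS:
        raise ValueError(f"unknown setting: {setting}")
    return [
        "python", "test_CGSum.py",
        "--visible_gpu", str(gpu),
        "--model_dir", f"save_models/CGSum_{n_hop}hop",
        # Name pattern follows the 1-hop examples; other hops are an assumption.
        "--model_name", f"CGSum_{setting}_{n_hop}hopNbrs.pt",
        "--setting", setting,
        "--decode_dir", "decode_path",
        "--result_dir", "results",
        "--n_hop", str(n_hop),
        "--min_dec_steps", str(MIN_DEC_STEPS[setting]),
    ]


for setting in ("inductive", "transductive"):
    # Print the command; pass the list to subprocess.run(..., check=True) to launch it.
    print(shlex.join(build_test_command(setting)))
```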

To test our model, remember to replace the pyrouge root set in data_util/utils.py with your own path. You can also download our trained models to reproduce our results: inductive 1hop, inductive 2hop, transductive 1hop, transductive 2hop.

Our dataset is retrieved from S2ORC. For the implementation of BertSum, refer to PreSumm. Thanks to the authors for their work.


