scitldr's People

Contributors

isabelcachola, kyleclo, rreas

scitldr's Issues

EMNLP Camera Ready

Models to upload:

  • bart.tldr-ao
  • bart.tldr-aic
  • bart-xsum.tldr-aic
  • bart-xsum.tldr-ao
  • CATTS.tldr-ao
  • CATTS.tldr-aic
  • CATTS-XSUM.tldr-ao
  • CATTS-XSUM.tldr-aic

Other TODOs:

  • Update data
  • Update evaluation script
  • Update readme
  • Add EMNLP scripts
  • Add EMNLP citation
  • Update paper with reviewer promises
  • Write data/model documentation
  • Update preprint

Incompatible with the latest version of `fairseq`?

Here is a Colab notebook where I tried to do inference using this model.

loading archive file /content/models
loading archive file SciTLDR-Data/SciTLDR-AIC/ctrl-bin
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/fairseq/checkpoint_utils.py", line 151, in load_checkpoint_to_cpu
    from fairseq.fb_pathmgr import fb_pathmgr
ModuleNotFoundError: No module named 'fairseq.fb_pathmgr'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "scripts/generate.py", line 99, in <module>
    generate_TLDRs(**vars(args))
  File "scripts/generate.py", line 17, in generate_TLDRs
    task='translation'
  File "/usr/local/lib/python3.7/dist-packages/fairseq/models/bart/model.py", line 104, in from_pretrained
    **kwargs,
  File "/usr/local/lib/python3.7/dist-packages/fairseq/hub_utils.py", line 68, in from_pretrained
    arg_overrides=kwargs,
  File "/usr/local/lib/python3.7/dist-packages/fairseq/checkpoint_utils.py", line 190, in load_model_ensemble_and_task
    state = load_checkpoint_to_cpu(filename, arg_overrides)
  File "/usr/local/lib/python3.7/dist-packages/fairseq/checkpoint_utils.py", line 160, in load_checkpoint_to_cpu
    path, map_location=lambda s, l: default_restore_location(s, "cpu")
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 529, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 692, in _legacy_load
    magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: invalid load key, '\x0a'.
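
For what it's worth, the fb_pathmgr ModuleNotFoundError appears to be caught inside fairseq (hence "During handling of the above exception") and only signals a version mismatch; the fatal error is the UnpicklingError. A load key of '\x0a' (a newline) means torch is reading plain text where pickle data should start, so the checkpoint file itself is suspect, e.g. a truncated download or an error page saved in its place. A minimal diagnostic sketch, using a hypothetical checkpoint name under the /content/models directory from the log:

import os

ckpt = "/content/models/checkpoint_best.pt"  # hypothetical name; substitute the actual .pt file
print(os.path.getsize(ckpt))  # a real BART-scale checkpoint is large; an error page is tiny
with open(ckpt, "rb") as f:
    print(f.read(32))  # expect pickle magic b'\x80...' or zip magic b'PK'; leading text means a bad download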

Error Running make_datafiles.sh

Hello authors, awesome work!! I am trying to replicate your work for my project. However, I am stuck at the step below; please see if you can help.

Running
cd SciTLDR-Data
export TASK=SciTLDR-A
chmod +x make_datafiles.sh
./make_datafiles.sh # BPE preprocess

Error
usage: to_stories.py [-h] [--mapping_dir MAPPING_DIR] [--out_dir OUT_DIR]
                     [--num_cores NUM_CORES]
                     data_dir
to_stories.py: error: the following arguments are required: data_dir
usage: make_datafiles.py [-h] [--stories_dir STORIES_DIR] [--urldir URLDIR]
                         [--finished_files_dir FINISHED_FILES_DIR]
make_datafiles.py: error: argument --finished_files_dir: expected one argument
--2021-03-28 10:36:22-- https://dl.fbaipublicfiles.com/fairseq/gpt2_bpe/encoder.json
Resolving dl.fbaipublicfiles.com (dl.fbaipublicfiles.com)... 104.22.75.142, 104.22.74.142, 172.67.9.4, ...
Connecting to dl.fbaipublicfiles.com (dl.fbaipublicfiles.com)|104.22.75.142|:443... connected.
HTTP request sent, awaiting response... 304 Not Modified
File ‘encoder.json’ not modified on server. Omitting download.

--2021-03-28 10:36:23-- https://dl.fbaipublicfiles.com/fairseq/gpt2_bpe/vocab.bpe
Resolving dl.fbaipublicfiles.com (dl.fbaipublicfiles.com)... 104.22.75.142, 104.22.74.142, 172.67.9.4, ...
Connecting to dl.fbaipublicfiles.com (dl.fbaipublicfiles.com)|104.22.75.142|:443... connected.
HTTP request sent, awaiting response... 304 Not Modified
File ‘vocab.bpe’ not modified on server. Omitting download.

--2021-03-28 10:36:23-- https://dl.fbaipublicfiles.com/fairseq/gpt2_bpe/dict.txt
Resolving dl.fbaipublicfiles.com (dl.fbaipublicfiles.com)... 104.22.75.142, 104.22.74.142, 172.67.9.4, ...
Connecting to dl.fbaipublicfiles.com (dl.fbaipublicfiles.com)|104.22.75.142|:443... connected.
HTTP request sent, awaiting response... 304 Not Modified
File ‘dict.txt’ not modified on server. Omitting download.

Traceback (most recent call last):
  File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/content/drive/.shortcut-targets-by-id/1Y0y4gKBXV3n7Shh3znhZPs6ajOz1g_g3/Semester 4/MAJOR_PROJECT/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 129, in <module>
    main()
  File "/content/drive/.shortcut-targets-by-id/1Y0y4gKBXV3n7Shh3znhZPs6ajOz1g_g3/Semester 4/MAJOR_PROJECT/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 63, in main
    for input in args.inputs
  File "/content/drive/.shortcut-targets-by-id/1Y0y4gKBXV3n7Shh3znhZPs6ajOz1g_g3/Semester 4/MAJOR_PROJECT/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 63, in <genexpr>
    for input in args.inputs
FileNotFoundError: [Errno 2] No such file or directory: '/train.source'
Traceback (most recent call last):
  File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/content/drive/.shortcut-targets-by-id/1Y0y4gKBXV3n7Shh3znhZPs6ajOz1g_g3/Semester 4/MAJOR_PROJECT/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 129, in <module>
    main()
  File "/content/drive/.shortcut-targets-by-id/1Y0y4gKBXV3n7Shh3znhZPs6ajOz1g_g3/Semester 4/MAJOR_PROJECT/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 63, in main
    for input in args.inputs
  File "/content/drive/.shortcut-targets-by-id/1Y0y4gKBXV3n7Shh3znhZPs6ajOz1g_g3/Semester 4/MAJOR_PROJECT/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 63, in <genexpr>
    for input in args.inputs
FileNotFoundError: [Errno 2] No such file or directory: '/train.target'
Traceback (most recent call last):
  File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/content/drive/.shortcut-targets-by-id/1Y0y4gKBXV3n7Shh3znhZPs6ajOz1g_g3/Semester 4/MAJOR_PROJECT/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 129, in <module>
    main()
  File "/content/drive/.shortcut-targets-by-id/1Y0y4gKBXV3n7Shh3znhZPs6ajOz1g_g3/Semester 4/MAJOR_PROJECT/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 63, in main
    for input in args.inputs
  File "/content/drive/.shortcut-targets-by-id/1Y0y4gKBXV3n7Shh3znhZPs6ajOz1g_g3/Semester 4/MAJOR_PROJECT/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 63, in <genexpr>
    for input in args.inputs
FileNotFoundError: [Errno 2] No such file or directory: '/val.source'
Traceback (most recent call last):
  File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/content/drive/.shortcut-targets-by-id/1Y0y4gKBXV3n7Shh3znhZPs6ajOz1g_g3/Semester 4/MAJOR_PROJECT/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 129, in <module>
    main()
  File "/content/drive/.shortcut-targets-by-id/1Y0y4gKBXV3n7Shh3znhZPs6ajOz1g_g3/Semester 4/MAJOR_PROJECT/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 63, in main
    for input in args.inputs
  File "/content/drive/.shortcut-targets-by-id/1Y0y4gKBXV3n7Shh3znhZPs6ajOz1g_g3/Semester 4/MAJOR_PROJECT/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 63, in <genexpr>
    for input in args.inputs
FileNotFoundError: [Errno 2] No such file or directory: '/val.target'
usage: fairseq-preprocess [-h] [--no-progress-bar]
[--log-interval LOG_INTERVAL]
[--log-format {json,none,simple,tqdm}]
[--tensorboard-logdir TENSORBOARD_LOGDIR]
[--wandb-project WANDB_PROJECT] [--azureml-logging]
[--seed SEED] [--cpu] [--tpu] [--bf16]
[--memory-efficient-bf16] [--fp16]
[--memory-efficient-fp16] [--fp16-no-flatten-grads]
[--fp16-init-scale FP16_INIT_SCALE]
[--fp16-scale-window FP16_SCALE_WINDOW]
[--fp16-scale-tolerance FP16_SCALE_TOLERANCE]
[--min-loss-scale MIN_LOSS_SCALE]
[--threshold-loss-scale THRESHOLD_LOSS_SCALE]
[--user-dir USER_DIR]
[--empty-cache-freq EMPTY_CACHE_FREQ]
[--all-gather-list-size ALL_GATHER_LIST_SIZE]
[--model-parallel-size MODEL_PARALLEL_SIZE]
[--quantization-config-path QUANTIZATION_CONFIG_PATH]
[--profile] [--reset-logging] [--suppress-crashes]
[--use-plasma-view] [--plasma-path PLASMA_PATH]
[--criterion {sentence_ranking,wav2vec,model,label_smoothed_cross_entropy,latency_augmented_label_smoothed_cross_entropy,legacy_masked_lm_loss,nat_loss,ctc,label_smoothed_cross_entropy_with_alignment,cross_entropy,sentence_prediction,composite_loss,masked_lm,adaptive_loss,vocab_parallel_cross_entropy}]
[--tokenizer {space,moses,nltk}]
[--bpe {bytes,gpt2,byte_bpe,sentencepiece,bert,characters,hf_byte_bpe,fastbpe,subword_nmt}]
[--simul-type {hard_aligned,infinite_lookback,waitk,waitk_fixed_pre_decision,hard_aligned_fixed_pre_decision,infinite_lookback_fixed_pre_decision}]
[--optimizer {nag,adafactor,adam,composite,adagrad,adamax,sgd,cpu_adam,lamb,adadelta}]
[--lr-scheduler {cosine,pass_through,polynomial_decay,reduce_lr_on_plateau,inverse_sqrt,tri_stage,triangular,manual,fixed}]
[--scoring {sacrebleu,bleu,wer,chrf}] [--task TASK]
[-s SRC] [-t TARGET] [--trainpref FP]
[--validpref FP] [--testpref FP] [--align-suffix FP]
[--destdir DIR] [--thresholdtgt N]
[--thresholdsrc N] [--tgtdict FP] [--srcdict FP]
[--nwordstgt N] [--nwordssrc N] [--alignfile ALIGN]
[--dataset-impl FORMAT] [--joined-dictionary]
[--only-source] [--padding-factor N] [--workers N]
fairseq-preprocess: error: argument --destdir: expected one argument
usage: build_ctrl_datasets.py [-h] [--outdir OUTDIR] datadir
build_ctrl_datasets.py: error: the following arguments are required: datadir
Times to run script: 2.3285547892252606e-06 min
Making bin file for URLs listed in /ctrl/mapping/mapping_test.txt...
Traceback (most recent call last):
  File "make_datafiles.py", line 117, in <module>
    write_to_bin(all_test_urls, args.stories_dir, os.path.join(args.finished_files_dir, "test"))
  File "make_datafiles.py", line 76, in write_to_bin
    url_list = read_text_file(url_file)
  File "make_datafiles.py", line 17, in read_text_file
    with open(text_file, "r", encoding="utf-8") as f:
FileNotFoundError: [Errno 2] No such file or directory: '/ctrl/mapping/mapping_test.txt'
Traceback (most recent call last):
  File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/content/drive/.shortcut-targets-by-id/1Y0y4gKBXV3n7Shh3znhZPs6ajOz1g_g3/Semester 4/MAJOR_PROJECT/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 129, in <module>
    main()
  File "/content/drive/.shortcut-targets-by-id/1Y0y4gKBXV3n7Shh3znhZPs6ajOz1g_g3/Semester 4/MAJOR_PROJECT/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 63, in main
    for input in args.inputs
  File "/content/drive/.shortcut-targets-by-id/1Y0y4gKBXV3n7Shh3znhZPs6ajOz1g_g3/Semester 4/MAJOR_PROJECT/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 63, in <genexpr>
    for input in args.inputs
FileNotFoundError: [Errno 2] No such file or directory: '/ctrl/train.source'
Traceback (most recent call last):
  File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/content/drive/.shortcut-targets-by-id/1Y0y4gKBXV3n7Shh3znhZPs6ajOz1g_g3/Semester 4/MAJOR_PROJECT/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 129, in <module>
    main()
  File "/content/drive/.shortcut-targets-by-id/1Y0y4gKBXV3n7Shh3znhZPs6ajOz1g_g3/Semester 4/MAJOR_PROJECT/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 63, in main
    for input in args.inputs
  File "/content/drive/.shortcut-targets-by-id/1Y0y4gKBXV3n7Shh3znhZPs6ajOz1g_g3/Semester 4/MAJOR_PROJECT/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 63, in <genexpr>
    for input in args.inputs
FileNotFoundError: [Errno 2] No such file or directory: '/ctrl/train.target'
Traceback (most recent call last):
  File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/content/drive/.shortcut-targets-by-id/1Y0y4gKBXV3n7Shh3znhZPs6ajOz1g_g3/Semester 4/MAJOR_PROJECT/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 129, in <module>
    main()
  File "/content/drive/.shortcut-targets-by-id/1Y0y4gKBXV3n7Shh3znhZPs6ajOz1g_g3/Semester 4/MAJOR_PROJECT/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 63, in main
    for input in args.inputs
  File "/content/drive/.shortcut-targets-by-id/1Y0y4gKBXV3n7Shh3znhZPs6ajOz1g_g3/Semester 4/MAJOR_PROJECT/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 63, in <genexpr>
    for input in args.inputs
FileNotFoundError: [Errno 2] No such file or directory: '/ctrl/val.source'
Traceback (most recent call last):
  File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/content/drive/.shortcut-targets-by-id/1Y0y4gKBXV3n7Shh3znhZPs6ajOz1g_g3/Semester 4/MAJOR_PROJECT/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 129, in <module>
    main()
  File "/content/drive/.shortcut-targets-by-id/1Y0y4gKBXV3n7Shh3znhZPs6ajOz1g_g3/Semester 4/MAJOR_PROJECT/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 63, in main
    for input in args.inputs
  File "/content/drive/.shortcut-targets-by-id/1Y0y4gKBXV3n7Shh3znhZPs6ajOz1g_g3/Semester 4/MAJOR_PROJECT/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 63, in <genexpr>
    for input in args.inputs
FileNotFoundError: [Errno 2] No such file or directory: '/ctrl/val.target'
2021-03-28 10:36:44 | INFO | fairseq_cli.preprocess | Namespace(align_suffix=None, alignfile=None, all_gather_list_size=16384, azureml_logging=False, bf16=False, bpe=None, cpu=False, criterion='cross_entropy', dataset_impl='mmap', destdir='/ctrl-bin/', empty_cache_freq=0, fp16=False, fp16_init_scale=128, fp16_no_flatten_grads=False, fp16_scale_tolerance=0.0, fp16_scale_window=None, joined_dictionary=False, log_format=None, log_interval=100, lr_scheduler='fixed', memory_efficient_bf16=False, memory_efficient_fp16=False, min_loss_scale=0.0001, model_parallel_size=1, no_progress_bar=False, nwordssrc=-1, nwordstgt=-1, only_source=False, optimizer=None, padding_factor=8, plasma_path='/tmp/plasma', profile=False, quantization_config_path=None, reset_logging=False, scoring='bleu', seed=1, simul_type=None, source_lang='source', srcdict='dict.txt', suppress_crashes=False, target_lang='target', task='translation', tensorboard_logdir=None, testpref=None, tgtdict='dict.txt', threshold_loss_scale=None, thresholdsrc=0, thresholdtgt=0, tokenizer=None, tpu=False, trainpref='/ctrl/train.bpe', use_plasma_view=False, user_dir=None, validpref='/ctrl/val.bpe', wandb_project=None, workers=60)
2021-03-28 10:36:45 | INFO | fairseq_cli.preprocess | [source] Dictionary: 50264 types
Traceback (most recent call last):
  File "/usr/local/bin/fairseq-preprocess", line 33, in <module>
    sys.exit(load_entry_point('fairseq', 'console_scripts', 'fairseq-preprocess')())
  File "/content/fairseq/fairseq_cli/preprocess.py", line 394, in cli_main
    main(args)
  File "/content/fairseq/fairseq_cli/preprocess.py", line 284, in main
    make_all(args.source_lang, src_dict)
  File "/content/fairseq/fairseq_cli/preprocess.py", line 252, in make_all
    make_dataset(vocab, args.trainpref, "train", lang, num_workers=args.workers)
  File "/content/fairseq/fairseq_cli/preprocess.py", line 248, in make_dataset
    make_binary_dataset(vocab, input_prefix, output_prefix, lang, num_workers)
  File "/content/fairseq/fairseq_cli/preprocess.py", line 133, in make_binary_dataset
    offsets = Binarizer.find_offsets(input_file, num_workers)
  File "/content/fairseq/fairseq/binarizer.py", line 106, in find_offsets
    with open(PathManager.get_local_path(filename), "r", encoding="utf-8") as f:
FileNotFoundError: [Errno 2] No such file or directory: '/ctrl/train.bpe.source'
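
One hedged observation rather than a confirmed fix: every missing path above ('/train.source', '/ctrl/mapping/mapping_test.txt', '/ctrl/train.bpe.source', ...) has an empty directory prefix, which suggests $TASK expanded to nothing inside make_datafiles.sh. The Drive paths in the traceback indicate Colab, where each ! line runs in its own shell, so !cd and !export do not carry over to the next line; setting the same state from Python does persist for subsequent ! commands:

import os

# Sketch under the Colab-subshell hypothesis: os.chdir and os.environ persist,
# unlike `!cd SciTLDR-Data` and `!export TASK=SciTLDR-A` on separate `!` lines.
os.chdir("SciTLDR-Data")
os.environ["TASK"] = "SciTLDR-A"
# then run: !chmod +x make_datafiles.sh && ./make_datafiles.sh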

Error Running make_datafiles.sh for AIC

Authors, great work!! My organisation sees huge value in investing further in your research.

However, I am stalled by the error below; can you help?

  • Is there a Docker image you can share with a pre-built environment?
  • And possibly an API version of the code you are using, for a demo?

It seems some library versions conflict and break the code, specifically fairseq?

Running
cd SciTLDR-Data
export TASK=SciTLDR-AIC
chmod +x make_datafiles.sh
./make_datafiles.sh # BPE preprocess

Errors
    self._kwargs)
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 103, in worker
    initializer(*initargs)
  File "/jupyter_mount/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 96, in initializer
    bpe = get_encoder(self.args.encoder_json, self.args.vocab_bpe)
  File "/usr/local/lib/python3.6/dist-packages/fairseq/data/encoders/gpt2_bpe_utils.py", line 132, in get_encoder
    with open(encoder_json_path, "r") as f:
FileNotFoundError: [Errno 2] No such file or directory: 'encoder.json'
Process ForkPoolWorker-73260:
Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 103, in worker
    initializer(*initargs)
  File "/jupyter_mount/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 96, in initializer
    bpe = get_encoder(self.args.encoder_json, self.args.vocab_bpe)
  File "/usr/local/lib/python3.6/dist-packages/fairseq/data/encoders/gpt2_bpe_utils.py", line 132, in get_encoder
    with open(encoder_json_path, "r") as f:
FileNotFoundError: [Errno 2] No such file or directory: 'encoder.json'
Process ForkPoolWorker-73261:
Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 103, in worker
    initializer(*initargs)
  File "/jupyter_mount/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 96, in initializer
    bpe = get_encoder(self.args.encoder_json, self.args.vocab_bpe)
  File "/usr/local/lib/python3.6/dist-packages/fairseq/data/encoders/gpt2_bpe_utils.py", line 132, in get_encoder
    with open(encoder_json_path, "r") as f:
FileNotFoundError: [Errno 2] No such file or directory: 'encoder.json'
Process ForkPoolWorker-73262:
Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 103, in worker
    initializer(*initargs)
  File "/jupyter_mount/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 96, in initializer
    bpe = get_encoder(self.args.encoder_json, self.args.vocab_bpe)
  File "/usr/local/lib/python3.6/dist-packages/fairseq/data/encoders/gpt2_bpe_utils.py", line 132, in get_encoder
    with open(encoder_json_path, "r") as f:
FileNotFoundError: [Errno 2] No such file or directory: 'encoder.json'
Process ForkPoolWorker-73263:
Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 103, in worker
    initializer(*initargs)
  File "/jupyter_mount/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 96, in initializer
    bpe = get_encoder(self.args.encoder_json, self.args.vocab_bpe)
  File "/usr/local/lib/python3.6/dist-packages/fairseq/data/encoders/gpt2_bpe_utils.py", line 132, in get_encoder
    with open(encoder_json_path, "r") as f:
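
A hedged workaround sketch, not an official fix: the worker processes die because encoder.json is not in the working directory, i.e. the wget step of make_datafiles.sh either did not run or ran somewhere else. Fetching the three GPT-2 BPE files explicitly first (URLs copied from the log in the previous issue) may unblock the script:

import urllib.request

# Download the GPT-2 BPE assets into the current directory, where the
# failing open('encoder.json') call is looking for them.
BASE = "https://dl.fbaipublicfiles.com/fairseq/gpt2_bpe/"
for fname in ("encoder.json", "vocab.bpe", "dict.txt"):
    urllib.request.urlretrieve(BASE + fname, fname)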

Packages Not Found When Creating conda Environment From requirements.txt

When trying to create a conda environment using the requirements.txt file in the root directory, on both Windows and Ubuntu, I get the following error:

PackagesNotFoundError: The following packages are not available from current channels:

  • pprint
  • torch==1.4.0
  • fairseq
  • pysbd==0.1.4

Current channels:

To search for alternate channels that may provide the conda package you're
looking for, navigate to

    https://anaconda.org

I tried creating a new environment and pip installing each package in requirements.txt one at a time, but I got the same kind of error.
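
A hedged explanation of why this fails: pprint ships with the Python standard library, and torch, fairseq and pysbd are distributed on PyPI rather than conda's default channels, so conda cannot resolve a requirements.txt that was written for pip. One common workaround (version pins copied from the error above) is to create a bare environment and install the packages with pip inside it:

import subprocess
import sys

# Install the pip-only pins into whatever environment this interpreter
# belongs to; 'pprint' is skipped because it is standard library.
for pkg in ("torch==1.4.0", "fairseq", "pysbd==0.1.4"):
    subprocess.check_call([sys.executable, "-m", "pip", "install", pkg])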

Error while using generate.py

Hello @armancohan @kyleclo @isabelcachola,
I am trying to generate a summary with this command given in the repo:
!python generate.py model/ data_input/ out/ --checkpoint_file checkpoint_best.pt --beam 2 --lenpen 0.4 --test_fname test.hypo
Here, model is the folder containing the checkpoint given for summarization (bart.tldr-aic), data_input contains a test.source file holding the source content of the test.jsonl file from scitldr/SciTLDR-Data/SciTLDR-AIC/, and out is an empty folder to store the output.

I am getting this error.

Traceback (most recent call last):
  File "generate.py", line 100, in <module>
    generate_TLDRs(**vars(args))
  File "generate.py", line 17, in generate_TLDRs
    task='translation'
  File "/content/gdrive/My Drive/Internship_SPI_Smita/Scientific_paper_summarization/scitldr/fairseq/fairseq/models/bart/model.py", line 112, in from_pretrained
    **kwargs,
  File "/content/gdrive/My Drive/Internship_SPI_Smita/Scientific_paper_summarization/scitldr/fairseq/fairseq/hub_utils.py", line 73, in from_pretrained
    arg_overrides=kwargs,
  File "/content/gdrive/My Drive/Internship_SPI_Smita/Scientific_paper_summarization/scitldr/fairseq/fairseq/checkpoint_utils.py", line 243, in load_model_ensemble_and_task
    task = tasks.setup_task(args)
  File "/content/gdrive/My Drive/Internship_SPI_Smita/Scientific_paper_summarization/scitldr/fairseq/fairseq/tasks/__init__.py", line 27, in setup_task
    return TASK_REGISTRY[task_cfg.task].setup_task(task_cfg, **kwargs)
  File "/content/gdrive/My Drive/Internship_SPI_Smita/Scientific_paper_summarization/scitldr/fairseq/fairseq/tasks/translation.py", line 226, in setup_task
    paths = utils.split_paths(args.data)
  File "/content/gdrive/My Drive/Internship_SPI_Smita/Scientific_paper_summarization/scitldr/fairseq/fairseq/utils.py", line 59, in split_paths
    if "://" not in paths
TypeError: argument of type 'NoneType' is not iterable

Please help me figure out how to solve this.
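
A hedged reading of the traceback: setup_task fails because args.data ends up as None, i.e. fairseq's translation task never learns where its dictionaries live. The data directory passed to generate.py apparently has to be the preprocessed one carrying the fairseq dictionaries (dict.source.txt / dict.target.txt), not a folder holding only a raw test.source file. A sketch of the underlying loading call, with illustrative paths:

from fairseq.models.bart import BARTModel

# data_name_or_path should point at the binarized directory, e.g. the
# ctrl-bin folder produced by make_datafiles.sh (path assumed here).
bart = BARTModel.from_pretrained(
    "model/",
    checkpoint_file="checkpoint_best.pt",
    data_name_or_path="SciTLDR-Data/SciTLDR-AIC/ctrl-bin",
    task="translation",
)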

Error running python scripts/cal-rouge.py

Hello authors, awesome work!! I am trying to replicate your work for my project. However, I am stuck at the step below; please see if you can help.

Platform/Environment
Google Colab

Running
!sudo python scripts/cal-rouge.py '/pathto/scitldr/test.hypo' 'SciTLDR-Data/SciTLDR-A/test.jsonl' --workers 1

Error
0% 0/618 [00:00<?, ?it/s]Preparing documents... 0 line(s) ignored
Running ROUGE...
Traceback (most recent call last):
  File "scripts/cal-rouge.py", line 160, in <module>
    main()
  File "scripts/cal-rouge.py", line 152, in main
    all_dfs = process(args.gold, args.candidate, args.candidate.split('/')[-1], args)
  File "scripts/cal-rouge.py", line 105, in process
    results = [get_rouge(d) for d in tqdm(data)]
  File "scripts/cal-rouge.py", line 105, in <listcomp>
    results = [get_rouge(d) for d in tqdm(data)]
  File "scripts/cal-rouge.py", line 48, in get_rouge
    return _get_rouge(args['pred'], args['data'])
  File "scripts/cal-rouge.py", line 73, in _get_rouge
    files2rouge.run(cand_file, gold_file, ignore_empty=True, saveto=log_file)
  File "/usr/local/lib/python3.7/dist-packages/files2rouge-2.1.0-py3.7.egg/files2rouge/files2rouge.py", line 73, in run
    output = r.convert_and_evaluate(rouge_args=rouge_args_str)
  File "/usr/local/lib/python3.7/dist-packages/pyrouge/Rouge155.py", line 368, in convert_and_evaluate
    rouge_output = self.evaluate(system_id, rouge_args)
  File "/usr/local/lib/python3.7/dist-packages/pyrouge/Rouge155.py", line 343, in evaluate
    rouge_output = check_output(command, env=env).decode("UTF-8")
  File "/usr/lib/python3.7/subprocess.py", line 411, in check_output
    **kwargs).stdout
  File "/usr/lib/python3.7/subprocess.py", line 488, in run
    with Popen(*popenargs, **kwargs) as process:
  File "/usr/lib/python3.7/subprocess.py", line 800, in __init__
    restore_signals, start_new_session)
  File "/usr/lib/python3.7/subprocess.py", line 1551, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
PermissionError: [Errno 13] Permission denied: 'setup_rogue/ROUGE-1.5.5.pl'
0% 0/618 [00:00<?, ?it/s]

P.S.: While executing the prerequisite command !python setup_rouge.py, I named the folder setup_rogue, as prompted, in the present working directory, and ROUGE-1.5.5.pl was placed inside it. That script is now giving the permission error.
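
A hedged workaround, assuming the path in the error message is accurate: a PermissionError when pyrouge spawns ROUGE-1.5.5.pl usually means the Perl script is missing its execute bit. Restoring it should let the subprocess start:

import os
import stat

# Path copied from the error message above.
script = "setup_rogue/ROUGE-1.5.5.pl"
os.chmod(script, os.stat(script).st_mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)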

Unpickling error when running generation script with pretrained weights

I renamed the bart.tldr-ao.pt model to checkpoint_best.pt and tried running the generation script as python scripts/generate.py model/ SciTLDR-Data/SciTLDR-A/ctrl ./ --beam 2 --lenpen 0.4 --test_fname test.hypo, as shown in the GitHub instructions, but I hit the following error:

Traceback (most recent call last):
  File "scripts/generate.py", line 99, in <module>
    generate_TLDRs(**vars(args))
  File "scripts/generate.py", line 17, in generate_TLDRs
    task='translation'
  File "/research/home/maxlitster/scitldr/repo/fairseq/models/bart/model.py", line 136, in from_pretrained
    **kwargs,
  File "/research/home/maxlitster/scitldr/repo/fairseq/hub_utils.py", line 75, in from_pretrained
    arg_overrides=kwargs,
  File "/research/home/maxlitster/scitldr/repo/fairseq/checkpoint_utils.py", line 339, in load_model_ensemble_and_task
    state = load_checkpoint_to_cpu(filename, arg_overrides)
  File "/research/home/maxlitster/scitldr/repo/fairseq/checkpoint_utils.py", line 263, in load_checkpoint_to_cpu
    state = torch.load(f, map_location=torch.device("cpu"))
  File "/home/maxlitster/miniconda3/envs/tldr/lib/python3.7/site-packages/torch/serialization.py", line 529, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/home/maxlitster/miniconda3/envs/tldr/lib/python3.7/site-packages/torch/serialization.py", line 692, in _legacy_load
    magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: invalid load key, '\x0a'.

I'm curious if anyone else has any advice as to how to generate text with the pretrained weights. Thanks!
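
This looks like the same failure mode as the fairseq-compatibility issue above: a load key of '\x0a' means torch hit a newline where pickle data should start, so the renamed checkpoint_best.pt is likely a corrupt or placeholder download rather than the actual weights. A quick hedged check, with the path taken from the command in this issue:

import os

ckpt = "model/checkpoint_best.pt"  # renamed from bart.tldr-ao.pt per the issue
print(os.path.getsize(ckpt))  # BART-scale weights are on the order of gigabytes
with open(ckpt, "rb") as f:
    print(f.read(32))  # expect pickle magic b'\x80...' or zip magic b'PK'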

How to train a TLDR generation model from scratch?

Dear authors, thank you for your excellent work! Is there code or a script file for training the TLDR generation model from scratch? It seems the project only includes an evaluation script, with no training file. I hope you can respond.
