scitldr's People

Contributors

isabelcachola, kyleclo, rreas

scitldr's Issues

EMNLP Camera Ready

Models to upload:

  • bart.tldr-ao
  • bart.tldr-aic
  • bart-xsum.tldr-aic
  • bart-xsum.tldr-ao
  • CATTS.tldr-ao
  • CATTS.tldr-aic
  • CATTS-XSUM.tldr-ao
  • CATTS-XSUM.tldr-aic

Other TODOs:

  • Update data
  • Update evaluation script
  • Update readme
  • Add EMNLP scripts
  • Add EMNLP citation
  • Update paper with reviewer promises
  • Write data/model documentation
  • Update preprint

Incompatible with the latest version of `fairseq`?

Here is a Colab notebook where I tried to do inference using this model.

loading archive file /content/models
loading archive file SciTLDR-Data/SciTLDR-AIC/ctrl-bin
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/fairseq/checkpoint_utils.py", line 151, in load_checkpoint_to_cpu
    from fairseq.fb_pathmgr import fb_pathmgr
ModuleNotFoundError: No module named 'fairseq.fb_pathmgr'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "scripts/generate.py", line 99, in <module>
    generate_TLDRs(**vars(args))
  File "scripts/generate.py", line 17, in generate_TLDRs
    task='translation'
  File "/usr/local/lib/python3.7/dist-packages/fairseq/models/bart/model.py", line 104, in from_pretrained
    **kwargs,
  File "/usr/local/lib/python3.7/dist-packages/fairseq/hub_utils.py", line 68, in from_pretrained
    arg_overrides=kwargs,
  File "/usr/local/lib/python3.7/dist-packages/fairseq/checkpoint_utils.py", line 190, in load_model_ensemble_and_task
    state = load_checkpoint_to_cpu(filename, arg_overrides)
  File "/usr/local/lib/python3.7/dist-packages/fairseq/checkpoint_utils.py", line 160, in load_checkpoint_to_cpu
    path, map_location=lambda s, l: default_restore_location(s, "cpu")
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 529, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 692, in _legacy_load
    magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: invalid load key, '\x0a'.
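
For what it's worth, the fb_pathmgr ModuleNotFoundError appears to be caught inside fairseq (hence "During handling of the above exception") and only signals a version mismatch; the fatal error is the UnpicklingError. A load key of '\x0a' (a newline) means torch is reading plain text where pickle data should start, so the checkpoint file itself is suspect, e.g. a truncated download or an error page saved in its place. A minimal diagnostic sketch, using a hypothetical checkpoint name under the /content/models directory from the log:

import os

ckpt = "/content/models/checkpoint_best.pt"  # hypothetical name; substitute the actual .pt file
print(os.path.getsize(ckpt))  # a real BART-scale checkpoint is large; an error page is tiny
with open(ckpt, "rb") as f:
    print(f.read(32))  # expect pickle magic b'\x80...' or zip magic b'PK'; leading text means a bad download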

Error Running make_datafiles.sh

Hello authors, awesome work!! I am trying to replicate your work for my project. However, I am stuck at the step below; please see if you can help.

Running
cd SciTLDR-Data
export TASK=SciTLDR-A
chmod +x make_datafiles.sh
./make_datafiles.sh # BPE preprocess

Error
usage: to_stories.py [-h] [--mapping_dir MAPPING_DIR] [--out_dir OUT_DIR]
                     [--num_cores NUM_CORES]
                     data_dir
to_stories.py: error: the following arguments are required: data_dir
usage: make_datafiles.py [-h] [--stories_dir STORIES_DIR] [--urldir URLDIR]
                         [--finished_files_dir FINISHED_FILES_DIR]
make_datafiles.py: error: argument --finished_files_dir: expected one argument
--2021-03-28 10:36:22-- https://dl.fbaipublicfiles.com/fairseq/gpt2_bpe/encoder.json
Resolving dl.fbaipublicfiles.com (dl.fbaipublicfiles.com)... 104.22.75.142, 104.22.74.142, 172.67.9.4, ...
Connecting to dl.fbaipublicfiles.com (dl.fbaipublicfiles.com)|104.22.75.142|:443... connected.
HTTP request sent, awaiting response... 304 Not Modified
File ‘encoder.json’ not modified on server. Omitting download.

--2021-03-28 10:36:23-- https://dl.fbaipublicfiles.com/fairseq/gpt2_bpe/vocab.bpe
Resolving dl.fbaipublicfiles.com (dl.fbaipublicfiles.com)... 104.22.75.142, 104.22.74.142, 172.67.9.4, ...
Connecting to dl.fbaipublicfiles.com (dl.fbaipublicfiles.com)|104.22.75.142|:443... connected.
HTTP request sent, awaiting response... 304 Not Modified
File ‘vocab.bpe’ not modified on server. Omitting download.

--2021-03-28 10:36:23-- https://dl.fbaipublicfiles.com/fairseq/gpt2_bpe/dict.txt
Resolving dl.fbaipublicfiles.com (dl.fbaipublicfiles.com)... 104.22.75.142, 104.22.74.142, 172.67.9.4, ...
Connecting to dl.fbaipublicfiles.com (dl.fbaipublicfiles.com)|104.22.75.142|:443... connected.
HTTP request sent, awaiting response... 304 Not Modified
File ‘dict.txt’ not modified on server. Omitting download.

Traceback (most recent call last):
  File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/content/drive/.shortcut-targets-by-id/1Y0y4gKBXV3n7Shh3znhZPs6ajOz1g_g3/Semester 4/MAJOR_PROJECT/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 129, in <module>
    main()
  File "/content/drive/.shortcut-targets-by-id/1Y0y4gKBXV3n7Shh3znhZPs6ajOz1g_g3/Semester 4/MAJOR_PROJECT/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 63, in main
    for input in args.inputs
  File "/content/drive/.shortcut-targets-by-id/1Y0y4gKBXV3n7Shh3znhZPs6ajOz1g_g3/Semester 4/MAJOR_PROJECT/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 63, in <genexpr>
    for input in args.inputs
FileNotFoundError: [Errno 2] No such file or directory: '/train.source'
Traceback (most recent call last):
  File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/content/drive/.shortcut-targets-by-id/1Y0y4gKBXV3n7Shh3znhZPs6ajOz1g_g3/Semester 4/MAJOR_PROJECT/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 129, in <module>
    main()
  File "/content/drive/.shortcut-targets-by-id/1Y0y4gKBXV3n7Shh3znhZPs6ajOz1g_g3/Semester 4/MAJOR_PROJECT/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 63, in main
    for input in args.inputs
  File "/content/drive/.shortcut-targets-by-id/1Y0y4gKBXV3n7Shh3znhZPs6ajOz1g_g3/Semester 4/MAJOR_PROJECT/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 63, in <genexpr>
    for input in args.inputs
FileNotFoundError: [Errno 2] No such file or directory: '/train.target'
Traceback (most recent call last):
  File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/content/drive/.shortcut-targets-by-id/1Y0y4gKBXV3n7Shh3znhZPs6ajOz1g_g3/Semester 4/MAJOR_PROJECT/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 129, in <module>
    main()
  File "/content/drive/.shortcut-targets-by-id/1Y0y4gKBXV3n7Shh3znhZPs6ajOz1g_g3/Semester 4/MAJOR_PROJECT/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 63, in main
    for input in args.inputs
  File "/content/drive/.shortcut-targets-by-id/1Y0y4gKBXV3n7Shh3znhZPs6ajOz1g_g3/Semester 4/MAJOR_PROJECT/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 63, in <genexpr>
    for input in args.inputs
FileNotFoundError: [Errno 2] No such file or directory: '/val.source'
Traceback (most recent call last):
  File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/content/drive/.shortcut-targets-by-id/1Y0y4gKBXV3n7Shh3znhZPs6ajOz1g_g3/Semester 4/MAJOR_PROJECT/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 129, in <module>
    main()
  File "/content/drive/.shortcut-targets-by-id/1Y0y4gKBXV3n7Shh3znhZPs6ajOz1g_g3/Semester 4/MAJOR_PROJECT/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 63, in main
    for input in args.inputs
  File "/content/drive/.shortcut-targets-by-id/1Y0y4gKBXV3n7Shh3znhZPs6ajOz1g_g3/Semester 4/MAJOR_PROJECT/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 63, in <genexpr>
    for input in args.inputs
FileNotFoundError: [Errno 2] No such file or directory: '/val.target'
usage: fairseq-preprocess [-h] [--no-progress-bar]
[--log-interval LOG_INTERVAL]
[--log-format {json,none,simple,tqdm}]
[--tensorboard-logdir TENSORBOARD_LOGDIR]
[--wandb-project WANDB_PROJECT] [--azureml-logging]
[--seed SEED] [--cpu] [--tpu] [--bf16]
[--memory-efficient-bf16] [--fp16]
[--memory-efficient-fp16] [--fp16-no-flatten-grads]
[--fp16-init-scale FP16_INIT_SCALE]
[--fp16-scale-window FP16_SCALE_WINDOW]
[--fp16-scale-tolerance FP16_SCALE_TOLERANCE]
[--min-loss-scale MIN_LOSS_SCALE]
[--threshold-loss-scale THRESHOLD_LOSS_SCALE]
[--user-dir USER_DIR]
[--empty-cache-freq EMPTY_CACHE_FREQ]
[--all-gather-list-size ALL_GATHER_LIST_SIZE]
[--model-parallel-size MODEL_PARALLEL_SIZE]
[--quantization-config-path QUANTIZATION_CONFIG_PATH]
[--profile] [--reset-logging] [--suppress-crashes]
[--use-plasma-view] [--plasma-path PLASMA_PATH]
[--criterion {sentence_ranking,wav2vec,model,label_smoothed_cross_entropy,latency_augmented_label_smoothed_cross_entropy,legacy_masked_lm_loss,nat_loss,ctc,label_smoothed_cross_entropy_with_alignment,cross_entropy,sentence_prediction,composite_loss,masked_lm,adaptive_loss,vocab_parallel_cross_entropy}]
[--tokenizer {space,moses,nltk}]
[--bpe {bytes,gpt2,byte_bpe,sentencepiece,bert,characters,hf_byte_bpe,fastbpe,subword_nmt}]
[--simul-type {hard_aligned,infinite_lookback,waitk,waitk_fixed_pre_decision,hard_aligned_fixed_pre_decision,infinite_lookback_fixed_pre_decision}]
[--optimizer {nag,adafactor,adam,composite,adagrad,adamax,sgd,cpu_adam,lamb,adadelta}]
[--lr-scheduler {cosine,pass_through,polynomial_decay,reduce_lr_on_plateau,inverse_sqrt,tri_stage,triangular,manual,fixed}]
[--scoring {sacrebleu,bleu,wer,chrf}] [--task TASK]
[-s SRC] [-t TARGET] [--trainpref FP]
[--validpref FP] [--testpref FP] [--align-suffix FP]
[--destdir DIR] [--thresholdtgt N]
[--thresholdsrc N] [--tgtdict FP] [--srcdict FP]
[--nwordstgt N] [--nwordssrc N] [--alignfile ALIGN]
[--dataset-impl FORMAT] [--joined-dictionary]
[--only-source] [--padding-factor N] [--workers N]
fairseq-preprocess: error: argument --destdir: expected one argument
usage: build_ctrl_datasets.py [-h] [--outdir OUTDIR] datadir
build_ctrl_datasets.py: error: the following arguments are required: datadir
Times to run script: 2.3285547892252606e-06 min
Making bin file for URLs listed in /ctrl/mapping/mapping_test.txt...
Traceback (most recent call last):
  File "make_datafiles.py", line 117, in <module>
    write_to_bin(all_test_urls, args.stories_dir, os.path.join(args.finished_files_dir, "test"))
  File "make_datafiles.py", line 76, in write_to_bin
    url_list = read_text_file(url_file)
  File "make_datafiles.py", line 17, in read_text_file
    with open(text_file, "r", encoding="utf-8") as f:
FileNotFoundError: [Errno 2] No such file or directory: '/ctrl/mapping/mapping_test.txt'
Traceback (most recent call last):
  File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/content/drive/.shortcut-targets-by-id/1Y0y4gKBXV3n7Shh3znhZPs6ajOz1g_g3/Semester 4/MAJOR_PROJECT/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 129, in <module>
    main()
  File "/content/drive/.shortcut-targets-by-id/1Y0y4gKBXV3n7Shh3znhZPs6ajOz1g_g3/Semester 4/MAJOR_PROJECT/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 63, in main
    for input in args.inputs
  File "/content/drive/.shortcut-targets-by-id/1Y0y4gKBXV3n7Shh3znhZPs6ajOz1g_g3/Semester 4/MAJOR_PROJECT/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 63, in <genexpr>
    for input in args.inputs
FileNotFoundError: [Errno 2] No such file or directory: '/ctrl/train.source'
Traceback (most recent call last):
  File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/content/drive/.shortcut-targets-by-id/1Y0y4gKBXV3n7Shh3znhZPs6ajOz1g_g3/Semester 4/MAJOR_PROJECT/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 129, in <module>
    main()
  File "/content/drive/.shortcut-targets-by-id/1Y0y4gKBXV3n7Shh3znhZPs6ajOz1g_g3/Semester 4/MAJOR_PROJECT/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 63, in main
    for input in args.inputs
  File "/content/drive/.shortcut-targets-by-id/1Y0y4gKBXV3n7Shh3znhZPs6ajOz1g_g3/Semester 4/MAJOR_PROJECT/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 63, in <genexpr>
    for input in args.inputs
FileNotFoundError: [Errno 2] No such file or directory: '/ctrl/train.target'
Traceback (most recent call last):
  File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/content/drive/.shortcut-targets-by-id/1Y0y4gKBXV3n7Shh3znhZPs6ajOz1g_g3/Semester 4/MAJOR_PROJECT/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 129, in <module>
    main()
  File "/content/drive/.shortcut-targets-by-id/1Y0y4gKBXV3n7Shh3znhZPs6ajOz1g_g3/Semester 4/MAJOR_PROJECT/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 63, in main
    for input in args.inputs
  File "/content/drive/.shortcut-targets-by-id/1Y0y4gKBXV3n7Shh3znhZPs6ajOz1g_g3/Semester 4/MAJOR_PROJECT/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 63, in <genexpr>
    for input in args.inputs
FileNotFoundError: [Errno 2] No such file or directory: '/ctrl/val.source'
Traceback (most recent call last):
  File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/content/drive/.shortcut-targets-by-id/1Y0y4gKBXV3n7Shh3znhZPs6ajOz1g_g3/Semester 4/MAJOR_PROJECT/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 129, in <module>
    main()
  File "/content/drive/.shortcut-targets-by-id/1Y0y4gKBXV3n7Shh3znhZPs6ajOz1g_g3/Semester 4/MAJOR_PROJECT/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 63, in main
    for input in args.inputs
  File "/content/drive/.shortcut-targets-by-id/1Y0y4gKBXV3n7Shh3znhZPs6ajOz1g_g3/Semester 4/MAJOR_PROJECT/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 63, in <genexpr>
    for input in args.inputs
FileNotFoundError: [Errno 2] No such file or directory: '/ctrl/val.target'
2021-03-28 10:36:44 | INFO | fairseq_cli.preprocess | Namespace(align_suffix=None, alignfile=None, all_gather_list_size=16384, azureml_logging=False, bf16=False, bpe=None, cpu=False, criterion='cross_entropy', dataset_impl='mmap', destdir='/ctrl-bin/', empty_cache_freq=0, fp16=False, fp16_init_scale=128, fp16_no_flatten_grads=False, fp16_scale_tolerance=0.0, fp16_scale_window=None, joined_dictionary=False, log_format=None, log_interval=100, lr_scheduler='fixed', memory_efficient_bf16=False, memory_efficient_fp16=False, min_loss_scale=0.0001, model_parallel_size=1, no_progress_bar=False, nwordssrc=-1, nwordstgt=-1, only_source=False, optimizer=None, padding_factor=8, plasma_path='/tmp/plasma', profile=False, quantization_config_path=None, reset_logging=False, scoring='bleu', seed=1, simul_type=None, source_lang='source', srcdict='dict.txt', suppress_crashes=False, target_lang='target', task='translation', tensorboard_logdir=None, testpref=None, tgtdict='dict.txt', threshold_loss_scale=None, thresholdsrc=0, thresholdtgt=0, tokenizer=None, tpu=False, trainpref='/ctrl/train.bpe', use_plasma_view=False, user_dir=None, validpref='/ctrl/val.bpe', wandb_project=None, workers=60)
2021-03-28 10:36:45 | INFO | fairseq_cli.preprocess | [source] Dictionary: 50264 types
Traceback (most recent call last):
  File "/usr/local/bin/fairseq-preprocess", line 33, in <module>
    sys.exit(load_entry_point('fairseq', 'console_scripts', 'fairseq-preprocess')())
  File "/content/fairseq/fairseq_cli/preprocess.py", line 394, in cli_main
    main(args)
  File "/content/fairseq/fairseq_cli/preprocess.py", line 284, in main
    make_all(args.source_lang, src_dict)
  File "/content/fairseq/fairseq_cli/preprocess.py", line 252, in make_all
    make_dataset(vocab, args.trainpref, "train", lang, num_workers=args.workers)
  File "/content/fairseq/fairseq_cli/preprocess.py", line 248, in make_dataset
    make_binary_dataset(vocab, input_prefix, output_prefix, lang, num_workers)
  File "/content/fairseq/fairseq_cli/preprocess.py", line 133, in make_binary_dataset
    offsets = Binarizer.find_offsets(input_file, num_workers)
  File "/content/fairseq/fairseq/binarizer.py", line 106, in find_offsets
    with open(PathManager.get_local_path(filename), "r", encoding="utf-8") as f:
FileNotFoundError: [Errno 2] No such file or directory: '/ctrl/train.bpe.source'
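
One hedged observation rather than a confirmed fix: every missing path above ('/train.source', '/ctrl/mapping/mapping_test.txt', '/ctrl/train.bpe.source', ...) has an empty directory prefix, which suggests $TASK expanded to nothing inside make_datafiles.sh. The Drive paths in the traceback indicate Colab, where each ! line runs in its own shell, so !cd and !export do not carry over to the next line; setting the same state from Python does persist for subsequent ! commands:

import os

# Sketch under the Colab-subshell hypothesis: os.chdir and os.environ persist,
# unlike `!cd SciTLDR-Data` and `!export TASK=SciTLDR-A` on separate `!` lines.
os.chdir("SciTLDR-Data")
os.environ["TASK"] = "SciTLDR-A"
# then run: !chmod +x make_datafiles.sh && ./make_datafiles.sh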

Error Running make_datafiles.sh for AIC

Authors, great work!! My organisation sees huge value in investing further in your research.

However, I am stalled by the error below; can you help?

  • Is there a Docker image you can share with a pre-built environment?
  • And possibly an API version of the code you are using, for a demo?

It seems some library versions conflict and break the code, specifically fairseq?

Running
cd SciTLDR-Data
export TASK=SciTLDR-AIC
chmod +x make_datafiles.sh
./make_datafiles.sh # BPE preprocess

Errors
    self._kwargs)
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 103, in worker
    initializer(*initargs)
  File "/jupyter_mount/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 96, in initializer
    bpe = get_encoder(self.args.encoder_json, self.args.vocab_bpe)
  File "/usr/local/lib/python3.6/dist-packages/fairseq/data/encoders/gpt2_bpe_utils.py", line 132, in get_encoder
    with open(encoder_json_path, "r") as f:
FileNotFoundError: [Errno 2] No such file or directory: 'encoder.json'
Process ForkPoolWorker-73260:
Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 103, in worker
    initializer(*initargs)
  File "/jupyter_mount/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 96, in initializer
    bpe = get_encoder(self.args.encoder_json, self.args.vocab_bpe)
  File "/usr/local/lib/python3.6/dist-packages/fairseq/data/encoders/gpt2_bpe_utils.py", line 132, in get_encoder
    with open(encoder_json_path, "r") as f:
FileNotFoundError: [Errno 2] No such file or directory: 'encoder.json'
Process ForkPoolWorker-73261:
Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 103, in worker
    initializer(*initargs)
  File "/jupyter_mount/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 96, in initializer
    bpe = get_encoder(self.args.encoder_json, self.args.vocab_bpe)
  File "/usr/local/lib/python3.6/dist-packages/fairseq/data/encoders/gpt2_bpe_utils.py", line 132, in get_encoder
    with open(encoder_json_path, "r") as f:
FileNotFoundError: [Errno 2] No such file or directory: 'encoder.json'
Process ForkPoolWorker-73262:
Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 103, in worker
    initializer(*initargs)
  File "/jupyter_mount/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 96, in initializer
    bpe = get_encoder(self.args.encoder_json, self.args.vocab_bpe)
  File "/usr/local/lib/python3.6/dist-packages/fairseq/data/encoders/gpt2_bpe_utils.py", line 132, in get_encoder
    with open(encoder_json_path, "r") as f:
FileNotFoundError: [Errno 2] No such file or directory: 'encoder.json'
Process ForkPoolWorker-73263:
Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 103, in worker
    initializer(*initargs)
  File "/jupyter_mount/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 96, in initializer
    bpe = get_encoder(self.args.encoder_json, self.args.vocab_bpe)
  File "/usr/local/lib/python3.6/dist-packages/fairseq/data/encoders/gpt2_bpe_utils.py", line 132, in get_encoder
    with open(encoder_json_path, "r") as f:
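
A hedged workaround sketch, not an official fix: the worker processes die because encoder.json is not in the working directory, i.e. the wget step of make_datafiles.sh either did not run or ran somewhere else. Fetching the three GPT-2 BPE files explicitly first (URLs copied from the log in the previous issue) may unblock the script:

import urllib.request

# Download the GPT-2 BPE assets into the current directory, where the
# failing open('encoder.json') call is looking for them.
BASE = "https://dl.fbaipublicfiles.com/fairseq/gpt2_bpe/"
for fname in ("encoder.json", "vocab.bpe", "dict.txt"):
    urllib.request.urlretrieve(BASE + fname, fname)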

Packages Not Found When Creating conda Environment From requirements.txt

When trying to create a conda environment using the requirements.txt file in the root directory, on both Windows and Ubuntu, I get the following error:

PackagesNotFoundError: The following packages are not available from current channels:

  • pprint
  • torch==1.4.0
  • fairseq
  • pysbd==0.1.4

Current channels:

To search for alternate channels that may provide the conda package you're
looking for, navigate to

    https://anaconda.org

I tried creating a new environment and pip installing each package in requirements.txt one at a time, but I got the same kind of error.
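
A hedged explanation of why this fails: pprint ships with the Python standard library, and torch, fairseq and pysbd are distributed on PyPI rather than conda's default channels, so conda cannot resolve a requirements.txt that was written for pip. One common workaround (version pins copied from the error above) is to create a bare environment and install the packages with pip inside it:

import subprocess
import sys

# Install the pip-only pins into whatever environment this interpreter
# belongs to; 'pprint' is skipped because it is standard library.
for pkg in ("torch==1.4.0", "fairseq", "pysbd==0.1.4"):
    subprocess.check_call([sys.executable, "-m", "pip", "install", pkg])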

Error while using generate.py

Hello @armancohan @kyleclo @isabelcachola,
I am trying to generate a summary with this command given in the repo:
!python generate.py model/ data_input/ out/ --checkpoint_file checkpoint_best.pt --beam 2 --lenpen 0.4 --test_fname test.hypo
Here, model is the folder containing the checkpoint given for summarization (bart.tldr-aic), data_input contains a test.source file holding the source content of the test.jsonl file from scitldr/SciTLDR-Data/SciTLDR-AIC/, and out is an empty folder to store the output.

I am getting this error.

Traceback (most recent call last):
  File "generate.py", line 100, in <module>
    generate_TLDRs(**vars(args))
  File "generate.py", line 17, in generate_TLDRs
    task='translation'
  File "/content/gdrive/My Drive/Internship_SPI_Smita/Scientific_paper_summarization/scitldr/fairseq/fairseq/models/bart/model.py", line 112, in from_pretrained
    **kwargs,
  File "/content/gdrive/My Drive/Internship_SPI_Smita/Scientific_paper_summarization/scitldr/fairseq/fairseq/hub_utils.py", line 73, in from_pretrained
    arg_overrides=kwargs,
  File "/content/gdrive/My Drive/Internship_SPI_Smita/Scientific_paper_summarization/scitldr/fairseq/fairseq/checkpoint_utils.py", line 243, in load_model_ensemble_and_task
    task = tasks.setup_task(args)
  File "/content/gdrive/My Drive/Internship_SPI_Smita/Scientific_paper_summarization/scitldr/fairseq/fairseq/tasks/__init__.py", line 27, in setup_task
    return TASK_REGISTRY[task_cfg.task].setup_task(task_cfg, **kwargs)
  File "/content/gdrive/My Drive/Internship_SPI_Smita/Scientific_paper_summarization/scitldr/fairseq/fairseq/tasks/translation.py", line 226, in setup_task
    paths = utils.split_paths(args.data)
  File "/content/gdrive/My Drive/Internship_SPI_Smita/Scientific_paper_summarization/scitldr/fairseq/fairseq/utils.py", line 59, in split_paths
    if "://" not in paths
TypeError: argument of type 'NoneType' is not iterable

Please help me figure out how to solve this.
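
A hedged reading of the traceback: setup_task fails because args.data ends up as None, i.e. fairseq's translation task never learns where its dictionaries live. The data directory passed to generate.py apparently has to be the preprocessed one carrying the fairseq dictionaries (dict.source.txt / dict.target.txt), not a folder holding only a raw test.source file. A sketch of the underlying loading call, with illustrative paths:

from fairseq.models.bart import BARTModel

# data_name_or_path should point at the binarized directory, e.g. the
# ctrl-bin folder produced by make_datafiles.sh (path assumed here).
bart = BARTModel.from_pretrained(
    "model/",
    checkpoint_file="checkpoint_best.pt",
    data_name_or_path="SciTLDR-Data/SciTLDR-AIC/ctrl-bin",
    task="translation",
)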

Error running python scripts/cal-rouge.py

Hello authors, awesome work!! I am trying to replicate your work for my project. However, I am stuck at the step below; please see if you can help.

Platform/Environment
Google Colab

Running
!sudo python scripts/cal-rouge.py '/pathto/scitldr/test.hypo' 'SciTLDR-Data/SciTLDR-A/test.jsonl' --workers 1

Error
0% 0/618 [00:00<?, ?it/s]Preparing documents... 0 line(s) ignored
Running ROUGE...
Traceback (most recent call last):
  File "scripts/cal-rouge.py", line 160, in <module>
    main()
  File "scripts/cal-rouge.py", line 152, in main
    all_dfs = process(args.gold, args.candidate, args.candidate.split('/')[-1], args)
  File "scripts/cal-rouge.py", line 105, in process
    results = [get_rouge(d) for d in tqdm(data)]
  File "scripts/cal-rouge.py", line 105, in <listcomp>
    results = [get_rouge(d) for d in tqdm(data)]
  File "scripts/cal-rouge.py", line 48, in get_rouge
    return _get_rouge(args['pred'], args['data'])
  File "scripts/cal-rouge.py", line 73, in _get_rouge
    files2rouge.run(cand_file, gold_file, ignore_empty=True, saveto=log_file)
  File "/usr/local/lib/python3.7/dist-packages/files2rouge-2.1.0-py3.7.egg/files2rouge/files2rouge.py", line 73, in run
    output = r.convert_and_evaluate(rouge_args=rouge_args_str)
  File "/usr/local/lib/python3.7/dist-packages/pyrouge/Rouge155.py", line 368, in convert_and_evaluate
    rouge_output = self.evaluate(system_id, rouge_args)
  File "/usr/local/lib/python3.7/dist-packages/pyrouge/Rouge155.py", line 343, in evaluate
    rouge_output = check_output(command, env=env).decode("UTF-8")
  File "/usr/lib/python3.7/subprocess.py", line 411, in check_output
    **kwargs).stdout
  File "/usr/lib/python3.7/subprocess.py", line 488, in run
    with Popen(*popenargs, **kwargs) as process:
  File "/usr/lib/python3.7/subprocess.py", line 800, in __init__
    restore_signals, start_new_session)
  File "/usr/lib/python3.7/subprocess.py", line 1551, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
PermissionError: [Errno 13] Permission denied: 'setup_rogue/ROUGE-1.5.5.pl'
0% 0/618 [00:00<?, ?it/s]

P.S.: While executing the prerequisite command !python setup_rouge.py, I named the folder setup_rogue, as prompted, in the present working directory, and ROUGE-1.5.5.pl was placed inside it. That script is now giving the permission error.
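
A hedged workaround, assuming the path in the error message is accurate: a PermissionError when pyrouge spawns ROUGE-1.5.5.pl usually means the Perl script is missing its execute bit. Restoring it should let the subprocess start:

import os
import stat

# Path copied from the error message above.
script = "setup_rogue/ROUGE-1.5.5.pl"
os.chmod(script, os.stat(script).st_mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)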

Unpickling error when running generation script with pretrained weights

I renamed the bart.tldr-ao.pt model to checkpoint_best.pt and tried running the generation script as python scripts/generate.py model/ SciTLDR-Data/SciTLDR-A/ctrl ./ --beam 2 --lenpen 0.4 --test_fname test.hypo, as shown in the GitHub instructions, but I hit the following error:

Traceback (most recent call last):
  File "scripts/generate.py", line 99, in <module>
    generate_TLDRs(**vars(args))
  File "scripts/generate.py", line 17, in generate_TLDRs
    task='translation'
  File "/research/home/maxlitster/scitldr/repo/fairseq/models/bart/model.py", line 136, in from_pretrained
    **kwargs,
  File "/research/home/maxlitster/scitldr/repo/fairseq/hub_utils.py", line 75, in from_pretrained
    arg_overrides=kwargs,
  File "/research/home/maxlitster/scitldr/repo/fairseq/checkpoint_utils.py", line 339, in load_model_ensemble_and_task
    state = load_checkpoint_to_cpu(filename, arg_overrides)
  File "/research/home/maxlitster/scitldr/repo/fairseq/checkpoint_utils.py", line 263, in load_checkpoint_to_cpu
    state = torch.load(f, map_location=torch.device("cpu"))
  File "/home/maxlitster/miniconda3/envs/tldr/lib/python3.7/site-packages/torch/serialization.py", line 529, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/home/maxlitster/miniconda3/envs/tldr/lib/python3.7/site-packages/torch/serialization.py", line 692, in _legacy_load
    magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: invalid load key, '\x0a'.

I'm curious if anyone else has any advice as to how to generate text with the pretrained weights. Thanks!
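
This looks like the same failure mode as the fairseq-compatibility issue above: a load key of '\x0a' means torch hit a newline where pickle data should start, so the renamed checkpoint_best.pt is likely a corrupt or placeholder download rather than the actual weights. A quick hedged check, with the path taken from the command in this issue:

import os

ckpt = "model/checkpoint_best.pt"  # renamed from bart.tldr-ao.pt per the issue
print(os.path.getsize(ckpt))  # BART-scale weights are on the order of gigabytes
with open(ckpt, "rb") as f:
    print(f.read(32))  # expect pickle magic b'\x80...' or zip magic b'PK'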

How to train a TLDR generation model from scratch?

Dear authors, thank you for your excellent work! Is there code or a script file for training the TLDR generation model from scratch? It seems the project only includes an evaluation script, with no training file. I hope you can respond.
