allenai / scitldr
Home Page: https://scitldr.apps.allenai.org/
License: Apache License 2.0
Models to upload:
Other TODOs:
Here is a Colab notebook where I tried to do inference using this model.
loading archive file /content/models
loading archive file SciTLDR-Data/SciTLDR-AIC/ctrl-bin
Traceback (most recent call last):
File "/usr/local/lib/python3.7/dist-packages/fairseq/checkpoint_utils.py", line 151, in load_checkpoint_to_cpu
from fairseq.fb_pathmgr import fb_pathmgr
ModuleNotFoundError: No module named 'fairseq.fb_pathmgr'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "scripts/generate.py", line 99, in <module>
generate_TLDRs(**vars(args))
File "scripts/generate.py", line 17, in generate_TLDRs
task='translation'
File "/usr/local/lib/python3.7/dist-packages/fairseq/models/bart/model.py", line 104, in from_pretrained
**kwargs,
File "/usr/local/lib/python3.7/dist-packages/fairseq/hub_utils.py", line 68, in from_pretrained
arg_overrides=kwargs,
File "/usr/local/lib/python3.7/dist-packages/fairseq/checkpoint_utils.py", line 190, in load_model_ensemble_and_task
state = load_checkpoint_to_cpu(filename, arg_overrides)
File "/usr/local/lib/python3.7/dist-packages/fairseq/checkpoint_utils.py", line 160, in load_checkpoint_to_cpu
path, map_location=lambda s, l: default_restore_location(s, "cpu")
File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 529, in load
return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 692, in _legacy_load
magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: invalid load key, '\x0a'.
Can you please improve the demo to accept a link or PDF as input?
Hello authors, awesome work!! I am trying to replicate your work for my project. However, I am stuck at the code snippet below. Please see if you can help.
Running
cd SciTLDR-Data
export TASK=SciTLDR-A
chmod +x make_datafiles.sh
./make_datafiles.sh # BPE preprocess
Error
`usage: to_stories.py [-h] [--mapping_dir MAPPING_DIR] [--out_dir OUT_DIR]
[--num_cores NUM_CORES]
data_dir
to_stories.py: error: the following arguments are required: data_dir
usage: make_datafiles.py [-h] [--stories_dir STORIES_DIR] [--urldir URLDIR]
[--finished_files_dir FINISHED_FILES_DIR]
make_datafiles.py: error: argument --finished_files_dir: expected one argument
--2021-03-28 10:36:22-- https://dl.fbaipublicfiles.com/fairseq/gpt2_bpe/encoder.json
Resolving dl.fbaipublicfiles.com (dl.fbaipublicfiles.com)... 104.22.75.142, 104.22.74.142, 172.67.9.4, ...
Connecting to dl.fbaipublicfiles.com (dl.fbaipublicfiles.com)|104.22.75.142|:443... connected.
HTTP request sent, awaiting response... 304 Not Modified
File ‘encoder.json’ not modified on server. Omitting download.
--2021-03-28 10:36:23-- https://dl.fbaipublicfiles.com/fairseq/gpt2_bpe/vocab.bpe
Resolving dl.fbaipublicfiles.com (dl.fbaipublicfiles.com)... 104.22.75.142, 104.22.74.142, 172.67.9.4, ...
Connecting to dl.fbaipublicfiles.com (dl.fbaipublicfiles.com)|104.22.75.142|:443... connected.
HTTP request sent, awaiting response... 304 Not Modified
File ‘vocab.bpe’ not modified on server. Omitting download.
--2021-03-28 10:36:23-- https://dl.fbaipublicfiles.com/fairseq/gpt2_bpe/dict.txt
Resolving dl.fbaipublicfiles.com (dl.fbaipublicfiles.com)... 104.22.75.142, 104.22.74.142, 172.67.9.4, ...
Connecting to dl.fbaipublicfiles.com (dl.fbaipublicfiles.com)|104.22.75.142|:443... connected.
HTTP request sent, awaiting response... 304 Not Modified
File ‘dict.txt’ not modified on server. Omitting download.
Traceback (most recent call last):
File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"main", mod_spec)
File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/content/drive/.shortcut-targets-by-id/1Y0y4gKBXV3n7Shh3znhZPs6ajOz1g_g3/Semester 4/MAJOR_PROJECT/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 129, in
main()
File "/content/drive/.shortcut-targets-by-id/1Y0y4gKBXV3n7Shh3znhZPs6ajOz1g_g3/Semester 4/MAJOR_PROJECT/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 63, in main
for input in args.inputs
File "/content/drive/.shortcut-targets-by-id/1Y0y4gKBXV3n7Shh3znhZPs6ajOz1g_g3/Semester 4/MAJOR_PROJECT/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 63, in
for input in args.inputs
FileNotFoundError: [Errno 2] No such file or directory: '/train.source'
Traceback (most recent call last):
File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"main", mod_spec)
File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/content/drive/.shortcut-targets-by-id/1Y0y4gKBXV3n7Shh3znhZPs6ajOz1g_g3/Semester 4/MAJOR_PROJECT/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 129, in
main()
File "/content/drive/.shortcut-targets-by-id/1Y0y4gKBXV3n7Shh3znhZPs6ajOz1g_g3/Semester 4/MAJOR_PROJECT/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 63, in main
for input in args.inputs
File "/content/drive/.shortcut-targets-by-id/1Y0y4gKBXV3n7Shh3znhZPs6ajOz1g_g3/Semester 4/MAJOR_PROJECT/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 63, in
for input in args.inputs
FileNotFoundError: [Errno 2] No such file or directory: '/train.target'
Traceback (most recent call last):
File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"main", mod_spec)
File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/content/drive/.shortcut-targets-by-id/1Y0y4gKBXV3n7Shh3znhZPs6ajOz1g_g3/Semester 4/MAJOR_PROJECT/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 129, in
main()
File "/content/drive/.shortcut-targets-by-id/1Y0y4gKBXV3n7Shh3znhZPs6ajOz1g_g3/Semester 4/MAJOR_PROJECT/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 63, in main
for input in args.inputs
File "/content/drive/.shortcut-targets-by-id/1Y0y4gKBXV3n7Shh3znhZPs6ajOz1g_g3/Semester 4/MAJOR_PROJECT/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 63, in
for input in args.inputs
FileNotFoundError: [Errno 2] No such file or directory: '/val.source'
Traceback (most recent call last):
File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"main", mod_spec)
File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/content/drive/.shortcut-targets-by-id/1Y0y4gKBXV3n7Shh3znhZPs6ajOz1g_g3/Semester 4/MAJOR_PROJECT/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 129, in
main()
File "/content/drive/.shortcut-targets-by-id/1Y0y4gKBXV3n7Shh3znhZPs6ajOz1g_g3/Semester 4/MAJOR_PROJECT/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 63, in main
for input in args.inputs
File "/content/drive/.shortcut-targets-by-id/1Y0y4gKBXV3n7Shh3znhZPs6ajOz1g_g3/Semester 4/MAJOR_PROJECT/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 63, in
for input in args.inputs
FileNotFoundError: [Errno 2] No such file or directory: '/val.target'
usage: fairseq-preprocess [-h] [--no-progress-bar]
[--log-interval LOG_INTERVAL]
[--log-format {json,none,simple,tqdm}]
[--tensorboard-logdir TENSORBOARD_LOGDIR]
[--wandb-project WANDB_PROJECT] [--azureml-logging]
[--seed SEED] [--cpu] [--tpu] [--bf16]
[--memory-efficient-bf16] [--fp16]
[--memory-efficient-fp16] [--fp16-no-flatten-grads]
[--fp16-init-scale FP16_INIT_SCALE]
[--fp16-scale-window FP16_SCALE_WINDOW]
[--fp16-scale-tolerance FP16_SCALE_TOLERANCE]
[--min-loss-scale MIN_LOSS_SCALE]
[--threshold-loss-scale THRESHOLD_LOSS_SCALE]
[--user-dir USER_DIR]
[--empty-cache-freq EMPTY_CACHE_FREQ]
[--all-gather-list-size ALL_GATHER_LIST_SIZE]
[--model-parallel-size MODEL_PARALLEL_SIZE]
[--quantization-config-path QUANTIZATION_CONFIG_PATH]
[--profile] [--reset-logging] [--suppress-crashes]
[--use-plasma-view] [--plasma-path PLASMA_PATH]
[--criterion {sentence_ranking,wav2vec,model,label_smoothed_cross_entropy,latency_augmented_label_smoothed_cross_entropy,legacy_masked_lm_loss,nat_loss,ctc,label_smoothed_cross_entropy_with_alignment,cross_entropy,sentence_prediction,composite_loss,masked_lm,adaptive_loss,vocab_parallel_cross_entropy}]
[--tokenizer {space,moses,nltk}]
[--bpe {bytes,gpt2,byte_bpe,sentencepiece,bert,characters,hf_byte_bpe,fastbpe,subword_nmt}]
[--simul-type {hard_aligned,infinite_lookback,waitk,waitk_fixed_pre_decision,hard_aligned_fixed_pre_decision,infinite_lookback_fixed_pre_decision}]
[--optimizer {nag,adafactor,adam,composite,adagrad,adamax,sgd,cpu_adam,lamb,adadelta}]
[--lr-scheduler {cosine,pass_through,polynomial_decay,reduce_lr_on_plateau,inverse_sqrt,tri_stage,triangular,manual,fixed}]
[--scoring {sacrebleu,bleu,wer,chrf}] [--task TASK]
[-s SRC] [-t TARGET] [--trainpref FP]
[--validpref FP] [--testpref FP] [--align-suffix FP]
[--destdir DIR] [--thresholdtgt N]
[--thresholdsrc N] [--tgtdict FP] [--srcdict FP]
[--nwordstgt N] [--nwordssrc N] [--alignfile ALIGN]
[--dataset-impl FORMAT] [--joined-dictionary]
[--only-source] [--padding-factor N] [--workers N]
fairseq-preprocess: error: argument --destdir: expected one argument
usage: build_ctrl_datasets.py [-h] [--outdir OUTDIR] datadir
build_ctrl_datasets.py: error: the following arguments are required: datadir
Times to run script: 2.3285547892252606e-06 min
Making bin file for URLs listed in /ctrl/mapping/mapping_test.txt...
Traceback (most recent call last):
File "make_datafiles.py", line 117, in
write_to_bin(all_test_urls, args.stories_dir, os.path.join(args.finished_files_dir, "test"))
File "make_datafiles.py", line 76, in write_to_bin
url_list = read_text_file(url_file)
File "make_datafiles.py", line 17, in read_text_file
with open(text_file, "r", encoding="utf-8") as f:
FileNotFoundError: [Errno 2] No such file or directory: '/ctrl/mapping/mapping_test.txt'
Traceback (most recent call last):
File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"main", mod_spec)
File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/content/drive/.shortcut-targets-by-id/1Y0y4gKBXV3n7Shh3znhZPs6ajOz1g_g3/Semester 4/MAJOR_PROJECT/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 129, in
main()
File "/content/drive/.shortcut-targets-by-id/1Y0y4gKBXV3n7Shh3znhZPs6ajOz1g_g3/Semester 4/MAJOR_PROJECT/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 63, in main
for input in args.inputs
File "/content/drive/.shortcut-targets-by-id/1Y0y4gKBXV3n7Shh3znhZPs6ajOz1g_g3/Semester 4/MAJOR_PROJECT/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 63, in
for input in args.inputs
FileNotFoundError: [Errno 2] No such file or directory: '/ctrl/train.source'
Traceback (most recent call last):
File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"main", mod_spec)
File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/content/drive/.shortcut-targets-by-id/1Y0y4gKBXV3n7Shh3znhZPs6ajOz1g_g3/Semester 4/MAJOR_PROJECT/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 129, in
main()
File "/content/drive/.shortcut-targets-by-id/1Y0y4gKBXV3n7Shh3znhZPs6ajOz1g_g3/Semester 4/MAJOR_PROJECT/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 63, in main
for input in args.inputs
File "/content/drive/.shortcut-targets-by-id/1Y0y4gKBXV3n7Shh3znhZPs6ajOz1g_g3/Semester 4/MAJOR_PROJECT/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 63, in
for input in args.inputs
FileNotFoundError: [Errno 2] No such file or directory: '/ctrl/train.target'
Traceback (most recent call last):
File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"main", mod_spec)
File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/content/drive/.shortcut-targets-by-id/1Y0y4gKBXV3n7Shh3znhZPs6ajOz1g_g3/Semester 4/MAJOR_PROJECT/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 129, in
main()
File "/content/drive/.shortcut-targets-by-id/1Y0y4gKBXV3n7Shh3znhZPs6ajOz1g_g3/Semester 4/MAJOR_PROJECT/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 63, in main
for input in args.inputs
File "/content/drive/.shortcut-targets-by-id/1Y0y4gKBXV3n7Shh3znhZPs6ajOz1g_g3/Semester 4/MAJOR_PROJECT/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 63, in
for input in args.inputs
FileNotFoundError: [Errno 2] No such file or directory: '/ctrl/val.source'
Traceback (most recent call last):
File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"main", mod_spec)
File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/content/drive/.shortcut-targets-by-id/1Y0y4gKBXV3n7Shh3znhZPs6ajOz1g_g3/Semester 4/MAJOR_PROJECT/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 129, in
main()
File "/content/drive/.shortcut-targets-by-id/1Y0y4gKBXV3n7Shh3znhZPs6ajOz1g_g3/Semester 4/MAJOR_PROJECT/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 63, in main
for input in args.inputs
File "/content/drive/.shortcut-targets-by-id/1Y0y4gKBXV3n7Shh3znhZPs6ajOz1g_g3/Semester 4/MAJOR_PROJECT/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 63, in
for input in args.inputs
FileNotFoundError: [Errno 2] No such file or directory: '/ctrl/val.target'
2021-03-28 10:36:44 | INFO | fairseq_cli.preprocess | Namespace(align_suffix=None, alignfile=None, all_gather_list_size=16384, azureml_logging=False, bf16=False, bpe=None, cpu=False, criterion='cross_entropy', dataset_impl='mmap', destdir='/ctrl-bin/', empty_cache_freq=0, fp16=False, fp16_init_scale=128, fp16_no_flatten_grads=False, fp16_scale_tolerance=0.0, fp16_scale_window=None, joined_dictionary=False, log_format=None, log_interval=100, lr_scheduler='fixed', memory_efficient_bf16=False, memory_efficient_fp16=False, min_loss_scale=0.0001, model_parallel_size=1, no_progress_bar=False, nwordssrc=-1, nwordstgt=-1, only_source=False, optimizer=None, padding_factor=8, plasma_path='/tmp/plasma', profile=False, quantization_config_path=None, reset_logging=False, scoring='bleu', seed=1, simul_type=None, source_lang='source', srcdict='dict.txt', suppress_crashes=False, target_lang='target', task='translation', tensorboard_logdir=None, testpref=None, tgtdict='dict.txt', threshold_loss_scale=None, thresholdsrc=0, thresholdtgt=0, tokenizer=None, tpu=False, trainpref='/ctrl/train.bpe', use_plasma_view=False, user_dir=None, validpref='/ctrl/val.bpe', wandb_project=None, workers=60)
2021-03-28 10:36:45 | INFO | fairseq_cli.preprocess | [source] Dictionary: 50264 types
Traceback (most recent call last):
File "/usr/local/bin/fairseq-preprocess", line 33, in
sys.exit(load_entry_point('fairseq', 'console_scripts', 'fairseq-preprocess')())
File "/content/fairseq/fairseq_cli/preprocess.py", line 394, in cli_main
main(args)
File "/content/fairseq/fairseq_cli/preprocess.py", line 284, in main
make_all(args.source_lang, src_dict)
File "/content/fairseq/fairseq_cli/preprocess.py", line 252, in make_all
make_dataset(vocab, args.trainpref, "train", lang, num_workers=args.workers)
File "/content/fairseq/fairseq_cli/preprocess.py", line 248, in make_dataset
make_binary_dataset(vocab, input_prefix, output_prefix, lang, num_workers)
File "/content/fairseq/fairseq_cli/preprocess.py", line 133, in make_binary_dataset
offsets = Binarizer.find_offsets(input_file, num_workers)
File "/content/fairseq/fairseq/binarizer.py", line 106, in find_offsets
with open(PathManager.get_local_path(filename), "r", encoding="utf-8") as f:
FileNotFoundError: [Errno 2] No such file or directory: '/ctrl/train.bpe.source'`
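A possible reading of the log above, not confirmed anywhere in this thread: every failing path ('/train.source', '/ctrl/train.source', '/ctrl-bin/') starts at the filesystem root, which is what one would expect if TASK expanded to an empty string inside make_datafiles.sh. In notebook environments each `!` line usually runs in its own shell, so an export on one line may not reach the next. The Python sketch below passes TASK to the script explicitly. The helper names task_env and run_make_datafiles are hypothetical and not part of the repository.

```python
import os
import subprocess
from pathlib import Path


def task_env(task, base_env=None):
    """Return a copy of the environment with TASK set, rejecting empty values."""
    if not task or not task.strip():
        raise ValueError("TASK must be a non-empty dataset name, e.g. 'SciTLDR-A'")
    env = dict(os.environ if base_env is None else base_env)
    env["TASK"] = task.strip()
    return env


def run_make_datafiles(task, data_dir="SciTLDR-Data"):
    """Run make_datafiles.sh with TASK handed directly to the child process.

    Assumes the layout shown in the issue: data_dir contains both
    make_datafiles.sh and a folder named after the task.
    """
    env = task_env(task)
    data_dir = Path(data_dir)
    if not (data_dir / task).is_dir():
        raise FileNotFoundError(f"expected dataset folder {data_dir / task}")
    # check=True raises CalledProcessError if the script exits non-zero.
    return subprocess.run(
        ["bash", "make_datafiles.sh"], cwd=data_dir, env=env, check=True
    )
```

From a notebook cell this would be called as run_make_datafiles("SciTLDR-A") with the repository root as the working directory. If the same errors still appear, the empty TASK guess is wrong and the script itself needs a closer look.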
Are the control codes applied simply by adding the topic and TLDR codes to the beginning of the text sequences? Do you use the usual GPT tokenizer on the control codes?
Authors, great work!! My organisation sees huge value in investing further in your research work.
However, I am stalled by the error below. Can you help?
It seems that conflicting library versions, specifically fairseq, may be breaking the code?
Running
cd SciTLDR-Data
export TASK=SciTLDR-AIC
chmod +x make_datafiles.sh
./make_datafiles.sh # BPE preprocess
Errors
self._kwargs)
File "/usr/lib/python3.6/multiprocessing/pool.py", line 103, in worker
initializer(*initargs)
File "/jupyter_mount/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 96, in initializer
bpe = get_encoder(self.args.encoder_json, self.args.vocab_bpe)
File "/usr/local/lib/python3.6/dist-packages/fairseq/data/encoders/gpt2_bpe_utils.py", line 132, in get_encoder
with open(encoder_json_path, "r") as f:
FileNotFoundError: [Errno 2] No such file or directory: 'encoder.json'
Process ForkPoolWorker-73260:
Traceback (most recent call last):
File "/usr/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/usr/lib/python3.6/multiprocessing/process.py", line 93, in run
self._target(*self._args, **self._kwargs)
File "/usr/lib/python3.6/multiprocessing/pool.py", line 103, in worker
initializer(*initargs)
File "/jupyter_mount/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 96, in initializer
bpe = get_encoder(self.args.encoder_json, self.args.vocab_bpe)
File "/usr/local/lib/python3.6/dist-packages/fairseq/data/encoders/gpt2_bpe_utils.py", line 132, in get_encoder
with open(encoder_json_path, "r") as f:
FileNotFoundError: [Errno 2] No such file or directory: 'encoder.json'
Process ForkPoolWorker-73261:
Traceback (most recent call last):
File "/usr/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/usr/lib/python3.6/multiprocessing/process.py", line 93, in run
self._target(*self._args, **self._kwargs)
File "/usr/lib/python3.6/multiprocessing/pool.py", line 103, in worker
initializer(*initargs)
File "/jupyter_mount/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 96, in initializer
bpe = get_encoder(self.args.encoder_json, self.args.vocab_bpe)
File "/usr/local/lib/python3.6/dist-packages/fairseq/data/encoders/gpt2_bpe_utils.py", line 132, in get_encoder
with open(encoder_json_path, "r") as f:
FileNotFoundError: [Errno 2] No such file or directory: 'encoder.json'
Process ForkPoolWorker-73262:
Traceback (most recent call last):
File "/usr/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/usr/lib/python3.6/multiprocessing/process.py", line 93, in run
self._target(*self._args, **self._kwargs)
File "/usr/lib/python3.6/multiprocessing/pool.py", line 103, in worker
initializer(*initargs)
File "/jupyter_mount/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 96, in initializer
bpe = get_encoder(self.args.encoder_json, self.args.vocab_bpe)
File "/usr/local/lib/python3.6/dist-packages/fairseq/data/encoders/gpt2_bpe_utils.py", line 132, in get_encoder
with open(encoder_json_path, "r") as f:
FileNotFoundError: [Errno 2] No such file or directory: 'encoder.json'
Process ForkPoolWorker-73263:
Traceback (most recent call last):
File "/usr/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/usr/lib/python3.6/multiprocessing/process.py", line 93, in run
self._target(*self._args, **self._kwargs)
File "/usr/lib/python3.6/multiprocessing/pool.py", line 103, in worker
initializer(*initargs)
File "/jupyter_mount/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 96, in initializer
bpe = get_encoder(self.args.encoder_json, self.args.vocab_bpe)
File "/usr/local/lib/python3.6/dist-packages/fairseq/data/encoders/gpt2_bpe_utils.py", line 132, in get_encoder
with open(encoder_json_path, "r") as f:
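The traceback above opens 'encoder.json' as a relative path, so the file has to be present in the directory the encoder runs from. A minimal Python sketch for checking this and fetching whatever is missing follows. It assumes the three files are the ones make_datafiles.sh downloads, using the dl.fbaipublicfiles.com URLs that appear in the wget log of an earlier issue on this page. The function names are hypothetical.

```python
import urllib.request
from pathlib import Path

# Base URL and file names taken from the wget log quoted in an earlier issue.
BPE_BASE_URL = "https://dl.fbaipublicfiles.com/fairseq/gpt2_bpe/"
BPE_FILES = ("encoder.json", "vocab.bpe", "dict.txt")


def missing_bpe_files(directory="."):
    """List the GPT-2 BPE files that are not present in the given directory."""
    directory = Path(directory)
    return [name for name in BPE_FILES if not (directory / name).is_file()]


def fetch_missing_bpe_files(directory="."):
    """Download only the files that are missing; return the names fetched."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    fetched = []
    for name in missing_bpe_files(directory):
        urllib.request.urlretrieve(BPE_BASE_URL + name, str(directory / name))
        fetched.append(name)
    return fetched
```

Calling missing_bpe_files("SciTLDR-Data") before the preprocessing step shows whether the download part of the script actually ran. If it returns an empty list and the error persists, the working directory of the worker processes is probably different from the one checked.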
When trying to create a conda environment using the requirements.txt file in the root directory on both Windows and Ubuntu, I get the following error:
PackagesNotFoundError: The following packages are not available from current channels:
Current channels:
To search for alternate channels that may provide the conda package you're
looking for, navigate to
https://anaconda.org
and use the search bar at the top of the page.
I tried creating a new environment and pip installing each package in requirements.txt one at a time, but I got the same kind of error.
Hello, @armancohan @kyleclo @isabelcachola
I am trying to generate a summary using this command given in the repo:
!python generate.py model/ data_input/ out/ --checkpoint_file checkpoint_best.pt --beam 2 --lenpen 0.4 --test_fname test.hypo
Here, model is the folder containing the checkpoint provided for summarization (bart.tldr-aic).
data_input contains the test.source file, a text file with the source content of the test.jsonl file in scitldr/SciTLDR-Data/SciTLDR-AIC/.
out is an empty folder to store the output.
I am getting this error:
Traceback (most recent call last):
File "generate.py", line 100, in
generate_TLDRs(**vars(args))
File "generate.py", line 17, in generate_TLDRs
task='translation'
File "/content/gdrive/My Drive/Internship_SPI_Smita/Scientific_paper_summarization/scitldr/fairseq/fairseq/models/bart/model.py", line 112, in from_pretrained
**kwargs,
File "/content/gdrive/My Drive/Internship_SPI_Smita/Scientific_paper_summarization/scitldr/fairseq/fairseq/hub_utils.py", line 73, in from_pretrained
arg_overrides=kwargs,
File "/content/gdrive/My Drive/Internship_SPI_Smita/Scientific_paper_summarization/scitldr/fairseq/fairseq/checkpoint_utils.py", line 243, in load_model_ensemble_and_task
task = tasks.setup_task(args)
File "/content/gdrive/My Drive/Internship_SPI_Smita/Scientific_paper_summarization/scitldr/fairseq/fairseq/tasks/__init__.py", line 27, in setup_task
return TASK_REGISTRY[task_cfg.task].setup_task(task_cfg, **kwargs)
File "/content/gdrive/My Drive/Internship_SPI_Smita/Scientific_paper_summarization/scitldr/fairseq/fairseq/tasks/translation.py", line 226, in setup_task
paths = utils.split_paths(args.data)
File "/content/gdrive/My Drive/Internship_SPI_Smita/Scientific_paper_summarization/scitldr/fairseq/fairseq/utils.py", line 59, in split_paths
if "://" not in paths
TypeError: argument of type 'NoneType' is not iterable
Please help me solve this.
Hello authors, awesome work!! I am trying to replicate your work for my project. However, I am stuck at the code snippet below. Please see if you can help.
Platform/Environment
Google Colab
Running
!sudo python scripts/cal-rouge.py '/pathto/scitldr/test.hypo' 'SciTLDR-Data/SciTLDR-A/test.jsonl' --workers 1
Error
0% 0/618 [00:00<?, ?it/s]
Preparing documents... 0 line(s) ignored
Running ROUGE...
Traceback (most recent call last):
File "scripts/cal-rouge.py", line 160, in <module>
main()
File "scripts/cal-rouge.py", line 152, in main
all_dfs = process(args.gold, args.candidate, args.candidate.split('/')[-1], args)
File "scripts/cal-rouge.py", line 105, in process
results = [get_rouge(d) for d in tqdm(data)]
File "scripts/cal-rouge.py", line 105, in <listcomp>
results = [get_rouge(d) for d in tqdm(data)]
File "scripts/cal-rouge.py", line 48, in get_rouge
return _get_rouge(args['pred'], args['data'])
File "scripts/cal-rouge.py", line 73, in _get_rouge
files2rouge.run(cand_file, gold_file, ignore_empty=True, saveto=log_file)
File "/usr/local/lib/python3.7/dist-packages/files2rouge-2.1.0-py3.7.egg/files2rouge/files2rouge.py", line 73, in run
output = r.convert_and_evaluate(rouge_args=rouge_args_str)
File "/usr/local/lib/python3.7/dist-packages/pyrouge/Rouge155.py", line 368, in convert_and_evaluate
rouge_output = self.evaluate(system_id, rouge_args)
File "/usr/local/lib/python3.7/dist-packages/pyrouge/Rouge155.py", line 343, in evaluate
rouge_output = check_output(command, env=env).decode("UTF-8")
File "/usr/lib/python3.7/subprocess.py", line 411, in check_output
**kwargs).stdout
File "/usr/lib/python3.7/subprocess.py", line 488, in run
with Popen(*popenargs, **kwargs) as process:
File "/usr/lib/python3.7/subprocess.py", line 800, in __init__
restore_signals, start_new_session)
File "/usr/lib/python3.7/subprocess.py", line 1551, in _execute_child
raise child_exception_type(errno_num, err_msg, err_filename)
PermissionError: [Errno 13] Permission denied: 'setup_rogue/ROUGE-1.5.5.pl'
0% 0/618 [00:00<?, ?it/s]
P.S.: While executing the dependent command !python setup_rouge.py, as prompted, I named the folder setup_rogue, in the present working directory, and placed ROUGE-1.5.5.pl inside it. This now gives a permission error.
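Since the command was already run with sudo, one plausible cause (an assumption, not something verified in this thread) is that ROUGE-1.5.5.pl has no execute bit set, which makes Popen fail with Errno 13 when pyrouge tries to launch it. The Python sketch below adds the execute bits and reports whether the file is now executable. make_executable is a hypothetical helper name.

```python
import os
import stat
from pathlib import Path


def make_executable(path):
    """Add execute permission for user, group and others; return True on success."""
    path = Path(path)
    mode = path.stat().st_mode
    path.chmod(mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    # os.access reports whether the current user may execute the file.
    return os.access(path, os.X_OK)
```

For the layout described in the issue this would be called as make_executable("setup_rogue/ROUGE-1.5.5.pl"). On some mounted filesystems permission changes may not take effect, in which case copying the ROUGE folder to local disk first may be needed.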
I renamed the bart.tldr-ao.pt model to checkpoint_best.pt and tried running the generation script as python scripts/generate.py model/ SciTLDR-Data/SciTLDR-A/ctrl ./ --beam 2 --lenpen 0.4 --test_fname test.hypo, as shown in the GitHub instructions, but I get the following error:
Traceback (most recent call last):
File "scripts/generate.py", line 99, in <module>
generate_TLDRs(**vars(args))
File "scripts/generate.py", line 17, in generate_TLDRs
task='translation'
File "/research/home/maxlitster/scitldr/repo/fairseq/models/bart/model.py", line 136, in from_pretrained
**kwargs,
File "/research/home/maxlitster/scitldr/repo/fairseq/hub_utils.py", line 75, in from_pretrained
arg_overrides=kwargs,
File "/research/home/maxlitster/scitldr/repo/fairseq/checkpoint_utils.py", line 339, in load_model_ensemble_and_task
state = load_checkpoint_to_cpu(filename, arg_overrides)
File "/research/home/maxlitster/scitldr/repo/fairseq/checkpoint_utils.py", line 263, in load_checkpoint_to_cpu
state = torch.load(f, map_location=torch.device("cpu"))
File "/home/maxlitster/miniconda3/envs/tldr/lib/python3.7/site-packages/torch/serialization.py", line 529, in load
return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
File "/home/maxlitster/miniconda3/envs/tldr/lib/python3.7/site-packages/torch/serialization.py", line 692, in _legacy_load
magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: invalid load key, '\x0a'.
I'm curious if anyone else has any advice as to how to generate text with the pretrained weights. Thanks!
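The message "invalid load key, '\x0a'" means the first byte the unpickler read was a newline, so the file being loaded is probably text rather than a serialized checkpoint, for example a saved web page or a pointer file from an incomplete download. This is an inference from the error message only. The Python sketch below inspects the first bytes of a file and gives a rough guess at what it is. The function name and the labels it returns are hypothetical, and the checks are heuristics.

```python
def sniff_checkpoint(path, n=64):
    """Guess the kind of file from its first bytes (heuristic, not exhaustive)."""
    with open(path, "rb") as f:
        head = f.read(n)
    if head.startswith(b"PK"):
        return "zip"  # zip container, as written by newer torch.save versions
    if head.startswith(b"\x80"):
        return "pickle"  # pickle stream, as written by the legacy torch.save format
    text = head.lstrip()
    if text.startswith(b"version https://git-lfs"):
        return "git-lfs-pointer"  # placeholder file, real weights not downloaded
    if text.startswith(b"<"):
        return "html"  # likely a saved web page instead of the model
    return "unknown"
```

Running sniff_checkpoint("model/checkpoint_best.pt") and getting "html", "git-lfs-pointer" or "unknown" would suggest downloading the weights again and comparing the file size with the published one before retrying generate.py.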
Dear authors, thank you for your excellent work! Is there any code or a script for training the TLDR generation model from scratch? The project seems to include only an evaluation script and no training file. I hope to hear from you.
Sorry, I misunderstood. The BART pre-trained model can actually handle 2048 tokens.
Hello, the demo app here seems to be out of service.
Also, could you please share the inference script you use in the app for a single input sequence?
Thanks a lot.