
separator's Introduction

I'm a PhD student in NLP at Edinburgh supervised by Mirella Lapata, working on discrete latent variable models for language generation.

Projects

HIRO: Hierarchical Indexing for Retrieval-Augmented Opinion Summarization - Tom Hosking, Hao Tang & Mirella Lapata

Human Feedback is not Gold Standard - Tom Hosking, Phil Blunsom & Max Bartolo (ICLR 2024)

Hercules - Code for the paper "Attributable and Scalable Opinion Summarization", Tom Hosking, Hao Tang & Mirella Lapata (ACL 2023)

HRQ-VAE - Code for the paper "Hierarchical Sketch Induction for Paraphrase Generation", Tom Hosking, Hao Tang & Mirella Lapata (ACL 2022)

Separator - Code for the paper "Factorising Meaning and Form for Intent-Preserving Paraphrasing", Tom Hosking & Mirella Lapata (ACL 2021)

TorchSeq - a sequence modelling framework, built in PyTorch

McKenzie - a Slurm scheduler job monitor

My neural question generation model implementation

separator's People

Contributors

tomhosking


separator's Issues

generating multiple paraphrases of an input sentence

Hi Tom,

Thanks for releasing Separator! I am using it on a custom set of sentences as shown here : https://github.com/tomhosking/separator#run-inference-over-a-custom-dataset

I would like to generate n paraphrased questions for each input question (say n=5).
Is there a way to do this when calling instance.inference as shown below?

# Finally, run inference
_, _, (pred_output, _, _), _ = instance.inference(data_loader.test_loader)

My apologies if there is a straightforward solution; I am not very familiar with torchseq.
Thanks!
Tejas.
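For what it's worth, one generic workaround sketch (not the torchseq API; the "sem_input" field name is borrowed from the metrics code quoted in another issue on this page and may not match the dataset schema) is to repeat each input n times, so that a sampling-based decoder can emit n candidate paraphrases per question:

```python
def duplicate_inputs(rows, n=5):
    """Repeat each dataset row n times so that sampling-based decoding
    can produce n (possibly distinct) paraphrases per input.
    Illustrative sketch only - check your dataset schema and whether
    your decoding strategy is stochastic."""
    return [dict(row) for row in rows for _ in range(n)]

rows = [{"sem_input": "How do I learn Python?"}]
duplicated = duplicate_inputs(rows, n=5)
```

With greedy decoding all n copies would decode identically, so this only helps in combination with sampling, or with beam search that keeps the top-n beams.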

Getting Error when installing using requirements.txt

Hey Tom,
I am trying to reproduce the results of your trained models on my local machine. When I install the dependencies with "python -m pip install -r requirements.txt", I get the following error. Can you help me out here? Thanks :)

Missing Models

Hi Tom,

I came to try your approach but could not download the models or the data; I got a 301 at the URLs in the readme.
Could you provide them again?
Much appreciated 🙏

number of datasets

Hi, I am not sure which dataset you used in your paper. I found 278055/26275/26275 examples for train/dev/test in training-triples, and the same 278055/26275/26275 split in qqp-splitforgeneval. Which dataset should I use to reproduce your results?

'JsonDataLoader' object has no attribute 'test_loader'

After installing torchseq, I tried to replicate the results with the commands below:
torchseq --load ./models/separator-qqp-v1.2 --test
torchseq --load ./models/separator-qqp-v1.2 --validate

However, the following errors occur:
[error screenshots]

How can I solve this problem?
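One generic way to narrow this down (hypothetical attribute names, not torchseq internals) is to check which split loaders the data loader object actually exposes before calling inference:

```python
class DataLoaderStub:
    """Stand-in for a data loader that only exposes a validation split."""
    valid_loader = "valid-split"

dl = DataLoaderStub()

# Fall back to the validation split if no test split was built,
# e.g. because the dataset has no test file.
loader = getattr(dl, "test_loader", None) or getattr(dl, "valid_loader", None)
```

If only a validation loader exists, the error suggests the test split was never constructed for this dataset/config combination.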

Confused about the paralex dataset

Hi, I found four sub-datasets in the paralex folder after downloading the zip file, and the train/test/dev sizes differ across "wikianswers-triples-chunk-extendstop-realexemplars-resample-drop30-N5-R100", "wikianswers-para-allqs", and "wikianswers-para-splitforgeneval". Could you please point out which file you finally used in your experiments? :)

About self-bleu metric

Hello, I notice the self-bleu metric in your code:

        refs = [q["paras"] for q in rows]
        inputs = [[q["sem_input"]] for q in rows]

        # refs = [x["paras"] for x in qs_by_para_split]
        max_num_refs = max([len(x) for x in refs])
        refs_padded = [x + [x[0]] * (max_num_refs - len(x)) for x in refs]

        tgt_bleu = sacrebleu.corpus_bleu(output, list(zip(*refs_padded))).score
        self_bleu = sacrebleu.corpus_bleu(output, list(zip(*inputs))).score

You treat "sem_input" as the reference sentence, but by the usual definition Self-BLEU is computed between different predicted sentences. Could you explain this choice?

Thanks!
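To make the distinction concrete, here is a toy pure-Python illustration (a single-order modified n-gram precision, not sacrebleu) of classic Self-BLEU, which scores each prediction against the other predictions, versus the variant in the snippet above, which scores each prediction against its input:

```python
from collections import Counter

def ngram_precision(hyp, ref, n=2):
    """Modified n-gram precision of hyp against ref - a crude stand-in
    for one BLEU component, for illustration only."""
    def ngrams(toks):
        return Counter(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))
    h, r = ngrams(hyp.split()), ngrams(ref.split())
    total = sum(h.values())
    return sum(min(c, r[g]) for g, c in h.items()) / total if total else 0.0

preds = ["how can i learn python", "what is the best way to learn python"]
inputs = ["how do i learn python", "how do i learn python"]

# Classic Self-BLEU: each prediction scored against the *other* predictions.
classic = [max(ngram_precision(p, q) for j, q in enumerate(preds) if j != i)
           for i, p in enumerate(preds)]

# The variant in the snippet above: each prediction scored against its input.
vs_input = [ngram_precision(p, q) for p, q in zip(preds, inputs)]
```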

size mismatch for bottleneck.quantizer._alpha: copying a param with shape torch.Size([4]) from checkpoint, the shape in current model is torch.Size([16]).

Hello,

I am trying to reproduce your results but I have the following error when I run the command:

torchseq --load ./models/separator-qqp-v1.2 --test --cpu

The error is:

size mismatch for bottleneck.quantizer._alpha: copying a param with shape torch.Size([4]) from checkpoint, the shape in current model is torch.Size([16]).

Any idea how to debug this? I put the model in the right folder and also the data in the data folder.

Thanks!
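A generic way to pin down which config field disagrees (a hypothetical helper over plain dicts of shapes, not torchseq code) is to diff the checkpoint's parameter shapes against those of the freshly built model:

```python
def shape_mismatches(ckpt_shapes, model_shapes):
    """Return parameters whose checkpoint shape differs from the model's.
    A mismatch like (4,) vs (16,) on a quantizer parameter usually means
    the loaded config (e.g. number of codebook levels) does not match the
    config the checkpoint was trained with."""
    return {k: (ckpt_shapes[k], model_shapes[k])
            for k in ckpt_shapes
            if k in model_shapes and ckpt_shapes[k] != model_shapes[k]}

ckpt = {"bottleneck.quantizer._alpha": (4,)}
model = {"bottleneck.quantizer._alpha": (16,)}
diff = shape_mismatches(ckpt, model)
```

A mismatch of this kind usually means the model was built from a different config than the one the checkpoint shipped with, so checking that the checkpoint's own config file is the one being loaded is a good first step.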

Runtime Error regarding GPU

Hi Tom, I am getting errors related to the GPU. I don't have a GPU; what changes are needed in torchseq for CPU-only users?
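For reference, the torchseq CLI invocations quoted in the other issues on this page accept a --cpu flag, so forcing CPU-only execution looks like:

```shell
# Same command as in the other issues, with --cpu to avoid touching the GPU
torchseq --load ./models/separator-qqp-v1.2 --test --cpu
```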
