salesforce / convsumm

License: BSD 3-Clause "New" or "Revised" License

Python 99.55% Shell 0.45%

convsumm's Introduction

Conversational Summarization

Controllable Abstractive Dialogue Summarization with Sketch Supervision

convsumm's Issues

Error when loading the trained summarization models with the Hugging Face library

Thanks for sharing this code!
There seems to be an error when using the trained model:

libc++abi.dylib: terminating with uncaught exception of type c10::Error: owning_ptr == NullType::singleton() || owning_ptr->refcount_.load() > 0 INTERNAL ASSERT FAILED at ../c10/util/intrusive_ptr.h:348, please report a bug to PyTorch. intrusive_ptr: Can only intrusive_ptr::reclaim() owning pointers that were created using intrusive_ptr::release(). (reclaim at ../c10/util/intrusive_ptr.h:348)

I am using the code below (as provided in the README):
from transformers import pipeline
# device=0 places the model on the first GPU
summarizer = pipeline("summarization", model="Salesforce/bart-large-xsum-samsum", device=0)
# conv is a list of utterance strings, one per turn
text = "<s> {}".format(" <s> ".join(conv))
summary = summarizer(text, min_length=10, max_length=100, num_beams=4)[0]["summary_text"]

How can I solve this problem?
Thanks!
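
For reference, the input string built in the snippet above can be reproduced without loading the model. This is only a minimal sketch: `format_conversation` is a hypothetical helper name, and the sample utterances are made up for illustration.

```python
def format_conversation(conv):
    """Join utterances into one string, prefixing every turn with "<s>"."""
    return "<s> {}".format(" <s> ".join(conv))


# Made-up two-turn conversation (not taken from any dataset).
conv = ["Amanda: I baked cookies.", "Jerry: Save me one!"]
text = format_conversation(conv)
print(text)
```

Printing the string before passing it to the pipeline makes it easier to tell whether a failure comes from the input formatting or from model loading.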

SAMSum corpus for fine-tuning

Did you use the SAMSum corpus to fine-tune these two models (Salesforce/bart-large-xsum-samsum and Salesforce/cods-bart-large-xsum-samsum)?

How do I run the SOTA CODS model mentioned in the paper?

Amazing work. I read the paper, which says that CODS can control the number of sentences in the summary. I believe the generation pipeline for this is the one described under the CODS heading: Salesforce/cods-bart-large-xsum-samsum.

I am interested in generating one-sentence summaries. However, I noticed that the model sometimes does not generate the TLDR marker, in which case the code throws a "list index out of range" error (because of split(" TLDR ")[1]). Is there a specific reason for this?
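
The index error described above comes from indexing the result of `split` when the marker is absent. One possible way to guard against it is sketched below. `split_cods_output` is a hypothetical helper, and falling back to the whole string when no marker is found is an assumption made here, not behavior confirmed by the authors.

```python
def split_cods_output(generated, marker=" TLDR "):
    """Split generated text into (sketch, summary) around the marker.

    If the marker is missing, return (None, full text) instead of raising
    an IndexError. That fallback is an assumption of this sketch.
    """
    sketch, found, summary = generated.partition(marker)
    if not found:
        return None, generated.strip()
    return sketch.strip(), summary.strip()


# Made-up outputs for illustration only.
with_marker = split_cods_output("1 plan TLDR Bob will be late.")
without_marker = split_cods_output("Bob will be late.")
```

`str.partition` always returns three parts, so the missing-marker case can be detected by checking whether the middle part is empty.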

When the conversation has fewer words than the maximum output length (400), a warning tells me to manually reduce the max_length parameter. Why is max_length set to 400? Is it a hyperparameter? If I remove it, will the model generate outputs of dynamic length?
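
One way to avoid that warning is to derive the output cap from the input length instead of fixing it at 400. The sketch below is a rough illustration only: `pick_max_length` is a hypothetical helper, the whitespace word count is just a proxy for the tokenizer's count, and the floor of 10 is an arbitrary assumption.

```python
def pick_max_length(text, cap=400, floor=10):
    """Choose an output length cap no larger than the input's word count.

    Whitespace-separated words only approximate the number of tokens,
    so the result is a heuristic rather than an exact bound.
    """
    n_words = len(text.split())
    return max(floor, min(cap, n_words))


short_input = "Amanda: hi Jerry: hello"
long_input = " ".join(["word"] * 500)
short_cap = pick_max_length(short_input)
long_cap = pick_max_length(long_input)
```

The returned value could then be passed as `max_length` in the generation call, with `min_length` kept at or below it.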

Training segment predictor

Hello salesforce,

I am interested in your project but have a question.
When training the segment predictor, where do the gold labels come from?
The problem is that when I run "train_segment_predictor", the data loader only loads examples that have the "segment" label, but the preprocessed data does not contain it.

Thanks for your kind reply.
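
To confirm that the label is really missing from the preprocessed data, a quick check along these lines may help. It assumes, purely for illustration, that each preprocessed example is a dictionary and that the label is stored under the key "segment"; the repository's actual data layout may differ. `partition_by_key` and the sample records are hypothetical.

```python
def partition_by_key(examples, key="segment"):
    """Separate examples that carry the given key from those that do not."""
    labeled = [ex for ex in examples if key in ex]
    unlabeled = [ex for ex in examples if key not in ex]
    return labeled, unlabeled


# Made-up records standing in for preprocessed data.
examples = [
    {"id": "a", "dialogue": "...", "segment": [0, 1, 0]},
    {"id": "b", "dialogue": "..."},
]
labeled, unlabeled = partition_by_key(examples)
print(len(labeled), "with label,", len(unlabeled), "without")
```

If every example ends up in the unlabeled list, the data loader would have nothing to train on, which would match the behavior described above.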

Pre-trained dialogue segmentation model

Dear authors,
Could you please provide access to the pre-trained segmentation model? I suppose it would be really useful for other related tasks as well.
Thanks.
