cnn_lstm_seq2seq's Introduction

CNN_LSTM_Seq2Seq

Abstractive Text Summarization Using Sequence to Sequence Model

Project Overview

Unlike extractive summarization, which copies sentences or phrases verbatim from the input, abstractive text summarization generates summaries by compressing the information in the input text in a lossy manner such that the main ideas are preserved. Its advantage is that it can use words that do not appear in the text and reword the information to make the summaries more readable. In this project, a CNN-LSTM encoder and an LSTM decoder generate headlines for articles from the Gigaword dataset. To improve the quality of the generated summaries, a Bahdanau attention mechanism, a pointer-generator network, and a beam-search inference decoder are added to the model.
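To make the pointer-generator idea concrete, here is a minimal pure-Python sketch of how a copy distribution and a vocabulary distribution can be mixed at one decoding step. It is not the repository's implementation: the function name `mix_distributions`, the toy numbers, and the assumption that the two gates sum to 1 (`p_pointer = 1 - p_gen`) are illustrative, and handling of out-of-vocabulary source words is left out.

```python
def mix_distributions(p_gen, generator_prob, attention, source_ids):
    """Combine generator and pointer distributions for one decoding step.

    p_gen          -- probability of generating from the vocabulary (0..1)
    generator_prob -- list of vocabulary probabilities (sums to 1)
    attention      -- attention weight per source position (sums to 1)
    source_ids     -- vocabulary id of the token at each source position
    """
    p_pointer = 1.0 - p_gen  # assumption: the two gates sum to 1
    # Scatter the attention mass onto the vocabulary ids of the source tokens.
    pointer_prob = [0.0] * len(generator_prob)
    for weight, token_id in zip(attention, source_ids):
        pointer_prob[token_id] += weight
    # Weighted sum of the copy distribution and the vocabulary distribution.
    return [p_pointer * ptr + p_gen * gen
            for ptr, gen in zip(pointer_prob, generator_prob)]


# Toy example: vocabulary of 4 ids, source sentence with ids [2, 3, 2].
mixed = mix_distributions(
    p_gen=0.75,
    generator_prob=[0.25, 0.25, 0.25, 0.25],
    attention=[0.5, 0.25, 0.25],
    source_ids=[2, 3, 2],
)
print(mixed)
```

Because id 2 occurs twice in the toy source, it collects the attention mass of both positions, so the mixed distribution favors a word that can be copied from the input while still summing to 1.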

Install

This project requires Python 3.6 and the following Python libraries installed:

You will also need software that can run a Jupyter Notebook.

If you do not have Python installed yet, it is highly recommended that you install the Anaconda distribution of Python, which already has the above packages and more included. Make sure that you select the Python 3.6 installer.

Architecture

[Figure: model architecture diagram]

Hyperparameters

| Parameter | Value |
|---|---|
| Kernel Size | [1,3,5] |
| Filter Size | 100 |
| Encoder Hidden Units | 256 |
| Encoder Layers | 1 |
| Decoder Hidden Units | 512 |
| Decoder Layers | 1 |
| Beam Width | 10 |
| Embedding | 300d GloVe |
| Dropout | 0.5 |
| Loss Function | torch.nn.CrossEntropyLoss |
| Optimizer | Adam |
| Learning Rate | 0.001 |
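The beam width of 10 controls how many partial headlines the inference decoder keeps alive at each step. The following is a generic sketch of beam search, not the code used in this repository: `beam_search`, `step_fn`, and the toy probability table are hypothetical, and refinements such as length normalization are omitted.

```python
import math


def beam_search(step_fn, start_token, end_token, beam_width=10, max_len=20):
    """Return the highest-scoring token sequence found by beam search.

    step_fn(prefix) must return a dict {token: probability} for the next token.
    """
    beams = [([start_token], 0.0)]  # (tokens, summed log-probability)
    finished = []
    for _ in range(max_len):
        candidates = []
        for tokens, score in beams:
            for token, prob in step_fn(tokens).items():
                if prob > 0.0:
                    candidates.append((tokens + [token], score + math.log(prob)))
        # Keep only the beam_width best partial hypotheses.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, score in candidates[:beam_width]:
            if tokens[-1] == end_token:
                finished.append((tokens, score))
            else:
                beams.append((tokens, score))
        if not beams:
            break
    finished.extend(beams)  # fall back to unfinished hypotheses at max_len
    return max(finished, key=lambda c: c[1])[0]


def toy_step(prefix):
    """Hypothetical next-token table standing in for a trained decoder."""
    table = {
        ("<s>",): {"a": 0.6, "b": 0.4},
        ("<s>", "a"): {"x": 0.5, "y": 0.5},
        ("<s>", "b"): {"z": 0.9, "x": 0.1},
    }
    return table.get(tuple(prefix), {"</s>": 1.0})


greedy = beam_search(toy_step, "<s>", "</s>", beam_width=1)
wider = beam_search(toy_step, "<s>", "</s>", beam_width=2)
print(greedy, wider)
```

With a width of 1 the search behaves like a greedy decoder and commits to "a" because it is the most likely first token. With a width of 2 it also keeps "b" and ends up with the sequence "b z", whose overall probability (0.36) is higher than that of the greedy result (0.30).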

Dataset

The model is trained on the Gigaword corpus found at https://github.com/harvardnlp/sent-summary. The dataset contains the first sentence of articles as the input text and the headlines as the ground-truth summaries.
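As a rough illustration of that layout, the sketch below pairs each input sentence with its headline. It assumes the corpus is stored as two line-aligned text files (one sentence per line, one headline per line); the helper name `read_pairs` and the toy lines are made up for the example.

```python
def read_pairs(article_lines, title_lines):
    """Yield (input_text, summary) pairs from two line-aligned iterables."""
    for article, title in zip(article_lines, title_lines):
        article, title = article.strip(), title.strip()
        if article and title:  # skip pairs where either side is blank
            yield article, title


# Toy, made-up lines standing in for the contents of the two files.
articles = ["the central bank raised interest rates on tuesday .\n", "\n"]
titles = ["central bank raises rates\n", "\n"]
pairs = list(read_pairs(articles, titles))
print(pairs)
```

Open file handles can be passed in place of the lists. Note that `zip` stops at the shorter input, so a length mismatch between the two files would go unnoticed unless it is checked separately.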

Results

The generated summaries achieved a ROUGE-1 score of 29.79, computed with the files2rouge tool.
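ROUGE-1 measures unigram overlap between a generated summary and its reference. The sketch below shows only that core idea for a single pair of sentences. It is a simplified stand-in, not a reimplementation of files2rouge, which may apply tokenization, stemming, or other options not reproduced here; the function name `rouge_1` and the example sentences are made up.

```python
from collections import Counter


def rouge_1(candidate, reference):
    """Return (precision, recall, f1) of unigram overlap between two strings."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Clipped overlap: each word counts at most as often as it appears in both.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0, 0.0, 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


precision, recall, f1 = rouge_1(
    "police arrest two suspects",
    "police arrest two men in robbery",
)
print(precision, recall, f1)
```

In this toy pair, three of the four candidate words also appear in the six-word reference, giving a precision of 0.75, a recall of 0.5, and an F1 of 0.6.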

cnn_lstm_seq2seq's People

Contributors

murak038

cnn_lstm_seq2seq's Issues

Greedy Decoder: RuntimeError: CUDA out of memory during

I get the error below when running the greedy decoder script. Can you suggest the type of GPU and the amount of memory required to run it?
The code, which I ran in Spyder, and its full error output are given below. The machine is a Dell Precision Tower 7920.
with torch.no_grad():
    greedy_time = time.time() # start timer
    loss_greedy = []
    greedy_predict = []
    model.eval()
    # initialize the encoder hidden states
    val_hidden = model.encoder.init_hidden(batch_size=32)
    for x_val, y_val in get_batches(val_text,val_summary,batch_size=32):
        # convert data to PyTorch tensor
        x_val = torch.from_numpy(x_val).to(device)
        y_val = torch.from_numpy(y_val).to(device)
        val_hidden = tuple([each.data for each in val_hidden])
        # run the greedy decoder
        val_loss, prediction = model.inference_greedy(x,y,val_hidden,criterion,batch_size=32)
        loss_greedy.append(val_loss.item())
        greedy_predict.append(prediction)

model.train()
print("Greedy Test: {0} s".format(time.time()-greedy_time))
print("Val Greedy Loss: {:.4f}".format(np.mean(loss_greedy)))

Traceback (most recent call last):

  File "", line 14, in
    val_loss, prediction = model.inference_greedy(x,y,val_hidden,criterion,batch_size=32)

  File "", line 59, in inference_greedy
    logits, d_hidden = self.decoder(dec_input,enc_output,d_hidden,x,batch_size)

  File "/home/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)

  File "", line 74, in forward
    output_probability = torch.mul(p_pointer.unsqueeze(1),pointer_prob) + torch.mul(p_gen.unsqueeze(1),generator_prob)

RuntimeError: CUDA out of memory. Tried to allocate 11.75 MiB (GPU 0; 7.92 GiB total capacity; 5.84 GiB already allocated; 20.75 MiB free; 717.94 MiB cached)
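A rough way to reason about the numbers in this error message is to estimate the size of a single dense tensor. The sketch below is plain arithmetic, not a measurement of the model: `tensor_mib` is a hypothetical helper, and the vocabulary size is an assumption chosen because a float32 tensor of shape (32, 96256) happens to match the 11.75 MiB request exactly. The actual vocabulary size is not stated in the issue.

```python
def tensor_mib(shape, bytes_per_element=4):
    """Size in MiB of a dense tensor (4 bytes per element for float32)."""
    count = 1
    for dim in shape:
        count *= dim
    return count * bytes_per_element / (1024 ** 2)


batch_size = 32        # batch size used in the script above
vocab_size = 96_256    # assumption, not taken from the repository

per_step = tensor_mib((batch_size, vocab_size))
halved = tensor_mib((batch_size // 2, vocab_size))
print(per_step, halved)
```

The failed request is small compared with the 5.84 GiB already allocated, so the card was nearly full before this step. Things that may be worth checking, all of them guesses from the code as posted: whether `prediction` is a CUDA tensor that stays allocated because every batch is appended to `greedy_predict`, whether passing `x` and `y` instead of `x_val` and `y_val` to `model.inference_greedy` is intended, and whether a smaller `batch_size` (passed consistently to `init_hidden`, `get_batches`, and `inference_greedy`) lets the script finish.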
