shivam-dave / automated-text-summarization

This repository contains code for training and evaluating models for automated text summarization using deep learning techniques. It includes preprocessing steps, model training, evaluation on test data, and utilities for tokenization and sequence padding.

License: MIT License

Python 100.00%

Automated Text Summarization

This repository hosts scripts for building and evaluating deep learning models for text summarization. It preprocesses and tokenizes datasets such as CNN/Daily Mail, trains LSTM-based sequence-to-sequence models on them, and evaluates the trained models on test sets.

Features

  • Data Preprocessing: Includes text cleaning, tokenization, and padding of sequences.
  • Model Architecture: LSTM-based encoder-decoder model with an attention mechanism for text summarization.
  • Training and Evaluation: Scripts for training models, evaluating performance metrics, and saving tokenizers for reproducibility.
  • Utilities: Tokenizer management and data generator functions for efficient model training.
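
The preprocessing steps listed above (cleaning, tokenization, padding) can be illustrated with a minimal sketch. It uses only the Python standard library rather than the Keras utilities the repository's scripts may rely on. The function names, the reserved ids, and the cleaning rule are assumptions made for illustration, not the repository's actual code.

```python
import re

PAD_ID = 0  # assumed id reserved for padding
OOV_ID = 1  # assumed id reserved for out-of-vocabulary words


def clean_text(text):
    """Lowercase, replace non-alphanumeric characters with spaces, collapse whitespace."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def build_vocab(texts, max_words=None):
    """Map each word to an integer id, most frequent first (ids start at 2)."""
    counts = {}
    for text in texts:
        for word in clean_text(text).split():
            counts[word] = counts.get(word, 0) + 1
    # Sort by descending frequency, then alphabetically so the result is deterministic.
    ordered = sorted(counts, key=lambda w: (-counts[w], w))
    if max_words is not None:
        ordered = ordered[:max_words]
    return {word: idx for idx, word in enumerate(ordered, start=2)}


def texts_to_padded(texts, vocab, max_len):
    """Convert texts to fixed-length id sequences: truncate, then pad at the end."""
    padded = []
    for text in texts:
        ids = [vocab.get(w, OOV_ID) for w in clean_text(text).split()]
        ids = ids[:max_len]
        padded.append(ids + [PAD_ID] * (max_len - len(ids)))
    return padded


articles = ["The cat sat on the mat.", "The dog barked!"]
vocab = build_vocab(articles)
sequences = texts_to_padded(articles, vocab, max_len=5)
print(sequences)
```

Every sequence comes out with the same length, which is what allows articles of different sizes to be batched together for an LSTM encoder.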

Requirements

  • TensorFlow
  • Keras
  • NumPy
  • Pandas
  • NLTK
  • tqdm
  • rouge-score
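
The rouge-score dependency suggests that summaries are scored with ROUGE metrics. As a rough illustration of what ROUGE-1 measures, the sketch below computes unigram overlap between a reference summary and a generated one in plain Python. It is a simplified stand-in, not the rouge-score package: it splits on whitespace and applies no stemming, and the function name `rouge1` is an assumption.

```python
from collections import Counter


def rouge1(reference, candidate):
    """Unigram-overlap precision, recall, and F1 between two summaries."""
    ref_tokens = reference.lower().split()
    cand_tokens = candidate.lower().split()
    # Clipped overlap: each word counts at most as often as it appears in both texts.
    overlap = sum((Counter(ref_tokens) & Counter(cand_tokens)).values())
    precision = overlap / len(cand_tokens) if cand_tokens else 0.0
    recall = overlap / len(ref_tokens) if ref_tokens else 0.0
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


scores = rouge1("the cat sat on the mat", "the cat lay on the mat")
print(scores)
```

Five of the six words overlap in this example, so precision, recall, and F1 all equal 5/6.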

Installation

  1. Clone the repository:
```bash
git clone https://github.com/yourusername/Automated-Text-Summarization.git
cd Automated-Text-Summarization
```

  2. Install dependencies:
```bash
pip install -r requirements.txt
```

  3. Run the scripts:
```bash
python train_model.py
```

Dataset

The project uses the following dataset:

  • Dataset Name: CNN/Daily Mail Dataset
  • Size: Approximately 530 MB

To download the dataset, visit the CNN/Daily Mail Dataset page on Kaggle.

Usage

  1. Prepare your dataset in CSV format with 'article' and 'highlights' columns.
  2. Adjust model parameters and hyperparameters in the scripts as needed.
  3. Run scripts for training, evaluation, and inference.
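
As a sketch of the first step, the snippet below reads a CSV with 'article' and 'highlights' columns into (article, summary) pairs and checks that both columns are present. It uses the standard library `csv` module instead of Pandas to stay dependency-free. The function name `load_pairs`, the in-memory sample data, and the choice to skip rows with an empty field are assumptions for illustration.

```python
import csv
import io


def load_pairs(csv_file):
    """Read (article, highlights) pairs from an open CSV file object."""
    reader = csv.DictReader(csv_file)
    missing = {"article", "highlights"} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    pairs = []
    for row in reader:
        article = (row["article"] or "").strip()
        highlights = (row["highlights"] or "").strip()
        # Skip rows where either field is empty (an assumed cleaning rule).
        if article and highlights:
            pairs.append((article, highlights))
    return pairs


# In-memory sample standing in for a real dataset file.
sample = io.StringIO(
    "id,article,highlights\n"
    '1,"Long news story text, with a comma.",Short summary.\n'
    "2,,Summary without an article.\n"
)
pairs = load_pairs(sample)
print(pairs)
```

For a file on disk, the same function can be called on a handle opened with `open(path, newline="", encoding="utf-8")`, where `path` points at your own CSV.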

License

This project is licensed under the MIT License. See the LICENSE file for details.

Contact

For any questions or issues, please contact [email protected] .
