The "Automated Text Summarization" repository hosts scripts for building and evaluating deep learning models for text summarization. It preprocesses and tokenizes datasets such as CNN/Daily Mail, trains LSTM-based sequence-to-sequence models on them, and evaluates the trained models on held-out test sets.
- Data Preprocessing: Includes text cleaning, tokenization, and padding of sequences.
- Model Architecture: LSTM-based encoder-decoder model with attention mechanism for text summarization.
- Training and Evaluation: Scripts for training models, evaluating performance metrics, and saving tokenizers for reproducibility.
- Utilities: Tokenizer management and data generator functions for efficient model training.
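
To make the preprocessing steps listed above concrete, here is a minimal sketch of cleaning, tokenizing, and padding using only the Python standard library. It is not the repository's code: the function names, the `<pad>` and `<oov>` tokens, and the cleaning rules are assumptions chosen for illustration, and the actual scripts may rely on Keras utilities instead.

```python
import re
from collections import Counter

PAD, OOV = "<pad>", "<oov>"  # hypothetical special tokens, not taken from the repository


def clean_text(text):
    """Lowercase, replace anything that is not a letter, digit, or space, collapse whitespace."""
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return re.sub(r"\s+", " ", text).strip()


def build_vocab(texts, max_words=None):
    """Assign integer ids by descending frequency; id 0 is padding, id 1 is out-of-vocabulary."""
    counts = Counter(word for t in texts for word in clean_text(t).split())
    vocab = {PAD: 0, OOV: 1}
    for word, _ in counts.most_common(max_words):
        vocab[word] = len(vocab)
    return vocab


def texts_to_sequences(texts, vocab):
    """Turn each text into a list of ids, mapping unseen words to the OOV id."""
    return [[vocab.get(w, vocab[OOV]) for w in clean_text(t).split()] for t in texts]


def pad_to_length(sequences, maxlen):
    """Truncate or right-pad every sequence with the padding id so all have length maxlen."""
    padded = []
    for seq in sequences:
        seq = seq[:maxlen]
        padded.append(seq + [0] * (maxlen - len(seq)))
    return padded


texts = ["The cat sat.", "The dog sat on the mat!"]
vocab = build_vocab(texts)
padded = pad_to_length(texts_to_sequences(texts, vocab), maxlen=4)
print(padded)  # [[2, 4, 3, 0], [2, 5, 3, 6]]
```

Because `vocab` is a plain dictionary, it can be written to disk with `json.dump` and reloaded later, which is one simple way to keep the word-to-id mapping identical between training and inference.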
- TensorFlow
- Keras
- NumPy
- Pandas
- NLTK
- tqdm
- rouge-score
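
The `rouge-score` dependency suggests that generated summaries are scored with ROUGE. As an illustration of what that metric measures, the following sketch computes ROUGE-1 (unigram overlap) by hand. It is a simplified stand-in written for this README, not the package's implementation: it splits on whitespace and applies no stemming, so its numbers may differ from those reported by `rouge-score`.

```python
from collections import Counter


def rouge1(reference, candidate):
    """Return ROUGE-1 precision, recall, and F1 between a reference and a candidate summary."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())

    # Clipped overlap: each word counts at most as often as it appears in both texts.
    overlap = sum((ref_counts & cand_counts).values())

    precision = overlap / max(sum(cand_counts.values()), 1)
    recall = overlap / max(sum(ref_counts.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


scores = rouge1("the cat sat on the mat", "the cat lay on the mat")
print(scores)
```

In this example five of the six candidate words also appear in the reference, so precision, recall, and F1 all come out to 5/6.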
1. Clone the repository:
```bash
git clone https://github.com/yourusername/Automated-Text-Summarization.git
cd Automated-Text-Summarization
```
2. Install dependencies:
```bash
pip install -r requirements.txt
```
3. Run the scripts:
```bash
python train_model.py
```
4. Dataset:
The project uses the following dataset:
- Dataset Name: CNN/Daily Mail Dataset
- Size: Approximately 530 MB
To download the dataset, visit the CNN/Daily Mail Dataset page on Kaggle.
- Prepare your dataset in CSV format with 'article' and 'highlights' columns.
- Adjust model parameters and hyperparameters in the scripts as needed.
- Run scripts for training, evaluation, and inference.
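
As a rough illustration of the expected input format, the sketch below reads a CSV with 'article' and 'highlights' columns and yields the pairs in fixed-size batches, in the spirit of a data generator. The helper names `read_pairs` and `batches` are hypothetical, and the in-memory sample stands in for a real file; the repository's own loading code may look quite different.

```python
import csv
import io

REQUIRED_COLUMNS = ("article", "highlights")


def read_pairs(csv_file):
    """Yield (article, highlights) tuples from an open CSV file, checking the header first."""
    reader = csv.DictReader(csv_file)
    header = reader.fieldnames or []
    missing = [c for c in REQUIRED_COLUMNS if c not in header]
    if missing:
        raise ValueError(f"CSV is missing required columns: {missing}")
    for row in reader:
        yield row["article"], row["highlights"]


def batches(pairs, batch_size):
    """Group an iterable of pairs into lists of at most batch_size items."""
    batch = []
    for pair in pairs:
        batch.append(pair)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly smaller, batch
        yield batch


# In-memory sample; for a real file use open(path, newline="", encoding="utf-8").
sample = io.StringIO("article,highlights\nA1,H1\nA2,H2\nA3,H3\n")
result = list(batches(read_pairs(sample), batch_size=2))
print(result)  # [[('A1', 'H1'), ('A2', 'H2')], [('A3', 'H3')]]
```

Reading lazily like this avoids loading the whole dataset into memory at once, which matters for a CSV of several hundred megabytes. Extra columns in the file are ignored.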
This project is licensed under the MIT License. See the LICENSE file for details.
For any questions or issues, please contact [email protected].