bastian / abstractive-summarization-of-meetings

The source code for my bachelor's thesis "Abstractive Summarization of Meetings"

License: Apache License 2.0

Python 100.00%
bert machine-learning abstractive-summarization bachelor-thesis texar tensorflow transformer

Abstractive Summarization of Meetings

This project contains the source code for my bachelor's thesis "Abstractive Text Summarization of Meetings".

Requirements

This project was only tested with Python 3.6 but should also work with more recent versions of Python. For dependency versions, see the requirements.txt file.

Execution

Preparing the data

python prepare_data.py

reads the data.[train|dev|test].tsv files and generates three TFRecord data files: train.tf_record, eval.tf_record, and test.tf_record. These files are used for training.
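
Before running the preparation step, it can help to sanity-check the TSV input. The sketch below is not taken from prepare_data.py: it assumes each row holds a source text and a target summary separated by a single tab, and the function name read_tsv_pairs is hypothetical. Adjust the column layout if the actual files differ.

```python
import csv
from typing import Iterable, List, Tuple


def read_tsv_pairs(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse tab-separated lines into (source, target) pairs.

    Assumption (not confirmed by the project): two columns per row,
    source text first and target summary second.
    """
    # QUOTE_NONE keeps quotation marks in transcripts as literal text.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    pairs = []
    for row in reader:
        if len(row) != 2:
            continue  # skip rows that do not match the assumed layout
        source, target = row[0].strip(), row[1].strip()
        if source and target:
            pairs.append((source, target))
    return pairs
```

Any iterable of lines works, so an open file handle (opened with encoding="utf-8" and newline="") can be passed directly. Comparing the number of returned pairs with the number of lines in the file gives a quick indication of how many rows were skipped.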

Training

python main.py --run_mode=train_and_evaluate

starts the training.

Testing

python main.py --run_mode=test

can be used to calculate BLEU and ROUGE scores on the test data. It prints the results to the console and writes the three files test-inputs.txt, test-predictions.txt, and test-targets.txt to the /outputs folder. These files contain the sentences in a human-readable format.
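
To give a feel for what a ROUGE score measures, here is a minimal sketch of a ROUGE-1 F1 computation on whitespace tokens. It is an illustration only and not the scorer this project uses: real ROUGE implementations typically add proper tokenization, stemming, and further variants such as ROUGE-2 and ROUGE-L, so the numbers will not match the console output. The function name rouge_1_f1 is hypothetical.

```python
from collections import Counter


def rouge_1_f1(prediction: str, target: str) -> float:
    """Unigram-overlap F1 between a predicted and a reference summary.

    Simplification: lowercased whitespace tokens, no stemming.
    """
    pred_tokens = prediction.lower().split()
    target_tokens = target.lower().split()
    if not pred_tokens or not target_tokens:
        return 0.0
    # Clipped overlap: each token counts at most as often as it
    # appears in both texts.
    overlap = sum((Counter(pred_tokens) & Counter(target_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(target_tokens)
    return 2 * precision * recall / (precision + recall)
```

Assuming test-predictions.txt and test-targets.txt hold one aligned entry per line (an assumption about the output layout), the function could be applied pairwise to the lines of both files and the results averaged.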

Predicting

python main.py --run_mode=predict

takes the content of the /data/predict.txt file and creates two files in the output folder: predict-inputs.txt and predict-predictions.txt.

Credits

Data

The data from the predict.txt and data.[train|dev|test].tsv files is taken from the AMI Corpus and processed using the NITE XML Toolkit. The code that parses the corpus can be found at Meeting-Parser.

License

The AMI Corpus license can be found here: AMI Meeting Corpus License.

Code

Main parts of the code are taken from the Texar examples for BERT and Transformers. They can be found under the following links:

These examples are licensed under the Apache License 2.0. Copied files contain a link to their original version in the file header. Any of my modifications are also licensed under the same license.

Inspiration

This project was inspired by the GitHub repository Abstractive Summarization With Transfer Learning, but it uses no source code from that repository. That repository is also based on the Texar examples and therefore contains similar code.

License

This project is licensed under the Apache License 2.0.

abstractive-summarization-of-meetings's Issues

link to model

My hardware doesn't support training this model, and Google Colab runs into memory issues. Could anyone please share a Drive link to a trained model, or anything similar?

problem in data

Hello,
I'm trying to add my own meeting text, but the results in the file predict-predictions.txt are not good.
Do I have to run Meeting-Parser before adding the text?
