GERPT - Training a German Generative Transformer Model using N-Gram Multihot Encodings

Experiments for my thesis 🤗

Setup

Install necessary dependencies:

pip install -r requirements.txt

Optional

Compile the CUDA extensions for the N-Gram Multihot approach:

cd cpp
python setup.py install

To run training on GPUs, install PyTorch with CUDA support.

The following tasks can all be run with: tools/run_all.sh

Pre-Training

Pre-Process

The preprocess script builds the vocabulary and the tokenized dataset. The easiest way to configure it is to reuse the training config: data selects the dataset, while saved_dict and saved_data set the output files for the dictionary and the tokenized dataset, respectively.

NOTE: The data setting can be either a Hugging Face dataset or a local one. Local datasets are prefixed with "text/".

python preprocess.py --config configs/base.yaml
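To illustrate the "text/" convention from the note above, here is a minimal sketch of how a data value could be classified. The helper name resolve_data_source and the example values are hypothetical and not taken from preprocess.py; the actual script may handle the prefix differently.

```python
def resolve_data_source(data: str) -> str:
    """Classify a `data` setting as local text or a Hugging Face dataset name."""
    # Assumption: the "text/" prefix is the only marker for local data,
    # as stated in the note. Everything else is treated as a hub dataset.
    return "local" if data.startswith("text/") else "huggingface"


print(resolve_data_source("text/my_corpus"))  # hypothetical local dataset
print(resolve_data_source("wikitext"))        # a dataset name on the hub
```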

Training

The training script trains a standard implementation of either an LSTM or a Transformer model with the N-Gram Multihot approach.

All parameters can be defined in a yaml configuration file. See configs/base.yaml for possible options or run python train.py --help.

python train.py --config configs/base.yaml

Parameters can also be set on the command line. These override the values in the YAML config.
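The override order can be sketched as follows. This is an illustration only, not the code in train.py: the parameter names epochs and lr are hypothetical, and a plain dict stands in for the parsed YAML file.

```python
import argparse


def merge_config(yaml_values: dict, argv: list) -> dict:
    """Return the YAML values with any explicitly passed CLI flags on top."""
    parser = argparse.ArgumentParser()
    # Defaults are None so that "not passed" can be told apart from a real value.
    parser.add_argument("--epochs", type=int, default=None)
    parser.add_argument("--lr", type=float, default=None)
    args = parser.parse_args(argv)

    merged = dict(yaml_values)
    for key, value in vars(args).items():
        if value is not None:  # only flags given on the command line win
            merged[key] = value
    return merged


# The dict stands in for the parsed config; --lr is overridden, epochs is kept.
config = merge_config({"epochs": 10, "lr": 0.001}, ["--lr", "0.01"])
print(config)
```

Using None as the argparse default keeps the YAML value whenever a flag is omitted.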

Downstream Evaluation

For downstream evaluation we use the flair library. Downstream tasks are declared in a separate YAML configuration file (see configs/flair_base.yaml). Training is started for every task whose use setting is True, so multiple tasks can be declared in one file.

python train_ds.py --config configs/flair_base.yaml
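The effect of the use setting can be sketched like this. The task names and the dict layout are hypothetical stand-ins for a parsed config; see configs/flair_base.yaml for the real structure.

```python
# Hypothetical stand-in for the parsed downstream config.
tasks = {
    "ner": {"use": True},
    "sentiment": {"use": False},
    "pos": {"use": True},
}


def enabled_tasks(tasks: dict) -> list:
    """Return the names of all tasks whose `use` flag is set to True."""
    return [name for name, settings in tasks.items() if settings.get("use") is True]


print(enabled_tasks(tasks))  # only these tasks would be trained
```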

Troubleshooting

  • DeepSpeed tries to access temporary folders for CUDA extensions that the user may not have permission to use. If this happens, export TORCH_EXTENSIONS_DIR and point it to a writable location.
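If exporting the variable in the shell is inconvenient, it can also be set from Python before any extension is built. This is a sketch under assumptions: the helper name and the directory below are hypothetical, and any user-writable path should work.

```python
import os


def use_extensions_dir(path: str) -> str:
    """Create `path` if needed and point TORCH_EXTENSIONS_DIR at it."""
    os.makedirs(path, exist_ok=True)
    # Must happen before the CUDA extensions are compiled or loaded.
    os.environ["TORCH_EXTENSIONS_DIR"] = path
    return path


# Hypothetical location inside the user's home directory.
use_extensions_dir(os.path.join(os.path.expanduser("~"), ".cache", "torch_extensions_user"))
```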

Example: character n-grams for "hello world"

Unigrams: h, e, l, l, o, " ", w, o, r, l, d
Bigrams: h, he, el, ll, lo, "o ", " w", wo, or, rl, ld
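The unigram and bigram sequences for "hello world" can be reproduced with a short sketch. It assumes one n-gram per character position, ending at that character, with no padding at the start (which is why the first bigram is just "h"). The function name is hypothetical and the repository's own implementation may differ.

```python
def aligned_ngrams(text: str, n: int) -> list:
    """Return one n-gram per character position, ending at that character.

    Positions near the start yield shorter n-grams because no padding is used.
    """
    return [text[max(0, i - n + 1): i + 1] for i in range(len(text))]


text = "hello world"
print(aligned_ngrams(text, 1))  # unigrams, one per character
print(aligned_ngrams(text, 2))  # bigrams, aligned with the unigrams
```

Both lists have the same length as the input, so every position carries one n-gram per order.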

Contributors

hallerpatrick
