This project forked from gideon-stein/on-grammar-improvements-of-gpt-2-generation

complete implementation of the paper "On improving GPTs language generation through grammar error rate reduction"

On-grammar-improvements-of-GPT-2-generation

This repository contains the complete code for the paper "On improving GPTs language generation through grammar error rate reduction". Its purpose is to make the experiments reproducible, provide additional insights, and inspire future research.


Getting Started

This repository includes the following things:

  • Documentation of the dataset building process
  • Finetuning, grammar correction, and generation scripts that were used during this research project
  • Documentation of the complete evaluation process
  • A mountain of generated samples that were used for evaluation
  • Documentation of the model combination evaluation
  • Documentation of example generation for our paper

Built on

Installation

To install the dependencies, run:

pip install -r requirements.txt

To recreate the missing empty folders for external resources and saves, run:

python create_empty_folders.py

You should be good to go.
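For readers curious what a folder bootstrap step like this typically involves, here is a minimal sketch. It is not the contents of create_empty_folders.py: the folder names are assumptions taken only from paths that appear in the example commands of this README, and the function name is hypothetical.

```python
from pathlib import Path

# Hypothetical folder names, inferred only from paths used in this README's
# example commands; the real list is defined in create_empty_folders.py.
ASSUMED_FOLDERS = ["model_save", "data_files", "data"]


def create_empty_folders(root, folders=ASSUMED_FOLDERS):
    """Ensure each folder exists under root and return the resulting paths."""
    ensured = []
    for name in folders:
        path = Path(root) / name
        # exist_ok=True makes the call safe to repeat on an existing checkout.
        path.mkdir(parents=True, exist_ok=True)
        ensured.append(path)
    return ensured
```

Calling `create_empty_folders(".")` a second time is harmless, since folders that already exist are left untouched.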

The following external resources should be added in order to retrace all steps:

Finetuning scripts usage:

python run_lm_finetuning_frozen_3.py --output_dir=model_save/test --model_type=gpt2 --model_name_or_path=gpt2 --do_train --train_data_file=train.txt 
python run_generation_edited.py  --output_dir=model_save/test --model_type=gpt2 --model_name_or_path=gpt2 --do_train --train_data_file=data_files/train.txt

Generation scripts usage:

python transgenerator_translation.py --model_path=../trained_models/test/pytorch_model.bin --text_path ../build_data/test.txt --n_data 1000 --save_path test.p
python run_generation_edited.py  --model_name_or_path=model_save/test/pytorch_model.bin --save_name test 

Parameters can be added or changed according to the options each script accepts.
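When running many variations of these commands, it can help to assemble them programmatically. The following is a small sketch of one way to do that; `build_command` is a hypothetical helper and not part of this repository, and the flags are copied from the finetuning example above.

```python
import sys


def build_command(script, **params):
    """Turn keyword arguments into a `python script --key=value` argument list."""
    command = [sys.executable, script]
    for key, value in params.items():
        if value is True:
            # Boolean switches such as --do_train take no value.
            command.append(f"--{key}")
        else:
            command.append(f"--{key}={value}")
    return command


# Flags copied from the finetuning example in this README.
finetune = build_command(
    "run_lm_finetuning_frozen_3.py",
    output_dir="model_save/test",
    model_type="gpt2",
    model_name_or_path="gpt2",
    do_train=True,
    train_data_file="train.txt",
)
print(" ".join(finetune[1:]))
# To actually launch it: subprocess.run(finetune, check=True)
```

Flags that expect an explicit value, such as `--save_replace True` in the grammar correction commands, would be passed as the string `"True"` rather than the boolean, so that the value is written out.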

Grammar correction scripts usage:

python grammar_parser_json.py --path data/small-117M.train.jsonl --save_replace True --name test
python grammar_parser_txt.py --path base.txt --save_replace True --name test
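To give a rough idea of the kind of processing a JSONL correction pass involves, here is a minimal sketch. It is not the implementation of grammar_parser_json.py: the `"text"` field name, the `correct_jsonl` helper, the toy checker, and the errors-per-word rate are all assumptions made for illustration. A real run would plug in an actual grammar checker in place of `toy_correct`.

```python
import io
import json


def correct_jsonl(lines, correct, text_key="text"):
    """Apply `correct` to the text of every JSON line.

    `correct` is a placeholder callable returning (corrected_text, n_errors);
    `text_key` is an assumption about the record layout.
    """
    records, total_errors, total_words = [], 0, 0
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        corrected, n_errors = correct(record[text_key])
        total_errors += n_errors
        total_words += len(record[text_key].split())
        record[text_key] = corrected
        records.append(record)
    # Errors per whitespace-separated word; one of several possible definitions.
    rate = total_errors / total_words if total_words else 0.0
    return records, rate


def toy_correct(text):
    """Stand-in checker: collapses doubled spaces and counts them as errors."""
    return text.replace("  ", " "), text.count("  ")


sample = io.StringIO('{"text": "a  b c"}\n{"text": "d e"}\n')
records, rate = correct_jsonl(sample, toy_correct)
print(records[0]["text"], rate)
```

Passing the checker in as a callable keeps the file handling separate from the choice of grammar tool, so the same loop can be reused for plain text files by wrapping each line in a record.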

Authors

  • Gideon Stein - Initial work - Github
