harryposher / ppocoder

This project forked from reddy-lab-code-research/ppocoder

Code for "PPOCoder: Execution-based Code Generation using Deep Reinforcement Learning"

License: MIT License

Shell 0.28% C++ 0.17% Python 98.69% C 0.30% PHP 0.05% Java 0.24% C# 0.28%

PPOCoder

Official Implementation of Execution-based Code Generation using Deep Reinforcement Learning

Overview

Programming language (PL) models pretrained on large-scale code corpora have shown considerable potential for automating software engineering tasks such as code completion, code translation, and program synthesis. However, current approaches mainly rely on supervised fine-tuning objectives borrowed from text generation and neglect sequence-level properties specific to code, such as compilability and syntactic and functional correctness. To address this limitation, we propose PPOCoder, a framework for code generation that combines pretrained PL models with Proximal Policy Optimization (PPO) deep reinforcement learning and incorporates execution feedback into model optimization as an external source of knowledge. PPOCoder is transferable across different code generation tasks and PLs.


Overview of PPOCoder with actor and critic models: An action is sampled from the policy given the source data $x$ (NL or PL). A reward is then obtained for each action to guide and control policy updates. The reward function is composed of four elements: (a) compiler feedback; (b) a syntactic matching score based on ASTs; (c) a semantic matching score based on DFGs; and (d) a KL-divergence penalty between the active policy and the reference pretrained model. The critic model estimates the value based on the obtained reward, and PPOCoder is optimized with PPO, which takes both value and policy optimization into account.
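
As a rough illustration of how the four reward elements above might be combined into one scalar, here is a minimal sketch. It is not taken from the repository: the function name `sequence_reward`, the `RewardWeights` container, the unit weights, and the convention that compiler feedback is +1 or -1 are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RewardWeights:
    """Hypothetical scaling factors; the README does not state how terms are weighted."""
    compiler: float = 1.0
    ast: float = 1.0
    dfg: float = 1.0
    kl: float = 0.1  # assumed to play the role of the KL penalty coefficient


def sequence_reward(
    compiler_feedback: float,
    ast_match: float,
    dfg_match: float,
    kl_divergence: float,
    weights: Optional[RewardWeights] = None,
) -> float:
    """Combine the four reward elements into a single scalar.

    compiler_feedback: assumed +1.0 if the program compiles, -1.0 otherwise.
    ast_match:         syntactic matching score in [0, 1] (AST-based).
    dfg_match:         semantic matching score in [0, 1] (DFG-based).
    kl_divergence:     KL estimate between the active and reference policy.
    """
    w = weights or RewardWeights()
    # The first three terms reward good programs; the KL term is a penalty,
    # so it is subtracted.
    return (
        w.compiler * compiler_feedback
        + w.ast * ast_match
        + w.dfg * dfg_match
        - w.kl * kl_divergence
    )


if __name__ == "__main__":
    # A compilable candidate with partial AST/DFG overlap and some policy drift.
    print(sequence_reward(1.0, 0.5, 0.25, 2.0))
```

Subtracting the KL term keeps the fine-tuned policy close to the pretrained reference model, which is the stated purpose of element (d).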

Environment Installation

To run the code, install the dependencies in requirements.txt.

pip install -r requirements.txt

Datasets

We fine-tune and evaluate models on the following major benchmark datasets for different code generation tasks:

  • CodeSearchNet (CSN) is available here
  • XLCoST is available here
  • APPS is available here
  • MBPP is available here

We preprocess the data and construct input/output sequences in the same manner as outlined in the original benchmark papers. Unzip and place all benchmarks in the data folder.

Run

The run.sh script runs PPO-based fine-tuning of the PL model using the compiler signal. To run it for a different code generation task, configure the following parameters:

| Parameter | Description | Example Value |
| --- | --- | --- |
| l1 | Source Language | java |
| l2 | Target Language | cpp |
| asp | Action Space Size | 5 |
| ns | Number of Synthetic Samples | 10 |
| data_path | Path to the original data samples | data/xlcost/java-cpp/ |
| output_path | Path to save generations and outputs | saved_results/java-cpp/ |
| baseline_output_dir | Path to the base finetuned CodeT5 (before RL) outputs | baselines/saved_models/java-cpp/ |
| load_model_path | Path to the base finetuned CodeT5 model (before RL) for each downstream task | baselines/saved_models/java-cpp/pytorch_model.bin |
| max_source_length | Maximum Source Length | 400 |
| max_target_length | Maximum Target Length | 400 |
| train_batch_size | Training Batch Size | 32 |
| test_batch_size | Testing Batch Size | 48 |
| lr | Learning Rate | 1e-6 |
| kl_coef | Initial coefficient of the KL divergence penalty in the reward | 0.1 |
| kl_target | Target KL value, which adaptively controls the KL coefficient | 1 |
| vf_coef | Coefficient of the value function (vf) error in the PPO loss | 1e-3 |
| run | Index of the run | 1 |

Running run.sh saves generated programs in a .txt file and the model weights at the end of each epoch.
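
The kl_coef and kl_target parameters describe an adaptive KL penalty. The sketch below shows one common way such a controller is written in PPO fine-tuning of language models: a proportional update with a clipped error. Whether this repository uses exactly this rule is an assumption; the class name, the `horizon` parameter, and the clipping bound of 0.2 are illustrative choices, not values from the README.

```python
class AdaptiveKLController:
    """Adjust the KL coefficient so the observed KL drifts toward a target.

    Assumed rule (not confirmed by the README):
        error = clip(current_kl / target - 1, -0.2, 0.2)
        coef  = coef * (1 + error * n_steps / horizon)
    """

    def __init__(self, init_kl_coef: float = 0.1, target: float = 1.0,
                 horizon: int = 10000) -> None:
        self.value = init_kl_coef  # corresponds to kl_coef
        self.target = target       # corresponds to kl_target
        self.horizon = horizon     # hypothetical smoothing horizon

    def update(self, current_kl: float, n_steps: int) -> float:
        # Relative distance from the target, clipped so one batch
        # cannot change the coefficient too aggressively.
        error = max(-0.2, min(0.2, current_kl / self.target - 1.0))
        self.value *= 1.0 + error * n_steps / self.horizon
        return self.value


if __name__ == "__main__":
    controller = AdaptiveKLController(init_kl_coef=0.1, target=1.0, horizon=100)
    # Observed KL above the target: the penalty coefficient grows.
    print(controller.update(current_kl=2.0, n_steps=10))
    # Observed KL below the target: the penalty coefficient shrinks.
    print(controller.update(current_kl=0.5, n_steps=10))
```

With this rule the coefficient rises when the policy drifts further from the reference model than kl_target allows and falls when it stays closer, so kl_coef only sets the starting point.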

Citation

If you find the paper or the repo useful, please cite it with

@article{shojaee2023ppocoder,
  title={Execution-based Code Generation using Deep Reinforcement Learning},
  author={Shojaee, Parshin and Jain, Aneesh and Tipirneni, Sindhu and Reddy, Chandan K},
  journal={arXiv preprint arXiv:2301.13816},
  year={2023}
}
