PT4code

The repository for the ESEC/FSE 2022 paper "No More Fine-Tuning? An Experimental Evaluation of Prompt Tuning in Code Intelligence".

This repository contains code for all three tasks. Each task is described in detail in CodeXGLUE.

You can also design and experiment with your own prompt templates :).
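As a starting point for your own templates, here is a minimal sketch (not the repository's code) of how a template written in the notation of the results tables below expands into model input. It assumes `[x]` marks the input code, `[z]` the answer slot that becomes the model's mask token, and `[SOFT]` a soft prompt token. `fill_template`, the `<mask>` default (the RoBERTa-style mask token used by CodeBERT) and the `<soft>` placeholder are illustrative choices. Real soft tokens are learnable embeddings, so the string placeholder only marks their positions.

```python
def fill_template(template: str, code: str, mask_token: str = "<mask>",
                  soft_token: str = "<soft>") -> str:
    """Expand a template such as "[SOFT]*10 [x] [z]" into model input text.

    Hypothetical helper: [x] -> input code, [z] -> mask token,
    [SOFT] -> one soft-token placeholder, [SOFT]*N -> N placeholders.
    """
    pieces = []
    for part in template.split():
        if part.startswith("[SOFT]*"):
            # "[SOFT]*N" stands for N consecutive soft (learnable) tokens.
            n = int(part[len("[SOFT]*"):])
            pieces.extend([soft_token] * n)
        elif part.lower() == "[soft]":
            pieces.append(soft_token)
        elif part.lower() == "[x]":
            pieces.append(code)
        elif part.lower() == "[z]":
            pieces.append(mask_token)
        else:
            # Any other word is a literal word of a hard (manual) template.
            pieces.append(part)
    return " ".join(pieces)


if __name__ == "__main__":
    print(fill_template("the code [x] is [z]", "int f() { return 0; }"))
    print(fill_template("[SOFT]*2 [x] [z]", "x = 1"))
```

For example, `fill_template("[x] the code is [z]", "int f(){return 0;}")` yields `int f(){return 0;} the code is <mask>`; pass the model's own mask or sentinel token (e.g. for CodeT5) via `mask_token`.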

Defect Detection

First, download the dataset.

cd dataset
pip install gdown
gdown "https://drive.google.com/uc?id=1x6hoF7G-tSYxg8AFybggypLZgMGDNHfF"
cd ..

We provide both a prompt-tuning version and a fine-tuning version.

To prompt-tune CodeBERT, run:

cd defect/prompt
python codebert.py

To prompt-tune CodeT5, run:

cd defect/prompt
python prompt_t5_2.py --visible_gpu <GPU> --data_dir=../dataset --max_source_length 512 --max_target_length 3 

To fine-tune CodeT5, we provide both the official CodeT5 implementation and our own implementation in:

cd defect/finetune

Code Summarization

Download the dataset, where {LANG} is one of the six CodeSearchNet languages: ruby, javascript, go, python, java, or php.

cd summarization/data
wget https://s3.amazonaws.com/code-search-net/CodeSearchNet/v2/{LANG}.zip
unzip {LANG}.zip
python preprocess.py
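If you want all six languages, a small helper can print the download commands for each of them. This is a sketch, assuming the `{LANG}` values match the CodeSearchNet archive names (`ruby`, `javascript`, `go`, `python`, `java`, `php`); `download_commands` is a hypothetical name, not part of the repository.

```python
from typing import List, Sequence

# The six CodeSearchNet languages that {LANG} can take.
LANGS = ["ruby", "javascript", "go", "python", "java", "php"]
BASE_URL = "https://s3.amazonaws.com/code-search-net/CodeSearchNet/v2/{lang}.zip"


def download_commands(langs: Sequence[str] = LANGS) -> List[str]:
    """Return the wget/unzip commands shown above, once per language, in order."""
    cmds: List[str] = []
    for lang in langs:
        cmds.append("wget " + BASE_URL.format(lang=lang))
        cmds.append(f"unzip {lang}.zip")
    return cmds


if __name__ == "__main__":
    for cmd in download_commands():
        print(cmd)
```

Run the printed commands inside summarization/data, then run preprocess.py as shown above.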

To fine-tune CodeT5, or to prompt-tune it and try your own templates, run:

cd summarization
python finetune_t5_gene.py --visible_gpu <GPU> --lang {LANG} --max_source_length 256 --max_target_length 128 --log_name=./log/{LANG}.log

For prompt tuning, use prompt_t5.py in place of finetune_t5_gene.py.
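To sweep over languages, the command above can be generated per language. This is a sketch: `summarization_command` is a hypothetical helper, and it assumes that prompt_t5.py accepts the same flags as finetune_t5_gene.py, which you should check against that script's argument parser.

```python
import shlex
from typing import List


def summarization_command(lang: str, gpu: int, prompt: bool = False) -> List[str]:
    """Build argv for one summarization run with the flags shown above.

    Assumption: prompt_t5.py takes the same flags as finetune_t5_gene.py.
    """
    script = "prompt_t5.py" if prompt else "finetune_t5_gene.py"
    return [
        "python", script,
        "--visible_gpu", str(gpu),
        "--lang", lang,
        "--max_source_length", "256",
        "--max_target_length", "128",
        f"--log_name=./log/{lang}.log",
    ]


if __name__ == "__main__":
    # Print one fine-tuning command per CodeSearchNet language on GPU 0.
    for lang in ["ruby", "javascript", "go", "python", "java", "php"]:
        print(shlex.join(summarization_command(lang, gpu=0)))
```

Returning an argument list rather than a single string lets you pass it straight to `subprocess.run` without shell quoting issues; `shlex.join` is only used for printing.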

Code Translation

Download the dataset from CodeXGLUE:

cd translation/data
python preprocess.py

The commands for fine-tuning and prompt tuning are similar to those for code summarization.

Full Results

Defect Detection

In the templates, [x] is the input code, [z] is the answer slot, and [SOFT] is a learnable soft-prompt token. ACC is accuracy (%).

Template | Verbalizer | ACC
[x] the code is [z] | bad, defective & clean, perfect | 63.68
the code [x] is [z] | bad, defective & clean, perfect | 64.17
[x] it is [z] | bad, defective & clean, perfect | 63.98
a [z] code [x] | bad, defective & clean, perfect | 63.36
the code [x] is [z] | yes & no | 63.08
the code [x] is [z] | bad, defective & indefective, perfect | 64.28
the code [x] is [z] | bad & perfect | 63.71
the code [x] is [z] | bad, defective, insecure & clean, perfect, secure | 63.26
the code [x] is [z] | bad, defective, insecure, vulnerable & clean, perfect, secure, invulnerable | 63.10
[SOFT] [z] [SOFT] [x] | bad, defective & clean, perfect | 62.95
[x] [SOFT]*2 [z] | bad, defective & clean, perfect | 62.77
[x] [SOFT]*3 [z] | bad, defective & clean, perfect | 63.15
[SOFT]*10 [x] [z] | bad, defective & clean, perfect | 62.52
[SOFT]*50 [x] [z] | bad, defective & clean, perfect | 62.96
[SOFT]*100 [x] [z] | bad, defective & clean, perfect | 62.46
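The verbalizer column lists the label words for each class, separated by &. The sketch below shows one common way a manual verbalizer turns the model's predictions at [z] into a class: average the probabilities of each class's label words and take the argmax. It is an illustration, not the repository's implementation (summing probabilities or averaging log-probabilities are equally common choices); `predict`, `accuracy`, and the class names `defective` and `clean` are assumptions.

```python
from typing import Dict, List

# Label words of the first verbalizer in the table above (class -> words).
VERBALIZER: Dict[str, List[str]] = {
    "defective": ["bad", "defective"],
    "clean": ["clean", "perfect"],
}


def predict(word_probs: Dict[str, float],
            verbalizer: Dict[str, List[str]] = VERBALIZER) -> str:
    """Return the class whose label words have the highest mean probability.

    word_probs maps a label word to the model's probability for it at the
    [z] position; in a real run these come from the (masked) LM head.
    """
    def score(label: str) -> float:
        words = verbalizer[label]
        return sum(word_probs.get(w, 0.0) for w in words) / len(words)

    return max(verbalizer, key=score)


def accuracy(preds: List[str], golds: List[str]) -> float:
    """Accuracy in percent, the ACC metric reported in the tables."""
    if not golds or len(preds) != len(golds):
        raise ValueError("preds and golds must be non-empty and equally long")
    correct = sum(p == g for p, g in zip(preds, golds))
    return 100.0 * correct / len(golds)


if __name__ == "__main__":
    probs = {"bad": 0.30, "defective": 0.10, "clean": 0.05, "perfect": 0.20}
    print(predict(probs))  # mean 0.20 for defective vs 0.125 for clean
```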
Setting | CodeT5-small ACC | CodeT5-base ACC
Defect [X] [Z] | 63 | 64.98
prefix 50 | 62.34 | 64.59
prefix 100 | 62.65 | 64.7
prefix 150 | 63.52 | 65.66
prefix 200 | 63.91 | 65.82
prefix 250 | 63.77 | 65.64

Code Summarization (BLEU)

Model | Method | Ruby | JavaScript | Go | Python | Java | PHP | Overall
CodeT5-small | Fine-tuning | 13.38 | 14.94 | 21.27 | 17.88 | 18.38 | 24.70 |
CodeT5-small | Prompt tuning | 13.60 | 15.91 | 22.33 | 18.34 | 20.60 | 26.95 |
CodeT5-base | Fine-tuning | 13.70 | 15.80 | 22.60 | 17.97 | 19.56 | 25.77 |
CodeT5-base | Prompt tuning | 14.29 | 16.04 | 23.11 | 18.52 | 19.72 | 27.06 |
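To read the summarization table as per-language gains, the difference between prompt tuning and fine-tuning can be computed directly from its values. The numbers below are copied from the table; `gains` is a hypothetical helper.

```python
from typing import Dict

LANGS = ["Ruby", "JavaScript", "Go", "Python", "Java", "PHP"]
# Scores copied from the code summarization table above.
RESULTS = {
    ("CodeT5-small", "fine-tuning"): [13.38, 14.94, 21.27, 17.88, 18.38, 24.70],
    ("CodeT5-small", "prompt tuning"): [13.60, 15.91, 22.33, 18.34, 20.60, 26.95],
    ("CodeT5-base", "fine-tuning"): [13.70, 15.80, 22.60, 17.97, 19.56, 25.77],
    ("CodeT5-base", "prompt tuning"): [14.29, 16.04, 23.11, 18.52, 19.72, 27.06],
}


def gains(model: str) -> Dict[str, float]:
    """Prompt tuning minus fine-tuning, per language, rounded to two decimals."""
    ft = RESULTS[(model, "fine-tuning")]
    pt = RESULTS[(model, "prompt tuning")]
    return {lang: round(p - f, 2) for lang, f, p in zip(LANGS, ft, pt)}


if __name__ == "__main__":
    for model in ("CodeT5-small", "CodeT5-base"):
        print(model, gains(model))
```

Every difference is positive for both model sizes, i.e. prompt tuning scores higher than fine-tuning in all six languages in this table; the largest gains for CodeT5-small are on Java (2.22) and PHP (2.25).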

Low Resource

Python | 100 | 200 | 300 | 500 | 1000 | 1%
CodeT5-small | 5.42 | 7.62 | 7.89 | 11.58 | 13.23 | 14.01
CodeT5-small+PT | 6.55 | 9.28 | 9.6 | 12.73 | 13.89 | 14.33
CodeT5-base | 5.8 | 8.46 | 9.36 | 13.58 | 13.86 | 14.22
CodeT5-base+PT | 7.82 | 10.78 | 12.63 | 14.77 | 14.78 | 14.81

Ruby | 100 | 200 | 300 | 500 | 1000 | 1%
CodeT5-small | 4.82 | 6.75 | 7.22 | 9.46 | 9.85 | 9.99
CodeT5-small+PT | 6.48 | 7.89 | 8.26 | 10.89 | 10.91 | 10.85
CodeT5-base | 4.93 | 6.83 | 7.19 | 10.1 | 11.22 | 10.36
CodeT5-base+PT | 6.99 | 8.52 | 9.41 | 10.79 | 11.87 | 10.64

PHP | 100 | 200 | 300 | 500 | 1000 | 1%
CodeT5-small | 6.41 | 9.5 | 11.89 | 13.21 | 16.71 | 17.25
CodeT5-small+PT | 7.9 | 12.23 | 14.13 | 16.26 | 17.47 | 17.88
CodeT5-base | 5.52 | 8.9 | 12.83 | 15.59 | 17.65 | 20.65
CodeT5-base+PT | 9.12 | 13.55 | 14.94 | 17.39 | 18.3 | 21.05

Go | 100 | 200 | 300 | 500 | 1000 | 1%
CodeT5-small | 5.24 | 7.18 | 8.65 | 12.99 | 15.05 | 17.65
CodeT5-small+PT | 7.2 | 11.51 | 12.42 | 14.32 | 16.88 | 17.95
CodeT5-base | 7.96 | 9.64 | 10.88 | 13.62 | 16.93 | 19.99
CodeT5-base+PT | 9.07 | 12.15 | 13.66 | 15.04 | 17.74 | 20.54

Java | 100 | 200 | 300 | 500 | 1000 | 1%
CodeT5-small | 2.7 | 3.86 | 5.33 | 6.94 | 7.88 | 10.12
CodeT5-small+PT | 3.56 | 5.89 | 7.35 | 9.9 | 10.44 | 11.18
CodeT5-base | 3.35 | 4.73 | 7.24 | 8.32 | 10.94 | 11.75
CodeT5-base+PT | 6.07 | 7.56 | 10.14 | 11.06 | 11.99 | 12.4

JavaScript | 100 | 200 | 300 | 500 | 1000 | 1%
CodeT5-small | 3.56 | 5.48 | 6.97 | 7.73 | 8.36 | 9.81
CodeT5-small+PT | 5.9 | 7.58 | 8.76 | 9.6 | 10.14 | 11.58
CodeT5-base | 4.14 | 5.6 | 7.07 | 10 | 10.62 | 11.53
CodeT5-base+PT | 6.5 | 8.37 | 9.61 | 11.27 | 11.81 | 12.17
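In these Python numbers, the benefit of prompt tuning is larger when training data is scarce. The sketch computes the relative improvement per column, assuming the column headers (100 to 1000, and 1%) denote training-set sizes; `relative_gain` is a hypothetical helper and the values are copied from the Python rows above.

```python
from typing import List

# Python rows copied from the low-resource table above.
BUDGETS = ["100", "200", "300", "500", "1000", "1%"]
PYTHON = {
    "CodeT5-small": [5.42, 7.62, 7.89, 11.58, 13.23, 14.01],
    "CodeT5-small+PT": [6.55, 9.28, 9.6, 12.73, 13.89, 14.33],
    "CodeT5-base": [5.8, 8.46, 9.36, 13.58, 13.86, 14.22],
    "CodeT5-base+PT": [7.82, 10.78, 12.63, 14.77, 14.78, 14.81],
}


def relative_gain(baseline: List[float], prompt: List[float]) -> List[float]:
    """Relative improvement of prompt tuning over the baseline, in percent (1 decimal)."""
    return [round(100.0 * (p - b) / b, 1) for b, p in zip(baseline, prompt)]


if __name__ == "__main__":
    for size in ("CodeT5-small", "CodeT5-base"):
        gain = relative_gain(PYTHON[size], PYTHON[size + "+PT"])
        print(size, dict(zip(BUDGETS, gain)))
```

For CodeT5-base this gives about 34.8% at 100 examples versus about 4.1% at 1% of the data.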

Code Translation

Model | C#→Java BLEU | C#→Java Accuracy | C#→Java CodeBLEU | Java→C# BLEU | Java→C# Accuracy | Java→C# CodeBLEU
Naive copy | 18.69 | 0 | - | 18.54 | 0 | -
Transformer | 50.47 | 37.90 | 61.59 | 55.84 | 33.00 | 63.74
RoBERTa (code) | 71.99 | 57.90 | 80.18 | 77.46 | 56.10 | 83.07
CodeBERT | 72.14 | 58.00 | 79.41 | 79.92 | 59.00 | 85.10
CodeT5-small Fine-tuning | 78.67 | 65.40 | 82.55 | 82.29 | 63.80 | 87.01
CodeT5-small Prompt tuning | 79.59 | 66.00 | 83.06 | 83.33 | 64.30 | 87.99
CodeT5-base Fine-tuning | 79.45 | 66.10 | 83.96 | 83.61 | 65.30 | 88.32
CodeT5-base Prompt tuning | 79.76 | 66.10 | 84.39 | 83.99 | 65.40 | 88.74

Contributors

banana-boat, anonymousupdatar
