Comments (5)
@yuewang-cuhk
The original dataset for SourceSum is from https://github.com/sriniiyer/codenn/tree/master/data/stackoverflow
For the training data, "_silvia.tsv" is the correct one after our preprocessing.
For the test data, the CodeNN group provided human annotations for around 100 records each for C# and SQL. For example, you can find the SQL ones here: https://github.com/sriniiyer/codenn/tree/master/data/stackoverflow/sql/eval. As described in their paper, the BLEU score is calculated over the records that have both the human-annotated summaries and the text from Stack Overflow. We followed their steps and evaluated only these ~100 records, all of which are contained in our test tsv files. This is mentioned on page 4 of our paper:
> Iyer et al. (2016) asked human annotators to provide two additional titles for 200 randomly chosen code snippets from the validation and test set for SQL and CSharp code. We followed their preprocessing methods and evaluation using the test dataset annotated by human annotators.
from codetrans.
Hi @yuewang-cuhk ,
Thanks for your interest in our work.
In our research, we used the original T5 inference function to make predictions:
https://github.com/google-research/text-to-text-transfer-transformer#decode
Afterward, we used the CodeBERT smoothed BLEU score function to calculate the results:
https://github.com/microsoft/CodeBERT/tree/master/CodeBERT/code2nl
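The CodeBERT evaluation script computes a smoothed sentence-level BLEU. As a rough illustration of the idea only (not the exact script), here is a minimal pure-Python sketch that applies Lin & Och style add-one smoothing to the higher-order n-gram precisions; the function names are ours:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def smoothed_bleu(reference, hypothesis, max_n=4):
    """Sentence-level BLEU with add-one smoothing for n >= 2 (sketch)."""
    ref, hyp = reference.split(), hypothesis.split()
    precisions = []
    for n in range(1, max_n + 1):
        ref_counts, hyp_counts = ngrams(ref, n), ngrams(hyp, n)
        match = sum((ref_counts & hyp_counts).values())  # clipped matches
        total = sum(hyp_counts.values())
        if n == 1:
            p = match / total if total else 0.0
        else:
            p = (match + 1) / (total + 1)  # Lin & Och add-one smoothing
        precisions.append(p)
    if min(precisions) == 0:
        return 0.0
    log_avg = sum(math.log(p) for p in precisions) / max_n
    # brevity penalty for hypotheses shorter than the reference
    bp = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / max(len(hyp), 1))
    return bp * math.exp(log_avg)
```

For matching reference and hypothesis this returns 1.0; shorter or partially matching hypotheses score lower.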
Due to the complexity of T5 and its slow inference speed, we decided to convert all our models to the Hugging Face library, which is much faster and easier for researchers.
The difference in the smoothed BLEU results comes from the beam search configuration used in the T5 library compared to the Hugging Face library.
In T5, they used a beam size of 4 and a decode alpha of 0.6:
https://github.com/google-research/text-to-text-transfer-transformer/blob/master/t5/models/gin/beam_search.gin
To approximately match the same configuration in Hugging Face, you have to adjust the beam search settings as follows:
```python
preds = pipeline(tokenized_input,
                 min_length=1,
                 max_length=1024,
                 num_beams=4,
                 temperature=0,
                 length_penalty=0.6)
```
Here are the expected results of T5 vs. Hugging Face for JavaScript Code Documentation Generation:
Library/Model | Small | Base | Large |
---|---|---|---|
T5 with beam search | 17.23 | 18.25 | 18.98 |
HuggingFace without beam search | 15.8 | 16.96 | 17.67 |
HuggingFace with beam search | 17.1 | 18.13 | 18.94 |
T5 - HuggingFace difference (using beam search) | 0.13 | 0.12 | 0.04 |
As you can see, with the correct beam search configuration you can approximately match the T5 results on Hugging Face.
The small, insignificant remaining difference between the T5 and Hugging Face results is due to their different beam search implementations.
For example, Hugging Face calculates the length penalty differently than T5 (which is based on Mesh TensorFlow):
https://github.com/huggingface/transformers/blob/996a315e76f6c972c854990e6114226a91bc0a90/src/transformers/generation_beam_search.py#L368
https://github.com/tensorflow/mesh/blob/985151bc4e787be3c99174d0d0eee743a4cb8561/mesh_tensorflow/beam_search.py#L261
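To make the difference concrete, here is a small sketch of the two normalization formulas as I read them from the linked code (the helper names are ours, and the formulas are paraphrased, not copied): transformers divides the summed log-probabilities by `length ** length_penalty`, while Mesh TensorFlow uses a GNMT-style penalty `((5 + length) / 6) ** alpha`.

```python
def hf_length_penalty(length, length_penalty):
    """Hugging Face style: beam score = sum_logprobs / length ** length_penalty."""
    return length ** length_penalty

def mesh_tf_length_penalty(length, alpha):
    """Mesh TensorFlow / GNMT style: beam score = sum_logprobs / ((5 + len) / 6) ** alpha."""
    return ((5.0 + length) / 6.0) ** alpha
```

For the same `length_penalty = alpha = 0.6`, the two divisors diverge as sequences get longer, which is why the scores only approximately match even with identical beam sizes.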
I have created three Colab examples that should reproduce the above results:
https://colab.research.google.com/drive/10PwFRsY8P2uMc3SGr7WRgqQXFxjzbj83?usp=sharing
https://colab.research.google.com/drive/1vc84NthgeLNLxOH6eUqbh_5UIuD-Mh4s?usp=sharing
https://colab.research.google.com/drive/1YvXt5vYL6HJDPW37tWv9f_r-p-TJfqLs?usp=sharing
By simply following the above examples for the rest of the languages/models, you should be able to reproduce our results.
Regarding preprocessing, you don't need to tokenize the source code with tree_sitter for the CodeBERT dataset because it is already preprocessed. You only need to do so for a new example that you want to predict on.
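For a new, unpreprocessed snippet, the tokenization step just turns raw source code into a space-separated token sequence before it is fed to the model. CodeTrans uses tree_sitter parsers for this step; purely as a stand-in illustration, and for Python code only, the standard library `tokenize` module produces a similar space-separated form:

```python
import io
import tokenize

def tokenize_python_snippet(code):
    """Split a Python snippet into space-separated tokens.

    Illustration only: CodeTrans uses tree_sitter parsers for this step
    across languages; this stdlib-based version works for Python code.
    """
    tokens = []
    for tok in tokenize.generate_tokens(io.StringIO(code).readline):
        # drop layout/bookkeeping tokens and comments, keep code tokens
        if tok.type in (tokenize.NEWLINE, tokenize.NL, tokenize.INDENT,
                        tokenize.DEDENT, tokenize.ENDMARKER, tokenize.COMMENT):
            continue
        tokens.append(tok.string)
    return " ".join(tokens)
```

For example, `tokenize_python_snippet("def add(a, b): return a + b")` yields `"def add ( a , b ) : return a + b"`, which matches the space-separated style of the preprocessed dataset.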
I hope the above explanation answers your questions.
Out of curiosity, why are you reproducing our results?
Are you planning to use them internally at Salesforce, preparing a new publication, or something else?
Hi @agemagician, many thanks for your quick and detailed response! We have been able to reproduce the code documentation generation tasks following your instructions. We are planning to compare against CodeTrans in our new publication.
By the way, we can only find the provided training set, but not the dev and test sets. Could you also kindly share the tokenized dev and test datasets to facilitate an easy comparison with CodeTrans on all downstream tasks? Thanks in advance!
You are welcome!
Sure, we have updated the README with the dataset links:
https://www.dropbox.com/sh/mzxa2dq30gnot29/AABIf7wPxH5Oe0PZHJ5jPV22a?dl=0
Feel free to send me an email or a LinkedIn message if you want to discuss the new publication.
My co-author @matchlesswei and I will be happy to discuss it.
Hi @agemagician, thanks for sharing these datasets. I've checked them and confirmed that most of them match the data statistics in the paper, except for the "SourceSum" task, where only the training set (files ending with "_silvia.tsv") has the matching size. Could you help check that? The line counts for all files in the "SourceSum" folder are listed below:
```
 6252 testC#
 6629 testCS_silvia.tsv
 2662 testPython
 2659 testPython.txt
 2783 testPython_silvia.tsv
 2932 testSQL
 3340 testSQL_silvia.tsv
49801 trainC#
52943 trainCS_silvia.tsv
11461 trainPython
11458 trainPython.txt
12004 trainPython_silvia.tsv
22492 trainSQL
25671 trainSQL_silvia.tsv
 6241 valC#
 2647 valPython
 2651 valPython.txt
 2858 valSQL
```