
saner-2019

Materials for SANER 2019.

Overview

  • Bi-DTBCNN and D-TBCNN code in TensorFlow: /model
  • Baseline code: /baselines
    • n-gram
    • bow
    • tf-idf
    • siamese-lstm
    • gated graph neural networks
  • Data: /data
    • Raw data: /data/code
    • Protobuf: /data/code_pb_slice
    • Pickle: /data/code_pb_slice_pkl
  • Algorithm labels: miscellaneous/algorithms.txt
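The bow and tf-idf baselines presumably represent each program as a bag of tokens. As a rough illustration only, here is a TF-IDF nearest-neighbour classifier in plain Python. It is not the code in /baselines (which, per this README, uses sklearn), and the token lists, labels, and helper names (fit_idf, vectorize, cosine, predict) are hypothetical. The IDF formula follows the default of sklearn's TfidfVectorizer (smooth_idf=True), and tokens unseen during fitting are ignored, as TfidfVectorizer.transform does.

```python
import math
from collections import Counter


def fit_idf(train_docs):
    """Smoothed IDF per token: ln((1 + n) / (1 + df)) + 1."""
    n = len(train_docs)
    # Document frequency: in how many training programs each token appears.
    df = Counter(tok for doc in train_docs for tok in set(doc))
    return {tok: math.log((1 + n) / (1 + d)) + 1.0 for tok, d in df.items()}


def vectorize(doc, idf):
    """Sparse TF-IDF vector (raw counts x IDF); tokens unseen in training are dropped."""
    return {tok: c * idf[tok] for tok, c in Counter(doc).items() if tok in idf}


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * b.get(tok, 0.0) for tok, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict(train_docs, train_labels, query):
    """Give the query the label of its most similar training program."""
    idf = fit_idf(train_docs)
    train_vecs = [vectorize(doc, idf) for doc in train_docs]
    q = vectorize(query, idf)
    scores = [cosine(q, v) for v in train_vecs]
    best = max(range(len(scores)), key=scores.__getitem__)
    return train_labels[best]


# Hypothetical toy data: token sequences standing in for parsed programs.
train_docs = [
    ["for", "i", "in", "range", "swap"],
    ["while", "left", "right", "mid"],
    ["for", "j", "swap", "temp"],
]
train_labels = ["bubble_sort", "binary_search", "bubble_sort"]
prediction = predict(train_docs, train_labels, ["while", "mid", "left"])
```

Here the query shares tokens only with the second program, so it is labelled "binary_search". Because cosine similarity is scale-invariant, skipping the L2 normalisation that TfidfVectorizer applies by default does not change which neighbour is chosen.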

Because the training data is large, we store it on Google Drive.

Download the pretrained embeddings, training data, and testing data from https://drive.google.com/open?id=1aA-l31EwaDETdBtFFZ2EXLYkN0Z6zKgT and the raw GitHub data from https://drive.google.com/open?id=103Ij5KIyL23dHXr0sCj_iNW473cMgupq, and store them in the directory pretrained_embedding/.

We use Python with TensorFlow, Keras, and scikit-learn to build the model and run the baselines.

Results

  • In Table 1, we show that Bi-DTBCNN outperforms the baselines in cross-language binary classification.
  • In Table 2, we show that DTBCNN outperforms the baselines in single-language classification.
  • In Table 3, we perform a sensitivity analysis of how different Bi-NNs are affected as the number of classes increases. Bi-DTBCNN maintains its performance, whereas the performance of Bi-GGNN decreases significantly.
  • In Table 4, we show that the choice of dependency tree can significantly affect performance.

Process

Our data processing consists of the following steps:

  • Remove clones with the NiCad clone detection tool (https://www.txl.ca/).
  • Once the clones are removed, parse the code into Protobuf format using the parser from http://www.srcml.org/.
  • Dump the Protobuf data into Python pickle files for training. Since our codebase is written in Python, all training from this point on is based on the pickle files.
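The final step, dumping parsed trees into pickle files, can be sketched as follows. This is a minimal sketch under stated assumptions: the actual Protobuf schema and the layout of the files in /data/code_pb_slice_pkl are not described here, so the node fields (kind, text, children), the {"trees", "labels"} file layout, and the helper names are hypothetical. Reading the Protobuf files themselves is omitted.

```python
import pickle
import tempfile
from pathlib import Path


def make_node(kind, text="", children=None):
    """Hypothetical node layout: a plain dict, so the pickle needs no custom classes."""
    return {"kind": kind, "text": text, "children": children or []}


def count_nodes(node):
    """Total number of nodes in a tree."""
    return 1 + sum(count_nodes(child) for child in node["children"])


def dump_trees(trees, labels, path):
    """Serialize parsed trees and their algorithm labels into one pickle file."""
    with path.open("wb") as f:
        pickle.dump({"trees": trees, "labels": labels}, f,
                    protocol=pickle.HIGHEST_PROTOCOL)


def load_trees(path):
    """Load trees and labels back; only unpickle files from sources you trust."""
    with path.open("rb") as f:
        data = pickle.load(f)
    return data["trees"], data["labels"]


# Usage: a tiny hand-built tree for "i = 0", round-tripped through a temp file.
tree = make_node("function", children=[
    make_node("block", children=[make_node("expr", "i = 0")]),
])
with tempfile.TemporaryDirectory() as tmp:
    pkl_path = Path(tmp) / "train.pkl"
    dump_trees([tree], ["hypothetical_label"], pkl_path)
    loaded_trees, loaded_labels = load_trees(pkl_path)
```

Storing trees as nested dicts and lists keeps the pickle files loadable without importing any project-specific class, at the cost of losing type checks on node fields.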
