
skyhover / deckard


Code clone detection; clone-related bug detection; semantic clone analysis

License: Other

Shell 4.71% Python 8.96% Java 1.31% C 42.03% C++ 23.28% Objective-C 0.26% GAP 2.55% Makefile 2.09% Lex 4.24% Yacc 9.98% ANTLR 0.59%

deckard's Issues

What parameters are fine? Need help

Recently I tried Deckard to find bugs in cloned code, but it does not work well on the following Java file. I set
MIN_TOKENS='15' STRIDE='2' SIMILARITY='0.8'.
According to the FSE 07 paper, it is supposed to find the bug.
The buggy line is:
cmp = lhsType.compareTo(lhsType);
if (cmp != 0)
return cmp;
It looks similar to code several lines earlier:
cmp = lhsName.compareTo(rhsName);
if (cmp != 0)
return cmp;
Can anyone help me?

public class VersionInsensitiveBugComparator implements WarningComparator {

private ClassNameRewriter classNameRewriter = IdentityClassNameRewriter.instance();

private boolean exactBugPatternMatch = true;

private boolean comparePriorities = false;
public VersionInsensitiveBugComparator() {
}

public void setClassNameRewriter(ClassNameRewriter classNameRewriter) {
    this.classNameRewriter = classNameRewriter; 
}
public void setComparePriorities(boolean b) {
    comparePriorities = b;
}

/**
 * Wrapper for BugAnnotation iterators, which filters out
 * annotations we don't care about.
 */
private class FilteringAnnotationIterator implements Iterator<BugAnnotation> {
    private Iterator<BugAnnotation> iter;
    private BugAnnotation next;

    public FilteringAnnotationIterator(Iterator<BugAnnotation> iter) {
        this.iter = iter;
        this.next = null;
    }

    public boolean hasNext() {
        findNext();
        return next != null;
    }

    public BugAnnotation next() {
        findNext();
        if (next == null)
            throw new NoSuchElementException();
        BugAnnotation result = next;
        next = null;
        return result;
    }

    public void remove() {
        throw new UnsupportedOperationException();
    }

    private void findNext() {
        while (next == null) {
            if (!iter.hasNext())
                break;
            BugAnnotation candidate = iter.next();
            if (!isBoring(candidate)) {
                next = candidate;
                break;
            }
        }
    }

}

private boolean isBoring(BugAnnotation annotation) {
    return !annotation.isSignificant();
}

private static int compareNullElements(Object a, Object b) {
    if (a != null)
        return 1;
    else if (b != null)
        return -1;
    else
        return 0;
}

private static String getCode(String pattern) {
    int sep = pattern.indexOf('_');
    if (sep < 0)
        return "";
    return pattern.substring(0, sep);
}

public int compare(BugInstance lhs, BugInstance rhs) {
    // Attributes of BugInstance.
    // Compare abbreviation 
    // Compare class and method annotations (ignoring line numbers).
    // Compare field annotations.

    int cmp;

    BugPattern lhsPattern = lhs.getBugPattern();
    BugPattern rhsPattern = rhs.getBugPattern();

    if (lhsPattern == null || rhsPattern == null) {
        // One of the patterns is missing.
        // However, we can still accurately match by abbrev (usually) by comparing
        // the part of the type before the first '_' character.
        // This is almost always equivalent to the abbrev.

        String lhsCode = getCode(lhs.getType());
        String rhsCode = getCode(rhs.getType());

        if ((cmp = lhsCode.compareTo(rhsCode)) != 0) {
            return cmp;
        }
    } else {
        // Compare by abbrev instead of type. The specific bug type can change
        // (e.g., "definitely null" to "null on simple path").  Also, we often
        // change bug pattern types from one version of FindBugs to the next.
        //
        // Source line and field name are still matched precisely, so this shouldn't
        // cause loss of precision.
        if ((cmp = lhsPattern.getAbbrev().compareTo(rhsPattern.getAbbrev())) != 0)
            return cmp;
        if (isExactBugPatternMatch() && (cmp = lhsPattern.getType().compareTo(rhsPattern.getType())) != 0)
            return cmp;
    }




    if (comparePriorities) {
        cmp = lhs.getPriority() - rhs.getPriority();
        if (cmp != 0) return cmp;
    }


    Iterator<BugAnnotation> lhsIter = new FilteringAnnotationIterator(lhs.annotationIterator());
    Iterator<BugAnnotation> rhsIter = new FilteringAnnotationIterator(rhs.annotationIterator());

    while (lhsIter.hasNext() && rhsIter.hasNext()) {
        BugAnnotation lhsAnnotation = lhsIter.next();
        BugAnnotation rhsAnnotation = rhsIter.next();

        // Different annotation types obviously cannot be equal,
        // so just compare by class name.
        if (lhsAnnotation.getClass() != rhsAnnotation.getClass())
            return lhsAnnotation.getClass().getName().compareTo(rhsAnnotation.getClass().getName());

        if (lhsAnnotation.getClass() == ClassAnnotation.class) {
            // ClassAnnotations should have their class names rewritten to
            // handle moved and renamed classes.

            String lhsClassName = classNameRewriter.rewriteClassName(
                    ((ClassAnnotation)lhsAnnotation).getClassName());
            String rhsClassName = classNameRewriter.rewriteClassName(
                    ((ClassAnnotation)rhsAnnotation).getClassName());

            cmp = lhsClassName.compareTo(rhsClassName);
            if (cmp != 0)
                return cmp;
        } else if(lhsAnnotation.getClass() == MethodAnnotation.class ) {
            // Rewrite class names in MethodAnnotations
            MethodAnnotation lhsMethod = ClassNameRewriterUtil.convertMethodAnnotation(
                    classNameRewriter, (MethodAnnotation) lhsAnnotation);
            MethodAnnotation rhsMethod = ClassNameRewriterUtil.convertMethodAnnotation(
                    classNameRewriter, (MethodAnnotation) rhsAnnotation);

            cmp = lhsMethod.compareTo(rhsMethod);
            if (cmp != 0)
                return cmp;

        } else if(lhsAnnotation.getClass() == FieldAnnotation.class) {
            // Rewrite class names in FieldAnnotations
            FieldAnnotation lhsField = ClassNameRewriterUtil.convertFieldAnnotation(
                    classNameRewriter, (FieldAnnotation) lhsAnnotation);
            FieldAnnotation rhsField = ClassNameRewriterUtil.convertFieldAnnotation(
                    classNameRewriter, (FieldAnnotation) rhsAnnotation);

            cmp = lhsField.compareTo(rhsField);
            if (cmp != 0)
                return cmp;
        } else if(lhsAnnotation.getClass() == StringAnnotation.class) {
            // Rewrite class names in FieldAnnotations
            String lhsString = ((StringAnnotation)lhsAnnotation).getValue();
            String rhsString = ((StringAnnotation)rhsAnnotation).getValue();
            cmp = lhsString.compareTo(rhsString);
            if (cmp != 0)
                return cmp;
        } else if(lhsAnnotation.getClass() == LocalVariableAnnotation.class) {
            // Rewrite class names in FieldAnnotations
            String lhsName = ((LocalVariableAnnotation)lhsAnnotation).getName();
            String rhsName = ((LocalVariableAnnotation)rhsAnnotation).getName();
            if (lhsName.equals("?") && rhsName.equals("?"))
                continue;
            cmp = lhsName.compareTo(rhsName);
            if (cmp != 0)
                return cmp;
        } else if(lhsAnnotation.getClass() == TypeAnnotation.class) {
            // Rewrite class names in FieldAnnotations
            String lhsType = ((TypeAnnotation)lhsAnnotation).getTypeDescriptor();
            String rhsType = ((TypeAnnotation)rhsAnnotation).getTypeDescriptor();
            lhsType = ClassNameRewriterUtil.rewriteSignature(classNameRewriter, lhsType);
            rhsType = ClassNameRewriterUtil.rewriteSignature(classNameRewriter, rhsType);
            cmp = lhsType.compareTo(lhsType);
            if (cmp != 0)
                return cmp;
        } else if(lhsAnnotation.getClass() == IntAnnotation.class) {
            // Rewrite class names in FieldAnnotations
            int lhsValue = ((IntAnnotation)lhsAnnotation).getValue();
            int rhsValue = ((IntAnnotation)rhsAnnotation).getValue();
            cmp = lhsValue - rhsValue;
            if (cmp != 0)
                return cmp;
        } else if (isBoring(lhsAnnotation)) {
            throw new IllegalStateException("Impossible");
        } else
            throw new IllegalStateException("Unknown annotation type: " + lhsAnnotation.getClass().getName());
    }

    if (rhsIter.hasNext())
        return -1;
    else if (lhsIter.hasNext())
        return 1;
    else
        return 0;
}

/**
 * @param exactBugPatternMatch The exactBugPatternMatch to set.
 */
public void setExactBugPatternMatch(boolean exactBugPatternMatch) {
    this.exactBugPatternMatch = exactBugPatternMatch;
}

/**
 * @return Returns the exactBugPatternMatch.
 */
public boolean isExactBugPatternMatch() {
    return exactBugPatternMatch;
}

}
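
The flagged line in the file above compares lhsType with itself, so cmp is always 0 in that branch and the TypeAnnotation comparison never decides anything. Independently of Deckard, this kind of copy-paste slip can sometimes be caught with a plain text scan. The following is a minimal sketch, not Deckard's method; the function name and the regular expression are illustrative assumptions, and the scan only covers single-identifier receivers and arguments.

```python
import re

# Matches calls such as `x.compareTo(x)` or `a.equals(a)` where the receiver
# and the single argument are the same identifier.
SELF_CALL = re.compile(r"\b(\w+)\.(compareTo|equals)\(\s*\1\s*\)")

def find_self_comparisons(source: str):
    """Return (line_number, line_text) pairs that contain a self-comparison."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        if SELF_CALL.search(line):
            hits.append((number, line.strip()))
    return hits

sample = """\
cmp = lhsName.compareTo(rhsName);
cmp = lhsType.compareTo(lhsType);
"""
print(find_self_comparisons(sample))
```

On the two-line sample only the second line is reported, which is the line the poster identifies as the bug.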

Command line options for filter IDs not implemented

When I run the bugfiltering command, the results showed "Command line options for filters IDs not implemented" and "Cannot open file : src/AbstractAsyncTableRendering.java". The command I used is "scripts/bugdetect/bugfiltering samples/clusters/post_cluster_vdb_50_0_allg_0.95_30 java > bug_result". Do you have any idea about how to solve the problem? Thanks.

Build fails

Hi, on a current Mac OS system the build fails because malloc.h is not where the build expects it to be. I worked around it with a symbolic link, but that is only a workaround. It would be better to fix this in the build script.

Build on Linux fails

Hi
I am trying to compile Deckard on a Linux system, but the build stops because it tries to find "dot2d".
Is this some kind of third-party library I should add? If so, where should it be placed?

Here is a part of the log:

Everything cool above here:

-c -O3 -DREAL_FLOAT enumBuckets.cpp
g++ -o ../bin/enumBuckets -O3 enumBuckets.o BucketHashing.o Geometry.o LocalitySensitiveHashing.o Random.o Util.o GlobalVars.o SelfTuning.o NearNeighbors.o -lm
g++ -c -O3 -DREAL_FLOAT exploreBuckets.cpp
g++ -o ../bin/exploreBuckets -O3 exploreBuckets.o BucketHashing.o Geometry.o LocalitySensitiveHashing.o Random.o Util.o GlobalVars.o SelfTuning.o NearNeighbors.o -lm
make[1]: Leaving directory `/home/janO/Deckard/src/lsh/sources'
./build.sh: Zeile 98: cd: ../lib: Datei oder Verzeichnis nicht gefunden (File or Folder not found)
./build.sh: Zeile 111: cd: ../dot2d/grammars/output: Datei oder Verzeichnis nicht gefunden (File or Folder not found)
./build.sh: Zeile 123: cd: ../dot2d: Datei oder Verzeichnis nicht gefunden (File or Folder not found)

In parentheses I translated the error from German to English.

Cheers and Thanks

Problem in running Deckard for C project

Hi and thanks for the tool!
I set up a config file to test my C project, following the sample one in scripts/clonedetect, but I get this error after running ./deckard.sh:

==== Configuration checking...Error: missing file ~/Deckard-rel2.0solidity/scripts/clonedetect/src/main/cvecgen. Check your config

Any suggestions?
Thanks in advance

Error: problem in vec generator step. Stop and check logs in times/

I receive this error message when running on sample code in /Deckard/samples/src
DECKARD--A Tree-Based Code Clone Detection Toolkit.

Version 2.0 + support for Solidity syntax
Copyright (c) 2007-2018. University of California / Singapore Management University
Distributed under the three-clause BSD license.

==== Configuration checking...Done.

==== Start clone detection ====

Vector generation.../home/shijing/ra/codeReuse/Deckard/src/main/jvecgen *.java

vgen: 30 2 ...Done. Log: times/vgen_30_2
...deleting intermediate vector files...Done

vgen: 30 0 ...Done. Log: times/vgen_30_0
...deleting intermediate vector files...Done

vgen: 50 2 ...Done. Log: times/vgen_50_2
...deleting intermediate vector files...Done

vgen: 50 0 ...Done. Log: times/vgen_50_0
...deleting intermediate vector files...Done

Error: problem in vec generator step. Stop and check logs in times/

Has anyone encountered a similar situation?

Why does Deckard act differently from one run to another?

Hi,
Thank you for your great tool.
I am currently using Deckard for my research. However, when I run it multiple times with the same hyperparameters on the same dataset, I get different results. This affects the reproducibility of my research. Is there any way to set a random seed?

Kind regards.
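
Whether Deckard exposes a seed option is not stated in this issue. Assuming the run-to-run differences come from random choices in the clustering step (an assumption; nothing here confirms it), the general remedy is to draw every random value from a generator created with a fixed seed. The sketch below illustrates only that idea with Python's standard library; `sample_projections` is a hypothetical helper, not part of Deckard.

```python
import random

def sample_projections(dim: int, count: int, seed: int):
    """Draw `count` random Gaussian vectors of length `dim` from a seeded RNG."""
    rng = random.Random(seed)  # private generator; global random state is untouched
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(count)]

first = sample_projections(dim=4, count=3, seed=42)
second = sample_projections(dim=4, count=3, seed=42)
other = sample_projections(dim=4, count=3, seed=7)

print(first == second)  # same seed, same vectors
print(first == other)   # different seed, different vectors
```

Recording the seed next to the other parameters (MIN_TOKENS, STRIDE, SIMILARITY) would then be enough to repeat a run, provided the tool in question accepts one.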

On Bugfiltering

Line 52 of scripts/bugfiltering:
filterpath = os.environ.get("DECKARD_DIR")

Run from bash, the script crashes, stating it cannot find the Deckard path.
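
`os.environ.get` returns None when the variable is not exported in the calling shell, so any later path operation on the result fails. A minimal sketch of a guard follows; `resolve_deckard_dir` is a hypothetical helper name, and the path `/opt/Deckard` is only an example value.

```python
import os

def resolve_deckard_dir(environ=os.environ):
    """Return DECKARD_DIR as an absolute path, or raise with a clear message."""
    value = environ.get("DECKARD_DIR")  # None when the variable is not exported
    if not value:
        raise RuntimeError(
            "DECKARD_DIR is not set; export it before running, e.g. "
            "export DECKARD_DIR=/path/to/Deckard"
        )
    return os.path.abspath(os.path.expanduser(value))

# Example with an explicit mapping instead of the real environment.
print(resolve_deckard_dir({"DECKARD_DIR": "/opt/Deckard"}))
```

If this is the cause, exporting DECKARD_DIR in the shell before starting the script should avoid the crash.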

Build fails

Build fails. It seems to be related to the Solidity parser.

/mainsol.py solidity.y
bison -d pt_solidity.y -o pt_solidity.tab.cc -v -g
pt_solidity.y:213.9-15: syntax error, unexpected identifier, expecting string
make[1]: *** [pt_solidity.tab.cc] Error 1
make: *** [TARGET] Error 2

build fails

The build fails on both Mac OS (Mojave 10.14.5) and Linux (Ubuntu 18.04.2 LTS).
I ran sh build.sh in src/main/

Error Message

Mac OS

rm -f *.pyc
make -C simple clean
rm -f *.o lex.yy.cc pt_c.tab* pt_c.y head.cc c_ptgen
make -C gcc clean
rm -f *.o lex.yy.cc pt_c.tab* pt_c.y head.cc gccptgen.a
make -C java clean
rm -f *.o lex.yy.cc pt_j.tab* pt_j.y head.cc javaptgen.a
make -C php5 clean
rm -f *.o lex.yy.cc pt_zend_language_parser.tab* pt_zend_language_parser.y head.cc phpptgen.a
make -C sol clean
rm -f *.o lex.yy.cc pt_solidity.* head.cc solidityptgen.a
make -C gcc
./mainc.py c.y
Traceback (most recent call last):
File "./mainc.py", line 43, in <module>
import YaccParser,YaccLexer
File "../YaccParser.py", line 77
except antlr.RecognitionException, ex:
^
SyntaxError: invalid syntax
make[1]: *** [pt_c.y] Error 1
make: *** [TARGET] Error 2
Error: ptgen make failed. Exit.
Error: ptgen make failed. Deckard build fails.

Linux

rm -f *.pyc
make -C simple clean
make[1]: Entering directory '/home/imseongbin/Deckard/src/ptgen/simple'
rm -f *.o lex.yy.cc pt_c.tab* pt_c.y head.cc c_ptgen
make[1]: Leaving directory '/home/imseongbin/Deckard/src/ptgen/simple'
make -C gcc clean
make[1]: Entering directory '/home/imseongbin/Deckard/src/ptgen/gcc'
rm -f *.o lex.yy.cc pt_c.tab* pt_c.y head.cc gccptgen.a
make[1]: Leaving directory '/home/imseongbin/Deckard/src/ptgen/gcc'
make -C java clean
make[1]: Entering directory '/home/imseongbin/Deckard/src/ptgen/java'
rm -f *.o lex.yy.cc pt_j.tab* pt_j.y head.cc javaptgen.a
make[1]: Leaving directory '/home/imseongbin/Deckard/src/ptgen/java'
make -C php5 clean
make[1]: Entering directory '/home/imseongbin/Deckard/src/ptgen/php5'
rm -f *.o lex.yy.cc pt_zend_language_parser.tab* pt_zend_language_parser.y head.cc phpptgen.a
make[1]: Leaving directory '/home/imseongbin/Deckard/src/ptgen/php5'
make -C sol clean
make[1]: Entering directory '/home/imseongbin/Deckard/src/ptgen/sol'
rm -f *.o lex.yy.cc pt_solidity.* head.cc solidityptgen.a
make[1]: Leaving directory '/home/imseongbin/Deckard/src/ptgen/sol'
make -C gcc
make[1]: Entering directory '/home/imseongbin/Deckard/src/ptgen/gcc'
./mainc.py c.y
bison -d pt_c.y -o pt_c.tab.cc
make[1]: bison: Command not found
Makefile:59: recipe for target 'pt_c.tab.cc' failed
make[1]: *** [pt_c.tab.cc] Error 127
make[1]: Leaving directory '/home/imseongbin/Deckard/src/ptgen/gcc'
Makefile:35: recipe for target 'TARGET' failed
make: *** [TARGET] Error 2
Error: ptgen make failed. Exit.
Error: ptgen make failed. Deckard build fails.

Please help me.
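
The Linux log above stops at "bison: Command not found", which points to a missing build tool rather than a problem in the sources. A quick way to check for such gaps before running build.sh is sketched below. The tool list is an assumption drawn from commands that appear in the build logs on this page; it is not an official dependency list.

```python
import shutil

# Tools seen in the build logs on this page; illustrative, not exhaustive.
REQUIRED_TOOLS = ["make", "bison", "flex", "g++"]

def missing_tools(tools=REQUIRED_TOOLS):
    """Return the subset of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]

print("Missing build tools:", missing_tools() or "none")
```

The Mac OS log fails for a different reason: the `except antlr.RecognitionException, ex:` line is Python 2 syntax being run by a Python 3 interpreter.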

Clone detection failure?need help

I followed the steps in README.md.
After installing Deckard, I wanted to test the clone detection.
I created a "config" file in /home/xx/projects/Deckard with the same content as the "config" in /sample.
The configuration file is as follows:


FILE_PATTERN='*.java' # used in the 'find' command below
#where are the source files?
SRC_DIR="src"

#####The following are for Deckard2's support for dot only####

PDG_DIR="ddgs" # used by Deckard2 for 'find $SRC_DIR -ipath "*/$PDG_DIR/$FILE_PATTERN"'
AST_DIR="asts" # each pdg should have an ast with the same name in a different folder
#where are node definition files? used by Deckard2
TYPE_FILE='/home/ly/projects/Deckard/testdata/deckard3/AstNodeTypeNamesIDs.txt'
RELEVANT_NODEFILE='/home/ly/projects/Deckard/testdata/deckard3/AstRelevantNodes.txt'
LEAF_NODEFILE='/home/ly/projects/Deckard/testdata/deckard3/AstLeafNodes.txt'
PARENT_NODEFILE='/home/ly/projects/Deckard/testdata/deckard3/AstParentNodes.txt'
#####The above are for Deckard2 only #####

#where is Deckard?
DECKARD_DIR="/home/ly/projects/Deckard"
#clone parameters; refer to paper.
MIN_TOKENS='30 50' # can be a sequence of integers
STRIDE='2 0' # can be a sequence of integers
SIMILARITY='1.0 0.95' # can be a sequence of values <= 1
#DISTANCE='0 0.70711 1.58114 2.236'

###########################################################
#Where to store result files?

#where to output generated vectors?
VECTOR_DIR="vectors"
#where to output detected clone clusters?
CLUSTER_DIR="clusters"
#where to output timing/debugging info?
TIME_DIR="times"

##########################################################
#where are several programs we need?

#where is the vector generator?
VGEN_EXEC="$DECKARD_DIR/src"
case $FILE_PATTERN in
*.dot )
VGEN_EXEC="$VGEN_EXEC/dot2d/dotvgen" ;; # for Deckard2 dot only
*.java )
VGEN_EXEC="$VGEN_EXEC/main/jvecgen" ;;
*.php )
VGEN_EXEC="$VGEN_EXEC/main/phpvecgen" ;;
*.c | *.h )
VGEN_EXEC="$VGEN_EXEC/main/cvecgen" ;;

* )
echo "Error: invalid FILE_PATTERN: $FILE_PATTERN"
VGEN_EXEC="$VGEN_EXEC/invalidvecgen" ;;
esac
#how to divide the vectors into groups?
GROUPING_EXEC="$DECKARD_DIR/src/vgen/vgrouping/runvectorsort"
#where is the lsh for vector clustering/querying?
CLUSTER_EXEC="$DECKARD_DIR/src/lsh/bin/enumBuckets"
QUERY_EXEC="$DECKARD_DIR/src/lsh/bin/queryBuckets"
#how to post process clone groups?
POSTPRO_EXEC="$DECKARD_DIR/scripts/clonedetect/post_process_groupfile"
#how to transform source code html? Used by Deckard1 only
SRC2HTM_EXEC=source-highlight
SRC2HTM_OPTS=--line-number-ref

MAX_PROCS=8

GROUPING_S='30' # should be a single value
#GROUPING_D
#GROUPING_C

export DECKARD_DIR
export FILE_PATTERN
export SRC_DIR
export PDG_DIR
export AST_DIR

export TYPE_FILE
export RELEVANT_NODEFILE
export LEAF_NODEFILE
export PARENT_NODEFILE

export VECTOR_DIR
export TIME_DIR
export CLUSTER_DIR

export VGEN_EXEC
export GROUPING_EXEC
export CLUSTER_EXEC
export POSTPRO_EXEC
export SRC2HTM_EXEC
export SRC2HTM_OPTS

export MIN_TOKENS
export STRIDE
#export DISTANCE
export SIMILARITY
export GROUPING_S
export GROUPING_D
export GROUPING_C
export MAX_PROCS
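
The config comments say that MIN_TOKENS and STRIDE can each be a sequence of values. The vgen log lines quoted in another issue on this page (30 2, 30 0, 50 2, 50 0) suggest that every MIN_TOKENS value is paired with every STRIDE value. The sketch below enumerates those pairs for the values in this config; `expand_runs` is a hypothetical helper, and how SIMILARITY enters the later steps is not covered here.

```python
from itertools import product

def expand_runs(min_tokens: str, stride: str):
    """Split space-separated config values and pair every MIN_TOKENS with every STRIDE."""
    return list(product(min_tokens.split(), stride.split()))

for tokens, step in expand_runs("30 50", "2 0"):
    print(f"vgen: {tokens} {step}")
```

Each extra value in either list therefore multiplies the number of vector-generation passes.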


But when I follow the next step and run it, I get an error:


ly@ubuntu:~/projects/Deckard$ sh /home/ly/projects/Deckard/scripts/clonedetect/deckard.sh
DECKARD--A Tree-Based Code Clone Detection Toolkit.
/home/ly/projects/Deckard/scripts/clonedetect/deckard.sh: 4: /home/ly/projects/Deckard/scripts/clonedetect/deckard.sh: [[: not found

Version Unknown. Missing README.
Copyright (c) 2007-2018. University of California / Singapore Management University
Distributed under the three-clause BSD license.

==== Configuration checking.../home/ly/projects/Deckard/scripts/clonedetect/deckard.sh: 81: /home/ly/projects/Deckard/scripts/clonedetect/configure: [[: not found
Error: no config file in current directory


I don't know how to fix it.
Can someone give me some advice? Thanks.
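
The "[[: not found" messages come from the shell itself: `[[ ... ]]` is a bash construct, and on Ubuntu `sh` is normally dash, which does not implement it. Invoking the script with bash instead of sh is therefore a likely fix, though this thread does not confirm it. The sketch below shows a rough way to spot such constructs in a script; the function name and regular expression are illustrative assumptions, and the comment stripping ignores quoting.

```python
import re

# `[[ ... ]]` is a bash/ksh/zsh construct that POSIX sh (for example dash) lacks.
DOUBLE_BRACKET = re.compile(r"(^|[\s;&|(])\[\[\s")

def uses_double_brackets(script_text: str) -> bool:
    """Return True if any line contains a `[[ ` test outside a trailing comment."""
    for line in script_text.splitlines():
        code = line.split("#", 1)[0]  # crude comment stripping; ignores quoting
        if DOUBLE_BRACKET.search(code):
            return True
    return False

print(uses_double_brackets('if [[ -f "$f" ]]; then echo ok; fi'))
print(uses_double_brackets('if [ -f "$f" ]; then echo ok; fi'))
```

A script for which this returns True should be run as `bash script.sh` or through a `#!/bin/bash` shebang.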

Upgrade to Python 3

Most of the Yacc parser (and maybe other portions) was written in Python 2. Since Python 2 reached end of life in 2020, we should update the codebase to use Python 3.
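
One construct that needs porting shows up in the build logs on this page: `except antlr.RecognitionException, ex:`. In Python 3 the exception is bound with `as`. The sketch below shows the Python 3 form; `RecognitionException`, `parse` and `safe_parse` are stand-ins written for this example, not code from the repository.

```python
class RecognitionException(Exception):
    """Stand-in for antlr.RecognitionException (hypothetical, for illustration)."""

def parse(text: str) -> str:
    if not text:
        raise RecognitionException("empty input")
    return text.upper()

# Python 2 spelling: `except RecognitionException, ex:`
# That form does not bind the exception to `ex` in Python 3; use `as` instead.
def safe_parse(text: str) -> str:
    try:
        return parse(text)
    except RecognitionException as ex:
        return f"parse error: {ex}"

print(safe_parse("abc"))
print(safe_parse(""))
```

The other Python 2 leftover reported on this page, the assignment `False = 0`, has no Python 3 equivalent and can simply be deleted, since `False` is a built-in keyword.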

Building error

Hi , I got this error when running build.sh:
rm -f *.pyc
make -C simple clean
make[1]: Entering directory '/home/ayf/Deckard-rel2.0solidity/src/ptgen/simple'
rm -f *.o lex.yy.cc pt_c.tab* pt_c.y head.cc c_ptgen
make[1]: Leaving directory '/home/ayf/Deckard-rel2.0solidity/src/ptgen/simple'
make -C gcc clean
make[1]: Entering directory '/home/ayf/Deckard-rel2.0solidity/src/ptgen/gcc'
rm -f *.o lex.yy.cc pt_c.tab* pt_c.y head.cc gccptgen.a
make[1]: Leaving directory '/home/ayf/Deckard-rel2.0solidity/src/ptgen/gcc'
make -C java clean
make[1]: Entering directory '/home/ayf/Deckard-rel2.0solidity/src/ptgen/java'
rm -f *.o lex.yy.cc pt_j.tab* pt_j.y head.cc javaptgen.a
make[1]: Leaving directory '/home/ayf/Deckard-rel2.0solidity/src/ptgen/java'
make -C php5 clean
make[1]: Entering directory '/home/ayf/Deckard-rel2.0solidity/src/ptgen/php5'
rm -f *.o lex.yy.cc pt_zend_language_parser.tab* pt_zend_language_parser.y head.cc phpptgen.a
make[1]: Leaving directory '/home/ayf/Deckard-rel2.0solidity/src/ptgen/php5'
make -C sol clean
make[1]: Entering directory '/home/ayf/Deckard-rel2.0solidity/src/ptgen/sol'
rm -f *.o lex.yy.cc pt_solidity.* head.cc solidityptgen.a
make[1]: Leaving directory '/home/ayf/Deckard-rel2.0solidity/src/ptgen/sol'
make -C gcc
make[1]: Entering directory '/home/ayf/Deckard-rel2.0solidity/src/ptgen/gcc'
./mainc.py c.y
Traceback (most recent call last):
  File "/home/ayf/Deckard-rel2.0solidity/src/ptgen/gcc/./mainc.py", line 43, in <module>
    import YaccParser,YaccLexer
  File "/home/ayf/Deckard-rel2.0solidity/src/ptgen/gcc/../YaccParser.py", line 8
    False = 0
    ^^^^^
SyntaxError: cannot assign to False
make[1]: *** [Makefile:62: pt_c.y] Error 1
make[1]: Leaving directory '/home/ayf/Deckard-rel2.0solidity/src/ptgen/gcc'
make: *** [Makefile:35: TARGET] Error 2
Error: ptgen make failed. Exit.
Error: ptgen make failed. Deckard build fails.
It seems that YaccParser.py assigns to False, which Python 3 does not accept.
Is my environment wrong, or did something else go wrong?
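
The traceback is what a Python 3 interpreter produces for Python 2 era source, so the environment (which interpreter runs mainc.py) is the likely cause. One way to confirm that a snippet is rejected by the interpreter at hand is to compile it without running it, as in this minimal sketch. `python3_syntax_error` is a hypothetical helper; the two sample snippets are typical Python 2 constructs chosen for illustration.

```python
def python3_syntax_error(source: str):
    """Return the SyntaxError message if `source` does not compile, else None."""
    try:
        compile(source, "<snippet>", "exec")
    except SyntaxError as err:
        return err.msg
    return None

py2_only = [
    "False = 0\n",        # the line reported in YaccParser.py
    'print "hello"\n',    # Python 2 print statement
]
for snippet in py2_only:
    print(repr(snippet), "->", python3_syntax_error(snippet))

print(python3_syntax_error("x = 0\n"))  # valid code yields None
```

Running the same check under a Python 2 interpreter, where one is available, would accept both snippets.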

Building errors

Hi.
I want to build Deckard but get the error "Error: ptgen make failed. Exit. Error: ptgen make failed. Deckard build fails."
I have tried the solutions from other issues, such as installing the newest versions of the packages and editing /src/ptgen/gcc/mainc.py to use python2.
I also changed my OS to Ubuntu 12.
But I still get the errors below.
Can anyone help me? Thanks a lot!

syu@ubuntu:~/workspaces/Deckard/src/main$ sudo ./build.sh
rm -f *.pyc
make -C simple clean
make[1]: Entering directory `/home/syu/workspaces/Deckard/src/ptgen/simple'
rm -f *.o lex.yy.cc pt_c.tab* pt_c.y head.cc c_ptgen
make[1]: Leaving directory `/home/syu/workspaces/Deckard/src/ptgen/simple'
make -C gcc clean
make[1]: Entering directory `/home/syu/workspaces/Deckard/src/ptgen/gcc'
rm -f *.o lex.yy.cc pt_c.tab* pt_c.y head.cc gccptgen.a
make[1]: Leaving directory `/home/syu/workspaces/Deckard/src/ptgen/gcc'
make -C java clean
make[1]: Entering directory `/home/syu/workspaces/Deckard/src/ptgen/java'
rm -f *.o lex.yy.cc pt_j.tab* pt_j.y head.cc javaptgen.a
make[1]: Leaving directory `/home/syu/workspaces/Deckard/src/ptgen/java'
make -C php5 clean
make[1]: Entering directory `/home/syu/workspaces/Deckard/src/ptgen/php5'
rm -f *.o lex.yy.cc pt_zend_language_parser.tab* pt_zend_language_parser.y head.cc phpptgen.a
make[1]: Leaving directory `/home/syu/workspaces/Deckard/src/ptgen/php5'
make -C sol clean
make[1]: Entering directory `/home/syu/workspaces/Deckard/src/ptgen/sol'
rm -f *.o lex.yy.cc pt_solidity.* head.cc solidityptgen.a
make[1]: Leaving directory `/home/syu/workspaces/Deckard/src/ptgen/sol'
make -C gcc
make[1]: Entering directory `/home/syu/workspaces/Deckard/src/ptgen/gcc'
./mainc.py c.y
bison -d pt_c.y -o pt_c.tab.cc
pt_c.y: conflicts: 11 shift/reduce
flex -olex.yy.cc c.l
g++ -O3 -I../../include -c -o lex.yy.o lex.yy.cc
g++ -O3 -I../../include -c -o pt_c.tab.o pt_c.tab.cc
pt_c.tab.cc: In function ‘int yyparse()’:
pt_c.tab.cc:13685:35: warning: deprecated conversion from string constant to ‘char*’ [-Wwrite-strings]
pt_c.tab.cc:13827:35: warning: deprecated conversion from string constant to ‘char*’ [-Wwrite-strings]
g++ -O3 -I../../include -c -o head.o head.cc
ar -csrv gccptgen.a lex.yy.o pt_c.tab.o head.o
a - lex.yy.o
a - pt_c.tab.o
a - head.o
make[1]: Leaving directory `/home/syu/workspaces/Deckard/src/ptgen/gcc'
make -C java
make[1]: Entering directory `/home/syu/workspaces/Deckard/src/ptgen/java'
./mainj.py j.y
bison -d pt_j.y -o pt_j.tab.cc
pt_j.y: conflicts: 24 shift/reduce, 259 reduce/reduce
flex -olex.yy.cc j.l
g++ -O3 -I../../include -c -o lex.yy.o lex.yy.cc
g++ -O3 -I../../include -c -o pt_j.tab.o pt_j.tab.cc
pt_j.tab.cc: In function ‘int yyparse()’:
pt_j.tab.cc:17408:35: warning: deprecated conversion from string constant to ‘char*’ [-Wwrite-strings]
pt_j.tab.cc:17550:35: warning: deprecated conversion from string constant to ‘char*’ [-Wwrite-strings]
g++ -O3 -I../../include -c -o head.o head.cc
ar -csrv javaptgen.a lex.yy.o pt_j.tab.o head.o
a - lex.yy.o
a - pt_j.tab.o
a - head.o
make[1]: Leaving directory `/home/syu/workspaces/Deckard/src/ptgen/java'
make -C php5
make[1]: Entering directory `/home/syu/workspaces/Deckard/src/ptgen/php5'
./mainphp.py zend_language_parser.y
sed -i -e "s/'\"'/'\\\\\"'/" head.cc
bison -d pt_zend_language_parser.y -o pt_zend_language_parser.tab.cc
flex -i -olex.yy.cc zend_language_scanner.l
g++ -O3 -I../../include -c -o lex.yy.o lex.yy.cc
zend_language_scanner.l: In function ‘int yylex(YYSTYPE*)’:
zend_language_scanner.l:906:67: warning: format ‘%s’ expects argument of type ‘char*’, but argument 3 has type ‘int’ [-Wformat]
zend_language_scanner.l:906:67: warning: format ‘%d’ expects a matching ‘int’ argument [-Wformat]
lex.yy.cc:4873:57: warning: deprecated conversion from string constant to ‘char*’ [-Wwrite-strings]
lex.yy.cc: In function ‘int yy_get_next_buffer()’:
lex.yy.cc:4894:61: warning: deprecated conversion from string constant to ‘char*’ [-Wwrite-strings]
lex.yy.cc:4962:51: warning: deprecated conversion from string constant to ‘char*’ [-Wwrite-strings]
lex.yy.cc:4975:3: warning: deprecated conversion from string constant to ‘char*’ [-Wwrite-strings]
lex.yy.cc:4975:3: warning: deprecated conversion from string constant to ‘char*’ [-Wwrite-strings]
lex.yy.cc:5005:68: warning: deprecated conversion from string constant to ‘char*’ [-Wwrite-strings]
lex.yy.cc: In function ‘void yyunput(int, char*)’:
lex.yy.cc:5102:54: warning: deprecated conversion from string constant to ‘char*’ [-Wwrite-strings]
lex.yy.cc: In function ‘yy_buffer_state* yy_create_buffer(FILE*, int)’:
lex.yy.cc:5261:65: warning: deprecated conversion from string constant to ‘char*’ [-Wwrite-strings]
lex.yy.cc:5270:65: warning: deprecated conversion from string constant to ‘char*’ [-Wwrite-strings]
lex.yy.cc: In function ‘void yyensure_buffer_stack()’:
lex.yy.cc:5427:71: warning: deprecated conversion from string constant to ‘char*’ [-Wwrite-strings]
lex.yy.cc:5447:71: warning: deprecated conversion from string constant to ‘char*’ [-Wwrite-strings]
lex.yy.cc: In function ‘yy_buffer_state* yy_scan_buffer(char*, yy_size_t)’:
lex.yy.cc:5473:63: warning: deprecated conversion from string constant to ‘char*’ [-Wwrite-strings]
lex.yy.cc: In function ‘yy_buffer_state* yy_scan_bytes(const char*, int)’:
lex.yy.cc:5522:62: warning: deprecated conversion from string constant to ‘char*’ [-Wwrite-strings]
lex.yy.cc:5531:51: warning: deprecated conversion from string constant to ‘char*’ [-Wwrite-strings]
lex.yy.cc: In function ‘void yy_push_state(int)’:
lex.yy.cc:5557:68: warning: deprecated conversion from string constant to ‘char*’ [-Wwrite-strings]
lex.yy.cc: In function ‘void yy_pop_state()’:
lex.yy.cc:5568:53: warning: deprecated conversion from string constant to ‘char*’ [-Wwrite-strings]
g++ -O3 -I../../include -c -o pt_zend_language_parser.tab.o pt_zend_language_parser.tab.cc
pt_zend_language_parser.tab.cc: In function ‘int yyparse()’:
pt_zend_language_parser.tab.cc:11522:35: warning: deprecated conversion from string constant to ‘char*’ [-Wwrite-strings]
pt_zend_language_parser.tab.cc:11664:35: warning: deprecated conversion from string constant to ‘char*’ [-Wwrite-strings]
g++ -O3 -I../../include -c -o head.o head.cc
ar -csrv phpptgen.a lex.yy.o pt_zend_language_parser.tab.o head.o
a - lex.yy.o
a - pt_zend_language_parser.tab.o
a - head.o
make[1]: Leaving directory `/home/syu/workspaces/Deckard/src/ptgen/php5'
make -C sol
make[1]: Entering directory `/home/syu/workspaces/Deckard/src/ptgen/sol'
./mainsol.py solidity.y
bison -d pt_solidity.y -o pt_solidity.tab.cc -v -g
pt_solidity.y:255.1-11: invalid directive: `%precedence'
pt_solidity.y:254.8-10: %type redeclaration for UFIXED
pt_solidity.y:231.62-67: previous declaration
pt_solidity.y:254.8-10: %type redeclaration for FIXED
pt_solidity.y:231.56-60: previous declaration
pt_solidity.y:254.8-10: %type redeclaration for BYTE
pt_solidity.y:231.51-54: previous declaration
pt_solidity.y:254.8-10: %type redeclaration for BYTES
pt_solidity.y:231.45-49: previous declaration
pt_solidity.y:254.8-10: %type redeclaration for UINT
pt_solidity.y:231.40-43: previous declaration
pt_solidity.y:254.8-10: %type redeclaration for INT
pt_solidity.y:231.36-38: previous declaration
pt_solidity.y:254.8-10: %type redeclaration for VAR
pt_solidity.y:231.32-34: previous declaration
pt_solidity.y:254.8-10: %type redeclaration for STRING
pt_solidity.y:231.25-30: previous declaration
pt_solidity.y:254.8-10: %type redeclaration for BOOL
pt_solidity.y:231.20-23: previous declaration
pt_solidity.y:254.8-10: %type redeclaration for ADDRESS
pt_solidity.y:231.12-18: previous declaration
pt_solidity.y:270.1-11: invalid directive: `%precedence'
pt_solidity.y:269.8-10: %type redeclaration for DELETE
pt_solidity.y:233.39-44: previous declaration
pt_solidity.y:269.8-10: %type redeclaration for AFTER
pt_solidity.y:233.33-37: previous declaration
pt_solidity.y:273.1-11: invalid directive: `%precedence'
make[1]: *** [pt_solidity.tab.cc] Error 1
make[1]: Leaving directory `/home/syu/workspaces/Deckard/src/ptgen/sol'
make: *** [TARGET] Error 2
Error: ptgen make failed. Exit.
Error: ptgen make failed. Deckard build fails.

typefile and nodefiles

I noticed that the Deckard 2 config parameters for TYPE_FILE, RELEVANT_NODEFILE, LEAF_NODEFILE and PARENT_NODEFILE of the sample config point to the nonexistent directory Deckard/testdata.
I assume that they are pretty important, as the detection outputs a lot of garbage if they are not changed.

What is supposed to be in these files? I assume this is about the node types for the ASTs, but I can't figure out how to specify them.

I'm using Java and want to run Deckard on BigCloneEval. The clones should have method level granularity.
It is especially important that I can configure Deckard to prune irrelevant NODE types early, as I want to run a performance analysis and comparison, and it doesn't feel fair to run Deckard on a lot more ASTs than necessary.

how to use a slice ?

Hi,
I am trying to detect clones in a slice. How can I use Deckard to do that?

Thanks!

Bug Report: cluster: Possible errors occurred with LSH.

Hi,
I ran Deckard to detect clones in a dataset of 47k source files. However, after a day of execution I ran into an error. Below are the contents of the relevant log files.

cluster_vdb_50_4_g9_2.50998_30_100000

Clustering 'vectors/vdb_50_4_g9_2.50998_30_100000' 6.513064 ...
/home/local/SAIL/amir/tasks/RQ2/RQ2.2/Deckard/src/lsh/bin/enumBuckets -R 6.513064 -M 7600000000 -b 2 -A -f vectors/vdb_50_4_g9_2.50998_30_100000 -c -p vectors/vdb_50_4_g9_2.50998_30_100000.param > clusters/cluster_vdb_50_4_g9_2.50998_30_100000
Warning: output all clones. Takes more time...
Warning: will compute parameters
Error: the structure supports at most 2097151 points (3238525 were specified).

real 2m58.162s
user 2m50.464s
sys 0m7.492s
cluster: Possible errors occurred with LSH. Check log: times/cluster_vdb_50_4_g9_2.50998_30_100000

paramsetting_50_4_0.79_30

paramsetting: 50 4 0.79 ...Looking for optimal parameters by Clustering 'vectors/vdb_50_4_g9_2.50998_30_100000' 6.513064 ...
/home/local/SAIL/amir/tasks/RQ2/RQ2.2/Deckard/src/lsh/bin/enumBuckets -R 6.513064 -M 7600000000 -b 2 -A -f vectors/vdb_50_4_g9_2.50998_30_100000 -c -p vectors/vdb_50_4_g9_2.50998_30_100000.param > clusters/cluster_vdb_50_4_g9_2.50998_30_100000
cluster: Possible errors occurred with LSH. Check log: times/cluster_vdb_50_4_g9_2.50998_30_100000
Error: paramsetting failure...exit.

grouping_50_4_2.50998_30

grouping: vectors/vdb_50_4 with distance=2.50998...Total 7602630 vectors read in; 11282415 vectors dispatched into 57 ranges (actual groups may be many fewer).

real 410m12.610s
user 6m43.592s
sys 26m6.544s
Done grouping 50 4 2.50998. See groups in vectors/vdb_50_4_g[0-9]_2.50998_30

Note that I have sufficient memory for execution, so I added two more conditions to the memory limit setting in both the vecquery and vertical-param-batch files. I raised the limit because my vector file is larger than 2G and enough memory is available. The conditions now look like this:

# dumb (not flexible) memory limit setting
mem=`wc "$vdb" | awk '{printf("%.0f", $3/1024/1024+0.5)}'`
if [ $mem -lt 2 ]; then
	mem=10000000
elif [ $mem -lt 5 ]; then
	mem=20000000
elif [ $mem -lt 10 ]; then
	mem=30000000
elif [ $mem -lt 20 ]; then
	mem=60000000
elif [ $mem -lt 50 ]; then
	mem=150000000
elif [ $mem -lt 100 ]; then
	mem=300000000
elif [ $mem -lt 200 ]; then
	mem=600000000
elif [ $mem -lt 500 ]; then
	mem=900000000
elif [ $mem -lt 1024 ]; then
	mem=1900000000
elif [ $mem -lt 2048 ]; then
	mem=3800000000
elif [ $mem -lt 4096 ]; then  # this condition is added by me
	mem=7600000000
elif [ $mem -lt 8192 ]; then  # this condition is added by me
	mem=15200000000
else
	echo "Error: Size of $vdb > 8G. I don't want to do it before you think of any optimization." | tee -a "$TIME_DIR/cluster_${vfile}"
	exit 1;
fi
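To try the thresholds above in isolation, the chain can be wrapped in a function that takes the vector-file size in MB. This is a sketch, not the code from vecquery or vertical-param-batch: the name mem_limit_for_mb is hypothetical, and the bounds and limits are copied from the chain quoted above, including the two added conditions.

```shell
# Hypothetical helper: maps a vector-file size in MB to the memory limit
# used by the threshold chain quoted above.
mem_limit_for_mb() {
    size_mb=$1
    # Pairs of "upper bound in MB (exclusive)" and "memory limit".
    set -- 2 10000000 5 20000000 10 30000000 20 60000000 \
           50 150000000 100 300000000 200 600000000 500 900000000 \
           1024 1900000000 2048 3800000000 4096 7600000000 8192 15200000000
    while [ $# -ge 2 ]; do
        if [ "$size_mb" -lt "$1" ]; then
            echo "$2"
            return 0
        fi
        shift 2
    done
    echo "Error: size ${size_mb} MB is 8192 MB or more; no limit defined." >&2
    return 1
}
```

For example, a size of 3000 MB falls into the first added condition and yields 7600000000, the value passed as -M in the logs above.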

The Deckard parameters are set to the following values:

  • MIN_TOKENS='50'
  • STRIDE='4'
  • SIMILARITY='0.79'
  • MAX_PROCS = 40

I have attached the log files. Please help me mitigate this problem; I need your tool for my experiments.
deckard log.zip

Clone detection on sample fails(?)

After I put my directory into the config in the sample directory, I can run the clone detection but I get the following output:

= Vector clustering w/ MIN_TOKENS=30, STRIDE=2, SIMILARITY=0.95 ...

grouping: vectors/vdb_30_2 with distance=5,477226...Done grouping 30 2 5,477226. See groups in vectors/vdb_30_2_g[0-9]_5,477226_30
paramsetting: 30 2 0.95 ...Error: paramsetting failure: no vector group found: 30 2 0.95
Error: problem in vec clustering step. Stop and check logs in times/

So I'm not sure I can trust what is output in clusters/post_cluster...

What is wrong?

Thanks,
Stefan
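The distance in the log above is printed with a decimal comma (5,477226), which suggests the scripts ran under a locale that uses a comma as the decimal separator. One possible cause, offered as an assumption rather than a confirmed diagnosis, is that numbers are then formatted and parsed inconsistently between steps. The distances in the logs on this page are numerically consistent with sqrt((1 - SIMILARITY) * 30): 2.50998 for SIMILARITY=0.79, and 5.477226 if 0.95 were misread as 0. That formula is inferred from the logs, not taken from Deckard's source. The sketch below reproduces both values with awk under LC_ALL=C; distance_for is a hypothetical name.

```shell
# Hypothetical helper: computes sqrt((1 - similarity) * 30) with the C locale
# forced, so the decimal separator is always a period.
# The formula is inferred from the logged distances, not from Deckard itself.
distance_for() {
    LC_ALL=C awk -v s="$1" 'BEGIN { printf "%.6f\n", sqrt((1 - s) * 30) }'
}

distance_for 0.79   # prints 2.509980
distance_for 0      # prints 5.477226
```

Exporting LC_ALL=C in the shell before running Deckard is a low-cost way to test this assumption.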

Make error

Error starts at line...

"make[1]: execvp: ./mainc.py: Permission denied"

and then ends at...

"make: *** No rule to make target '../ptgen/gcc/gccptgen.a', needed by 'cvecgen'. Stop."

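An "execvp: ... Permission denied" message from make generally means the file it tried to run lacks the execute bit, or sits on a filesystem mounted noexec. If that is the case for mainc.py in the "Make error" report, the missing gccptgen.a could be a downstream effect of that earlier step not running; that link is an assumption. A minimal sketch for restoring the bit, with ensure_executable as a hypothetical helper name:

```shell
# Hypothetical helper: add the execute bit to a script if it is missing.
ensure_executable() {
    if [ ! -f "$1" ]; then
        echo "No such file: $1" >&2
        return 1
    fi
    # chmod runs only when the file is not already executable.
    [ -x "$1" ] || chmod +x "$1"
}
```

It would be called as ensure_executable ./mainc.py from the directory in which make reported the error; the report does not state that directory.
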
post_cluster file is 0 bytes

I have run Deckard on the code of about 30 Java projects. The resulting cluster_vdb_50_0_allg_0.95_30 is not empty, but the corresponding post_cluster_vdb_50_0_allg_0.95_30 file is empty. Why does this happen? Is it because the cluster file contains too many suspicious clones, so that post-processing excludes all of them and leaves the post_cluster file empty?
Screenshot from 2020-03-15 16-06-52

build fails

Hi, I'm getting:

a - token-counter.o
a - sq-tree.o
a - node-vec-gen.o
a - vector-output.o
a - vector-merger.o
a - tree-accessor.o
a - token-tree-map.o
a - clone-context-php.o
rm -f vectorsort dispatchvectors computeranges *~ *.o
gcc -O3  -O3  vectorsort.c  -lm -o vectorsort
gcc -O3  -O3  dispatchvectors.c  -lm -o dispatchvectors
gcc -O3  -O3  computeranges.c  -lm -o computeranges
rm -f *.o cvecgen jvecgen cbugfilters jbugfilters out2html phpvecgen phpbugfilters out2xml cParseTreeMain jParseTreeMain phpParseTreeMain
g++  -o ptreeC.o -O3 -I../include -I../vgen/treeTra -c -DCLANG ptree.cc
make: *** No rule to make target '../ptgen/gcc/gccptgen.a', needed by 'cvecgen'.  Stop.
Error: main make failed. Exit.
./build.sh  7.49s user 0.35s system 85% cpu 9.207 total

This happens when just executing build.sh in src/main.

Crash on "return A?B:C"

When I use the command: "cvecgen -i ../../src/dircolors.c -o tmp.vec --start-line-number 508 --end-line-number 508"
The output is "cvecgen: tree-accessor.C:81: static TreeVector* TreeAccessor::get_node_vector(Tree*): Assertion `attr_itr!=t->attributes.end()' failed."

Please refer to the attachment for the file dircolors.c.

dircolors.c.zip

Vec generator failure

I have a problem when running clone detection on my C code. The output is:
"Error: problem in vec generator step. Stop and check logs in times/"
Could you tell me what might be the problem? Thanks a lot.
