spoonlabs / coming

A tool for mining commits from Git repositories and diffs to automatically extract code change pattern instances and features with ast analysis

Home Page: https://hal.inria.fr/hal-00861883/file/paper-short.pdf

License: MIT License

Java 100.00% Shell 0.01% Python 0.01%
ast-analysis mining-software-repositories kth inria

coming's People

Contributors

andre15silva, henry-lp, khaes-kth, martinezmatias, monperrus, niloofartrg, sedflix, sin-mim, tdurieux, yogyagamage, zxch3n

coming's Issues

NPEfix patch

These two snippets are from the NPEfix patch, but given how NPEfix works, I don't think they could have been generated by the NPEfix repair tool. Am I right? @monperrus

Screen Shot 2019-09-04 at 15 59 23

Screen Shot 2019-09-04 at 15 59 05

Designing test cases for repairability

Hi @martinezmatias ,

I have a question about designing test cases. I think I have two options right now:

  • Use @BeforeAll to process all the input files at once. After that, we just need to analyze the results in each of the test cases. Hence, the heavy processing is done only once.

  • The other is the usual choice of dividing files into different sets of test cases. This may result in a repetition of the same file as well as the processing of the same file more than once.

It's a tradeoff between testing time and modularity of test cases.

WDYT?
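The first option can be sketched without depending on JUnit: the expensive step runs once, and every check reads the cached result. In a JUnit 5 test class, a static `@BeforeAll` method filling a static field would play the same role. `SharedAnalysis`, `process` and the input names are hypothetical stand-ins, not part of Coming.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the heavy processing step; not part of Coming.
class SharedAnalysis {
    static int runs = 0;                       // counts how often the heavy step executes
    private static Map<String, Integer> cache; // results shared by all checks

    // Placeholder for the expensive analysis of one input file.
    private static int process(String file) {
        return file.length();
    }

    // Processes every input once, on first use; later calls reuse the cache.
    static synchronized Map<String, Integer> results(List<String> files) {
        if (cache == null) {
            runs++;
            cache = new LinkedHashMap<>();
            for (String f : files) {
                cache.put(f, process(f));
            }
        }
        return cache;
    }
}
```

The cost of this design is the one mentioned above: all checks depend on the same shared state, so they are less modular than tests that each process their own files.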

src and dst seem to be interchanged in some patterns

The Diff:

--- /src/main/java///org/apache/commons/math3/optimization/linear/SimplexSolver.java
+++ /src/main/java///org/apache/commons/math3/optimization/linear/SimplexSolver.java
@@ -120,7 +120,7 @@
                     for (int i = 0; i < tableau.getNumArtificialVariables(); i++) {
                         int column = i + tableau.getArtificialVariableOffset();
                         final double entry = tableau.getEntry(row, column);
-                        if (Precision.equals(entry, 1d, maxUlps) && row.equals(tableau.getBasicRow(column))) {
+                        if (Precision.equals(entry, 1d, maxUlps) && !row.equals(tableau.getBasicRow(column))) {
                             return row;
                         }
                     }

The pattern:

<pattern name="unary">
    <entity id="2" type="UnaryOperator"/>
    <action entityId="2" type="ANY"/>
</pattern>

The command:

 coming -location ... -mode mineinstance -pattern unary.xml   -input files -output ./unary_mine

The output:

{
      "revision": "patch5-Math-28-jMutRepair",
      "pattern_name": "unary",
      "instance_detail": [
        {
          "pattern_action": "ANY",
          "pattern_entity": {
            "entity_type": "UnaryOperator",
            "entity_new value": "*",
            "entity_role": "*",
            "entity_parent": "null"
          },
          "concrete_change": {
            "operator": "INS",
            "src_type": "UnaryOperator",
            "dst_type": "null",
            "src": "(!(row.equals(tableau.getBasicRow(column))))",
            "dst": "null",
            "src_parent_type": "BinaryOperator",
            "dst_parent_type": "null",
            "src_parent": "(org.apache.commons.math3.util.Precision.equals(entry, 1.0, maxUlps)) \u0026\u0026 (!(row.equals(tableau.getBasicRow(column))))",
            "dst_parent": "null"
          },
          "file": "/test",
          "line": 123
        }
      ]
    }

In the example above, the JSON output reports (!(row.equals(tableau.getBasicRow(column)))) as the source (src), while it is actually the destination (dst).
I observed this inversion in most of the instances mined for the given pattern.

Any idea why this might be happening? @martinezmatias
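For reference, one possible reporting convention (an assumption, not necessarily the one Coming intends) is that an inserted node exists only in the new version and therefore goes under dst, while a removed node goes under src. A minimal sketch with a hypothetical `ReportedChange` class; the operator name `DEL` is also an assumption, only `INS` and `UPD` appear in the output above.

```java
// Hypothetical record of one reported change; field names mirror the JSON keys above.
class ReportedChange {
    final String operator;
    final String src; // node text in the old version, or null if absent
    final String dst; // node text in the new version, or null if absent

    private ReportedChange(String operator, String src, String dst) {
        this.operator = operator;
        this.src = src;
        this.dst = dst;
    }

    // Builds the report for a single-node operation under the assumed convention.
    static ReportedChange of(String operator, String node) {
        switch (operator) {
            case "INS": // node exists only after the change
                return new ReportedChange(operator, null, node);
            case "DEL": // node exists only before the change (assumed operator name)
                return new ReportedChange(operator, node, null);
            default:
                throw new IllegalArgumentException("not a single-node operation: " + operator);
        }
    }
}
```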

Coming can’t be used on a larger git repo

Description

When I ran Coming on a git repository with 2000+ commits, I didn't get the same kind of result as with the test data set. Running it on another git repository with 2000+ commits gave the same outcome.

Input command

java -classpath ./target/coming-0.1-SNAPSHOT-jar-with-dependencies.jar fr.inria.coming.main.ComingMain -location ./*/ -mode mineinstance -pattern ./pattern_INS_IF_MOVE_ASSIG.xml -output ./out/*

CMD output

image

visualization of coming output

For sake of dissemination, it would be nice that coming has a 'default mode', taking a repo as input.

In this default mode, the output could be a nice graphical visualization à la Gource.

add support for python?

Hello @martinezmatias and thank you for your great library.
I am doing research on python language and I want to know if you added python support for coming.
I didn't see the language support in the README.
I would also be glad if I can contribute in adding Python support.

let me know if I can have your email address to contact you about this issue :)

Min Max Distance

As discussed with Samuel at the KTH meetup, the pattern specification should allow specifying the min distance between parent and children. The current distance attribute corresponds to the max.
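A minimal sketch of such a check, assuming the distance is the number of parent links between the child and the ancestor. `TreeNode` and `DistanceCheck` are hypothetical; Coming operates on Spoon/GumTree nodes instead.

```java
// Hypothetical tree node used only for illustration.
class TreeNode {
    final String type;
    final TreeNode parent;

    TreeNode(String type, TreeNode parent) {
        this.type = type;
        this.parent = parent;
    }
}

class DistanceCheck {
    // Number of parent links from child up to ancestor, or -1 if ancestor is not above child.
    static int distance(TreeNode child, TreeNode ancestor) {
        int d = 0;
        for (TreeNode n = child; n != null; n = n.parent, d++) {
            if (n == ancestor) {
                return d;
            }
        }
        return -1;
    }

    // True when the ancestor is reachable and the distance lies in [min, max].
    static boolean within(TreeNode child, TreeNode ancestor, int min, int max) {
        int d = distance(child, ancestor);
        return d >= 0 && d >= min && d <= max;
    }
}
```

With both bounds available, `min == max` expresses an exact distance, and a large `max` with `min = 2` excludes direct children.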

-input (git|files): using more expressive option

I think something along the lines of revision_format or input_format would be better. Just input is a bit confusing (it can be mistaken for location by someone who hasn't read the documentation).

New Change Pattern Specification

Can we make some sort of domain-specific language for specifying the changing pattern?

We can use some sort of modifiers to specify the actions. For example:

  • //: Move
  • +/-: Update
  • -: Remove
  • default or +: Add

Example specification 1

for(*) {   
	//a++; 
}

This specifies a change in which a for was added (irrespective of its conditions) and a++ was moved.

Example specification 2

+/-for(*) {
	// a++
}

This specifies a change in which a for was modified.

The main advantage of such an approach would be UX. I feel that it would make specifying change patterns easier and hence improve usability.
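A minimal sketch of how the leading modifier of one specification line could be mapped to an action, assuming the modifier table above. `ModifierParser` is hypothetical; `INS` and `UPD` are the action names used in existing pattern files, while `MOV` and `DEL` are assumed names for move and remove.

```java
class ModifierParser {
    // Maps the leading modifier of one specification line to an action name.
    static String actionOf(String line) {
        String s = line.trim();
        if (s.startsWith("//")) {
            return "MOV"; // "//": move (assumed action name)
        }
        if (s.startsWith("+/-")) {
            return "UPD"; // "+/-": update; checked before the single-character modifiers
        }
        if (s.startsWith("-")) {
            return "DEL"; // "-": remove (assumed action name)
        }
        return "INS";     // explicit "+" or no modifier: add
    }
}
```

One design question this sketch leaves open is that `//` and `-` also start ordinary Java comments and negative expressions, so a real grammar would need an escape mechanism.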

Need more information in README

Thanks! Coming is very helpful for my research. Do you mind adding a few more descriptions about this tool?

  • Supported languages
  • Which branch it mines by default

Max number of commits to analyze

By default, Coming analyzes the last max_nb_commit_analyze commits (100 according to the configuration file).
The potential problem is that a user who runs the command line with the minimum required parameters (location, input, and a few more) may not realize that the analysis covers only part of the input (e.g., the git repo).
Should we make this limit optional? Should we increase the value of max_nb_commit_analyze?
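One way to keep the limit while making the truncation visible is to warn whenever commits are skipped. A minimal sketch, assuming the commits are ordered oldest to newest and that a non-positive limit means "no limit" (both are assumptions); `CommitLimiter` is a hypothetical helper, not Coming code.

```java
import java.util.List;

class CommitLimiter {
    // Keeps at most maxCommits of the most recent commits and warns when some are skipped.
    static <T> List<T> lastCommits(List<T> commits, int maxCommits) {
        if (maxCommits <= 0 || commits.size() <= maxCommits) {
            return commits; // nothing to truncate
        }
        System.err.println("Warning: analyzing only the last " + maxCommits
                + " of " + commits.size() + " commits");
        return commits.subList(commits.size() - maxCommits, commits.size());
    }
}
```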

consider more concise log output for test files

When I tried to upgrade GTSpoon to a newer version (to solve issues on some datasets), a CI error was triggered because the log size limit was exceeded.
Also, I suggest moving resource files that are only used by tests to .../src/test/resources

S4R features

@ycaxgjd has done a first migration from Zhingxing's Coming fork to Coming.
However, I think we still have some differences, i.e., commits pushed to the mentioned fork after the migration.

Source file not found (FineGrainAnalyzer)

I modified this piece of code in fineGrainAnalyzer to find the difference between the source and target file. However, the source file path obtained from the entries of revision.getChildren() cannot be found. Do you have any idea why this happens? @martinezmatias

An example of a source file path that gets printed is src/test/java/org/apache/commons/lang3/builder/ToStringBuilderTest.java




public AnalysisResult<IRevision> analyze(IRevision revision) {

    List<IRevisionPair> javaFiles = revision.getChildren();

    Map<String, Diff> diffOfFiles = new HashMap<>();

    log.info("\n*****\nCommit: " + revision.getName());

    for (IRevisionPair<String> fileFromRevision : javaFiles) {

        String left = fileFromRevision.getPreviousVersion();
        String right = fileFromRevision.getNextVersion();

        String leftName = fileFromRevision.getPreviousName();
        String rightName = fileFromRevision.getName();
        System.out.println("In fine grain analyzer...............................................................");
        System.out.println(leftName);
        System.out.println(rightName);

        // build simple lists of the lines of the two test files
        List<String> original = null;
        try {
            original = Files.readAllLines(new File(leftName).toPath());
        } catch (IOException e) {
            e.printStackTrace();
        }
        List<String> revised = null;
        try {
            revised = Files.readAllLines(new File(rightName).toPath());
        } catch (IOException e) {
            e.printStackTrace();
        }

        Diff diff = compare(left, right, leftName, rightName);
        if (diff != null) {
            diffOfFiles.put(fileFromRevision.getName(), diff);
            // compute the patch: this is the diffutils part
            Patch<String> patch = null;
            try {
                patch = DiffUtils.diff(original, revised);
            } catch (DiffException e) {
                e.printStackTrace();
            }

            System.out.println("patch .............");
            System.out.println(patch);
        }
    }

    return new DiffResult<IRevision, Diff>(revision, diffOfFiles);
}
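One possible reading of the snippet above, stated as an assumption: getPreviousVersion() and getNextVersion() appear to return the file contents (they are passed to compare as left and right), while the names are repository-relative paths that need not exist on the local disk. Under that assumption, the line lists can be built from the content strings instead of reading files. `ContentLines` is a hypothetical helper.

```java
import java.util.Arrays;
import java.util.List;

class ContentLines {
    // Splits file content held in memory into lines, accepting \n, \r\n and \r endings.
    static List<String> toLines(String content) {
        if (content == null || content.isEmpty()) {
            return List.of(); // e.g. a file that is absent on one side of the revision
        }
        return Arrays.asList(content.split("\\R"));
    }
}
```

The two lists returned by `toLines(left)` and `toLines(right)` could then replace `original` and `revised` in the call to `DiffUtils.diff`.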



New input

New kind of input: to have two arguments corresponding to two files. Then, coming executes the analyzers over the pair.

UNCHANGED action

Concerned patch:

--- /src/com/google/javascript/rhino/testing/Asserts.java
+++ /src/com/google/javascript/rhino/testing/Asserts.java
@@ -102,3 +102,3 @@
         (a == null) == (b == null));
-    if (a == null) {
+    if (message!=null) {
       return;

The pattern is not supposed to match the given change because a has been changed to message.

Pattern Specification:

<pattern name="binary">

    <entity id="1" type="BinaryOperator"/>

    <entity id="2" type="*" role="LEFT_OPERAND"> 
        <parent parentId="1" distance="1"/>
    </entity>

    <entity id="3" type="*" role="RIGHT_OPERAND"> 
        <parent parentId="1" distance="1"/>
    </entity>

    <action entityId="1" type="UPD"/>
    <action entityId="2" type="UNCHANGED_HIGH_PRIORITY"/>
    <action entityId="3" type="UNCHANGED_HIGH_PRIORITY"/>
</pattern>

Instance matched:

 "instance_detail": [                                                                                                                                                                   
            {                                                                                                                                                                                    
              "pattern_action": "UPD",                                                                                                                                                           
              "pattern_entity": {                                                                                                                                                                
                "entity_type": "BinaryOperator",                                                                                                                                                 
                "entity_new value": "*",                                                                                                                                                         
                "entity_role": "*",                                                                                                                                                              
                "entity_parent": "null"                                                                                                                                                          
              },                                                                                                                                                                                 
              "concrete_change": {                                                                                                                                                               
                "operator": "UPD",                                                                                                                                                               
                "src_type": "BinaryOperator",                                                                                                                                                    
                "dst_type": "BinaryOperator",                                                                                                                                                    
                "src": "a == null",                                                                                                                                                              
                "dst": "message != null",                                                                                                                                                        
                "src_parent_type": "If",                                                                                                                                                         
                "dst_parent_type": "If",                                                                                                                                                         
                "src_parent": "if (a == null) {\n    return;\n}",
                "dst_parent": "if (message != null) {\n    return;\n}"
              },
              "file": "/home/ubuntu/coming/patch1-Closure-7-Nopol2017",
              "line": 103
            }
          ]

The concerned files can be found here (private repo)

How to write the pattern specification to match a change of the IF condition

I want to match the change of IF condition, i.e.

- if (a){
+ if (b){
    System.out.println(a);
}

I have tried the specification like this one below:

<?xml version="1.0"?>
<pattern>
<entity id="1" type = "If"/>
<entity id="2" type = "*">
	<parent parentId="1" distance="10" />
</entity>
<action entityId ="2" type = "*" />
</pattern>

But it will also capture the change in the if block. For example, it will also match the change like this one.

if (a){
-    System.out.println(a);
+    System.out.println(a);
}

How should I write the Pattern Specification to only match the changes in the condition part?

[Proposal] Include checks for OperatorKind in Specification

This is a proposal to add a check for properties like BinaryOperatorKind and UnaryOperatorKind in the Change Pattern Specification. For example: If entity="BinaryOperator" and kind="Plus", we only consider revisions in which Operator is of kind plus.

Would it also be beneficial to add another abstraction over OperatorKinds, such as LogicalOperator?
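A minimal sketch of what such a check could look like. `OpKind`, `KindFilter` and the `LOGICAL` group are hypothetical; Spoon provides its own BinaryOperatorKind and UnaryOperatorKind enums, which a real implementation would use instead.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical operator kinds, for illustration only.
enum OpKind { PLUS, MINUS, AND, OR, EQ, NE }

class KindFilter {
    // Hypothetical abstraction grouping several kinds under one name.
    static final Set<OpKind> LOGICAL = EnumSet.of(OpKind.AND, OpKind.OR);

    // A null or empty constraint accepts any kind, like the "*" wildcard in patterns.
    static boolean matches(OpKind actual, Set<OpKind> allowed) {
        return allowed == null || allowed.isEmpty() || allowed.contains(actual);
    }
}
```

Representing the constraint as a set lets a single kind (`kind="Plus"`) and a named group (`LogicalOperator`) share the same matching code.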

gumtree does not provide an output

I have written this test case for NPEfix, and I can't figure out why gumtree does not find any difference between these two files. @martinezmatias

public static void main (String[] args) {

    hello();

    void hello(){
        String ptr = null;

        if (ptr.equals("gfg"))
            System.out.print("Same");
        else
            System.out.print("Not Same");
    }

}

public static void main (String[] args) {

    hello();

    void hello(){
        String ptr = null;

        return ;
        if (ptr.equals("gfg"))
            System.out.print("Same");
        else
            System.out.print("Not Same");
    }

}

DetectorChangePatternInstanceEngine does not distinguish different entities

In the current implementation, I noticed that the following pattern specification also matches a change that inserts only a single TypeReference entity.

<?xml version="1.0"?>
<pattern>
    <entity id="1" type="TypeReference"/>
    <entity id="2" type="TypeReference"/>
    <action entityId="1" type="INS"/>
    <action entityId="2" type="INS"/>
</pattern>

This specification would match

+ int c;

In other words, the entities with different ids may match the same operation in the AST.

Is this by design? I would prefer that the engine distinguishes entities with different IDs, which is more flexible.

I think the implementation may be here.
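The behavior requested here amounts to an injective assignment: every entity must be bound to a different operation. A minimal backtracking sketch, assuming the candidate operations per entity have already been computed; `DistinctMatcher` is hypothetical and not the engine's actual code.

```java
import java.util.List;

class DistinctMatcher {
    // candidates.get(i) lists the operation indices that entity i may match.
    // Returns true if every entity can be bound to a different operation.
    static boolean hasDistinctAssignment(List<List<Integer>> candidates, int operationCount) {
        return assign(candidates, 0, new boolean[operationCount]);
    }

    private static boolean assign(List<List<Integer>> candidates, int entity, boolean[] used) {
        if (entity == candidates.size()) {
            return true; // all entities are bound
        }
        for (int op : candidates.get(entity)) {
            if (!used[op]) {
                used[op] = true;
                if (assign(candidates, entity + 1, used)) {
                    return true;
                }
                used[op] = false; // backtrack and try the next candidate
            }
        }
        return false;
    }
}
```

With this check, the two-entity pattern above would no longer match a change containing a single TypeReference insert, because both entities compete for the same operation.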

Repairability: JGenProg

Hi @martinezmatias and @monperrus ,
This issue is about how we should handle the patches for JGenProg.

Here most of the patches are:

  1. Insert a new block/statement
  2. Delete an existing block/statement
  3. Insert some statement/block as well as delete the statement/block

These are very weak conditions by themselves, and all the commits fall into one of these categories.

And since the paper applies diff minimization, from my perspective it would be hard (with only the current features of Coming) to figure out whether the newly inserted statement already existed in the code or not.

Q1.) Does our implementation (the patches in DRR) use diff minimization?

Q2.) Do we want to include other characteristics in Coming? For example, running test cases and looking at the paths followed in the buggy version, or other analyses based on test case results. Or maybe accepting only a git repo as input for certain kinds of repair-tool analysis, so that we can search through the code (as jGenProg does). Or including diff minimization dependencies?

Q3.) How strong an indicator is our classification supposed to be that the patch was generated by the concerned repair tool?

#66

High Memory Usage

I was trying to run the module on commons-lang (5468 commits) and I got the following error:

Analyzing 1022/5468
2019-07-05 11:41:06,797 INFO fr.inria.coming.changeminer.analyzer.commitAnalyzer.FineGrainDifftAnalyzer -
*****
Commit: 5111ae7db08a70323a51a21df0bbaf46f21e072e
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
.... 
<this stack trace can be different>

Adding support for SimFix

Hello @monperrus @martinezmatias,
I have been trying to add support for the SimFix repair tool, but there is something I can't understand.
According to the figure below from the paper, to determine whether a patch can be generated by SimFix, I wanted to check that the change between my source and target file is in both the frequent modifications set (S1) and the S2 set.

Screen Shot 2019-08-29 at 16 03 32

My problem is that I think all changes between the source and target files are in S2 by nature(!): to produce S2, we select a faulty snippet of code from this same source file (which is now the line in my source file that differs from the target file) and consider its modifications with respect to all possible snippets from the target file, so at some point we will generate exactly the modification we are looking for in S2.

tests logs contain confusing exceptions

In the test logs, there are exceptions such as

--java.lang.ClassNotFoundException: vvvvv
java.lang.Exception: Error Loading Engine: java.lang.ClassNotFoundException: vvvvv
	at fr.inria.coming.core.extensionpoints.PlugInLoader.loadPlugin(PlugInLoader.java:23)
	at fr.inria.coming.main.ComingMain.loadAnalyzersFromCommand(ComingMain.java:313)
	at fr.inria.coming.main.ComingMain.loadModelAnalyzers(ComingMain.java:301)
	at fr.inria.coming.main.ComingMain.createEngine(ComingMain.java:218)
	at fr.inria.coming.main.ComingMain.run(ComingMain.java:137)
	at fr.inria.coming.spoon.patterns.InstanceMiningTest.testMainModeArg(InstanceMiningTest.java:101)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)

However, they do not indicate a test failure, which makes them quite confusing.

We should not have any exceptions in the console test logs (related to #test)

typo in package name

fr.inria.coming.repairability.repiartools

should be

fr.inria.coming.repairability.repairtools

A refactoring is required.

Using File.pathSeparator as a substitute for the colon causes problems on Windows

As you can see from here, on Windows File.pathSeparator is ; rather than :, which caused four failures when I ran the tests on my Windows machine.

In those failures, File.pathSeparator is simply used as a stand-in for the character : rather than as a path separator that differs across systems. For example, the README says you need the following command to combine filters,

 -filter numberhunks:maxfiles  -parameters max_nb_hunks2:max_files_per_commit:1

where : is used to separate different parameters and filter.

I will make a PR soon to fix problems like this.
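A minimal sketch of the kind of fix this suggests: the command-line syntax uses a fixed separator, independent of the platform. `FilterArgs` and `ARG_SEPARATOR` are hypothetical names, not Coming's actual code.

```java
class FilterArgs {
    // Separator of the command-line syntax; fixed on every platform,
    // unlike File.pathSeparator (";" on Windows, ":" on Unix-like systems).
    static final String ARG_SEPARATOR = ":";

    // Splits a value such as "numberhunks:maxfiles" into its parts.
    static String[] splitFilters(String value) {
        return value.split(ARG_SEPARATOR);
    }
}
```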
