cqfn / veniq Goto Github PK



Veniq uses Machine Learning to analyze source code, find possible refactorings, and suggest those that seem optimal

License: MIT License

Python 58.80% Makefile 0.06% Java 41.14%

static-analysis machine-learning refactorings

veniq's People

Contributors

lyriccoder lencof aiovlev

veniq's Issues

Synthetic Dataset. Insertion doesn't work for inline

Insertion doesn't work for inlining in many cases:
problems.zip

ASTNode test fails

======================================================================
FAIL: test_class_computed_fields (test.ast_framework.test_ast_node.ASTNodeTestSuite)

Traceback (most recent call last):
File "D:\git\veniq\test\ast_framework\test_ast_node.py", line 22, in test_class_computed_fields
self.assertEqual(java_class.documentation, "/**\n* Some documentation\n*/")
AssertionError: '/**\r\n* Some documentation\r\n*/' != '/**\n* Some documentation\n*/'

- /**
?    -
+ /**
- * Some documentation
?                     -
+ * Some documentation
  */
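
The two strings in this failure differ only in their line endings (`\r\n` versus `\n`). One possible workaround, sketched below, is to normalize line endings before comparing. `normalize_newlines` is a hypothetical helper for illustration, not an existing veniq function.

```python
def normalize_newlines(text: str) -> str:
    """Convert Windows (CRLF) and bare CR line endings to LF."""
    return text.replace("\r\n", "\n").replace("\r", "\n")


# The value reported by the failing test on Windows.
documentation = "/**\r\n* Some documentation\r\n*/"
assert normalize_newlines(documentation) == "/**\n* Some documentation\n*/"
```

Whether to normalize in the parser or in the test is a design decision this sketch does not settle.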

Veniq Dataset Collection. Function is not inserted correctly inside synchronized block

Suppose we have the following invocation:

b'synchronized(getParent()) {'

We need to inline getParent. It doesn't work properly.

You can find the input on the server.

data/full_dataset/output_files/PropertiesStructure_addItem_242.java

Need to add mypy to CI

SEMI recommendation api: add method to check validity of input method declaration and create error codes

Why

The current implementation #121 takes as input a list of strings, assuming it is a valid method declaration.

What we want

Add a subprocedure in `recommend_for_method` that checks that the input is a valid method declaration. If it is not valid, the API should return an error code.

Proposed solution

Validity checks:
- check it is not a class declaration
- check for bracket balance
- some of the errors can be caught by a try-catch wrapper around the procedure that creates the AST from a string.
Error codes: TODO
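
The first two checks could look roughly like the sketch below. The error codes are left as TODO in this issue, so `ValidationCode` and its members are hypothetical, as are the function names. The bracket check is deliberately naive and does not special-case brackets inside string literals or comments.

```python
from enum import IntEnum
from typing import List


class ValidationCode(IntEnum):
    # Hypothetical codes; the issue leaves the real ones as TODO.
    OK = 0
    UNBALANCED_BRACKETS = 1
    CLASS_DECLARATION = 2


_PAIRS = {")": "(", "]": "[", "}": "{"}


def brackets_balanced(code: str) -> bool:
    """Naive check: every closing bracket must match the latest open one."""
    stack: List[str] = []
    for char in code:
        if char in _PAIRS.values():
            stack.append(char)
        elif char in _PAIRS:
            if not stack or stack.pop() != _PAIRS[char]:
                return False
    return not stack


def validate_method_declaration(lines: List[str]) -> ValidationCode:
    code = "\n".join(lines)
    if not brackets_balanced(code):
        return ValidationCode.UNBALANCED_BRACKETS
    # Inspect only the header, i.e. the text before the first "{".
    header_tokens = code.split("{", 1)[0].split()
    if "class" in header_tokens:
        return ValidationCode.CLASS_DECLARATION
    return ValidationCode.OK
```

Errors that these checks miss would still have to be caught by the wrapper around AST creation mentioned in the list.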

@acheshkov @lyriccoder

Lines on insertions are negative

In the file attached below we can see that the start line of the insertion is larger than its end line:

31411,
/dataset/01/eclipse/openj9/sourcetools/j9constantpool/com/ibm/oti/VMCPTool/ConstantPoolStream.java,data/full_dataset/input_files/ConstantPoolStream_871c8b89fa567f7d368997faeb856a6a542d89631cb8f19b7426b04d7b773efb.java,
ConstantPoolStream,b'writeFooter();',
close,
209,
writeFooter,
212,
213,
data/full_dataset/output_files/ConstantPoolStream_871c8b89fa567f7d368997faeb856a6a542d89631cb8f19b7426b04d7b773efb_close_211.java,
True,
211,
210

ConstantPoolStream_871c8b89fa567f7d368997faeb856a6a542d89631cb8f19b7426b04d7b773efb_close_211.txt
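
A simple sanity check over the dataset rows can surface such cases early. The sketch below assumes each row is a dict with the integer columns `inline_insertion_line_start` and `inline_insertion_line_end`; the function name is hypothetical.

```python
from typing import Dict, List


def invalid_insertion_ranges(rows: List[Dict[str, int]]) -> List[Dict[str, int]]:
    """Return rows whose insertion start line is after the end line."""
    return [
        row
        for row in rows
        if row["inline_insertion_line_start"] > row["inline_insertion_line_end"]
    ]


rows = [
    {"inline_insertion_line_start": 211, "inline_insertion_line_end": 210},
    {"inline_insertion_line_start": 566, "inline_insertion_line_end": 625},
]
assert invalid_insertion_ranges(rows) == [rows[0]]
```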

Better README.

I think the project should have a better README to understand how to use the project.

ASTframework. Parsing problem

Hello. I ran into a problem while parsing this class:

public abstract class DdlChange implements MigrationStep {

  private final Database db;

  public DdlChange(Database db) {
    this.db = db;
  }


  public final void execute() throws SQLException {
    try (Connection writeConnection = createDdlConnection()) {
      Context context = new ContextImpl(writeConnection);
      execute(context);
    }
  }

  private Connection createDdlConnection() throws SQLException {
    Connection writeConnection = db.getDataSource().getConnection();
    writeConnection.setAutoCommit(false);
    return writeConnection;
  }
}

The problem occurs while printing the method invocation createDdlConnection():

node index: 45
arguments: []
member: createDdlConnection
node_type: Method invocation
postfix_operators: []
prefix_operators: []
qualifier: None
selectors: []
type_arguments: None
0 ['    try (Connection writeConnection ', ' createDdlConnection()) {\n']
Error has happened during file analyze: 'NoneType' object is not subscriptable

Block-statement graph builder

Create graph only for plain statements and statements which produce a single block

Make ASTNode return its subtree.

Currently, to get a subtree one needs to use the get_subtree method of AST, not of ASTNode. Moving this method to ASTNode would make it more convenient. To achieve this, we need to apply a factory/DI trick to break the circular import.

Block-Statements graph

For syntactic filtering in the baseline we need to find the boundaries of each block. This can be done with the help of a bipartite graph of blocks and statements. The graph could also be used for traversing the AST while extracting semantics from statements.
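
As a rough illustration of the idea, the sketch below models the bipartite structure with two hypothetical dataclasses (`Statement` owns blocks, `Block` owns statements) and derives block boundaries from it. This is not the veniq data model; boundaries here are based on statement start lines, not on closing braces.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Statement:
    name: str
    line: int
    blocks: List["Block"] = field(default_factory=list)


@dataclass
class Block:
    reason: str
    statements: List[Statement] = field(default_factory=list)


def last_line(statement: Statement) -> int:
    """Largest start line among the statement and everything nested in it."""
    lines = [statement.line]
    for block in statement.blocks:
        lines.extend(last_line(nested) for nested in block.statements)
    return max(lines)


def block_boundaries(block: Block) -> List[Tuple[str, int, int]]:
    """Collect (reason, first_line, last_line) for a block and nested blocks."""
    result = []
    if block.statements:
        first = block.statements[0].line
        result.append((block.reason, first, last_line(block.statements[-1])))
    for statement in block.statements:
        for nested in statement.blocks:
            result.extend(block_boundaries(nested))
    return result


then_branch = Block("THEN_BRANCH", [Statement("expr", 3), Statement("expr", 4)])
body = Block("BODY", [Statement("if", 2, [then_branch]), Statement("return", 6)])
assert block_boundaries(body) == [("BODY", 2, 6), ("THEN_BRANCH", 3, 4)]
```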

Semantic filter

Detect which variables an extraction opportunity needs as input and which it produces as output.

Veniq. Semi. Bug when trying to run semi algorithm

I had the error:

Node Statement is not supported.

File is attached below
HandleLayer.zip

    return _block_extractors[statement.node_type](statement)
KeyError: <ASTNodeType.STATEMENT: 58>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\git\veniq\veniq\dataset_collection\validation.py", line 144, in validate_row
    opport = find_extraction_opportunities(ast_subtree)
  File "D:\git\veniq\veniq\dataset_collection\validation.py", line 28, in find_extraction_opportunities
    statements_semantic = extract_method_statements_semantic(method_ast)
  File "D:\git\veniq\veniq\baselines\semi\extract_semantic.py", line 12, in extract_method_statements_semantic
    block_statement_graph = build_block_statement_graph(method_ast)
  File "D:\git\veniq\veniq\ast_framework\block_statement_graph\builder.py", line 15, in build_block_statement_graph
    root_index = _build_graph_from_statement(method_ast.get_root(), graph)
  File "D:\git\veniq\veniq\ast_framework\block_statement_graph\builder.py", line 26, in _build_graph_from_statement
    new_block_index = _build_graph_from_block(block, graph)
  File "D:\git\veniq\veniq\ast_framework\block_statement_graph\builder.py", line 40, in _build_graph_from_block
    new_statement_index = _build_graph_from_statement(statement, graph)
  File "D:\git\veniq\veniq\ast_framework\block_statement_graph\builder.py", line 26, in _build_graph_from_statement
    new_block_index = _build_graph_from_block(block, graph)
  File "D:\git\veniq\veniq\ast_framework\block_statement_graph\builder.py", line 40, in _build_graph_from_block
    new_statement_index = _build_graph_from_statement(statement, graph)
  File "D:\git\veniq\veniq\ast_framework\block_statement_graph\builder.py", line 26, in _build_graph_from_statement
    new_block_index = _build_graph_from_block(block, graph)
  File "D:\git\veniq\veniq\ast_framework\block_statement_graph\builder.py", line 40, in _build_graph_from_block
    new_statement_index = _build_graph_from_statement(statement, graph)
  File "D:\git\veniq\veniq\ast_framework\block_statement_graph\builder.py", line 26, in _build_graph_from_statement
    new_block_index = _build_graph_from_block(block, graph)
  File "D:\git\veniq\veniq\ast_framework\block_statement_graph\builder.py", line 40, in _build_graph_from_block
    new_statement_index = _build_graph_from_statement(statement, graph)
  File "D:\git\veniq\veniq\ast_framework\block_statement_graph\builder.py", line 26, in _build_graph_from_statement
    new_block_index = _build_graph_from_block(block, graph)
  File "D:\git\veniq\veniq\ast_framework\block_statement_graph\builder.py", line 40, in _build_graph_from_block
    new_statement_index = _build_graph_from_statement(statement, graph)
  File "D:\git\veniq\veniq\ast_framework\block_statement_graph\builder.py", line 26, in _build_graph_from_statement
    new_block_index = _build_graph_from_block(block, graph)
  File "D:\git\veniq\veniq\ast_framework\block_statement_graph\builder.py", line 40, in _build_graph_from_block
    new_statement_index = _build_graph_from_statement(statement, graph)
  File "D:\git\veniq\veniq\ast_framework\block_statement_graph\builder.py", line 26, in _build_graph_from_statement
    new_block_index = _build_graph_from_block(block, graph)
  File "D:\git\veniq\veniq\ast_framework\block_statement_graph\builder.py", line 40, in _build_graph_from_block
    new_statement_index = _build_graph_from_statement(statement, graph)
  File "D:\git\veniq\veniq\ast_framework\block_statement_graph\builder.py", line 26, in _build_graph_from_statement
    new_block_index = _build_graph_from_block(block, graph)
  File "D:\git\veniq\veniq\ast_framework\block_statement_graph\builder.py", line 40, in _build_graph_from_block
    new_statement_index = _build_graph_from_statement(statement, graph)
  File "D:\git\veniq\veniq\ast_framework\block_statement_graph\builder.py", line 24, in _build_graph_from_statement
    blocks = extract_blocks_from_statement(statement)
  File "D:\git\veniq\veniq\ast_framework\block_statement_graph\_block_extractors.py", line 17, in extract_blocks_from_statement
    raise NotImplementedError(f"Node {statement.node_type} is not supported.")
NotImplementedError: Node Statement is not supported.

Refactor baseline method extraction to use block statement graph

This refactoring is needed to make it possible to insert fake nodes when leaving a block.

Veniq. Dataset collection. Function is not inserted correctly

TomEEWarListener_e9b19cb814c4fbfa77d6bc2ca982b2ef30f1f06314f4_output.txt

        File jar = TomcatFactory.getTomEEWebAppJar(tp.getCatalinaHome(), tp.getCatalinaBase());
        if (this.currentTomEEJar != jar && (this.currentTomEEJar == null || !this.currentTomEEJar.equals(jar))) {
            currentTomEEJar = jar;
            TomcatManager.TomEEVersion version = TomcatFactory.getTomEEVersion(jar);
            TomcatManager.TomEEType type = version == null ? null : TomcatFactory.getTomEEType(jar.getParentFile());
            refresh.refresh(version, type);
        }

        if (this.currentTomEEJar != jar && (this.currentTomEEJar == null || !this.currentTomEEJar.equals(jar))) {
            currentTomEEJar = jar;
            TomcatManager.TomEEVersion version = TomcatFactory.getTomEEVersion(jar);
            TomcatManager.TomEEType type = version == null ? null : TomcatFactory.getTomEEType(jar.getParentFile());
            refresh.refresh(version, type);
        }

Add grouping and ranking to baseline pipeline

Lines of insertion are not valid for inner classes

If we have an inner class and its function name matches the main class name, the line numbers are incorrect.

PainlessParser.txt

  '/dataset/01/elastic/elasticsearch/modules/lang-painless/src/main/java/org/elasticsearch/painless/antlr/PainlessParser.java'),
 ('input_filename',
  'data/full_dataset/input_files/PainlessParser_bd33bb08aa9a728098ee1dfb7f9677f9db75a6b3dfa38a21732fa1300b8a0ac4.java'),
 ('class_name', 'PainlessParser'),
 ('invocation_text_string', 'trailer()'),
 ('method_where_invocation_occurred', 'rstatement'),
 ('start_line_of_function_where_invocation_occurred', '317.0'),
 ('invocation_method_name', 'trailer'),
 ('invocation_method_start_line', '422.0'),
 ('invocation_method_end_line', '424.0'),
 ('output_filename',
  'data/full_dataset/output_files/PainlessParser_bd33bb08aa9a728098ee1dfb7f9677f9db75a6b3dfa38a21732fa1300b8a0ac4_rstatement_566.java'),
 ('can_be_parsed', 'True'),
 ('inline_insertion_line_start', '566'),
 ('inline_insertion_line_end', '625'),
 ('project_name', 'elastic/elasticsearch')]

Veniq. Dataset collection. Function is not inserted correctly

There is a function

    private void refillUserDataConstraint() {
        setUserDataConstraint(null);
        UserDataConstraint userDataConstraint = getUserDataConstraint();
        userDataConstraint.setDescription(userDataConstraintDescTF.getText());
        userDataConstraint.setTransportGuarantee((String) transportGuaranteeCB.getSelectedItem());
    }

The last statement is missing in the new file:

    public void setValue(javax.swing.JComponent source, Object value) {
        if (source == displayNameTF) {
            String text = (String)value;
            constraint.setDisplayName(text);
            SectionPanel enclosingPanel = getSectionView().findSectionPanel(constraint);
            enclosingPanel.setTitle(text);
            enclosingPanel.getNode().setDisplayName(text);
        } else if (source == authConstraintCB) {
            if (authConstraintCB.isSelected()) {
                refillAuthConstraint();
            } else {
                setAuthConstraint(null);
            }
        } else if (source == roleNamesTF) {
            refillAuthConstraint();
        } else if (source == authConstraintDescTF) {
            refillAuthConstraint();
        } else if (source == userDataConstraintCB) {
            if (userDataConstraintCB.isSelected()) {
                setUserDataConstraint(null);
                UserDataConstraint userDataConstraint = getUserDataConstraint();
                userDataConstraint.setDescription(userDataConstraintDescTF.getText());
            } else {
                setUserDataConstraint(null);
            }
        } else if (source == transportGuaranteeCB) {
            refillUserDataConstraint();
        } else if (source == userDataConstraintDescTF) {
            refillUserDataConstraint();
        }
    }

SecurityConstraintPanel_setValue_192.java.txt

Veniq Dataset Collection. Function is not inserted correctly inside set of strings

Suppose we have the following invocation:

b'new String[] { getWinUtilsPath(), "readlink", link }'

We need to inline getWinUtilsPath. It doesn't work properly.

You can find the input on the server.

data/full_dataset/output_files/Shell_getReadlinkCommand_310.java

Add ability to create fake nodes.

We need fake nodes for the extract semantic step, so that they can be added after each block has ended.

Inline problems

Our current inline script has the following problems:

Methods or invocations that share a line with other statements cannot be inlined
(here, the body of the hasMore() invocation cannot be inserted):

 public Vector getWordVector(char[] separators)
   {  Vector list=new Vector();
      do { list.addElement(getWord(separators)); } while (hasMore());
      return list;
   }

The body of a method cannot be extracted when the whole method, with its statements, is written on one line:
public String getRemainingString() { return str.substring(index); }
Newly inserted blocks are written out incorrectly:

            if (ic == null) {
                // it must be unrecognized setting
        name = NbBundle.getMessage(InstanceNode.class,

The original code was:

            if (ic == null) {
                // it must be unrecognized setting
                return NbBundle.getMessage(InstanceNode.class,
                    "LBL_BrokenSettings"); //NOI18N

The last line of a method at the end of a file is not detected correctly:

    private void resetFactories() {
        if (ppFactories == null) {
            ppFactories = ProfilingPointsManager.getDefault().getProfilingPointFactories();
            getChooserFrame().initPanel(ppFactories);
        }

        currentFactory = 0;
    }
}

Method invocations like this one cannot be inlined:
if (!isProcessExternalEntities()) {
There are still a lot of problems with insertion around brackets.
Fix writing to CSV:

  File ".\veniq\dataset_collection\augmentation.py", line 427, in <module>
    writer.writerow(i)
  File "C:\Users\Vitaly\Anaconda3\lib\encodings\cp1251.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 158-161: character maps to <undefined>

(The invocation line may contain Chinese characters in comments.)

Methods with large commented-out blocks at the end of the method declaration cannot be inlined:

/*
        } else {
              pass
        }
*/
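
For the `UnicodeEncodeError` in the CSV item above: the traceback shows the writer falling back to the Windows locale encoding (cp1251). A minimal sketch of one fix is to open the output file with an explicit UTF-8 encoding. The file name and sample row below are made up for illustration.

```python
import csv
import os
import tempfile

# Sample data with non-ASCII characters in a comment (illustrative only).
rows = [["invocation_text_string"], ["foo(); // 注释"]]

path = os.path.join(tempfile.mkdtemp(), "out.csv")
# An explicit encoding avoids the locale default; newline="" is what the
# csv module documentation recommends for files passed to csv.writer.
with open(path, "w", encoding="utf-8", newline="") as csv_file:
    csv.writer(csv_file).writerows(rows)

with open(path, encoding="utf-8", newline="") as csv_file:
    assert list(csv.reader(csv_file)) == rows
```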

Fix baseline.

Currently the baseline filters out the extraction opportunity from 17 to 26. This happens because the semantic extraction step outputs else-if statements, while the syntactic filter does not traverse them automatically. To solve this, the block-statement graph creation may need to be fixed so that each else-if statement is represented by its own Statement node. This means each If statement may now have only one THEN_BRANCH block.

Baseline CLI common part

AST framework. Enums and interfaces don't have `methods` fields

Enums and interfaces don't have `methods` fields,
but they can have methods. This needs to be implemented.

SEMI algorithm chooses 1 line to extract

SEMI algorithm chooses 1 line to extract.
It seems we only consider extracting more than 3 lines, don't we?

 DocumentElement getDocumentElement(int startOffset, int endOffset) throws BadLocationException {
        readLock();
        checkDocumentDirty();
        try {
            for(int i = 0; i < elements.size(); i++) {
                DocumentElement de = elements.get(i);
                if(de.getStartOffset() == startOffset &&
                        de.getEndOffset() == endOffset)
                    return de;
                if(de.getStartOffset() > startOffset) break;
            }
            return null;
        }finally{
            readUnlock();
        }
    }
    List<DocumentElement> getDocumentElements(int startOffset) throws BadLocationException {
        readLock();
        if(documentDirty) {
            writeLock(); // This line is suggested to extract
            try {
                doc.readLock();
                try {
                    elements.resort();
                } finally {
                    doc.readUnlock();
                }
            }finally {
                writeUnlock();
            }
            documentDirty = false;
        }
        try {
            int elementIndex = elements.binarySearchForOffset(startOffset);
            if(elementIndex < 0) {
                return Collections.emptyList();
            } else {
                ArrayList<DocumentElement> found = new ArrayList<DocumentElement>();
                found.add(elements.get(elementIndex));
                int eli = elementIndex;
                while(--eli >= 0) {
                    DocumentElement previous = elements.get(eli);
                    if(previous.getStartOffset() == startOffset) {
                        found.add(0, previous);
                    } else {
                        break;
                    }
                }
                while(++elementIndex < elements.size()) {
                    DocumentElement next = elements.get(elementIndex);
                    if(next.getStartOffset() == startOffset) {
                        found.add(next);
                    } else {
                        break;
                    }
                }
                return found;
            }
        }finally{
            readUnlock();
        }
    }

writeLock(); // This line is suggested to extract

Is it ok?
DocumentModel_618deb49bbf23fe675cea45f2e2cf3f69e8231df8d81a2f47123c696ac11dbf1_getDocumentElements_199.zip

SEMI Baseline. Semantic extraction bug.

Add name of resource in try statement to its semantic.

Semi doesn't work with constructors

If we pass a constructor to SEMI, it fails with the error:

Traceback (most recent call last):
  File "D:\git\veniq\veniq\dataset_collection\validation.py", line 88, in validate_row
    opport = _print_extraction_opportunities(
  File "D:\git\veniq\veniq\dataset_collection\validation.py", line 24, in _print_extraction_opportunities
    statements_semantic = extract_method_statements_semantic(method_ast)
  File "D:\git\veniq\veniq\baselines\semi\extract_semantic.py", line 12, in extract_method_statements_semantic
    block_statement_graph = build_block_statement_graph(method_ast)
  File "D:\git\veniq\veniq\ast_framework\block_statement_graph\builder.py", line 15, in build_block_statement_graph
    root_index = _build_graph_from_statement(method_ast.get_root(), graph)
  File "D:\git\veniq\veniq\ast_framework\block_statement_graph\builder.py", line 24, in _build_graph_from_statement
    blocks = extract_blocks_from_statement(statement)
  File "D:\git\veniq\veniq\ast_framework\block_statement_graph\_block_extractors.py", line 17, in extract_blocks_from_statement
    raise NotImplementedError(f"Node {statement.node_type} is not supported.")
NotImplementedError: Node Constructor declaration is not supported.

for part of the file (Constructor):
VmCustomizer_VmCustomizer_96.txt

 public VmCustomizer(final GlassfishInstance instance) {
        this.instance = instance;
        javaPlatforms = JavaUtils.findSupportedPlatforms(this.instance);
        this.platformButtonText = NbBundle.getMessage(
                VmCustomizer.class,
                "VmCustomizer.platformButton");
        this.platformButtonAction = new PlatformAct

Make overlap

Make an overlap function that takes a list of opportunities (line numbers) and calculates the overlap between them.
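
One possible reading of this request is sketched below: an opportunity is an inclusive `(start_line, end_line)` pair, and the overlap of two opportunities is the number of lines they share. The names `overlap` and `pairwise_overlaps` and the choice of metric are assumptions, since the issue does not define them.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Range = Tuple[int, int]


def overlap(first: Range, second: Range) -> int:
    """Number of lines shared by two inclusive line ranges."""
    return max(0, min(first[1], second[1]) - max(first[0], second[0]) + 1)


def pairwise_overlaps(opportunities: List[Range]) -> Dict[Tuple[Range, Range], int]:
    """Overlap for every unordered pair of opportunities."""
    return {(a, b): overlap(a, b) for a, b in combinations(opportunities, 2)}


assert overlap((3, 12), (10, 25)) == 3  # lines 10, 11 and 12
assert overlap((3, 12), (13, 25)) == 0  # adjacent but disjoint
```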

SEMI Baseline. Finding opportunities takes too much time

Finding opportunities takes too much time. It happens with large files:

files.zip

It takes from one minute up to 4 minutes; in some cases it had not even finished after 4 minutes.

SEMI: convenient API for EMO recommendation

EMO = Extract Method Opportunity

Why

Currently, there is no implemented function that takes a method's source code and directly outputs EMO recommendations. Such functionality could be useful to the end user, or when called by a refactoring plugin.

What we want

A function that takes a method declaration, and outputs SEMI recommendation in the form of an ordered list of recommended EMOs. The order is of decreasing recommendation. EMO is represented as a tuple (start_line_number, end_line_number).

def recommend(method_declaration_lines: List[str]) -> List[Tuple[int, int]]: ...

Proposed solution

Write a wrapper around the existing functions extract_method_statements_semantic, create_extraction_opportunities, filter_extraction_opportunities, and rank_extraction_opportunities.
Implement it in a separate file under veniq.baselines.semi.
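
The shape of such a wrapper might look like the sketch below. The real signatures of the four pipeline functions are not shown in this issue, so the stages are passed in as callables with assumed arguments, and the toy stages at the bottom are placeholders that only demonstrate the data flow.

```python
from typing import Callable, List, Tuple

EMO = Tuple[int, int]


def recommend(
    method_declaration_lines: List[str],
    extract_semantic: Callable,
    create_opportunities: Callable,
    filter_opportunities: Callable,
    rank_opportunities: Callable,
) -> List[EMO]:
    """Chain the four stages; argument lists of the stages are assumptions."""
    semantic = extract_semantic(method_declaration_lines)
    opportunities = create_opportunities(semantic)
    filtered = filter_opportunities(opportunities, semantic)
    return rank_opportunities(filtered, semantic)


# Toy stages, NOT the veniq implementations.
def toy_semantic(lines):
    return {number: set(line.split()) for number, line in enumerate(lines, 1)}


def toy_create(semantic):
    numbers = sorted(semantic)
    return [(s, e) for s in numbers for e in numbers if s <= e]


def toy_filter(opportunities, semantic):
    return [o for o in opportunities if o[1] - o[0] >= 1]


def toy_rank(opportunities, semantic):
    # Longest ranges first; ties keep their original order.
    return sorted(opportunities, key=lambda o: o[0] - o[1])


result = recommend(["a = 1;", "b = a;", "c = b;"],
                   toy_semantic, toy_create, toy_filter, toy_rank)
assert result == [(1, 3), (1, 2), (2, 3)]
```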

@acheshkov @lyriccoder discuss

Skip statements without semantic in semantic filter.

See this file:
NbValidatioTransaction.txt

Traceback (most recent call last):
  File "D:/git/veniq/veniq/dataset_collection/validation.py", line 69, in <module>
    opport = _print_extraction_opportunities(
  File "D:/git/veniq/veniq/dataset_collection/validation.py", line 18, in _print_extraction_opportunities
    filtered_extraction_opportunities = filter_extraction_opportunities(
  File "D:\git\veniq\veniq\baselines\semi\filter_extraction_opportunities.py", line 24, in filter_extraction_opportunities
    return list(extraction_opportunities_filtered)
  File "D:\git\veniq\veniq\baselines\semi\filter_extraction_opportunities.py", line 21, in <lambda>
    and semantic_filter(extraction_opportunity, statements_semantic, block_statement_graph),
  File "D:\git\veniq\veniq\baselines\semi\_semantic_filter.py", line 16, in semantic_filter
    method_block_statement_graph.traverse(
  File "D:\git\veniq\veniq\ast_framework\block_statement_graph\block.py", line 48, in traverse
    self._traverse_function(self._graph, self._id, on_node_entering, on_node_leaving)
  File "D:\git\veniq\veniq\ast_framework\block_statement_graph\_nodes_factory.py", line 50, in _traverse_graph
    on_node_entering(destination_node)
  File "D:\git\veniq\veniq\baselines\semi\_semantic_filter.py", line 51, in on_node_entering
    self._on_statement_entering(node)
  File "D:\git\veniq\veniq\baselines\semi\_semantic_filter.py", line 81, in _on_statement_entering
    statement_semantic = self._statements_semantic[statement.node]
KeyError: <ASTNode node_type: Try statement, node_index: 2334>

Originally posted by @lyriccoder in #57 (comment)
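
The `KeyError` above comes from indexing the semantic mapping with a statement that has no entry. A minimal sketch of the skipping behaviour the title asks for is shown below; it uses plain strings in place of AST nodes and a hypothetical `collect_semantic` function, not the actual `_semantic_filter` code.

```python
from typing import Dict, Iterable, Set


def collect_semantic(
    statements: Iterable[str], statements_semantic: Dict[str, Set[str]]
) -> Set[str]:
    """Union the semantic of all statements, skipping those without any."""
    used: Set[str] = set()
    for statement in statements:
        semantic = statements_semantic.get(statement)
        if semantic is None:
            continue  # e.g. a Try statement with no semantic of its own
        used |= semantic
    return used


semantic = {"a = 1;": {"a"}, "b = a;": {"a", "b"}}
assert collect_semantic(["try", "a = 1;", "b = a;"], semantic) == {"a", "b"}
```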

Add test for extraction opportunities ranking from example from paper.

Veniq Dataset Collection. Function is not inserted correctly with 1-line invocation

Suppose we have the following invocation:

'public void removeUpdate(DocumentEvent e) { nameChange(); }'

We need to inline nameChange. It doesn't work properly.
You can find the input on the server.

data/full_dataset/output_files/Clone_Clone_61.java

List of all continuous EMOs has no TRY blocks.

I try to generate all continuous EMOs for the following code snippet:

public class InsertExecutor {
    protected int getPkValuesBySequence() {
        try {
            a = 1;
        } catch (SQLException ignore) {
        }
        
        if (a > 2) {
            a = b + 2;
            b = a - 2;
        }
    }
}

I get the following list of EMOs:

0th extraction opportunity:
    First statement: Statement expression on line 4
    Last statement: Statement expression on line 4
range: 
             a = 1; 
 -----
1th extraction opportunity:
    First statement: If statement on line 8
    Last statement: Statement expression on line 10
range: 
         if (a > 2) {
            a = b + 2;
            b = a - 2;
        } 
 -----
2th extraction opportunity:
    First statement: Statement expression on line 9
    Last statement: Statement expression on line 9
range: 
             a = b + 2; 
 -----
3th extraction opportunity:
    First statement: Statement expression on line 9
    Last statement: Statement expression on line 10
range: 
            a = b + 2;
            b = a - 2; 
 -----
4th extraction opportunity:
    First statement: Statement expression on line 10
    Last statement: Statement expression on line 10
range: 
             b = a - 2;

Among all EMOs I expect to see:

try {
  a = 1;
} catch (SQLException ignore) {
}
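
To make the expectation concrete, the sketch below enumerates every continuous run of sibling statements in every block, so compound statements such as `try` appear as candidates too. `Stmt` is a hypothetical stand-in for an AST statement, and its `last_line` is taken to be the line of the closing brace, which differs from the output above where ranges end on the last inner statement.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Stmt:
    first_line: int
    last_line: int
    blocks: List[List["Stmt"]] = field(default_factory=list)


def continuous_emos(block: List[Stmt]) -> List[Tuple[int, int]]:
    """All runs of consecutive statements in this block and nested blocks."""
    emos = []
    for start in range(len(block)):
        for end in range(start, len(block)):
            emos.append((block[start].first_line, block[end].last_line))
    for statement in block:
        for nested in statement.blocks:
            emos.extend(continuous_emos(nested))
    return emos


# Body of getPkValuesBySequence from the snippet above (lines 3 to 11).
try_statement = Stmt(3, 6, [[Stmt(4, 4)], []])  # try block, empty catch block
if_statement = Stmt(8, 11, [[Stmt(9, 9), Stmt(10, 10)]])
emos = continuous_emos([try_statement, if_statement])
assert (3, 6) in emos  # the whole try statement is a candidate
```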

Design for d6tflow framework

We can split our work into the following d6tflow tasks:
Task1 -> open Java file with correct encoding
Task2 -> remove all spaces and comments in it and save to another file
Task3 -> open file, find all methods which can be inlined. Save target, extracted, full_ast, text_file, filename, row_csv from Task2
Task4 -> get target and extracted from Task3 and filter them. Save target, extracted, full_ast, text_file, filename, row_csv from Task3
Task5 -> get result from Task3 and filter limited cases. Save target, extracted, full_ast, text_file, filename, row_csv from Task4
Task6 -> inline method, save file, row_csv
Task7 -> save row_csv to global DataFrame

Possible problems:

We have to save the preprocessed files to external storage: there will be a lot of files, and there won't be enough memory to keep them all in cache. We also have to keep them in external storage because they are our dataset, which will be validated.
It seems this cannot be done because of d6t/d6tflow#6.
We need to save different types of objects (AST tree, text), which seems difficult:
d6t/d6tflow#26

SEMI Baseline. Finding opportunity fails for file

file KeyBindingSettingsImpl
KeyBindingSettingsImpl.txt

Traceback (most recent call last):
  File "D:/git/veniq/veniq/dataset_collection/validation.py", line 69, in <module>
    opport = _print_extraction_opportunities(
  File "D:/git/veniq/veniq/dataset_collection/validation.py", line 16, in _print_extraction_opportunities
    statements_semantic = extract_method_statements_semantic(method_ast)
  File "D:\git\veniq\veniq\baselines\semi\extract_semantic.py", line 12, in extract_method_statements_semantic
    block_statement_graph = build_block_statement_graph(method_ast)
  File "D:\git\veniq\veniq\ast_framework\block_statement_graph\builder.py", line 15, in build_block_statement_graph
    root_index = _build_graph_from_statement(method_ast.get_root(), graph)
  File "D:\git\veniq\veniq\ast_framework\block_statement_graph\builder.py", line 26, in _build_graph_from_statement
    new_block_index = _build_graph_from_block(block, graph)
  File "D:\git\veniq\veniq\ast_framework\block_statement_graph\builder.py", line 40, in _build_graph_from_block
    new_statement_index = _build_graph_from_statement(statement, graph)
  File "D:\git\veniq\veniq\ast_framework\block_statement_graph\builder.py", line 24, in _build_graph_from_statement
    blocks = extract_blocks_from_statement(statement)
  File "D:\git\veniq\veniq\ast_framework\block_statement_graph\_block_extractors.py", line 15, in extract_blocks_from_statement
    return _block_extractors[statement.node_type](statement)
  File "D:\git\veniq\veniq\ast_framework\block_statement_graph\_block_extractors.py", line 45, in _extract_blocks_from_if_branching
    statements=_unwrap_block_to_statements_list(statement.then_statement),
  File "D:\git\veniq\veniq\ast_framework\block_statement_graph\_block_extractors.py", line 115, in _unwrap_block_to_statements_list
    assert block_statement_or_statement_list.node_type == ASTNodeType.BLOCK_STATEMENT
AssertionError
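
The assertion fails when the node passed to `_unwrap_block_to_statements_list` is not a block statement. One plausible cause (an assumption, not confirmed in this issue) is an `if` branch written without braces. The sketch below shows how a single statement could be wrapped into a one-element list instead of asserting; `Node` and the string node types are simplified stand-ins, not the veniq classes.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    node_type: str
    statements: List["Node"] = field(default_factory=list)


def unwrap_to_statements(node: Node) -> List[Node]:
    """Return the statements of a block, or the node itself as a single item."""
    if node.node_type == "BLOCK_STATEMENT":
        return list(node.statements)
    return [node]


single = Node("RETURN_STATEMENT")            # if (x) return;
block = Node("BLOCK_STATEMENT", [single])    # if (x) { return; }
assert unwrap_to_statements(single) == [single]
assert unwrap_to_statements(block) == [single]
```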

Synth. dataset: remove duplicated method invocations

Why

We observed the following case of function invocations in the original code used for the synthetic dataset: a method extrMethod(); is called from different places (different methods) within the same class. This is a case of code duplication. It may be that extrMethod was defined as a separate method precisely because of code duplication, and not because it is a semantically cohesive piece of code that can serve as a good example of an ExtractMethod refactoring.

What we want

Eliminate the factor of code duplication in our synthetic dataset on LM/EM.

Proposed solution

Simplest and least nuanced solution: do not inline methods that are invoked more than once within a given class.
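
A minimal sketch of this rule, assuming the invocations of one class are available as `(caller_method, invoked_method)` pairs; the pair representation and the function name are hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def single_use_invocations(
    invocations: List[Tuple[str, str]]
) -> List[Tuple[str, str]]:
    """Keep pairs whose invoked method is called exactly once in the class."""
    counts = Counter(callee for _, callee in invocations)
    return [(caller, callee) for caller, callee in invocations
            if counts[callee] == 1]


invocations = [("open", "extrMethod"), ("close", "extrMethod"), ("close", "writeFooter")]
assert single_use_invocations(invocations) == [("close", "writeFooter")]
```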

Fix creation of all opportunities

Filter all fake nodes.

SEMI alternative: generate all opportunities

Add fake statements on block leaving in extract semantics step in baseline.

Currently the baseline suffers from a small number of large extraction opportunities, which are easily filtered out by the syntactic filter. Inserting fake statements at the end of each block can help with this.

Based on #51

Clustering test fails

======================================================================
FAIL: test_article (test.clustering.test_clustering.ClusteringTestCase)

Traceback (most recent call last):
File "D:\git\veniq\test\clustering\test_clustering.py", line 47, in test_article
self.assertEqual(SEMI(self.example),
AssertionError: Lists differ: [[26, 34], [3, 12], [30, 31], [30, 34], [3, 25], [13, 25], [13, 22], [3, 34]] != [[26, 34], [13, 25], [3, 25], [13, 22], [3, 12], [30, 31], [30, 34], [3, 34]]

First differing element 1:
[3, 12]
[13, 25]

[[26, 34], [3, 12], [30, 31], [30, 34], [3, 25], [13, 25], [13, 22], [3, 34]]

[[26, 34], [13, 25], [3, 25], [13, 22], [3, 12], [30, 31], [30, 34], [3, 34]] : Wrong unique clusters

Ran 66 tests in 0.809s
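
The two lists in this failure contain the same clusters in a different order. If the order of clusters is not part of the contract (the message "Wrong unique clusters" hints at this, but it is an assumption), the comparison could be made order-insensitive, as in the sketch below. `same_clusters` is a hypothetical helper.

```python
from typing import List


def same_clusters(actual: List[List[int]], expected: List[List[int]]) -> bool:
    """Compare two cluster lists while ignoring the order of clusters."""
    return sorted(actual) == sorted(expected)


actual = [[26, 34], [3, 12], [30, 31], [30, 34], [3, 25], [13, 25], [13, 22], [3, 34]]
expected = [[26, 34], [13, 25], [3, 25], [13, 22], [3, 12], [30, 31], [30, 34], [3, 34]]
assert actual != expected
assert same_clusters(actual, expected)
```

If the order does matter, the test is right to fail and the source of the nondeterministic ordering needs to be found instead.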

Fix statements clustering.

Currently statements clustering outputs fewer clusters than it should. For the example from the paper it prints a single cluster [[7, 38]], while there are definitely more of them.

Make test CI

Test CI with GitHub Actions

Add support for complex statements in block-statement graph

There is no support for:

If statement
Switch statement
Try statement

Baseline: syntax filtering

Filter continuous sequence of statements based on block-statement graph.

Utils test fails

FAIL: test_canReadLines (test.utils.Lines.test_lines.TestLines)

Traceback (most recent call last):
File "D:\git\veniq\test\utils\Lines\test_lines.py", line 11, in test_canReadLines
self.assertEqual(
AssertionError: 'class SimpleLinesTest {\n' != 'class SimpleLinesTest {\r\n'

- class SimpleLinesTest {
+ class SimpleLinesTest {
?                        +
 : Did not match first line

SEMI: correct spelling mistakes in functionality implementation

Why

Some field names and method names contain spelling mistakes and it's bothering me:

"benifit"
"treshold"
(maybe others)

What we want

Correct the spelling

Veniq WP. Semi fails when extract_method_statements_semantic

The following error occurs while running extract_method_statements_semantic:
InternalElkGraphLexer_mTokens_1338.txt

data/small_dataset/output_files/InternalMetaDataLexer_mTokens_3800.java
Traceback (most recent call last):
  File "D:/git/veniq/veniq/dataset_collection/validation.py", line 97, in <module>
    opport = _print_extraction_opportunities(
  File "D:/git/veniq/veniq/dataset_collection/validation.py", line 18, in _print_extraction_opportunities
    statements_semantic = extract_method_statements_semantic(method_ast)
  File "D:\git\veniq\veniq\baselines\semi\extract_semantic.py", line 12, in extract_method_statements_semantic
    block_statement_graph = build_block_statement_graph(method_ast)
  File "D:\git\veniq\veniq\ast_framework\block_statement_graph\builder.py", line 15, in build_block_statement_graph
    root_index = _build_graph_from_statement(method_ast.get_root(), graph)
  File "D:\git\veniq\veniq\ast_framework\block_statement_graph\builder.py", line 26, in _build_graph_from_statement
    new_block_index = _build_graph_from_block(block, graph)
  File "D:\git\veniq\veniq\ast_framework\block_statement_graph\builder.py", line 40, in _build_graph_from_block
    new_statement_index = _build_graph_from_statement(statement, graph)
  File "D:\git\veniq\veniq\ast_framework\block_statement_graph\builder.py", line 24, in _build_graph_from_statement
    blocks = extract_blocks_from_statement(statement)
  File "D:\git\veniq\veniq\ast_framework\block_statement_graph\_block_extractors.py", line 15, in extract_blocks_from_statement
    return _block_extractors[statement.node_type](statement)
  File "D:\git\veniq\veniq\ast_framework\block_statement_graph\_block_extractors.py", line 92, in _extract_blocks_from_try_statement
    for catch_clause in statement.catches:
TypeError: 'NoneType' object is not iterable
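
The `TypeError` shows that `catches` is `None` for this statement, which is what one would expect for a `try` with only a `finally` block. A minimal sketch of a guard is shown below. `TryStatement` is a simplified stand-in for the AST node and `extract_blocks` is a hypothetical function, not the veniq implementation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TryStatement:
    # Stand-in for the AST node; `catches` is None when there are no catch clauses.
    block: List[str]
    catches: Optional[List[List[str]]] = None
    finally_block: Optional[List[str]] = None


def extract_blocks(statement: TryStatement) -> List[List[str]]:
    blocks = [statement.block]
    # `or []` turns a missing list of catch clauses into an empty iteration.
    for catch_clause in statement.catches or []:
        blocks.append(catch_clause)
    if statement.finally_block is not None:
        blocks.append(statement.finally_block)
    return blocks


try_finally = TryStatement(block=["doc.readLock();"], finally_block=["doc.readUnlock();"])
assert extract_blocks(try_finally) == [["doc.readLock();"], ["doc.readUnlock();"]]
```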
