
cqfn / veniq

20.0 6.0 3.0 789 KB

Veniq uses Machine Learning to analyze source code, find possible refactorings, and suggest those that seem optimal

License: MIT License

Python 58.80% Makefile 0.06% Java 41.14%
static-analysis machine-learning refactorings

veniq's People

Contributors

acheshkov, aiovlev, aravij, katgarmash, lyriccoder, vitaly-protasov


veniq's Issues

ASTNode test fails

======================================================================
FAIL: test_class_computed_fields (test.ast_framework.test_ast_node.ASTNodeTestSuite)

Traceback (most recent call last):
  File "D:\git\veniq\test\ast_framework\test_ast_node.py", line 22, in test_class_computed_fields
    self.assertEqual(java_class.documentation, "/**\n* Some documentation\n*/")
AssertionError: '/**\r\n* Some documentation\r\n*/' != '/**\n* Some documentation\n*/'
- /**
?    -
+ /**
- * Some documentation
?                     -
+ * Some documentation
  */
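
The two strings in the assertion differ only in their line endings (\r\n versus \n). One possible way to make the comparison independent of the platform is to normalize line endings before comparing. This is a minimal sketch; normalize_newlines is a hypothetical helper and not part of veniq.

```python
def normalize_newlines(text: str) -> str:
    """Convert Windows (\\r\\n) and lone \\r line endings to \\n."""
    return text.replace("\r\n", "\n").replace("\r", "\n")


# The value reported by the failing test, with Windows line endings.
documentation = "/**\r\n* Some documentation\r\n*/"
normalized = normalize_newlines(documentation)
```

Whether the normalization belongs in the test or in the code that reads the Java file is a design decision the issue does not settle.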

SEMI recommendation api: add method to check validity of input method declaration and create error codes

Why

The current implementation (#121) takes a list of strings as input and assumes that it is a valid method declaration.

What we want

Add a subprocedure to recommend_for_method that checks whether the input is a valid method declaration. If it is not valid, the API should return an error code.

Proposed solution

  • Validity checks:
    • check that the input is not a class declaration
    • check that the brackets are balanced
    • some of the errors can be caught by a try-catch wrapper around the procedure that creates the AST from the string.
  • Error codes: TODO
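
The first two checks could be sketched as follows. The error codes are still TODO in this issue, so the MethodValidationError enum, its members, and the function names below are hypothetical placeholders rather than the project's API.

```python
import re
from enum import Enum
from typing import List


class MethodValidationError(Enum):
    # Hypothetical codes: the issue leaves the real error codes as TODO.
    OK = 0
    CLASS_DECLARATION = 1
    UNBALANCED_BRACKETS = 2


_PAIRS = {")": "(", "]": "[", "}": "{"}
# A class/interface/enum keyword that appears before the first "(" or "{".
_TYPE_HEADER = re.compile(r"^[^({]*\b(class|interface|enum)\s+\w+")


def brackets_balanced(code: str) -> bool:
    """Naive check: brackets inside string literals or comments are counted too."""
    stack = []
    for char in code:
        if char in _PAIRS.values():
            stack.append(char)
        elif char in _PAIRS:
            if not stack or stack.pop() != _PAIRS[char]:
                return False
    return not stack


def validate_method_declaration(lines: List[str]) -> MethodValidationError:
    code = "\n".join(lines)
    if _TYPE_HEADER.match(code.lstrip()):
        return MethodValidationError.CLASS_DECLARATION
    if not brackets_balanced(code):
        return MethodValidationError.UNBALANCED_BRACKETS
    return MethodValidationError.OK
```

The third check from the list, wrapping AST creation in an exception handler, would catch the cases this textual pre-check misses.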

@acheshkov @lyriccoder

Lines on insertions are negative

In the file attached below, the start line of the insertion is larger than the end line of the insertion:

31411,
/dataset/01/eclipse/openj9/sourcetools/j9constantpool/com/ibm/oti/VMCPTool/ConstantPoolStream.java,data/full_dataset/input_files/ConstantPoolStream_871c8b89fa567f7d368997faeb856a6a542d89631cb8f19b7426b04d7b773efb.java,
ConstantPoolStream,b'writeFooter();',
close,
209,
writeFooter,
212,
213,
data/full_dataset/output_files/ConstantPoolStream_871c8b89fa567f7d368997faeb856a6a542d89631cb8f19b7426b04d7b773efb_close_211.java,
True,
211,
210

ConstantPoolStream_871c8b89fa567f7d368997faeb856a6a542d89631cb8f19b7426b04d7b773efb_close_211.txt
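
Rows like this one could be detected automatically. The sketch below assumes that the last two values of the row (211 and 210) are the columns inline_insertion_line_start and inline_insertion_line_end, the names that appear in another issue on this page. The function name and the sample rows are made up for illustration.

```python
from typing import Dict, List


def find_negative_ranges(rows: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return the rows whose insertion start line is greater than the end line."""
    return [
        row
        for row in rows
        if int(row["inline_insertion_line_start"]) > int(row["inline_insertion_line_end"])
    ]


# Illustrative rows; only the two line-number columns matter here.
rows = [
    {"output_filename": "a.java", "inline_insertion_line_start": "211", "inline_insertion_line_end": "210"},
    {"output_filename": "b.java", "inline_insertion_line_start": "566", "inline_insertion_line_end": "625"},
]
bad_rows = find_negative_ranges(rows)
```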

Better README.

I think the project needs a better README that explains how to use it.

ASTframework. Parsing problem

Hello. I ran into a problem while parsing this class:

public abstract class DdlChange implements MigrationStep {

  private final Database db;

  public DdlChange(Database db) {
    this.db = db;
  }


  public final void execute() throws SQLException {
    try (Connection writeConnection = createDdlConnection()) {
      Context context = new ContextImpl(writeConnection);
      execute(context);
    }
  }

  private Connection createDdlConnection() throws SQLException {
    Connection writeConnection = db.getDataSource().getConnection();
    writeConnection.setAutoCommit(false);
    return writeConnection;
  }
}

The problem occurs while printing the method invocation createDdlConnection():

node index: 45
arguments: []
member: createDdlConnection
node_type: Method invocation
postfix_operators: []
prefix_operators: []
qualifier: None
selectors: []
type_arguments: None
0 ['    try (Connection writeConnection ', ' createDdlConnection()) {\n']
Error has happened during file analyze: 'NoneType' object is not subscriptable

Make ASTNode return its subtree.

Currently, to get a subtree one needs to use the get_subtree method of AST, not of ASTNode. Moving this method to ASTNode would make it more convenient. To achieve this we need to apply a factory/DI trick to break the circular import.
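
The factory/DI idea can be illustrated with a simplified sketch. The two classes below are hypothetical stand-ins for veniq's ASTNode and AST and share only their names with them. In a real layout they would live in separate modules, and the node module would not need to import the tree module because the factory is passed in.

```python
from typing import Callable, Dict, List


class ASTNode:
    def __init__(self, node_index: int, subtree_factory: Callable[[int], "AST"]):
        self.node_index = node_index
        # The factory is injected, so this class never has to import AST.
        self._subtree_factory = subtree_factory

    def get_subtree(self) -> "AST":
        return self._subtree_factory(self.node_index)


class AST:
    def __init__(self, children: Dict[int, List[int]], root: int):
        self._children = children  # node index -> list of child indexes
        self._root = root

    def get_root(self) -> ASTNode:
        # AST hands its own method to the node as the subtree factory.
        return ASTNode(self._root, self.get_subtree)

    def get_subtree(self, node_index: int) -> "AST":
        return AST(self._children, node_index)

    def node_indexes(self) -> List[int]:
        """Indexes of all nodes reachable from the root, in depth-first order."""
        result, stack = [], [self._root]
        while stack:
            index = stack.pop()
            result.append(index)
            stack.extend(reversed(self._children.get(index, [])))
        return result
```

With this arrangement a caller can write node.get_subtree() without touching the tree object directly.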

Block-Statements graph

For syntactic filtering in the baseline we need to find the boundaries of each block. This can be done with the help of a bipartite graph of blocks and statements. The graph could also be used to traverse the AST while extracting semantics from statements.
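
A minimal sketch of such a bipartite structure is shown below: blocks contain only statements, and statements contain only blocks. The Block and Statement classes and the block_boundaries function are illustrative assumptions, not the data model that veniq actually uses.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Statement:
    line: int
    # Blocks owned by this statement, e.g. the then-branch of an if statement.
    blocks: List["Block"] = field(default_factory=list)


@dataclass
class Block:
    statements: List[Statement] = field(default_factory=list)


def block_boundaries(block: Block) -> Tuple[int, int]:
    """First and last statement line inside a block, nested blocks included.

    Raises ValueError for a block without statements.
    """
    lines: List[int] = []

    def visit(current: Block) -> None:
        for statement in current.statements:
            lines.append(statement.line)
            for inner in statement.blocks:
                visit(inner)

    visit(block)
    return min(lines), max(lines)


# A method body with one statement on line 4 and an if statement on line 8
# whose then-branch holds statements on lines 9 and 10.
then_branch = Block([Statement(9), Statement(10)])
method_body = Block([Statement(4), Statement(8, [then_branch])])
```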

Semantic filter

Detect which variables an extraction opportunity needs as input and which variables it produces as output.

Veniq. Semi. Bug when trying to run semi algorithm

I got the following error:

Node Statement is not supported.

File is attached below
HandleLayer.zip

    return _block_extractors[statement.node_type](statement)
KeyError: <ASTNodeType.STATEMENT: 58>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\git\veniq\veniq\dataset_collection\validation.py", line 144, in validate_row
    opport = find_extraction_opportunities(ast_subtree)
  File "D:\git\veniq\veniq\dataset_collection\validation.py", line 28, in find_extraction_opportunities
    statements_semantic = extract_method_statements_semantic(method_ast)
  File "D:\git\veniq\veniq\baselines\semi\extract_semantic.py", line 12, in extract_method_statements_semantic
    block_statement_graph = build_block_statement_graph(method_ast)
  File "D:\git\veniq\veniq\ast_framework\block_statement_graph\builder.py", line 15, in build_block_statement_graph
    root_index = _build_graph_from_statement(method_ast.get_root(), graph)
  File "D:\git\veniq\veniq\ast_framework\block_statement_graph\builder.py", line 26, in _build_graph_from_statement
    new_block_index = _build_graph_from_block(block, graph)
  File "D:\git\veniq\veniq\ast_framework\block_statement_graph\builder.py", line 40, in _build_graph_from_block
    new_statement_index = _build_graph_from_statement(statement, graph)
  File "D:\git\veniq\veniq\ast_framework\block_statement_graph\builder.py", line 26, in _build_graph_from_statement
    new_block_index = _build_graph_from_block(block, graph)
  File "D:\git\veniq\veniq\ast_framework\block_statement_graph\builder.py", line 40, in _build_graph_from_block
    new_statement_index = _build_graph_from_statement(statement, graph)
  File "D:\git\veniq\veniq\ast_framework\block_statement_graph\builder.py", line 26, in _build_graph_from_statement
    new_block_index = _build_graph_from_block(block, graph)
  File "D:\git\veniq\veniq\ast_framework\block_statement_graph\builder.py", line 40, in _build_graph_from_block
    new_statement_index = _build_graph_from_statement(statement, graph)
  File "D:\git\veniq\veniq\ast_framework\block_statement_graph\builder.py", line 26, in _build_graph_from_statement
    new_block_index = _build_graph_from_block(block, graph)
  File "D:\git\veniq\veniq\ast_framework\block_statement_graph\builder.py", line 40, in _build_graph_from_block
    new_statement_index = _build_graph_from_statement(statement, graph)
  File "D:\git\veniq\veniq\ast_framework\block_statement_graph\builder.py", line 26, in _build_graph_from_statement
    new_block_index = _build_graph_from_block(block, graph)
  File "D:\git\veniq\veniq\ast_framework\block_statement_graph\builder.py", line 40, in _build_graph_from_block
    new_statement_index = _build_graph_from_statement(statement, graph)
  File "D:\git\veniq\veniq\ast_framework\block_statement_graph\builder.py", line 26, in _build_graph_from_statement
    new_block_index = _build_graph_from_block(block, graph)
  File "D:\git\veniq\veniq\ast_framework\block_statement_graph\builder.py", line 40, in _build_graph_from_block
    new_statement_index = _build_graph_from_statement(statement, graph)
  File "D:\git\veniq\veniq\ast_framework\block_statement_graph\builder.py", line 26, in _build_graph_from_statement
    new_block_index = _build_graph_from_block(block, graph)
  File "D:\git\veniq\veniq\ast_framework\block_statement_graph\builder.py", line 40, in _build_graph_from_block
    new_statement_index = _build_graph_from_statement(statement, graph)
  File "D:\git\veniq\veniq\ast_framework\block_statement_graph\builder.py", line 26, in _build_graph_from_statement
    new_block_index = _build_graph_from_block(block, graph)
  File "D:\git\veniq\veniq\ast_framework\block_statement_graph\builder.py", line 40, in _build_graph_from_block
    new_statement_index = _build_graph_from_statement(statement, graph)
  File "D:\git\veniq\veniq\ast_framework\block_statement_graph\builder.py", line 24, in _build_graph_from_statement
    blocks = extract_blocks_from_statement(statement)
  File "D:\git\veniq\veniq\ast_framework\block_statement_graph\_block_extractors.py", line 17, in extract_blocks_from_statement
    raise NotImplementedError(f"Node {statement.node_type} is not supported.")
NotImplementedError: Node Statement is not supported.

Veniq. Dataset collection. Function is not inserted correctly

TomEEWarListener_e9b19cb814c4fbfa77d6bc2ca982b2ef30f1f06314f4_output.txt

        File jar = TomcatFactory.getTomEEWebAppJar(tp.getCatalinaHome(), tp.getCatalinaBase());
        if (this.currentTomEEJar != jar && (this.currentTomEEJar == null || !this.currentTomEEJar.equals(jar))) {
            currentTomEEJar = jar;
            TomcatManager.TomEEVersion version = TomcatFactory.getTomEEVersion(jar);
            TomcatManager.TomEEType type = version == null ? null : TomcatFactory.getTomEEType(jar.getParentFile());
            refresh.refresh(version, type);
        }
        if (this.currentTomEEJar != jar && (this.currentTomEEJar == null || !this.currentTomEEJar.equals(jar))) {
            currentTomEEJar = jar;
            TomcatManager.TomEEVersion version = TomcatFactory.getTomEEVersion(jar);
            TomcatManager.TomEEType type = version == null ? null : TomcatFactory.getTomEEType(jar.getParentFile());
            refresh.refresh(version, type);
        }

Lines of insertion are not valid for inner classes

If there is an inner class and the name of one of its functions matches the main class name, the line numbers are incorrect:

PainlessParser.txt

  '/dataset/01/elastic/elasticsearch/modules/lang-painless/src/main/java/org/elasticsearch/painless/antlr/PainlessParser.java'),
 ('input_filename',
  'data/full_dataset/input_files/PainlessParser_bd33bb08aa9a728098ee1dfb7f9677f9db75a6b3dfa38a21732fa1300b8a0ac4.java'),
 ('class_name', 'PainlessParser'),
 ('invocation_text_string', 'trailer()'),
 ('method_where_invocation_occurred', 'rstatement'),
 ('start_line_of_function_where_invocation_occurred', '317.0'),
 ('invocation_method_name', 'trailer'),
 ('invocation_method_start_line', '422.0'),
 ('invocation_method_end_line', '424.0'),
 ('output_filename',
  'data/full_dataset/output_files/PainlessParser_bd33bb08aa9a728098ee1dfb7f9677f9db75a6b3dfa38a21732fa1300b8a0ac4_rstatement_566.java'),
 ('can_be_parsed', 'True'),
 ('inline_insertion_line_start', '566'),
 ('inline_insertion_line_end', '625'),
 ('project_name', 'elastic/elasticsearch')]

Veniq. Dataset collection. Function is not inserted correctly

Given the following function:

    private void refillUserDataConstraint() {
        setUserDataConstraint(null);
        UserDataConstraint userDataConstraint = getUserDataConstraint();
        userDataConstraint.setDescription(userDataConstraintDescTF.getText());
        userDataConstraint.setTransportGuarantee((String) transportGuaranteeCB.getSelectedItem());
    }

The last statement is missing from the new file:

    public void setValue(javax.swing.JComponent source, Object value) {
        if (source == displayNameTF) {
            String text = (String)value;
            constraint.setDisplayName(text);
            SectionPanel enclosingPanel = getSectionView().findSectionPanel(constraint);
            enclosingPanel.setTitle(text);
            enclosingPanel.getNode().setDisplayName(text);
        } else if (source == authConstraintCB) {
            if (authConstraintCB.isSelected()) {
                refillAuthConstraint();
            } else {
                setAuthConstraint(null);
            }
        } else if (source == roleNamesTF) {
            refillAuthConstraint();
        } else if (source == authConstraintDescTF) {
            refillAuthConstraint();
        } else if (source == userDataConstraintCB) {
            if (userDataConstraintCB.isSelected()) {
                setUserDataConstraint(null);
                UserDataConstraint userDataConstraint = getUserDataConstraint();
                userDataConstraint.setDescription(userDataConstraintDescTF.getText());
            } else {
                setUserDataConstraint(null);
            }
        } else if (source == transportGuaranteeCB) {
            refillUserDataConstraint();
        } else if (source == userDataConstraintDescTF) {
            refillUserDataConstraint();
        }
    }

SecurityConstraintPanel_setValue_192.java.txt

Inline problems

Our current inline script has the following problems:

  • Cannot inline methods or invocations that share a line with other statements
    (here, the problem is inserting the body of the hasMore() invocation):
 public Vector getWordVector(char[] separators)
   {  Vector list=new Vector();
      do { list.addElement(getWord(separators)); } while (hasMore());
      return list;
   }

  • Problem extracting the body of this method, where several statements are on one line:
    public String getRemainingString() { return str.substring(index); }

  • Problem with the output of newly inserted blocks:

            if (ic == null) {
                // it must be unrecognized setting
        name = NbBundle.getMessage(InstanceNode.class,

When the original was:

            if (ic == null) {
                // it must be unrecognized setting
                return NbBundle.getMessage(InstanceNode.class,
                    "LBL_BrokenSettings"); //NOI18N
  • Problem detecting the last line of a method at the end of a file:
    private void resetFactories() {
        if (ppFactories == null) {
            ppFactories = ProfilingPointsManager.getDefault().getProfilingPointFactories();
            getChooserFrame().initPanel(ppFactories);
        }

        currentFactory = 0;
    }
}
  • Problem inlining method invocations like this one:
    if (!isProcessExternalEntities()) {

  • There are still many problems with insertions that involve brackets.

  • Fix writing to CSV:

  File ".\veniq\dataset_collection\augmentation.py", line 427, in <module>
    writer.writerow(i)
  File "C:\Users\Vitaly\Anaconda3\lib\encodings\cp1251.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 158-161: character maps to <undefined>

(The invocation line may contain Chinese characters in comments.)

  • Problem inlining methods with large commented-out blocks at the end of the method declaration:
/*
        } else {
              pass
        }
*/
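
For the UnicodeEncodeError in the list above, the traceback shows that the file was opened with the locale codec cp1251, which cannot encode Chinese characters. A minimal sketch of writing the CSV with an explicit UTF-8 encoding follows. The file name and the row contents are made up for illustration.

```python
import csv
import os
import tempfile

# An illustrative row with non-ASCII characters in a Java comment.
rows = [["invocation_text_string"], ["doWork(); // 注释"]]

path = os.path.join(tempfile.mkdtemp(), "dataset.csv")
# An explicit encoding avoids falling back to a locale codec such as cp1251.
with open(path, "w", newline="", encoding="utf-8") as csv_file:
    csv.writer(csv_file).writerows(rows)

# Reading back with the same encoding returns the original rows.
with open(path, newline="", encoding="utf-8") as csv_file:
    read_back = list(csv.reader(csv_file))
```

Passing newline="" is what the csv module documentation recommends for files handed to csv.writer and csv.reader.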

Fix baseline.

Currently the baseline filters out the extraction opportunity from line 17 to line 26. This happens because the semantic extraction step outputs else if statements, while they are not automatically traversed in the syntactic filter. To solve this, one may need to fix the block-statement graph creation so that each else if statement is represented by its own Statement node. This means each If statement may now have only one THEN_BRANCH block.

SEMI algorithm chooses 1 line to extract

SEMI algorithm chooses 1 line to extract.
As far as I understand, we only consider extracting more than 3 lines, don't we?

 DocumentElement getDocumentElement(int startOffset, int endOffset) throws BadLocationException {
        readLock();
        checkDocumentDirty();
        try {
            for(int i = 0; i < elements.size(); i++) {
                DocumentElement de = elements.get(i);
                if(de.getStartOffset() == startOffset &&
                        de.getEndOffset() == endOffset)
                    return de;
                if(de.getStartOffset() > startOffset) break;
            }
            return null;
        }finally{
            readUnlock();
        }
    }
    List<DocumentElement> getDocumentElements(int startOffset) throws BadLocationException {
        readLock();
        if(documentDirty) {
            writeLock(); // This line is suggested to extract
            try {
                doc.readLock();
                try {
                    elements.resort();
                } finally {
                    doc.readUnlock();
                }
            }finally {
                writeUnlock();
            }
            documentDirty = false;
        }
        try {
            int elementIndex = elements.binarySearchForOffset(startOffset);
            if(elementIndex < 0) {
                return Collections.emptyList();
            } else {
                ArrayList<DocumentElement> found = new ArrayList<DocumentElement>();
                found.add(elements.get(elementIndex));
                int eli = elementIndex;
                while(--eli >= 0) {
                    DocumentElement previous = elements.get(eli);
                    if(previous.getStartOffset() == startOffset) {
                        found.add(0, previous);
                    } else {
                        break;
                    }
                }
                while(++elementIndex < elements.size()) {
                    DocumentElement next = elements.get(elementIndex);
                    if(next.getStartOffset() == startOffset) {
                        found.add(next);
                    } else {
                        break;
                    }
                }
                return found;
            }
        }finally{
            readUnlock();
        }
    }

writeLock(); // This line is suggested to extract

Is it ok?
DocumentModel_618deb49bbf23fe675cea45f2e2cf3f69e8231df8d81a2f47123c696ac11dbf1_getDocumentElements_199.zip

Semi doesn't work with constructors

If we pass a constructor to SEMI, it fails with the error:

Traceback (most recent call last):
  File "D:\git\veniq\veniq\dataset_collection\validation.py", line 88, in validate_row
    opport = _print_extraction_opportunities(
  File "D:\git\veniq\veniq\dataset_collection\validation.py", line 24, in _print_extraction_opportunities
    statements_semantic = extract_method_statements_semantic(method_ast)
  File "D:\git\veniq\veniq\baselines\semi\extract_semantic.py", line 12, in extract_method_statements_semantic
    block_statement_graph = build_block_statement_graph(method_ast)
  File "D:\git\veniq\veniq\ast_framework\block_statement_graph\builder.py", line 15, in build_block_statement_graph
    root_index = _build_graph_from_statement(method_ast.get_root(), graph)
  File "D:\git\veniq\veniq\ast_framework\block_statement_graph\builder.py", line 24, in _build_graph_from_statement
    blocks = extract_blocks_from_statement(statement)
  File "D:\git\veniq\veniq\ast_framework\block_statement_graph\_block_extractors.py", line 17, in extract_blocks_from_statement
    raise NotImplementedError(f"Node {statement.node_type} is not supported.")
NotImplementedError: Node Constructor declaration is not supported.

This happens for the following part of the file (the constructor):
VmCustomizer_VmCustomizer_96.txt

 public VmCustomizer(final GlassfishInstance instance) {
        this.instance = instance;
        javaPlatforms = JavaUtils.findSupportedPlatforms(this.instance);
        this.platformButtonText = NbBundle.getMessage(
                VmCustomizer.class,
                "VmCustomizer.platformButton");
        this.platformButtonAction = new PlatformAct

Make overlap

Make an overlap function that takes a list of opportunities (line numbers) and calculates the overlap between them.
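
The issue does not define "overlap" precisely. The sketch below assumes that an opportunity is an inclusive (start_line, end_line) pair and that the overlap of two opportunities is the number of lines they share. Both function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Opportunity = Tuple[int, int]  # (start_line, end_line), both inclusive


def overlap(first: Opportunity, second: Opportunity) -> int:
    """Number of lines shared by two inclusive line ranges."""
    return max(0, min(first[1], second[1]) - max(first[0], second[0]) + 1)


def pairwise_overlaps(
    opportunities: List[Opportunity],
) -> Dict[Tuple[Opportunity, Opportunity], int]:
    """Overlap for every unordered pair of opportunities."""
    return {(a, b): overlap(a, b) for a, b in combinations(opportunities, 2)}
```

A normalized variant, for example dividing by the length of the shorter range, would be a small change if a ratio is needed instead of a line count.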

SEMI: convenient API for EMO recommendation

EMO = Extract Method Opportunity

Why

Currently, there is no function that takes a method's source code and directly outputs EMO recommendations. Such functionality could be useful to the end user or to a refactoring plugin that calls it.

What we want

A function that takes a method declaration and outputs the SEMI recommendation as an ordered list of recommended EMOs, in order of decreasing recommendation. An EMO is represented as a tuple (start_line_number, end_line_number).

def recommend(method_declaration_lines: List[str]) -> List[Tuple[int, int]]

Proposed solution

  • Write a wrapper around the existing functions extract_method_statements_semantic, create_extraction_opportunities,
    filter_extraction_opportunities, rank_extraction_opportunities
  • Implement it in a separate file under veniq.baselines.semi

@acheshkov @lyriccoder discuss

Skip statements without semantic in semantic filter.

See this file:
NbValidatioTransaction.txt

Traceback (most recent call last):
  File "D:/git/veniq/veniq/dataset_collection/validation.py", line 69, in <module>
    opport = _print_extraction_opportunities(
  File "D:/git/veniq/veniq/dataset_collection/validation.py", line 18, in _print_extraction_opportunities
    filtered_extraction_opportunities = filter_extraction_opportunities(
  File "D:\git\veniq\veniq\baselines\semi\filter_extraction_opportunities.py", line 24, in filter_extraction_opportunities
    return list(extraction_opportunities_filtered)
  File "D:\git\veniq\veniq\baselines\semi\filter_extraction_opportunities.py", line 21, in <lambda>
    and semantic_filter(extraction_opportunity, statements_semantic, block_statement_graph),
  File "D:\git\veniq\veniq\baselines\semi\_semantic_filter.py", line 16, in semantic_filter
    method_block_statement_graph.traverse(
  File "D:\git\veniq\veniq\ast_framework\block_statement_graph\block.py", line 48, in traverse
    self._traverse_function(self._graph, self._id, on_node_entering, on_node_leaving)
  File "D:\git\veniq\veniq\ast_framework\block_statement_graph\_nodes_factory.py", line 50, in _traverse_graph
    on_node_entering(destination_node)
  File "D:\git\veniq\veniq\baselines\semi\_semantic_filter.py", line 51, in on_node_entering
    self._on_statement_entering(node)
  File "D:\git\veniq\veniq\baselines\semi\_semantic_filter.py", line 81, in _on_statement_entering
    statement_semantic = self._statements_semantic[statement.node]
KeyError: <ASTNode node_type: Try statement, node_index: 2334>

Originally posted by @lyriccoder in #57 (comment)

List of all continuous EMOs has no TRY blocks.

I tried to generate all continuous EMOs for the following code snippet:

public class InsertExecutor {
    protected int getPkValuesBySequence() {
        try {
            a = 1;
        } catch (SQLException ignore) {
        }
        
        if (a > 2) {
            a = b + 2;
            b = a - 2;
        }
    }
}

I get the following list of EMOs:

0th extraction opportunity:
    First statement: Statement expression on line 4
    Last statement: Statement expression on line 4
range: 
             a = 1; 
 -----
1th extraction opportunity:
    First statement: If statement on line 8
    Last statement: Statement expression on line 10
range: 
         if (a > 2) {
            a = b + 2;
            b = a - 2;
        } 
 -----
2th extraction opportunity:
    First statement: Statement expression on line 9
    Last statement: Statement expression on line 9
range: 
             a = b + 2; 
 -----
3th extraction opportunity:
    First statement: Statement expression on line 9
    Last statement: Statement expression on line 10
range: 
            a = b + 2;
            b = a - 2; 
 -----
4th extraction opportunity:
    First statement: Statement expression on line 10
    Last statement: Statement expression on line 10
range: 
             b = a - 2; 

Among the EMOs I also expected to see:

try {
  a = 1;
} catch (SQLException ignore) {
}

Design for d6tflow framework

We can split our work into the following d6tflow tasks:
Task1 -> open the Java file with the correct encoding
Task2 -> remove all spaces and comments from it and save the result to another file
Task3 -> open the file and find all methods that can be inlined. Save target, extracted, full_ast, text_file, filename, row_csv from Task2
Task4 -> get target and extracted from Task3 and filter them. Save target, extracted, full_ast, text_file, filename, row_csv from Task3
Task5 -> get the result from Task3 and filter limited cases. Save target, extracted, full_ast, text_file, filename, row_csv from Task4
Task6 -> inline the method, save file, row_csv
Task7 -> save row_csv to the global DataFrame

Possible problems:

  1. We have to save the preprocessed files to external storage: there will be many files, and there won't be enough memory to keep them all in cache. They also have to stay in external storage because they are our dataset, which will be validated.
    It seems this cannot be done because of d6t/d6tflow#6
  2. We need to save different types of objects: AST tree, text. This seems difficult:
    d6t/d6tflow#26

SEMI Baseline. Finding opportunity fails for file

file KeyBindingSettingsImpl
KeyBindingSettingsImpl.txt

Traceback (most recent call last):
  File "D:/git/veniq/veniq/dataset_collection/validation.py", line 69, in <module>
    opport = _print_extraction_opportunities(
  File "D:/git/veniq/veniq/dataset_collection/validation.py", line 16, in _print_extraction_opportunities
    statements_semantic = extract_method_statements_semantic(method_ast)
  File "D:\git\veniq\veniq\baselines\semi\extract_semantic.py", line 12, in extract_method_statements_semantic
    block_statement_graph = build_block_statement_graph(method_ast)
  File "D:\git\veniq\veniq\ast_framework\block_statement_graph\builder.py", line 15, in build_block_statement_graph
    root_index = _build_graph_from_statement(method_ast.get_root(), graph)
  File "D:\git\veniq\veniq\ast_framework\block_statement_graph\builder.py", line 26, in _build_graph_from_statement
    new_block_index = _build_graph_from_block(block, graph)
  File "D:\git\veniq\veniq\ast_framework\block_statement_graph\builder.py", line 40, in _build_graph_from_block
    new_statement_index = _build_graph_from_statement(statement, graph)
  File "D:\git\veniq\veniq\ast_framework\block_statement_graph\builder.py", line 24, in _build_graph_from_statement
    blocks = extract_blocks_from_statement(statement)
  File "D:\git\veniq\veniq\ast_framework\block_statement_graph\_block_extractors.py", line 15, in extract_blocks_from_statement
    return _block_extractors[statement.node_type](statement)
  File "D:\git\veniq\veniq\ast_framework\block_statement_graph\_block_extractors.py", line 45, in _extract_blocks_from_if_branching
    statements=_unwrap_block_to_statements_list(statement.then_statement),
  File "D:\git\veniq\veniq\ast_framework\block_statement_graph\_block_extractors.py", line 115, in _unwrap_block_to_statements_list
    assert block_statement_or_statement_list.node_type == ASTNodeType.BLOCK_STATEMENT
AssertionError

Synth. dataset: remove duplicated method invocations

Why

We observed the following pattern of method invocations in the original code used for the synthetic dataset: a method extrMethod() is called from different places (different methods) within the same class. This is a case of code duplication. The reason extrMethod was defined as a separate method may be precisely to avoid code duplication, and not because it is a semantically cohesive piece of code that can serve as a good example of the ExtractMethod refactoring.

What we want

Eliminate the factor of code duplication in our synthetic dataset on LM/EM.

Proposed solution

Simplest and least nuanced solution: do not inline methods that are invoked more than once within a given class.
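
The proposed rule can be sketched as a counting step over the invocations found in one class. The (caller, invoked) pair layout, the function name, and the sample method names are assumptions made for illustration.

```python
from collections import Counter
from typing import List, Tuple

Invocation = Tuple[str, str]  # (caller method name, invoked method name)


def methods_invoked_once(invocations: List[Invocation]) -> List[str]:
    """Names of methods that are invoked exactly once within a class."""
    counts = Counter(invoked for _, invoked in invocations)
    return sorted(name for name, count in counts.items() if count == 1)


# Hypothetical invocations collected from a single class.
invocations = [
    ("close", "writeFooter"),
    ("setValue", "refillConstraint"),
    ("reset", "refillConstraint"),
]
inline_candidates = methods_invoked_once(invocations)
```

Only the methods returned here would remain candidates for inlining. Matching by name alone does not distinguish overloaded methods, so a real implementation would probably need the full signature.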

Clustering test fails

======================================================================
FAIL: test_article (test.clustering.test_clustering.ClusteringTestCase)

Traceback (most recent call last):
  File "D:\git\veniq\test\clustering\test_clustering.py", line 47, in test_article
    self.assertEqual(SEMI(self.example),
AssertionError: Lists differ: [[26, 34], [3, 12], [30, 31], [30, 34], [3, 25], [13, 25], [13, 22], [3, 34]] != [[26, 34], [13, 25], [3, 25], [13, 22], [3, 12], [30, 31], [30, 34], [3, 34]]

First differing element 1:
[3, 12]
[13, 25]

- [[26, 34], [3, 12], [30, 31], [30, 34], [3, 25], [13, 25], [13, 22], [3, 34]]
+ [[26, 34], [13, 25], [3, 25], [13, 22], [3, 12], [30, 31], [30, 34], [3, 34]] : Wrong unique clusters

Ran 66 tests in 0.809s
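
The two lists in the assertion contain the same clusters in a different order. If the order of the clusters is not part of the contract (the issue does not say), an order-insensitive comparison may be enough, as this small sketch shows.

```python
actual = [[26, 34], [3, 12], [30, 31], [30, 34], [3, 25], [13, 25], [13, 22], [3, 34]]
expected = [[26, 34], [13, 25], [3, 25], [13, 22], [3, 12], [30, 31], [30, 34], [3, 34]]

# Strict list equality also compares the order of the clusters.
same_order = actual == expected
# Sorting both sides compares only which clusters are present.
same_clusters = sorted(actual) == sorted(expected)
```

In a unittest test case, assertCountEqual performs the same kind of order-insensitive check.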

Fix statements clustering.

Currently statements clustering outputs fewer clusters than it should. For the example from the paper it prints a single cluster [[7, 38]], while there are definitely more of them.

Utils test fails

FAIL: test_canReadLines (test.utils.Lines.test_lines.TestLines)

Traceback (most recent call last):
  File "D:\git\veniq\test\utils\Lines\test_lines.py", line 11, in test_canReadLines
    self.assertEqual(
AssertionError: 'class SimpleLinesTest {\n' != 'class SimpleLinesTest {\r\n'
- class SimpleLinesTest {
+ class SimpleLinesTest {
?                        +
 : Did not match first line
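
The mismatch is again between \n and \r\n. How a file is opened decides which of the two a reader sees. The sketch below uses only the standard library. The temporary file stands in for the test fixture, and how veniq's Lines utility actually opens files is not shown in this issue.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "SimpleLinesTest.java")
with open(path, "wb") as java_file:
    java_file.write(b"class SimpleLinesTest {\r\n}\r\n")

# Default text mode (newline=None) translates \r\n to \n while reading.
with open(path, encoding="utf-8") as java_file:
    translated = java_file.readlines()

# newline="" disables the translation and keeps the original line endings.
with open(path, newline="", encoding="utf-8") as java_file:
    untranslated = java_file.readlines()
```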

Veniq WP. Semi fails when extract_method_statements_semantic

An error occurs while running extract_method_statements_semantic:
InternalElkGraphLexer_mTokens_1338.txt

data/small_dataset/output_files/InternalMetaDataLexer_mTokens_3800.java
Traceback (most recent call last):
  File "D:/git/veniq/veniq/dataset_collection/validation.py", line 97, in <module>
    opport = _print_extraction_opportunities(
  File "D:/git/veniq/veniq/dataset_collection/validation.py", line 18, in _print_extraction_opportunities
    statements_semantic = extract_method_statements_semantic(method_ast)
  File "D:\git\veniq\veniq\baselines\semi\extract_semantic.py", line 12, in extract_method_statements_semantic
    block_statement_graph = build_block_statement_graph(method_ast)
  File "D:\git\veniq\veniq\ast_framework\block_statement_graph\builder.py", line 15, in build_block_statement_graph
    root_index = _build_graph_from_statement(method_ast.get_root(), graph)
  File "D:\git\veniq\veniq\ast_framework\block_statement_graph\builder.py", line 26, in _build_graph_from_statement
    new_block_index = _build_graph_from_block(block, graph)
  File "D:\git\veniq\veniq\ast_framework\block_statement_graph\builder.py", line 40, in _build_graph_from_block
    new_statement_index = _build_graph_from_statement(statement, graph)
  File "D:\git\veniq\veniq\ast_framework\block_statement_graph\builder.py", line 24, in _build_graph_from_statement
    blocks = extract_blocks_from_statement(statement)
  File "D:\git\veniq\veniq\ast_framework\block_statement_graph\_block_extractors.py", line 15, in extract_blocks_from_statement
    return _block_extractors[statement.node_type](statement)
  File "D:\git\veniq\veniq\ast_framework\block_statement_graph\_block_extractors.py", line 92, in _extract_blocks_from_try_statement
    for catch_clause in statement.catches:
TypeError: 'NoneType' object is not iterable
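
The traceback shows that statement.catches can be None, presumably for a try statement without catch clauses. A defensive version of the loop could fall back to an empty list. The catch_clauses helper and the SimpleNamespace stand-ins below are hypothetical and only mimic the attribute access seen in the traceback.

```python
from types import SimpleNamespace
from typing import Any, List


def catch_clauses(try_statement: Any) -> List[Any]:
    # `catches` may be None (as in the traceback above), so fall back to [].
    return try_statement.catches or []


# Stand-ins for try statements without and with catch clauses.
try_finally = SimpleNamespace(catches=None)
try_catch = SimpleNamespace(catches=["catch (SQLException ignore)"])
```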
