lisa-analyzer / lisa Goto Github PK

📚 a modular easy to use Library for Static Analysis aiming at multi-language analysis

Home Page: https://lisa-analyzer.github.io/

License: MIT License

Java 81.64% ANTLR 0.24% JavaScript 10.46% HTML 7.67%

static-analysis abstract-interpretation static-analyzer static-analyzers formal-methods program-verification analysis analyzer java cybersecurity

lisa's People

Contributors

Stargazers

Watchers

lisa's Issues

[FEATURE REQUEST] Smashed sum abstract domain

Description
The smashed sum abstract domain takes n > 2 abstract domains and compose them smashing their bottom elements and creating a new single bottom element. Top elements are not smashed but, to preserve the lattice structure, a new top element is introduced (that is greater than each top element of the underlying abstract domains). Least upper bound of two elements of such domain is top when elements come from different abstract domains composing the smashed sum, the least upper bound of the underlying domain otherwise. It is important to have such a way to compose domains especially when dealing with dynamic languages or analysis that handle values of different types.

Suggested implementation
The implementation should be an instance of SemanticDomain and should introduce singletons bottom and top elements. The implementation of the methods of SemanticDomain (e.g., assign, lub) should be similar to the Cartesian product one, except that we need to pay attention when we get non-bottom values from different abstract domains, in such a casa top should be returned.

[BUG] HeapEnvironment is missing equals/hashCode/lub/widen/lessOrEqual to take into account substitutions and replacements

Description
As for #69, HeapEnvironment is adding additional fields to the standard environment/functional lattice. Both equals()/hashCode(), as well as lattice operators, should take these fields into account to guarantee the correct behavior of all operations.

[BUG] FunctionalLattice equals and hashCode should consider also the lattice instance

Description
Currently, equals and hashCode methods in FunctionalLattice just consider the class of the lattice references, while they should take into accout their instances. This is important in particular for the class Environment, where lattice is used to check if the environment is either top or bottom: without considering the lattice instances, a top and bottom environments are wrongly equal.

[FEATURE REQUEST] Base non relational domains should be able to reason about type casting and type conversion expressions

Currently, base non relational domains do not reason about type casting and type conversion expressions, since they just return top when these expressions are met. These domains should be able instead to reason about these symbolic expressions: for instance, interval domain should be able to cast or convert an interval to the type tracked by the domain, i.e., integer.

[FEATURE REQUEST] Non-interference as first inference system

Description
To properly assess the effectiveness of the InferenceSystem and InferredValue classes, we need a proper analysis based on an inference system that needs a program counter/execution state. Non-interference, in its type system definition, is the chosen candidate.

[BUG] NullPointerException when dumping analysis and typing results of unreachable nodes

Description
In branch program-structure(commit 680e53a), a NullPointerExceptionis thrown when LiSA tries to dump the analysis or typing results for a non reachable node of a CFG. In particular, the exceptions are raised at line 399
dumpCFG("typing___", result, st -> result.getAnalysisStateAt(st).toString());
and line 425
dumpCFG("analysis___", result, st -> result.getAnalysisStateAt(st).toString());
since, if st is an unreachable node in the CFG (i.e., it has no incoming edges), result.getAnalysisStateAt(st) returns null. In order to fix this bug, it should be enough to check if result.getAnalysisStateAt(st) is null, for instance:
dumpCFG("analysis___", result, st -> result.getAnalysisStateAt(st) == null ? "Dead code" : result.getAnalysisStateAt(st));

[FEATURE REQUEST] Add an analysis general framework for LiSA

Description
We should implement a general analysis infrastructure for LiSA, in order to provide a starting framework for the definition of new semantic-driven analyses.

Motivation
This feature implementation corresponds to the LiSA analysis core and it should contain the starting point for the definition of new semantic analyses. In particular, we should provide:

an unique analysis-independent fix-point algorithm on CFGs;
the base classes for the definition of new abstract domains. We should start defining the base classes for lattices, functional, value and heap abstract domains (for heap abstractions, we could follow this paper)
a suitable representation of the analysis results on CFGs;

[FEATURE REQUEST] Add a variable table on CFG to handle variable scoping

Description
Variables that appear in one statement and are pushed in the abstract state are never popped out. Having a variable table that defines the scope of each of them enables the automatic removal through forgetIdentifier() of out-of-scope variables.

Motivation
Not removing variables leads to noise when inspecting the abstract state, but also to errors: two different weak identifiers in two different scopes but with same name should be treated separately (i.e. the first assignment to each of them should be strong), but the first assignment to the second will be lubbed with the last assigned value of the first.

[FEATURE REQUEST] Interval satisfies methods

Description
Interval domain does not override the satifies methods from the BaseNonRelationalValueDomainclass. We need to implement them in order to gain more precision when the domain checks the satisfiability of a symbolic expression.

[FEATURE REQUEST] Greatest lower bound for non relational domains

Non relational domains should implement the greatest lower bound operation in order to improve precision on assume method on environments, in particular for filtering the values that should survive after the assumption.

[FEATURE REQUEST] CFG element location should be not null

Description
We should ensure that the code location of each cfg element (Parameter, CFGDescriptor, CompilationUnit, etc) is not null. If the code location of an element is unknown, we should put the unknown location of the corresponding CodeLocation concrete implementation (e.g., for SourceCodeLocation we should put new SourceCodeLocation(null, -1, -1)).

[BUG] Intervals multiplication

Description
There is a mistake in the current implementation of interval multiplication: it calculates x2 * y2 in the intervals multiplication equations, instead of x1 * y2.
Source: https://en.wikipedia.org/wiki/Interval_arithmetic#Interval_operators

Reproducibility information
This bug was introduced in commit ae11414 , on master branch inside the file lisa/src/test/java/it/unive/lisa/test/imp/tutorial/intervals/Intv.java .
You can reproduce this bug by doing operations in your programs like: [5,5] * [10,10], that returns: [50, 100] instead of the correct answer: [50, 50].

[BUG] Moving weak updates from the analysis state

Currently, weak updates are handled by the analysis state. Given an assignment id = exp, if id is a weak identifier, the analysis state lubs the previous analysis state with the current one (the one tracking the assignment). This leads to a terrible loss of precision when the previous state is the TOP abstract state: considering again the weak assignment id = exp, we would lose the information that id -> TOP. In order to get more precise results, we need to move the handling of the weak updates on the domains. For instance, in the case of non-relational domains, weak updates should be handled by the environment.

[FEATURE REQUEST] Parallelize execution of syntactic checks

Description
Execution of syntactic checks should exploit the low resource consumption that is typical of these checks to reduce the overall execution time.

Motivation
It is reasonable to assume that syntactic checks would far outnumber semantic ones in a realistic analysis scenario. It is possible to take advantage of the isolation of these checks (typically considering information from one statement/expression at a time) to parallelize their execution. This would also represent the first parallelization effort, that would provide the framework for a library-wide parallelization framework.

Suggested implementation
Standard executors relying on thread pools would be the preferred way to go. Once that framework is in place, one can either keep the execution of single checks sequential while parallelizing different checks, or execute one check at a time but iterating over the cfgs concurrently. A more intelligent approach, deciding a tradeoff between the two approaches relying on the number of cores available would probably be best.

Further content
We should also investigate the possibility of parallelizing semantic checks too.

[BUG] Loss of previously gathered information when calling constructors

Description
A call to a constructor, using the Intraprocedural call graph abstraction, returns the top abstract state, causing the loss of any previously gathered analysis information. Informally speaking, when a constructor is called, it is as the analysis restarted from the point the constructor is called.

Reproducibility information
In branch program-structure (commit: 680e53a). Let's consider the following IMP program

class example {
	f(e) {
		def obj = new example();
	}
}

and let's suppose to analyze it using the default abstractions for the CallGraph (IntraproceduralCallGraph) and HeapDomain (MonohiliticHeap), and Sign for the ValueDomain. The result of the analysis is reported in the following, by means of the corresponding dumped dot file.

Note that, the abstract value regarding the parameter e (namely TOP, the top of the Sign domain) is lost.
The expected result is the same dot file, except that in the first node the value analysis also contains the information about the parameter e, i.e., contains the entry vid$e: TOP.

[BUG] Set and inverse set lattices LiSA concrete implementations should override equals and hashCode methods

Description
SetLatticeLattice and InverseSetLattice concrete implementations in LiSA should override equals and hashCode methods, considering their own logics. Specifically, any concrete implementation of these domains we have currently in LiSA relies on the property isTop in order to check, together with the elements property of the superclass, if a lattice instance is either top or bottom. Currently, concrete implementations rely on the equals and hashCode methods of the superclass, that just check the equality between elements. In this way, top and bottom elements of the concrete implementations are wrongly seen as equal.

[FEATURE REQUEST] Control flow structure identification

Description
Support for frontend definition of conditional structures and heuristics to identify them if not provided by frontends.

Motivation
Domains might need syntactic information about where the program point being analyzed is inside the cfg in order to apply additional logic.

Suggested implementation
The most accurate information possible comes from the original program: frontends, where possible, should add conditional structures corresponding to loops and if-elses. Where those are not present (compiled code analysis), a set of best-effort heuristics should be in place, in order to extract as much information as possible.

[FEATURE REQUEST] Modularize abstract state

Description
The abstract state provided by LiSA is not expressive enough. For instance, techniques like trace partitioning would have to be applied on value state and heap state independently. We should create a general AbstractState interface, such that users can define their own abstract state and combine them, having a direct effect on both the heap state and value state at once.

[FEATURE REQUEST] Generic graph structure

Description
The CFG graph structure should be abstracted away, since more components could benefit from the same structure. One example of these are the callgraphs, that could be implemented as a graph backed from an adjacency matrix and that will surely need a fixpoint algorithm.

Motivation
Reducing code duplication and increase encapsulation and modularity.

[FEATURE REQUEST] Unsersolved calls must have a a resolution strategy

Description
Depending on the language, calls might be resolved in different ways:

In C, types of parameters are resolved statically and a signature matching those types is required
In Java, the same thing as C happens, excluding the receiver of instance calls, whose type is resolved dynamically
In C#, both receiver and parameters' types are resolved dynamically

The callgraph should be aware of the resolution strategy of the call to be aware of how to resolve the target signature.

[FEATURE REQUEST] Centralize symbolic expression visit

We should centralize the visit of symbolic expressions (by introducing a visitor interface) in order to avoid duplicated code when we recursively inspect symbolic expressions, for example in non-relational domains and base inferred value.

[FEATURE REQUEST] Support for semantic checks

Description
Add support for semantic checks that can access the results of the analysis.

Motivation
The most interesting part of static analysis is being able to issue warnings exploiting semantic information. Moreover, it gives a quick chance to verify the results of an analysis without having to thoroughly inspect its results.

Suggested implementation
Extend the structure of SyntacticCheck and of CheckTool, so that the same executor can be used to spawn the various checks.

[FEATURE REQUEST] Program structure

Description
The collection of cfgs is not enough to express a program: there is no information about fields, global variables, cfg overriding, ...
We should introduce a form of program structure where cfgs are grouped into units, and the general model is enriched by adding all the capabilities that we are missing.

[FEATURE REQUEST] basic elements for running analyses

Description
Following the work of #12, we should provide basic components for first analyses implementations (callgraph, heap abstraction), runtime type information, and an architecture for selecting and executing analyses.

Motivation
We want to provide basic features for running analyses for testing/educational purpose. Ideally we want one simple, imprecise and fast implementation of callgraph, heapdomain and valuedomain. Type inference is also needed for providing domains a way to check if a variable/expression/constant is of interest (e.g. numerical domains want to track properties only on numerical values). A way of specifiying the analyses to execute is also needed.

Suggested implementation

CallGraph
A first implementation could be a callgraph that (i) computes the fixpoint by iterating over all cfgs in no particular order, (ii) resolves call targets to all possible cfgs that are compatible with the target's signature, (iii) when queried for the abstract result of a cfg call, returns top (that is, it treats all calls as open).

HeapDomain
A first implementation could be one that abstracts every heap location to a unique identifier, causing all reads and writes to refer to that one. Extremely imprecise but very fast.

TypeInference
This should be a separate fixpoint iteration that computes type runtime information and then stores them directly inside Statements, that can then use this information to determine what to do during the semantics() computation.

Architecture for analysis selection
A nice feature would be to just specify some analyses through addAnalysis(analysis.class), and then to have LiSA selecting the required analyses to run before and constructing a combination of domains to run.

[FEATURE REQUEST] Factor out the definition of the code location where an element is defined

Right now, most of the syntactic components have a reference to the file, line and column where they are defined in the source code, but not all programs might have source code: reading from bytecode or compiled code, the location in the original program can be determined through other means.

We should introduce a generic 'code location' interface, implemented by a 'source code location', a 'bytecode location', ... that provides a unique string representation of the location where that element is defined, and that domains can use.

[FEATURE REQUEST] Greatest lower bound for set and inverse set lattice domains

Description
SetLattice and InverseSetLattice should provide the greatest lower bound operation in order to better manipulate set elements in abstract domains using them, such as upper bounds or pentagons.

[BUG] Enforce ordering in string representations of abstract states

Description
Graph comparisons use string equality on the contents of the nodes. If the same set of information can be dumped in different order due to non-deterministic iteration, such comparisons will fail even when performing on the same data.

Affected classes are all the ones that hold sets of information: DataflowDomains, Environments, ...

[FEATURE REQUEST] Type cast and type conversion symbolic expressions

Currently, LiSA supports only type casting in symbolic expressions, but there is no support for type conversion. We need to add two separate symbolic expressions handling type conversion and type casting. Specifically,

TYPE_CAST, where the destination type is smaller or equal than the source type (that is, is a narrowing operator)
TYPE_CONV, where the destination type is greater or equal than the source type (that is, is a widening operator)

[FEATURE REQUEST] Annotations support

Description
Currently, LiSA does not support annotations, limiting its internal representation's expressiveness. We need to provide annotations in LiSA internal representation that are as generic as possible in order to model the different programming languages annotations (e.g., Java vs C#) and an efficient way to inject the annotations to the proper elements of the control-flow graph.

[FEATURE REQUEST] Create factories to instantiate analysis components

Description
Submitting new analyses to LiSA is now achieved by invoking lisa.addXXX(domain). This implies that for each new "construct" (call graphs, abstract states, ...) a new method should be implemented. Having a factory that loads and compose the bits of an analysis would simplify the process.

[FEATURE REQUEST] Heap analyses

Implement the first instances of heap analyses:

type-based: heap identifiers are determined only by their type (both field sensitive and insensitive)
point-based: heap identifiers are determined only by the program point where they have been created (both field sensitive and insensitive)

[BUG] Methods with empty body in imp

When a method with an empty body is analyzed using the IMP frontend, the analysis crashes during the fixpoint computation.

[BUG] Fail to match constructor when object is instantiated

Branch InterproceduralAnalysis. Testcase ContextSensitiveInterproceduralAnalysisTest.testRTAContextSensitive
When it resolves the various method calls, the dynamic types of the receivers are always empty, thus it fails to resolve the calls.
In particular, when debugging the aforementioned test case with a breakpoint at line 40 of class CallGraphBasedInterproceduralAnalysis it firsts stop for call [unresolved]A(this) (that is the first line of method subtyping). When stepping into, this is resolved to dynamic type A (this looks odd to me, this should be of type subtyping since this is the class where the code is contained, however the type is correct - but the receiver is not :)), but when it tries to match the signature of the called methods with the possible candidates (line 83 AbstractBaseCallGraph) it fails since the call has a parameter (this) while the constructor has no parameter.

[FEATURE REQUEST] Interprocedural analysis

Support inteprocedural analysis, and in particular:

different call graphs (CHA and RTA at least)
context sensitive analyses through call strings

[FEATURE REQUEST] Provide program point information to abstract domains

Description
All instances of abstract domains (SemanticDomain, NonRelationalDomain, ...) should have knowledge of the program point that generated the expressions they are evaluating.

Motivation
Information about the program point (i.e. if it is inside a loop) can be required by abstract domains for them to be able to apply their reasoning. A simple example of this is a heap abstraction that creates weak identifiers only for program points that are inside a loop.

Suggested implementation
Statement should implement a ProgramPoint interface that provides access to the cfg. The cfg can then provide methods to know if a program point is inside loops, or any other logic required by domains. All methods of abstract domains should take the program point being evaluated as parameter.

[FEATURE REQUEST] Collections of expressions should be instances of a SetLattice

Collections of symbolic expressions that we use around the analysis (for example, rewritten expressions) behave as instances of a SetLattice abstract domain. We should create a set lattice domain of symbolic expression (ExpressionSetLattice) and replace the collections around the analysis with instances with this domain.

[FEATURE REQUEST] Add support for more than one entrypoint on CFGs

Description
The CFG structure should have an Collection<Statement> entrypoints instead of the the Statement first field to support multi-entrypoints procedures.

Motivation
In some languages it is possible to jump in the middle of the code of a procedure. Thus, those procedure will have more than one entry point (or "first" instruction). To model those, a more flexible structure is needed.

[FEATURE REQUEST] Identifier traceability for non relational domains

Currently, non-relational abstract domains are able to process any identifier, independently of the identifier types. For instance, numerical abstract domains can only track identifiers, and not, for example, pointer identifiers. This lead to unexpected analysis results (e.g., pointer identifiers abstracted into interval values). We need to define which identifiers can be tracked by a non-relational abstract domain: if a non-relational domain cannot track an identifier, it should assign top to it.

[BUG] Equals and hashCode methods for InferenceSystem

Description
Currently, InferenceSystem inherits equals and hashCode methods from Environment, without taking into account the inferred value of the instance of inference system. It should override the methods taking into account also the inferred value.

Reproducibility information
On master branch (commit d7e12f7) for instance the inference systems
x: [int] y: [int] [inferred: _|_, state: _|_] and x: [int] y: [int] [inferred: [int], state: _|_] are equal even if they infer different values. This lead to a wrong behavior on lub: in particular, the expected result is x: [int] y: [int] [inferred: [int], state: _|_], but the actual one is x: [int] y: [int] [inferred: _|_, state: _|_]

[FEATURE REQUEST] Support for weak and strong assignments

Description
Heap domains might produce weak identifiers. On those, the semantics of assignment should be modified to perform the least upper bound between the entry state and the exist state of the assignment.

Suggested implementation
Add an isWeak property in Identifiers, and have the AnalysisState check for that flag and perform a strong or weak assignment.

[BUG] type analysis issues

The results of the type analysis in the test cases seem to be wrong. For instance, in https://raw.githubusercontent.com/UniVE-SSV/lisa/master/lisa/imp-testcases/type-inference/typing___untyped_typing.test2(untyped_i).dot it infers that the type of the expression after a Boolean condition (node 3) is TOP instead of Boolean.
In addition, it exposes non-deterministic behaviors in branch inteprocedural. For instance, in the example above sometimes it computer TOP, other times integer.
I tried to use HashSet instead of ExternalSet in field elements of InferredTypes but it does not help (the nondeterministic behaviors are fewer/less frequent, but still there).

[FEATURE REQUEST] Support hybrid call resolution

Description
At the moment, the call resolution algorithm inside call graph throws a CallResolutionException if a call that has both CFGs and NativeCFGs as targets is found. We need to support this case.

[FEATURE REQUEST] Inputs and outputs to LiSA should be serializable

Description
All inputs (CFGs for now) and outputs (collections of warnings for now) should be serializable, such that we can read them from - and write them to - the file system.

Motivation
Having serialization and deserialization feature would ease the creation of tests as well as keeping a clean reference result for regression tests. This would also enable external tools to produce inputs for LiSA and read her outputs.

Suggested implementation
JAXB is probably the best choice, since schemas can be automatically generated for other tools to generate data structures modeling LiSA's.

[FEATURE REQUEST] Dataflow interface

Description
It should be possible to easily define dataflow analysis by just defining the join operator and the gen/kill functions.

[BUG] Wrong NoOp simplification

Description
The current simplification algorithm is mishandling the simplification of NoOps that are entry nodes or exit nodes. This leads to exceptions at various stages of the analysis (fixpoint, graph dumping, ...).

In short:

NoOps that are entrypoints get removed, but their outgoing edges are not; the set of entrypoints is not modified
NoOps that are exitpoints get removed, but their ingoing edges are not

Reproducibility information
Happening on v0.1a2. Tested on three CFGs:

CFG first = new CFG(new CFGDescriptor("foo"));
NoOp start = new NoOp(first);
Assignment assign = new Assignment(first, new Variable(first, "x"), new Literal(first, 5, Untyped.INSTANCE));
Return ret = new Return(first, new Variable(first, "x"));
first.addNode(start);
first.addNode(assign);
first.addNode(ret);
first.addEdge(new SequentialEdge(assign, ret));
first.addEdge(new SequentialEdge(start, assign));
first.simplify();

CFG second= new CFG(new CFGDescriptor("foo"));
Assignment assign1 = new Assignment(first, new Variable(first, "x"), new Literal(first, 5, Untyped.INSTANCE));
Assignment assign2 = new Assignment(first, new Variable(first, "y"), new Literal(first, 50, Untyped.INSTANCE));
NoOp end = new NoOp(first);
first.addNode(end);
first.addNode(assign1);
first.addNode(assign2);
first.addEdge(new SequentialEdge(assign1, assign2));
first.addEdge(new SequentialEdge(assign2, end));
second.simplify();

CFG third = new CFG(new CFGDescriptor("foo"));
Assignment assign1 = new Assignment(first, new Variable(first, "b"),
		new Literal(first, true, Untyped.INSTANCE));
Assignment assign2 = new Assignment(first, new Variable(first, "x"), new Literal(first, 5, Untyped.INSTANCE));
Assignment assign3 = new Assignment(first, new Variable(first, "y"), new Literal(first, 50, Untyped.INSTANCE));
NoOp end = new NoOp(first);
first.addNode(end);
first.addNode(assign1);
first.addNode(assign2);
first.addNode(assign3);
first.addEdge(new TrueEdge(assign1, assign2));
first.addEdge(new FalseEdge(assign1, assign3));
first.addEdge(new SequentialEdge(assign2, end));
first.addEdge(new SequentialEdge(assign3, end));
third.simplify();

Expected behavior
toString() of the adjacency matrix of the three cfgs:

first:
"x = 5" -> [
	ingoing: 
	outgoing: [ x = 5 ] ---> [ return x ]
]
"return x" -> [
	ingoing: [ x = 5 ] ---> [ return x ]
	outgoing: 
]

second:
"x = 5" -> [
	ingoing: 
	outgoing: [ x = 5 ] ---> [ y = 50 ]
]
"y = 50" -> [
	ingoing: [ x = 5 ] ---> [ y = 50 ]
	outgoing: 
]

third:
"b = true" -> [
	ingoing: 
	outgoing: [ b = true ] -T-> [ x = 5 ], [ b = true ] -F-> [ y = 50 ]
]
"y = 50" -> [
	ingoing: [ b = true ] -F-> [ y = 50 ]
	outgoing: [ y = 50 ] ---> [ no-op ]
]
"x = 5" -> [
	ingoing: [ b = true ] -T-> [ x = 5 ]
	outgoing: [ x = 5 ] ---> [ no-op ]
]
"no-op" -> [
	ingoing: [ x = 5 ] ---> [ no-op ], [ y = 50 ] ---> [ no-op ]
	outgoing: 
]

Actual behavior
toString() of the adjacency matrix of the three cfgs:

first:
"x = 5" -> [
	ingoing: [ no-op ] ---> [ x = 5 ]
	outgoing: [ x = 5 ] ---> [ return x ]
]
"return x" -> [
	ingoing: [ x = 5 ] ---> [ return x ]
	outgoing: 
]

second:
"x = 5" -> [
	ingoing: 
	outgoing: [ x = 5 ] ---> [ y = 50 ]
]
"y = 50" -> [
	ingoing: [ x = 5 ] ---> [ y = 50 ]
	outgoing: [ y = 50 ] ---> [ no-op ]
]

third:
"b = true" -> [
	ingoing: 
	outgoing: [ b = true ] -T-> [ x = 5 ], [ b = true ] -F-> [ y = 50 ]
]
"y = 50" -> [
	ingoing: [ b = true ] -F-> [ y = 50 ]
	outgoing: [ y = 50 ] ---> [ no-op ]
]
"x = 5" -> [
	ingoing: [ b = true ] -T-> [ x = 5 ]
	outgoing: [ x = 5 ] ---> [ no-op ]
]

[BUG] Interval greatest lower bound is not correctly working

Description
The Interval greatest lower bound does not correctly work when the intersection between the interval is empty.

Reproducibility information
On branch master (commit 9eea32e), consider the interval [0,0] and [1,1]. The expected result is bottom, while glbAux computes 1 as low bound and 0 as high bound. We should check if the new low bound is greater than the new high bound: in this case, we should return bottom.

[FEATURE REQUEST] Implement a generic graph visitor interface

Description
With the generic graph structure being introduced, a generic visitor pattern should be introduced to uniformly inspect all types of graph (call graphs or cfgs). Checks (syntactic and all future types) would directly benefit from this, but also other constructs might exploit this pattern (e.g. a call graph that wants to find all calling statements in a cfg).

[FEATURE REQUEST] Add static typing information

Description
LiSA should be aware of (static) typing coming from the original program, and should then be able to infer runtime types.

Motivation
Typing is essential for determining the appropriate target of instance method calls in object-oriented programs, and are also necessary to determine the result of casting operations. LiSA should not enforce typing where not needed, since some languages are not statically typed and frontends should not be forced to add typing information on them.

Suggested implementation
We would go for a generic Type interface with some sub-interfaces for more specific types. A partial hierarchy would be:

interface Type {
  default public boolean isNumeric() {
    return this instanceof NumericType;
  }
  default public NumericType asNumeric() {
    return isNumeric() ? (NumericType) this : null;
  }
}

interface NumericType extends Type {
  boolean isFloatingPoint();
  boolean is64Bits();
}

Together with NumericType, other specific interfaces can be provided (e.g. StringType). A concrete instance Untyped should also be created, serving as a default type for non-statically typed programs.

Information about static typing should be added to CFG (return value and parameters) and to each expression instance.

[FEATURE REQUEST] Allow the frontend to define the entry points of a program

Right now, there is no way to define what are the entry points of the applications. When we run the interprocedural analysis, we take all methods and we consider them as entry points. We should:

add a method that allows the frontend to return the entry points
use this method to implement a Program.getEntryPoints() method that returns a collection of CFGs
use this method in the interprocedural analysis

@lucaneg I can take care of the second and third point, but I am not sure on how to proceed for the first one.

[FEATURE REQUEST] Inference systems

Description
We need a special ValueEnvironment that can model inference systems, like type systems or non interference.

Suggested implementation
An extension of ValueEnvironment that gives access to (i) the last inferred value, and (ii) an execution state/program counter that keeps track of the traversed conditions.

lisa-analyzer / lisa Goto Github PK

lisa's People

Contributors

Stargazers

Watchers

Forkers

lisa's Issues

Recommend Projects

Recommend Topics

Recommend Org