Comments (5)
Currently, this is one of the TODOs. I will just write down some thoughts about an action plan on how this can be implemented.
On the function definition, if it has a `kwargs` or `args` argument we need to implement a corresponding dictionary/list that will contain those definitions. Since we can't know which items this dictionary/list will contain at function definition, we just create the dictionary object, and its items will be filled in on function call.
In the case of a function call with the `**kwargs` parameter, we just iterate the items of the dictionary that is passed as a parameter, and update the object that corresponds to the `**kwargs` that was created on function definition. The internal functionality of pycg will take care of the rest I believe.
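As a rough illustration of that plan, here is a toy model that is not PyCG code: the `FunctionModel` class and its `record_call` method are hypothetical names invented for this sketch. It creates empty containers when a function is defined and fills them in as calls are observed.

```python
from collections import defaultdict


class FunctionModel:
    """Toy stand-in for the per-function state an analysis could keep."""

    def __init__(self, name, has_varargs=False, has_kwargs=False):
        self.name = name
        # Created empty at definition time; the contents are unknown until a call is seen.
        self.varargs = [] if has_varargs else None  # position -> set of possible values
        self.kwargs = defaultdict(set) if has_kwargs else None  # key -> set of possible values

    def record_call(self, extra_positional=(), passed_kwargs=None):
        """Merge the arguments seen at one call site into the containers."""
        if self.varargs is not None:
            for index, value in enumerate(extra_positional):
                while len(self.varargs) <= index:
                    self.varargs.append(set())
                self.varargs[index].add(value)
        if self.kwargs is not None and passed_kwargs:
            for key, value in passed_kwargs.items():
                self.kwargs[key].add(value)


# Two call sites of a hypothetical func2(*args, **kwargs).
model = FunctionModel("func2", has_varargs=True, has_kwargs=True)
model.record_call(extra_positional=["func1"], passed_kwargs={"cb": "func1"})
model.record_call(passed_kwargs={"cb": "func3"})
```

Each slot holds a set because different call sites may pass different values for the same key or position.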
So, a TODO list would be:
- Create micro-benchmark items that correspond to `kwargs` and `args`.
- Implement `args` functionality as a list.
- Implement `kwargs` functionality as a dictionary.
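A micro-benchmark item for the first point might look like the sketch below. The file is hypothetical and is not taken from PyCG's existing benchmark suite; the function names are made up for illustration.

```python
def func1():
    return "func1"


def func2():
    return "func2"


def call_kwargs(**kwargs):
    # The analysis has to resolve kwargs["callback"] to func1.
    return kwargs["callback"]()


def call_args(*args):
    # The analysis has to resolve args[0] to func2.
    return args[0]()


kw_result = call_kwargs(callback=func1)
args_result = call_args(func2)
```

The call graph one would hope to see for this file contains the edges `call_kwargs -> func1` and `call_args -> func2`.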
I will be happy to provide more detail if anyone wants to assign this to themselves.
from pycg.
Sounds awesome!
I'm considering using PyCG in one of our projects, and I can see us addressing this issue (and perhaps others).
Lately, I cloned the repo and ran the benchmarks and saw that a lot of them fail, many of them on completeness rather than soundness problems. Even though I'm not worried about certain complex cases, could you provide a more definitive list of limitations?
In addition, we may want to add a more programmatic, configurable API to PyCG, so that we can call it from other code. Would that be something you're interested in?
from pycg.
Could you provide some examples of the tests that fail due to completeness issues (category-test)? Some test cases are new and experimental (especially those related to external calls), but on my local machine (OS X) most tests fail due to soundness issues (i.e. false negatives).
Regarding an API for PyCG, sure I would be interested in that. What is the main use case that you have in mind?
As far as concrete limitations:
- External Calls: PyCG is pretty good at identifying calls that are related to internal calls within a package but has trouble identifying external calls due to not having their source code available. This problem can mainly be solved by heuristics that identify the namespaces of external entities (which will not lead to maximal recall and might even harm precision) or by implementing an extension of PyCG that can handle the analysis of many packages at once. The main challenge for the latter is identifying the exact location of an external module that is imported, but I believe we can come up with an efficient design.
- Built-in Methods: Currently there is no support for the effects of built-in methods. So, for example, if there's a call to `list.append`, PyCG won't identify the effects of that call and won't store the newly identified element. This problem can be solved by modeling built-in functions and their effects (e.g. whenever we see an append call, we internally store the new element). As another example, consider the call `hash(obj)`, which will lead to a call of the method `__hash__()` of that object. There are many such calls in Python's standard library, but I believe we are interested in only a subset of those.
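The two built-in cases above can be reproduced in a few lines. This is only an illustrative sketch; the `Key` class, `func1`, and the `calls` log are made-up names, and the log exists purely to show which functions actually run.

```python
calls = []  # records which functions run, in order


class Key:
    def __hash__(self):
        calls.append("Key.__hash__")
        return 42


def func1():
    calls.append("func1")


# The built-in hash() dispatches to Key.__hash__, so the call graph
# needs an edge to Key.__hash__ here.
hash(Key())

# func1 reaches the list only through list.append, so the call through
# handlers[0] is resolvable only if the effect of append is modeled.
handlers = []
handlers.append(func1)
handlers[0]()
```

At runtime both `Key.__hash__` and `func1` are executed, which is the behavior a model of these built-ins would need to capture.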
These are the most important issues that come to mind right now. There are some other minor limitations that can be found by executing the test cases. Also, if you find any other limitations or would like to test a particular case that is not included in the micro-benchmark, it would be really cool if you created a pull request with those new test cases!
from pycg.
I am mostly concerned about false negatives (i.e. missing information). Below is a screenshot of which tests fail for me.
Essentially, what we're trying to do here is trace call paths to specific external methods. As we do not have control over the analyzed code, we cannot assume it is typed. We'd prefer not to work with files more than necessary, and remain in Python-land.
To accomplish this, we must resolve the problem of the external call. We are only interested in a closed set of libraries, so we could simply download all the libraries, or a `typeshed` version of them, ahead of analysis. However, from what I've seen, PyCG not only does not leverage type annotations to enhance results, but it also ignores any clauses with type annotations. As an example, if you changed `test_assignments/chained` on line 7 to something like `b: Callable = func1`, the test will ignore the assignment and won't know anything about `b`, even though it ran successfully without the type annotation.
To wrap things up, how difficult would it be to:
- Provide support for type annotations, including `.pyi` stubs, and leverage that to improve results
- Identify the call paths to a closed set of libraries (and methods of external classes).
By the way - according to my understanding (and as described at https://steemit.com/software/@cpuu/the-difference-between-soundness-and-completeness), having false negatives corresponds to completeness problems. Am I right?
from pycg.
Regarding completeness and soundness: in general, completeness is related to false negatives, but as far as I understand, in the program analysis world it relates to false positives. I'll try to use the term `false negatives` to avoid any confusion.
Regarding type annotations, PyCG does not need any information about the types of variables since it infers any potential types during program execution. In this case, I'm not confident that the analysis of `.pyi` stubs will provide any additional benefit. However, the fact that PyCG is ignoring elements with a type annotation is probably due to the AST Visitor. Specifically, we need to define a method for each kind of node that is being visited. For example, for function definitions, we must define the method `visit_FunctionDef`. Probably, the AST Visitor pattern requires a different method for a typed assignment. Are there any other cases with type definitions that PyCG ignores? It would be very useful to have a list of those and find the relevant methods (even though I could not find any documentation of the available visitor methods, there are some hacky ways of finding them).
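The visitor explanation can be checked with the standard library alone. In Python's `ast` module an annotated assignment is parsed as an `ast.AnnAssign` node rather than `ast.Assign`, so a visitor that defines only `visit_Assign` never sees it. The sketch below is not PyCG's visitor; the class names `AssignOnly` and `WithAnnAssign` are hypothetical.

```python
import ast

SOURCE = """
def func1():
    pass

a = func1
b: Callable = func1
"""


class AssignOnly(ast.NodeVisitor):
    """Collects assignment targets, but handles plain assignments only."""

    def __init__(self):
        self.targets = []

    def visit_Assign(self, node):
        for target in node.targets:
            if isinstance(target, ast.Name):
                self.targets.append(target.id)
        self.generic_visit(node)


class WithAnnAssign(AssignOnly):
    """Adds a handler for annotated assignments."""

    def visit_AnnAssign(self, node):
        # AnnAssign has a single target, and value is None for a bare "x: int".
        if node.value is not None and isinstance(node.target, ast.Name):
            self.targets.append(node.target.id)
        self.generic_visit(node)


tree = ast.parse(SOURCE)

plain = AssignOnly()
plain.visit(tree)

typed = WithAnnAssign()
typed.visit(tree)
```

Here `plain.targets` contains only `a`, while `typed.targets` contains both `a` and `b`. `ast.NodeVisitor` dispatches to `visit_` followed by the node's class name, so the node classes listed in the `ast` documentation double as a list of the available visitor methods.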
Regarding external calls to a closed set of libraries, the most straightforward way of accomplishing this is by providing the source code for all packages to PyCG. Currently, PyCG is designed to work with one specific package, but I do not see any issues that can arise by extending this functionality to work for multiple packages.
I see two main engineering tasks:
- Retrieving the correct namespace of a module under analysis based on the current package under analysis.
- Making the import mechanism identify the correct namespace of an imported module.
For the first one, we retrieve the namespace of a module using the operation `to_mod_name(os.path.relpath(module_path, package_path))`. If we had the correct package path for the current module we are analyzing, this would lead to the correct namespace. The second one is a bit more tricky. One needs to identify whether the imported module resides in the current package or in an external package. This can be done through heuristics, but it needs some experimentation before there is a concrete action plan.
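A minimal sketch of both tasks follows, under stated assumptions: `to_mod_name` below is a hypothetical stand-in written for this example rather than PyCG's own helper, and `is_internal_import` is one possible heuristic, not an agreed design. Note that `os.path.relpath` takes the target path first and the start directory second.

```python
import os


def to_mod_name(rel_path):
    # Hypothetical stand-in: "pkg/sub/mod.py" -> "pkg.sub.mod"
    return os.path.splitext(rel_path)[0].replace(os.sep, ".")


def module_namespace(module_path, package_root):
    # Namespace of a module relative to the root of the package that owns it.
    return to_mod_name(os.path.relpath(module_path, start=package_root))


def is_internal_import(imported_name, internal_top_levels):
    # Heuristic: an import is internal when its top-level name is one of
    # the top-level modules of the package currently under analysis.
    return imported_name.split(".")[0] in internal_top_levels


namespace = module_namespace(
    os.path.join("proj", "pkg", "sub", "mod.py"), package_root="proj"
)
internal = is_internal_import("pkg.sub.other", {"pkg"})
external = is_internal_import("requests.sessions", {"pkg"})
```

With several packages under analysis, the same check could be repeated per package root to decide which package an imported module belongs to.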
from pycg.