resolve *args, **kwargs (pycg, closed)

vitsalis commented on May 25, 2024
resolve *args, **kwargs

from pycg.

Comments (5)

vitsalis commented on May 25, 2024

Currently, this is one of the TODOs. I will just write down some thoughts about an action plan on how this can be implemented.

At the function definition, if the function has an *args or **kwargs parameter, we need to create a corresponding list/dictionary object that will hold those values. Since we can't know at definition time which items this list/dictionary will contain, we just create the empty object, and its items are filled in at each function call.

In the case of a function call with a **kwargs argument, we iterate over the items of the dictionary that is passed and update the object that was created for **kwargs at the function definition. The internal functionality of pycg will take care of the rest, I believe.
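The plan above can be illustrated with a toy model. This is only a minimal sketch built on the standard `ast` module, not PyCG's actual internals: the function `model_variadics` and the shape of the returned dictionary are hypothetical, and it only handles names and simple dictionary literals.

```python
import ast

def model_variadics(source: str):
    """Toy model: record which names reach *args / **kwargs of each function."""
    tree = ast.parse(source)
    dict_literals = {}  # variable name -> {key: value name} for simple dict literals
    model = {}          # function name -> {"params": [...], "args": [], "kwargs": {}}

    for node in ast.walk(tree):
        # Definition site: create empty containers; their contents are unknown here.
        if isinstance(node, ast.FunctionDef) and (node.args.vararg or node.args.kwarg):
            model[node.name] = {
                "params": [a.arg for a in node.args.args],
                "args": [],
                "kwargs": {},
            }
        # Remember `d = {"k": name}` so that a later `f(**d)` can be expanded.
        elif isinstance(node, ast.Assign) and isinstance(node.value, ast.Dict):
            items = {
                k.value: v.id
                for k, v in zip(node.value.keys, node.value.values)
                if isinstance(k, ast.Constant) and isinstance(v, ast.Name)
            }
            for target in node.targets:
                if isinstance(target, ast.Name):
                    dict_literals[target.id] = items

    for node in ast.walk(tree):
        # Call site: fill in the containers created at the definition site.
        if not (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)):
            continue
        entry = model.get(node.func.id)
        if entry is None:
            continue
        # Positional arguments beyond the named parameters land in *args.
        extra = node.args[len(entry["params"]):]
        entry["args"].extend(a.id for a in extra if isinstance(a, ast.Name))
        for kw in node.keywords:
            if not isinstance(kw.value, ast.Name):
                continue
            if kw.arg is None:
                # `**d`: iterate the items of the dictionary that is passed.
                entry["kwargs"].update(dict_literals.get(kw.value.id, {}))
            elif kw.arg not in entry["params"]:
                entry["kwargs"][kw.arg] = kw.value.id
    return model

SAMPLE = '''
def func1(): pass
def func2(): pass
def func3(): pass

def run(first, *args, **kwargs):
    pass

callbacks = {"on_done": func2}
run(func1, func3, on_start=func1, **callbacks)
'''

result = model_variadics(SAMPLE)
print(result["run"]["args"])    # names that flow into *args
print(result["run"]["kwargs"])  # names that flow into **kwargs
```

A real implementation would store these entries in the analysis' own definition structures so that a later `kwargs["on_done"]()` inside `run` resolves to `func2`; the sketch only shows the two phases (create at definition, fill at call).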

So, a TODO list would be:

  • Create micro-benchmark items that correspond to kwargs and args.
  • Implement args functionality as a list.
  • Implement kwargs functionality as a dictionary.

I will be happy to provide more detail if anyone wants to assign this to themselves.

rotemiman commented on May 25, 2024

Sounds awesome!
I'm considering using PyCG in one of our projects, and I can see us addressing this issue (and perhaps others).

I recently cloned the repo and ran the benchmarks, and saw that a lot of them fail, many of them on completeness rather than soundness problems. Even though I'm not worried about certain complex cases, could you provide a more definitive list of limitations?

In addition, we may want to add a more programmatic, configurable API to PyCG, so that we can call it from other code. Would that be something you're interested in?

vitsalis commented on May 25, 2024

Could you provide some examples of the tests that fail due to completeness issues (category-test)? Some test cases are new and experimental (especially those related to external calls), but on my local machine (macOS) most tests fail due to soundness issues (i.e. false negatives).

Regarding an API for PyCG, sure I would be interested in that. What is the main use case that you have in mind?

As far as concrete limitations:

  • External Calls: PyCG is pretty good at identifying calls that are related to internal calls within a package but has trouble identifying external calls due to not having their source code available. This problem can mainly be solved by heuristics that identify the namespaces of external entities (which will not lead to maximal recall and might even harm precision) or by implementing an extension of PyCG that can handle the analysis of many packages at once. The main challenge for the latter is identifying the exact location of an external module that is imported, but I believe we can come up with an efficient design.
  • Built-in Methods: Currently there is no support for the effects of built-in methods. So, for example, if there's a call to list.append, PyCG won't identify the effects of that call and won't store the new identified element. This problem can be solved by modeling built-in functions and their effects (e.g. whenever we see an append call, we internally store the new element). As another example consider the call hash(obj) which will lead to the call of the method __hash__() of that object. There are many such calls on Python's standard library, but I believe we are interested in only a subset of those.
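To make the built-in limitation concrete, here is a small runnable illustration of the two effects mentioned above. It only demonstrates Python's runtime behaviour that a model of built-ins would have to capture; the names `Key`, `handler`, and `calls` are made up for the example.

```python
calls = []  # records which user-defined functions actually ran

class Key:
    def __hash__(self):
        calls.append("Key.__hash__")
        return 42

def handler():
    calls.append("handler")

# hash(obj) implicitly invokes obj.__hash__(), so a call graph that does not
# model `hash` misses the edge to Key.__hash__.
hash(Key())

# list.append stores `handler` as an element; without modeling that effect,
# the analysis cannot know what `h` refers to in the loop below.
handlers = []
handlers.append(handler)
for h in handlers:
    h()

print(calls)
```

Both edges (to `Key.__hash__` and to `handler`) exist at runtime but are only visible to a static analysis that models the effect of the built-in involved.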

These are the most important issues that come to mind off the top of my head right now. There are some other minor limitations that can be found by executing the test cases. Also, if you find any other limitations or would like to test a particular case that is not included in the micro-benchmark, it would be really cool if you created a pull request with those new test cases!

rotemiman commented on May 25, 2024

I am mostly concerned about false negatives (i.e. missing information). Below is a screenshot of which tests fail for me.

Essentially, what we're trying to do here is trace call paths to specific external methods. As we do not have control over the analyzed code, we cannot assume it is typed. We'd prefer not to work with files more than necessary, and remain in Python-land.

To accomplish this, we must resolve the problem of the external call. We are only interested in a closed set of libraries, so we could simply download all the libraries, or a typeshed version of them, ahead of analysis. However, from what I've seen, PyCG not only does not leverage type annotations to enhance results, but it also ignores any clauses with type annotations. As an example, if you changed test_assignments/chained on line 7 to something like b: Callable = func1, the test will ignore the assignment and won't know anything about b even though it ran successfully without the type annotation.

To wrap things up, how difficult would it be to:

  1. Provide support for type annotations, including .pyi stubs, and leverage that to improve results
  2. Identify the call paths to a closed set of libraries (and methods of external classes).

My locally failing tests:
[Screenshot: Screen Shot 2021-07-29 at 20 35 32]

By the way, according to my understanding (and as described at https://steemit.com/software/@cpuu/the-difference-between-soundness-and-completeness), having false negatives corresponds to completeness problems. Am I right?

vitsalis commented on May 25, 2024

Regarding completeness and soundness: in general, completeness relates to false negatives, but as far as I understand, in the program analysis world it relates to false positives. I'll use the term false negatives to avoid any confusion.

Regarding type annotations, PyCG does not need any information about the types of variables since it infers any potential types during program execution. In this case, I'm not confident that the analysis of .pyi stubs will provide any additional benefit. However, the fact that PyCG ignores elements with a type annotation is probably due to the AST Visitor. Specifically, we need to define a method for each kind of node that is being visited. For example, for function definitions, we must define the method visit_FunctionDef. Probably, the AST Visitor pattern requires a different method for a typed assignment. Are there any other cases with type definitions that PyCG ignores? It would be very useful to have a list of those and find the relevant methods (I could not find any documentation of the available visitor methods, but there are some hacky ways of finding them).
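For reference, the standard library's `ast.NodeVisitor` dispatches to a method named `visit_<NodeClass>`, and an annotated assignment such as `b: Callable = func1` is parsed as an `ast.AnnAssign` node rather than `ast.Assign`. The sketch below shows that a visitor defining only `visit_Assign` never sees the annotated form; the class `AssignmentCollector` is a made-up example, not PyCG code.

```python
import ast

class AssignmentCollector(ast.NodeVisitor):
    """Collects the names bound by plain and annotated assignments."""

    def __init__(self):
        self.seen = []

    def visit_Assign(self, node):
        # Plain assignment: `a = func1` (may have several targets).
        for target in node.targets:
            if isinstance(target, ast.Name):
                self.seen.append(("Assign", target.id))
        self.generic_visit(node)

    def visit_AnnAssign(self, node):
        # Annotated assignment: `b: Callable = func1` (single target).
        # node.value is None for a bare declaration such as `x: int`.
        if isinstance(node.target, ast.Name) and node.value is not None:
            self.seen.append(("AnnAssign", node.target.id))
        self.generic_visit(node)

SOURCE = '''
from typing import Callable

def func1():
    pass

a = func1
b: Callable = func1
'''

collector = AssignmentCollector()
collector.visit(ast.parse(SOURCE))
print(collector.seen)
```

The node class names listed in the `ast` module documentation double as the suffixes of the visitor method names, which gives one way to enumerate the methods a visitor may need.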

Regarding external calls to a closed set of libraries, the most straightforward way of accomplishing this is by providing the source code for all packages to PyCG. Currently, PyCG is designed to work with one specific package, but I do not see any issues that can arise by extending this functionality to work for multiple packages.

I see two main engineering tasks:

  1. Retrieving the correct namespace of a module under analysis based on the current package under analysis.
  2. Making the import mechanism identify the correct namespace of an imported module.

For the first one, we retrieve the namespace of a module using the operation to_mod_name(os.path.relpath(module_path, package_path)). If we had the correct package path for the module we are currently analyzing, this would lead to the correct namespace. The second one is a bit trickier. One needs to identify whether the imported module resides in the current package or in an external package. This can be done through heuristics, but it needs some experimentation before we can settle on a concrete action plan.
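As a rough illustration of the first task, the sketch below derives a dotted namespace from a module path and a package root. The `to_mod_name` shown here is a hypothetical stand-in (PyCG's own helper may behave differently), and `module_namespace` is a made-up wrapper. Note that `os.path.relpath(path, start)` returns `path` relative to `start`, so the module path goes first.

```python
import os

def to_mod_name(relative_path: str) -> str:
    """Hypothetical helper: turn 'pkg/utils/helpers.py' into 'pkg.utils.helpers'."""
    without_ext, _ = os.path.splitext(relative_path)
    parts = without_ext.split(os.sep)
    # A package's __init__.py represents the package itself.
    if parts[-1] == "__init__":
        parts = parts[:-1]
    return ".".join(parts)

def module_namespace(module_path: str, package_path: str) -> str:
    """Namespace of a module, relative to the root of the package that owns it."""
    return to_mod_name(os.path.relpath(module_path, package_path))

# Paths are built with os.path.join so the example is platform independent.
package_root = os.path.join("project", "src")
module_file = os.path.join(package_root, "pkg", "utils", "helpers.py")

namespace = module_namespace(module_file, package_root)
print(namespace)
```

With several packages under analysis, the same computation would have to be run against the root of whichever package owns the module, which is exactly the information the first task needs to recover.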
