jameelnabbo / php-parsers

Parsing PHP source code using Python and generating ASTs

Home Page: https://bufferoverflows.net/

License: MIT License

Python 100.00%
sast yacc ply php appsec lexer parser php-sast php-parser

php-parsers's Introduction

AST Generator and Analyser written in Python

A small tool for building ASTs using Python and performing simple queries on them.


About The Project

A small Python Library for building ASTs from source code and performing simple queries on them.

Note: This project currently supports ASTs for PHP only and is no longer in development. A few known issues are listed under Known Issues. Contributions are welcome.

The project uses a PLY-based parser to build ASTs for PHP. An ANTLR-based compiler also exists, but it is incomplete and not fully compatible with the modules. Therefore, all the modules in the project use the PLY-based compiler.

The project is licensed under the MIT License.

Installation

The project requires Python 3.x.

Dependencies for the project are listed in requirements.txt. Install them with:

pip install -r requirements.txt

Getting Started

Building AST for a File

To build an AST for a file, create the following test.py file in the root directory:

from src.modules.php.syntax_tree import build_syntax_tree

tree = build_syntax_tree("path/to/php/file")

Run it with python -i test.py to inspect the resulting AST interactively. build_syntax_tree returns a SyntaxTree object.

Building Resource Tree for a Directory

A Resource Tree is a collection of ASTs for all the files in a project directory, along with some other information (e.g., function and method definitions).

To build a Resource Tree for a given directory, create a test2.py file in the root directory:

from src.modules.php.resource import build_resource_tree

r_tree = build_resource_tree("examples/php")

Then run it using python -i test2.py to inspect the results. build_resource_tree returns a ResourceTree object.

Using Traversers and Visitors

Analysing the built Abstract Syntax Trees requires you to follow the Visitor Pattern. You need to use a traverser that inherits from the built-in Traverser class and overrides its methods. The traverser can register one or more visitors that inherit from the built-in Visitor class.

Once the traverser runs, it should visit each node in the tree and dispatch all of the visitors on the visited node. The visitors can collect information or mutate the AST during the traversal.
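The dispatch described above can be sketched without the project's classes. The following is a minimal, self-contained illustration, assuming only that a visitor exposes a visit(node) method. ToyNode, ToyTraverser, NameCollector, and NameUppercaser are hypothetical stand-ins, not the library's Traverser or Visitor, whose internals may differ.

```python
from collections import deque


class ToyNode:
    """Hypothetical stand-in for an AST node (not the project's node class)."""

    def __init__(self, name, children=None):
        self.name = name
        self.children = children or []


class ToyTraverser:
    """Walks the tree breadth-first and runs every registered visitor on each node."""

    def __init__(self, root):
        self.root = root
        self.visitors = []

    def register_visitor(self, visitor):
        self.visitors.append(visitor)

    def traverse(self):
        queue = deque([self.root])
        while queue:
            node = queue.popleft()
            for visitor in self.visitors:
                visitor.visit(node)
            queue.extend(node.children)


class NameCollector:
    """Collects information: records node names in visiting order."""

    def __init__(self):
        self.names = []

    def visit(self, node):
        self.names.append(node.name)


class NameUppercaser:
    """Mutates the tree: rewrites each node's name in place."""

    def visit(self, node):
        node.name = node.name.upper()


root = ToyNode("program", [ToyNode("echo"), ToyNode("assign", [ToyNode("var")])])
traverser = ToyTraverser(root)
collector = NameCollector()
traverser.register_visitor(collector)         # runs first on each node
traverser.register_visitor(NameUppercaser())  # runs second on each node
traverser.traverse()
```

Because visitors run in registration order on each node, the collector in this sketch sees the original names before the second visitor rewrites them.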

Have a look at the predefined Visitors and Traversers to get an idea of how it works.

Here is a minimal example that uses the built-in BFTraverser and a custom visitor to print the types of all the nodes present in the AST:

from src.modules.php.traversers.bf import BFTraverser
from src.modules.php.base import Visitor 
from src.modules.php.syntax_tree import build_syntax_tree

class CustomVisitor(Visitor):
    def visit(self, node):
        print(type(node))

s_tree = build_syntax_tree("/path/to/file")
traverser = BFTraverser(s_tree) 
printer_visitor = CustomVisitor()
traverser.register_visitor(printer_visitor)

traverser.traverse()
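If you prefer a separate handler per node type, similar to the visit_<ClassName> methods of Python's ast.NodeVisitor, one option is to dispatch on the class name inside visit. The sketch below is an assumption about how this could be layered on a visitor with a single visit(node) method. DispatchingVisitor, generic_visit, and the toy node classes are hypothetical, and Visitor here is a local stand-in rather than src.modules.php.base.Visitor.

```python
class Visitor:
    """Local stand-in for the project's Visitor base (assumed interface: visit(node))."""

    def visit(self, node):
        raise NotImplementedError


class DispatchingVisitor(Visitor):
    """Routes each node to visit_<ClassName> if defined, else to generic_visit."""

    def visit(self, node):
        handler = getattr(self, "visit_" + type(node).__name__, self.generic_visit)
        return handler(node)

    def generic_visit(self, node):
        # Fallback for node types without a dedicated handler: do nothing.
        pass


# Toy node classes, named after PHP constructs purely for illustration.
class FunctionCall:
    def __init__(self, name):
        self.name = name


class Variable:
    def __init__(self, name):
        self.name = name


class CallLogger(DispatchingVisitor):
    """Records the names of function calls and ignores every other node type."""

    def __init__(self):
        self.calls = []

    def visit_FunctionCall(self, node):
        self.calls.append(node.name)


logger = CallLogger()
for node in [FunctionCall("strlen"), Variable("$a"), FunctionCall("phpinfo")]:
    logger.visit(node)  # a traverser would normally drive these calls
```

With the real library, the same visit body could probably go into a class that inherits from the project's Visitor, provided the handler suffixes match the node class names.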

Querying for Particular Nodes

To search for and collect nodes that meet a particular criterion, you can use the pre-defined NodeFinder visitor. It takes a boolean-valued callback function and searches for nodes that meet that callback.

For example, to search for all function calls without parameters, you could use the following:

from src.modules.php.syntax_tree import build_syntax_tree
from src.modules.php.visitors.finders import NodeFinder
from src.modules.php.traversers.bf import BFTraverser
from src.compiler.php.phpast import FunctionCall

def function_has_no_params(node):
    return isinstance(node, FunctionCall) and len(node.params) == 0

s_tree = build_syntax_tree("/path/to/php/file")
bft = BFTraverser(s_tree)
node_finder = NodeFinder(function_has_no_params)
bft.register_visitor(node_finder) 

bft.traverse()
print(node_finder.found)
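Conceptually, a finder of this kind is a visitor that stores a predicate and keeps every node for which it returns a truthy value. The following self-contained sketch shows the idea. ToyFinder and ToyCall are hypothetical, and only the found attribute name is taken from the example above. The actual NodeFinder may be implemented differently.

```python
class ToyCall:
    """Hypothetical node with a name and a list of parameters."""

    def __init__(self, name, params):
        self.name = name
        self.params = params


class ToyFinder:
    """Keeps each visited node for which the callback returns a truthy value."""

    def __init__(self, callback):
        self.callback = callback
        self.found = []

    def visit(self, node):
        if self.callback(node):
            self.found.append(node)


def has_no_params(node):
    return isinstance(node, ToyCall) and len(node.params) == 0


nodes = [ToyCall("phpinfo", []), ToyCall("strlen", ["$s"]), ToyCall("time", [])]
finder = ToyFinder(has_no_params)
for node in nodes:  # a traverser would normally drive these calls
    finder.visit(node)

found_names = [node.name for node in finder.found]
```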

Predefined Traversers and Visitors

Traversers:

  • BFTraverser: Carries the visitors in a Breadth-first manner
  • DFTraverser: Carries the visitors in a Depth-first manner
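The difference between the two is the order in which nodes reach the visitors. Here is a minimal sketch on a toy tree of nested tuples. The helper names bf_order and df_order are hypothetical and do not come from the library, and the depth-first variant shown is pre-order, which is an assumption about how DFTraverser behaves.

```python
from collections import deque

# Each node is (label, [children]); a toy stand-in for an AST.
tree = ("A", [("B", [("D", [])]), ("C", [("E", [])])])


def bf_order(root):
    """Return labels level by level, using a FIFO queue."""
    order, queue = [], deque([root])
    while queue:
        label, children = queue.popleft()
        order.append(label)
        queue.extend(children)
    return order


def df_order(root):
    """Return labels in pre-order, using a LIFO stack."""
    order, stack = [], [root]
    while stack:
        label, children = stack.pop()
        order.append(label)
        stack.extend(reversed(children))  # keep left-to-right child order
    return order


print(bf_order(tree))  # ['A', 'B', 'C', 'D', 'E']
print(df_order(tree))  # ['A', 'B', 'D', 'C', 'E']
```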

Visitors:

  • Printer: Prints the nodes of the AST
  • GraphBuilder: Builds a graph using the AST
  • NameFinder: Searches for specified names (Function Definitions/Variables) and collects those nodes
  • NameHighlighter: Searches and highlights specified names using the AST and a GraphBuilder instance
  • NodeFinder: Takes a boolean-valued callback function and collects all the nodes that satisfy that callback filter
  • DependencyResolver: Searches for all the Include/Require nodes and tries to expand them by connecting the SyntaxTree for the imported file

Resource Tree specific Visitors:

These work like the visitors above, except that they require a ResourceTree instance at initialization.

  • TablesBuilder: Searches for all function/method definitions while walking a file and updates the function_table and method_table in the corresponding ResourceTree
  • ResourceCallsFinder: Searches for all the Function Calls and Method Calls and associates them with the definitions in function_table and method_table of the corresponding ResourceTree.

Known Issues

  • Poor Performance, especially in the ANTLR-based parser
  • The PLY-based parser does not interpret some constructs properly. For example,
    • Use declarations of the format use (const|function) (namespace)
    • Arrow Functions
    • Nested Variable Names or Accessors like $a -> {$b -> c}
  • The ANTLR-based parser has poor performance and is tens of times slower than the PLY-based parser. The AST Generation is also not fully complete

php-parsers's Issues

More examples of how to use the CustomVisitor class

Hello! I am very interested in your project.

I have a question about the example code.

In the code below, I created a CustomVisitor. Is visit the only function I can use?

Or is there a way to have a function called for each kind of node, with names like visit_Num as in Python's ast module?

from src.modules.php.traversers.bf import BFTraverser
from src.modules.php.base import Visitor
from src.modules.php.syntax_tree import build_syntax_tree

class CustomVisitor(Visitor):
    def visit(self, node):
        print(type(node))

s_tree = build_syntax_tree("/path/to/file")
traverser = BFTraverser(s_tree)
printer_visitor = CustomVisitor()
traverser.register_visitor(printer_visitor)

traverser.traverse()

I would like to know how to inherit from the Visitor class and what other functions a CustomVisitor class can use.
