Giter Site home page Giter Site logo

Introduce named children about php-ast HOT 21 CLOSED

nikic avatar nikic commented on August 19, 2024
Introduce named children

from php-ast.

Comments (21)

nikic avatar nikic commented on August 19, 2024

While we internally have an AST pretty printer, it's not so easy to expose it, because it works on our raw internal representation. So we'd have to import the userland AST representation back into the internal one. And that's rather problematic, because we'd have to ensure that AST nodes are only nested correctly (there are certain assumptions about what can be used where, otherwise you'll get segfaults).

from php-ast.

malteskoruppa avatar malteskoruppa commented on August 19, 2024

If I understand correctly, at some lower layer, PHP has an internal raw representation of an AST, as well as a pretty printer that works on this representation. At some higher layer (the php-ast extension), this internal AST is converted to a "userland" AST and exposed via the ast\* API. Thus, if a userland AST created/manipulated in userland is to be pretty printed, the php-ast extension cannot use the internal pretty printer on the lower layer, because this pretty printer does not know anything about this userland AST. Is that correct? Please correct me if I'm wrong. :-)

Then, the following questions come to my mind:

  1. Why are there different representations of the AST in the first place? At first glance this seems slightly unfortunate, because it creates problems such as this. Could not the internal AST representation be exported, instead of some converted userland AST? This would yield the possibility to parse, manipulate and execute PHP code directly in userland for free. :)
  2. Is it not possible to convert the userland AST back into the internal representation? Such a "reverse conversion" could be implemented at the higher level of the php-ast extension, and then the pretty printer working on the lower level could be passed the converted internal representation, so the pretty printer would not need to know anything about the userland AST.
  3. Alternatively, could not another pretty printer be implemented at the level of the php-ast extension, and this one exposed to userland? Of course it is kind of redundant and requires maintaining two separate pretty printers, but, since there are two different AST representations, maybe this cannot be helped. In fact, this pretty printer could even be implemented in userland itself, e.g., in the util.php bundled with php-ast. It is probably some work, but the result would be an extremely powerful tool for PHP programmers. :)

Forgive me if these questions are not too naive. I'm fairly new to PHP, and only seek to learn. :)

from php-ast.

nikic avatar nikic commented on August 19, 2024
  1. The internal representation is an arena-allocated tree of C structures, using C pointers between them. This is not something that can be directly used from PHP code. The userland representation is simply a tree of normal PHP objects. Conversely, the userland representation cannot be used internally because it's uses way too much memory, is slow and is hard to use from internal code.

    It is possible to keep using the internal structure and expose a userland "view" onto it, which will lazily create objects for individual nodes. This is what https://github.com/sgolemon/astkit does. This does allow reusing the internal pretty printer, however such an AST view would be read-only without some significant effort (see 2) and I just generally don't like this model (because it's magic and it holds on to more memory than it has to).

  2. It's possible to do this conversion, and the conversion itself is very simple. The problem is that our internal code makes a lot of assumptions about how a proper AST should look like (with these expectations matching what our parser spits out). E.g. code will assume that a statement will not be used as an expression operand, etc. There are many such rules (some simpler, some more complicated), so this would require some non-trivial validation. It's certainly doable (and will have to be done at some point if we want to allow implementation of compiler hooks from userland), it just takes some work :)

  3. Just implementing a pretty printer in userland is absolutely possible, that's what I'd recommend to do :) Will be more robust and easier to fix as well...

from php-ast.

malteskoruppa avatar malteskoruppa commented on August 19, 2024

I see. Thank you for your detailed answers.

I hope that (2) will be implemented someday, that would be awesome. In the meantime, (3) is probably not too hard to do. If I find the time, I may try myself at it. ;)

On a related side question, where would I find more information about the structure of the ASTs as represented by the php-ast extension?

  • For example, I found that each node of kind AST_ARRAY_ELEM has two children which are plain types:

    1. The first is the array element's value;
    2. The second is the array element's key. If no key is defined, this is simply null.

    I only found out what the second child was using try&error.

  • As a second example, each node of kind AST_CLASS has four children. I found that typically, the third child is an AST_STMT_LIST containing the statements within the class, while the others are null; after some guessing around, I found that

    1. The first is either null or an AST_NAME representing which class the current class extends;
    2. The second is either null or an AST_NAME_LIST representing which classes the current class implements;
    3. The third is an AST_STMT_LIST as mentioned above;
    4. I did not find what the fourth child is for; it was null in all my tries.

So if I wanted, say, to find out the meaning of the fourth child of an AST_CLASS, where would I look, other than wildly guessing and using try&error? :)

from php-ast.

nikic avatar nikic commented on August 19, 2024

There is currently no good way to find out what things mean apart from trial and error, or alternatively looking at http://lxr.php.net/xref/PHP_TRUNK/Zend/zend_language_parser.y.

For the future we should either document the meaning of the offsets or generate string keys for child names (for non-list nodes). The latter would probably be preferable...

The last child of class nodes is always NULL, it's just an artifact of decl-nodes being fixed-size. I've suppressed it in e52afee.

from php-ast.

nikic avatar nikic commented on August 19, 2024

Here's some docs to start:

$funcNames = ['params', 'uses', 'stmts', 'returnType'];

$names = [
    /* special nodes */
    'AST_NAME' => ['name'],
    'AST_CLOSURE_VAR' => ['name'],

    /* declaration nodes */
    'ZEND_AST_FUNC_DECL' => $funcNames,
    'ZEND_AST_CLOSURE' => $funcNames,
    'ZEND_AST_METHOD' => $funcNames,
    'ZEND_AST_CLASS' => ['extends', 'implements', 'stmts'],

    /* 0 child nodes */
    'ZEND_AST_MAGIC_CONST' => [],
    'ZEND_AST_TYPE' => [],

    /* 1 child node */
    'ZEND_AST_VAR' => ['name'],
    'ZEND_AST_CONST' => ['name'],
    'ZEND_AST_UNPACK' => ['expr'],
    'ZEND_AST_UNARY_PLUS' => ['expr'],
    'ZEND_AST_UNARY_MINUS' => ['expr'],
    'ZEND_AST_CAST' => ['expr'],
    'ZEND_AST_EMPTY' => ['expr'],
    'ZEND_AST_ISSET' => ['var'],
    'ZEND_AST_SILENCE' => ['expr'],
    'ZEND_AST_SHELL_EXEC' => ['expr'],
    'ZEND_AST_CLONE' => ['expr'],
    'ZEND_AST_EXIT' => ['expr'],
    'ZEND_AST_PRINT' => ['expr'],
    'ZEND_AST_INCLUDE_OR_EVAL' => ['expr'],
    'ZEND_AST_UNARY_OP' => ['expr'],
    'ZEND_AST_PRE_INC' => ['var'],
    'ZEND_AST_PRE_DEC' => ['var'],
    'ZEND_AST_POST_INC' => ['var'],
    'ZEND_AST_POST_DEC' => ['var'],
    'ZEND_AST_YIELD_FROM' => ['expr'],

    'ZEND_AST_GLOBAL' => ['var'],
    'ZEND_AST_UNSET' => ['var'],
    'ZEND_AST_RETURN' => ['expr'],
    'ZEND_AST_LABEL' => ['name'],
    'ZEND_AST_REF' => ['var'],
    'ZEND_AST_HALT_COMPILER' => ['offset'],
    'ZEND_AST_ECHO' => ['expr'],
    'ZEND_AST_THROW' => ['expr'],
    'ZEND_AST_GOTO' => ['label'],
    'ZEND_AST_BREAK' => ['depth'],
    'ZEND_AST_CONTINUE' => ['depth'],

    /* 2 child nodes */
    'ZEND_AST_DIM' => ['expr', 'dim'],
    'ZEND_AST_PROP' => ['expr', 'prop'],
    'ZEND_AST_STATIC_PROP' => ['class', 'prop'],
    'ZEND_AST_CALL' => ['expr', 'args'],
    'ZEND_AST_CLASS_CONST' => ['class', 'const'],
    'ZEND_AST_ASSIGN' => ['var', 'expr'],
    'ZEND_AST_ASSIGN_REF' => ['var', 'expr'],
    'ZEND_AST_ASSIGN_OP' => ['var', 'expr'],
    'ZEND_AST_BINARY_OP' => ['left', 'right'],
    'ZEND_AST_GREATER' => ['left', 'right'],
    'ZEND_AST_GREATER_EQUAL' => ['left', 'right'],
    'ZEND_AST_AND' => ['left', 'right'],
    'ZEND_AST_OR' => ['left', 'right'],
    'ZEND_AST_ARRAY_ELEM' => ['value', 'key'],
    'ZEND_AST_NEW' => ['class', 'args'],
    'ZEND_AST_INSTANCEOF' => ['expr', 'class'],
    'ZEND_AST_YIELD' => ['value', 'key'],
    'ZEND_AST_COALESCE' => ['left', 'right'],

    'ZEND_AST_STATIC' => ['varName', 'default'],
    'ZEND_AST_WHILE' => ['cond', 'stmts'],
    'ZEND_AST_DO_WHILE' => ['stmts', 'cond'],
    'ZEND_AST_IF_ELEM' => ['cond', 'stmts'],
    'ZEND_AST_SWITCH' => ['cond', 'stmts'],
    'ZEND_AST_SWITCH_CASE' => ['cond', 'stmts'],
    'ZEND_AST_DECLARE' => ['declares', 'stmts'],
    'ZEND_AST_PROP_ELEM' => ['name', 'default'],
    'ZEND_AST_CONST_ELEM' => ['name', 'value'],
    'ZEND_AST_USE_TRAIT' => ['traits', 'adaptations'],
    'ZEND_AST_TRAIT_PRECEDENCE' => ['method', 'insteadof'],
    'ZEND_AST_METHOD_REFERENCE' => ['class', 'method'],
    'ZEND_AST_NAMESPACE' => ['name', 'stmts'],
    'ZEND_AST_USE_ELEM' => ['name', 'alias'],
    'ZEND_AST_TRAIT_ALIAS' => ['method', 'alias'],
    'ZEND_AST_GROUP_USE' => ['prefix', 'uses'],

    /* 3 child nodes */
    'ZEND_AST_METHOD_CALL' => ['expr', 'method', 'args'],
    'ZEND_AST_STATIC_CALL' => ['class', 'method', 'args'],
    'ZEND_AST_CONDITIONAL' => ['cond', 'trueExpr', 'falseExpr'],

    'ZEND_AST_TRY' => ['tryStmts', 'catches', 'finallyStmts'],
    'ZEND_AST_CATCH' => ['exception', 'varName', 'stmts'],
    'ZEND_AST_PARAM' => ['type', 'name', 'default'],

    /* 4 child nodes */
    'ZEND_AST_FOR' => ['init', 'cond', 'loop', 'stmts'],
    'ZEND_AST_FOREACH' => ['expr', 'valueVar', 'keyVar', 'stmts'],
];

from php-ast.

malteskoruppa avatar malteskoruppa commented on August 19, 2024

Thank you, that is exactly what I needed!

Here's two more very small questions in the same vein:

  1. Can any decl node except for an AST_CLOSURE ever use the use construct (i.e., import variables into its scope)? Your above documentation would suggest that yes, since uses is also defined both for functions as well as for methods, but I haven't been able to produce an example where the second child of a function or a method would be anything else than null.

  2. Decl nodes define the extra fields $endLineno, $name and $docComment. I was curious whether a closure can actually have a doc comment, and I found that yes indeed,

    $closure = /** blah */ function() {};

    gives

    AST_STMT_LIST
       0: AST_ASSIGN
           0: AST_VAR
               0: "closure"
           1: AST_CLOSURE
               flags: 0
               name: {closure}
               docComment: /** blah */
               0: AST_PARAM_LIST
               1: null
               2: AST_STMT_LIST
               3: null
    

    Is this "inline docblock" the normal way to define doc comments for closures? I haven't really found any official documentation on this subject, except for a lengthy discussion at phpDocumentor/phpDocumentor#830, which however discusses other possible formats.

from php-ast.

nikic avatar nikic commented on August 19, 2024
  1. Only closures can have uses, but the child is included for all "function-ish" types for structural compatibility. You likely want to handle functions, methods and closures using the same code and it would be rather inconvenient if some children would only be available conditionally.

  2. Probably the doc comment would go above the statement (otherwise it looks really weird):

    /** blah */
    $closure = function() {};

    But I have never seen doc blocks on closures in practice. There's not much point as they are generally not part of the API. But technically nothing wrong with it.

from php-ast.

malteskoruppa avatar malteskoruppa commented on August 19, 2024

Concerning (2), I agree that the inline doccomment looks really weird, but I (wrongly) assumed that it had to be right before the closure declaration instead of before the assignment for the parser to make the connection. Thanks :) So, in general, a pretty printer should probably not render the doc comment right before the closure declaration, but instead go back to the closest parent that is a child of an AST_STMT_LIST and render it before that statement...

For (1): What I would personally find much more beautiful was if the order of the children of all "function-ish" types were switched, with the uses child being the last:

$funcNames = ['params', 'stmts', 'returnType', 'uses'];

...

'ZEND_AST_CLOSURE' => $funcNames,
'ZEND_AST_FUNC_DECL' => $funcNames,
'ZEND_AST_METHOD' => $funcNames,

Then, the last child could simply be suppressed for functions and methods, just as you did for class declarations:

'ZEND_AST_CLOSURE' => ['params', 'stmts', 'returnType', 'uses'],
'ZEND_AST_FUNC_DECL' => ['params', 'stmts', 'returnType'],
'ZEND_AST_METHOD' => ['params', 'stmts', 'returnType'],

Thus, the children match up perfectly, and code for handling closures, functions and methods at once is straightforward to write, without having to use any conditionals concerning the type of the node. Of course, such code would still have to check whether the last child is set, but this is no different from the current paradigm, since such code currently also needs to check whether the uses child is not null, so we don't lose anything. What's really nice about this is that no "unnecessary" child nodes which are predictably always null are created which clutter the AST and unnecessarily consume memory.

Of course, this may be a matter of taste, and switching the order would probably require some changes to the underlying Zend engine, and it might possibly even conflict with some other design conventions that are in place... I'd very much like to hear your opinion. :)

from php-ast.

nikic avatar nikic commented on August 19, 2024

I agree that moving uses to the last child and omitting it for non-closures would be better. Changing it will break downstream code though. What I plan to do is introduce versioning for the output format and then switch to using named children for the next version - the uses can be dropped at the same time :)

from php-ast.

malteskoruppa avatar malteskoruppa commented on August 19, 2024

Cool, that sounds great! :)

Actually I didn't worry too much about downstream code... your README does say very clearly that

This extension is experimental and the representation of the syntax tree is not final.

I, for one, am prepared to cope with changes to the AST representation in my downstream code. ;)

from php-ast.

malteskoruppa avatar malteskoruppa commented on August 19, 2024

Btw, contrarily to what I said before, I would make returnType the second child and stmts the third:

'ZEND_AST_CLOSURE' => ['params', 'returnType', 'stmts', 'uses'],
'ZEND_AST_FUNC_DECL' => ['params', 'returnType', 'stmts'],
'ZEND_AST_METHOD' => ['params', 'returnType', 'stmts'],
'ZEND_AST_CLASS' => ['extends', 'implements', 'stmts'],

This way, for all decl nodes, stmts is always the third child. :)

from php-ast.

nikic avatar nikic commented on August 19, 2024

I've now added versioning support and did some first tweaks. Next up is switching to named children, that will require some more work.

from php-ast.

nikic avatar nikic commented on August 19, 2024

This will also require some more design work to choose good names. I don't think the list posted above is ideal. There's two things to look out for: Predictable names (so you can make a reasonable guess about how something is called without looking at a dump) and consistent names (similar nodes should have similar names).

I suspect that the fewer unique names the better and it's preferable to use a common generic name than a rare specific name.

Current list:

    X(name) \
    X(params) \
    X(uses) \
    X(stmts) \
    X(returnType) \
    X(extends) \
    X(implements) \
    X(expr) \
    X(var) \
    X(offset) \
    X(label) \
    X(depth) \
    X(dim) \
    X(prop) \
    X(class) \
    X(args) \
    X(const) \
    X(left) \
    X(right) \
    X(value) \
    X(key) \
    X(varName) \
    X(default) \
    X(cond) \
    X(declares) \
    X(traits) \
    X(adaptations) \
    X(method) \
    X(insteadof) \
    X(alias) \
    X(prefix) \
    X(trueExpr) \
    X(falseExpr) \
    X(tryStmts) \
    X(catches) \
    X(finallyStmts) \
    X(exception) \
    X(type) \
    X(init) \
    X(loop) \
    X(valueVar) \
    X(keyVar) \

from php-ast.

nikic avatar nikic commented on August 19, 2024

The implementation for string child names is up in a separate branch: https://github.com/nikic/php-ast/tree/stringChildren

from php-ast.

nikic avatar nikic commented on August 19, 2024

Looks like @tpunt has implemented a pretty printer: https://github.com/tpunt/php-ast-reverter

from php-ast.

malteskoruppa avatar malteskoruppa commented on August 19, 2024

Sorry for not answering in a long time, I was quite busy.

I've had a look at the named children version. I do agree that names should be predictable and consistent, though I think your above list is, if not ideal, at least a good start. :) As far as I can see, your stringChildren branch uses precisely those names which you documented above.

Here's a few thoughts / comments / questions:

  1. First off, I'm slightly confused as to what's what. There's a master branch, and a stringChildren branch:

    • In the master branch, it's now possible to pass an optional $version argument (either 10 or 20) when parsing code, defaulting to 10. As far as I can see, there are no named children in either of these versions. The differences I noticed between these two versions concern a different representation of some nodes such as unary operators (minus, plus, silence), binary operators (and, or, greater, greater or equal) and assignments (which now use the same BINARY_* flags as the binary operators, while the old ASSIGN_* flags have been made obsolete). In essence, some node types and flags have been merged, along the philosophy of "the fewer unique names the better" (aka the KISS principle), with which I concur.
    • As far as the stringChildren branch is concerned, it's also possible to pass 10 or 20 as version argument, and the default is also 10. Version 10 here appears to be perfectly identical to version 10 in the master branch -- no differences here. (In particular, still no named children.) Now, passing 20 as version argument finally yields some named children! :) As far as the representation of ASTs in this branch in version 20 is concerned, it seems to be some way "in-between" versions 10 and 20 on the master branch. For instance, some binary operators (greater, greater or equal) are as in version 20 on the master branch, while others (and, or) are still like in version 10 on the master branch.

    Is that correct?

  2. Concerning predictability of children names: I'm not very much thrilled about compound names, e.g., keyVar, valueVar, trueExpr, falseExpr, varName, tryStmts, finallyStmts, or returnType. I think I might prefer key, value, true, false, name, try, finally and return, or similar: After all, we are only speaking about children names, and IMO predictability outweighs accuracy. ;) For example when guessing children names of a try statement, I feel I might try try, but probably I wouldn't come up with tryStmts. YMMV.

  3. I noticed some node types are neither documented in your above list, nor yield named children on the stringChildren branch:

    • ZEND_AST{,_ARG,_ENCAPS,_EXPR,_STMT,_SWITCH,_CATCH,_PARAM,_NAME}_LIST
    • ZEND_AST_ARRAY
    • ZEND_AST_IF
    • ZEND_AST_CLOSURE_USES
    • ZEND_AST_PROP_DECL
    • ZEND_AST_CONST_DECL
    • ZEND_AST_CLASS_CONST_DECL
    • ZEND_AST_TRAIT_ADAPTATIONS
    • ZEND_AST_USE

    I then realized that these are exactly those nodes which have a variable (strictly positive) number of children, and are excellently named by using increasing integers, as is already the case. ;) I just wanted to throw in the idea of consistently postfixing them with _LIST, so that it is perfectly clear that they can have more than one (and arbitrarily many) children. I have to admit that AST_ARRAY_LIST looks slightly ugly, but on the good side it would introduce a certain consistency. A PHP user would then always either have a node with named children, or a node named *_LIST with numbered children. Really intuitive.

  4. Considering call expressions,

    'ZEND_AST_CALL' => ['expr', 'args'],
    'ZEND_AST_METHOD_CALL' => ['expr', 'method', 'args'],
    'ZEND_AST_STATIC_CALL' => ['class', 'method', 'args'],
    

    I feel this is slightly inconsistent. In particular, the name of the called function hides under expr for function calls, while it hides under method for non-static method and static method calls (even though there is also an expr child for non-static method calls). It is clear that a function is not strictly speaking the same thing as a method, but for the sake of simplicity and consistency, what about calling them all callee?

    'ZEND_AST_CALL' => ['callee', 'args'],
    'ZEND_AST_METHOD_CALL' => ['expr', 'callee', 'args'],
    'ZEND_AST_STATIC_CALL' => ['class', 'callee', 'args'],
    
  5. I have been wondering for some time now: what is the NAME_RELATIVE flag for AST_NAME nodes? I have not seen it so far, and I'm unsure what it is meant for. I have also looked at @tpunt's pretty printer implementation, but it is silently ignored there also (hinting that Thomas didn't know either). I originally thought that a name foo might have the flag NAME_NOT_FQ, foo\bar might have NAME_RELATIVE and \foo\bar might have NAME_FQ. However, it turns out that both foo and foo\bar have NAME_NOT_FQ and that \foo\bar has NAME_FQ. In passing, what does FQ stand for? I'm at a loss here. :)

To conclude, it is great news that you introduced named children! I cannot use this in my own code so far since I do not want to depend on a version of php-ast maintained on some internal development branch, but I'm very much looking forward for this to make its way to the master branch. :) What is your plan? Will these changes be part of version 20? Or will the named children come in some future version? More generally, what are your requirements for marking version 20 as "stable"?

Lastly, I'm happy someone did write a pretty printer. I had too many other things in my own oven... does this close this thread? Though I have to admit, I do appreciate this place for discussing some internals concerning the AST representation. ;) Maybe the issue should be renamed into "Introduce named children." ;)

from php-ast.

nikic avatar nikic commented on August 19, 2024

Answering selectively:

To 1: The master/stringChildren branches weren't in sync. I've now merged master into stringChildren, so the behavior of version 10 and 20 should match. I've also switched to using version 30 for the named children, as it's unrelated to the other version 20 changes. I'll mark 20 as stable soon and I'll also make the version argument non-optional (the default version can't really be bumped without breaking BC...)

To 2: For varName, I could replace what are currently simple strings with real AST_VARIABLE nodes and call it var afterwards. Probably better for consistency...

As to the rest, not sure... Using value and key instead of valueVar and keyVar sounds reasonable, but e.g. return instead of returnType imho makes it harder to understand what this is about.

To 5: NAME_RELATIVE denotes names of type namespace\Foo, which are relative to the current namespace. I've never seen anyone actually use these in practice... FQ stands for "fully qualified", which in this context means that the name has a "" prefix.

from php-ast.

nikic avatar nikic commented on August 19, 2024

For 2, I did the varName change described above: 5694d06. Also flattening statement lists now (3d35346), after noticing that the try/catch dump was looking odd...

Also the named children are now merged into master so I don't have to deal with diverging code.

from php-ast.

nikic avatar nikic commented on August 19, 2024

Changed names keyVar to key, valueVar to var and exception to class (c1c16d4). Also made the $version argument required (ac969d7).

from php-ast.

nikic avatar nikic commented on August 19, 2024

Switched name nodes to being the current version (30) and also added a node list to the README: https://github.com/nikic/php-ast#ast-node-kinds

I think with this I'll consider this FR as "implemented" :)

from php-ast.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.