Comments (8)
Isn't this the same issue as #26?
from php-parser.
Heh, this issue seems to crop up in every project trying to serialize to XML. Seen it a few times in various unit testing frameworks, but didn't realize that it applies here too.
Just disabling parsing of escape sequences won't really help here, as one could still have issues with malformed UTF-8 in strings (or not UTF-8 at all). PHP's strings are raw binary data after all.
The only real way to solve this is what you did. Or rather, one could selectively encode only those strings that contain invalid UTF-8.
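A minimal sketch of the selective-encoding idea, written in Python rather than PHP for brevity. The function name `encode_for_xml` and the `"plain"`/`"base64"` labels are assumptions made for illustration and are not part of php-parser. It treats the string as raw bytes and falls back to base64 only when the bytes are not valid UTF-8 or contain a character that XML 1.0 does not allow (such as the result of `\x1b`):

```python
import base64
import re

# Matches any character outside the XML 1.0 "Char" production.
_XML_ILLEGAL = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def encode_for_xml(raw: bytes) -> tuple[str, str]:
    """Return (encoding, text), where encoding is "plain" or "base64"."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        text = None  # not UTF-8 at all, or malformed UTF-8

    if text is None or _XML_ILLEGAL.search(text):
        # Only problematic strings pay the cost of base64.
        return "base64", base64.b64encode(raw).decode("ascii")
    return "plain", text
```

A serializer could then write the label as an attribute next to the value. Text returned as "plain" would still need the usual escaping of `&` and `<` before being written into XML.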
By the way, what are you using the XML serialization for?
I don't think the problem occurs only when serializing to XML; XML is just the only processor that actually complains. If you try to save the source back to a new (modified) PHP file, you'll end up with the same issue: the original \xxx component is lost, making the result unreadable at best.
Regarding the UTF-8 issue you pointed out: I'm 'iconv'ing the PHP source file before parsing to avoid that issue, and so far I haven't had any problems.
The XML serialization is used in phpDox ( https://github.com/theseer/phpDox ). For now, I've adopted your suggestion for issue #26.
@theseer Do you think it would make sense to move the static Scalar_*::parse() methods into the parser (as normal methods), so they can be overridden by extending it?
I'm not sure how that would fix the problem as it would defer the solution to a client implementation?
From a user perspective, I'd expect the Parser not to "interfere" by modifying or translating values when parsing.
A serializer may add, depending on the output format, whatever is needed to escape otherwise invalid chars, or translate the value in whatever way makes sense, e.g. interpret \x?? as binary.
Do you really have to (by default!) translate the \x?? values from a string into their binary representation at parse time? How would you set an \x?? value at runtime and expect the same output as the source had before parsing?
I guess the only thing that really makes sense is to keep the "raw" value and translate it on demand.
@theseer The parser provides an abstract syntax tree, meaning that a lot of information is (intentionally) discarded, retaining only the parts that are relevant to the program's interpretation. String formatting is one of the things that are discarded. From PHP's point of view it makes no difference whether a string is "Hello, world!" or "\x48\x65\x6C\x6C\x6F\x2C\x20\x77\x6F\x72\x6C\x64\x21". Interpreting the literal values allows working with them directly, e.g. using them as lookups, comparing them, etc. This is not possible with encoded values, because the same literal can have multiple representations (the simplest example is single vs. double quotes).
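The point about multiple representations can be illustrated with a small Python analogue. This is an assumption-laden stand-in, not PHP: Python's `\xHH` escapes happen to behave like those in PHP's double-quoted strings for this example. The hex-escaped literal decodes to `Hello, world!` with a lowercase `w`, since `\x77` is `w`:

```python
# Three spellings of the same string value.
escaped = "\x48\x65\x6C\x6C\x6F\x2C\x20\x77\x6F\x72\x6C\x64\x21"
double_quoted = "Hello, world!"
single_quoted = 'Hello, world!'

# Once interpreted, they are indistinguishable, so comparisons and
# lookups work no matter how the source spelled the literal.
same_value = escaped == double_quoted == single_quoted

lookup = {double_quoted: "greeting"}
found = lookup[escaped]
```

Comparing the uninterpreted source text instead would treat these three spellings as three different keys.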
I see that this behavior is not appropriate for some use cases; those use cases simply weren't the ones I originally had in mind. My main motivation was a) static analysis and b) automated code changes where nobody ever has to read the generated code.
But in any case, ways to fully retain the file formatting are being discussed in issue #41, so this might soon be possible. Though it probably doesn't really apply to this particular problem, because here the solution is rather simple anyway :)
I do see your problem and your point. But considering your very example about automagic rewriting of existing source code, I - as a user - would expect it to NOT modify my string definition when writing it back as source.
But at least for me, the workaround of storing the raw version as an additional attribute works fine.
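The raw-attribute workaround could look roughly like the following Python sketch. Everything here is hypothetical and only models the idea: `StringNode`, the `"raw"` attribute key, `interpret`, and `print_string` are illustrative names, not php-parser's API, and only `\xHH` escapes are interpreted (other escapes, including an escaped backslash before `x`, are ignored for brevity):

```python
import re
from dataclasses import dataclass, field

_HEX_ESCAPE = re.compile(rb"\\x([0-9A-Fa-f]{1,2})")


@dataclass
class StringNode:
    value: bytes                      # interpreted value, as raw bytes
    attributes: dict = field(default_factory=dict)


def interpret(raw: str) -> bytes:
    """Interpret a double-quoted literal (quotes included); \\xHH only."""
    body = raw[1:-1].encode("utf-8")
    return _HEX_ESCAPE.sub(lambda m: bytes([int(m.group(1), 16)]), body)


def parse_string(raw: str) -> StringNode:
    # Keep the source spelling next to the interpreted value.
    return StringNode(interpret(raw), {"raw": raw})


def print_string(node: StringNode) -> str:
    raw = node.attributes.get("raw")
    # Reuse the original spelling while the value is unchanged.
    if raw is not None and interpret(raw) == node.value:
        return raw
    # Otherwise re-escape: printable ASCII stays, the rest becomes \xHH.
    body = "".join(
        chr(b) if 32 <= b < 127 and chr(b) not in '\\"$' else "\\x%02X" % b
        for b in node.value
    )
    return '"' + body + '"'
```

With this shape, static analysis keeps working on `value`, while a printer or serializer that cares about the source spelling can fall back to `raw` as long as the node has not been modified.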
@nikic Perhaps it would be an idea to add a wiki or FAQ entry with this information and the workaround?