Comments (8)

mvriel commented on May 3, 2024

Isn't this the same issue as #26?

from php-parser.

nikic commented on May 3, 2024

Heh, this issue seems to crop up in every project trying to serialize to XML. Seen it a few times in various unit testing frameworks, but didn't realize that it applies here too.

Just disabling parsing of escape sequences won't really help here, as one could still have issues with malformed UTF-8 in strings (or not UTF-8 at all). PHP's strings are raw binary data after all.

The only real way to solve this is what you did. Or rather, one could selectively encode only those strings that contain invalid UTF-8.
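A minimal sketch of the "selectively encode" idea, written in Python purely for illustration. The function name `encode_for_xml`, the `"plain"`/`"base64"` labels, and the choice of base64 are all assumptions made for this example, not anything php-parser provides. A value is passed through when it is valid UTF-8 and contains only characters XML 1.0 allows, and is encoded otherwise:

```python
import base64
import re
from typing import Tuple

# Matches any character outside the XML 1.0 "Char" production.
_XML_ILLEGAL = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def encode_for_xml(raw: bytes) -> Tuple[str, str]:
    """Return (encoding, text) for a string value given as raw bytes.

    Hypothetical helper: "plain" means the text can be written to XML
    (after the usual escaping of &, < and >), "base64" means the bytes
    were not safe and have been encoded.
    """
    try:
        text = raw.decode("utf-8")  # strict mode: raises on malformed UTF-8
    except UnicodeDecodeError:
        return "base64", base64.b64encode(raw).decode("ascii")
    if _XML_ILLEGAL.search(text):
        # Valid UTF-8, but contains e.g. NUL or other control characters.
        return "base64", base64.b64encode(raw).decode("ascii")
    return "plain", text
```

A serializer using something like this would also need to record which encoding was applied (for example in an attribute), so that a reader of the XML can reverse it.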

By the way, what are you using the XML serialization for?

theseer commented on May 3, 2024

I don't think the problem only occurs when serializing to XML - XML is just the only processor that actually complains. If you try to save the source back to a new (modified) PHP file, you'll end up with the same issue: the original \x?? sequence is lost, making the result unreadable at best.

Regarding the UTF-8 issue you pointed out: I'm running the PHP source file through iconv before parsing to avoid that issue, and so far I haven't had any problems.

The XML serialization is used in phpDox ( https://github.com/theseer/phpDox ). For now I have adopted your suggestion from issue #26.

nikic commented on May 3, 2024

@theseer Do you think it would make sense to move the static Scalar_*::parse() methods into the parser (as normal methods), so they can be overridden by extending it?

theseer commented on May 3, 2024

I'm not sure how that would fix the problem, as it would defer the solution to a client implementation.
From a user perspective, I'd expect the parser not to "interfere" by modifying or translating values when parsing.

A serializer may add - depending on the output format - whatever is needed to escape otherwise invalid chars, or translate them according to whatever makes sense, e.g. interpret \x?? as binary.

Do you really have to - by default! - translate the \x?? values in a string into their binary representation at parse time? And how would one set a \x?? value on a node at runtime and get the same output the source had before parsing?

I guess the only thing that really makes sense is to keep the "raw" value and translate it on demand.
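To make the "keep the raw value, translate on demand" idea concrete, here is a rough sketch in Python (chosen only for illustration). `StringLiteral`, `raw`, and `value` are hypothetical names, not part of php-parser, and the decoder handles only a subset of PHP's double-quoted escapes (simple escapes, `\xHH`, and octal); it ignores `\u{...}` and variable interpolation:

```python
import re
from dataclasses import dataclass

# Simple one-character escapes of PHP double-quoted strings.
_SIMPLE = {
    b"n": b"\n", b"r": b"\r", b"t": b"\t", b"v": b"\v", b"e": b"\x1b",
    b"f": b"\f", b"\\": b"\\", b"$": b"$", b'"': b'"',
}

# \xH or \xHH, octal \o to \ooo, or one of the simple escapes above.
_ESCAPE = re.compile(rb'\\(x[0-9A-Fa-f]{1,2}|[0-7]{1,3}|[nrtvef\\$"])')


def _replace(match):
    esc = match.group(1)
    if esc[:1] == b"x":
        return bytes([int(esc[1:], 16)])
    if esc[:1].isdigit():
        return bytes([int(esc, 8) & 0xFF])  # octal values wrap at one byte
    return _SIMPLE[esc]


@dataclass(frozen=True)
class StringLiteral:
    # Body of a double-quoted literal exactly as written, without the quotes.
    raw: bytes

    @property
    def value(self) -> bytes:
        """Interpret escapes only when asked; unknown escapes are left as-is."""
        return _ESCAPE.sub(_replace, self.raw)


lit = StringLiteral(rb"\x48\x65\x6C\x6C\x6F")
assert lit.value == b"Hello"               # interpreted on demand
assert lit.raw == rb"\x48\x65\x6C\x6C\x6F"  # source spelling is preserved
```

With this shape, a pretty-printer could write `raw` back unchanged, while an analysis tool would read `value`.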

nikic commented on May 3, 2024

@theseer The parser provides an abstract syntax tree, meaning that a lot of information is (intentionally) discarded, retaining only the parts that are relevant to the program's interpretation. String formatting is one of the things that are discarded. From PHP's point of view it makes no difference whether a string is written "Hello, world!" or "\x48\x65\x6C\x6C\x6F\x2C\x20\x77\x6F\x72\x6C\x64\x21". Interpreting the literals makes it possible to work with their values directly, e.g. use them as lookup keys, compare them, etc. This is not possible with encoded values, because the same value can have multiple source representations (the simplest example is single vs double quotes).
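A tiny illustration of that point, using Python string literals, which happen to share the `\xHH` escape with PHP double-quoted strings (the variable names are made up for the example). The raw spellings differ, but the interpreted values are equal and interchangeable as lookup keys:

```python
# Two source spellings of the same literal, held as uninterpreted text.
raw_plain = r'"Hello, world!"'
raw_hex = r'"\x48\x65\x6C\x6C\x6F\x2C\x20\x77\x6F\x72\x6C\x64\x21"'

# The same two literals after escape sequences have been interpreted.
val_plain = "Hello, world!"
val_hex = "\x48\x65\x6C\x6C\x6F\x2C\x20\x77\x6F\x72\x6C\x64\x21"

assert raw_plain != raw_hex   # comparing raw source text tells them apart
assert val_plain == val_hex   # comparing interpreted values does not

lookup = {val_plain: "greeting"}
assert lookup[val_hex] == "greeting"  # either spelling finds the entry
```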

I see that this behavior is not appropriate for some use cases; these use cases simply weren't the ones I originally had in mind. My main motivation was a) static analysis and b) automated code changes where nobody ever has to read the generated code.

But in any case, ways to fully retain the file formatting are being discussed in issue #41, so this might soon be possible. Though it probably doesn't really apply to this particular problem, because here the solution is rather simple anyway :)

theseer commented on May 3, 2024

I do see your problem and your point. But considering your own example of automagic rewriting of existing source code, I - as a user - would expect it NOT to modify my string definition when writing it back as source.

But at least for me, the workaround of storing the raw version as an additional attribute works fine.

mvriel commented on May 3, 2024

@nikic Perhaps it would be an idea to add a wiki or FAQ entry with this information and the workaround?
