Comments (8)

hamishknight commented on June 16, 2024

cc @natecook1000: I also noticed that \N isn't mentioned in the Unicode proposal. Should it be?

from swift-experimental-string-processing.

natecook1000 commented on June 16, 2024

Oh, interesting — I didn't know that behavior existed! I'll get it added to the Unicode proposal, yes.

natecook1000 commented on June 16, 2024

Actually, do you have an example where this doesn't work the way you'd expect? If \N is supposed to be the inverse of \n, which I think we're going to treat basically like \R, then it should also be affected by things like .asciiOnlyWhitespace().

hamishknight commented on June 16, 2024

Hmm, it was my understanding that it followed .'s newline behavior (i.e., not being affected by ASCII-only whitespace). PCRE defines it as:

The escape sequence \N when not followed by an opening brace behaves like a dot, except that it is not affected by the PCRE2_DOTALL option. In other words, it matches any character except one that signifies the end of a line.
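
The distinction in that definition can be illustrated with a small Python sketch. Python's `re` module has no bare \N escape, so the negated class `[^\n]` stands in for it here; that stand-in, and the assumption that "newline" means only LF, are illustrative choices and not how PCRE or this project implements \N. The point is only that a dot-all flag changes `.` while the stand-in is unaffected.

```python
import re

text = "ab\ncd"

# Without DOTALL, "." stops at the newline, just like the \N stand-in.
dot_default = re.findall(r".+", text)

# With DOTALL, "." also consumes the newline.
dot_all = re.findall(r".+", text, flags=re.DOTALL)

# The stand-in for \N ([^\n], an assumption for this sketch) ignores DOTALL.
not_newline = re.findall(r"[^\n]+", text, flags=re.DOTALL)

print(dot_default)  # ['ab', 'cd']
print(dot_all)      # ['ab\ncd']
print(not_newline)  # ['ab', 'cd']
```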

hamishknight commented on June 16, 2024

Oniguruma defines it as:

\N       negative newline  (?-m:.)

which is now extra confusing because Oniguruma seems to define (?m) differently to PCRE?

 m: multi-line (dot (.) also matches newline)
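
For comparison, here is a rough Python sketch of the Perl/PCRE-style meaning of these flags, which Python's `re` module also follows: (?m) changes where ^ and $ match, while (?s) is the flag that lets `.` match a newline. Going by the line quoted above, Oniguruma's (?m) corresponds to the (?s) behavior shown here. The sample text is made up for illustration.

```python
import re

text = "one\ntwo"

# No flags: ^ and $ only anchor at the ends of the whole string,
# so a pattern that needs a full word between them finds nothing.
no_flags = re.findall(r"^\w+$", text)

# Perl/PCRE-style (?m): ^ and $ also match at line boundaries.
anchors_with_m = re.findall(r"(?m)^\w+$", text)

# (?m) does not change ".": it still refuses to cross the newline.
dot_with_m = re.findall(r"(?m).+", text)

# (?s) is what lets "." match the newline in this convention.
dot_with_s = re.findall(r"(?s).+", text)

print(no_flags)        # []
print(anchors_with_m)  # ['one', 'two']
print(dot_with_m)      # ['one', 'two']
print(dot_with_s)      # ['one\ntwo']
```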

hamishknight commented on June 16, 2024

That being said, I'm fine if we want to define it as the inverse of \R. It should be noted, however, that PCRE supports changing the definition of \R through global matching options such as (*BSR_ANYCRLF) (which wouldn't affect \N in PCRE AFAIK). I'm not sure whether we want to support those global matching options, though.
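
To make the two \R definitions concrete, here is a minimal Python sketch. Python's `re` has no \R escape, so both variants are spelled out as explicit alternations; the names `BSR_UNICODE` and `BSR_ANYCRLF` are hypothetical labels for this example, and the alternations are not atomic the way PCRE's \R is. The Unicode variant lists the characters PCRE documents for \R in UTF mode.

```python
import re

# Default \R: CRLF as a unit, or any single Unicode line-break character
# (LF, VT, FF, CR, NEL, LINE SEPARATOR, PARAGRAPH SEPARATOR).
BSR_UNICODE = r"(?:\r\n|[\n\x0b\x0c\r\x85\u2028\u2029])"

# \R under (*BSR_ANYCRLF): only CR, LF, or CRLF.
BSR_ANYCRLF = r"(?:\r\n|\n|\r)"

# Sample text with a CRLF, a LINE SEPARATOR (U+2028), and a bare LF.
text = "a\r\nb\u2028c\nd"

unicode_lines = re.split(BSR_UNICODE, text)
anycrlf_lines = re.split(BSR_ANYCRLF, text)

print(unicode_lines)  # ['a', 'b', 'c', 'd']
print(anycrlf_lines)  # ['a', 'b\u2028c', 'd']
```

Under the restricted definition, U+2028 no longer separates lines, so "b" and "c" stay together.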

natecook1000 commented on June 16, 2024

I think this is the tricky bit:

except one that signifies the end of a line

In addition to changing the behavior of . and \s, does switching option modes also change which characters signify the end of a line? I'd expect that "the end of a line" would be equivalent to what \R matches. If you only want to recognize ASCII whitespace, you don't want to let non-ASCII line-break characters define where lines start and end.

hamishknight commented on June 16, 2024

PCRE at least defines newline conventions and \R differently. Newline conventions are affected by these global options (under the heading "Newline conventions"):

  (*CR)        carriage return
  (*LF)        linefeed
  (*CRLF)      carriage return, followed by linefeed
  (*ANYCRLF)   any of the three above
  (*ANY)       all Unicode newline sequences
  (*NUL)       the NUL character (binary zero)

And PCRE says:

The newline convention affects where the circumflex and dollar assertions are true. It also affects the interpretation of the dot metacharacter when PCRE2_DOTALL is not set, and the behaviour of \N when not followed by an opening brace. However, it does not affect what the \R escape sequence matches. By default, this is any Unicode newline sequence, for Perl compatibility. However, this can be changed; see the next section and the description of \R in the section entitled "Newline sequences" below. A change of \R setting can be combined with a change of newline convention.
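
The separation described there can be modeled in a simplified Python sketch: the newline convention decides what a \N-like "not a newline" match may consume, while the \R definition is a separate setting that stays fixed. The `NEWLINE_CONVENTIONS` table, the `not_newline_runs` helper, and the lookahead-based model of \N are all assumptions made for illustration, not PCRE's implementation; the (*ANY) and (*NUL) conventions are left out for brevity.

```python
import re

# Hypothetical table: newline convention -> pattern for "a newline".
NEWLINE_CONVENTIONS = {
    "CR": r"\r",
    "LF": r"\n",
    "CRLF": r"\r\n",
    "ANYCRLF": r"\r\n|\n|\r",
}

# \R is configured independently; this is the default Unicode definition.
BSR_UNICODE = r"(?:\r\n|[\n\x0b\x0c\r\x85\u2028\u2029])"


def not_newline_runs(text, convention):
    """Return maximal runs of characters that a \\N-like match would accept."""
    newline = NEWLINE_CONVENTIONS[convention]
    # Consume any character (dot-all) unless a newline sequence starts here.
    pattern = rf"(?s)(?:(?!{newline}).)+"
    return re.findall(pattern, text)


text = "a\rb\nc"

print(not_newline_runs(text, "LF"))       # ['a\rb', 'c']
print(not_newline_runs(text, "CR"))       # ['a', 'b\nc']
print(not_newline_runs(text, "ANYCRLF"))  # ['a', 'b', 'c']

# The \R stand-in gives the same split whichever convention is chosen.
print(re.split(BSR_UNICODE, text))        # ['a', 'b', 'c']
```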
