Comments (8)
cc @natecook1000, I also noticed `\N` doesn't appear to be mentioned in the Unicode proposal, should it be?
from swift-experimental-string-processing.
Oh, interesting — I didn't know that behavior existed! I'll get it added to the Unicode proposal, yes.
Actually, do you have an example of how this isn't working how you'd expect? If `\N` is supposed to be an inverse of `\n`, which I think we're going to treat basically like `\R`, then it should also be affected by things like `.asciiOnlyWhitespace()`.
Hmm, it was my understanding that it followed `.`'s newline behavior (i.e. not being affected by ASCII-only whitespace). PCRE defines it as:
The escape sequence \N when not followed by an opening brace behaves like a dot, except that it is not affected by the PCRE2_DOTALL option. In other words, it matches any character except one that signifies the end of a line.
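To make the quoted PCRE rule concrete, here is a minimal sketch in Python. Python's `re` module has no `\N` "not newline" escape, so the negated class `[^\n]` stands in for it; that stand-in, and the assumption that LF is the only newline character, are mine and not part of PCRE or the Swift proposal.

```python
import re

text = "a\nb"

# Plain dot: skips the line feed unless DOTALL is set.
dot_default = re.findall(r".", text)
dot_dotall = re.findall(r".", text, flags=re.DOTALL)

# Stand-in for \N (assumption: LF is the only newline character).
NOT_NEWLINE = r"[^\n]"
n_default = re.findall(NOT_NEWLINE, text)
n_dotall = re.findall(NOT_NEWLINE, text, flags=re.DOTALL)

print(dot_default)  # ['a', 'b']
print(dot_dotall)   # ['a', '\n', 'b']
print(n_default)    # ['a', 'b']
print(n_dotall)     # ['a', 'b']
```

The dot changes with DOTALL, while the `\N` stand-in gives the same result either way, which is the distinction the PCRE text draws.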
Oniguruma defines it as:
\N negative newline (?-m:.)
which is now extra confusing because Oniguruma seems to define `(?m)` differently to PCRE?
m: multi-line (dot (.) also matches newline)
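For comparison, a small sketch of the Perl/PCRE-style flag naming, using Python's `re` module as a stand-in (an assumption on my part; Python follows the Perl convention for these two flags). There, `(?m)` only changes the anchors, and the "dot also matches newline" behavior that the Oniguruma line above attaches to `m` is spelled `(?s)` instead.

```python
import re

text = "a\nb"

# Perl/PCRE-style (?m): ^ matches at each line start; dot still skips "\n".
multiline_anchor = re.findall(r"(?m)^\w", text)
multiline_dot = re.findall(r"(?m).", text)

# Perl/PCRE-style (?s): dot also matches "\n".
dotall_dot = re.findall(r"(?s).", text)

print(multiline_anchor)  # ['a', 'b']
print(multiline_dot)     # ['a', 'b']
print(dotall_dot)        # ['a', '\n', 'b']
```

Read that way, Oniguruma's `(?-m:.)` is just a dot with the dot-matches-newline mode switched off.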
That being said, I'm fine if we want to define it as the inverse of `\R`. It should however be noted that PCRE supports changing the definition of `\R` through global matching options such as `(*BSR_ANYCRLF)` (which wouldn't affect `\N` in PCRE AFAIK). I'm not sure if we want to support those global matching options tho.
I think this is the tricky bit:
except one that signifies the end of a line
In addition to changing the behavior of `.` and `\s`, does switching option modes also change what characters signify the end of a line? I'd expect that "the end of a line" would be equivalent to what `\R` matches. If you only want to recognize ASCII whitespace, you don't want to let non-ASCII line-break characters define where lines start and end.
PCRE at least defines newline conventions and `\R` differently. Newline conventions are affected by these global options (under the heading "Newline conventions"):
(*CR) carriage return
(*LF) linefeed
(*CRLF) carriage return, followed by linefeed
(*ANYCRLF) any of the three above
(*ANY) all Unicode newline sequences
(*NUL) the NUL character (binary zero)
And PCRE says:
The newline convention affects where the circumflex and dollar assertions are true. It also affects the interpretation of the dot metacharacter when PCRE2_DOTALL is not set, and the behaviour of \N when not followed by an opening brace. However, it does not affect what the \R escape sequence matches. By default, this is any Unicode newline sequence, for Perl compatibility. However, this can be changed; see the next section and the description of \R in the section entitled "Newline sequences" below. A change of \R setting can be combined with a change of newline convention.
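A toy model of that separation, sketched in Python. The table `NOT_NEWLINE_BY_CONVENTION`, the helper `not_newline`, and the constant `BSR_UNICODE` are hypothetical names for this illustration. Only the single-character conventions (CR, LF, NUL) are modeled, and `BSR_UNICODE` spells out the default `\R` alternation as I understand it from the PCRE documentation.

```python
import re

# Hypothetical table: a \N stand-in per single-character newline convention.
# CRLF, ANYCRLF and ANY are left out of this sketch on purpose.
NOT_NEWLINE_BY_CONVENTION = {
    "CR": r"[^\r]",
    "LF": r"[^\n]",
    "NUL": r"[^\x00]",
}

# Default \R (any Unicode newline sequence), written out as an alternation.
BSR_UNICODE = r"(?:\r\n|\n|\x0b|\f|\r|\x85|\u2028|\u2029)"


def not_newline(convention: str) -> re.Pattern:
    """Return a pattern emulating \\N under the given newline convention."""
    return re.compile(NOT_NEWLINE_BY_CONVENTION[convention])


text = "x\ny"

# The convention changes what the \N stand-in accepts...
print(not_newline("LF").findall(text))  # ['x', 'y']
print(not_newline("CR").findall(text))  # ['x', '\n', 'y']

# ...but the \R alternation is the same whichever convention is chosen.
print(re.findall(BSR_UNICODE, text))    # ['\n']
```

Under the CR convention the line feed is an ordinary character for the `\N` stand-in, yet the `\R` alternation still matches it, so the two definitions can diverge.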