Comments (5)
The caret anchors the pattern to the start of the string. Only "abadalab"
starts at the start of the string.
"findall" normally performs a series of searches, each search starting from
where the previous one ended, so the substrings found won't overlap. But
if the "overlapped" flag is turned on, each search starts one character
beyond where the previous one _started_, allowing you to find overlapping
substrings.
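The search strategy described above can be sketched with the standard library "re" module alone. This is only an emulation under stated assumptions: the helper name findall_overlapped and the pattern a\wa are hypothetical and chosen for illustration, and the regex module's real implementation may differ in its details.

```python
import re

def findall_overlapped(pattern, string):
    """Emulate an overlapped search: each new search starts one
    character beyond where the previous match *started*."""
    compiled = re.compile(pattern)
    results = []
    pos = 0
    while pos <= len(string):
        match = compiled.search(string, pos)
        if match is None:
            break
        results.append(match.group(0))
        pos = match.start() + 1
    return results

text = "abadalab"

# Ordinary findall: each search resumes where the previous match ended.
print(re.findall(r"a\wa", text))            # ['aba', 'ala']

# Emulated overlapped search: 'ada' is now found as well.
print(findall_overlapped(r"a\wa", text))    # ['aba', 'ada', 'ala']

# With a caret, only the match at the start of the string survives,
# because ^ still refers to the real beginning of the string.
print(findall_overlapped(r"^a\wa", text))   # ['aba']
```

Note that no two results ever begin at the same position, since every new search starts strictly after the previous match's start.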
Original comment by [email protected]
on 21 May 2011 at 12:19
- Changed state: Invalid
from mrab-regex-hg.
You're right, what a douche I am, the example that I provided is useless.
Let me try to make my point again. I don't know if this kind of regular
expression is valid in any regex interpreter. I hope you can clarify this
for me.
Is there any reason why you don't include overlapping matches that start on
_the same_ letter? Let me try with a new example below:
Input string: ' x one something and another something'
I want to get all the 'something's that have an 'x' before and whatever other
stuff in between. Here, I would like to match: 'x one something' and 'x one
something and another something'
I would have hoped that
regex.findall(r"x.*something", " x one something and another something", overlapped=True)
would produce that result. But like you said, after the longest x.*something
match is found, you advance one place and the second match is not found. I can
find the other match if I do regex.findall(r"x.*?something", ...), but I am
toast if there is a third match in the middle.
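The greedy and lazy behaviour described here can be reproduced with the standard "re" module, which treats these two quantifiers the same way. The longer input with a third 'something' is a made-up extension of the example, added only to show the middle match going missing.

```python
import re

text = " x one something and another something"

# Greedy: .* runs as far as possible, so only the longest match is returned.
greedy = re.findall(r"x.*something", text)
print(greedy)   # ['x one something and another something']

# Lazy: .*? stops at the first 'something', so only the shortest is returned.
lazy = re.findall(r"x.*?something", text)
print(lazy)     # ['x one something']

# Hypothetical longer input: neither form finds the match that ends
# at the second 'something'.
longer = text + " and a third something"
greedy_longer = re.findall(r"x.*something", longer)
lazy_longer = re.findall(r"x.*?something", longer)
print(greedy_longer)  # ['x one something and another something and a third something']
print(lazy_longer)    # ['x one something']
```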
Is this achievable with regular expressions at all? Why are the two results
above not considered an overlap?
Thanks for your patience
Original comment by [email protected]
on 22 May 2011 at 4:48
I guess one solution, which works with regex 0.1.20110514 but not with the
default Python re module (or with Perl v5.10.1, for that matter), is to use a
variable-length lookbehind pattern:
regex.findall(r"(?<=x.*)something", ...)
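For readers without the regex module, the same idea can be approximated with the standard "re" module, which rejects variable-length lookbehind. This is a rough sketch, not an equivalent: the helper name x_to_each_something is hypothetical, it hardcodes the literals 'x' and 'something' from the example, and unlike ".*" it does not stop at newlines.

```python
import re

# The standard library refuses the variable-length lookbehind.
try:
    re.compile(r"(?<=x.*)something")
    lookbehind_supported = True
except re.error:
    lookbehind_supported = False
print(lookbehind_supported)  # False

def x_to_each_something(text):
    """Return every substring that runs from the first 'x' to the end of
    a 'something' occurring after it (assumption: literals only)."""
    first_x = text.find("x")
    if first_x == -1:
        return []
    return [
        text[first_x:m.end()]
        for m in re.finditer("something", text)
        if m.start() > first_x
    ]

print(x_to_each_something(" x one something and another something"))
# ['x one something', 'x one something and another something']
```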
Original comment by [email protected]
on 22 May 2011 at 5:03
A regex supports greedy match ".*" and lazy match ".*?" (lazy match was a later
addition). I don't know of a regex implementation which supports what you're
asking for. There are also the implementation details to work out...
How much demand would there be for it, anyway?
Although it's a form of pattern matching, and regex is pattern matching, it's
not really a regex kind of thing.
Original comment by [email protected]
on 22 May 2011 at 5:20
Yeah, I don't know how much demand there would be for this. And I already
solved what I needed with the variable-length lookbehind, which seems to be
working fine.
I also understand about the additional complexity of the implementation.
Without knowing how it's currently implemented, I can imagine moving forward
one step after every match must simplify the implementation.
Just to be clear, my only problem was that when I saw the availability of the
'overlapped=True' flag, I thought it was reasonable to assume it would also
find overlapping matches that start on the same character. Here is a much
simpler example: take the string 'abb' and the pattern 'a.*b'.
'ab' and 'abb' are both valid, overlapping matches imho.
I'm not pushing hard for any change or implying demand here, just trying to
clarify what my confusion was, in case it helps other potentially confused
users :-)
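The behaviour asked for here, every match from every start position, can be brute-forced with the standard "re" module by testing each candidate substring with Pattern.fullmatch. This is a minimal sketch, not a feature of either module: the helper name all_matches_from_each_start is hypothetical, and the approach tries a quadratic number of substrings.

```python
import re

def all_matches_from_each_start(pattern, string):
    """Return every substring that the whole pattern matches, including
    several matches beginning at the same position."""
    compiled = re.compile(pattern)
    found = []
    for start in range(len(string) + 1):
        for end in range(start, len(string) + 1):
            # fullmatch(string, pos, endpos) tests exactly string[start:end].
            if compiled.fullmatch(string, start, end):
                found.append(string[start:end])
    return found

print(all_matches_from_each_start(r"a.*b", "abb"))
# ['ab', 'abb']

print(all_matches_from_each_start(
    r"x.*something", " x one something and another something"))
# ['x one something', 'x one something and another something']
```

One caveat: because endpos makes the pattern see the string as ending there, patterns using "$" or lookahead can behave differently than they would in a normal search.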
Original comment by [email protected]
on 22 May 2011 at 5:27