Comments (11)
It sees the second "[" and thinks it's the start of a nested set.
I've modified my sources to treat a "[" in a set as a literal if it fails to
parse it as a nested set. Seems to work.
Original comment by [email protected]
on 18 May 2011 at 3:52
- Changed state: Fixed
from mrab-regex-hg.
Thanks for the fix;
however, now I see, I had probably oversimplified my real regex pattern causing
problems; it seems, that some characters, here exemplified with "-", are still
causing problems (regex-0.1.20110610); cf.
>>> print regex.sub(r"([][-])", r"-", u"a[b]c")
Traceback (most recent call last):
File "<input>", line 1, in <module>
File "regex.pyc", line 219, in sub
File "regex.pyc", line 371, in _compile
File "_regex_core.pyc", line 296, in parse_pattern
File "_regex_core.pyc", line 310, in parse_sequence
File "_regex_core.pyc", line 323, in parse_item
File "_regex_core.pyc", line 427, in parse_element
File "_regex_core.pyc", line 563, in parse_paren
File "_regex_core.pyc", line 296, in parse_pattern
File "_regex_core.pyc", line 310, in parse_sequence
File "_regex_core.pyc", line 323, in parse_item
File "_regex_core.pyc", line 440, in parse_element
File "_regex_core.pyc", line 1024, in parse_set
File "_regex_core.pyc", line 1034, in parse_set_union
File "_regex_core.pyc", line 1045, in parse_set_symm_diff
File "_regex_core.pyc", line 1053, in parse_set_inter
File "_regex_core.pyc", line 1061, in parse_set_diff
File "_regex_core.pyc", line 1074, in parse_set_imp_union
File "_regex_core.pyc", line 1082, in parse_set_member
File "_regex_core.pyc", line 1135, in parse_set_item
error: bad set
>>> print re.sub(r"([][-])", r"-", u"a[b]c")
a-b-c
>>>
it seems, that any character in the position of "-" in the above pattern is
causing this error, only ([][]) is currently working.
(I tried to test the new set operators like | ~ & - here, but I found that also
general characters like "a" are causing this.)
Just to be sure, the actual pattern I am using (which works with re) is e.g.:
print regex.sub("([][$.\\\\*+|?()^{}-])", r"\\\1", u"a[b]c.d?e*f{}gh&i\j@k")
i.e. an older homebrew version of regex.escape(..., special_only=True)
regards,
vbr
Original comment by [email protected]
on 15 Jun 2011 at 9:02
The problem is that regex can now have a set inside a set, so a literal "[" in
a set needs to be escaped.
Instead of r"([][-])" write r"([]\[-])".
Or would it be better if it behaved like re and required the NEW flag for a
nested set?
Original comment by [email protected]
on 15 Jun 2011 at 10:11
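A minimal sketch of the escaped form suggested above, applied to the longer pattern quoted earlier in this thread. It uses only the standard re module so that it runs with or without regex installed. The names SPECIAL and escape_special are illustrative and not part of either library.

```python
import re

# Set of characters to escape. "]" comes first so it is taken literally,
# "[" is backslash-escaped so it cannot be read as the start of a nested set,
# and "-" comes last so it is not read as a range.
SPECIAL = re.compile(r"([]\[$.\\*+|?()^{}-])")


def escape_special(text):
    """Prefix each special character in text with a backslash (hypothetical helper)."""
    return SPECIAL.sub(r"\\\1", text)


if __name__ == "__main__":
    sample = "a[b]c.d?e*f{}gh&i\\j@k"
    escaped = escape_special(sample)
    print(escaped)
    # The escaped text, used as a pattern, should match the original literally.
    print(re.fullmatch(escaped, sample) is not None)
```

Escaping the "[" costs nothing in re, so the same pattern string can be shared by code that falls back from regex to re.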
Thanks for the clarification,
I thought, it would have been resolved with the fallback-fix above, but it is
apparently not possible generally.
I thought, the nested sets are only meaningful with some operators between
them; these duplicated symbols are (probably?) normally not present in
non-nested sets, hence the nesting could only be evaluated, if there are some
of those in the pattern.
As for the policy regarding NEW, this would probably rather depend on
requirements for the inclusion into the standard library...
For my individual use cases, I would rather like having this feature available
by default, but it doesn't matter much; the most important thing for me is that it
can be made to work - be it by escaping the brackets or by setting (?n) in the
patterns, depending on the decision.
On a related note, would it be possible to have some magic module-wide setting
like
regex.use_new(), which would enable the incompatible "new" features globally,
e.g. right after the import without the need to set the flag individually
afterwards?
(In my script, I am using regex, if available, but sometimes only re; in this
case, trying this setting once in a program would be more straightforward than
trying the n-flag in all patterns requiring it.)
(Not sure if the internals would support it, or even whether the resetter
"stop_using_new()" would ever be useful or possible...?)
Anyway, it's just a thoughtif this could possibly cause further problems or
complications, it isn't worth it;
vbr
involve those
For me
Original comment by [email protected]
on 15 Jun 2011 at 11:00
Sorry for the "garbage" in the text, due to sending the message slightly
prematurely:
the last sentence should contain: "... thought; if ..." and the text should end
with "vbr". :-)
Original comment by [email protected]
on 15 Jun 2011 at 11:06
The regex is parsed by recursive descent. By the time it discovers there's a
problem it has already returned from the function where it decided to parse the
nested set, so it's too late to take the alternative course. (Hmm, I wonder
whether it's fixable with a hack...)
As for the NEW flag, how could it be turned on for one importer but not any
others? You wouldn't want it to break another module which uses regex but
expects it to be off.
Original comment by [email protected]
on 16 Jun 2011 at 12:30
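One way to keep the choice local to a single module is a small wrapper that adds the flag only to patterns compiled through it. This is a sketch, assuming the flag is exposed as regex.V1 (the name matching the (?V1) inline form used later in this thread; the comments above call it NEW). compile_pattern and EXTRA_FLAGS are hypothetical names. Other modules that import regex directly are not affected, and the code falls back to re when regex is missing.

```python
try:
    import regex as re_engine
    # Assumption: nested-set behaviour is selected by the V1 flag.
    EXTRA_FLAGS = re_engine.V1
except ImportError:
    import re as re_engine
    EXTRA_FLAGS = 0


def compile_pattern(pattern, flags=0):
    """Compile a pattern with this module's preferred flags (hypothetical helper)."""
    return re_engine.compile(pattern, flags | EXTRA_FLAGS)


if __name__ == "__main__":
    # Every bracket and the hyphen are escaped, so the set reads the same
    # whether or not nested sets are enabled.
    brackets = compile_pattern(r"[\[\]\-]")
    print(brackets.sub("-", "a[b]c"))
```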
Re comment 6, there isn't a clever hack. The alternative I'll try is to disable
nested sets and parse again if it finds a bad set. Seems to work so far.
Original comment by [email protected]
on 16 Jun 2011 at 1:03
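The "parse again with nested sets disabled" idea can be illustrated with a toy recursive-descent set parser. This is only a sketch of the retry strategy, not the code in _regex_core; BadSet, parse_set and parse_set_with_fallback are hypothetical names, and the toy handles plain characters only (no ranges, escapes or set operators).

```python
class BadSet(Exception):
    """Raised when a set is not closed properly."""


def parse_set(text, pos, nested):
    """Parse a set starting at text[pos] == '['; return (members, next_pos)."""
    assert text[pos] == "["
    pos += 1
    members = set()
    first = True
    while pos < len(text):
        ch = text[pos]
        if ch == "]" and not first:
            # A "]" closes the set unless it is the very first item.
            return members, pos + 1
        if ch == "[" and nested:
            # Commit to a nested set; the caller cannot undo this choice later.
            inner, pos = parse_set(text, pos, nested)
            members |= inner
        else:
            members.add(ch)
            pos += 1
        first = False
    raise BadSet("bad set")


def parse_set_with_fallback(text):
    """Try nested sets first; on failure, re-parse treating "[" as a literal."""
    try:
        members, end = parse_set(text, 0, nested=True)
    except BadSet:
        members, end = parse_set(text, 0, nested=False)
    if end != len(text):
        raise BadSet("trailing text after set")
    return members


if __name__ == "__main__":
    print(sorted(parse_set_with_fallback("[][-]")))
    print(sorted(parse_set_with_fallback("[a[bc]]")))
```

The retry happens at the top level because, as described in the earlier comment, by the time the failure is detected the function that chose the nested-set branch has already returned.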
Re 6: ok, that was the complication I hadn't considered ...; to keep the
setting in the given namespaces, it would probably be necessary to provide
something like a regex_new module to import it with the NEW flag behaviour, but
this kind of "cloning" seems rather hackish too (not sure if it could be
achieved somehow virtually).
In any case, I can, of course, adjust the patterns explicitly to deal with
regex or re respectively.
Thanks for the further improvements.
Original comment by [email protected]
on 16 Jun 2011 at 7:46
As an older bug in my code seems to have reappeared with a recent regex
version, I'd like to clarify the set behaviour.
Is the above-mentioned fallback behaviour gone for the V1 flag? (Cf. comments 1, 7)
I have now made sure to escape my patterns appropriately, hence it shouldn't be
relevant anymore, but I wanted to understand the changes.
regards,
vbr
=== regex-0.1.20110922a ===
>>> regex.sub(r"([][])", r"-", u"a[b]c")
u'a-b-c'
>>> regex.sub(r"(?V1)([][])", r"-", u"a[b]c")
Traceback (most recent call last):
File "<input>", line 1, in <module>
File "C:\Python27\lib\regex.py", line 245, in sub
return _compile(pattern, flags, kwargs).sub(repl, string, count, pos,
File "C:\Python27\lib\regex.py", line 423, in _compile
parsed = parse_pattern(source, info)
File "C:\Python27\lib\_regex_core.py", line 334, in parse_pattern
branches = [parse_sequence(source, info)]
File "C:\Python27\lib\_regex_core.py", line 350, in parse_sequence
item = parse_item(source, info)
File "C:\Python27\lib\_regex_core.py", line 363, in parse_item
element = parse_element(source, info)
File "C:\Python27\lib\_regex_core.py", line 587, in parse_element
element = parse_paren(source, info)
File "C:\Python27\lib\_regex_core.py", line 723, in parse_paren
subpattern = parse_pattern(source, info)
File "C:\Python27\lib\_regex_core.py", line 334, in parse_pattern
branches = [parse_sequence(source, info)]
File "C:\Python27\lib\_regex_core.py", line 350, in parse_sequence
item = parse_item(source, info)
File "C:\Python27\lib\_regex_core.py", line 363, in parse_item
element = parse_element(source, info)
File "C:\Python27\lib\_regex_core.py", line 600, in parse_element
return parse_set(source, info)
File "C:\Python27\lib\_regex_core.py", line 1206, in parse_set
item = parse_set_union(source, info)
File "C:\Python27\lib\_regex_core.py", line 1222, in parse_set_union
items = [parse_set_symm_diff(source, info)]
File "C:\Python27\lib\_regex_core.py", line 1232, in parse_set_symm_diff
items = [parse_set_inter(source, info)]
File "C:\Python27\lib\_regex_core.py", line 1242, in parse_set_inter
items = [parse_set_diff(source, info)]
File "C:\Python27\lib\_regex_core.py", line 1252, in parse_set_diff
items = [parse_set_imp_union(source, info)]
File "C:\Python27\lib\_regex_core.py", line 1277, in parse_set_imp_union
items.append(parse_set_member(source, info))
File "C:\Python27\lib\_regex_core.py", line 1286, in parse_set_member
start = parse_set_item(source, info)
File "C:\Python27\lib\_regex_core.py", line 1334, in parse_set_item
return parse_set(source, info)
File "C:\Python27\lib\_regex_core.py", line 1206, in parse_set
item = parse_set_union(source, info)
File "C:\Python27\lib\_regex_core.py", line 1222, in parse_set_union
items = [parse_set_symm_diff(source, info)]
File "C:\Python27\lib\_regex_core.py", line 1232, in parse_set_symm_diff
items = [parse_set_inter(source, info)]
File "C:\Python27\lib\_regex_core.py", line 1242, in parse_set_inter
items = [parse_set_diff(source, info)]
File "C:\Python27\lib\_regex_core.py", line 1252, in parse_set_diff
items = [parse_set_imp_union(source, info)]
File "C:\Python27\lib\_regex_core.py", line 1277, in parse_set_imp_union
items.append(parse_set_member(source, info))
File "C:\Python27\lib\_regex_core.py", line 1286, in parse_set_member
start = parse_set_item(source, info)
File "C:\Python27\lib\_regex_core.py", line 1338, in parse_set_item
raise error("bad set", True)
error: bad set
>>>
Original comment by [email protected]
on 26 Sep 2011 at 5:11
The answer is yes, the fallback behaviour is gone. Although it helped in some
cases, it certainly wasn't foolproof, so I thought it better just to let it
fail.
Version 0: simple sets.
Version 1: nested sets.
Original comment by [email protected]
on 26 Sep 2011 at 5:29
Ok, thanks; this explains the behaviour I noticed; the plain failure indeed
worked in my case, as the pattern is now finally corrected :-)
vbr
Original comment by [email protected]
on 26 Sep 2011 at 5:41