"bad set" error for unescaped ] at the beginning of the set about mrab-regex-hg HOT 11 CLOSED

jamadden commented on June 12, 2024

"bad set" error for unescaped ] at the beginning of the set

from mrab-regex-hg.

Comments (11)

GoogleCodeExporter commented on June 12, 2024

It sees the second "[" and thinks it's the start of a nested set.

I've modified my sources to treat a "[" in a set as a literal if it fails to 
parse it as a nested set. Seems to work.

Original comment by [email protected] on 18 May 2011 at 3:52

Changed state: Fixed

GoogleCodeExporter commented on June 12, 2024

Thanks for the fix.
However, I now see that I had probably oversimplified my real regex pattern: 
some characters, exemplified here by "-", still cause problems 
(regex-0.1.20110610); cf.

>>> print regex.sub(r"([][-])", r"-", u"a[b]c")
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "regex.pyc", line 219, in sub
  File "regex.pyc", line 371, in _compile
  File "_regex_core.pyc", line 296, in parse_pattern
  File "_regex_core.pyc", line 310, in parse_sequence
  File "_regex_core.pyc", line 323, in parse_item
  File "_regex_core.pyc", line 427, in parse_element
  File "_regex_core.pyc", line 563, in parse_paren
  File "_regex_core.pyc", line 296, in parse_pattern
  File "_regex_core.pyc", line 310, in parse_sequence
  File "_regex_core.pyc", line 323, in parse_item
  File "_regex_core.pyc", line 440, in parse_element
  File "_regex_core.pyc", line 1024, in parse_set
  File "_regex_core.pyc", line 1034, in parse_set_union
  File "_regex_core.pyc", line 1045, in parse_set_symm_diff
  File "_regex_core.pyc", line 1053, in parse_set_inter
  File "_regex_core.pyc", line 1061, in parse_set_diff
  File "_regex_core.pyc", line 1074, in parse_set_imp_union
  File "_regex_core.pyc", line 1082, in parse_set_member
  File "_regex_core.pyc", line 1135, in parse_set_item
error: bad set
>>> print re.sub(r"([][-])", r"-", u"a[b]c")
a-b-c
>>> 

It seems that any character in the position of "-" in the above pattern causes 
this error; only ([][]) currently works.
(I tried to test the new set operators like | ~ & - here, but found that 
ordinary characters like "a" cause it as well.)

Just to be sure, the actual pattern I am using (which works with re) is e.g.:
print regex.sub("([][$.\\\\*+|?()^{}-])", r"\\\1", u"a[b]c.d?e*f{}gh&i\j@k")
i.e. an older homebrew version of regex.escape(..., special_only=True)

regards,
    vbr

Original comment by [email protected] on 15 Jun 2011 at 9:02

GoogleCodeExporter commented on June 12, 2024

The problem is that regex can now have a set inside a set, so a literal "[" in 
a set needs to be escaped.

Instead of r"([][-])" write r"([]\[-])".

Or would it be better if it behaved like re and required the NEW flag for a 
nested set?

Original comment by [email protected] on 15 Jun 2011 at 10:11
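
A minimal sketch of the escaping advice above, using only the standard-library re module under Python 3 (an assumption; the sessions in this thread are Python 2). With "[" escaped, the set reads the same whether or not the engine supports nested sets. The helper name escape_special is hypothetical and merely stands in for the homebrew escaper mentioned earlier in the thread.

```python
import re

# "]" placed first in the set is a literal, "\[" is an escaped literal "[",
# and "-" placed last in the set is a literal as well.
BRACKETS_OR_DASH = re.compile(r"([]\[-])")

# The longer pattern from the thread, rewritten with the "[" escaped.
SPECIAL_CHARS = re.compile(r"([]\[$.\\*+|?()^{}-])")


def escape_special(text):
    """Prefix each special character with a backslash (hypothetical helper)."""
    return SPECIAL_CHARS.sub(r"\\\1", text)


print(BRACKETS_OR_DASH.sub("-", "a[b]c"))
print(escape_special("a[b]c.d?e*f{}gh&i\\j@k"))
```

Because no unescaped "[" remains inside the set, the same pattern text should be usable with either re or regex.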

GoogleCodeExporter commented on June 12, 2024

Thanks for the clarification.
I thought it had been resolved by the fallback fix above, but apparently that 
is not possible in general.
I had assumed that nested sets are only meaningful with some operators between 
them; these duplicated symbols are (probably?) not normally present in 
non-nested sets, so nesting would only need to be evaluated if some of them 
appear in the pattern.

As for the policy regarding NEW, this would probably depend on the 
requirements for inclusion in the standard library...
For my own use cases I would prefer to have this feature available by default, 
but it doesn't matter much; the most important thing for me is that it can be 
made to work, be it by escaping the brackets or by setting (?n) in the 
patterns, depending on the decision.

On a related note, would it be possible to have some magic module-wide setting 
like regex.use_new(), which would enable the incompatible "new" features 
globally, e.g. right after the import, without the need to set the flag 
individually afterwards?
(In my script I use regex if it is available, but sometimes only re; in that 
case, trying this setting once in a program would be more straightforward than 
trying the n-flag in all patterns requiring it.)
(I am not sure whether the internals would support it, or whether a resetter 
such as "stop_using_new()" would ever be useful or possible...?)
Anyway, it's just a thought; if this could cause further problems or 
complications, it isn't worth it.
vbr




involve those 
For me

Original comment by [email protected] on 15 Jun 2011 at 11:00

GoogleCodeExporter commented on June 12, 2024

Sorry for the "garbage" in the text, due to sending the message slightly 
prematurely:
the last sentence should contain: "... thought; if ..." and the text should end 
with "vbr". :-)

Original comment by [email protected] on 15 Jun 2011 at 11:06

GoogleCodeExporter commented on June 12, 2024

The regex is parsed by recursive descent. By the time it discovers there's a 
problem it has already returned from the function where it decided to parse the 
nested set, so it's too late to take the alternative course. (Hmm, I wonder 
whether it's fixable with a hack...)

As for the NEW flag, how could it be turned on for one importer but not any 
others? You wouldn't want it to break another module which uses regex but 
expects it to be off.

Original comment by [email protected] on 16 Jun 2011 at 12:30

GoogleCodeExporter commented on June 12, 2024

Re comment 6, there isn't a clever hack. The alternative I'll try is to disable 
nested sets and parse again if it finds a bad set. Seems to work so far.

Original comment by [email protected] on 16 Jun 2011 at 1:03
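
The "disable nested sets and parse again" approach can be illustrated with a toy recursive-descent set parser. This is only a sketch and not the regex module's code: the names BadSet, parse_set and compile_set are hypothetical, and the grammar is reduced to literals, backslash escapes and nested sets.

```python
class BadSet(Exception):
    """Raised when a set cannot be parsed (toy stand-in for 'bad set')."""


def parse_set(src, pos, nested):
    """Parse a set body starting just after '['; return (chars, next_pos)."""
    chars = set()
    first = True
    while pos < len(src):
        ch = src[pos]
        if ch == "]" and not first:
            return chars, pos + 1          # closing bracket of this set
        if ch == "[" and nested:
            # Commit to a nested set; a later failure cannot undo this choice.
            inner, pos = parse_set(src, pos + 1, nested)
            chars |= inner
        elif ch == "\\" and pos + 1 < len(src):
            chars.add(src[pos + 1])        # escaped character is a literal
            pos += 2
        else:
            chars.add(ch)                  # includes a leading literal "]"
            pos += 1
        first = False
    raise BadSet("bad set")                # ran out of input before "]"


def _parse_whole(pattern, nested):
    if not pattern.startswith("["):
        raise BadSet("bad set")
    chars, end = parse_set(pattern, 1, nested)
    if end != len(pattern):
        raise BadSet("bad set")
    return chars


def compile_set(pattern):
    """Try nested sets first; on failure, reparse with simple sets."""
    try:
        return _parse_whole(pattern, nested=True)
    except BadSet:
        return _parse_whole(pattern, nested=False)


print(sorted(compile_set(r"[][-]")))
print(sorted(compile_set(r"[]\[-]")))
```

In this toy, "[][-]" fails on the first pass because the inner "[-]" parses cleanly as a nested set and the outer set then runs out of input, which mirrors the "too late to take the alternative course" problem described in comment 6; only the second pass, with nesting disabled, succeeds.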

GoogleCodeExporter commented on June 12, 2024

Re 6: OK, that was the complication I hadn't considered. To keep the setting 
within the given namespaces, it would probably be necessary to provide 
something like a regex_new module that imports it with the NEW flag behaviour, 
but this kind of "cloning" seems rather hackish too (I'm not sure whether it 
could be achieved virtually somehow).
In any case, I can, of course, adjust the patterns explicitly to deal with 
regex or re respectively.
Thanks for the further improvements.

Original comment by [email protected] on 16 Jun 2011 at 7:46
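
One way to keep such a choice local to a single importer, rather than global, is a small wrapper in the importing module itself. The sketch below is an assumption, not something proposed in the thread: compile_portable and HAVE_REGEX are hypothetical names, and regex.VERSION1 is the flag that later releases of the regex module use for the nested-set behaviour discussed here.

```python
try:
    import regex as _re      # third-party module, optional
    HAVE_REGEX = True
except ImportError:
    import re as _re         # standard-library fallback
    HAVE_REGEX = False


def compile_portable(pattern, flags=0, nested_sets=False):
    """Compile with whichever module is available (hypothetical helper).

    The flag is applied per call, so other modules that import regex
    themselves are not affected by this module's preference.
    """
    if nested_sets:
        if not HAVE_REGEX:
            raise RuntimeError("nested sets need the third-party regex module")
        flags |= _re.VERSION1
    return _re.compile(pattern, flags)


# With "[" escaped, the pattern does not depend on nested-set support.
BRACKET_OR_DASH = compile_portable(r"[]\[-]")
print(BRACKET_OR_DASH.sub("-", "a[b]c"))
```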

GoogleCodeExporter commented on June 12, 2024

As an older bug in my code seems to have reappeared with a recent regex 
version, I'd like to clarify the set behaviour.
Is the above-mentioned fallback behaviour gone for the V1 flag? (Cf. comments 1, 7)

I have now made sure to escape my patterns appropriately, so it shouldn't be 
relevant anymore, but I wanted to understand the changes.

regards,
   vbr

=== regex-0.1.20110922a ===
>>> regex.sub(r"([][])", r"-", u"a[b]c")
u'a-b-c'
>>> regex.sub(r"(?V1)([][])", r"-", u"a[b]c")
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "C:\Python27\lib\regex.py", line 245, in sub
    return _compile(pattern, flags, kwargs).sub(repl, string, count, pos,
  File "C:\Python27\lib\regex.py", line 423, in _compile
    parsed = parse_pattern(source, info)
  File "C:\Python27\lib\_regex_core.py", line 334, in parse_pattern
    branches = [parse_sequence(source, info)]
  File "C:\Python27\lib\_regex_core.py", line 350, in parse_sequence
    item = parse_item(source, info)
  File "C:\Python27\lib\_regex_core.py", line 363, in parse_item
    element = parse_element(source, info)
  File "C:\Python27\lib\_regex_core.py", line 587, in parse_element
    element = parse_paren(source, info)
  File "C:\Python27\lib\_regex_core.py", line 723, in parse_paren
    subpattern = parse_pattern(source, info)
  File "C:\Python27\lib\_regex_core.py", line 334, in parse_pattern
    branches = [parse_sequence(source, info)]
  File "C:\Python27\lib\_regex_core.py", line 350, in parse_sequence
    item = parse_item(source, info)
  File "C:\Python27\lib\_regex_core.py", line 363, in parse_item
    element = parse_element(source, info)
  File "C:\Python27\lib\_regex_core.py", line 600, in parse_element
    return parse_set(source, info)
  File "C:\Python27\lib\_regex_core.py", line 1206, in parse_set
    item = parse_set_union(source, info)
  File "C:\Python27\lib\_regex_core.py", line 1222, in parse_set_union
    items = [parse_set_symm_diff(source, info)]
  File "C:\Python27\lib\_regex_core.py", line 1232, in parse_set_symm_diff
    items = [parse_set_inter(source, info)]
  File "C:\Python27\lib\_regex_core.py", line 1242, in parse_set_inter
    items = [parse_set_diff(source, info)]
  File "C:\Python27\lib\_regex_core.py", line 1252, in parse_set_diff
    items = [parse_set_imp_union(source, info)]
  File "C:\Python27\lib\_regex_core.py", line 1277, in parse_set_imp_union
    items.append(parse_set_member(source, info))
  File "C:\Python27\lib\_regex_core.py", line 1286, in parse_set_member
    start = parse_set_item(source, info)
  File "C:\Python27\lib\_regex_core.py", line 1334, in parse_set_item
    return parse_set(source, info)
  File "C:\Python27\lib\_regex_core.py", line 1206, in parse_set
    item = parse_set_union(source, info)
  File "C:\Python27\lib\_regex_core.py", line 1222, in parse_set_union
    items = [parse_set_symm_diff(source, info)]
  File "C:\Python27\lib\_regex_core.py", line 1232, in parse_set_symm_diff
    items = [parse_set_inter(source, info)]
  File "C:\Python27\lib\_regex_core.py", line 1242, in parse_set_inter
    items = [parse_set_diff(source, info)]
  File "C:\Python27\lib\_regex_core.py", line 1252, in parse_set_diff
    items = [parse_set_imp_union(source, info)]
  File "C:\Python27\lib\_regex_core.py", line 1277, in parse_set_imp_union
    items.append(parse_set_member(source, info))
  File "C:\Python27\lib\_regex_core.py", line 1286, in parse_set_member
    start = parse_set_item(source, info)
  File "C:\Python27\lib\_regex_core.py", line 1338, in parse_set_item
    raise error("bad set", True)
error: bad set
>>>

Original comment by [email protected] on 26 Sep 2011 at 5:11

GoogleCodeExporter commented on June 12, 2024

The answer is yes, the fallback behaviour is gone. Although it helped in some 
cases, it certainly wasn't foolproof, so I thought it better just to let it 
fail.

Version 0: simple sets.

Version 1: nested sets.

Original comment by [email protected] on 26 Sep 2011 at 5:29
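
The two versions can be compared side by side by putting the inline flags (?V0) and (?V1) in the pattern. A hedged sketch, assuming the third-party regex module is installed (it reports a message otherwise); try_sub is a hypothetical helper, and no output is claimed here beyond what the thread reports for regex-0.1.20110922a.

```python
try:
    import regex             # third-party; not part of the standard library
except ImportError:
    regex = None


def try_sub(pattern, repl="-", text="a[b]c"):
    """Return the substitution result, or the error text if it fails."""
    if regex is None:
        return "regex module not installed"
    try:
        return regex.sub(pattern, repl, text)
    except regex.error as exc:
        return "error: %s" % exc


# Unescaped "[" under each version, then the escaped form under version 1.
for pattern in (r"(?V0)([][])", r"(?V1)([][])", r"(?V1)([]\[])"):
    print(pattern, "->", try_sub(pattern))
```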

GoogleCodeExporter commented on June 12, 2024

Ok, thanks; this explains the behaviour I noticed; the plain failure indeed 
worked in my case, as the pattern is now finally corrected :-)
vbr

Original comment by [email protected] on 26 Sep 2011 at 5:41
