The following code began emitting a SyntaxWarning in Python 3.12, and should likely be updated to use raw-string notation, as described in the introductory section of the re docs:
_unquotedChars = ':/\?=#~'
Context / History
Python strings such as '\w' that contain invalid escape sequences have become common in regular expression code, because regex syntax includes character classes that are written as a backslash followed by a single character.
Python emitted no warning or error when these were written in standard string notation (as opposed to raw-string notation), which allowed such invalid-escape strings to appear organically in many codebases.
Python 3.6 added a (silent-by-default) DeprecationWarning for strings that contain invalid escape sequences, as part of a plan to gradually increase the severity of the warning and eventually make it a syntax error (rejected by the interpreter). The deprecation warning appears when the relevant Python warnings are enabled:
# $ python -Walways
Python 3.11.8 (main, Feb 7 2024, 21:52:08) [GCC 13.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> ':/\?=#~'
<stdin>:1: DeprecationWarning: invalid escape sequence '\?'
':/\\?=#~'
From Python 3.12 onwards, these invalid escape sequences are more visible, and emit a SyntaxWarning message by default:
# $ python
Python 3.12.3 (main, Apr 10 2024, 03:39:08) [GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> ':/\?=#~'
<stdin>:1: SyntaxWarning: invalid escape sequence '\?'
':/\\?=#~'
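The two sessions above can also be reproduced from a script. The following is only a sketch: the helper name `escape_warnings` is hypothetical, and it assumes the warning is raised when the source text is compiled, which matches the interactive behaviour shown above. The reported category should be DeprecationWarning on Python 3.6 to 3.11 and SyntaxWarning from 3.12 onwards.

```python
import warnings


def escape_warnings(source):
    """Compile source text and return (category name, message) per warning."""
    with warnings.catch_warnings(record=True) as caught:
        # Make sure silent-by-default categories are recorded too.
        warnings.simplefilter("always")
        compile(source, "<example>", "exec")
    return [(w.category.__name__, str(w.message)) for w in caught]


# The compiled source text holds a single backslash before '?'.
plain = "s = ':/\\?=#~'"
# The same literal with a raw-string prefix.
raw = "s = r':/\\?=#~'"

print(escape_warnings(plain))  # one warning about the invalid escape sequence
print(escape_warnings(raw))    # no warnings expected
```

Checking the raw-string variant the same way gives a quick confirmation that a candidate fix no longer triggers the warning.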
A fix should be relatively straightforward and is essentially source-only: it changes how the set of unquoted characters is written in the source code, and the runtime behaviour of the program (apart from warning/error output) should not change.