manoss96 / pregex

PRegEx - Programmable Regular Expressions

Home Page: https://pregex.rtfd.io

License: MIT License

Python 100.00%

python regex regular-expression

pregex's Issues

Case Insensitive Modifiers

This project is awesome! One feature I noticed may be missing is the ability to add a case insensitive modifier.

Something along the lines of:

CaseInsensitive('without' + AtMost(Any(), n=30) + 'evidence')

that would wrap the regex in (?i) ... (?-i)?
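For reference, Python's built-in re module already supports a scoped inline flag, (?i:...), which makes only the enclosed part of a pattern case-insensitive and, unlike a global (?i), may appear anywhere in the pattern. The sketch below uses plain re, not pregex; the helper name case_insensitive is hypothetical and only illustrates the kind of pattern such a wrapper class could emit.

```python
import re


def case_insensitive(pattern: str) -> str:
    """Hypothetical helper: wrap `pattern` in a scoped inline flag group.

    (?i:...) turns on IGNORECASE only inside the group, so the rest of
    the surrounding pattern stays case-sensitive.
    """
    return f"(?i:{pattern})"


# "without" ignores case; "evidence" must stay lowercase.
# .{0,30} allows up to 30 characters (newlines excluded) in between.
pattern = case_insensitive("without") + r".{0,30}" + "evidence"

print(pattern)                                           # (?i:without).{0,30}evidence
print(bool(re.search(pattern, "WITHOUT any evidence")))  # True
print(bool(re.search(pattern, "Without any EVIDENCE")))  # False
```

Wrapping the whole expression instead, as in the CaseInsensitive example above, would simply put everything inside the (?i:...) group.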

pre = EnclosedBy(pre, Whitespace()) raises an error

In this example:

pre = Pregex('a')
pre = EnclosedBy(pre, Whitespace())

What I want is (?<=\s)a(?=\s), but an error occurred: "NonFixedWidthPatternException: Instances of class "Pregex" cannot receive an instance of class "Whitespace" in place of a lookbehind-restriction-pattern as the latter represents a pattern whose width is not fixed.". What did I do wrong, and how can I fix it?
Thank you very much!
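As a point of comparison, Python's re only requires a lookbehind to have a fixed width, and \s always matches exactly one character, so the desired pattern is valid when written directly with the built-in re module. This sketch only confirms that the target pattern itself is fine; it does not show how pregex classifies Whitespace() internally.

```python
import re

# \s is a single-character class, so this lookbehind has a fixed width of 1.
pattern = re.compile(r"(?<=\s)a(?=\s)")

text = "b a c a"
# Only the first "a" has whitespace on both sides;
# the last one is followed by the end of the string.
positions = [m.start() for m in pattern.finditer(text)]
print(positions)  # [2]

# By contrast, a quantified lookbehind such as \s+ has no fixed width.
try:
    re.compile(r"(?<=\s+)a")
    variable_width_rejected = False
except re.error:
    variable_width_rejected = True
print(variable_width_rejected)  # True
```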

Pre section of the docs is empty

In the docs, the pregex.core.pre section is almost empty, and methods such as get_captures and get_named_captures do not appear in the index (although they do appear in the documentation itself). This is confusing because only the class containing these methods appears. It would be easier to find them if the methods were also listed, since they're important.

Thank you.

How to get text in outer brackets?

Hello,
I just discovered this package, very cool!
I have a question, though.
I have the following text:
text="000[111[222[333]222]111]000"
and I would like to retrieve the text inside the outer brackets: '111[222[333]222]111'.
This is simple in pure Python, but I want to do it with a regular expression.
Any ideas?
Regards
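A minimal sketch with the built-in re module (not pregex): a greedy .* between an opening and a closing bracket stretches from the first "[" to the last "]", which captures everything inside the outermost pair when the text contains a single top-level bracket group.

```python
import re

text = "000[111[222[333]222]111]000"

# The greedy .* runs from the first "[" to the LAST "]",
# so group 1 holds everything between the outermost brackets.
match = re.search(r"\[(.*)\]", text)
inner = match.group(1) if match else None
print(inner)  # 111[222[333]222]111
```

This relies on there being only one top-level group: for "[a][b]" the same pattern returns "a][b". Balanced matching of several sibling groups needs recursion, which re does not support.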

Is there any way to match a pattern at start and end with one function?

In which file is OnceOrMore?

pre.split_by_match differs from re.split, with pros and cons

Should I simply use re when I want its advantages, or does pre have a better way?

I prefer pre's behavior:

Pregex('aa').split_by_match("12aa34") --> ['12', '34']
Pregex('aa').split_by_match("12aa34aa") --> ['12', '34']
Pregex('aa').split_by_match("12aa34aaaa") --> ['12', '34']

But re has its pros:

re.split('aa',"12aa34") --> ['12', '34']
re.split('aa',"12aa34aa") --> ['12', '34', '']
re.split('aa',"12aa34aaaa") --> ['12', '34', '', '']

re's output guarantees that there was exactly one delimiter between each pair of consecutive items.
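One way to get both behaviors, sketched with the built-in re module: keep re.split as the primitive and drop the empty strings when the pre-style result is wanted. The helper name split_without_empty is hypothetical; it reproduces the three pre results listed above, but it also drops a leading empty string, which those examples do not cover.

```python
import re


def split_without_empty(pattern: str, text: str) -> list[str]:
    """Hypothetical helper: re.split minus the empty strings produced
    by adjacent or trailing delimiters."""
    return [part for part in re.split(pattern, text) if part]


print(re.split("aa", "12aa34aaaa"))             # ['12', '34', '', '']
print(split_without_empty("aa", "12aa34aaaa"))  # ['12', '34']
```

Keeping the raw re.split result around preserves the delimiter count (len(result) - 1) whenever that information matters.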

Contributing to pregex

This project is really cool. People (including myself) might be interested in contributing. Are you accepting Pull Requests and/or planning to add a Contributing guide?

Adding CI tools for testing and linting

I think adding some CI tools for testing and linting will make managing contributions easier, and it will also make PRs faster to review.

I hope you can add a class for Korean.

I wrote the following class for Korean characters.
If Korean were supported officially, I think many Koreans would use it. :)

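from pregex.core.classes import AnyLetter


class AnyKor(AnyLetter):

    def __init__(self) -> None:
        '''
        Matches any Korean character: a Hangul compatibility consonant
        (ㄱ-ㅎ) or a precomposed Hangul syllable (가-힣).
        '''
        # Deliberately skip AnyLetter.__init__ and call its parent's
        # initializer with a custom class. '|' is dropped: inside [...]
        # it matches a literal pipe rather than acting as alternation.
        super(AnyLetter, self).__init__('[ㄱ-ㅎ가-힣]', is_negated=False)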
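The character ranges can be checked independently of pregex with the built-in re module. ㄱ-ㅎ covers the Hangul compatibility consonants (vowels such as ㅏ are not included) and 가-힣 the precomposed Hangul syllables. The sketch also shows why the "|" in the original class is a bug: inside [...] it is an ordinary character.

```python
import re

# Hangul compatibility consonants (ㄱ-ㅎ) plus precomposed syllables (가-힣).
hangul = re.compile(r"[ㄱ-ㅎ가-힣]+")
print(hangul.findall("pregex 한국어 ㅋㅋ test"))  # ['한국어', 'ㅋㅋ']

# Inside a character class "|" is literal, so the original class also matched it.
print(bool(re.fullmatch(r"[ㄱ-ㅎ|가-힣]", "|")))  # True
print(bool(re.fullmatch(r"[ㄱ-ㅎ가-힣]", "|")))   # False
```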

How to match a dot (re's "any character")?

I can't find any example for Anything(). The closest I can come up with is: Either(Whitespace(), AnyButWhitespace())
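For context, here is how "any character" behaves in the built-in re module. By default "." excludes the newline; re.DOTALL lifts that restriction, and a class combined with its complement, such as [\s\S], matches every character with no flag at all. Assuming Whitespace() and AnyButWhitespace() correspond to \s and \S, the Either(...) workaround above is equivalent to that last form.

```python
import re

text = "a\nb"

# By default "." matches any character except a newline.
print(re.findall(r"a.b", text))             # []
# With re.DOTALL, "." matches newlines too.
print(re.findall(r"a.b", text, re.DOTALL))  # ['a\nb']
# A class plus its complement matches every character, no flag needed.
print(re.findall(r"a[\s\S]b", text))        # ['a\nb']
```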

get_matches() finds 3 matches but has_match() only 1

I'm doing an exercise from here:

f) For the given list, filter all elements having a line starting with 'den' or ending with 'ly'.

items = ['love', '1\ndentist', 'fly\nfar', 'dent']
word = Indefinite(AnyLetter())
start_with_den = MatchAtLineStart("den" + word)
end_with_ly = MatchAtLineEnd(word + "ly")
pre = Either(start_with_den, end_with_ly)

The strange thing is that get_matches() finds 3, but . . .

for i, item in enumerate(items):
    print(i, pre.get_matches(item))

# 3 matches found, at items 1, 2 and 3
0 []
1 ['dentist']
2 ['fly']
3 ['dent']

. . . while has_match() returns False for two of those three!

for i, item in enumerate(items):
    print(i, pre.has_match(item), item)

# Only item 3 is True! <------- Problem: items 1, 2 and 3 should all be True
0 False love
1 False 1
dentist
2 False fly
far
3 True dent
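One plausible explanation, offered as an assumption rather than a diagnosis of pregex's internals: the anchors ^ and $ only match at line boundaries when re.MULTILINE is set. If MatchAtLineStart and MatchAtLineEnd rely on these anchors and the flag is applied in one method but not the other, the results diverge exactly as above. The hand-written pattern below approximates pre, with [a-zA-Z] standing in for AnyLetter().

```python
import re

items = ['love', '1\ndentist', 'fly\nfar', 'dent']

# Approximation of pre: a line starting with "den" or a line ending with "ly".
pattern = r"^den[a-zA-Z]*|[a-zA-Z]*ly$"

with_multiline = [bool(re.search(pattern, s, re.MULTILINE)) for s in items]
without_multiline = [bool(re.search(pattern, s)) for s in items]

print(with_multiline)     # [False, True, True, True]
print(without_multiline)  # [False, False, False, True]
```

Without the flag, ^ matches only at the start of the string and $ only at its end (or just before a final newline), so only 'dent' qualifies.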

Examples for Backreference, Conditional

Where can I find examples, especially for Backreference and Conditional?
Thank you very much.
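The sketch below shows the underlying constructs in the built-in re module, a backreference (\1) and a conditional group ((?(1)...)), which classes with these names would presumably build on. It does not use pregex itself, so the exact pregex signatures are not shown.

```python
import re

# Backreference: \1 must repeat exactly what group 1 captured,
# so this finds a word that is immediately doubled.
doubled = re.compile(r"\b(\w+) \1\b")
print(doubled.search("this is is a test").group(0))  # is is

# Conditional: (?(1)>) requires ">" only if group 1 (the optional "<") matched.
bracketed = re.compile(r"(<)?\w+@\w+\.com(?(1)>)")
candidates = ["<a@b.com>", "a@b.com", "<a@b.com", "a@b.com>"]
results = {c: bool(bracketed.fullmatch(c)) for c in candidates}
print(results)
# {'<a@b.com>': True, 'a@b.com': True, '<a@b.com': False, 'a@b.com>': False}
```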

Change naming of "Optional" to not conflict with Python standard typing library

Right now, from pregex.quantifiers import Optional conflicts with the native Python typing module's from typing import Optional.

Since the python standard lib is used pretty ubiquitously in modern projects, it probably makes more sense to change the naming of Optional here to something else.

Collaboration

Welcome to the burgeoning field of trying to make regex human-friendly! We're always happy to have more people working on this gnarly problem.

Here is a list of some other attempts:
https://github.com/SonOfLilit/kleenexp#similar-works

May I invite you to have a look at mine in particular and maybe join hands? We're already in quite an advanced stage, having just launched a vscode extension that allows using our syntax almost natively in your text editor.

Cheers,
Aur

How to rewrite re.sub(pattern, '\\1_\\2', text) in PRegEx ?

Converting camel case to snake case works fine when going through re:

text = "ConvertingCamelCaseToSnakeCase"
camel_case = Capture(AnyLowercaseLetter()) + Capture(AnyUppercaseLetter())
re.sub(camel_case.get_pattern(), '\\1_\\2', text).lower()
# 'converting_camel_case_to_snake_case'

But I'd like to know: how can I write the last line in PRegEx?
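Two hedged alternatives with plain re, not pregex: a replacement function avoids the '\\1' escapes, and a lookaround-only pattern inserts the underscore without needing group references at all, so in principle any replace method that takes a plain replacement string could apply it. Whether pregex's own replace method accepts group references is not covered here; [a-z] and [A-Z] are assumed to match what AnyLowercaseLetter() and AnyUppercaseLetter() produce.

```python
import re

text = "ConvertingCamelCaseToSnakeCase"

# 1) Same two groups as camel_case, but the replacement is a function.
snake1 = re.sub(r"([a-z])([A-Z])", lambda m: f"{m.group(1)}_{m.group(2)}", text).lower()

# 2) Zero-width match between a lowercase and an uppercase letter:
#    nothing is consumed, so the replacement is just "_".
snake2 = re.sub(r"(?<=[a-z])(?=[A-Z])", "_", text).lower()

print(snake1)  # converting_camel_case_to_snake_case
print(snake2)  # converting_camel_case_to_snake_case
```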

Create classes Email and Date in pregex.meta.essentials

Use the Pypi 'regex' module instead of the built-in 're' module

I came across a limitation of the re module that the regex module (https://pypi.org/project/regex/) doesn't have.
I am trying to write an assertion that rejects matches within brackets (parentheses):

def Not_within_brackets(pre): 
    anytext_re = Indefinite(Any())
    return pre.not_preceded_by(Pregex("(").enclose(anytext_re)).not_followed_by(Pregex(")").enclose(anytext_re))
pre=Not_within_brackets(Pregex("one"))
text="(not this one) this one"
print(pre.get_matches(text))

I get the following error from the underlying re module:
re.error: look-behind requires fixed-width pattern

I would like to use the regex module instead, unless there is another solution to this problem.

Many thanks !!!
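The third-party regex module does accept variable-width lookbehind, as the issue notes. A common workaround that stays within the built-in re module, sketched here as an alternative technique rather than a pregex feature: match either a whole bracketed span or the target word, capture only the target, and discard matches where the capture did not participate. The helper name find_outside_brackets is hypothetical, and it assumes brackets are not nested.

```python
import re


def find_outside_brackets(word: str, text: str) -> list[int]:
    """Hypothetical helper: start positions of `word` outside any (...) span.

    Bracketed spans are consumed by the first alternative, so a word
    inside them is never reached by the capturing second alternative.
    """
    pattern = r"\([^()]*\)|(" + re.escape(word) + r")"
    return [m.start(1) for m in re.finditer(pattern, text) if m.group(1) is not None]


text = "(not this one) this one"
print(find_outside_brackets("one", text))  # [20]
```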

Email format parsing failed

from pregex.classes import AnyButWhitespace
from pregex.quantifiers import OneOrMore
from pregex.operators import Either

text = "My email is [email protected]"

pre = (
    OneOrMore(AnyButWhitespace())
    + "@"
    + OneOrMore(AnyButWhitespace())
    + Either(".com", ".org", ".io", ".net")
)

a = pre.get_matches(text)
print(a)

error report:

Traceback (most recent call last):
  File "/media/zyf/software/pregex-lab/test06.py", line 13, in <module>
    + Either(".com", ".org", ".io", ".net")
  File "/home/zyf/anaconda3/envs/py310/lib/python3.10/site-packages/pregex/operators.py", line 69, in __init__
    super().__init__(pres, lambda pre1, pre2: pre1._either(pre2), __class__._PatternType.Either)
  File "/home/zyf/anaconda3/envs/py310/lib/python3.10/site-packages/pregex/operators.py", line 17, in __init__
    result = transform(result, __class__._to_pregex(pre))
  File "/home/zyf/anaconda3/envs/py310/lib/python3.10/site-packages/pregex/operators.py", line 69, in <lambda>
    super().__init__(pres, lambda pre1, pre2: pre1._either(pre2), __class__._PatternType.Either)
AttributeError: 'str' object has no attribute '_either'
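The traceback shows _either being called on the plain string ".com", which suggests that in this version the first argument of Either was not converted to a Pregex before combining. For reference, the intended pattern can be written directly with the built-in re module. This is a hand-written equivalent, assuming AnyButWhitespace() corresponds to \S, and the address is a made-up example.

```python
import re

# \S+ ~ OneOrMore(AnyButWhitespace()); (?:...|...) ~ Either(...) without capturing.
pattern = re.compile(r"\S+@\S+(?:\.com|\.org|\.io|\.net)")

text = "My email is user@example.com"  # made-up address for illustration
print(pattern.findall(text))  # ['user@example.com']
```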

Cannot install pregex

Is there any way to install this package other than pip install pregex? I get this error.

negated character class?

I was considering using this to develop a negated character class for the following:

take a string and replace any non-alphanumeric character with an underscore

I ended up just using re and going with:

take a string and replace any non-word character (\W) with an underscore

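import re

def clean_title(t: str) -> str:
    cleaned = re.sub(r'\W', '_', t)  # \W: any character other than a letter, digit or '_'
    logger.debug(f"{t=}, {cleaned=}")  # assumes a module-level `logger` is defined
    return cleaned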
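The two descriptions differ in which negated character class is used, sketched here with the built-in re module. \W is the negation of \w (letters, digits and underscore), and for str patterns \w is Unicode-aware, so accented letters are kept; an explicit negated class such as [^A-Za-z0-9] keeps only ASCII letters and digits. The title string is a made-up example.

```python
import re

title = "Café: 50% off!"

# \W is the complement of \w; for str patterns \w is Unicode-aware, so "é" is kept.
print(re.sub(r"\W", "_", title))            # Café__50__off_
# An explicit negated class keeps only ASCII letters and digits.
print(re.sub(r"[^A-Za-z0-9]", "_", title))  # Caf___50__off_
```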
