
pregex's People

Contributors

dylannalex, ericgustin, manoss96


pregex's Issues

Case Insensitive Modifiers

This project is awesome! One feature I noticed may be missing is the ability to add a case insensitive modifier.

Something along the lines of:

CaseInsensitive('without' + AtMost(Any(), n=30) + 'evidence')

that would wrap the regex in (?i) ... (?-i)?
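For reference, Python's re module (3.6+) already supports a scoped inline flag group, (?i:...), which applies case-insensitivity only to the enclosed part. The sketch below hand-writes the kind of pattern a CaseInsensitive(...) wrapper might produce. The case_insensitive helper is hypothetical and not part of pregex, and Any() is approximated by '.'.

```python
import re


def case_insensitive(pattern: str) -> str:
    # Hypothetical helper: wrap a pattern in a scoped case-insensitive group.
    return f"(?i:{pattern})"


# Roughly 'without' + AtMost(Any(), n=30) + 'evidence', with Any() approximated by '.'
pattern = case_insensitive(r"without.{0,30}evidence")

match = re.search(pattern, "Found WITHOUT any real Evidence.")
print(match.group())  # WITHOUT any real Evidence
```

Python's re does not accept a bare (?-i) as a standalone switch, so the scoped form is the closest match to the (?i) ... (?-i) idea, and it cannot leak the flag into the rest of a larger pattern.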

pre = EnclosedBy(pre, Whitespace()) raises an error

In this example:

pre = Pregex('a')
pre = EnclosedBy(pre, Whitespace()) 

What I want is: (?<=\s)a(?=\s), but an error occurred: "NonFixedWidthPatternException: Instances of class "Pregex" cannot receive an instance of class "Whitespace" in place of a lookbehind-restriction-pattern as the latter represents a pattern whose width is not fixed.". What did I do wrong, and how can I fix it?
Thank you very much!!
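For comparison, the target pattern compiles fine with Python's re: \s matches exactly one character, so it is a fixed-width lookbehind. The sketch below uses re directly as a workaround; it does not explain why pregex's width check rejects Whitespace() here.

```python
import re

# One whitespace character required before and after 'a', neither consumed
pattern = re.compile(r"(?<=\s)a(?=\s)")

text = "x a b ab a\tc"
spans = [m.span() for m in pattern.finditer(text)]
print(spans)  # [(2, 3), (9, 10)]
```

The 'a' inside 'ab' is skipped because it is followed by 'b', not by whitespace.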

Pre section of the docs is empty

In the docs, the pregex.core.pre section is almost empty, and methods such as get_captures and get_named_captures do not appear in the index (although they do appear in the documentation itself). This is confusing because only the class containing these methods is listed. Since these methods are important, they would be easier to find if they were listed too.


Thank you.

How to get text in outer brackets?

Hello,
Just discovered this package, very cool !!!
I have a question though.
I have the following text:
text="000[111[222[333]222]111]000"
and I would like to retrieve the text inside the outer brackets: '111[222[333]222]111'
This is simple in pure Python, but I want to do it with a regular expression.
Any idea?
Regards
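A minimal sketch with Python's re, assuming the text contains a single outermost bracket pair: a greedy .* between the first '[' and the last ']' captures the whole nested content. The built-in re module has no recursion, so it cannot match balanced brackets in general; the second call shows where this shortcut breaks. The helper name outer_bracket_content is illustrative.

```python
import re
from typing import Optional


def outer_bracket_content(text: str) -> Optional[str]:
    # Greedy '.*' runs to the LAST ']' in the string, so with a single
    # outermost pair this captures everything between the outer brackets.
    match = re.search(r"\[(.*)\]", text)
    return match.group(1) if match else None


print(outer_bracket_content("000[111[222[333]222]111]000"))  # 111[222[333]222]111
print(outer_bracket_content("0[a]0[b]0"))  # a]0[b  (two separate pairs: not balanced)
```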

pre.split_by_match differs from re.split, with pros and cons

Should I simply use re when I want its advantages, or does pre have a better way?

I prefer pre's behavior:

Pregex('aa').split_by_match("12aa34") --> ['12', '34']
Pregex('aa').split_by_match("12aa34aa") --> ['12', '34']
Pregex('aa').split_by_match("12aa34aaaa") --> ['12', '34']

But re has its pros:

re.split('aa',"12aa34") --> ['12', '34']
re.split('aa',"12aa34aa") --> ['12', '34', '']
re.split('aa',"12aa34aaaa") --> ['12', '34', '', '']

It guarantees that there was exactly one delimiter between each pair of consecutive elements.
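Both behaviours can be obtained from re alone: re.split keeps an empty string wherever two delimiters are adjacent (or a delimiter ends the text), and dropping the empty strings reproduces the split_by_match outputs listed above. This is a plain-re sketch, not a description of how split_by_match is implemented.

```python
import re

samples = ["12aa34", "12aa34aa", "12aa34aaaa"]

results = {}
for text in samples:
    raw = re.split("aa", text)                   # keeps '' between adjacent delimiters
    non_empty = [part for part in raw if part]   # drops the empty strings
    results[text] = (raw, non_empty)
    print(text, raw, non_empty)
```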

Contributing to pregex

This project is really cool. People (including myself) might be interested in contributing. Are you accepting Pull Requests and/or planning to add a Contributing guide?

I hope you can add a class for Korean characters.

I wrote the following code for Korean.
If Korean support is added officially, I think many Koreans will use it. :)

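from pregex.core.classes import AnyLetter

class AnyKor(AnyLetter):

    def __init__(self) -> 'AnyKor':
        '''
        Matches any Hangul consonant jamo or precomposed Hangul syllable.
        '''
        # Bypass AnyLetter.__init__ and call the parent class initializer directly.
        # '|' is omitted: inside [...] it would match a literal '|' character.
        super(AnyLetter, self).__init__('[ㄱ-ㅎ가-힣]', is_negated=False)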
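The character class can be checked with plain re before wrapping it in a pregex class. This sketch only tests the raw pattern and assumes nothing about pregex internals; it also shows that '|' inside a class is a literal character, not alternation.

```python
import re

hangul = r"[ㄱ-ㅎ가-힣]+"      # compatibility consonant jamo + precomposed syllables
with_pipe = r"[ㄱ-ㅎ|가-힣]+"  # original class: '|' is a literal member here

print(re.findall(hangul, "안녕 hello ㅋㅋ"))  # ['안녕', 'ㅋㅋ']
print(re.findall(with_pipe, "a|b"))          # ['|']
print(re.findall(hangul, "a|b"))             # []
```

Standalone vowel jamo such as ㅏ (U+314F) fall outside ㄱ-ㅎ; the range ㄱ-ㅣ would cover both consonant and vowel jamo.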

get_matches() finds 3 matches but has_match() only 1

I'm doing an exercise from here:

f) For the given list, filter all elements having a line starting with 'den' or ending with 'ly'.

items = ['love', '1\ndentist', 'fly\nfar', 'dent']
word = Indefinite(AnyLetter())
start_with_den = MatchAtLineStart("den" + word)
end_with_ly = MatchAtLineEnd(word + "ly")
pre = Either(start_with_den, end_with_ly)

The strange thing is that get_matches() finds 3 matches, but . . .

for i, item in enumerate(items):
    print(i, pre.get_matches(item))

# matches found in items 1, 2 and 3
0 []
1 ['dentist']
2 ['fly']
3 ['dent']

. . . while has_match() returns False for two of those three items!!

for i, item in enumerate(items):
    print(i, pre.has_match(item), item)

# only one True, at item 3! <------- Problem: there should be 3 True
0 False love
1 False 1
dentist
2 False fly
far
3 True dent
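The reported results are consistent with ^ and $ being applied without re.MULTILINE in has_match() but with it in get_matches(); that is an assumption, not something confirmed from pregex's source. The sketch below hand-writes an equivalent pattern (assuming AnyLetter() is [a-zA-Z] and Indefinite is *) and reproduces both outputs with plain re.

```python
import re

items = ['love', '1\ndentist', 'fly\nfar', 'dent']

# Hand-written approximation of Either(MatchAtLineStart("den" + word), MatchAtLineEnd(word + "ly"))
pattern = r"^den[a-zA-Z]*|[a-zA-Z]*ly$"

results = []
for item in items:
    single_line = bool(re.search(pattern, item))               # ^/$ anchor to the whole string
    multi_line = bool(re.search(pattern, item, re.MULTILINE))  # ^/$ anchor to every line
    results.append((single_line, multi_line))

print(results)  # [(False, False), (False, True), (False, True), (True, True)]
```

The first column matches the has_match() output above and the second matches the items where get_matches() found something.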

Collaboration

Welcome to the burgeoning field of trying to make regex human-friendly! We're always happy to have more people working on this gnarly problem.

Here is a list of some other attempts:
https://github.com/SonOfLilit/kleenexp#similar-works

May I invite you to have a look at mine in particular and maybe join hands? We're already in quite an advanced stage, having just launched a vscode extension that allows using our syntax almost natively in your text editor.

Cheers,
Aur

How to rewrite re.sub(pattern, '\\1_\\2', text) in PRegEx?

Converting camel case to snake case works fine when going through re:

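import re
from pregex.core.classes import AnyLowercaseLetter, AnyUppercaseLetter
from pregex.core.groups import Capture

text = "ConvertingCamelCaseToSnakeCase"
camel_case = Capture(AnyLowercaseLetter()) + Capture(AnyUppercaseLetter())
re.sub(camel_case.get_pattern(), '\\1_\\2', text).lower()
# 'converting_camel_case_to_snake_case'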

But how would I write the last line in PRegEx?
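Whether pregex offers its own substitution method with backreference support is not covered here, so this sketch stays with re and only tidies the call: a raw string avoids the doubled backslashes, and a callable replacement avoids backreference syntax altogether. The pattern ([a-z])([A-Z]) is a hand-written stand-in for Capture(AnyLowercaseLetter()) + Capture(AnyUppercaseLetter()).

```python
import re

text = "ConvertingCamelCaseToSnakeCase"
camel_case = re.compile(r"([a-z])([A-Z])")

# Backreferences in a raw string: equivalent to '\\1_\\2'
snake_1 = camel_case.sub(r"\1_\2", text).lower()

# Callable replacement: builds the replacement from the match groups
snake_2 = camel_case.sub(lambda m: f"{m.group(1)}_{m.group(2)}", text).lower()

print(snake_1)  # converting_camel_case_to_snake_case
print(snake_2)  # converting_camel_case_to_snake_case
```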

Use the PyPI 'regex' module instead of the built-in 're' module

I came across a limitation of the re module.
The regex module https://pypi.org/project/regex/ doesn't have it.
I am trying to write an assertion that rejects matches within brackets (parentheses).

def Not_within_brackets(pre): 
    anytext_re = Indefinite(Any())
    return pre.not_preceded_by(Pregex("(").enclose(anytext_re)).not_followed_by(Pregex(")").enclose(anytext_re))
pre=Not_within_brackets(Pregex("one"))
text="(not this one) this one"
print(pre.get_matches(text))

I get the following error from the underlying re module:
re.error: look-behind requires fixed-width pattern

I would like to use the regex module instead, unless there is another solution to this problem.

Many thanks !!!
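The third-party regex module does accept variable-width lookbehind, so the original idea should work there. With the built-in re module, a common workaround is to match bracketed spans and the target as alternatives, and keep the target only when it matched outside the brackets. This sketch assumes brackets are not nested; the helper name matches_outside_parens is hypothetical.

```python
import re
from typing import List, Tuple


def matches_outside_parens(word: str, text: str) -> List[Tuple[int, str]]:
    # Alternative 1 consumes a whole '(...)' span (no nesting assumed);
    # alternative 2 captures the word only when it is not inside such a span.
    pattern = re.compile(r"\([^()]*\)|(" + re.escape(word) + r")")
    return [(m.start(1), m.group(1)) for m in pattern.finditer(text) if m.group(1)]


print(matches_outside_parens("one", "(not this one) this one"))  # [(20, 'one')]
```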

Email format parsing failed

from pregex.classes import AnyButWhitespace
from pregex.quantifiers import OneOrMore
from pregex.operators import Either

text = "My email is [email protected]"

pre = (
    OneOrMore(AnyButWhitespace())
    + "@"
    + OneOrMore(AnyButWhitespace())
    + Either(".com", ".org", ".io", ".net")
)

a = pre.get_matches(text)
print(a)

error report:

Traceback (most recent call last):
  File "/media/zyf/software/pregex-lab/test06.py", line 13, in <module>
    + Either(".com", ".org", ".io", ".net")
  File "/home/zyf/anaconda3/envs/py310/lib/python3.10/site-packages/pregex/operators.py", line 69, in __init__
    super().__init__(pres, lambda pre1, pre2: pre1._either(pre2), __class__._PatternType.Either)
  File "/home/zyf/anaconda3/envs/py310/lib/python3.10/site-packages/pregex/operators.py", line 17, in __init__
    result = transform(result, __class__._to_pregex(pre))
  File "/home/zyf/anaconda3/envs/py310/lib/python3.10/site-packages/pregex/operators.py", line 69, in <lambda>
    super().__init__(pres, lambda pre1, pre2: pre1._either(pre2), __class__._PatternType.Either)
AttributeError: 'str' object has no attribute '_either'
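The traceback shows _either being called on the first argument while it is still a plain str, so in this pregex version the first alternative of Either apparently has to be a pregex object rather than a string (an inference from the traceback, not confirmed against the source). For comparison, the intended pattern written directly with re looks like this; the addresses are made-up sample data.

```python
import re

# \S+ stands in for OneOrMore(AnyButWhitespace()); the dots in the TLDs are escaped
email = re.compile(r"\S+@\S+(?:\.com|\.org|\.io|\.net)")

text = "Contact alice@example.com or bob@test.io"
print(email.findall(text))  # ['alice@example.com', 'bob@test.io']
```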

Cannot install pregex

Is there any way to install this package other than pip install pregex? I get this error:
[screenshot of the error]

negated character class?

I was considering using this to build a negated character class for the following task:

take a string and replace any non-alphanumeric character with an underscore

I ended up just using re and going with:

take a string and replace any non-word character with an underscore

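import logging
import re
def clean_title(t: str) -> str:
    cleaned = re.sub(r'\W', '_', t)  # replace each non-word character with '_'
    logging.getLogger(__name__).debug(f"{t=}, {cleaned=}")
    return cleaned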
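\W and an explicit negated class such as [^a-zA-Z0-9] are not quite the same: \W treats the underscore and, for str patterns, Unicode letters and digits as word characters, so it leaves them untouched. A short comparison with plain re:

```python
import re

title = "Caf\u00e9_Menu 2"

# Explicit negated class: anything outside ASCII letters and digits is replaced
ascii_only = re.sub(r"[^a-zA-Z0-9]", "_", title)

# \W: anything that is not a word character (Unicode letters/digits or '_') is replaced
non_word = re.sub(r"\W", "_", title)

print(ascii_only)  # Caf__Menu_2
print(non_word)    # Café_Menu_2
```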
