Comments (6)

stephenlb commented on August 25, 2024

Good find! Even linking to the RegEx standard showing that it works using the documented reference 👍

from regex.

stephenlb commented on August 25, 2024

Hope we can get this fixed soon 🔜🤞

BurntSushi commented on August 25, 2024

Your regex is kinda messed up. Specifically, this part (which is repeated):

[_[\w\d]*]?

Python regexes don't support nested character classes, unlike the regex crate. And because Python's regex engine follows the tradition of context-dependent escaping rules, meta characters like ] are treated literally when used in a context in which they cannot possibly have any special significance. Python therefore reads the inner [ as a literal member of the class, ends the class at the first ], and treats the trailing ] as a literal character. But, as can be seen in this case, it makes the regex quite deceptive. Here's a better way to write the same part of the pattern:

[_\[\w\d]*\]?

And indeed, using that with the regex crate produces the desired result:

use regex::Regex;

fn main() {
    let pattern = r"(?:private|group)[_\[\w\d]*\]?_abc1d2345678ef90ab3c4567890defab[_\[\w\d]*\]?";
    let compiled = Regex::new(pattern).unwrap();

    let test_haystacks = vec![
        "private_x9z45678abc12345d6e7890f123ghijk_abc1d2345678ef90ab3c4567890defab",
        "private_x9z45678abc12345d6e7890f123ghijk_abc1d2345678ef90ab3c4567890defab___[[[aaa111]",
        "private[_0f4f790_abc1d2345678ef90ab3c4567890defab",
    ];

    for test_haystack in &test_haystacks {
        // Report matches on stdout and non-matches on stderr.
        if compiled.is_match(test_haystack) {
            println!("PASS: {}", test_haystack);
        } else {
            eprintln!("FAIL: {}", test_haystack);
        }
    }
}

(I also switched to using raw strings via r"..." so that you don't need to do double escaping.)
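For anyone who wants to check the Python side of this, here is a minimal sketch using only the standard library's re module. It compares the fragment as originally written with the escaped spelling on a handful of inputs. The sample strings and the helper names fullmatches and compare are made up for this illustration; they are not from the issue.

```python
import re

# The fragment as written in the issue, and the explicitly escaped spelling.
ORIGINAL = r"[_[\w\d]*]?"
ESCAPED = r"[_\[\w\d]*\]?"

# Hypothetical sample inputs, chosen only for this sketch.
SAMPLES = ["", "_", "[[a1_", "abc]", "]", "a]]", "*"]


def fullmatches(pattern, text):
    """Return True if `pattern` matches all of `text` under Python's re."""
    return re.fullmatch(pattern, text) is not None


def compare(samples=SAMPLES):
    """Return (text, original_result, escaped_result) for each sample."""
    return [(s, fullmatches(ORIGINAL, s), fullmatches(ESCAPED, s)) for s in samples]


if __name__ == "__main__":
    for text, orig_ok, esc_ok in compare():
        print(f"{text!r:10} original={orig_ok} escaped={esc_ok}")
```

Under Python's re, both spellings agree on every sample. In particular, "*" is rejected by both, because Python treats the * as a quantifier on the class rather than as a member of it.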

BurntSushi commented on August 25, 2024

> Good find! Even linking to the RegEx standard showing that it works using the documented reference 👍

This isn't a bug and there is no requirement that this crate matches Python's regex engine in all cases. There's also no regex standard at play here (governing either Python's or Rust's regex engine).

CodyPubNub commented on August 25, 2024

Hi @BurntSushi 👋

I don't consider this issue invalid.

I'm not in a position to change the un-compiled regular expressions as they are provided by end users, and if they're compilable, which they are, they are expected to be searchable.

Do you have any particular guidance toward a solution for compatibility?

BurntSushi commented on August 25, 2024

I don't know what you mean by your assertion that they are "compatible."

There is literally an unbounded number of ways in which Python regexes are different than Rust regexes. And this generally applies to all pairs of regex engines unless they very strictly follow a standard. (Of which, generally speaking, only two are prevalent: POSIX and ECMA. Neither Python's regex engine nor Rust's regex engine follows either one.)

> I don't consider this issue invalid.

I want to be clear here that this issue is definitively invalid within the scope of this project. That doesn't mean you don't have a problem. You might have a problem on your end where you have a pile of regexes that worked with one regex engine and need to use them, unchanged, with some other regex engine. But that isn't really a problem I can help with and is in general not a problem that can be easily solved for any two regex engines. (Unless your patterns happen to incidentally behave the same, or as I mentioned above, the regex engines strictly adhere to an existing standard.)
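One narrow piece of that problem can at least be detected mechanically. Here is a minimal sketch in Python, assuming the patterns use Python's re syntax and contain no verbose-mode or (?#...) comments. It reports every unescaped [ that sits inside a character class, which is the construct from this thread that Python reads as a literal and the regex crate reads as a nested class. The function name find_literal_open_brackets is hypothetical, and this flags only this one difference; it does not make two engines compatible.

```python
def find_literal_open_brackets(pattern):
    """Return indices of unescaped '[' found inside a character class.

    Follows Python's re rules for where a class ends: a backslash escapes
    the next character, and a ']' directly after '[' or '[^' is a literal.
    """
    hits = []
    i = 0
    n = len(pattern)
    in_class = False
    while i < n:
        ch = pattern[i]
        if ch == "\\":
            i += 2  # skip the backslash and the character it escapes
            continue
        if not in_class:
            if ch == "[":
                in_class = True
                i += 1
                if i < n and pattern[i] == "^":
                    i += 1  # negation marker
                if i < n and pattern[i] == "]":
                    i += 1  # a leading ']' is a literal class member
                continue
        elif ch == "]":
            in_class = False
        elif ch == "[":
            hits.append(i)
        i += 1
    return hits


if __name__ == "__main__":
    print(find_literal_open_brackets(r"[_[\w\d]*]?"))    # -> [2]
    print(find_literal_open_brackets(r"[_\[\w\d]*\]?"))  # -> []
```

A non-empty result only says the pattern deserves a manual look before it is handed to an engine with nested classes.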

> Do you have any particular guidance toward a solution for compatibility?

Well... of course not. Because I don't really know the structure of the problem you're trying to solve. All that's been presented to me here is a regex that works one way in Python and a seeming request to have it work the same way in Rust. But that will definitively not happen. As far as solving your problem in a different way, I don't know because I don't know what problem you're trying to solve. If, for example, these regexes are provided by end users and you've promised that the regex syntax is equivalent to whatever Python supports, then you need to use a regex engine with the goal of compatibility with Python's regex engine. (Of which, I believe only one exists: the re module in Python's standard library. The third-party regex package on PyPI might also have enough compatibility to work for you.)
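If Python's syntax is what was promised, a minimal sketch of that route could look like the following. It assumes the original pattern was the corrected one shown earlier with the quoted fragment put back in; that is a reconstruction, since the full original isn't shown in this thread. The helper names compile_user_pattern and matching_haystacks are made up for illustration.

```python
import re

# Assumption: reconstructed original pattern (the unescaped fragment
# substituted back into the corrected pattern from earlier in the thread).
USER_PATTERN = (
    r"(?:private|group)[_[\w\d]*]?"
    r"_abc1d2345678ef90ab3c4567890defab[_[\w\d]*]?"
)

# The three haystacks from the Rust example earlier in the thread.
HAYSTACKS = [
    "private_x9z45678abc12345d6e7890f123ghijk_abc1d2345678ef90ab3c4567890defab",
    "private_x9z45678abc12345d6e7890f123ghijk_abc1d2345678ef90ab3c4567890defab___[[[aaa111]",
    "private[_0f4f790_abc1d2345678ef90ab3c4567890defab",
]


def compile_user_pattern(pattern):
    """Compile an end-user pattern with Python's re; return None if invalid."""
    try:
        return re.compile(pattern)
    except re.error:
        return None


def matching_haystacks(pattern, haystacks):
    """Return the haystacks in which the pattern is found anywhere."""
    compiled = compile_user_pattern(pattern)
    if compiled is None:
        return []
    return [h for h in haystacks if compiled.search(h)]


if __name__ == "__main__":
    for haystack in matching_haystacks(USER_PATTERN, HAYSTACKS):
        print("PASS:", haystack)
```

Because the patterns are evaluated by the engine whose syntax was promised, they stay unchanged and there is nothing to translate.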
