aligrudi commented on June 4, 2024

from neatvi.

kyx0r commented on June 4, 2024

Good point about the "[(]". I guess we have to add one more check there, then.

The problematic regex that I ran into was this:

([^\t -,.-/:-@[-^{-~]+:).+;

To test, put it into any syntax highlighting filetype, for example C, and see that everything breaks.

Well, brackets can have nested classes, but they still can't have groups. Even if they could, neatvi does not implement that, so there is still no reason to parse the bracket expression the way it is parsed currently.
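
To make the point about groups inside brackets concrete, here is a minimal sketch that asks a POSIX regcomp() how many groups it sees. It assumes a system with <regex.h>; posix_group_count is a hypothetical helper name and is not part of neatvi.

```c
#include <regex.h>

/* Returns the number of groups regcomp() reports for an extended
 * regular expression, or -1 if the pattern does not compile. */
static int posix_group_count(const char *pattern)
{
        regex_t re;
        int n;

        if (regcomp(&re, pattern, REG_EXTENDED) != 0)
                return -1;
        n = (int)re.re_nsub;    /* number of parenthesized subexpressions */
        regfree(&re);
        return n;
}
```

With this helper, "[(]" reports 0 groups and "(a)[(](b)" reports 2, because a "(" inside a bracket expression is an ordinary character. Any group counter that runs before regcomp() has to agree with these numbers.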


kyx0r commented on June 4, 2024

Though it seems like we will have to skip the bracket somehow, because even though ( can be escaped, the backslash will make it into the bracket expression. I'll see if I can implement it more cleanly, then.


aligrudi commented on June 4, 2024


kyx0r commented on June 4, 2024

I came up with scratch code that should skip all brackets. (Not tested.)

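static int re_groupcount(char *s)
{
        /* brk: bracket depth, n: group count; the first character is handled here */
        int brk = *s == '[' ? 1 : 0;
        int n = *s == '(' ? 1 : 0;
        /* after s++, s[0] is the current character and s[-1] the previous one */
        while (*s++) {
                /* count an unescaped '(' only outside of brackets */
                if (!brk && s[0] == '(' && s[-1] != '\\')
                        n++;
                else if (s[0] == '[' && s[-1] != '\\')
                        brk++;
                else if (s[0] == ']' && s[-1] != '\\')
                        brk--;
        }
        return n;
}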

What do you think? It should work correctly and cover all cases, right?


kyx0r commented on June 4, 2024

It also has to check its own bracket state, so that only the outermost brackets are tracked. Is this it?

static int re_groupcount(char *s)
{
        int brk = *s == '[' ? 1 : 0;
        int n = *s == '(' ? 1 : 0;
        while (*s++) {
                if (!brk && s[0] == '(' && s[-1] != '\\')
                        n++;
                /* open a bracket only when not already inside one */
                else if (!brk && s[0] == '[' && s[-1] != '\\')
                        brk++;
                /* close only when a bracket is open, so brk stays 0 or 1 */
                else if (brk && s[0] == ']' && s[-1] != '\\')
                        brk--;
        }
        return n;
}
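
The difference between the two drafts is easiest to see side by side. The sketch below copies both versions under the hypothetical names groupcount_v1 and groupcount_v2 (the names and the const qualifier are additions for illustration; the logic is unchanged from the thread).

```c
/* First draft: every unescaped '[' or ']' changes the depth,
 * so brk can grow past 1 or drop below 0. */
static int groupcount_v1(const char *s)
{
        int brk = *s == '[' ? 1 : 0;
        int n = *s == '(' ? 1 : 0;
        while (*s++) {
                if (!brk && s[0] == '(' && s[-1] != '\\')
                        n++;
                else if (s[0] == '[' && s[-1] != '\\')
                        brk++;
                else if (s[0] == ']' && s[-1] != '\\')
                        brk--;
        }
        return n;
}

/* Second draft: only the outermost bracket is tracked,
 * so brk is always 0 or 1. */
static int groupcount_v2(const char *s)
{
        int brk = *s == '[' ? 1 : 0;
        int n = *s == '(' ? 1 : 0;
        while (*s++) {
                if (!brk && s[0] == '(' && s[-1] != '\\')
                        n++;
                else if (!brk && s[0] == '[' && s[-1] != '\\')
                        brk++;
                else if (brk && s[0] == ']' && s[-1] != '\\')
                        brk--;
        }
        return n;
}
```

For "[[](a)" the first draft returns 0, because the inner "[" raises the depth to 2 and the single "]" never brings it back to 0, while the second draft returns 1. For "[^]](a)" the first draft also returns 0, since the second "]" pushes the depth to -1, and the second draft returns 1. Both return 2 for "(a)[(](b)".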


kyx0r commented on June 4, 2024

Seems to be good enough. This of course assumes that parentheses and brackets are balanced, but if they aren't, too bad: the regex won't compile either way.


kyx0r commented on June 4, 2024

Okay, screw this. It seems the cleanest solution is to implement bracket escapes in my regex engines after all. I could parse and run proper nested looping and balance the ( and [ to accommodate all the edge cases, but that alone is too much to take on. On the other hand, supporting escapes in brackets is just one if statement.


aligrudi commented on June 4, 2024


kyx0r commented on June 4, 2024

Hello Ali,

I don't know if this is any good.
In my forks, I just made escapes work inside the brackets. So you now (sometimes) have to escape "(" inside brackets for rset to count the groups correctly. I leave that responsibility to the user, who has to know when an escape is needed.

With that, though, I have a very clean codebase: my re_groupcount() is a straightforward implementation that just counts unescaped "(".
See regex.c here: kyx0r/nextvi@739089d
The escape implementation is cheap, just one if statement (way better than doing all that stuff in your patch).
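
For readers who don't want to open the commit, a counter of that kind could look roughly like the sketch below. This is not the nextvi code; count_unescaped_groups is a hypothetical name, and the sketch assumes a backslash escapes the next character everywhere, including inside brackets.

```c
/* Counts every '(' that is not preceded by an escaping backslash.
 * Brackets get no special treatment: a literal '(' inside [...]
 * must be written as "\(" or it is counted as a group. */
static int count_unescaped_groups(const char *s)
{
        int n = 0;

        for (; *s; s++) {
                if (*s == '\\' && s[1]) {
                        s++;            /* skip the escaped character */
                        continue;
                }
                if (*s == '(')
                        n++;
        }
        return n;
}
```

With the pattern text [\(](a) this returns 1, while the unescaped [(](a) returns 2, which is the "user has to know when an escape is needed" trade-off described above.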

The commit history is a bit messy because I was working on 4 different things at the same time,
and this bug was really annoying and kept getting in the way. I had to put escapes on some regexes in conf.c. I hope I didn't mess anything up there for hls like tex and those dirmarks, because I don't know how they behaved before. It seems stuff like

{+0, +1, 1, "\\\\\\*\\[([^]]+)\\]"},

is rather niche. I suppose you wanted the text not to be in reverse if it's inside \*[]? The ([^]]+) part is ambiguous, but I changed it to ([^\\]]+) with an escaped ], so it isn't ambiguous anymore. (Fun fact: before I implemented escapes, pikevm actually treated that expression as if it were an empty [] regex; it might even be bugged in your version of neatvi right now.) I made similar changes to other expressions in conf.
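
For comparison, POSIX engines resolve the [^]] case by treating a "]" that comes first in the bracket (after an optional "^") as a literal. The sketch below checks that against the C library's regcomp(); it assumes <regex.h> is available, and first_match_len is a hypothetical helper name.

```c
#include <regex.h>

/* Length of the first match of an extended regex in text,
 * or -1 if there is no match or the pattern fails to compile. */
static int first_match_len(const char *pattern, const char *text)
{
        regex_t re;
        regmatch_t m;
        int len = -1;

        if (regcomp(&re, pattern, REG_EXTENDED) != 0)
                return -1;
        if (regexec(&re, text, 1, &m, 0) == 0)
                len = (int)(m.rm_eo - m.rm_so);
        regfree(&re);
        return len;
}
```

On a POSIX-conforming library, first_match_len("[^]]+", "abc]def") gives 3: the bracket means "anything except ]", so the match stops right before the "]".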

Also take a look at this commit: kyx0r/nextvi@f13348d

Yes, yes, RIP bracket classes. I don't implement them anymore; the readme explains why.

At the end of the day, it's up to you to decide how you want to fix it in your version. I went with the simplest solution possible.


aligrudi commented on June 4, 2024


kyx0r commented on June 4, 2024

I haven't read the POSIX manpage, specifically the part about \ having special meaning. I will read it now.

But at this point I don't really try to be 100% compliant with every nitpick in the standard.
I'm just trying to do the things that make sense, i.e. make the implementation cleaner and more maintainable by removing
code that isn't impactful or essential. Yes, the downside is that I might theoretically get different behavior if I swapped the regex engine for the C library implementation, for example. But I see no reason to ever do so. My regex code is 3X faster than the musl C regex library and is 630 LOC, while theirs is like 3000+ LOC. Looking at musl, you can only imagine how bloated glibc's regex is.

One thing about \: the code still lets you get it, you just have to put it twice. The first one is the escape and the second is treated like a regular character inside the bracket.


kyx0r commented on June 4, 2024

"all other special characters, including '\', lose their special significance within a bracket expression." regex(7)
Ok, I misread your email. That's basically it, nothing new here. If I don't comply with that, I guess the escapes will be treated as literal characters in other POSIX-compliant engines.
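
The regex(7) rule quoted above can be checked directly against the C library. This is a minimal sketch assuming a POSIX <regex.h>; posix_matches is a hypothetical helper name.

```c
#include <regex.h>
#include <stddef.h>

/* Returns 1 if the extended regex matches text, 0 if it does not,
 * and -1 if the pattern fails to compile. */
static int posix_matches(const char *pattern, const char *text)
{
        regex_t re;
        int rc;

        if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
                return -1;
        rc = regexec(&re, text, 0, NULL, 0);
        regfree(&re);
        return rc == 0 ? 1 : 0;
}
```

With the pattern text ^[\(]$ a POSIX engine matches both a lone backslash and a lone "(", because the backslash is just another member of the bracket. An engine that honors escapes inside brackets would match only "(", which is the incompatibility discussed here.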


kyx0r commented on June 4, 2024

Okay, I might have changed my mind on this topic again...
I was thinking in the abstract about how my proposed code does not work correctly on cases like ([a(s])-(d])
But this is inherently ambiguous: does the bracket end at [a(s] or does it extend to [a(s])-(d] ?
Given such expressions, the regex engine will always pick the shortest bracket end, in this case
[a(s]. The same principle applies to the algorithm I wrote above. That means it should always
come up with the same number of groups that regcomp will: the shortest path possible. And that is the
problem we are trying to solve here. Bracket classes always end with :] so : is the escape
character for them (sort of). I don't include .] and =] because neatvi does not use those,
though adding support for them is a matter of adding one more && condition. But then
there is another problem: [:] alone will still count groups incorrectly.

static int re_groupcount(char *s)
{
        int brk = *s == '[' ? 1 : 0;
        int n = *s == '(' ? 1 : 0;
        while (*s++) {
                if (!brk && s[0] == '(' && s[-1] != '\\')
                        n++;
                /* a '[' followed by ':' is not taken as the start of a bracket */
                else if (!brk && s[0] == '[' && s[-1] != '\\' && s[1] != ':')
                        brk++;
                /* a ']' preceded by ':' is taken as the end of a class, not of the bracket */
                else if (brk && s[0] == ']' && s[-1] != '\\' && s[-1] != ':')
                        brk--;
        }
        return n;
}
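
To see how this last version behaves on the cases mentioned in this comment, here is the same function under the hypothetical name groupcount_v3 (the name and the const qualifier are additions for illustration; the logic is copied from the thread).

```c
/* Third draft: tracks only the outermost bracket and treats ":]"
 * as the end of a character class rather than of the bracket. */
static int groupcount_v3(const char *s)
{
        int brk = *s == '[' ? 1 : 0;
        int n = *s == '(' ? 1 : 0;
        while (*s++) {
                if (!brk && s[0] == '(' && s[-1] != '\\')
                        n++;
                else if (!brk && s[0] == '[' && s[-1] != '\\' && s[1] != ':')
                        brk++;
                else if (brk && s[0] == ']' && s[-1] != '\\' && s[-1] != ':')
                        brk--;
        }
        return n;
}
```

For ([a(s])-(d]) it returns 2, closing the bracket at the first "]" as described. For [[:alpha:](](a) it returns 1, because the "]" of ":]" does not close the bracket. For [:](a) it returns 0 even though there is one group: the "]" after ":" is mistaken for the end of a class, so the bracket never closes. That is the remaining problem noted above.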

Finally, enough thinking about this, or my brain will melt.
It truly makes me appreciate the simplicity of the solution that escapes in brackets provide.

