Escaping glob patterns (dotnet.glob issue, 9 comments, closed)

dazinator commented on June 19, 2024
Escaping glob patterns

from dotnet.glob.

Comments (9)

dazinator commented on June 19, 2024

Escaping isn't currently supported by the parser. However you should be able to achieve it with the fluent builder:

var glob = new GlobBuilder()
    .PathSeperator()
    .Literal("my*files")
    .PathSeperator()
    .Literal("more[stuff]")
    .PathSeperator()
    .Literal("is-there-more?")
    .PathSeperator()
    .ToGlob();

Supporting it at the parser level is something I'd like to add in the future.

I guess you have to give some thought to the escape sequence. If we went with a backslash, you could not necessarily assume it is always an escape character, as someone might just be matching a path like this: \foo\bar. So it would only need to be interpreted by the parser as an escape character if the following character would not ordinarily be parsed as a literal. I think with that additional check it should work. Sound right to you?

cocowalla commented on June 19, 2024

Fluent builder wouldn't work for my use case, as I need to parse arbitrary patterns (one I used was just an example).

Escaping with a backslash and only treating it as an escape sequence if followed by a special character sounds generally like the correct approach to me too.

cocowalla commented on June 19, 2024

So, I thought I'd have a crack at this! I thought I was done, and was just adding some more test cases... then I realised that I don't think escaping with a backslash is going to work :(

Consider this pattern (which is likely to be very common):
c:\MyFolder\*.txt

The intent is of course to look for *.txt files within c:\MyFolder, but the asterisk would be treated as a literal, so that's not going to work.

Same deal:

c:\MyFolder\[abc]de

If we use backslash as an escape character, the result would be a literal c:\MyFolder[abc]de.

You might think that at least it's not an issue on Linux, but technically you can have backslashes in filenames (even if it's uncommon).

I'm not sure what a good solution is here. A simple solution is to make the escape character configurable, effectively pushing the problem to devs that use this library, but at least allowing them to choose something that works for their scenario.
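
The problem described above can be reproduced with a toy tokenizer. This is only a sketch of the "backslash escapes the next character if it is special" rule, not DotNet.Glob's actual parser; the names `SPECIAL`, `tokenize` and `has_wildcard` are made up for illustration, and Python is used purely to keep the example self-contained.

```python
# Toy tokenizer, NOT DotNet.Glob's parser: all names here are hypothetical.
SPECIAL = set("*?[")  # characters a glob parser would not treat as literals


def tokenize(pattern, escape="\\"):
    """Return (kind, char) pairs, where kind is 'literal' or 'special'.

    The escape character only acts as an escape when the next character
    is special; otherwise it is kept as an ordinary literal.
    """
    tokens = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        nxt = pattern[i + 1] if i + 1 < len(pattern) else None
        if ch == escape and nxt in SPECIAL:
            tokens.append(("literal", nxt))  # escaped: the wildcard is lost
            i += 2
        elif ch in SPECIAL:
            tokens.append(("special", ch))
            i += 1
        else:
            tokens.append(("literal", ch))
            i += 1
    return tokens


def has_wildcard(pattern):
    """True if any character in the pattern survives as a special token."""
    return any(kind == "special" for kind, _ in tokenize(pattern))


print(has_wildcard(r"\foo\bar"))           # False: backslashes stay literal
print(has_wildcard(r"c:\MyFolder\*.txt"))  # False: '\' swallows the '*'
print(has_wildcard("c:/MyFolder/*.txt"))   # True: no backslash before '*'
```

The first case behaves as intended, but the second shows the conflict: a Windows path separator directly before a wildcard is indistinguishable from an escape.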

dazinator commented on June 19, 2024

Yeah, it's tricky. Thanks for having a go at it.

I was just reading how other glob libraries approach it and came across this https://docs.python.org/3/library/glob.html

It mentions:

For a literal match, wrap the meta-characters in brackets. For example, '[?]' matches the character '?'.

:-) Perhaps that's an easier approach.
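
For reference, this is how the bracket convention behaves in Python's own standard library (`fnmatch.fnmatchcase` and `glob.escape`). It shows Python's behaviour only and makes no claim about how DotNet.Glob parses the same patterns; the sample strings are borrowed from the builder example earlier in this thread.

```python
import fnmatch
import glob

# A bracketed meta-character matches only itself.
print(fnmatch.fnmatchcase("is-there-more?", "is-there-more[?]"))  # True
print(fnmatch.fnmatchcase("is-there-moreX", "is-there-more[?]"))  # False

# glob.escape() wraps '*', '?' and '[' in brackets automatically.
raw = "my*files/more[stuff]/is-there-more?"
escaped = glob.escape(raw)
print(escaped)                            # my[*]files/more[[]stuff]/is-there-more[?]
print(fnmatch.fnmatchcase(raw, escaped))  # True: the escaped pattern matches the raw text
```

Because no dedicated escape character is involved, backslashes in Windows paths are left alone.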

cocowalla commented on June 19, 2024

Huh, that could be quite a clever solution! I'll give it a try tonight and see how it looks.

dazinator commented on June 19, 2024

@cocowalla

I have done some more investigation on this, and it turns out that no logic changes are necessary for handling escaping. It should already just work, so I ended up removing the escape sequence parsing.

I added these test cases to IsMatch and they all passed:

        [InlineData(@"C:\myergen\[[]a]tor", @"C:\myergen\[a]tor")]
        [InlineData(@"C:\myergen\[[]ator", @"C:\myergen\[ator")]
        [InlineData(@"C:\myergen\[[][]]ator", @"C:\myergen\[]ator")]
        [InlineData(@"C:\myergen[*]ator", @"C:\myergen*ator")]
        [InlineData(@"C:\myergen[*][]]ator", @"C:\myergen*]ator")]
        [InlineData(@"C:\myergen[*]]ator", @"C:\myergen*ator", @"C:\myergen]ator")]
        [InlineData(@"C:\myergen[?]ator", @"C:\myergen?ator")]
        [InlineData(@"/path[\]hatstand", @"/path\hatstand")]
        public void IsMatch(string pattern, params string[] testStrings)

I noticed that in one of the test cases you added, I think you were expecting this match:

C:\myergen[*]]ator pattern to match: "C:\myergen*]ator"

This isn't actually how this is currently interpreted. The above pattern is actually still a character list, so it will expect to match any one character in that list, which means

C:\myergen[*]]ator matches either "C:\myergen*ator" or "C:\myergen]ator".

To match "C:\myergen*]ator" you would want to use this pattern:

C:\myergen[*][]]ator

Hopefully that makes sense.

This feature should already work, but as part of the feature branch we need to just extend the README with a section explaining how escaping works. I'll get around to that at some point no doubt.

dazinator commented on June 19, 2024

Negation also passes:

        [InlineData(@"/foo/bar[!!].baz", @"/foo/bar7.baz")] // anything except an exclamation mark after bar
        [InlineData(@"/foo/bar[!]].baz", @"/foo/bar9.baz")] // anything except a ] after bar
        [InlineData(@"/foo/bar[!?].baz", @"/foo/bar7.baz")] // anything except a ? after bar
        [InlineData(@"/foo/bar[![].baz", @"/foo/bar7.baz")] // anything except a [ after bar

cocowalla commented on June 19, 2024

The above pattern is actually still a character list, so it will expect to match any one character

^ emphasis mine :) Yeah, I messed that one up!

it turns out that no logic changes are necessary for handling escaping.. It should already just work!

Excellent 👍

dazinator commented on June 19, 2024

Cool. Well, thanks for the PR, the added tests, and the removal of the unnecessary options. I think this has helped tidy it up a bit. I've merged this to develop, and I'll probably merge to master pretty soon as an incremental release.
