Giter Site home page Giter Site logo

Comments (5)

blester125 avatar blester125 commented on August 14, 2024

I took a crack at making a grammar for parsing .gitattributes lines with the lark library. I doesn't have out of the box support for outputting text from a parse tree, but if we add that we could manipulate attts with a parse -> munge tree (add a filter=theta node) -> output new text line.

One thing I noticed it that the gitattributes pattern matching is a bit more complex than cli glob matching. For example it supports [[:space:]] for a space character and automatically translates between \ in filenames to a / in patterns. Right now we are using fnmatch.fnmatchcase for pattern match checking between a file and the current git attributes so we are going to have some false negatives right now (which is ok as the behavior for no match adds a new attribute whose pattern is an exact match for the file)

?start: line

line : comment
     | pattern " " [attr (" " attr)*]

comment: "#"/.+/?

pattern : C_ESCAPED_STRING
        | UNESCAPED_PATTERN

C_ESCAPED_STRING : "\"" CHAR_SEQUENCE "\""
CHAR_SEQUENCE : CHAR+
CHAR : BASIC_CHAR
     | ESCAPE
BASIC_CHAR : /[^"\\]/
ESCAPE : "\'"
       | "\""
       | "\?"
       | "\\"
       | "\a"
       | "\b"
       | "\f"
       | "\n"
       | "\r"
       | "\t"
       | "\v"
       | /\\\d{3,}/
       | /\\o\d*/
       | /\\x\d*/
       | /\\u\d{4,}/
       | /U\d{8,}/
       | "\N{" N_CHAR_SEQUENCE "}"

N_CHAR_SEQUENCE : N_CHAR+
N_CHAR : "A".."Z"
       | "0".."9"
       | "-"
       | " "

UNESCAPED_PATTERN : /[^ \t\f\r\n"]/+

attr : set
     | unset
     | set_value

set : EFFECT
unset : "-"EFFECT
set_value : EFFECT "=" value

EFFECT : "text"
       | "eol"
       | "working-tree-encoding"
       | "ident"
       | "filter"
       | "diff"
       | "merge"
       | "whitespace"
       | "export-ignore"
       | "export-subst"
       | "delta"
       | "encoding"
       | "binary"

value : UNESCAPED_PATTERN

There is enough complexity that I think it warrants some dedicated work and testing. But it seems complex enough (and our default actions when we don't processes it as correctly as we should are reasonable) that it will end up being pretty high effort.

from git-theta.

blester125 avatar blester125 commented on August 14, 2024

WRT the python stdlib version of fnmatch not handling named character classes like [:space:] I found this library https://facelessuser.github.io/wcmatch/fnmatch/ which does handle them. So adding correct pattern matching to an abstracted gitattributes library will be a bit easier.

from git-theta.

craffel avatar craffel commented on August 14, 2024

Cool! So just to clarify, you think it's worth writing a dedicated gitattributes parsing/modifying library, but that it will be a good amount of work? I'm also a little confused as to how writing out will work if lark doesn't support going from a parse tree back to syntax.

from git-theta.

blester125 avatar blester125 commented on August 14, 2024

writing out will work if lark doesn't support going from a parse tree back to syntax.

We would need to write something like a tree visitor that outputs the right substrings based on tree nodes, would be annoying but not too hard (esp if ok with some small visual changes like removal of extra spaces). Or we can find a library that supports round tripping out of the box.

So just to clarify, you think it's worth writing a dedicated gitattributes parsing/modifying library,

I am undecided (boo, I know). There are definitely edge cases when working with gitattributes that we are mishandling, but our fallback behavior results in creating very specific gitattribute rules which override things we miss. So the result is things are still happening correctly, just in a round-about way.

So a library would be good from the aspect of being more correct, but it wouldn't necessarily fix anything that is broken now, or protect us that much in the future.

from git-theta.

craffel avatar craffel commented on August 14, 2024

Ok, maybe this is an if-it-ain't-broke-don't-fix-it situation - we can punt on this until after the proof of concept is done, and then revisit it later on (especially if someone is really excited about writing such a library)?

from git-theta.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.