Comments (5)
I took a crack at making a grammar for parsing .gitattributes
lines with the lark library. I doesn't have out of the box support for outputting text from a parse tree, but if we add that we could manipulate attts with a parse -> munge tree (add a filter=theta node) -> output new text line
.
One thing I noticed it that the gitattributes pattern matching is a bit more complex than cli glob matching. For example it supports [[:space:]]
for a space character and automatically translates between \
in filenames to a /
in patterns. Right now we are using fnmatch.fnmatchcase
for pattern match checking between a file and the current git attributes so we are going to have some false negatives right now (which is ok as the behavior for no match adds a new attribute whose pattern is an exact match for the file)
?start: line
line : comment
| pattern " " [attr (" " attr)*]
comment: "#"/.+/?
pattern : C_ESCAPED_STRING
| UNESCAPED_PATTERN
C_ESCAPED_STRING : "\"" CHAR_SEQUENCE "\""
CHAR_SEQUENCE : CHAR+
CHAR : BASIC_CHAR
| ESCAPE
BASIC_CHAR : /[^"\\]/
ESCAPE : "\'"
| "\""
| "\?"
| "\\"
| "\a"
| "\b"
| "\f"
| "\n"
| "\r"
| "\t"
| "\v"
| /\\\d{3,}/
| /\\o\d*/
| /\\x\d*/
| /\\u\d{4,}/
| /U\d{8,}/
| "\N{" N_CHAR_SEQUENCE "}"
N_CHAR_SEQUENCE : N_CHAR+
N_CHAR : "A".."Z"
| "0".."9"
| "-"
| " "
UNESCAPED_PATTERN : /[^ \t\f\r\n"]/+
attr : set
| unset
| set_value
set : EFFECT
unset : "-"EFFECT
set_value : EFFECT "=" value
EFFECT : "text"
| "eol"
| "working-tree-encoding"
| "ident"
| "filter"
| "diff"
| "merge"
| "whitespace"
| "export-ignore"
| "export-subst"
| "delta"
| "encoding"
| "binary"
value : UNESCAPED_PATTERN
There is enough complexity that I think it warrants some dedicated work and testing. But it seems complex enough (and our default actions when we don't processes it as correctly as we should are reasonable) that it will end up being pretty high effort.
from git-theta.
WRT the python stdlib version of fnmatch
not handling named character classes like [:space:]
I found this library https://facelessuser.github.io/wcmatch/fnmatch/ which does handle them. So adding correct pattern matching to an abstracted gitattributes library will be a bit easier.
from git-theta.
Cool! So just to clarify, you think it's worth writing a dedicated gitattributes parsing/modifying library, but that it will be a good amount of work? I'm also a little confused as to how writing out will work if lark doesn't support going from a parse tree back to syntax.
from git-theta.
writing out will work if lark doesn't support going from a parse tree back to syntax.
We would need to write something like a tree visitor that outputs the right substrings based on tree nodes, would be annoying but not too hard (esp if ok with some small visual changes like removal of extra spaces). Or we can find a library that supports round tripping out of the box.
So just to clarify, you think it's worth writing a dedicated gitattributes parsing/modifying library,
I am undecided (boo, I know). There are definitely edge cases when working with gitattributes that we are mishandling, but our fallback behavior results in creating very specific gitattribute rules which override things we miss. So the result is things are still happening correctly, just in a round-about way.
So a library would be good from the aspect of being more correct, but it wouldn't necessarily fix anything that is broken now, or protect us that much in the future.
from git-theta.
Ok, maybe this is an if-it-ain't-broke-don't-fix-it situation - we can punt on this until after the proof of concept is done, and then revisit it later on (especially if someone is really excited about writing such a library)?
from git-theta.
Related Issues (20)
- Add an "apply to all" option to merge actions
- Parameter groups that are more than just tensors? HOT 3
- Add a way to script merges
- Functionality for partial model loading HOT 3
- Method to tell if git-theta wasn't installed? HOT 4
- Pytorch Checkpoint reading
- Git Add can have high memory usage.
- Finer-grained control of `git theta install` HOT 1
- Tensorflow model loading/saving seems bugged
- `git theta ls-files` HOT 1
- Git-Theta Clean
- Hanging when crashing
- More intelligent concurrency limits
- Investigate using cffi to speed up git lfs interface
- Configurable Serialization, Combining, and Saving to a backend
- Add `__str__` to metadata object HOT 1
- Update CI to handle MacOS
- Add retry to end2end tests
- in the `clean` filter, auto-detect checkpoint handler based on file extension HOT 1
- [end2ends] push repos to Hugging Face Hub (and git clone from there) to ensure it works HOT 1
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from git-theta.