mherrmann / gitignore_parser
A spec-compliant gitignore parser for Python 3.5+
License: MIT License
It's as simple as that. With the following ignore file, nothing is matched:
*
!/keepme
The file "keepme" is matched by the * rule and is not inverted by the ! rule.
Looking through the source, I can see that the concept of a negation rule is in there. The ! is parsed, and 'negation' is set in an IgnoreRule. But it's never used.
More specifically, as lines are read from the ignore_file, each line is turned into a rule using rule_from_pattern. Those rules are then all evaluated via:
lambda file_path: any(r.match(file_path) for r in rules)
This treats all rules as equal, but negation rules are different: they negate any match made by a rule above them, so they need separate processing.
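The separate processing described above could look roughly like the following. This is only a minimal sketch, not the library's implementation: SketchRule and is_ignored are hypothetical names, and fnmatch stands in for the real pattern-to-regex translation. The idea is that rules are evaluated in file order and the last matching rule decides the result.

```python
from dataclasses import dataclass
from fnmatch import fnmatch


@dataclass
class SketchRule:
    """Hypothetical stand-in for IgnoreRule: a glob pattern plus a negation flag."""
    pattern: str
    negation: bool = False

    def match(self, name: str) -> bool:
        # fnmatch is a simplification; real gitignore patterns need more care.
        return fnmatch(name, self.pattern)


def is_ignored(name, rules):
    """Evaluate rules in order; the last rule that matches wins."""
    ignored = False
    for rule in rules:
        if rule.match(name):
            # A plain rule ignores the file, a negation rule re-includes it.
            ignored = not rule.negation
    return ignored


# The ignore file from the report: "*" followed by "!/keepme".
rules = [SketchRule("*"), SketchRule("keepme", negation=True)]
```

With these rules, is_ignored("keepme", rules) is False while any other name is still ignored, which is the behaviour the report expects.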
Hey @mherrmann, thanks for this Python module.
We discovered that a .gitignore pattern with a leading slash on a directory doesn't work: you still see all files within the root directory.
For instance, given this layout:
root
--node_modules
--etc
reading a .gitignore with a leading slash does not pick up those paths.
An example .gitignore may look like the following:
/node_modules
and the matcher returns False for it.
To put it in code, I think this test should pass:
def test_ignore_directory_no_slash(self):
    matches = _parse_gitignore_string('.venv', fake_base_dir='/home/michael')
    self.assertTrue(matches('/home/michael/.venv'))
    self.assertTrue(matches('/home/michael/.venv/folder'))
    self.assertTrue(matches('/home/michael/.venv/file.txt'))
Because according to man gitignore:
$ man gitignore | grep 'If there is a separator at the end of the pattern'
• If there is a separator at the end of the pattern then the pattern will only match directories, otherwise the pattern can match both files and directories.
I tried to use ** to indicate that I want all folders called 'Bar' at any level below foo/ to be ignored. This works fine in git, but given this .gitignore:
foo/**/Bar/
and
ignorePatterns = gitignore_parser.parse_gitignore(ignorefile)
# Raw strings keep the backslashes literal (otherwise \f and \t are escape sequences)
ignorePatterns(r"c:\foo\Bar\foofile") # returns False
ignorePatterns(r"c:\foo\test\Bar\foofile") # returns False
ignorePatterns(r"c:\foo\test\test\Bar\foofile") # returns False
Some smoke tests would be helpful.
When installing one of our projects, pip reports:
DEPRECATION: gitignore-parser is being installed using the legacy 'setup.py install' method, because it does not have a 'pyproject.toml' and the 'wheel' package is not installed. pip 23.1 will enforce this behaviour change. A possible replacement is to enable the '--use-pep517' option. Discussion can be found at https://github.com/pypa/pip/issues/8559
Just to notify you.
After upgrading from 0.1.0 to 0.1.1, I ran into an issue with an existing test. It looks like there is an unsafe operation on abs_path, which is assumed to be a string. That assumption is wrong, because most libraries support both types: [str, Path].
Code impacted: https://github.com/mherrmann/gitignore_parser/blob/v0.1.1/gitignore_parser.py#L143
import tempfile
from pathlib import Path

from gitignore_parser import parse_gitignore


def reproduce_issue():
    with tempfile.TemporaryDirectory() as git_dir:
        git_dir = Path(git_dir)
        gitignore_file_path = git_dir / ".gitignore"
        gitignore_file_path.write_text("file1\n!file2")
        matches = parse_gitignore(gitignore_file_path)
        assert matches(git_dir / "file1")
        assert not matches(git_dir / "file2")


if __name__ == '__main__':
    reproduce_issue()
Traceback (most recent call last):
  File "/Users/kukusan2/Workspace/gitignore_issue/reproduce_issue.py", line 20, in <module>
    reproduce_issue()
  File "/Users/kukusan2/Workspace/gitignore_issue/reproduce_issue.py", line 15, in reproduce_issue
    assert matches(git_dir / "file1")
  File "/Users/kukusan2/Workspace/gitignore_issue/venv2/lib/python3.9/site-packages/gitignore_parser.py", line 36, in <lambda>
    return lambda file_path: handle_negation(file_path, rules)
  File "/Users/kukusan2/Workspace/gitignore_issue/venv2/lib/python3.9/site-packages/gitignore_parser.py", line 11, in handle_negation
    if rule.match(file_path):
  File "/Users/kukusan2/Workspace/gitignore_issue/venv2/lib/python3.9/site-packages/gitignore_parser.py", line 143, in match
    if self.negation and abs_path[-1] == '/':
TypeError: 'PosixPath' object is not subscriptable
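One way to make the trailing-slash check tolerate both str and Path input is to convert the argument with os.fspath before inspecting it. The helper below is a hypothetical sketch (has_trailing_slash is not a function of gitignore_parser), shown only to illustrate the idea.

```python
import os
from pathlib import PurePosixPath


def has_trailing_slash(abs_path) -> bool:
    """Return True if abs_path (a str or os.PathLike) ends with '/'.

    os.fspath() turns path objects into plain strings, and str.endswith()
    is also safe for an empty string, unlike indexing with [-1].
    """
    return os.fspath(abs_path).endswith('/')


# A str keeps its trailing slash; a path object has already dropped it.
from_str = has_trailing_slash('/repo/build/')
from_path = has_trailing_slash(PurePosixPath('/repo/build/'))
```

Note that a Path object never carries a trailing slash, so directory-only intent can only be detected when the caller passes a string.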
Hello, the unit tests fail on Windows runners, specifically the trailing whitespace tests. This started with the recent changes made to fix issues with resolving symlinks. The Path call in _normalize_path retains trailing whitespace on Linux, but removes it on Windows.
A simple (but not elegant) fix I found was to re-apply the trailing whitespace on Windows systems.
# At bottom of file
def _count_trailing_whitespace(text: str):
    count = 0
    for char in reversed(str(text)):
        if char.isspace():
            count += 1
        else:
            break
    return count

# In IgnoreRule
def match(self, abs_path):
    """Returns True or False if the path matches the rule."""
    matched = False
    if self.base_path:
        rel_path = str(_normalize_path(abs_path).relative_to(self.base_path))
    else:
        rel_path = str(_normalize_path(abs_path))
    # Path() strips trailing spaces on windows
    if sys.platform.startswith('win'):
        rel_path += " " * _count_trailing_whitespace(abs_path)
    # Path() strips the trailing slash, so we need to preserve it
    # in case of directory-only negation
    if self.negation and isinstance(abs_path, str) and abs_path[-1] == '/':
        rel_path += '/'
    if rel_path.startswith('./'):
        rel_path = rel_path[2:]
    if re.search(self.regex, rel_path):
        matched = True
    return matched
While Python allows you to use tabs or spaces, the community generally chooses spaces.
I noticed this repo contains a mixture, with the majority being tabs.
I would argue that spaces are more in line with the community, and they are the indentation method recommended by PEP 8.
See: https://www.python.org/dev/peps/pep-0008/
If you agree, I can open a PR with these changes and, once #15 is merged, add a linter to enforce this in CI.
Would you be interested in setting up Codacy integration for this project?
It's free and relatively painless, aside from tuning the tools that are used to lint the project, and it can scan for multiple things like:
It can also be used to collect code coverage metrics from the test suite, which I can help set up.
The gitignore specification states:
when deciding whether to ignore a path, Git normally checks gitignore patterns from multiple sources [...] patterns read from a .gitignore file in the same directory as the path, or in any parent directory, with patterns in the higher level files (up to the top level of the work tree) being overridden by those in lower level files down to the directory containing the file
So it would be nice to have a way to build a list of ignore rules not only from a specific .gitignore file, but from all .gitignore files in a given directory tree.
For my work project, I reused your code and built a custom ignore function for shutil.copytree. This work could be backported to your project if you think the idea fits your vision for this package. I'm waiting to hear back from you before opening a PR.
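As a rough illustration of the first step of that idea, the sketch below collects every .gitignore file in a directory tree, with higher-level files listed before lower-level ones. find_gitignore_files is a hypothetical helper, not part of gitignore_parser; each returned file could then presumably be passed to parse_gitignore with base_dir set to its own directory, and how the resulting matchers are combined so that lower levels override higher ones is left open here.

```python
import os
import tempfile
from pathlib import Path


def find_gitignore_files(root):
    """Return all .gitignore paths under root, parent directories first."""
    found = []
    # os.walk is top-down by default, so a directory is always visited
    # before any of its subdirectories.
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # make the traversal order deterministic
        if '.gitignore' in filenames:
            found.append(Path(dirpath) / '.gitignore')
    return found


# Small demonstration on a throwaway tree.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / 'sub' / 'deep').mkdir(parents=True)
    (root / '.gitignore').write_text('build\n')
    (root / 'sub' / 'deep' / '.gitignore').write_text('!build\n')
    relative = [p.relative_to(root).as_posix() for p in find_gitignore_files(root)]
```

Keeping the list ordered from the top of the tree downwards makes it straightforward to apply the "lower level overrides higher level" rule later.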
.gitignore contents:
build
The path <path to project>/abcbuild matches, even though such a pattern is supposed to match only if the file name is exactly build.
What's your opinion on providing clear release notes with each release, so that people know what has changed between versions?
I've seen some projects adopt a CHANGELOG.md file in the repo to track these, while others use GitHub to handle this.
I don't have a strong opinion on either.
I have a file that is a symbolic link to another location, e.g., /one/two/three -> /four/five/six. When using parse_gitignore on the symlink path (/one/two/three) it throws a ValueError:
raise ValueError("{!r} is not in the subpath of {!r}"
I believe it's caused by this line:
rel_path = str(Path(abs_path).resolve().relative_to(self.base_path))
According to the Path() documentation, resolve() will follow symlinks and remove "..". Was this intentional behavior to resolve symlinks? If so, some additional logic may be needed to catch this situation.
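The effect can be reproduced without the library. The sketch below is an assumption-laden illustration (relative_to_base is a hypothetical helper, and it assumes a platform where symlinks can be created): resolve() follows the link out of the base directory, so relative_to() raises ValueError, whereas os.path.abspath() normalizes the path without following symlinks.

```python
import os
import tempfile
from pathlib import Path


def relative_to_base(abs_path, base_path, follow_symlinks):
    """Return abs_path relative to base_path, optionally resolving symlinks first."""
    path = Path(abs_path)
    if follow_symlinks:
        path = path.resolve()  # follows symlinks, may leave base_path
    else:
        path = Path(os.path.abspath(path))  # normalizes without following links
    return path.relative_to(base_path)


with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp).resolve()
    base = tmp / 'one'
    target = tmp / 'four'
    base.mkdir()
    target.mkdir()
    link = base / 'three'
    link.symlink_to(target, target_is_directory=True)  # one/three -> four

    kept = relative_to_base(link, base, follow_symlinks=False)
    try:
        relative_to_base(link, base, follow_symlinks=True)
        raised = False
    except ValueError:
        raised = True
```

Whether matching should use the link's own location or its target is a design decision for the maintainer; the sketch only shows where the ValueError comes from.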
Hi,
I noticed this issue. If you have a rule:
build/
then files like build.py or build_something.py also match this rule.
I think it's because trailing slashes (indicating directories) are stripped for some reason: https://github.com/mherrmann/gitignore_parser/blob/4dd7293/gitignore_parser.py#L89
I'm happy to open a PR!
Hi! Thank you for the implementation. It's great, but I had some trouble using it in https://github.com/seluj78/potodo
Here's what is happening.
.potodoignore content:
venv/
from gitignore_parser import parse_gitignore
from pathlib import Path
potodoignore_path = Path("/Users/seluj78/Projects/potodo/.potodoignore")
bad_file = Path("/Users/seluj78/Projects/potodo/venv/bin/python")
matches = parse_gitignore(potodoignore_path)
matches(bad_file) # False
But if I change it to this, then it works:
from gitignore_parser import parse_gitignore
from pathlib import Path
base_path = Path("/Users/seluj78/Projects/potodo")
bad_file = Path("/Users/seluj78/Projects/potodo/venv/bin/python")
matches = parse_gitignore(".potodoignore", base_dir=base_path)
matches(bad_file) # True
Hope this helps!
Hi,
The premise of this issue is that I want to ignore patterns from .gitignore and from .git/info/exclude as well (for local ignores). For performance reasons, it's probably better to use a single matcher for both files.
I noticed in #1 that you don't want to spend time on the project. Would you accept a PR with a small change that adds a function that takes a file-like object as its argument and does what's in parse_gitignore's with open(): block?
For anchored matches, the ^ is being inserted before the flags:
>>> import gitignore_parser
>>> from pathlib import Path
>>> gitignore_parser.rule_from_pattern("/foo", Path(".").resolve()).match("42")
/home/mdk/clones/gitignore_parser/gitignore_parser.py:143: DeprecationWarning: Flags not at the start of the expression '^(?ms)foo$'
if re.search(self.regex, rel_path):
False
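A possible fix is to emit the inline flags first and the anchor second. The sketch below uses a hypothetical helper, anchored_regex, which is not taken from the library; as far as I know, newer Python versions reject global inline flags that are not at the start of the expression instead of only warning about them.

```python
import re


def anchored_regex(body, anchored):
    """Build a pattern with the inline flags first, then the optional '^' anchor."""
    prefix = '^' if anchored else ''
    return '(?ms)' + prefix + body + '$'


regex = anchored_regex('foo', anchored=True)

# The flags now lead the expression, so no DeprecationWarning is triggered.
matches_foo = re.search(regex, 'foo') is not None
matches_other = re.search(regex, '42') is not None
```

An alternative is to drop the inline group entirely and pass re.MULTILINE | re.DOTALL through the flags argument of re.search or re.compile.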
While processing an open-source folder, which happens to be obs-studio, I found a .gitignore file with the following contents:
*
!.gitignore
!data/
!exec32/
!exec32r/
!exec32d/
!exec64/
!exec64r/
!exec64d/
!libs32/
!libs32r/
!libs32d/
!libs64/
!libs64r/
!libs64d/
!misc/
The library raises an error when it sees the single * wildcard.
~/.local/lib/python3.8/site-packages/gitignore_parser.py in parse_gitignore(full_path, base_dir)
16 return matched
---> 17
18 def parse_gitignore(full_path, base_dir=None):
~/.local/lib/python3.8/site-packages/gitignore_parser.py in rule_from_pattern(pattern, base_path, source)
67 start_index = m.start()
---> 68 if (start_index != 0 and start_index != len(pattern) - 2 and
69 (pattern[start_index - 1] != '/' or
IndexError: string index out of range
I think the single-character pattern is what causes the failure. I will look into a fix.
Is this correct? It feels like a bug to me, since it works when you specify the rule in .gitignore.
Imagine .gitignore contents including:
.venv/
and a script evaluating this:
gitignore_matcher('/path/to/repo/.venv/')
True
gitignore_matcher('/path/to/repo/.venv/bin')
False
To me, the second one should return True.
Hi,
I tried to set up a .gitignore file like this one:
It doesn't work with this library.
If you try with a simple .gitignore file containing:
*
you get something like this:
File "/home/fab/metwork/mfext/build/opt/python3_core/lib/python3.5/site-packages/gitignore_parser.py", line 18, in parse_gitignore
source=(full_path, counter))
File "/home/fab/metwork/mfext/build/opt/python3_core/lib/python3.5/site-packages/gitignore_parser.py", line 68, in rule_from_pattern
if pattern[0] == '*' and pattern[1] == '*':
IndexError: string index out of range
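The failing check indexes pattern[1], which does not exist for a one-character pattern such as *. A minimal sketch of a safer check, using a hypothetical helper name that is not taken from the library, relies on str.startswith, which never raises for short or empty strings:

```python
def starts_with_double_star(pattern):
    """Return True if pattern begins with '**', without indexing past its end."""
    return pattern.startswith('**')


# The single-character pattern from the report no longer raises IndexError.
single_star = starts_with_double_star('*')
double_star = starts_with_double_star('**/foo')
```

The same approach works for the empty string, which the indexed version would also reject with an IndexError.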