Giter Site home page Giter Site logo

tavianator / bfs Goto Github PK

View Code? Open in Web Editor NEW
952.0 16.0 36.0 2.65 MB

A breadth-first version of the UNIX find command

Home Page: https://tavianator.com/projects/bfs.html

License: BSD Zero Clause License

Makefile 2.37% C 83.43% Shell 14.20%
find command-line filesystem directory-tree breadth-first-search linux macos bsd unix

bfs's Introduction

bfs
Version License CI Status Code coverage

Features   •   Installation   •   Usage   •   Building   •   Contributing   •   Changelog

Screencast

bfs is a variant of the UNIX find command that operates breadth-first rather than depth-first. It is otherwise compatible with many versions of find, including

POSIX   •   GNU   •   FreeBSD   •   OpenBSD   •   NetBSD   •   macOS

If you're not familiar with find, the GNU find manual provides a good introduction.

Features

bfs operates breadth-first, which typically finds the file(s) you're looking for faster.

Imagine the following directory tree:

haystack
├── deep
│   └── 1
│       └── 2
│           └── 3
│               └── 4
│                   └── ...
└── shallow
    └── needle

find will explore the entire deep directory tree before it ever gets to the shallow one that contains what you're looking for. On the other hand, bfs lists files from shallowest to deepest, so you never have to wait for it to explore an entire unrelated subtree.

bfsfind
$ bfs haystack
haystack
haystack/deep
haystack/shallow
haystack/deep/1
haystack/shallow/needle
...
$ find haystack
haystack
haystack/deep
haystack/deep/1
haystack/deep/1/2
haystack/deep/1/2/3
haystack/deep/1/2/3/4
...
haystack/shallow
haystack/shallow/needle
bfs tries to be easier to use than find, while remaining compatible.

For example, bfs is less picky about where you put its arguments:

bfsfind
$ bfs -L -name 'needle' haystack
haystack/needle

$ bfs haystack -L -name 'needle'
haystack/needle

$ bfs -L haystack -name 'needle'
haystack/needle
$ find -L -name 'needle' haystack
find: paths must precede expression: haystack

$ find haystack -L -name 'needle'
find: unknown predicate `-L'

$ find -L haystack -name 'needle'
haystack/needle
bfs gives helpful errors and warnings.

For example, bfs will detect and suggest corrections for typos:

$ bfs -nam needle
bfs: error: bfs -nam needle
bfs: error:     ~~~~
bfs: error: Unknown argument; did you mean -name?

bfs also includes a powerful static analysis to help catch mistakes:

$ bfs -print -name 'needle'
bfs: warning: bfs -print -name needle
bfs: warning:            ~~~~~~~~~~~~
bfs: warning: The result of this expression is ignored.
bfs adds some options that make common tasks easier.

For example, the -exclude operator skips over entire subtrees whenever an expression matches. -exclude is both more powerful and easier to use than the standard -prune action; compare

$ bfs -name config -exclude -name .git

to the equivalent

$ find ! \( -name .git -prune \) -name config

As an additional shorthand, -nohidden skips over all hidden files and directories. See the usage documentation for more about the extensions provided by bfs.

Installation

bfs may already be packaged for your operating system.

LinuxmacOS
Alpine Linux
# apk add bfs

Arch Linux
# pacman -S bfs

Debian/Ubuntu
# apt install bfs

Fedora Linux
# dnf install bfs

Gentoo
# emerge sys-apps/bfs

GNU Guix
# guix install bfs

NixOS
# nix-env -i bfs

Void Linux
# xbps-install -S bfs
Homebrew
$ brew install bfs

MacPorts
# port install bfs
BSD
FreeBSD
# pkg install bfs

OpenBSD
# pkg_add bfs
To build bfs from source, you may need to install some dependencies.

The only absolute requirements for building bfs are a C compiler, GNU make, and Bash. These are installed by default on many systems, and easy to install on most others. Refer to your operating system's documentation on building software.

bfs also depends on some system libraries for some of its features. Here's how to install them on some common platforms:

Alpine Linux
# apk add acl{,-dev} attr libcap{,-dev} liburing-dev oniguruma-dev

Arch Linux
# pacman -S acl attr libcap liburing oniguruma

Debian/Ubuntu
# apt install acl libacl1-dev attr libattr1-dev libcap2-bin libcap-dev liburing-dev libonig-dev

Fedora
# dnf install acl libacl-devel attr libcap-devel liburing-devel oniguruma-devel

NixOS
# nix-env -i acl attr libcap liburing oniguruma

Void Linux
# xbps-install -S acl-{devel,progs} attr-progs libcap-{devel,progs} liburing-devel oniguruma-devel

Homebrew
$ brew install oniguruma

MacPorts
# port install oniguruma6

FreeBSD
# pkg install oniguruma

These dependencies are technically optional, though strongly recommended. See the build documentation for how to disable them.

Once you have the dependencies, you can build bfs.

Download one of the releases or clone the git repo. Then run

$ ./configure
$ make

This will build the ./bin/bfs binary. Run the test suite to make sure it works correctly:

$ make check

If you're interested in speed, you may want to build the release version instead:

$ ./configure --enable-release
$ make

Finally, if you want to install it globally, run

# make install

bfs's People

Contributors

a1346054 avatar bmundt6 avatar bourgeoisbear avatar chapmanjacobd avatar data-man avatar dependabot[bot] avatar electronicsarchiver avatar farribeiro avatar lamby avatar lgtm-migrator avatar lugoues avatar markus-oberhumer avatar mpolden avatar rhermes avatar tavianator avatar thesamesam avatar virtualroot avatar vorpalblade avatar wscott avatar xfgusta avatar ylluminarious avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

bfs's Issues

Not available via apt-get

The readme says that bfs is available with apt-get

If you're on Debian GNU/Linux, bfs is available via apt-get install bfs.

But when I try to install it, It doesn't seem to be available.

$ sudo apt-get update && sudo apt-get install bfs
Hit:1 http://archive.ubuntu.com/ubuntu zesty InRelease
Hit:2 http://archive.ubuntu.com/ubuntu zesty-updates InRelease
Hit:3 http://archive.ubuntu.com/ubuntu zesty-security InRelease
Reading package lists... Done
Reading package lists... Done
Building dependency tree       
Reading state information... Done
E: Unable to locate package bfs

Using apt search bfs does not show any matches either.

bftw(): Permission denied

Arch Linux 64 bits
bfs 0.70 (installed from AUR)

When I execute a search in system folders, it shows the error bftw(): Permission denied.
Example:

$ bfs /usr > /dev/null 
'/usr/share/polkit-1/rules.d/': Permission denied
bftw(): Permission denied

Support more regex types

Findutils offers

  • findutils-default
  • awk
  • egrep
  • ed
  • emacs
  • gnu-awk
  • grep
  • posix-awk
  • posix-basic
  • posix-egrep
  • posix-extended
  • posix-minimal-basic
  • sed

statx redefinition error

glibc has recently added statx which results in following redefinition error:

cc -D__EXTENSIONS__ -D_ATFILE_SOURCE -D_BSD_SOURCE -D_DARWIN_C_SOURCE -D_DEFAULT_SOURCE -D_FILE_OFFSET_BITS=64 -D_GNU_SOURCE -DBFS_VERSION=\"1.2.2-10-g9caf477\" -D_FORTIFY_SOURCE=2 -std=c99 -march=x86-64 -mtune=generic -O2 -pipe -fstack-protector-strong -fno-plt -MD -MP -MF typo.d -c typo.c -o typo.o
In file included from stat.c:27:
/usr/include/linux/stat.h:56:8: error: redefinition of 'struct statx_timestamp'
 struct statx_timestamp {
        ^~~~~~~~~~~~~~~
In file included from /usr/include/sys/stat.h:446,
                 from stat.c:23:
/usr/include/bits/statx.h:25:8: note: originally defined here
 struct statx_timestamp
        ^~~~~~~~~~~~~~~
In file included from stat.c:27:
/usr/include/linux/stat.h:99:8: error: redefinition of 'struct statx'
 struct statx {
        ^~~~~
In file included from /usr/include/sys/stat.h:446,
                 from stat.c:23:
/usr/include/bits/statx.h:36:8: note: originally defined here
 struct statx
        ^~~~~
make: *** [Makefile;64: stat.o] Error 1
make: *** Waiting for unfinished jobs....

Pulseaudio had a similar situation when glibc added memfd, here is their solution: pulseaudio/pulseaudio@dfb0460#diff-a3824a947f7132abd61ce6a8a9b5e2a4

You could possibly check if __NR_statx defined, but I'm not sure.

Alternatively it might be better to expose a -D flag via the Makefile and guard against that in the implementation. That way I can compile it with make HAVE_STATX=1 or something like it.

idea: print inode of file (and even of the parent)

Today I was thinking that instead of building a comprehensive grammar into a lightweight tool like this, it would be useful to allow it to serve other programs as well as possible. An example could be performing operations on files hierarchies (for example deletion).

Naturally, a BFS traversal of a directory tree will find it hard to actually remove the entire tree in one go since the folders one encounters first can't easily be removed. Yet the files can.

Instead of:

$ find . -type f -delete

This would be possible:

# delete all files in a file hierarchy
$ bfs somedir -type f | parallel -X rm
# get rid of empty husk
$ rm -rf somedir

So that's great, doing one thing and one thing well. I wondered whether it could be made even better (and by better I mean faster):

It would seem that sorting by inode would be a good shot. Buffering up an array and sorting it in C is needlessly messy, and there are other tools that can do that. But bfs(1) would need to output the inode number as calling an external program to stat(2) every file is very expensive. Luckily, stat(2) is not necessary since the inode number is present in the dirent struct.

So we can imagine:

$ bfs -i .
 10720996 ./.git
 10721207 ./.gitignore
 10982737 ./bfs
 10979979 ./bfs.c
 10982713 ./bfs.d
 10982714 ./bfs.o

Notice that the output is not sorted by inode number. Naively doing:

$ bfs -i -type f . | sort -n | parallel -X rm -f

Would maybe not be ideal as it is very possible that this places files from lots of different directories close to each other. Intuitively it seems best to "empty" one directory in one god. this would need to be benchmarked though, because I'm just grasping at straws here.

So we could compare that to sorting by inode within a directory. For that it would be sufficient to output the inode of the parent directory as well:

$ bfs -i -p -type f .
...
10720996 10979934 ./.git/ORIG_HEAD
10720996 10721190 ./.git/packed-refs
10720996 10720997 ./.git/refs
10721002 10721003 ./.git/hooks/applypatch-msg.sample
10721002 10721004 ./.git/hooks/commit-msg.sample
10721002 10721005 ./.git/hooks/post-update.sample
...

$ bfs -i -p -type f . | sort -n -k1,2 | cut -f 2- -d' ' | parallel -X rm

That seems nice, if the intuition about what's more convenient for a file system is accurate, which remains to be seen. That said, this will also place directories in non-BFS order, which might be important. An alternative is to output a generation number instead of the parent directories' inode:

$ bfs -i -g -type f .
...
48 10977657 ./.git/objects/a4/7cee3a89da7a2cff6b615e381fd486d5d9c06e
49 10977653 ./.git/objects/a5/41b1002b2830a0bde86a2e50bf7e58cd41ea85
50 10721056 ./.git/objects/ac/11476e20136976bf689acf847fdcea4e05e37a
50 10976289 ./.git/objects/ac/2e304d1fca0aab2eebba954851c3ef759aaaf5
50 10721187 ./.git/objects/ac/6b2e7981854281d75b24fbb972933a25cb5bcc
51 10721048 ./.git/objects/ad/628c395cc45add3e3a87f21250cbd0df1d860e
52 10979608 ./.git/objects/b0/4f245cb03897e2616aaeb643cb03244993b4cd
53 10721137 ./.git/objects/b1/337c8e1dbfe6ae1b9d79e2b16af210562c2e17
53 10721119 ./.git/objects/b1/d430ec463fd1d49bf47443cb143af0e8e9905e
54 10721080 ./.git/objects/b2/510860f1a20cbaab76b3efca68efe9bc530c66
55 10741243 ./.git/objects/b4/0ccb5fe4e00584ec867706b71db707bf3f63ce
56 10721111 ./.git/objects/be/538a8a3cd111795549eecb16f05bce8c8ba907
...

This will allow one to sort only within directories, keeping BFS traversal order.

However, this still has a big problem: sort(1) wait until there's no more input before sorting and outputting. A bfs(1) run on a very large directory tree could take a long time, and in the meanwhile one could be performing operations on files instead of twiddling thumbs.

Something like this could work

bfs -i -dirdelim='\n' -type f | parallel --pipe --recend='\n\n' "sort -n | cut -f 2- -d' ' | parallel -X rm"

While it looks like it could work, it doesn't somehow, parallel(1) doesn't understand. It also looks complicated, which is almost never good.

Anyway, I just wanted to write down my thoughts for possible features. I've implemented a few of them (the output I pasted is mostly real :).

Permission Issue

When it try to list directories inside directory without read permissions, it throws out error. I think bfs should skip such directories before hand.

error: './cache/private': Permission denied
error: './cache/ldconfig': Permission denied
error: './db/sudo': Permission denied
error: './tmp/systemd-private-b0c63c239a4f49ebaf5fa0ee80d157c8-ModemManager.service-ODCouH': Permission denied
error: './tmp/systemd-private-b0c63c239a4f49ebaf5fa0ee80d157c8-systemd-timesyncd.service-Jze1lA': Permission denied
error: './lib/portables': Permission denied
error: './lib/machines': Permission denied
error: './lib/private': Permission denied
error: './lib/NetworkManager': Permission denied
error: './lib/udisks2': Permission denied
error: './lib/rpcbind': Permission denied
error: './log/private': Permission denied
error: './spool/cups': Permission denied
error: './lib/nfs/sm': Permission denied
error: './lib/nfs/sm.bak': Permission denied
error: './lib/sddm/.nv': Permission denied
error: './lib/sddm/.config': Permission denied
error: './lib/gssproxy/rcache': Permission denied

printf with \0 not handled the same as GNU find

GNU find outputs an ASCII NUL for -printf with \0, but not bfs.

$ find -name "*.c" -type f -printf '%P\0' | xargs -0 ls -s
24 bftw.c  12 color.c   4 dstring.c  24 eval.c  16 exec.c   4 main.c   4 mtab.c  76 parse.c  24 printf.c   8 typo.c   8 util.c

$ bfs -nohidden -name "*.c" -type f -printf '%P\0' | xargs -0 ls -s
ls: cannot access 'parse.ccolor.cmtab.cutil.cprintf.cmain.ctypo.ceval.cdstring.cbftw.cexec.c': No such file or directory

bfs just seems to omit the zero character, rather than simply terminating the string as I suspected:

$ bfs -name "parse.c" -type f -printf '%P\0%P'
parse.cparse.c

Feature Request: accept list of files from STDIN

One thing I've always thought would be great as a part of 'find' is accepting a list of files on stdin instead of searching through a provided directory. That way you could use the filters of find/bfs just on a list of files.

i.e.

$ cat foo
/home/user/file.txt
/home/user/file2.txt
/home/user/blah/file99.mp3
$ cat foo | bfs - -ctime +1
/home/user/file.txt
$ cat foo | bfs - -name '*.mp3'
/home/user/blah/file99.mp3

This sort of thing could probably also be done by using a bunch of other commands and parsing their output with grep. But it would be a lot more convenient and would significantly reduce cognitive load to be able to just pipe a list of files into bfs and use its filters on them.

I think this would be a great enhancement for bfs and would be yet another reason to use it over find. Either way, bfs is great and I'm thankful for your work on it!

Support NO_COLOR

While bfs -nocolor exists it would perhaps be nice if it honoured NO_COLOR to provide a more universal way to prevent the use of colours.

BSD find compatibility

Options:

  • -E
  • -H
  • -L
  • -P
  • -X
  • -d
  • -s
  • -x
  • -f PATH

Expressions:

  • ( EXPR )
  • ! EXPR, -not EXPR
  • EXPR1 EXPR2, EXPR1 -a EXPR2, EXPR1 -and EXPR2
  • EXPR1 -o EXPR2, EXPR1 -or EXPR2

Primaries:

  • -Bmin, -Bnewer, -Btime
  • -acl
  • -anewer, -amin, -atime
  • -cnewer, -cmin, -ctime
  • -asince, -csince, -since
  • -delete
  • -depth
  • -depth N
  • -empty
  • -exec COMMAND ;
  • -exec COMMAND {} +
  • -execdir COMMAND
  • -execdir COMMAND {} +
  • -exit STATUS
  • -flags FLAGS
  • -fstype TYPE
  • -gid GNAME
  • -group GNAME
  • -ignore_readdir_race, -noignore_readdir_race
  • -ilname PATTERN
  • -iname PATTERN
  • -inum N
  • -ipath PATTERN, -iwholename PATTERN
  • -iregex PATTERN
  • -links N
  • -lname PATTERN
  • -ls
  • -maxdepth LEVELS
  • -mindepth LEVELS
  • -mnewer
  • -mmin, -mtime
  • -mount
  • -name PATTERN
  • -newer FILE
  • -newerXY REFERENCE
  • -nogroup
  • -nouser
  • -noleaf
  • -ok COMMAND ;
  • -okdir COMMAND ;
  • -path PATTERN, -wholename PATTERN
  • -perm -?MODE
  • -perm +MODE
  • -print
  • -print0
  • -printx
  • -prune
  • -quit
  • -regex PATTERN
  • -rm
  • -samefile FILE
  • -size N[ckMGTP]
  • -sparse
  • -type [bcdpfls]
  • -uid UNAME
  • -user UNAME

A -type flag

I'd like to use this in some fzf contraption of mine. Yet for that I would need it to output only directories. For example like the type d flag commonly found in find.

Since you expose the typeflag parameter to the callback but don't do anything with it I assume you wanted to do this but didn't have the time.

The change is at any rate small (37+, 2- lines), I have a patch ready but cannot contribute to your project if it's under the WTFPL . Licenses like MIT/BSD/ISC/Apache 2 or similar would be fine, should you consider changing your license or dual-licensing the project.

I'm not asking you to change the license, that's your prerogative. But if you can't/won't do that, would you perhaps consider adding the feature yourself?

[Feature request] Grep style directory exclusion

I've been using bfs for a while now, it is excellent. One thing that find is terrible for is excluding directories, and I thought maybe bfs could add on a more use friendly system.

It would be a lot nicer if there was the option to exclude/include directories/files in the same way that grep does:

--exclude-dir={build,vendor,.git,node_modules} --include=*.{css,vcl,conf,yml,jade,js,styl,php,config,html} --exclude={*.min.js,tags,json}

I'd be quite interested in helping to write this functionality, but I thought I would raise it as an issue first to see what your thoughts on it are.

Respect VCS ignore files

As suggested by @merwok: #8 (comment), we should implement a -vcsignore action. Related to #30 since conditionally respecting .gitignore would be very weird, it should be an on/off thing regardless of where it appears on the command line.

Requires -lrt switch

Hello,

I just tried compiling this project.

It compiles if I do

make CFLAGS+=-lrt

Without that, I get

eval.c:858: undefined reference to `clock_gettime'

bfs 1.3 fails to build on FreeBSD

Commit b2e69d3 changes some EINVAL to ENODATA which is not defined on FreeBSD. Please advise on what I should replace it with.

Another issue is that util.h now includes sys/capability.h apparently unconditionally even on non-Linux platforms leading to

/usr/include/sys/capability.h:44:2: warning: this file includes <sys/capability.h> which is deprecated [-W#warnings]
#warning this file includes <sys/capability.h> which is deprecated
 ^

On FreeBSD sys/capability.h is an older name for sys/capcisum.h. It's not the same header as on Linux, so the __has_include check is not sufficient.

GNU find compatibility

For compatibility with GNU findutils' find, we need the following features:

Flags:

  • -H
  • -L
  • -P
  • -Olevel
  • -D

Expressions:

  • ( EXPR )
  • ! EXPR, -not EXPR
  • EXPR1 EXPR2, EXPR1 -a EXPR2, EXPR1 -and EXPR2
  • EXPR1 -o EXPR2, EXPR1 -or EXPR2
  • EXPR1 , EXPR2

Positional options:

  • -daystart
  • -follow
  • -regextype
  • -warn, -nowarn

Normal options:

  • -d, -depth
  • -maxdepth LEVELS
  • -mindepth LEVELS
  • -mount, -xdev
  • -noleaf
  • -ignore_readdir_race, -noignore_readdir_race
  • -help, --help
  • -version, --version

Tests:

  • -amin N
  • -anewer FILE
  • -atime N
  • -cmin N
  • -cnewer FILE
  • -ctime N
  • -empty
  • -false
  • -fstype TYPE
  • -gid N
  • -group NAME
  • -ilname PATTERN
  • -iname PATTERN
  • -inum N
  • -ipath PATTERN, -iwholename PATTERN
  • -iregex PATTERN
  • -links N
  • -lname PATTERN
  • -mmin N
  • -mtime N
  • -name PATTERN
  • -newer FILE
  • -newerXY REFERENCE
  • -nogroup
  • -nouser
  • -path PATTERN, -wholename PATTERN
  • -perm [+-/]?MODE
  • -regex PATTERN
  • -readable, -writable, -executable
  • -samefile FILE
  • -size N[bcwkMG]
  • -true
  • -type [bcdpflsD]
  • -uid N
  • -used N
  • -user NAME
  • -xtype [bcdpfls]

Actions:

  • -delete
  • -exec COMMAND ;
  • -exec COMMAND {} +
  • -ok COMMAND ;
  • -execdir COMMAND
  • -execdir COMMAND {} +
  • -okdir COMMAND ;
  • -ls
  • -fls FILE
  • -print
  • -fprint FILE
  • -print0
  • -fprint0 FILE
  • -printf FORMAT
  • -fprintf FILE FORMAT
  • -prune
  • -quit

Make -nohidden position independent

Hello,

When I do:

bfs -nohidden . -name "*foo*"

it works great! Fast, and ignoring dirs starting with ., as expected.

But when I do:

bfs . -name "*foo*" -nohidden

the -nohidden switch is not effective.. I get search results from hidden dirs too (dirs starting with .)!

Build failures on kfreebsd

Hi, bfs currently fails on the kfreebsd architectures in Debian:

c -D__EXTENSIONS__ -D_ATFILE_SOURCE -D_BSD_SOURCE -D_DARWIN_C_SOURCE -D_DEFAULT_SOURCE -D_FILE_OFFSET_BITS=64 -D_GNU_SOURCE -DBFS_VERSION=\"1.6\" -Wdate-time -D_FORTIFY_SOURCE=2 -std=c99 -g -O2 -fdebug-prefix-map=/<<PKGBUILDDIR>>=. -fstack-protector-strong -Wformat -Werror=format-security -MD -MP -MF spawn.d -c spawn.c -o spawn.o
cc -D__EXTENSIONS__ -D_ATFILE_SOURCE -D_BSD_SOURCE -D_DARWIN_C_SOURCE -D_DEFAULT_SOURCE -D_FILE_OFFSET_BITS=64 -D_GNU_SOURCE -DBFS_VERSION=\"1.6\" -Wdate-time -D_FORTIFY_SOURCE=2 -std=c99 -g -O2 -fdebug-prefix-map=/<<PKGBUILDDIR>>=. -fstack-protector-strong -Wformat -Werror=format-security -MD -MP -MF stat.d -c stat.c -o stat.o
cc -D__EXTENSIONS__ -D_ATFILE_SOURCE -D_BSD_SOURCE -D_DARWIN_C_SOURCE -D_DEFAULT_SOURCE -D_FILE_OFFSET_BITS=64 -D_GNU_SOURCE -DBFS_VERSION=\"1.6\" -Wdate-time -D_FORTIFY_SOURCE=2 -std=c99 -g -O2 -fdebug-prefix-map=/<<PKGBUILDDIR>>=. -fstack-protector-strong -Wformat -Werror=format-security -MD -MP -MF time.d -c time.c -o time.o
cc -D__EXTENSIONS__ -D_ATFILE_SOURCE -D_BSD_SOURCE -D_DARWIN_C_SOURCE -D_DEFAULT_SOURCE -D_FILE_OFFSET_BITS=64 -D_GNU_SOURCE -DBFS_VERSION=\"1.6\" -Wdate-time -D_FORTIFY_SOURCE=2 -std=c99 -g -O2 -fdebug-prefix-map=/<<PKGBUILDDIR>>=. -fstack-protector-strong -Wformat -Werror=format-security -MD -MP -MF trie.d -c trie.c -o trie.o
cc -D__EXTENSIONS__ -D_ATFILE_SOURCE -D_BSD_SOURCE -D_DARWIN_C_SOURCE -D_DEFAULT_SOURCE -D_FILE_OFFSET_BITS=64 -D_GNU_SOURCE -DBFS_VERSION=\"1.6\" -Wdate-time -D_FORTIFY_SOURCE=2 -std=c99 -g -O2 -fdebug-prefix-map=/<<PKGBUILDDIR>>=. -fstack-protector-strong -Wformat -Werror=format-security -MD -MP -MF typo.d -c typo.c -o typo.o
cc -D__EXTENSIONS__ -D_ATFILE_SOURCE -D_BSD_SOURCE -D_DARWIN_C_SOURCE -D_DEFAULT_SOURCE -D_FILE_OFFSET_BITS=64 -D_GNU_SOURCE -DBFS_VERSION=\"1.6\" -Wdate-time -D_FORTIFY_SOURCE=2 -std=c99 -g -O2 -fdebug-prefix-map=/<<PKGBUILDDIR>>=. -fstack-protector-strong -Wformat -Werror=format-security -MD -MP -MF util.d -c util.c -o util.o
cc -D__EXTENSIONS__ -D_ATFILE_SOURCE -D_BSD_SOURCE -D_DARWIN_C_SOURCE -D_DEFAULT_SOURCE -D_FILE_OFFSET_BITS=64 -D_GNU_SOURCE -DBFS_VERSION=\"1.6\" -Wdate-time -D_FORTIFY_SOURCE=2 -std=c99 -g -O2 -fdebug-prefix-map=/<<PKGBUILDDIR>>=. -fstack-protector-strong -Wformat -Werror=format-security -MD -MP -MF bfs  -Wl,-z,relro -Wl,-z,now bftw.o color.o darray.o diag.o dstring.o eval.o exec.o fsade.o main.o mtab.o opt.o parse.o printf.o spawn.o stat.o time.o trie.o typo.o util.o   -o bfs
/usr/bin/ld: fsade.o: in function `bfs_check_xattrs':
./fsade.c:288: undefined reference to `extattr_list_file'
/usr/bin/ld: ./fsade.c:288: undefined reference to `extattr_list_link'
collect2: error: ld returned 1 exit status
make[1]: *** [Makefile:96: bfs] Error 1
make[1]: Leaving directory '/<<PKGBUILDDIR>>'
dh_auto_build: error: make -j2 "INSTALL=install --strip-program=true" returned exit code 2
make: *** [debian/rules:6: binary-arch] Error 25

https://buildd.debian.org/status/package.php?p=bfs

I see that we have a BFS_HAS_SYS_EXTATTR macro already, but I think the setup for this is wrong in kFreeBSD which is a weird hybrid (eg. perhaps the && !__FreeBSD__ logic is wrong here.)

Can't compile on OSX Sierra, missing `->st_atim` etc.

Full output:

➜  bfs git:(master) make release
cc -D_DEFAULT_SOURCE -D_GNU_SOURCE -DBFS_VERSION=\"0.82-10-gb0216b9\"  -std=c99 -O3 -flto -Wall -DNDEBUG -MD -MP -MF bftw.d -c bftw.c -o bftw.o
cc -D_DEFAULT_SOURCE -D_GNU_SOURCE -DBFS_VERSION=\"0.82-10-gb0216b9\"  -std=c99 -O3 -flto -Wall -DNDEBUG -MD -MP -MF color.d -c color.c -o color.o
cc -D_DEFAULT_SOURCE -D_GNU_SOURCE -DBFS_VERSION=\"0.82-10-gb0216b9\"  -std=c99 -O3 -flto -Wall -DNDEBUG -MD -MP -MF dstring.d -c dstring.c -o dstring.o
cc -D_DEFAULT_SOURCE -D_GNU_SOURCE -DBFS_VERSION=\"0.82-10-gb0216b9\"  -std=c99 -O3 -flto -Wall -DNDEBUG -MD -MP -MF eval.d -c eval.c -o eval.o
cc -D_DEFAULT_SOURCE -D_GNU_SOURCE -DBFS_VERSION=\"0.82-10-gb0216b9\"  -std=c99 -O3 -flto -Wall -DNDEBUG -MD -MP -MF main.d -c main.c -o main.o
cc -D_DEFAULT_SOURCE -D_GNU_SOURCE -DBFS_VERSION=\"0.82-10-gb0216b9\"  -std=c99 -O3 -flto -Wall -DNDEBUG -MD -MP -MF parse.d -c parse.c -o parse.o
cc -D_DEFAULT_SOURCE -D_GNU_SOURCE -DBFS_VERSION=\"0.82-10-gb0216b9\"  -std=c99 -O3 -flto -Wall -DNDEBUG -MD -MP -MF typo.d -c typo.c -o typo.o
eval.c:139:20: error: no member named 'st_atim' in 'struct stat'
                time = &statbuf->st_atim;
                        ~~~~~~~  ^
eval.c:142:20: error: no member named 'st_ctim' in 'struct stat'
                time = &statbuf->st_ctim;
                        ~~~~~~~  ^
eval.c:145:20: error: no member named 'st_mtim' in 'struct stat'
                time = &statbuf->st_mtim;
                        ~~~~~~~  ^
eval.c:175:20: error: no member named 'st_atim' in 'struct stat'
                time = &statbuf->st_atim;
                        ~~~~~~~  ^
eval.c:178:20: error: no member named 'st_ctim' in 'struct stat'
                time = &statbuf->st_ctim;
                        ~~~~~~~  ^
eval.c:181:20: error: no member named 'st_mtim' in 'struct stat'
                time = &statbuf->st_mtim;
                        ~~~~~~~  ^
eval.c:199:40: error: no member named 'st_atim' in 'struct stat'
        time_t diff = timespec_diff(&statbuf->st_atim, &statbuf->st_ctim);
                                     ~~~~~~~  ^
eval.c:199:59: error: no member named 'st_ctim' in 'struct stat'
        time_t diff = timespec_diff(&statbuf->st_atim, &statbuf->st_ctim);
                                                        ~~~~~~~  ^
8 errors generated.
make: *** [eval.o] Error 1
make: *** Waiting for unfinished jobs....
parse.c:724:21: error: no member named 'st_mtim' in 'struct stat'
        expr->reftime = sb.st_mtim;
                        ~~ ^
parse.c:1156:23: error: no member named 'st_atim' in 'struct stat'
                        expr->reftime = sb.st_atim;
                                        ~~ ^
parse.c:1159:23: error: no member named 'st_ctim' in 'struct stat'
                        expr->reftime = sb.st_ctim;
                                        ~~ ^
parse.c:1162:23: error: no member named 'st_mtim' in 'struct stat'
                        expr->reftime = sb.st_mtim;
                                        ~~ ^
4 errors generated.
make: *** [parse.o] Error 1

Implement -sort

-s will sort by name, but it would be nice to sort by other attributes like type, inode number, etc. The main difficulty is that at the time we sort, we don't have the struct BFTW available, so we either need to potentially compute those earlier and cache them for the later actual visits, or pass a different (lighter) type to the comparison function. But even with a lighter type for comparisons, determining the type/inode number may need a stat(), which should be cached somehow anyway.

It would be extra nice to support compound orderings like -sort type,-name meaning sort by type first, then name backwards.

Make bftw() multi-threaded

Multi-threading in bftw() should allow for some compute and I/O parallelism. fd gets a great speedup from it, although implementing it in bfs will have significantly more complexity.

bfs quite a bit slower than BSD and GNU find

I've recently written my own BFS version of find in Go and optimised it until it performed equally to regular find(1).

I've ran a few rudimentary benchmarks of bfs(1):

➜  ~  time find . >/dev/null
find . > /dev/null  0.97s user 4.96s system 6% cpu 1:34.89 total
➜  ~  time gfind . >/dev/null
gfind . > /dev/null  1.39s user 5.17s system 8% cpu 1:15.24 total
➜  ~  time ~/github/bfs/bfs >/dev/null
~/github/bfs/bfs > /dev/null  1.57s user 36.42s system 15% cpu 4:00.42 total
➜  ~  time ~/github/bfs/bfs >/dev/null

This is after warmup. bfs(1) was compiled with -O2 -flto.

I believe this might have to do with the fact that find in its default incarnation only does getdents(64) calls with a large enough buffer to amortize the system call overhead. At least, I was only able to match find(1) when I did something similar (which I had to do by copying and changing some of the Go stdlib, which insisted on doing a useless stat(2)).

I'm on OSX now and don't know how to properly use dtrace, so it'll have to wait until I'm at a linux box to see what's going on. But I'm leaving this here just as a heads up.

Implement a depth-first mode

Suggested by /u/mailme_gx and /u/defaultxr on reddit here. Depth-first mode takes less memory, and is probably slightly faster if you're exploring a whole tree.

Latest bfs version fails to find libacl and libcap

> LANG=C git pull
Already up to date.

> LANG=C make
[…]
cc -D__EXTENSIONS__ -D_ATFILE_SOURCE -D_BSD_SOURCE -D_DARWIN_C_SOURCE -D_DEFAULT_SOURCE -D_FILE_OFFSET_BITS=64 -D_GNU_SOURCE -DBFS_VERSION=\"1.3\"  -std=c99 -g -Wall -Wmissing-declarations -MD -MP -MF bfs  -Wl,--as-needed  bftw.o color.o diag.o dstring.o eval.o exec.o main.o mtab.o opt.o parse.o posix1e.o printf.o spawn.o stat.o typo.o util.o  -lacl -lcap -lrt  -o bfs
/usr/bin/ld: cannot find -lacl
/usr/bin/ld: cannot find -lcap
collect2: error: ld returned 1 exit status
make: *** [bfs] Error 1

> uname -srm
Linux 4.4.0-119-generic x86_64

Consider removing all nohidden/hidden code

Dotfiles are one of those old bugs introduced long ago in an attempt to make ls skip the directories . and .. which permeate every directory, and because the code only checked if the first byte was ., [1].

It resulted in decades of developer abuse and clutter, including paranoia as now shells ignore these so-called "dotfiles" during globbing operations resulting in data loss as * does not mean "matches any string" as the glob(7) manual suggests.

In the name of simplicity, consider removing support for this pointless distinction or at least make the reality of the filesystem (-nohidden) the default.

As an aside, this would make bfs more compatible with find.

PS: This mechanism was correctly done by GoboLinux with their gobohide tool and module.

The -Og flag doesn't exist in Clang

Hi, bfs(1) works fine on OSX, except that it doesn't compile with the default compiler:

➜  bfs git:(master) ✗ make
cc -D_DEFAULT_SOURCE  -std=c99 -g -Og -Wall -MD -MP -MF bfs.d -c bfs.c -o bfs.o
cc -D_DEFAULT_SOURCE  -std=c99 -g -Og -Wall -MD -MP -MF bftw.d -c bftw.c -o bftw.o
cc -D_DEFAULT_SOURCE  -std=c99 -g -Og -Wall -MD -MP -MF color.d -c color.c -o color.o
error: invalid integral value 'g' in '-Og'
error: invalid integral value 'g' in '-Og'
make: *** [bftw.o] Error 1
make: *** Waiting for unfinished jobs....
error: invalid integral value 'g' in '-Og'
error: invalid integral value 'g' in '-Og'
error: invalid integral value 'g' in '-Og'
error: invalid integral value 'g' in '-Og'
make: *** [bfs.o] Error 1
make: *** [color.o] Error 1

It does work with gcc 5.2

➜  bfs git:(master) ✗ CC=gcc-5 make
gcc-5 -D_DEFAULT_SOURCE  -std=c99 -g -Og -Wall -MD -MP -MF bfs.d -c bfs.c -o bfs.o
gcc-5 -D_DEFAULT_SOURCE  -std=c99 -g -Og -Wall -MD -MP -MF bftw.d -c bftw.c -o bftw.o
gcc-5 -D_DEFAULT_SOURCE  -std=c99 -g -Og -Wall -MD -MP -MF color.d -c color.c -o color.o
gcc-5 -D_DEFAULT_SOURCE  -std=c99 -g -Og -Wall -MD -MP -MF bfs  bfs.o bftw.o color.o -o bfs

On that note, perhaps it wouldn't be bad to add a make release mode. Or, as is better in my experience, let the default make target be an optimized build (-O2 -flto) and provide a make debug target.

If it builds with a regular make invocation on OSX, this could get included in homebrew, which would be great.

Very beautiful code too, I enjoyed reading it.

Feature request: highlight matched part of filename

I think it would help with readability if matched substrings were highlighted in some way (eg: with a yellow background).

It would be extra helpful for regular expressions, where the patterns are abstract.

A workaround is to run bfs | grep --color, but this removes all of bfs' beautiful filetype colour coding!

bfs uses fcntl while find doesn't (unnecessary F_SETFD FD_CLOEXEC)

Running the following:

strace -c bfs -links 2 >/dev/null ; strace -c find . -links 2 >/dev/null 
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 63.99    0.000972           2       530           newfstatat
 10.40    0.000158           0       374           getdents
  5.46    0.000083           0       558           fcntl
  4.41    0.000067           0       187           openat
  3.09    0.000047           0       375           close
  2.83    0.000043           5         9           mmap
  2.70    0.000041           0       189           fstat
  1.51    0.000023           6         4           mprotect
  1.18    0.000018           6         3         3 access
  0.86    0.000013           7         2           open
  0.79    0.000012          12         1           execve
  0.72    0.000011           4         3           brk
  0.72    0.000011           4         3         2 ioctl
  0.66    0.000010           5         2           munmap
  0.26    0.000004           4         1           read
  0.20    0.000003           3         1           getrlimit
  0.20    0.000003           3         1           arch_prctl
  0.00    0.000000           0         1           write
------ ----------- ----------- --------- --------- ----------------
100.00    0.001519                  2244         5 total
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 23.69    0.002030           4       530           newfstatat
 18.37    0.001574           4       372           getdents
 15.33    0.001314           3       379           close
 13.06    0.001119           3       375           fchdir
 10.65    0.000913           5       186           openat
 10.26    0.000879           4       196         4 open
  6.60    0.000566           3       191           fstat
  0.68    0.000058           5        12           mmap
  0.40    0.000034           6         6           mprotect
  0.21    0.000018           5         4         4 access
  0.18    0.000015           4         4           read
  0.16    0.000014           5         3           munmap
  0.13    0.000011           4         3           brk
  0.12    0.000010           3         3         2 ioctl
  0.11    0.000009           9         1           execve
  0.04    0.000003           3         1           uname
  0.04    0.000003           3         1           arch_prctl
  0.00    0.000000           0         1           write
------ ----------- ----------- --------- --------- ----------------
100.00    0.008570                  2268        10 total

I suspect the reason to be that find passed O_CLOEXEC to openat(2), while bfs does it in a separate fcntl(2) call or something. Which is valid because it doesn't fork but could be avoided.

Let's check.

$ strace find . -links 2 >/dev/null |& grep 'CLOEXEC' | tail
openat(AT_FDCWD, "d7", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 5
openat(AT_FDCWD, "27", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 5
openat(AT_FDCWD, "logs", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 5
openat(AT_FDCWD, "refs", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 5
openat(AT_FDCWD, "remotes", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 5
openat(AT_FDCWD, "origin", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 5
openat(AT_FDCWD, "heads", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 5
openat(AT_FDCWD, "hooks", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 5
openat(AT_FDCWD, "branches", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 5
openat(AT_FDCWD, "info", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 5
openat(AT_FDCWD, ".git", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 5

bfs does a strange sort of dance:

$  strace bfs -links 2 >/dev/null |& grep -P 'close|open|CLOEXEC' | less
...
openat(6, "73/", O_RDONLY|O_DIRECTORY|O_CLOEXEC) = 9
fcntl(9, F_DUPFD_CLOEXEC, 0)            = 10
fcntl(10, F_SETFD, FD_CLOEXEC)          = 0
close(10)                               = 0
close(9)                                = 0
...

If I'm seeing that right, there's no reason to to the fcntl(10, F_SETFD, FD_CLOEXEC) operation after fcntl(9, F_DUPFD_CLOEXEC, 0) is called, From http://man7.org/linux/man-pages/man2/fcntl.2.html:

F_DUPFD_CLOEXEC (int; since Linux 2.6.24)
As for F_DUPFD, but additionally set the close-on-exec flag
for the duplicate file descriptor. Specifying this flag
permits a program to avoid an additional fcntl() F_SETFD
operation to set the FD_CLOEXEC flag. For an explanation of
why this flag is useful, see the description of O_CLOEXEC in
open(2).

As for the dup call which find doesn't do, I think it might have something to do with the breadth-first ways of bfs. So that could be required for proper functioning. Don't know.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.