fsatrace's People

Contributors

jacereda, jayzhuang, kakkun61, ndmitchell

fsatrace's Issues

Corruption of PATH

Since 4f9f599 the first entry on $PATH is corrupted, which causes the Shake test suites to fail, e.g. https://travis-ci.org/ndmitchell/shake/jobs/574506269. The actual test adds shake_helper to the start of the $PATH. Since the above fsatrace changes that to stuff_fsatrace_needs;$PATH, and the path separator on Linux is : not ;, that corrupts the first entry in the $PATH. I guess if you want to take that route, then you should use : vs ; in a platform-specific way? Although encoding information in the $PATH freaks me out a lot (but I'm guessing you determined that nothing else would do before trying it...).
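
A minimal sketch of the platform-specific joining suggested above, written in Python rather than fsatrace's C for brevity. The names `prepend_to_path` and `first_entry` are hypothetical and not part of fsatrace; `os.pathsep` is `:` on POSIX systems and `;` on Windows.

```python
import os


def prepend_to_path(entry, path, sep=os.pathsep):
    """Return `path` with `entry` placed in front, joined by `sep`.

    `sep` defaults to the platform separator (':' on POSIX, ';' on Windows).
    """
    if not path:
        return entry
    return entry + sep + path


def first_entry(path, sep=os.pathsep):
    """Return the first entry of a PATH-style string."""
    return path.split(sep)[0]
```

Joining with `;` on Linux and then splitting on `:` yields a first entry such as `stuff;shake_helper`, which is the corruption described in this issue.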

fsatrace cannot run cabal: setModificationTime: invalid argument

I'm seeing the following error in Linux/x86_64 using the latest git version of fsatrace:

$ fsatrace rwm /tmp/foo -- cabal unpack -v0 base-orphans-0.5.4
base-orphans-0.5.4/: setModificationTime: invalid argument (Bad file
descriptor)
fsatrace?�~�(1072): error: command failed with code 1
argv[0]=cabal
argv[1]=unpack
argv[2]=-v0
argv[3]=base-orphans-0.5.4

Is that a known problem? Am I doing something wrong?

segfault in emitting op

Hello, I'm seeing a segfault in the traced app when it tries to write to the shared memory buffer that is passed back to fsatrace:

<segv>
#4  emitOp (oc=oc@entry=114, op1=<optimized out>, p2=p2@entry=0x0) at src/emit.c:118
#5  0x00007f6fa63525f3 in fdemit (c=c@entry=114, fd=fd@entry=16) at src/unix/fsatraceso.c:118
#6  0x00007f6fa6352937 in openat64 (fd=-100, p=<optimized out>, f=<optimized out>, m=<optimized out>) at src/unix/fsatraceso.c:269

I don't have much more info at this time, but from looking at the source, is this likely to be running past the end of the buffer?

From my reading of main(), all accesses are buffered in the shared memory buffer until the process is complete, and then written, correct? (no concurrent access)
https://github.com/jacereda/fsatrace/blob/master/src/fsatrace.c#L193-L203

And the default logsize is 1MB of text?
https://github.com/jacereda/fsatrace/blob/master/src/fsatrace.h#L4

Which can be overridden by setting the env var FSAT_BUF_SIZE?

failures on ubuntu 20.04

Changes on Ubuntu 20.04 seem to break several pretty basic things.

First, make test fails because resolv of __xlstat fails. Changing line 502 of fsatraceso.c to always use __xstat causes things to work.

Second, the unlinkat function (also in fsatraceso.c) contains an unconditional assert(0) on line 321. I think there should be an else before that; adding it makes things work.

cc @spall @ndmitchell

Clarify (or clean up) the Windows makefile

While taking a look around I saw in win.mk:

SRCS32=src/win/fsatracedll.c src/win/inject.c src/win/patch.c src/win/hooks.c src/emit.c src/win/shm.c src/win/handle.c src/win/utf8.c src/win/dbg.c src/win/inject.c
SRCS64=$(SRCS32) src/win/inject.c

So SRCS64 = SRCS32 + inject.c. But inject.c is already in SRCS32? Unlikely to be harmful, but I guess it's a mistake?

Changes for GHC 8.0.2

I changed the script to work with GHC 8.0.2. A few minor issues which I'll pull request later, but the real issue was that two files define InterlockedAdd, namely:

  • C:\Users\Neil\AppData\Local\Programs\stack\x86_64-windows\ghc-8.0.2\mingw\x86_64-w64-mingw32\include\psdk_inc\intrin-impl.h
  • C:\Users\Neil\AppData\Local\Programs\stack\x86_64-windows\ghc-8.0.2\mingw\x86_64-w64-mingw32\include\ddk\wdm.h

I worked around that by adding:

#define __INTRINSIC_DEFINED__InterlockedAdd
#define __INTRINSIC_DEFINED__InterlockedAdd64

In hooks, which is vile. But is it acceptable? Or any other ideas how to avoid it?

Observe files queried

Not sure if this is possible, but it would be good if fsatrace could write some information whenever a file had its modification time queried, something like q|filename. If this worked, then it would be very easy to speed up slow rebuild checkers, using Shake. For example, on my system cabal build takes 0.625s, but in certain circumstances, ghc --make can take > 1 min. If you could run fsatrace - -- cabal build, and then capture everything it reads and queries, and only rerun if any of that changes, you could reduce the rebuild time to 0.01s. The tool ghc-make already does that using custom logic for ghc --make, but with fsatrace it could be totally generic and I could kill ghc-make entirely.
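
A rough sketch of the rebuild check described above, in Python. It assumes a trace in fsatrace's `op|path` line format, with `r` for reads and the `q` query events proposed in this issue. The helper names (`paths_to_watch`, `snapshot`, `needs_rerun`) are hypothetical, and comparing modification times is only one possible notion of "changed".

```python
import os


def paths_to_watch(trace_lines):
    """Collect the paths named by read ("r") and query ("q") events."""
    paths = set()
    for line in trace_lines:
        op, _, path = line.partition("|")
        if op in ("r", "q"):
            paths.add(path)
    return paths


def snapshot(paths):
    """Map each path to its modification time, or None if it does not exist."""
    result = {}
    for path in paths:
        try:
            result[path] = os.stat(path).st_mtime_ns
        except OSError:
            result[path] = None
    return result


def needs_rerun(previous_snapshot):
    """True if any watched path changed since `previous_snapshot` was taken."""
    return snapshot(previous_snapshot.keys()) != previous_snapshot
```

A build tool would store the snapshot after a traced run and rerun the command only when `needs_rerun` returns True.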

Quantifying fsatrace's coverage

For a paper we're writing about Rattle (which uses fsatrace) we'd like to quantify in some way how much of the filesystem API fsatrace covers. Right now, it looks like 25 functions are overridden on Linux, which is my estimate just by grepping for R(. Is that accurate? Is there a way to tell how many relevant functions glibc provides? Or anything else in this neighborhood?

cc @ndmitchell @spall

Notes on the test suite

I had a look through the test suite. A couple of questions:

  • What are the multi-letter forms used for? e.g. RR FilePath
  • Why do you catch the upper case variants as well? f ('R':'|':xs) = Just $ R xs. It would be ideal if you guaranteed only to produce one form.
  • os == "mingw32" - this always scares me that Haskell programs use a value called OS, matching against a string which is clearly not an OS, and is a toolkit not even installed probably at the wrong bit size. I prefer the isWindows from the extra library - but what you are doing is the Haskell expected pattern, I just don't like it, so go with whatever you prefer.

Tracing multiple subprocesses doesn't work on Windows

If I create a file twice.bat with contents:

cat foo.txt
cat bar.txt

Then it only records the first call to cat, not the second. Similarly, if I do gcc -c main.c then it traces the call to cc1 (which loads main.c and writes the .s file), but not the call to as (which writes main.o).

The problem (as best I can tell...) is that patchInstalled asks if this thread has previously installed a patch. But, if this thread previously installed a patch for a different process, it returns True even though the patch isn't valid. I "fixed" this by changing the value stored in the Tls to be the process of the thread passed to NtResumeThread. I have no real idea what I'm doing, but it fixes the problem, and tracing it seems to approximately work. Any advice?

Have make run stack setup for you

Not sure if this is a good idea, but make on Windows could run stack setup if it couldn't find the necessary compilers but can find stack. Would allow simplifying the instructions.

Remove duplicate adjacent lines

Currently, running gcc -c main.c, I get 139 lines output from fsatrace. If I remove all lines which are identical to the previous line, I'm left with 15. Reducing the number of lines by a factor of 10 results in less storage, and less requirement for processing downstream. This will probably help with ndmitchell/shake#334
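
The filtering described above can be sketched in a few lines of Python. This is a downstream illustration rather than a change to fsatrace itself, and the function name `drop_adjacent_duplicates` is hypothetical.

```python
from itertools import groupby


def drop_adjacent_duplicates(lines):
    """Keep one line from each run of identical consecutive lines."""
    return [line for line, _run in groupby(lines)]
```

Only consecutive repeats are removed, so the relative order of distinct events is preserved and a path accessed again later still appears.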

Basic tracing does not appear to work on macOS Monterey Version 12.5.1

I tried what I thought would be the simplest possible example trace on Mac (with SIP turned off; see below), but I only saw a read of the binary I used, not any read/write events associated with the arguments.

 $ csrutil status
System Integrity Protection status: disabled.
 $ cd $(mktemp -d)
 $ touch test_file
 $ fsatrace vrwmd - -- cp test_file test_file.copy
argv[0]=cp
argv[1]=test_file
argv[2]=test_file.copy
r|/bin/cp
 $ 

Removing a symlink looks like removing its target

Removing a symbolic link looks like removing the link's destination (but perhaps should not?).

Demonstration:

$ touch foo
$ ln -s foo bar
$ fsatrace erwdtmq /dev/stdout -- rm -f bar
r|/usr/bin/rm
q|/home/fangism/foo
d|/home/fangism/foo

Destination foo is unaffected.

I expected something more like:

r|/usr/bin/rm
q|/home/fangism/bar
d|/home/fangism/bar

Error should be more informative

I sometimes get:

Fatal: fsatrace.c:39: CreateProcessW(0, cmd, 0, 0, 0, CREATE_SUSPENDED, 0, 0, &si, &pi), err: 2

It would be really useful if in those circumstances it could also print the command line used, since that can help with debugging various escaping issues (which I think is the problem I'm having here!)

embed extra files, .so/.dll?

Would it be possible to embed the extra files into fsatrace itself?

It would seem lovely to be able to enable at least the possibility of static linking (when there is, e.g. a C API). Not needed, but it does seem like it could enhance the user experience.

trace readdir

For fac/bigbro, readdir is an important operation to trace. It's needed if you have a build rule such as:

echo *.c > file.dat

in which case the rule needs to be rebuilt if a new file is created in that directory.

Add C API

I think there is a desire to add a C API, and I'd like to unify bigbro with fsatrace (under either one name or the other). How does the bigbro API look?

https://github.com/droundy/bigbro/blob/master/bigbro.h#L3

I could easily see creating a more fine-grained set of output (e.g. separating stat into a separate array), and also allowing null pointers for output that is not desired.

I don't know how portable to windows the file descriptor approach is for redirecting stdout and stderr. Also, this API doesn't support setting the environment for the child, so if that is important, we'd need another argument. Finally, returning the child PID seems important in terms of fac's usage, but I'm not sure how that will work on windows. Maybe we create a second helper function kill_children? Sounds racy.

Another question is how to support the "blocking" mode that Neil wants.

Output file even on non-zero exit code

At the moment if the process returns a non-zero exit code then fsatrace does nothing. I find that surprising - I'd expect it to always produce the output, and then bubble the exit code back as well. Assuming you want to keep the default behaviour, an option would be useful.

Tracking reads from non-existing files

First of all, thank you for the great tool!

It seems that fsatrace doesn't track reads from files that do not exist. For example, if the file 1.c does not exist then the command

fsatrace verwmdq 1.out -- gcc 1.c

does not list 1.c in the result 1.out, which is a problem for my use case. (Here is a blog post about my use case in case you are curious.)

Tested both on Windows and Linux.

How difficult would it be to add support for this?

Issues on work Windows 7 machine

Given a work Windows 7 64bit machine, with 32bit cygwin and a nasty anti-virus, I tried running some commands from my user directory (C:\User\myusername), using a local file list.txt. Observations:

fsatrace foo.txt -- cmd /c "type list.txt"

This segfaults trying to write to a null pointer.

fsatrace foo.txt -- cat list.txt

This never completes. It spawns an infinite number of fsatracehelper.exe processes. As it spawns each one, they become suspended, and a new one is spawned. I had to kill them with taskkill /FI "IMAGENAME eq fsatracehelper.exe" /f, but if I didn't know taskkill it would have required a reboot.

I have VS2008 on my machine. Rebuilding got further if I removed the stdint.h headers, which aren't available on older versions and don't actually seem to be required. After that I got the errors:

hooks.c(59) : error C2065: 'FILE_DIRECTORY_FILE' : undeclared identifier
hooks.c(61) : error C2065: 'FILE_DELETE_ON_CLOSE' : undeclared identifier
hooks.c(105) : warning C4013: 'NT_SUCCESS' undefined; assuming extern returning
LINK : fatal error LNK1181: cannot open input file 'ntdll.lib'

Not sure if they are solvable or not - VS2008 is quite old now.

Can't trace Go code on Linux

As an example, given the go code:

package main

import (
    "fmt"
    "io/ioutil"
    "os"
)

func main() {
    b, err := ioutil.ReadFile(os.Args[1])
    if err != nil {
        fmt.Print(err)
    }
    fmt.Print(string(b))
}

Save that as main.go and compile it with go build -o main main.go. fsatrace does not detect the read. I believe the cause will be that go does not use dynamic libraries but jumps straight to syscalls.

Tracing mkdir syscalls

fsatrace does not appear to trace the mkdir calls, which can create unexpected traces when working with temporary directories. Consider the Rust library tempfile. Here's a simple case where we create a temporary directory, which in turn creates a subdirectory, with a single file in it:

fn main() {
    let tmp = tempfile::TempDir::new().unwrap();
    let dir = tmp.path().join("dir");
    std::fs::create_dir(&dir).unwrap();
    std::fs::write(dir.join("hello"), b"hello").unwrap();
}

This produces the following trace:

r|/the-binary
w|/tmp/.tmp5JZCc9/dir/hello
r|/tmp/.tmp5JZCc9
r|/tmp/.tmp5JZCc9/dir
d|/tmp/.tmp5JZCc9/dir/hello
d|/tmp/.tmp5JZCc9/dir
d|/tmp/.tmp5JZCc9

We're not observing the creation of /tmp/.tmp5JZCc9/dir, so when tempfile starts recursively deleting the temporary directory, the read call of /tmp/.tmp5JZCc9/dir appears to be a unique access, even though the program fully created and cleaned up all these accesses.

To handle this, users of fsatrace could try to infer that these traces correspond to a directory tree by looking for successful accesses to subdirectories, but that won't work if we just create a directory and don't try to use it. For example, if we modify the previous code to remove the file write:

fn main() {
    let tmp = tempfile::TempDir::new().unwrap();
    let dir = tmp.path().join("dir");
    std::fs::create_dir(&dir).unwrap();
}

We will end up with this stream, that appears to access a file that wasn't created by the program:

r|/the-binary
r|/tmp/.tmpCSPGtB
r|/tmp/.tmpCSPGtB/dir
d|/tmp/.tmpCSPGtB/dir
d|/tmp/.tmpCSPGtB

I'd imagine that tracing the mkdir syscalls would add a w|/tmp/.tmpCSPGtB/dir event, which would allow us to infer that the program fully handled all these directory or file accesses.
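
To illustrate the inference a consumer could make, here is a small Python sketch. It assumes each trace line has the form `op|path` and that a `w` event marks a path as created by the program. The `w` events for directories are the hypothetical ones proposed above (fsatrace does not emit them in the traces shown), and the function name `unexplained_accesses` is made up for this example.

```python
def unexplained_accesses(trace_lines):
    """Return paths that were read or deleted without an earlier write.

    A path counts as created by the program once a "w" event for it
    has been seen; other operations are ignored in this sketch.
    """
    created = set()
    unexplained = []
    for line in trace_lines:
        op, _, path = line.partition("|")
        if op == "w":
            created.add(path)
        elif op in ("r", "d") and path not in created and path not in unexplained:
            unexplained.append(path)
    return unexplained
```

On the second trace above this reports the binary and both temporary directories. If `w` events for the two directories were present before the reads, only the binary would remain.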

Can't spawn a 32bit process from a .bat file

Given a 32bit binary, I tried with both cat and sleep from http://unxutils.sourceforge.net/, if I create foo.bat:

sleep 0s

Then do fsatrace rwm - -- cmd /c foo.bat it fails with:

Fatal: src/win/inject.c:44: CreateProcessA(0, helper, 0, 0, 0, 0, 0, 0, &si, &pi), err: 8

Going in to the code and somewhat randomly changing things, if I change https://github.com/jacereda/fsatrace/blob/master/src/win/inject.c#L32 to be if (is32 && 0) then it works and seemingly traces correctly.

Looking at the code, perhaps you should be using the 64bit technique if either of yourself or the child is 64bit? Or perhaps you should try the else branch of GetProcAddress and only if that fails try using fsatracehelper?

Add blocking mode

If Fsatrace could tell me about a file access before it occurred, Shake could do a need before it still built, which would make auto deps much more powerful. Is this feasible? Would require some kind of pipes based protocol probably - you write a line of stdout or to a file, Shake writes something back to say continue.

Add --verbose flag

It would be really useful to have a --verbose flag to say exactly what the argv arguments to spawn, or the single command line to CreateProcess, is. At the moment I'm having to guess it.

Doesn't trace execution on Windows

On Windows, if I create a binary Main.exe (in Haskell, but not sure that matters) and then do:

fsatrace rwmdq output.txt -- Main.exe

I don't get Main.exe as a file that is read. Furthermore, doing cmd /c Main.exe as the command line doesn't get either cmd or Main as the detected binaries. It seems that the binary being run doesn't get a "Read" entry?

Make a release

For Windows users, being able to download a release of fsatrace would be very handy, as compiling requires a bunch of things most Windows users don't have. I'm happy to generate the binary and share, but the GitHub releases page of this repo is the most natural place to host them.

Can't trace gcc on Mac

Even after copying the gcc binary into $TMP doing gcc -c main.c doesn't trace the read of main.c or the write of main.o. My guess is because gcc spawns a binary that is itself in system which doesn't get the copy treatment? Anything that can be done about that, short of turning off system protection?

Consider ETW on Windows

https://docs.microsoft.com/en-us/windows/win32/etw/about-event-tracing - not sure if that would be faster or slower than Kernel hooking. There's a chance it might be simpler though. See https://github.com/lowleveldesign/wtrace for an example of building it up to a full tracing app. I measured 21% overhead using fsatrace on Windows (see https://ndmitchell.com/downloads/paper-build_scripts_with_perfect_dependencies-18_nov_2020.pdf S5.2), although some of that will have been spawning the fsatrace binary.

Support @foo command lines

I'm trying to support the Shell command in Shake, as described in ndmitchell/shake#308. Part of the problem is that on Windows the built-in process stuff in Haskell does lots of quote mangling, without giving me the chance to opt out. If @foo.txt as a command line just read foo.txt and took the command line from there (as many programs already support) then I could skip the Haskell mangling and get exactly what I was after from fsatrace.

Some programs treat @foo.txt files as having one argument per line, others as one single line. I'd probably be tempted for Windows to join all lines with a space, and for Linux pass each line as a separate argument - that gives full flexibility and is simple.
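
A sketch of the two interpretations proposed above, in Python. The function name `read_response_file` and its `one_arg_per_line` flag are hypothetical, and skipping empty lines is an assumption of this example rather than something the issue specifies.

```python
def read_response_file(text, one_arg_per_line):
    """Turn the contents of an @file into a command.

    one_arg_per_line=True  -> list of arguments, one per non-empty line
                              (the proposed Linux behaviour)
    one_arg_per_line=False -> a single command-line string with the lines
                              joined by spaces (the proposed Windows behaviour)
    """
    lines = [line for line in text.splitlines() if line]
    if one_arg_per_line:
        return lines
    return " ".join(lines)
```

The list form maps onto an argv array, while the joined form maps onto the single command-line string that CreateProcess takes.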
