I think there is a desire to add a C API, and I'd like to unify bigbro with fsatrace (


Add C API,about jacereda/fsatrace

Comments (48)

jacereda commented on June 20, 2024

I'm not sure about separating the outputs by type. fsatrace instead returns a sequence of operations that reflects the temporal ordering. It's true that this temporal ordering can be non-deterministic in multi-threaded programs, but I still think it could be useful in some situations.

The way fsatrace keeps the temporal ordering is by appending to a shared-memory buffer using atomic operations. The internal representation is the same used in the output.
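As a rough illustration of that append scheme, here is a minimal sketch using C11 atomics. The `trace_buf` layout, the capacity, and the `<op>|<path>` record format are assumptions made for the example, not fsatrace's actual code; in a real tracer the struct would live in a shared-memory mapping rather than ordinary memory.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

#define TRACE_CAPACITY 4096  /* assumed size, for illustration only */

/* Hypothetical layout: a write cursor followed by raw text records. */
struct trace_buf {
    atomic_size_t used;         /* bytes reserved so far */
    char data[TRACE_CAPACITY];  /* records such as "r|/path\n" */
};

/* Appends one "<op>|<path>\n" record. Returns 0 on success, -1 if full. */
int trace_append(struct trace_buf *b, char op, const char *path) {
    assert(b != NULL && path != NULL);
    size_t len = strlen(path) + 3;  /* op, '|', path, '\n' */
    /* Reserve a private slice; concurrent writers get disjoint ranges,
       and the order of reservations is the temporal order. */
    size_t off = atomic_fetch_add(&b->used, len);
    if (off > TRACE_CAPACITY || len > TRACE_CAPACITY - off)
        return -1;
    b->data[off] = op;
    b->data[off + 1] = '|';
    memcpy(b->data + off + 2, path, len - 3);
    b->data[off + len - 1] = '\n';
    return 0;
}
```

Because the stored bytes are already in output form, producing the final trace is a single copy of `data[0..used)`. Once an append fails, `used` no longer marks the end of valid data, so a real implementation would need to handle overflow more carefully.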

In any case, going from the sequence of operations to a bucketed representation is easy and could be implemented on top of the other if you prefer that API.

from fsatrace.

jacereda commented on June 20, 2024

@ndmitchell wanted a callback for the blocking mode. Since we are in another process, it would need to signal the invoking process by means of a shared semaphore and block until the callback is handled.

Going to the extreme, every operation could just invoke a callback that way, but I think that's not an option due to the number of context switches.

from fsatrace.

jacereda commented on June 20, 2024

As for stdout/stderr redirection, I would try to keep them separated. I'm using different colours for stdout/stderr in my build system.

from fsatrace.

droundy commented on June 20, 2024

The advantage of a set as output rather than a sequence as output is that it doesn't scale in the same way with the size of the job, e.g. if it opens the same file many times, one doesn't have linearly growing data use. It's probably not important, though.

The other advantage to my mind of set output is that the caller doesn't have to do the fiddly handling of directory renames, e.g. to find out what files were created.

from fsatrace.

droundy commented on June 20, 2024

So we could have two file descriptors, one for stdout, and the other for stderr? I don't like keeping them separated because then you lose the ordering between them, but I'm fine with giving that as an option.

The big question in my mind is whether we can make redirection portable to Windows.

from fsatrace.

jacereda commented on June 20, 2024

What if the tool always did the redirection and reported as o|<some stdout message> & e|<some stderr message> ?

from fsatrace.

jacereda commented on June 20, 2024

Those would probably need to embed the size of the message to make parsing easier and avoid escaping...
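A length-prefixed record could look roughly like the sketch below. The exact framing (`<stream>|<decimal length>|<payload>`) and the function names are assumptions for illustration; the thread only proposes `o|` and `e|` prefixes with an embedded size.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical framing: "<stream>|<decimal length>|<payload>",
   where stream is 'o' (stdout) or 'e' (stderr). */

/* Writes one frame into out. Returns the frame size, or -1 if it does not fit. */
int frame_encode(char stream, const char *msg, size_t len, char *out, size_t cap) {
    assert(msg != NULL && out != NULL);
    int hdr = snprintf(out, cap, "%c|%zu|", stream, len);
    if (hdr < 0 || (size_t)hdr >= cap || len > cap - (size_t)hdr)
        return -1;
    memcpy(out + hdr, msg, len);  /* payload is copied verbatim, no escaping */
    return hdr + (int)len;
}

/* Parses one frame from buf[0..n). On success fills stream, msg and len and
   returns the number of bytes consumed; returns -1 on malformed or
   incomplete input. */
int frame_decode(const char *buf, size_t n, char *stream,
                 const char **msg, size_t *len) {
    if (n < 4 || (buf[0] != 'o' && buf[0] != 'e') || buf[1] != '|')
        return -1;
    size_t i = 2, v = 0;
    while (i < n && buf[i] >= '0' && buf[i] <= '9')
        v = v * 10 + (size_t)(buf[i++] - '0');
    if (i == 2 || i >= n || buf[i] != '|' || v > n - i - 1)
        return -1;
    *stream = buf[0];
    *msg = buf + i + 1;
    *len = v;
    return (int)(i + 1 + v);
}
```

Since the parser skips exactly `len` bytes, the payload may contain `|` or newlines without any escaping.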

from fsatrace.

droundy commented on June 20, 2024

Always redirecting stdout and stderr to different locations means that it is never possible to get correct synchronized output, so that isn't a good option at all. I wish that tools would decide to only output to either stdout or stderr, but the reality is that they don't, and you can lose a lot of information if you cannot distinguish the order of output.

from fsatrace.

jacereda commented on June 20, 2024

Remember we can install hooks on write(), so we can preserve the ordering. The process wouldn't be aware it's being redirected, it would just perform normal write()s.

from fsatrace.

droundy commented on June 20, 2024

I just read up on redirecting stdout/err on Windows, and it looks like the only difference is that one uses HANDLEs rather than file descriptors (which are ints). So I don't see any reason we can't have a flexible library that supports redirecting on either OS to pipes or files of the caller's choice.
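For the POSIX side, a minimal sketch of the "normal" redirection could look like this. The function name `run_merged` and the choice to run the command through `/bin/sh` are assumptions for the example; a Windows version would pass HANDLEs to the process-creation call instead of using `dup2`.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs cmd via /bin/sh with stdout and stderr sharing one pipe, so the
   interleaving of the two streams is preserved. Captures up to cap-1 bytes
   into out (NUL-terminated). Returns the exit status, or -1 on error. */
int run_merged(const char *cmd, char *out, size_t cap) {
    int fds[2];
    assert(cmd != NULL && out != NULL && cap > 0);
    if (pipe(fds) != 0)
        return -1;
    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: point both descriptors at the same pipe, then exec. */
        close(fds[0]);
        dup2(fds[1], STDOUT_FILENO);
        dup2(fds[1], STDERR_FILENO);
        close(fds[1]);
        execl("/bin/sh", "sh", "-c", cmd, (char *)NULL);
        _exit(127);
    }
    close(fds[1]);
    size_t used = 0;
    ssize_t got;
    char scratch[256];
    /* Keep draining past cap so the child never blocks on a full pipe. */
    while ((got = read(fds[0], scratch, sizeof scratch)) > 0) {
        size_t room = cap - 1 - used;
        size_t take = (size_t)got < room ? (size_t)got : room;
        memcpy(out + used, scratch, take);
        used += take;
    }
    out[used] = '\0';
    close(fds[0]);
    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Giving the caller the option of separate streams would only mean creating a second pipe and passing a different descriptor to the second `dup2`.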

from fsatrace.

droundy commented on June 20, 2024

Using hooks to redirect the data sounds cumbersome. You'd need to mirror
the file descriptor table, and track changes across forks and calls to
exec, as well as dup and friends. Given that any process may make one of
these system calls directly and you'd miss it with LD_PRELOAD, this seems
like a lot of fragility to add, for very little benefit over doing things
the "normal" way.


from fsatrace.

ndmitchell commented on June 20, 2024

Looking at the bigbro API:

I agree that the order of operations is potentially important. I don't think it's that big a deal to the API though - just have struct entry {int mode; char* data;} and have a single **entry pile for all the things that changed.
Knowing the difference between what happens on stdout and stderr is quite important. Why can't you just inherit stdout and stderr? Then people can redirect stderr/out already if they care?
Environment variables are important, but just an extra argument, nothing severe.
Blocking is very useful for me, but I guess a separate API, which takes a function which gets given an entry instead of a list of entry?
I suggest adding a flags argument, and initially supporting flags to turn on/off each entry, and a flag to nub the results, so you only get the first entry per file.
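To make the flags idea concrete, here is a small sketch. Only `mode` and `data` come from the comment above; the flag names, their values, and `entry_filter` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flag values: one bit per entry kind, plus a "nub" bit. */
enum { ENTRY_READ = 1 << 0, ENTRY_WRITE = 1 << 1, ENTRY_NUB = 1 << 8 };

struct entry {
    int mode;          /* ENTRY_READ or ENTRY_WRITE */
    const char *data;  /* path the operation refers to */
};

/* Filters entries in place, keeping their original order. Only kinds
   enabled in flags survive; with ENTRY_NUB, only the first surviving
   entry per path is kept. Returns the new count. */
size_t entry_filter(struct entry *e, size_t n, int flags) {
    assert(e != NULL || n == 0);
    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        if (!(e[i].mode & flags))
            continue;
        int dup = 0;
        if (flags & ENTRY_NUB)
            for (size_t j = 0; j < kept && !dup; j++)
                dup = strcmp(e[j].data, e[i].data) == 0;
        if (!dup)
            e[kept++] = e[i];
    }
    return kept;
}
```

The quadratic duplicate check is only for brevity; a hash set would be the obvious replacement for large traces.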

from fsatrace.

droundy commented on June 20, 2024

I'm curious as to why order is important. Is it because you need to examine
order for some reason, or to compensate for renames and deletions? If it's
the latter, then I would prefer to embed the code that handles those issues
so it doesn't need to be duplicated by every caller. E.g. there is no reason to
report writes to files that are later deleted, and no need to report renames of
created files or directories at all. Just report the net effect on the
filesystem. If we're going to return a sequence rather than sets, I'd like
to see a real use case.
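A sketch of what "report the net effect" might involve, replaying a sequence of operations into the set of paths that still exist at the end. The record layout, the op letters, and the fixed-size set are assumptions for the example, and it ignores corner cases such as renaming onto an already tracked path.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_FILES 64
#define MAX_PATH_LEN 256

/* Hypothetical record: op is 'w' (write), 'd' (delete) or 'm' (move). */
struct op {
    char op;
    const char *path;  /* target of w and d, destination of m */
    const char *from;  /* source of m, NULL otherwise */
};

struct created {
    char paths[MAX_FILES][MAX_PATH_LEN];
    int count;
};

/* Returns 1 if path equals dir or lies underneath it. */
int is_under(const char *path, const char *dir) {
    size_t n = strlen(dir);
    return strncmp(path, dir, n) == 0 && (path[n] == '\0' || path[n] == '/');
}

/* Adds path to the set unless it is already present or the set is full. */
void add_path(struct created *c, const char *path) {
    for (int j = 0; j < c->count; j++)
        if (strcmp(c->paths[j], path) == 0)
            return;
    if (c->count < MAX_FILES)
        snprintf(c->paths[c->count++], MAX_PATH_LEN, "%s", path);
}

/* Replays the sequence and leaves in out only the paths that still exist. */
void net_effect(const struct op *ops, int n, struct created *out) {
    assert(ops != NULL && out != NULL);
    out->count = 0;
    for (int i = 0; i < n; i++) {
        const struct op *o = &ops[i];
        if (o->op == 'w') {
            add_path(out, o->path);
        } else if (o->op == 'd') {
            /* Drop the path and, for a directory, everything below it. */
            int k = 0;
            for (int j = 0; j < out->count; j++)
                if (!is_under(out->paths[j], o->path)) {
                    if (k != j)
                        strcpy(out->paths[k], out->paths[j]);
                    k++;
                }
            out->count = k;
        } else if (o->op == 'm') {
            /* Rewrite the prefix of every tracked path under the source. */
            size_t skip = strlen(o->from);
            int moved = 0;
            for (int j = 0; j < out->count; j++)
                if (is_under(out->paths[j], o->from)) {
                    char tmp[2 * MAX_PATH_LEN];
                    snprintf(tmp, sizeof tmp, "%s%s", o->path, out->paths[j] + skip);
                    snprintf(out->paths[j], MAX_PATH_LEN, "%s", tmp);
                    moved = 1;
                }
            /* Moving a pre-existing file still creates the destination. */
            if (!moved)
                add_path(out, o->path);
        }
    }
}
```

For example, writing `build/a.o`, renaming `build` to `out`, then writing and deleting `tmp.txt` leaves just `out/a.o` in the set.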

The issue with always inheriting stderr and stdout is that it effectively
makes the code nonreentrant if the caller chooses to redirect stdout. True,
we can let the caller create a lock to deal with that, but why do so when
we can just redirect stderr and stdout after the fork? I never proposed to
unconditionally redirect either.

I agree that adding an env argument is a good idea.

And yes, blocking absolutely needs to be a separate API, since there seems
likely to be a severe performance penalty. Rather than giving a function
argument, I'd consider returning early and having a resume function. But
either way would work.


from fsatrace.

ndmitchell commented on June 20, 2024

I actually don't have any uses where order is important, beyond the callback, where order is certainly important but obvious anyway. However, it seems like a file system tracing thing might reasonably want that information. Certainly if I was debugging what a program did that information would be handy. I guess the difference is that I'd rather a complete trace, rather than the net effect - since from one you can compute the other, but not vice versa.

If you go for a resume call then you also need a cancel call, and any variables on the stack have to be copied into a separate buffer. Neither is fatal, but both seem like more work in the C side. However, my code is completely continuation passing, so in that respect capturing a continuation to continue would suit me a lot better. They are certainly equivalent, but the @droundy formulation can be converted to the @ndmitchell formulation very cheaply, but the other way round requires an extra callee thread, so I guess resume makes more sense.
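The cheap direction of that conversion can be sketched as follows: given a resume-style call, a callback-style API is just a loop. All names here are hypothetical, and the "tracer" is a stand-in that replays a fixed list of events rather than tracing a real process.

```c
#include <assert.h>
#include <stddef.h>

struct entry {
    int mode;
    const char *data;
};

/* Stand-in for a traced job: a fixed list of events (hypothetical). */
struct trace {
    const struct entry *events;
    size_t count, pos;
};

/* Resume-style API: runs until the next event of interest. Returns 1 and
   fills *out, or returns 0 once the job has finished. */
int trace_resume(struct trace *t, struct entry *out) {
    if (t->pos == t->count)
        return 0;
    *out = t->events[t->pos++];
    return 1;
}

typedef void (*entry_cb)(const struct entry *e, void *ctx);

/* Callback-style API layered on top: a plain loop, no extra thread. */
void trace_run(struct trace *t, entry_cb cb, void *ctx) {
    assert(t != NULL && cb != NULL);
    struct entry e;
    while (trace_resume(t, &e))
        cb(&e, ctx);
}

/* Example callback: counts the events it is given. */
void count_entries(const struct entry *e, void *ctx) {
    (void)e;
    ++*(int *)ctx;
}
```

Going the other way, from a callback API to a resume API, means the callback has to park somewhere while the caller decides what to do, which is where the extra thread comes from.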

from fsatrace.

droundy commented on June 20, 2024

I agree that strace is a wonderful debugging tool, and I truly pity
platforms that don't have it, but I am not thinking that this API is designed
as a debugging API. As a library, I think the primary concern should be the
ease of correct use by its intended audience. Specifically, in the
debugging case you probably want to use an executable tool to trace, and
probably want the trace sent directly to stderr or stdout, where it can
be correlated with the debug printfs you are already making.

The resume call wouldn't require a cancel call, because the function would
return when it encounters a system call of interest. I have no particular
interest in the blocking API, except in making it feasible and minimally
intrusive to the library. It will require an extra thread in the library.
The resume option would make it possible for a caller to use the API
without locking, which seems like good API design to me: put the necessary
tricky stuff into the library rather than into every caller.


from fsatrace.

droundy commented on June 20, 2024

It has now occurred to me that we have an additional challenge with the
blocking API, which is that it can be blocked on multiple system calls
simultaneously. Perhaps the API should just serialize them so that one
system call is handled by the caller at a time. But it does add a little
more excitement that I hadn't anticipated.


from fsatrace.

jacereda commented on June 20, 2024

I don't understand why we need a thread to implement the blocking stuff. Isn't a callback enough? The installed callback would attempt to generate the missing file and signal the traced process via a semaphore when done, at which point it would resume the interrupted file operation. Am I missing something?

from fsatrace.

ndmitchell commented on June 20, 2024

I think a callback is simpler. It's a C API - I expect it to segfault if I upset it, so I don't see any problem with multiple simultaneous callbacks. I had envisaged pretty much what @jacereda seems to have been thinking of.

Regarding strace, if you wanted to write something like that on top of fsatrace (and I really believe you do!) then you'd just use the callback API, and thus get events in order. I still suspect that having a single buffer is easier than having one buffer per type, as it allows us to add new "codes" without changing the API (e.g. readdir in #12). Given a single buffer, and given that they have to be in some order, surely the order they were generated makes most sense? The information will not be harmful, and occasionally useful.

from fsatrace.

droundy commented on June 20, 2024

Okay, callback is fine for the blocking API.

I agree that future-proofing the API is wise in general. However, the
codes do need to be semantic rather than having a 1-1 relationship with
system calls if the API is to be useful, and in most cases I expect that if
something was omitted, then our correct response is to add that to the
existing codes. To use your example, if we omitted readdir, then adding it
as an additional code is problematic, because users of the library may have
assumed that any read should count as a read. If they assume that, then
there is a bug in the existing code which isn't fixed by adding an
additional code. This assumption might even seem logical, since users
might know that you can open a directory with open(2), and might reasonably
assume that any open for reading the filesystem counts as a read.
Similarly, if we failed to count execve as a read, users might be
disappointed, if they assumed that all file dependencies are accounted for
by the "read" output. Adding it as a separate code wouldn't fix the
existing bug. In short, I think there is a good case for a need to change
the API if we discover new file system events that require tracing. Of
course, if you wanted to enable tracing of other events, e.g. network
events, then that would be a different story entirely, but I don't see that
as a direction that interests me.

Generally, my bias is in favor of making the API easy and safe to use, at
the cost of extensibility, rather than the other way around. We can always
introduce a new function later to add new behavior.

As I see it, there are only very few kinds of FS events to be tackled:

Writes:
Write to file
Creation of file
Deletion of file
Creation of directory
Deletion of directory
Modification of file or directory metadata
Creation of symlink
Modification of symlink
Deletion of symlink

Reads:
Read of symlink contents (happens when following a symlink)
Read from file
Read of file metadata (size, etc)
Read from directory
Read of directory metadata (size, etc)

Renames:
File rename (can be viewed as deletion/creation)
Directory rename (can be viewed as a lot of creations/deletions)

Unusual:
Reflink (can be viewed as a read and a file creation)
Creation of hard link (looks like a read and file creation)

Each one of these operations has to show up somewhere in our set of
categories. If we don't have a special category for e.g. reflink, then we
need to put it in our existing categories (once the system call shows up in
a linux kernel we care about).

My preference is to define our categories in terms of causality. If an
operation causes a future read from a file to change its output, then it
must be a write to that file. Conversely, if a write to a file can cause
an operation to change, that operation must be a read of that file.
Directories are funny, in that writes to any file residing in a directory cause a read
from the directory to change, but it would seem foolish to list each file
operation as a write to its parent directory. This semantic distinction is
why bigbro has just three categories. I wouldn't object to creating
subcategories, but it is important to me (for fac) that these causality
relationships be respected, which means that unless there is a shocking new
development in file systems, any newly traced filesystem operations must
fall in one of the existing categories. You could perhaps argue that
extended attributes are an exception. I suppose we could allow
introduction of new subcategories in a backwards-compatible manner, so
maybe this rant is irrelevant.
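One way to picture the causality rule is a table from event kinds to the effects they must be reported as. The sketch below is an assumption-laden simplification: the enum names are made up, and it collapses everything to two effects (read and write) rather than reproducing bigbro's actual categories.

```c
#include <assert.h>

/* Hypothetical event kinds, following the list earlier in this comment. */
enum fs_event {
    EV_FILE_WRITE, EV_FILE_CREATE, EV_FILE_DELETE, EV_DIR_CREATE,
    EV_DIR_DELETE, EV_METADATA_WRITE, EV_SYMLINK_CREATE, EV_SYMLINK_MODIFY,
    EV_SYMLINK_DELETE, EV_SYMLINK_READ, EV_FILE_READ, EV_FILE_STAT,
    EV_DIR_READ, EV_DIR_STAT, EV_FILE_RENAME, EV_DIR_RENAME,
    EV_REFLINK, EV_HARDLINK
};

enum { EFFECT_READ = 1, EFFECT_WRITE = 2 };

/* Maps an event to the causal effects it has to be reported as. */
int event_effects(enum fs_event ev) {
    switch (ev) {
    case EV_SYMLINK_READ: case EV_FILE_READ: case EV_FILE_STAT:
    case EV_DIR_READ: case EV_DIR_STAT:
        return EFFECT_READ;
    case EV_REFLINK: case EV_HARDLINK:
        /* Read of the source plus creation of the target. */
        return EFFECT_READ | EFFECT_WRITE;
    case EV_FILE_WRITE: case EV_FILE_CREATE: case EV_FILE_DELETE:
    case EV_DIR_CREATE: case EV_DIR_DELETE: case EV_METADATA_WRITE:
    case EV_SYMLINK_CREATE: case EV_SYMLINK_MODIFY: case EV_SYMLINK_DELETE:
    case EV_FILE_RENAME: case EV_DIR_RENAME:
        /* Renames count as a deletion plus a creation. */
        return EFFECT_WRITE;
    }
    assert(!"unknown fs_event");
    return 0;
}
```

Under this scheme a newly traced operation does not add a code; it gets a new `case` label in one of the existing groups, which is the backwards-compatibility property argued for above.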

My biggest issue (other than ease of use) with the order of output being
chronological is that it places a constraint on implementation, which seems
unwise. Of course, there is also the issue that it requires rewriting
existing code, which lazy me doesn't want to do.


from fsatrace.

jacereda commented on June 20, 2024

So, I guess we have now a rough idea of what we need. The bit that scares me is the OSX implementation. For instance, besides the problem of not being able to hook system binaries due to SIP, I tried yesterday to detect directory reads and failed miserably.

I can intercept opendir(), but looks like ls is using the deprecated getdirentries() + some non-interceptable open_nocancel(). If we can't find a way to hook that we should probably start considering alternatives.

AFAIK, ptrace/dtrace are out of the question for system binaries due to SIP.

With that in mind, I think the most robust approach would be a FUSE-based solution. This would be a good start: http://loggedfs.cvs.sourceforge.net/viewvc/loggedfs/loggedfs/src/loggedfs.cpp?revision=1.14&view=markup

@ndmitchell was against it because it would require installing additional software and in some scenarios that might be difficult, but the current DYLD_INSERT_LIBRARIES has too many drawbacks.

Perhaps I should invest some time prototyping a FUSE-based solution to try to measure how much overhead to expect from that...

from fsatrace.

droundy commented on June 20, 2024

Is it really true that dtrace isn't feasible on Mac? I had been told that
it could be made to work, a year back... but that may have been before
SIP. The Apple page I read just now makes it sound like it only prevents
writing. I can see how that would prevent hooking, which is what malware
wants to do, but I don't see why it would prevent tracing. :(

Fuse is definitely an inferior solution, although it may be necessary.


from fsatrace.

jacereda commented on June 20, 2024

The way I got dtruss to work was to copy the binaries out of system directories and run it as root. Other than that, you can disable SIP. Neither option seems very attractive.

from fsatrace.

droundy commented on June 20, 2024

Could dtrace just detect when the application enters the system binaries,
and then we determine from that entry point what is going to happen? I
speak as a dtrace ignoramus.


from fsatrace.

ndmitchell commented on June 20, 2024

For order of events, it's certainly going to be observable which order they are returned to the C user, so it makes sense (to me at least) to define that order. And the only order which "makes sense" is the order of time.

For Fuse (or indeed anything) the question is will it work out of the box, and what configuration does it require. If it works out of the box with no configuration as a single binary/dll, that's awesome. If that's not feasible or requires a tweak, that cuts down on the number of users (every additional step does), but it's not fatal.

from fsatrace.

droundy commented on June 20, 2024

After looking into SIP and dtrace a bit, I wonder if making a copy of the system directories and using chroot might be a better choice than FUSE. You'd only need one copy, and could presumably save it from one invocation to another. It's an ugly hack, but it sounds like Apple is doing its best to prevent the kind of behavior we are hoping to engage in.

from fsatrace.

jacereda commented on June 20, 2024

I have started the FUSE implementation, looks like the performance will be acceptable. Building fsatrace itself takes 0.66 seconds untraced and 0.71 when traced.

This is the way I think it should work:

The FS daemon is launched mounting on top of the source directory.
Build commands are invoked normally, all file operations will go to this 'overlay' filesystem.
The FS daemon exposes resulting operations from a certain PID and all its children at a special path, say, .ops<PID>.
The memory associated with those .ops files will be kept in a circular buffer, so, only the operations for the N most recent top-level processes are available at a certain point.

What previously required invoking the fsatrace program would now just launch the process normally and read back its operations from its .ops file.
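The bookkeeping behind those `.ops<PID>` files could be sketched like this. Slot count, log size, record format, and all names are assumptions for illustration; this is not code from the FUSE prototype.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define SLOTS 4         /* N most recent top-level processes (assumed value) */
#define LOG_BYTES 1024  /* per-process log size (assumed value) */

struct ops_log {
    int pid;               /* top-level PID, 0 if the slot is unused */
    size_t used;
    char text[LOG_BYTES];  /* what reading .ops<PID> would return */
};

struct ops_ring {
    struct ops_log slot[SLOTS];
    unsigned next;         /* slot to recycle for the next new PID */
};

/* Returns the log for pid, recycling the oldest slot for an unseen PID. */
struct ops_log *ops_get(struct ops_ring *r, int pid) {
    assert(r != NULL && pid > 0);
    for (int i = 0; i < SLOTS; i++)
        if (r->slot[i].pid == pid)
            return &r->slot[i];
    struct ops_log *l = &r->slot[r->next];
    r->next = (r->next + 1) % SLOTS;
    l->pid = pid;
    l->used = 0;
    l->text[0] = '\0';
    return l;
}

/* Appends "<op>|<path>\n" to the log of pid. Returns 0, or -1 when full. */
int ops_record(struct ops_ring *r, int pid, char op, const char *path) {
    struct ops_log *l = ops_get(r, pid);
    size_t room = LOG_BYTES - l->used;
    int n = snprintf(l->text + l->used, room, "%c|%s\n", op, path);
    if (n < 0 || (size_t)n >= room) {
        l->text[l->used] = '\0';  /* undo the partial write */
        return -1;
    }
    l->used += (size_t)n;
    return 0;
}

/* Returns the recorded text for pid, or NULL if evicted or never seen. */
const char *ops_read(const struct ops_ring *r, int pid) {
    for (int i = 0; i < SLOTS; i++)
        if (pid > 0 && r->slot[i].pid == pid)
            return r->slot[i].text;
    return NULL;
}
```

With four slots, recording operations for five different top-level PIDs evicts the first one, so a build tool would have to read each `.ops` file back before too many later jobs have started.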

Do you think we'll need to track accesses out of the sources directory?

Does this sound reasonable?

from fsatrace.

jacereda commented on June 20, 2024

@droundy the problem is that I'm afraid Apple isn't the only system that will implement policies like those at some point, so I think going for a general solution (and FUSE is) should be better in the long run. I certainly prefer a FUSE-based solution to having two copies of all the system binaries floating around. Besides, at some point they might decide they also want to enable SIP for, say, /Applications/Xcode.app and then we'd also need to replicate that.

from fsatrace.

droundy commented on June 20, 2024

There are a couple of problems with FUSE, although it is what tup uses, so
obviously it is possible to use it.

The first is permissions. Mounting a FUSE filesystem requires special
permissions. Typically on Linux this involves the user being in a fuse
group, which is checked by an suid root binary. This gives several
challenging failure modes. Obviously it means your user needs to be in that
group. Maybe on Mac OS that is always guaranteed? Secondly, suid root
binaries need to be permitted. For me this was problematic because I use
NFS with rootsquash, which meant that any directory that was not world
readable could not be used with tup. But the error messages were of
course far from clear.

It is definitely nicer to be able to track all accesses, but that is not a
show stopper, since users can run clean if they update their compiler or
install a library.

The final question is whether it is possible to even define a "top level
process ID" as you propose, let alone discover which one corresponds to a
given process. I suppose you can define a top level process as a child of
your main process?


from fsatrace.

jacereda commented on June 20, 2024

I think the FUSE approach is starting to look quite promising.

On Mac OS the experience is smoother. No 'fuse' group. No suid executables. I'll try to test on linux at some point, maybe there're ways to relax those requirements.

Finding the top-level pid would be something like:

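static int
calc_toplevel_pid(int pid) {
    /* s_root is the PID of the root process (the build tool). */
    int ppid = calc_ppid(pid);
    if (ppid == s_root)
        return pid;                  /* direct child of the root process */
    if (ppid <= 1)
        return 1;                    /* reached init without meeting s_root */
    return calc_toplevel_pid(ppid);  /* climb one level up the process tree */
}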

So, the problem is now telling the FS who is the root process (the build tool). It could be the process who invoked the FS daemon, whatever you tell it via command line, or whatever you write to a special file entry (say, /.root).

I'm trying to decide whether to let it run as a daemon continuously or invoke/kill it explicitly when the build starts/stops. What do you think?

Also, no code is shared with fsatrace/bigbro, so I'm trying to figure out a good name for the new project. Any suggestion?

from fsatrace.

droundy commented on June 20, 2024

I think you would want to invoke and kill the fuse when the build starts or
stops. This would require a new API, since most builders will want to run
multiple jobs simultaneously. Tup's approach is to put the fuse mount in a
special subdirectory, so there is one mount per job that is running (thus
avoiding the need to track top-level PIDs), but that breaks quite a number
of build tools, so it's not optimal. Tup gets around that by using chroot
if tup itself is suid root, but of course that is another security rabbit
hole.

I would certainly not recommend using fuse on linux. I'm pretty certain
that there is no way around the security issues.

I'm not sure whether we'll end up with one cross-platform library or not,
but adding yet another project for another platform seems a bit silly. I
would just add the code to fsatrace if I were you. Shared code isn't
particularly important, way less important than a shared API. I've been
looking at adding some fsatrace code (with appropriate copyright headers)
into bigbro, to start the port to windows. So far it can run a process
sans tracing, which isn't much, but is something.


from fsatrace.

jacereda commented on June 20, 2024

The problem is that embedding it in fsatrace is just too much work. It would require implementing the C API and I don't see a clear benefit. The FUSE approach could work without any API at all, since you only need read()/write() to communicate with it.

AFAIK it would even work on Windows via https://github.com/dokan-dev/dokany and would be far more robust across platforms.

from fsatrace.

jacereda commented on June 20, 2024

I've set up a new repo at https://github.com/jacereda/traced-fs

Should compile on Linux and Mac OS so far.

from fsatrace.

ndmitchell commented on June 20, 2024

To somewhat echo @droundy's point, I don't really care what the underlying mechanism is, but I do want something that presents the same interface on all platforms (so I can use it abstractly while developing the upstream code on only one platform). My experience with fsatrace on Linux is that it requires privileges that mean it can't be tested on Travis, which for me means I can't effectively test it. My guess is that on Windows such mechanisms are pretty scary, because usually when such Linux things are shoehorned in they tend to be, but I trust other people to make the call here.

from fsatrace.

jacereda commented on June 20, 2024

Looks like fuse filesystems can be tested on Travis:

https://github.com/mpl/camlistore/blob/master/.travis.yml

As for Windows, I'm trying to set up a VM to figure out how it goes there.

from fsatrace.

droundy commented on June 20, 2024

A quick look through the dokany documentation suggests that you can't mount
a dokany file system at arbitrary mount points, like you can with fuse,
which seems likely to be highly problematic. But maybe there is a way
around that.

from fsatrace.

jacereda commented on June 20, 2024

I don't think it would be problematic in my scenarios. To trace an operation, traced-fs would mount a T: drive that mirrors C: and the build system would need to switch to that drive prior to launching the commands.
It could certainly be a bit more painful if the build requires files from different drives.
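
A rough sketch of what "switching to that drive" could mean for a build system: rewrite the drive letter of each absolute path before launching a command. The helper name remap_drive and its signature are hypothetical, not part of traced-fs or dokany.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy path into out with its drive letter replaced
 * by traced_drive, e.g. C:\src\main.c becomes T:\src\main.c.
 * Returns 0 on success, -1 if path has no drive prefix or out is too small. */
static int remap_drive(char *out, size_t size, const char *path, char traced_drive) {
    int n;
    assert(out != NULL && path != NULL);
    if (strlen(path) < 2 || !isalpha((unsigned char)path[0]) || path[1] != ':')
        return -1;
    /* Keep everything after the drive letter, including the colon. */
    n = snprintf(out, size, "%c%s", traced_drive, path + 1);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

Paths on other drives would need one mirror each, which is the extra pain mentioned above.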

from fsatrace.

droundy commented on June 20, 2024

I see. The only issue I see with that is that it is likely to break
debugging tools that record the path to the source code files. You could,
of course, keep the T: drive semi-permanently mounted, but that seems like
it could be a bit more of a pain.

from fsatrace.

jacereda commented on June 20, 2024

Well, maybe having it permanently mounted is desirable keeping in mind that mounting and unmounting will probably hurt caching.

from fsatrace.

jacereda commented on June 20, 2024

@droundy Could you make a quick test with your NFS setup? Something like this would suffice:

make
./fs &
ls -l traced/<absolute-path-to-some-file-in-your-nfs-volume> &
cat traced/.ops/$!
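
The same check could be done from C with nothing but ordinary file I/O. The sketch below assumes only what the commands above show, namely that the operations of a process appear under traced/.ops/<pid>. The function names are hypothetical and the content of the file is copied verbatim, since its format is not specified here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build "<mount>/.ops/<pid>" into buf.
 * Returns 0 on success, -1 if mount is empty or the path does not fit. */
static int ops_path(char *buf, size_t size, const char *mount, long pid) {
    int n;
    assert(buf != NULL && mount != NULL);
    if (strlen(mount) == 0)
        return -1; /* an empty mount would silently produce /.ops/<pid> */
    n = snprintf(buf, size, "%s/.ops/%ld", mount, pid);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/* Copy the recorded operations of pid to out, like the cat above.
 * Returns 0 on success, -1 if the entry cannot be opened. */
static int dump_ops(const char *mount, long pid, FILE *out) {
    char path[4096];
    char chunk[4096];
    size_t got;
    FILE *in;

    if (ops_path(path, sizeof path, mount, pid) != 0)
        return -1;
    in = fopen(path, "rb");
    if (in == NULL)
        return -1;
    while ((got = fread(chunk, 1, sizeof chunk, in)) > 0)
        fwrite(chunk, 1, got, out);
    fclose(in);
    return 0;
}
```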

from fsatrace.

droundy commented on June 20, 2024

$ cat traced/opt/$!
cat: traced/opt/23228: No such file or directory

On Fri, Jun 3, 2016 at 11:18 AM jacereda wrote:

@droundy Could you make a quick test with
your NFS setup? Something like this would suffice:

make
./fs &
ls -l traced/
ls -l traced/.ops &
cat traced/.ops/$!

from fsatrace.

jacereda commented on June 20, 2024

Sorry, I edited the command sequence afterwards, can you recheck?

from fsatrace.

droundy commented on June 20, 2024

Still getting no such file or directory.

from fsatrace.

jacereda commented on June 20, 2024

Try this:

killall fs
./fs &
ls -l traced/<absolute-path-to-some-file-in-your-nfs-volume> &
cat traced/.ops/$!

The killall will fail if some process is running inside the traced directory, so make sure you don't have a bash running inside.
Note that it's .ops, not opt. Also, make sure the ls is executed in the background (&).

from fsatrace.

droundy commented on June 20, 2024

bennet:traced-fs$ killall fs
bennet:traced-fs$ ./fs &
[2] 11067
[1] Done ./fs
bennet:traced-fs$ ls -l traced/home/droundy/.tmp/traced-fs/ &
[3] 11071
bennet:traced-fs$ total 100
-rwxr-xr-x 1 droundy users 24168 Jun 3 16:22 fs
-rw-r--r-- 1 droundy users 23377 Jun 3 16:21 fs.c
-rwxr-xr-x 1 droundy users 44920 Jun 3 16:22 fsd
-rw-r--r-- 1 droundy users 209 Jun 3 16:21 Makefile
drwxr-xr-x 24 root root 4096 May 24 10:35 traced

from fsatrace.

jacereda commented on June 20, 2024

Good, seems to work properly. Thanks.

from fsatrace.

jacereda commented on June 20, 2024

After fixing a bug in utimens handling, I can trace a stack build. That was failing miserably with fsatrace.

from fsatrace.

jacereda commented on June 20, 2024

If this happens at some point I guess it could be an alternative to dokany:

https://wpdev.uservoice.com/forums/266908-command-prompt-console-bash-on-ubuntu-on-windo/suggestions/13522845-add-fuse-filesystem-in-userspace-support-in-wsl

from fsatrace.

jacereda commented on June 20, 2024

I've been reconsidering the traced-fs implementation for Windows. I have a prototype using dokany but I think it would be just easier and more stable to write a minifilter driver.

Having to install a driver sucks, but dokany also installs a driver, so it would be at the same "suckiness" level.

from fsatrace.

I agree that the order of operations is potentially important. I don't
think it's that big a deal to the API though - just have struct entry
{int mode; char *data;} and have a single struct entry ** pile for all the
things that changed.

Knowing the difference between what happens on stdout and stderr is
quite important. Why can't you just inherit stdout and stderr? Then people
can redirect stderr/out already if they care?

Environment variables are important, but just an extra argument,
nothing severe.

Blocking is very useful for me, but I guess a separate API, which
takes a function which gets given an entry instead of a list of entry?
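
A minimal sketch of how those two shapes could fit together, assuming a NULL-terminated pile of entries and a callback that receives one entry at a time. The mode constants and the names entry_fn, entries_each and count_writes are hypothetical; nothing here is an existing fsatrace or bigbro API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical operation kinds; the real set would come from the tracer. */
enum { ENTRY_READ, ENTRY_WRITE, ENTRY_DELETE };

/* One traced operation, in the order it happened.
 * const is used here so string literals can be stored safely. */
struct entry {
    int mode;
    const char *data;
};

/* Callback form: called once per entry; a nonzero return stops delivery. */
typedef int (*entry_fn)(const struct entry *e, void *ctx);

/* List form layered on the callback form: walk a NULL-terminated pile
 * in order and return how many entries were delivered. */
static size_t entries_each(struct entry **pile, entry_fn fn, void *ctx) {
    size_t n = 0;
    assert(pile != NULL && fn != NULL);
    while (pile[n] != NULL) {
        int stop = fn(pile[n], ctx);
        n++;
        if (stop)
            break;
    }
    return n;
}

/* Example callback: count write operations into a size_t behind ctx. */
static int count_writes(const struct entry *e, void *ctx) {
    if (e->mode == ENTRY_WRITE)
        ++*(size_t *)ctx;
    return 0;
}
```

Keeping a single ordered pile preserves the temporal ordering, and a bucketed view such as the set of written files can be derived from it with a callback like count_writes.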
