googlecloudplatform / gcsfuse
A user-space file system for interacting with Google Cloud Storage
Home Page: https://cloud.google.com/storage/docs/gcs-fuse
License: Apache License 2.0
We should try out the integration tests for all of the following packages on a GCE instance in the US: things might work differently under different network latency and listing latency conditions.
docs/semantics.md should include a section about how to safely modify mmap'd files, such that the modifications are made durable (and the user sees an error otherwise).
There is a novel written about this in the documentation for fuseops.FlushFileOp. Summary: if the user wants to have this work on both OS X and Linux, they should call msync with the MS_SYNC flag, checking for errors on all calls.
This can probably be relaxed in various ways on Linux and/or OS X, but I'm relatively confident this particular dance works (and jacobsa/fuse contains a test for it).
Allow configuration of the temporary directory used for storing all temporary files. This lets users use a partition with more space, an SSD, etc.
Audit all uses of TempFile, AnonymousFile, and TempDir. Plumb in the setting.
When a bucket is mounted, the ownership of the mounted folder is: owner root, group root. But because the access permissions on the folder are 700, our application can't access this folder. Can it be fixed? Our application works under a different unprivileged user (not with root privileges). If we try to do chmod or chown, we get a "function is not implemented" error.
The tightening of tests in #30 revealed another problem that has been on my radar but which I haven't actually written down yet. Say the objects in a bucket are:
Then when we get a readdir op from fuse for the root inode, we call GCS's Objects.list method with a delimiter of "/" and a prefix of "". That returns an object named "foo" and a collapsed run named "bar/", so we return directory entries for a file named "foo" and for a directory named "bar".
The problem is that, unless the user is running with --implicit_dirs (see semantics.md), there is no accessible directory named "bar". When the kernel comes back to look up such an inode, it will receive ENOENT.
On OS X, when ls encounters this situation it prints an easy-to-miss error and then ignores the entry (which is why the problem hasn't been in my face):
% ls -l mp
ls: bar: No such file or directory
total 0
-rwx------ 1 jacobsa eng 0 Apr 1 09:42 foo
On Linux it's a bit more obvious:
% ls -l mp
ls: cannot access mp/bar: No such file or directory
total 0
d????????? ? ? ? ? ? bar
-rwx------ 1 jacobsa eng 0 Apr 1 09:42 foo
As seen in #22, enabling fuse "big writes" ups the kernel -> gcsfuse write atom from 4 KiB to 32 KiB, which makes a performance difference.
Things to do:
InitResponse.MaxWrite didn't appear to work.

If you start a very large copy into a mounted bucket with cp, kill it, then attempt to Ctrl-C the gcsfuse process, it will refuse to unmount if it made it to the Flush stage, because the request is still in progress. You simply have to wait for it to finish, or take more drastic measures.
However, on OS X and Linux an Interrupt request does come through. We just respond ENOSYS to it. So we need to plumb through support for cancelling the associated context when this is received, and set up the GCS package to pay attention to cancelled contexts (probably using http.Transport.CancelRequest). Test the latter by starting large uploads and downloads, cancelling them, and timing the duration until the bucket call returns in error.
Update the documentation about best practices for mmap in light of jacobsa/fuse#8. (Wait for the fuse package documentation to be updated, then translate that here.)
This probably involves setting a mode bit in some object metadata value saying "this is a symlink", maybe with a key called gcsfuse_mode
or similar. That's a bit unfortunate, because we haven't had to touch the metadata elsewhere yet.
@marcgel reports the following permissions-related weirdnesses:
Directory permissions of the mount point show up with question marks:
d????????? ? ? ? ? ? test
Need to use sudo to list the directory. (Probably related to the first issue.)
Reproduce each of these interactively (maybe Linux is required), add failing tests, and fix.
It looks like it is possible to re-use credentials built into the GCE instance when running on GCE—see this example. This would be a nice feature.
The primary subtlety is probably the scope of the credentials. Can we introspect that, or do we simply need to print a helpful error when a request fails with an HTTP 403 or whatever?
As mentioned at the end of #28, there are problems with ls -l
on an empty directory:
jacobsa@jacobsa-macpro:~/tmp% ls -l mp/
total 0
drwx------ 1 jacobsa eng 0 Aug 31 1754 foo
-rwx------ 1 jacobsa eng 0 Mar 31 16:28 foo?
jacobsa@jacobsa-macpro:~/tmp% ls -l mp/foo/
ls: foo: No such file or directory
jacobsa@jacobsa-macpro:~/tmp% ls -l mp/foo/
ls: foo: No such file or directory
I believe this is because this code doesn't filter out itself from the prefix-based results returned from GCS. The equivalent code used to, but must have regressed at some point during refactoring.
The reason this isn't reproduced in the integration tests is that ioutil.ReadDir ignores ENOENT when statting the names it reads from the directory, silently filtering out such entries.
So, update the integration tests to use a pickier version of ioutil.ReadDir, then fix the bug.
As of the changes in d054a1a, every operation happens on a single goroutine, blocking further fuse operations. This is good because it fixes the race discussed in jacobsa/fuse#3, but is certainly not optimal.
We should decide which types of operations need to be serialized (for example, WriteFile/SyncFile/FlushFile for a particular inode) and build queues for those, with a goroutine spawned when the queue goes from empty to non-empty. All other operations can simply receive their own goroutine. Optimization: things that don't block don't need a goroutine.
When mounting a bucket with gcsfuse, if the option tmp_dir is set to a non-existent folder, the bucket is still mounted, but it raises an I/O error when trying to copy files and all files end up with a size of 0.
Right now, gcsfuse caches nothing and allows the kernel to cache nothing. This is in order to support the consistency guarantees documented in semantics.md. But it makes things slow, particularly when the kernel is doing path resolution (which is very frequent).
There is probably room for a --go_fast
flag that users can enable if they are okay with relaxed guarantees. I would start with just allowing the kernel to cache attributes and entries, and see if anything else is truly needed. If so, the following additional things may or may not be helpful (measure first to find out!):
A cache mapping name to the gcs.Object record for it, perhaps with a TTL. Probably also supports negative entries.

Steps for repro:
Expected JSON API to reflect the size.
We should do some performance testing to identify completely obvious bottlenecks.
Use time cp to measure wall time taken to copy from local disk to GCS. Compare with gsutil cp. (Both can be measured here by just using time gsutil cp, I think.)

Both DirInode.Attributes and FileInode.Attributes stat the object in GCS just to find out what they should set the Nlink field to. This means that the many slow Getattr requests done by ls -l (see issue #39) are only for the sake of Nlink.
Assuming no one cares about Nlink for anything important, we can get rid of this and probably speed up ls -l significantly without any cost to consistency guarantees. But we need to check what Nlink is used for by the kernel, and whether it traditionally matters in userspace.
After fixing jacobsa/fuse#3, gcsfuse no longer builds. Fix it. Don't worry about parallelism for now; leave a TODO.
A user requests read-only mounting, which seems like a reasonable thing to support. Package bazilfuse has a read-only mount option, so this should be easy.
Add a --read_only flag to gcsfuse.

Figure out what is typical for fuse file system mount tools in terms of running in the foreground/background and logging to stderr/log files/both. Make it happen, and document how it works in the readme.
If there is no typical behavior, choose what seems like a good behavior and document it.
After the shakeup caused by switching fuse packages, the "foreign modifications test" now passes but the "read/write test" doesn't. Add features and/or update tests until it does, on both OS X and Linux.
I happened to mount a bucket containing the detritus left over from the InterestingNames test case in the jacobsa/gcloud package. ls did not like it.
We should have a similar test that covers the same cases for file and directory names, in order to discover what the OS chokes on. If nothing, we should recurse into why ls chokes. In either case, also check the POSIX standard.
For each guarantee, make sure there is a passing test. If there's not, file an issue. Replace the "make it not aspirational" TODO with a list of these known issues.
DirInode has become quite complicated. I need to keep myself honest by adding unit tests, covering implicit dirs on and off, using a mock bucket.
This will allow the GCS team to measure usage, more easily distinguish from abusive clients, etc.
I hear a second-hand report that doing ls
on a directory with a few hundred files takes multiple minutes, from GCE. This is probably due to ls
doing a ton of stats plus our consistency guarantees, so plays into #29 (adding a "go fast" mode with caching and reduced guarantees). Investigate and make sure.
GCS is quite slow and the kernel is quite chatty, so the cost of the consistency guarantees in semantics.md is very high. We probably want to turn on "fast mode" using caching by default. Sigh.
To do:
Another feature we think would be great is to have some sort of logging option for gcsfuse for ease of administration. As an example in the log:
In addition, it would be great to have a full debugging option.
The current design is terrible for the use case of a handful of small random reads within a very large file (e.g. hundreds of 20-byte reads within a 100 GiB file). I'm told that this use case may be important for e.g. genomics databases.
The obvious fix here is to read only the portions of the GCS object requested, on demand or with some cache. Many subtleties lurk though. If we do something about this, we probably want to start with minimal complexity:
Support O_RDONLY file handles only. Supporting writing is a whole other can of worms.

This is not currently implemented, and thus blocking #3.
We'll first need support in jacobsa/fuse. Relevant bazilfuse request structs: bazilfuse.FlushRequest and bazilfuse.FsyncRequest.
User report:
When copying files with cp to a mounted bucket, all other commands (e.g. df -h) hang until the file is entirely copied.
This is probably due to #23; confirm when that is fixed. This probably should not require setting GOMAXPROCS
greater than one, assuming the Go scheduler doesn't starve goroutines. I could be wrong.
Currently we destroy an inode's temporary files only when the kernel tells us to forget it. I believe the kernel does this only when it is running out of space in its inode cache, at least if the file hasn't been unlinked. In contrast, one temporary file per inode can grow to a lot of disk space, and we may want to clean up earlier.
Consider setting a (configurable) limit on the amount of disk space devoted to temporary files. The limit may be exceeded if we have dirty inodes that add up to more than the limit. But if we are over the limit and have any clean inodes, we will throw away their content (in least recently used order?) until we are under the limit or run out of clean inodes.
How to plumb this in? Some sort of central object used by the object proxies. They use a "grab file if still exists" method when clean, and a "register dirty file" and "unregister dirty file" methods when transitioning into and out of the dirty state. Something like that.
a) If we do Ctrl-C during copying of a file into a bucket (interrupting the cp command), then copying is not immediately cancelled (cp still copies the file). Our guess is that when we Ctrl-C cp, gcsfuse starts copying the data it has already put into gcsproxy.temp_dir into the bucket. So it seems that Ctrl-C interrupts copying of files into gcsproxy.temp_dir, but the next step of copying (from gcsproxy.temp_dir into the bucket) stays uninterrupted.
b) The example above leaves gcsproxy.temp_dir uncleaned, unlike un-interrupted copying (i.e. when we didn't try to stop the process with Ctrl-C).
Hello Aaron,
It is really great that you have added this option! It is exactly what is needed. Yet, still some bugs sneak around:
When we tried to use -o allow_root, we got the following message:
2015/05/19 09:28:22.936398 Mount:bazilfuse.Mount: fusermount: "fusermount: option allow_root only allowed if 'user_allow_other' is set in /etc/fuse.conf\n", exit status 1
When we set the option user_allow_other in /etc/fuse.conf, gcsfuse complained about a wrong option -o allow_root:
2015/05/19 09:29:33.606858 Mount:bazilfuse.Mount: fusermount: "fusermount: mount failed: Invalid argument\n", exit status 1
We also tried -o=allow_root (just in case it was a syntax issue), still with no success.
When we removed -o allow_root, the bucket connected correctly.
For testing we used fuse-2.9.3-4.fc21.x86_64
Do you know what might cause this?
On Linux, touch
complains about a setattr call:
jacobsa@fourier:~/tmp% touch mp/foobar
touch: setting times of ‘mp/foobar’: Function not implemented
This appears to be because of an attempt to set times:
fuse: 2015/04/02 15:59:45 Received: Setattr [ID=0x39 Node=0xd Uid=83333 Gid=5000 Pid=30939] atime=2015-04-02 15:59:45.180338035 +1100 AEDT atime=now mtime=2015-04-02 15:59:45.180338035 +1100 AEDT mtime=now handle=INVALID-0x0
fuse: 2015/04/02 15:59:45 Responding with error to *fuseops.SetInodeAttributesOp: function not implemented
We don't actually support arbitrary modification times, and don't support atime at all.
To do:
Report modification times (perhaps from the object's Updated field).

We have a scenario where we need to load GBs of data to ~100 VMs at the same time. Having 100+ MB/s throughput to each VM would make the E2E workload very fast, and would allow us to switch to gcsfuse!
Thank you!
I am using Google Compute Engine running a VM (default Debian image). I have installed Go as well as gcsfuse (from git head) based on documentation from: https://github.com/GoogleCloudPlatform/gcsfuse
I can mount a directory:
$ gcsfuse --key_file key.json --bucket my-test-bucket --mount_point /data --fuse.debug
2015/05/19 23:02:56.808281 Initializing GCS connection.
2015/05/19 23:02:56.808475 Opening a bazilfuse connection.
2015/05/19 23:02:56.810162 File system has been successfully mounted.
2015/05/19 23:02:56.810592 Op 0x00000000 connection.go:319] <- Init [ID=0x1 Node=0x0 Uid=0 Gid=0 Pid=0] 7.23 ra=131072 fl=InitAsyncRead+InitPosixLocks+InitAtomicTrunc+InitExportSupport+InitBigWrites+InitDontMask+InitSpliceWrite+InitSpliceMove+InitSpliceRead+InitFlockLocks+InitAutoInvalData+InitDoReaddirplus+InitReaddirplusAuto+InitAsyncDIO+InitWritebackCache+InitNoOpenSupport
2015/05/19 23:02:56.810954 Op 0x00000000 common_op.go:154] -> Init {MaxReadahead:131072 Flags:InitBigWrites MaxWrite:2097152}
But, as soon as I try to list the contents of the mounted directory I get:
$ ls /data
ls: reading directory /data: Input/output error
The debug output is here:
2015/05/19 23:02:59.455564 Op 0x00000001 connection.go:319] <- Getattr [ID=0x2 Node=0x1 Uid=1000 Gid=1000 Pid=22387]
2015/05/19 23:02:59.456040 Op 0x00000001 common_op.go:154] -> Getattr {AttrValid:0 Attr:{Inode:1 Size:0 Blocks:0 Atime:0001-01-01 00:00:00 +0000 UTC Mtime:0001-01-01 00:00:00 +0000 UTC Ctime:0001-01-01 00:00:00 +0000 UTC Crtime:0001-01-01 00:00:00 +0000 UTC Mode:drwxr-xr-x Nlink:1 Uid:1000 Gid:1000 Rdev:0 Flags:0}}
2015/05/19 23:02:59.456466 Op 0x00000002 connection.go:319] <- Getattr [ID=0x3 Node=0x1 Uid=1000 Gid=1000 Pid=22387]
2015/05/19 23:02:59.456924 Op 0x00000002 common_op.go:154] -> Getattr {AttrValid:0 Attr:{Inode:1 Size:0 Blocks:0 Atime:0001-01-01 00:00:00 +0000 UTC Mtime:0001-01-01 00:00:00 +0000 UTC Ctime:0001-01-01 00:00:00 +0000 UTC Crtime:0001-01-01 00:00:00 +0000 UTC Mode:drwxr-xr-x Nlink:1 Uid:1000 Gid:1000 Rdev:0 Flags:0}}
2015/05/19 23:02:59.457124 Op 0x00000003 connection.go:319] <- Open [ID=0x4 Node=0x1 Uid=1000 Gid=1000 Pid=22387] dir=true fl=OpenReadOnly+0x10800
2015/05/19 23:02:59.457501 Op 0x00000003 common_op.go:154] -> Open {Handle:0 Flags:0}
2015/05/19 23:02:59.457627 Op 0x00000004 connection.go:319] <- Read [ID=0x5 Node=0x1 Uid=1000 Gid=1000 Pid=22387] 0x0 4096 @0x0 dir=true
2015/05/19 23:02:59.619914 Op 0x00000004 common_op.go:131] -> (ReadDir(inode=1)) error: readAllEntries: ReadEntries: ListObjects: toObjects: toObject: Unexpected Md5Hash field:
2015/05/19 23:02:59.620324 Op 0x00000005 connection.go:319] <- Release [ID=0x6 Node=0x1 Uid=0 Gid=0 Pid=0] 0x0 fl=OpenReadOnly+0x10800 rfl=0 owner=0x0
2015/05/19 23:02:59.620587 Op 0x00000005 common_op.go:148] -> (ReleaseDirHandle(inode=1)) OK
As discussed in the semantics doc, we require objects to exist for directories as well as files; there is no such thing as an implicit directory. If an object named foo/bar
exists but no object named foo/
exists, then the file system behaves as if foo/bar
does not exist. So if the user mounts a bucket containing only an object named foo/bar
and then does cat foo/bar
, they will get a "file not found" error.
When the user does cat foo/bar
, fuse sends the following requests to gcsfuse:
The fundamental issue is that at the point of the call in (1), gcsfuse can see that an object named foo
doesn't exist and therefore can say "foo" doesn't refer to a file, but needs to decide between telling the kernel that it doesn't exist at all or telling the kernel that it refers to a directory.
The current behavior is that in (1) gcsfuse asks GCS to do a consistent read of the metadata for two objects, foo
and foo/
. If it finds the first it calls "foo" a file, if it finds the second it calls it a directory, and if it finds neither it says it doesn't exist. That's why we require the object foo/
to exist for the directory to appear to exist.
This method works because unlike listing objects by prefix, a read of the metadata for a single object is guaranteed to be fresh.
One alternative is that we implement (1) by asking whether foo exists (as today) and by scanning objects with the prefix foo/, saying that "foo" is a directory if the scan is non-empty. But there are drawbacks here:

- If the user deletes the last object under the prefix, e.g. with rm foo/bar, suddenly it will appear as if the file system is completely empty. This is contrary to expectations, since the user hasn't done rmdir foo.
- If the user does rm foo/bar then touch foo/baz, the second command will fail with a surprising "no such file or directory" error.
- A scan for the prefix foo/ maps down to an unbounded number of requests to GCS, since each response contains a continuation token that must be redeemed to continue scanning, and GCS does only a limited amount of work before bailing out and returning this. This means a single simple path resolution may result in enormous expense.
- Listing is not consistent: say the user does rm foo/bar. As discussed above, now the directory "foo" no longer exists because it was only implicitly defined, so the user gets the surprising behavior of touch foo/baz failing. Except they only get this behavior once the listing catches up. Worse, if they try the experiment several times then it may fail, succeed, fail, succeed, and fail again.

Even if GCS eventually offers list-your-own-writes consistency, negating the last point, the other issues remain.
If users want to mount buckets where they've created object names assuming that implicit directories will work, we can create a "fixup" tool that lists the buckets and creates the appropriate objects for the implicit directories.
The main caveat here is that the tool would itself depend upon listing, so may miss some objects in the bucket for the same reasons discussed above. Another caveat is the need to run such a tool, but the behavior could be built into the gcsfuse binary itself (either as the default when mounting or on an opt-in basis).
Goal: figure out how to set up an fstab entry for gcsfuse, so that buckets can be mounted at startup. Ideally set up such that a particular non-root user owns the mount.
Before calling it alpha, we should relax the "don't depend on this for anything" warnings throughout a bit.
@marcgel points out that if you hit Ctrl-C to kill gcsfuse, which is natural to do when testing it out, you're left in a state where the mount is still active as far as the kernel is concerned but interacting with the directory doesn't work. More importantly, you can't just run the gcsfuse command again, because you can't mount over an existing mount point. Even more importantly, on Linux you have to use sudo
to unmount because umount
refuses normal users.
It would be better if we could avoid this. Modify the binary to set up a SIGINT
handler that prints a message, attempts to unmount, and then exits with a status. Try it out on Linux and see how it works.
In these packages:
I think there are several things that should be turned into GitHub issues.
Currently:

- DirInode.LookUpChild prefers directories over files if there is a conflicting name.
- DirInode.ReadEntries makes no effort to do the same—a conflicting name will show up twice.

We don't have a good test for this, and I think files should probably be preferred over directories instead—they are more "local" in some sense. Action items:

- See what a conflicting name does to ReadDir. What is the ls user experience?

Currently fileSystem contains two indexes over its inodes:

- fileIndex is an index over all file inodes by (name, source generation).
- dirIndex is an index over all directory inodes by name.

It is suspicious that these are different, and indeed this has come around to bite me when implementing rmdir. The FakeGCS.DirectoryTest.Rmdir_OpenedForReading test case crashes in checkInvariants after it creates a directory, deletes it, then creates another with the same name.
Not having the generation number of the directory's backing object in there makes it very hard to reason about race conditions involving the backing object for the directory. We should make this look just like files.
Action items:
- Have NewDirInode accept a *storage.Object just like NewFileInode, and record it for a SourceGeneration method.
- Make lookUpOrCreateDirInode look like lookUpOrCreateFileInode.

This is blocking #3.
Relevant fuse request struct: bazilfuse.RenameRequest
Unfortunately we can't do this atomically. So:
Hello Aaron,
we think it would be great to have an option for running gcsfuse as a linux daemon. Do you think it is feasible?
Anna
Currently we never forget an inode that we created on behalf of the kernel, which means our RAM and disk usage grows without bound. When we receive a ForgetInodeOp
from the fuse package, we should destroy any temporary files for the inode and then throw it away.
This is blocked by jacobsa/fuse#7 (the fuse package doesn't yet actually issue these).
As of 863e4a5, the semantics doc says that we will allow unlinking directories regardless of content.
I did this because it's impossible to delete the placeholder object in GCS if and only if there are no contents in the directory. Short of a transactional change feature in GCS, this will be true no matter what for concurrent changes to the directory across machines, even if consistent listing is eventually added to GCS. We could solve it correctly for a single machine with consistent listing, though.
Is this the least surprising and/or most desirable compromise though? The alternative is to list the directory and delete if the listing is empty. Consequences:
- The user may do rmdir and see it fail with "not empty", due to lack of listing consistency.
- The user may do rmdir and see it succeed, due to lack of listing consistency. This is true today too, but maybe more surprising if rmdir "usually" appears to work as normal.

I don't yet know what this involves. Try it out with GCS as both a source and a sink, and see what the user experience is like.