
jkominek / fdbfs

12 stars · 3 watchers · 1 fork · 298 KB

A not-yet-ready-for-use FoundationDB-backed FUSE filesystem. Seriously, don't use it.

License: ISC License

Python 2.04% C++ 94.82% C 1.72% Shell 0.69% Meson 0.72%
fuse fuse-filesystem foundationdb does-not-work full-of-lies

fdbfs's People

Contributors: jkominek

Stargazers: 12

Watchers: 3

Forkers: iamfork

fdbfs's Issues

Convert to nlohmann/json 3.x

We're on 2.1.1, which is probably fine for some time; I haven't actually read what's changed, I just know we don't build with the latest version. It probably isn't too hard to fix, since we're not doing anything very exciting.

statfs.cc and the build instructions in the GitHub Action are the places to touch.

Liveness-aware garbage collection

The garbage collector should make use of live process information from the liveness management system, to ensure it doesn't collect anything early.

Increase complexity of FDB configuration

Install a configuration file which spawns multiple/many FDB processes, each taking on different roles. The goal wouldn't be to improve or even alter performance, but rather to increase the opportunities for concurrency in FDB's internals. Hopefully that will expose us to a wider variety of FDB behaviors.

Serialize our own operations on inodes?

Currently it is possible that we would receive, and dispatch to FDB, requests which we could predict will cause transaction conflicts: multiple updates to the same inode, or writes to the same region of a file.

We should consider implementing a locking scheme on our side, whereby we maintain inode-level locks to serialize these operations.

The necessary data structure wouldn't be totally trivial; we'd need some sort of weakly held map from inodes to dynamically created locks, itself guarded by a lock, so that competing threads don't both try to create the lock for a given inode at the same time. (Remember to release the lock on the outer structure before attempting to take the lock you really want.)
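The weakly held map could be sketched roughly like this (a minimal sketch — the class and names are illustrative, not from the fdbfs codebase):

```cpp
#include <cstdint>
#include <map>
#include <memory>
#include <mutex>

// Weakly held inode -> lock map. The outer mutex guards only the map
// itself and is released (lock_guard scope ends) before the caller
// takes the per-inode lock it actually wants.
class InodeLockTable {
public:
  std::shared_ptr<std::mutex> lock_for(uint64_t inode) {
    std::lock_guard<std::mutex> g(table_mutex_);
    auto &slot = table_[inode];
    if (auto existing = slot.lock())
      return existing;                    // lock already live, reuse it
    auto fresh = std::make_shared<std::mutex>();
    slot = fresh;                         // weak ref: dies with last user
    return fresh;
  }

private:
  std::mutex table_mutex_;
  std::map<uint64_t, std::weak_ptr<std::mutex>> table_;
};
```

Returning a `shared_ptr` is what makes the map "weakly held": once every in-flight operation on an inode drops its reference, the lock is destroyed, and the stale `weak_ptr` entry is simply replaced on the next access.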

Include SQLite in the test suite

By "SQLite" I mean:

  1. Compile sqlite3 in a directory on fdbfs
  2. Run the sqlite3 test suite in there.

That should represent a lot of real-world loads nicely.

Maybe there are some other disk-interaction-heavy packages with sizable test suites that we could incorporate?

Perform our own permissions checking

Right now we have to farm out permissions checking to the kernel with the default_permissions option. That works, but allows for... "permissions skew", where the kernel may use cached permissions to determine whether or not some operation is allowed. So we could see:

  1. system A reads the inode for permissions (read-only transaction)
  2. system B changes the permissions on the inode (read-write)
  3. system A uses those permissions to perform an operation on the inode (read-write)

Now, is that the end of the world? No, I think local filesystems probably don't guarantee that can't happen. But their time bounds on preventing it are probably muuuuch tighter than ours. In bad situations ours might be long enough to be human-perceivable and weird.

It shouldn't really be any more expensive to do this. I believe I've added reads for the inode in all the places where permissions would need to be checked, even if we don't actually use the retrieved value (and it is used in almost all cases). So we're already paying the price; it's just a matter of implementing the permission checking function and calling it inside our transactions.

This is marked very hard because you've got to know the subtleties of POSIX permissions checking.
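As a starting point, the core of the permission check is the classic mode-bit test. This sketch deliberately ignores the hard parts the issue alludes to — supplementary groups, ACLs, capabilities, and sticky/setgid subtleties — which a real implementation would have to handle:

```cpp
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

// Classic POSIX mode-bit check (sketch only). `req` is a mask of
// R_OK/W_OK/X_OK, whose values (4/2/1) line up with the rwx bits.
bool check_perms(mode_t mode, uid_t f_uid, gid_t f_gid,
                 uid_t uid, gid_t gid, int req) {
  if (uid == 0)  // root passes everything; exec still needs some x bit
    return (req & X_OK) == 0 || (mode & (S_IXUSR | S_IXGRP | S_IXOTH));
  int shift = (uid == f_uid) ? 6 : (gid == f_gid) ? 3 : 0;
  int bits = (mode >> shift) & 7;  // rwx triplet for this class
  return (req & ~bits) == 0;       // every requested bit must be granted
}
```

Since the inode (and so its mode/uid/gid) is already read inside the transaction, calling something like this adds no extra round trips.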

cmake

Switch to building with cmake.

Keep an eye out for Windows compatibility, if that's a concern, as we want to be able to build for Dokan as well.

Liveness management

Correct garbage collection and lock breaking require "liveness management", whereby every process registers an ID used for marking inodes as in-use and for holding locks. Every process increments a counter and sets a last-updated time at a constant frequency. Every process watches the PID table and, on spotting a process that isn't updating its entry, "pings" it. Failure to respond to the ping within a significant multiple of the update frequency means the pinged process is dead, and other processes may remove the dead process's entry from the PID table, which will kill it.
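The deadness decision could be isolated in a small pure function, something like the sketch below (the constants are assumptions for illustration, not fdbfs's actual values):

```cpp
#include <cstdint>

// Illustrative liveness constants (assumed, not fdbfs's real values).
constexpr uint64_t kUpdatePeriodMs = 1000;    // heartbeat frequency
constexpr uint64_t kPingTimeoutMultiple = 10; // "significant multiple"

// A process may be declared dead once it has neither updated its
// PID-table entry nor answered a ping for kPingTimeoutMultiple
// update periods.
bool may_declare_dead(uint64_t now_ms, uint64_t last_update_ms,
                      bool answered_ping) {
  uint64_t silence_ms = now_ms - last_update_ms;
  return !answered_ping &&
         silence_ms > kPingTimeoutMultiple * kUpdatePeriodMs;
}
```

Keeping this as a pure function of timestamps makes the rule easy to test and to tune independently of the FDB plumbing that reads the PID table.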

Convert away from Travis CI

To something/anything free. Or something which can securely store some AWS credentials and fire stuff up on AWS. Wouldn't mind paying a little bit in EC2 costs, just don't want to have to build/maintain significant infrastructure.

Macros for supporting static analysis

I think I'd like to have some macros like:

  • INODE_KEY
  • FILEDATA_KEY
  • DIRENT_KEY
  • ???

which just pass through their contents:

#define INODE_KEY(x, descriptor) (x)

and some helper macros:

#define TARGET 0
#define PARENT 1
???

which are used to mark all of the keys provided to fdb_transaction_* functions:

fdb_transaction_get(transaction, DIRENT_KEY(key, PARENT).data(), key.size(), ...)
fdb_transaction_set(transaction, INODE_KEY(key, TARGET).data(), key.size(), ...)

so that we can quickly and accurately identify what KV pairs are read/written to by any given operation. If we start generating "synthetic" conflict keys for the inodes, we'll need this to help with correct reasoning.

Implement filesystem lock operations

Add support for the various FUSE locking operations. Double-check them against the Dokan and Samba lock operations to make sure whatever KV layout we use supports the locking requirements of our major targets.

Conflict range "cleanup"

When reading file blocks, the ..._get_range call should be a snapshot read; we don't have any obligation to return a specific version of the blocks, and we don't want to conflict with writes. (Though we might still end up conflicting on the inode, if we're updating access times.) At a minimum, the snapshot read produces one fewer conflict range to send and check. (Tiny optimization? Yes.)

Similarly, when performing a write, we should produce a single write conflict range covering all possibly affected blocks, instead of the maybe three conflicts we'd currently produce (start block, middle range clear, stop block). Again, a tiny optimization: slightly fewer ranges to send over the wire, and fewer for the resolver to check.
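Collapsing a write into one covering range is just block arithmetic; a sketch, assuming a constant block size (the real value would come from our KV layout), with the result destined for something like fdb_transaction_add_conflict_range with a write-type range:

```cpp
#include <cstdint>
#include <utility>

constexpr uint64_t kBlockSize = 8192;  // assumed, for illustration

// Collapse a write at [offset, offset + len) into one half-open block
// range [first_block, last_block + 1), suitable for registering as a
// single write conflict range. Assumes len > 0.
std::pair<uint64_t, uint64_t> covering_block_range(uint64_t offset,
                                                   uint64_t len) {
  uint64_t first = offset / kBlockSize;
  uint64_t last = (offset + len - 1) / kBlockSize;
  return {first, last + 1};
}
```

The half-open convention matches FDB's begin-inclusive/end-exclusive ranges, so the block numbers map directly onto the begin/end keys of the conflict range.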

file extended attribute encodings

Our KV layout gives us a lot of flexibility for encoding the inode data blocks: we've got a small amount of arbitrary data tacked onto the ends of the keys, so that we can encode compression or parity information.

It'd be nice to get the same thing going for xattr data as well. Linux VFS allows attribute values up to 64 KiB, which is definitely a large enough lump of data to be worth compressing.

No immediate ideas for how to make that change.

Provide reusable interface to the filesystem

As it stands, the code ties the filesystem to FUSE. Which sort of makes sense.

But it would be useful to be able to reuse the same FDB code to produce a Samba VFS, a stand-alone library, and other things.

Consider sooner, rather than later, how to pull the FUSE-specifics out of the FDB code.
