Comments (4)

SUPERCILEX commented on May 25, 2024

Getting back to the issue at hand, I'm just going to check the path length and chain openat calls for paths longer than 4096 bytes.
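For illustration, a minimal sketch of what that chaining could look like (using the libc crate; this is not fuc's actual code): walk the path in chunks that each stay under PATH_MAX, holding an O_PATH fd so each chunk resolves relative to the previous one.

```rust
use std::ffi::CString;
use std::io;
use std::os::unix::ffi::OsStrExt;
use std::path::Path;

/// Opens a directory whose absolute path may exceed PATH_MAX by chaining
/// openat calls, each on a chunk of components shorter than PATH_MAX.
/// Returns an O_PATH fd for the final directory.
fn open_deep_dir(path: &Path) -> io::Result<libc::c_int> {
    let mut dirfd = libc::AT_FDCWD;
    let mut chunk: Vec<u8> = Vec::new();

    for component in path.components() {
        let bytes = component.as_os_str().as_bytes();
        // Flush the current chunk before it would exceed the limit
        // (+1 for the joining '/', +1 for the NUL terminator).
        if !chunk.is_empty() && chunk.len() + bytes.len() + 2 > libc::PATH_MAX as usize {
            dirfd = openat_chunk(dirfd, &chunk)?;
            chunk.clear();
        }
        if !chunk.is_empty() && chunk.last() != Some(&b'/') {
            chunk.push(b'/');
        }
        chunk.extend_from_slice(bytes);
    }
    if chunk.is_empty() { Ok(dirfd) } else { openat_chunk(dirfd, &chunk) }
}

/// One link in the chain: open `chunk` relative to `dirfd`, then close the
/// intermediate fd since it was only needed to get here.
fn openat_chunk(dirfd: libc::c_int, chunk: &[u8]) -> io::Result<libc::c_int> {
    let c = CString::new(chunk).expect("paths cannot contain NUL");
    let fd = unsafe {
        libc::openat(dirfd, c.as_ptr(), libc::O_PATH | libc::O_DIRECTORY | libc::O_CLOEXEC)
    };
    if dirfd != libc::AT_FDCWD {
        unsafe { libc::close(dirfd) };
    }
    if fd < 0 { Err(io::Error::last_os_error()) } else { Ok(fd) }
}
```

Files and directories at the bottom can then be removed with the *at syscalls relative to the returned fd, and since names within a directory are capped at NAME_MAX (255), no single call ever sees an over-long argument.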

SUPERCILEX commented on May 25, 2024

Is normal rm able to handle this? If I had to guess, the answer is yes because I believe it uses openat.

SUPERCILEX commented on May 25, 2024

The workaround would be to cd into a deep directory and start deletion from there, then work your way up.

I need to think about how to do this without adding overhead because I'm not willing to compromise performance for the majority to solve this edge case.

This problem actually has a much more interesting framing: reducing resource usage. Long paths suck for a number of reasons:

  • Every deletion task carries a path allocation, which means we sit around copying more bytes and using more memory.
  • Every syscall interacting with the path has to actually walk the components, which involves disk reads, though realistically they'll all be in the cache. Either way, repeated path traversal is not efficient.
  • The shape of the directory tree (in this case likely deeper rather than wider) influences task queue usage. Small queues are obviously better.

I noted down my thoughts while trying to explore the space.


The way I originally tried to handle things was to keep FDs to each directory around for use in child openat syscalls (moved away from this in f2439db). Each directory openat then simply uses the name of the dir and the relative FD as its parent. This unfortunately didn't work because you quickly run out of file descriptors due to having too many queued tasks. Ideally one would have a maximum number of tasks queued at a time to limit memory and resource usage, but completing a unit of work potentially spawns more work, i.e. there's no way to make progress without blowing up the work queue along the way. Trying to keep around parent FDs doesn't work because if you go breadth-first, you'll fall over when you reach the lower levels of large fan-out directories, and if you go depth-first, FDs scale as thread count times tree depth, so you're toast when that product gets too high (depth-first execution also requires scheduler support).
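To make the FD-scaling problem concrete, here's a rough sketch of that abandoned shape (the `Task` type and queueing are hypothetical, reconstructed from the description above): every queued task pins its parent directory's fd open, so the number of open fds tracks the queue length rather than any fixed budget.

```rust
use std::ffi::CString;
use std::io;

struct Task {
    parent_fd: libc::c_int, // stays open for as long as this task is queued
    name: CString,          // one component, well under NAME_MAX
}

fn process(task: &Task) -> io::Result<Vec<Task>> {
    // Open this directory relative to its parent: short name, no walk of
    // the full path from the root.
    let fd = unsafe {
        libc::openat(
            task.parent_fd,
            task.name.as_ptr(),
            libc::O_RDONLY | libc::O_DIRECTORY | libc::O_CLOEXEC,
        )
    };
    if fd < 0 {
        return Err(io::Error::last_os_error());
    }
    // Read entries here (fdopendir/readdir), unlinkat(fd, name, 0) each
    // file, and emit a Task { parent_fd: fd, name } per subdirectory. Every
    // queued child keeps `fd` alive, so open fds grow with the queue, which
    // is exactly what blew past the fd limit.
    Ok(Vec::new()) // children elided in this sketch
}
```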

I had an interesting idea in this space that doesn't work unfortunately:

Explore a two-phase deletion approach when queue lengths grow unreasonably

It would be nice to improve the bound on queue length (which is currently just the number of directories in the deletion tree) such that resource utilization is bounded. An idea here is to enter a two-pass mode when the queue length exceeds some threshold, wherein phase 1 only deletes files and ignores directories, while phase 2 goes through and queues up the tasks for the directories that were ignored earlier. This doesn't actually help: for every task you complete, you still need to store something that says a directory task is pending. Once you execute a pending task, it will generate new tasks, so you haven't actually provided any point in time where the task queue could shrink.

One thing I said is that FDs scale with thread count, but I'm actually not sure this is true when unshare-ing the FD table. So maybe we then only scale with the directory tree depth. Here are my ulimits:

$ ulimit -Hn
1048576
$ ulimit -Sn
1024

Obviously 1024 is useless, but 1 million is very respectable, so we could have some code in the CLIs that lifts the soft limit up to the max. Then there's almost no way a depth-first approach would fail. One interesting thought here is to use a thread-local cache of open directories instead of giving one to every deletion task. We'd unfortunately still have to pass around full paths since it's just a cache, and we'd need a special data structure that supports tree-based hashing to make retrieval efficient, which doesn't sound worth it.
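Lifting the soft limit is straightforward; a minimal sketch (again via the libc crate, and not necessarily what fuc ships) would be:

```rust
use std::io;

/// Raises the NOFILE soft limit to the hard limit, e.g. 1024 -> 1048576
/// with the ulimits shown above.
fn raise_nofile_limit() -> io::Result<libc::rlim_t> {
    let mut rlim = libc::rlimit { rlim_cur: 0, rlim_max: 0 };
    if unsafe { libc::getrlimit(libc::RLIMIT_NOFILE, &mut rlim) } != 0 {
        return Err(io::Error::last_os_error());
    }
    // Soft limit may be raised freely up to the hard limit without privileges.
    rlim.rlim_cur = rlim.rlim_max;
    if unsafe { libc::setrlimit(libc::RLIMIT_NOFILE, &rlim) } != 0 {
        return Err(io::Error::last_os_error());
    }
    Ok(rlim.rlim_cur)
}
```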

The next problem is unshare: to actually use these relative file descriptors, we need to send them to multiple threads, and I'm not willing to lose the perf benefits of unshare. This is unfortunately a contradiction as far as I can tell because there's no way to share a single FD in a process-level table (Linux doesn't really work with processes anyway, everything is a task IIRC, so I can imagine this being an unreasonable request if you'd need to somehow keep track of a fallback FD table to support two FD tables).

So overall, an openat relative FD approach doesn't work under the constraint of wanting to keep unsharing the FD table.


The next solution is to use chdir. To use this on a per-thread basis, we'll need to unshare(CLONE_FS). Again, a fundamental issue is sharing this information across threads. Additionally, we also need a way to back out.
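The per-thread setup would look something like this (unshare and CLONE_FS are the real syscall and flag; the wrapper is just a sketch):

```rust
use std::io;

/// Gives the calling thread its own fs info (cwd, umask, root), detached
/// from the rest of the process, so each worker can chdir independently.
fn unshare_cwd() -> io::Result<()> {
    if unsafe { libc::unshare(libc::CLONE_FS) } != 0 {
        return Err(io::Error::last_os_error());
    }
    Ok(())
}
```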

A breadth-first traversal minimizes the number of working-directory changes (assuming scheduler support), but I don't want to compromise on resource usage under high fan-out. So next, let's build up a depth-first traversal approach:

  • In the single-threaded world, you just chdir every time you recurse and chdir("..") on exit of the stack frame (see the sketch after this list).
  • Of course this introduces unnecessary syscall overhead, so a smarter approach is to build up a path and only chdir when it reaches some threshold number of bytes.
  • In the multi-threaded world, you're screwed because each thread can be at an arbitrary spot in the tree.
    • The dumb solution here is to just keep track of the full path and chdir (with potentially multiple calls split on the PATH_MAX boundary since the whole point is to avoid "file name too long" errors) on every task start.
    • The dumb solution unacceptably introduces overhead for no gain: we still keep track of full paths and chdir will presumably have to do the component walk openat would have done.
  • You could introduce the notion of a level and pass that around with deletion tasks so they know how far to back up with ..s, but then how do you dig back down if you have missing path pieces?
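Here's roughly what that single-threaded scheme looks like (a sketch, not fuc's code; note that recursion depth also tracks tree depth, so a pathological tree could overflow the stack):

```rust
use std::env;
use std::fs;
use std::io;
use std::path::Path;

/// Deletes `dir` and everything under it. `dir` must be a single path
/// component relative to the current working directory; the recursion
/// maintains that invariant, so no path handled here ever gets long.
fn delete_tree(dir: &Path) -> io::Result<()> {
    env::set_current_dir(dir)?; // chdir down on entering the stack frame
    for entry in fs::read_dir(".")? {
        let entry = entry?;
        if entry.file_type()?.is_dir() {
            delete_tree(Path::new(&entry.file_name()))?;
        } else {
            fs::remove_file(entry.file_name())?; // short, relative name
        }
    }
    env::set_current_dir("..")?; // chdir("..") on exiting the stack frame
    fs::remove_dir(dir) // now empty, and a single component from here
}
```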

This time, I don't think a chdir approach is practical. You can make it work, but you don't gain anything from the additional syscalls.


In summary, there doesn't seem to be any room for improvement on the resource usage front. I'm working on the lockness scheduler, which will reduce queue usage and generally afford much more control over scheduling and performance, but that's still a ways away. Also, it won't address the path-length resource usage, which is a bummer.

hizani commented on May 25, 2024

Is normal rm able to handle this? If I had to guess, the answer is yes because I believe it uses openat.

Yes, it does work with rm.
