Comments (3)

SUPERCILEX commented on May 23, 2024

would be very useful for using cpz as a backup tool.

Just a fair warning: cpz uses the copy_file_range syscall which means the physical bytes aren't actually copied on file systems that support it. You'd need to do a cross-fs copy to get a true copy. Though if you're using a FS like bcachefs, you can tune the number of physical copies you want on a per-file basis.

file permission

cpz copies these by default. I'd consider it a bug if this is not the case (on Linux; on other platforms this doesn't happen).


user ID and group ID

This I'm not willing to do because it would introduce an extra syscall for every file. You should sudo into the uid/gid you want the files to be copied as.

The time of last data modification and time of last access.

Same reasoning here.

In theory a flag could be added for both, but I'm really resisting adding flags because:

  • I don't know how to deal with cross-platform compatibility. One solution here could be to just drop macOS/Windows compat since these tools aren't optimized for those platforms anyway.
  • Once one flag is added, then everybody wants flags and suddenly I'm trying to be the fastest while also being a coreutils clone which is not the goal.
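To make the per-file cost concrete, here is a rough Python sketch of what preserving ownership and timestamps involves after a file has been copied. The helper name is hypothetical and this is not how cpz is structured; it only shows the extra calls (chown and utime) that would run once per file.

```python
import os


def preserve_owner_and_times(src, dst):
    """Apply src's uid/gid and access/modification times to dst.

    Hypothetical helper. Changing the owner to a different user
    normally requires root, hence the advice above to run the copy
    as the uid/gid you want.
    """
    st = os.stat(src)
    # Extra call 1: set user ID and group ID.
    os.chown(dst, st.st_uid, st.st_gid)
    # Extra call 2: set last-access and last-modification times (nanoseconds).
    os.utime(dst, ns=(st.st_atime_ns, st.st_mtime_ns))
```

Two additional calls per file are cheap in isolation, but they add up across a tree with millions of small files.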

preserves hard links between source files in the copies

This one is wild! Unless I'm missing something obvious, I think you need to keep track of every hard linked file you've seen so far and then match newly encountered hard links with what you've copied so far. This definitely won't happen, flag or not, because it requires synchronization between the copy threads.
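The bookkeeping described above can be sketched in single-threaded Python. This is only an illustration with a hypothetical function name: it remembers each multiply-linked inode by its (device, inode) pair and re-links later occurrences instead of copying them again.

```python
import os
import shutil


def copy_tree_preserving_hard_links(src_root, dst_root):
    """Copy a tree, recreating hard links between files inside it.

    Hypothetical single-threaded sketch. The 'seen' map is the shared
    state that would need locking if several threads copied at once.
    """
    seen = {}  # (st_dev, st_ino) -> path of the first copy in the destination
    for dirpath, _dirnames, filenames in os.walk(src_root):
        rel = os.path.relpath(dirpath, src_root)
        dst_dir = os.path.normpath(os.path.join(dst_root, rel))
        os.makedirs(dst_dir, exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            dst = os.path.join(dst_dir, name)
            st = os.lstat(src)
            key = (st.st_dev, st.st_ino)
            if st.st_nlink > 1 and key in seen:
                # Already copied this inode: link to the earlier copy.
                os.link(seen[key], dst)
            else:
                shutil.copy2(src, dst, follow_symlinks=False)
                if st.st_nlink > 1:
                    seen[key] = dst
```

Every worker would have to consult and update the same map, which is the synchronization cost the comment objects to.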


If your use case is backups, it might actually be faster to first do a cpz pass to pound the NVMe queues, sync, and then run a second metadata + checksum rsync pass. For serious backups, I wouldn't trust the physical media to have written stuff correctly, so you need to use rsync either way.
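As an illustration of what the checksum half of that second pass checks, here is a small Python stand-in. It is not rsync and does not fix anything; the function names are hypothetical. It hashes every regular file in the source tree and reports paths whose copy is missing or differs.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_mismatches(src_root, dst_root):
    """List relative paths whose copy in dst_root is missing or differs."""
    mismatches = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        rel_dir = os.path.relpath(dirpath, src_root)
        for name in filenames:
            rel = os.path.normpath(os.path.join(rel_dir, name))
            src = os.path.join(src_root, rel)
            dst = os.path.join(dst_root, rel)
            if not os.path.isfile(dst) or file_digest(src) != file_digest(dst):
                mismatches.append(rel)
    return sorted(mismatches)
```

An empty result means every source file has a byte-identical counterpart as far as this process can read it back; rsync additionally repairs differences and handles metadata.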

from fuc.

baod-rate commented on May 23, 2024

cpz uses the copy_file_range syscall which means the physical bytes aren't actually copied on file systems that support it

Ah, I see. I think cp --reflink uses FICLONE. Effectively (as far as the filesystem is concerned) the same, right?

This I'm not willing to do because it would introduce an extra syscall for every file. You should sudo into the uid/gid you want the files to be copied as.

Fair enough. Though note that this isn't an option if there is a mix of owners for the files or if the GID doesn't match the group of the user (common on network file systems or other multi-user directories like git repositories).

trying to be the fastest while also being a coreutils clone which is not the goal

Yeah, totally fair

This one is wild!

I had pretty much the same reaction! I didn't know about this one until I was referencing the docs to write this ticket. I had to test the behavior to believe it.

it might actually be faster to first do a cpz pass to pound the NVMe queues, sync, and then run a second metadata + checksum rsync pass. For serious backups, I wouldn't trust the physical media to have written stuff correctly, so you need to use rsync either way

That's smart, good tip, thanks. And I did mean "backup" pretty loosely, but you're right, that's probably a good idea in any case.

SUPERCILEX commented on May 23, 2024

Ah, I see. I think cp --reflink uses FICLONE. Effectively (as far as the filesystem is concerned) the same, right?

Yup!


Thanks for the suggestion though!
