Giter Site home page Giter Site logo

Safety of Mmap::as_slice ? about memmap-rs HOT 30 OPEN

danburkert avatar danburkert commented on September 27, 2024 1
Safety of Mmap::as_slice ?

from memmap-rs.

Comments (30)

BurntSushi avatar BurntSushi commented on September 27, 2024

See also: #4

from memmap-rs.

SimonSapin avatar SimonSapin commented on September 27, 2024

So is the conclusion that file-based mmap is basically impossible to use safely? (Unless you somehow control the entire filesystem.)

This does not match how common this pattern seems to be.

from memmap-rs.

BurntSushi avatar BurntSushi commented on September 27, 2024

@SimonSapin I feel similarly as you, but the argument in favor of unsafe is somewhat convincing. In particular, without unsafe one could have an &[u8] that can be mutated in safe code, which generally seems like a violation of what one would expect when given an &[u8].

from memmap-rs.

danburkert avatar danburkert commented on September 27, 2024

Yep, @BurntSushi is correct. I wouldn't expect something like a system package manager updating a package to be an issue, since I would expect it to write a new file instead of modifying an existing one, but obviously that's down to the implementation.

So is the conclusion that file-based mmap is basically impossible to use safely?

It's impossible for this library to guarantee that the mmap is being used correctly, but it's not impossible for a combination of the application developer and operator. File locks (even advisory), permissions, containers, etc. can all be used to make sure the invariants aren't broken.

from memmap-rs.

burdges avatar burdges commented on September 27, 2024

Apologies, but I'm confused. When would safe code mutate a &[u8] that some unknown function gave it? Wouldn't only a &mut [u8] have that problem? I understand that a function could be marked unsafe to push out the "unsafe boundary" that Niko's blog recent posts discuss in relation to future optimizations, but you'll never know if you're the only owner of a &[u8] given to you by another crate, so that does not seem to apply. What am I missing?

As an aside, an interesting way to address this for new rust code might be another crate wrapping this crate that provides a family of checksums on the file contents and updates them when some reference guard type goes out of scope. You'd want a checksum that was locally mailable in the data in O(1) time, so no cryptographic checksums like polay1305. Also, if the checksum were mailable in both the key and data, then you could mutate the key when mapping the file, so that the current file's checksum key would never exist on disk until you closed it, or maybe wrote it out as part of a commit operation and updated it in memory again. Any processes that want to share the file could share the in-memory key via another IPC channel. All this malleability means a weak checksum, but one could make up for that by making it larger.

from memmap-rs.

BurntSushi avatar BurntSushi commented on September 27, 2024

@burdges The &[u8] from a memory map is backed by a file, which means it can be mutated by modifying the file itself.

from memmap-rs.

burdges avatar burdges commented on September 27, 2024

I miss-read your statement up thread as saying that marking the function as unsafe changed the compiler's behavior in some more subtle way. In this case, the unsafe just reminds the programmer that providing a safe abstraction requires more work though. Ok

from memmap-rs.

BurntSushi avatar BurntSushi commented on September 27, 2024

@burdges Ah I see ya, definitely didn't mean that! Apologies.

from memmap-rs.

strega-nil avatar strega-nil commented on September 27, 2024

Couldn't it return an &[Cell] in order to be safe?

from memmap-rs.

BurntSushi avatar BurntSushi commented on September 27, 2024

@ubsan Can you expand more on that suggestion? What is Cell?

from memmap-rs.

burdges avatar burdges commented on September 27, 2024

I suppose @ubsan means std::cell::Cell<u8> but to answer the question..

Cell<T> is not Sync. AtomicU8 is Sync, but not stable, and rather expensive. If you want mmap then you likely want a lighter weight courser grained locking mechanism anyways.

from memmap-rs.

strega-nil avatar strega-nil commented on September 27, 2024

Seems reasonable.

from memmap-rs.

DoumanAsh avatar DoumanAsh commented on September 27, 2024

Wouldn't simply opening File with write access to ensure that no other process could access it? (at least on Windows I think it would give you some guarantees)

from memmap-rs.

BurntSushi avatar BurntSushi commented on September 27, 2024

@DoumanAsh "at least on Windows" I think is the key part. :-)

from memmap-rs.

DoumanAsh avatar DoumanAsh commented on September 27, 2024

Sadly i'm not expert in regard of Linux, but at least there is some capabilities to set lock through (fcntl)[http://man7.org/linux/man-pages/man2/fcntl.2.html]

So i suppose it all comes to OS capabilities.
fcntl can lock through Advisory record locking, but doesn't guarantee safety i think?

Then there is non-posix thing that would work only on Linux
See section Open file description locks (non-POSIX)
Which is stil advisory...

Considering that Mandatory locking is buggy according to above link, there is no way on Linux and therefore no way to guarantee safety on unix\linux platforms

from memmap-rs.

BurntSushi avatar BurntSushi commented on September 27, 2024

Right, advisory locking is opt-in. An application itself can use it to provide a form of internally consistent safety, but it can never prevent some other random process from mutating its &[u8].

from memmap-rs.

dtolnay avatar dtolnay commented on September 27, 2024

To summarize, are we saying that concurrently modifying a memmap'd file to which Rust holds a &[u8] or &mut [u8] is officially Undefined Behavior, but "probably" will not cause anything bad if you are just reading bytes out?

from memmap-rs.

strega-nil avatar strega-nil commented on September 27, 2024

@dtolnay I mean, yeah? but if you're not just reading bytes out, it'll probably bite you at some point. It's the same issue many old C++ codebases are having with newer compilers that assume you're not doing multithreaded accesses without atomics; this would be equivalent to multithreaded accesses without atomics.

from memmap-rs.

danburkert avatar danburkert commented on September 27, 2024

One thing that worries me is that we're changing the API to have a safe deref to &[u8] on Mmap, but make it unsafe to create a Mmap from a file. @ubsan is this going to cause problems with the unsafe code guidelines, since it seems like it may allow the unsafe behavior (accessing shared mmap'd memory) to be far from the unsafe annotation (creating the mmap)?

from memmap-rs.

strega-nil avatar strega-nil commented on September 27, 2024

@danburkert I mean, if you're not doing UB, you're fine ;)

from memmap-rs.

BurntSushi avatar BurntSushi commented on September 27, 2024

@ubsan Could you elaborate on that please?

from memmap-rs.

strega-nil avatar strega-nil commented on September 27, 2024

@BurntSushi unsafe doesn't affect whether something is UB - at least, I'm really working hard to make that happen. Therefore, the location of the keyword doesn't matter. (and anyways, the kind of UB we're talking about here will never be defined, as it's equivalent to unprotected threaded alias).

Basically, you need to use atomic accesses for every single access to a MAP_SHARED file, if you're going to be changing the backing file. It doesn't matter where the unsafe goes.

s/volatile/atomic, thanks talchas

from memmap-rs.

iqualfragile avatar iqualfragile commented on September 27, 2024

Would it be possible to open the file, i.e. have a file-handle, delete the file from the filesystem, check that only one file handle to that file exists by iterating through all open files and then return to the user?
That would probably cause problems if the file is not un-deleted before the program exits.

from memmap-rs.

danburkert avatar danburkert commented on September 27, 2024

@iqualfragile what specifically do you mean by un-deleted?

from memmap-rs.

iqualfragile avatar iqualfragile commented on September 27, 2024

well, you would have to restore the file to the file system when you close the mmap (on drop probably) otherwise the file is just gone. You got a file descriptor so you can re-add it to the file system.

from memmap-rs.

danburkert avatar danburkert commented on September 27, 2024

@iqualfragile what specifically do you mean by re-add it to the file system? I'm not aware of an API to do that (even OS/FS dependent), so I'm intrigued.

If there's no API and it requires writing the entire file back out you may as well not use a mmap at all.

from memmap-rs.

iqualfragile avatar iqualfragile commented on September 27, 2024

on linux you can access the list of file descriptors of a process in /proc/pid/fd/descriptor. you can then use ln -L to re-link the file (inode) to a name in the file system tree. doing so does not invoke a copy. i do not know how ln does it though.
so basically: have a file open in a program, optionally delete it, cd to /proc/pid/fd/, ls -lah to find which descriptor you want, ln -L it to where you want it, if you delete it you can restore it to the same name, otherwise you get two hardlinked files.
hope that makes it clear.
thinking about it this solution is probably not as unworkable as i thought when i first suggested it because an unclean shutdown of your program tends to leave an inconsistent state on disk anyway. so it might be ok/better if the file is gone instead.

from memmap-rs.

iqualfragile avatar iqualfragile commented on September 27, 2024

doing some further reading this is archived by using the rather new linkat() syscall, passing the AT_SYMLINK_FOLLOW flag.

from memmap-rs.

KodrAus avatar KodrAus commented on September 27, 2024

... an unclean shutdown of your program tends to leave an inconsistent state on disk anyway. so it might be ok/better if the file is gone instead.

Hmm, I don't think this an assumption that could really be baked in here. Inconsistent state doesn't necessarily mean irrecoverable state.

from memmap-rs.

iqualfragile avatar iqualfragile commented on September 27, 2024

true, but in that case you are building a database and are probably ok with the unsafe interface? Another even more specific option would be to snapshot/cp --reflink the file before deleting on filesystems that support it. Would effectively limit the full solution to work on linux using btrfs or zfs. (i think xfs is working on patches for cow and there are some patches to make ext do cow).

from memmap-rs.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.