Comments (30)
See also: #4
from memmap-rs.
So is the conclusion that file-based mmap is basically impossible to use safely? (Unless you somehow control the entire filesystem.)
This does not match how common this pattern seems to be.
from memmap-rs.
@SimonSapin I feel similarly as you, but the argument in favor of unsafe
is somewhat convincing. In particular, without unsafe
one could have an &[u8]
that can be mutated in safe code, which generally seems like a violation of what one would expect when given an &[u8]
.
from memmap-rs.
Yep, @BurntSushi is correct. I wouldn't expect something like a system package manager updating a package to be an issue, since I would expect it to write a new file instead of modifying an existing one, but obviously that's down to the implementation.
So is the conclusion that file-based mmap is basically impossible to use safely?
It's impossible for this library to guarantee that the mmap is being used correctly, but it's not impossible for a combination of the application developer and operator. File locks (even advisory), permissions, containers, etc. can all be used to make sure the invariants aren't broken.
from memmap-rs.
Apologies, but I'm confused. When would safe code mutate a &[u8]
that some unknown function gave it? Wouldn't only a &mut [u8]
have that problem? I understand that a function could be marked unsafe
to push out the "unsafe boundary" that Niko's blog recent posts discuss in relation to future optimizations, but you'll never know if you're the only owner of a &[u8]
given to you by another crate, so that does not seem to apply. What am I missing?
As an aside, an interesting way to address this for new rust code might be another crate wrapping this crate that provides a family of checksums on the file contents and updates them when some reference guard type goes out of scope. You'd want a checksum that was locally mailable in the data in O(1) time, so no cryptographic checksums like polay1305. Also, if the checksum were mailable in both the key and data, then you could mutate the key when mapping the file, so that the current file's checksum key would never exist on disk until you closed it, or maybe wrote it out as part of a commit operation and updated it in memory again. Any processes that want to share the file could share the in-memory key via another IPC channel. All this malleability means a weak checksum, but one could make up for that by making it larger.
from memmap-rs.
@burdges The &[u8]
from a memory map is backed by a file, which means it can be mutated by modifying the file itself.
from memmap-rs.
I miss-read your statement up thread as saying that marking the function as unsafe
changed the compiler's behavior in some more subtle way. In this case, the unsafe
just reminds the programmer that providing a safe abstraction requires more work though. Ok
from memmap-rs.
@burdges Ah I see ya, definitely didn't mean that! Apologies.
from memmap-rs.
Couldn't it return an &[Cell] in order to be safe?
from memmap-rs.
@ubsan Can you expand more on that suggestion? What is Cell
?
from memmap-rs.
I suppose @ubsan means std::cell::Cell<u8>
but to answer the question..
Cell<T>
is not Sync
. AtomicU8
is Sync
, but not stable, and rather expensive. If you want mmap
then you likely want a lighter weight courser grained locking mechanism anyways.
from memmap-rs.
Seems reasonable.
from memmap-rs.
Wouldn't simply opening File with write access to ensure that no other process could access it? (at least on Windows I think it would give you some guarantees)
from memmap-rs.
@DoumanAsh "at least on Windows" I think is the key part. :-)
from memmap-rs.
Sadly i'm not expert in regard of Linux, but at least there is some capabilities to set lock through (fcntl)[http://man7.org/linux/man-pages/man2/fcntl.2.html]
So i suppose it all comes to OS capabilities.
fcntl
can lock through Advisory record locking
, but doesn't guarantee safety i think?
Then there is non-posix thing that would work only on Linux
See section Open file description locks (non-POSIX)
Which is stil advisory...
Considering that Mandatory locking
is buggy according to above link, there is no way on Linux and therefore no way to guarantee safety on unix\linux platforms
from memmap-rs.
Right, advisory locking is opt-in. An application itself can use it to provide a form of internally consistent safety, but it can never prevent some other random process from mutating its &[u8]
.
from memmap-rs.
To summarize, are we saying that concurrently modifying a memmap'd file to which Rust holds a &[u8] or &mut [u8] is officially Undefined Behavior, but "probably" will not cause anything bad if you are just reading bytes out?
from memmap-rs.
@dtolnay I mean, yeah? but if you're not just reading bytes out, it'll probably bite you at some point. It's the same issue many old C++ codebases are having with newer compilers that assume you're not doing multithreaded accesses without atomics; this would be equivalent to multithreaded accesses without atomics.
from memmap-rs.
One thing that worries me is that we're changing the API to have a safe deref to &[u8]
on Mmap
, but make it unsafe to create a Mmap
from a file. @ubsan is this going to cause problems with the unsafe code guidelines, since it seems like it may allow the unsafe behavior (accessing shared mmap'd memory) to be far from the unsafe
annotation (creating the mmap)?
from memmap-rs.
@danburkert I mean, if you're not doing UB, you're fine ;)
from memmap-rs.
@ubsan Could you elaborate on that please?
from memmap-rs.
@BurntSushi unsafe
doesn't affect whether something is UB - at least, I'm really working hard to make that happen. Therefore, the location of the keyword doesn't matter. (and anyways, the kind of UB we're talking about here will never be defined, as it's equivalent to unprotected threaded alias).
Basically, you need to use atomic
accesses for every single access to a MAP_SHARED
file, if you're going to be changing the backing file. It doesn't matter where the unsafe
goes.
s/volatile/atomic, thanks talchas
from memmap-rs.
Would it be possible to open the file, i.e. have a file-handle, delete the file from the filesystem, check that only one file handle to that file exists by iterating through all open files and then return to the user?
That would probably cause problems if the file is not un-deleted before the program exits.
from memmap-rs.
@iqualfragile what specifically do you mean by un-deleted?
from memmap-rs.
well, you would have to restore the file to the file system when you close the mmap (on drop probably) otherwise the file is just gone. You got a file descriptor so you can re-add it to the file system.
from memmap-rs.
@iqualfragile what specifically do you mean by re-add it to the file system? I'm not aware of an API to do that (even OS/FS dependent), so I'm intrigued.
If there's no API and it requires writing the entire file back out you may as well not use a mmap at all.
from memmap-rs.
on linux you can access the list of file descriptors of a process in /proc/pid/fd/descriptor. you can then use ln -L to re-link the file (inode) to a name in the file system tree. doing so does not invoke a copy. i do not know how ln does it though.
so basically: have a file open in a program, optionally delete it, cd to /proc/pid/fd/, ls -lah to find which descriptor you want, ln -L it to where you want it, if you delete it you can restore it to the same name, otherwise you get two hardlinked files.
hope that makes it clear.
thinking about it this solution is probably not as unworkable as i thought when i first suggested it because an unclean shutdown of your program tends to leave an inconsistent state on disk anyway. so it might be ok/better if the file is gone instead.
from memmap-rs.
doing some further reading this is archived by using the rather new linkat() syscall, passing the AT_SYMLINK_FOLLOW flag.
from memmap-rs.
... an unclean shutdown of your program tends to leave an inconsistent state on disk anyway. so it might be ok/better if the file is gone instead.
Hmm, I don't think this an assumption that could really be baked in here. Inconsistent state doesn't necessarily mean irrecoverable state.
from memmap-rs.
true, but in that case you are building a database and are probably ok with the unsafe interface? Another even more specific option would be to snapshot/cp --reflink the file before deleting on filesystems that support it. Would effectively limit the full solution to work on linux using btrfs or zfs. (i think xfs is working on patches for cow and there are some patches to make ext do cow).
from memmap-rs.
Related Issues (20)
- Incorrectly uses usize instead of u64 for file offsets HOT 2
- Support madvise. HOT 4
- what happens if we exceed isize::MAX on 32-bit platforms? HOT 10
- Docs should say why file-backed maps are unsafe HOT 4
- Mmap fails for empty files on Windows HOT 21
- Allow mapping memory safely using `volatile` crate? HOT 4
- Use with DirectIO HOT 2
- Possible overflow while mapping large files on 32 bit systems HOT 1
- Add an `map_copy_read_only` API HOT 4
- Should anonymous mappings be MAP_SHARED by default?
- Allow anonymous RWX mappings HOT 1
- Implement StableDeref?
- Expose page_size pub function as part of crate api
- Support `Mmap` on platforms without real mmap support HOT 4
- Is this crate still maintained? HOT 15
- Error when mapping large files on Windows HOT 2
- Code 87 when attempting to write files HOT 1
- Handling IO-Errors during operation (which happen with network shares and connection drops) HOT 3
- ❌ mmap.binary_search EXC_BAD_ACCESS HOT 2
- Safety comments are missing HOT 2
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from memmap-rs.