Comments (5)
@kevin-bates Hey Kevin, I've finally gotten more time to tackle this issue. I had some thoughts about this that I wanted your input on.
- Is the `(st_dev, st_ino)` pair really guaranteed to ensure file uniqueness on any multi-FS platform (ignoring NFS for now)? Reading the Linux man pages more closely, they state that an inode number is unique within a filesystem, not within a device. If they are precise with their nomenclature, this seems to imply that a single device number can host multiple filesystems. Anecdotally, others seem to agree that the pair `(st_dev, st_ino)` uniquely identifies a file on any system. However, since I don't see this stated in the Linux man pages, it seems guaranteed to me only if there is no way to create a separate filesystem on the same device without creating a new partition (which changes the device minor number). Is this true to the best of your knowledge?
- Is it known whether the device number for a partition persists across remounts and reboots? I can't find this behavior documented anywhere, and if it doesn't, we will have serious difficulty supporting multi-FS platforms. I posted a question on Stack Exchange about this; hopefully it will get some responses.
- I read the blog post on how the Linux NFS client handles device numbers that you linked in the original PR. Do NFS partitions get remounted on reboot? If so, supporting NFS will also be more difficult.
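For reference, here is a minimal Python sketch of the `(st_dev, st_ino)` identity key under discussion. `file_identity` is a hypothetical helper, not part of jupyter_server_fileid; it only reads `os.stat()` and so inherits every caveat raised above (inode numbers are unique only within one filesystem, and `st_dev` may not survive remounts or reboots).

```python
from __future__ import annotations

import os
import tempfile


def file_identity(path: str) -> tuple[int, int]:
    """Return the (st_dev, st_ino) pair for `path` (hypothetical helper).

    The pair only identifies a file while its filesystem keeps the same
    device number; st_ino alone is unique only within one filesystem.
    """
    st = os.stat(path)
    return (st.st_dev, st.st_ino)


if __name__ == "__main__":
    # A rename within one filesystem keeps the same inode, so the key survives.
    with tempfile.TemporaryDirectory() as d:
        src = os.path.join(d, "a.ipynb")
        dst = os.path.join(d, "b.ipynb")
        open(src, "w").close()
        before = file_identity(src)
        os.rename(src, dst)
        print(before == file_identity(dst))  # expected True on local filesystems such as ext4
```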
from jupyter_server_fileid.
Hi @dlqqq - It's been 30 years since I dealt with this area of the stack so my memory doesn't exactly serve me very well.
I agree there's some ambiguity in the referenced man page regarding filesystems and `st_dev` + `st_ino` uniqueness. That said, it does appear that `st_dev` can change across remounts and reboots (including NFS), so it won't be reliable. It's also clear that `st_ino` alone is not sufficient. I think the various links and discussion illustrate the slippery slope that using `stat` info can introduce, particularly given the myriad of filesystem types and implementations, and Windows (😄).
Perhaps a higher-level approach that persists the root directory, relative path, and hostname would make supporting this easier. This may also bring us closer to ContentsManager independence.
To address moves within a filesystem, perhaps (conditionally) capturing `st_ino` (if the file resides in the filesystem) and using it as a hint when reconciling the file path might help.
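A minimal sketch of the higher-level record suggested above, assuming a hypothetical `FileRecord` holding the hostname, root directory, and root-relative path, with `st_ino` captured only when the file actually exists on disk. None of these names come from jupyter_server_fileid; this is one possible shape, not its schema.

```python
from __future__ import annotations

import os
import socket
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FileRecord:
    """Hypothetical per-file record (not the library's schema)."""
    hostname: str
    root_dir: str
    rel_path: str            # root-relative, always "/"-separated
    ino_hint: Optional[int]  # only a hint for detecting moves; None if not on disk


def make_record(root_dir: str, path: str) -> FileRecord:
    root = os.path.realpath(root_dir)
    abs_path = os.path.realpath(os.path.join(root, path))
    rel = os.path.relpath(abs_path, root)
    # Refuse paths that escape the root directory.
    if rel == os.pardir or rel.startswith(os.pardir + os.sep):
        raise ValueError(f"{path!r} is outside {root!r}")
    ino = None
    try:
        ino = os.stat(abs_path).st_ino  # captured only if the file exists
    except OSError:
        pass
    return FileRecord(socket.gethostname(), root, rel.replace(os.sep, "/"), ino)
```

Keying on the relative path keeps the record meaningful even if the root is served by a non-filesystem backend, while the inode stays a best-effort hint rather than part of the key.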
Would it help to drive insertion "on demand", inserting entries only when `get_id()` doesn't find anything (and after checking whether the inode entry exists, to handle moves)? This might ease the pain of out-of-band updates by persisting only information "touched" by the application rather than everything unconditionally.
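The on-demand idea could look roughly like the following, assuming a hypothetical `OnDemandFileIds` class backed by SQLite (not jupyter_server_fileid's actual schema or `get_id()` implementation). A row is written only when `get_id()` misses; before inserting, the stored inode is consulted so that a file moved out of band keeps its ID.

```python
from __future__ import annotations

import os
import sqlite3
import uuid
from typing import Optional


class OnDemandFileIds:
    """Hypothetical sketch: rows are created only when get_id() misses."""

    def __init__(self, root_dir: str, db_path: str = ":memory:") -> None:
        self.root = os.path.realpath(root_dir)
        self.con = sqlite3.connect(db_path)
        self.con.execute(
            "CREATE TABLE IF NOT EXISTS files ("
            " id TEXT PRIMARY KEY,"
            " path TEXT NOT NULL UNIQUE,"
            " ino INTEGER)"
        )

    def _ino(self, rel_path: str) -> Optional[int]:
        try:
            return os.stat(os.path.join(self.root, rel_path)).st_ino
        except OSError:
            return None

    def get_id(self, rel_path: str) -> Optional[str]:
        # 1. Fast path: this path is already indexed.
        row = self.con.execute(
            "SELECT id FROM files WHERE path = ?", (rel_path,)
        ).fetchone()
        if row:
            return row[0]
        ino = self._ino(rel_path)
        if ino is None:
            return None  # nothing on disk, so nothing to index
        # 2. Inode hint: a known row with the same inode whose old path is
        #    gone is treated as the same file having been moved.
        rows = self.con.execute(
            "SELECT id, path FROM files WHERE ino = ?", (ino,)
        ).fetchall()
        for file_id, old_path in rows:
            if not os.path.exists(os.path.join(self.root, old_path)):
                with self.con:
                    self.con.execute(
                        "UPDATE files SET path = ? WHERE id = ?", (rel_path, file_id)
                    )
                return file_id
        # 3. Genuinely new file: insert it now, on demand.
        file_id = str(uuid.uuid4())
        with self.con:
            self.con.execute(
                "INSERT INTO files (id, path, ino) VALUES (?, ?, ?)",
                (file_id, rel_path, ino),
            )
        return file_id
```

This sketch ignores a path being reused by a different file and inode reuse after deletion; both would need extra checks before relying on the hint.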
It's also clear that only st_ino is not sufficient. I think the various links and discussion illustrate the slippery slope that using stat info can introduce, particularly given the myriad of filesystem types and implementations and Windows (😄).
To add to this, FAT32/NTFS don't really have inodes; instead they have file indexes, which, to my horror, appear to be possibly mutable over a file's lifetime. This is what `os.stat().st_ino` returns on Windows. But that's a separate issue.
With respect to your other comments, the existing design handles out-of-band updates very intelligently, and it would be really difficult to make such major design changes without losing that functionality. However, I'll keep those comments in mind if I'm unable to find a solution for supporting multi-FS platforms.
Thanks to Stack Overflow, it looks like `blkid` provides us with a persistent UUID per filesystem. I'm not sure how this works with NFS, but we'll see. I'm hopeful that this is exactly what we want.
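One way to look up that UUID from Python without shelling out to `blkid` is sketched below. It assumes a Linux system where udev maintains `/dev/disk/by-uuid` symlinks (on typical udev-based systems these are populated from the same libblkid probing that `blkid` uses) and matches the file's `st_dev` against each linked device node's `st_rdev`. `filesystem_uuid` is a hypothetical helper; it returns `None` whenever no matching block device is listed, which typically includes NFS, tmpfs, and btrfs subvolumes.

```python
from __future__ import annotations

import os
from typing import Optional

BY_UUID = "/dev/disk/by-uuid"  # udev-maintained symlinks on many Linux systems


def filesystem_uuid(path: str, by_uuid_dir: str = BY_UUID) -> Optional[str]:
    """Return the UUID of the block-device filesystem holding `path`, if found.

    Linux-only sketch: compares the file's st_dev with the st_rdev of the
    device nodes that the by-uuid symlinks point to.
    """
    dev = os.stat(path).st_dev
    try:
        names = os.listdir(by_uuid_dir)
    except OSError:
        return None  # directory absent (non-Linux, containers without udev)
    for name in names:
        try:
            # os.stat follows the symlink to the device node, e.g. /dev/sda1.
            if os.stat(os.path.join(by_uuid_dir, name)).st_rdev == dev:
                return name
        except OSError:
            continue  # dangling or unreadable link
    return None
```

Because the lookup depends on udev, it shares the portability concern about `blkid`: on macOS and Windows the directory does not exist and the function simply returns `None`.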
it looks like blkid provides us with a persistent UUID per filesystem.
Cool - so there's your primary key! 😉
One immediate issue with the `blkid` approach is that it seems specific to certain Linux distributions. I don't know of equivalents for OS X and Windows.