use st_dev in conjunction with ino about jupyter_server_fileid HOT 5 OPEN

jupyter-server commented on June 2, 2024

use st_dev in conjunction with ino

from jupyter_server_fileid.

Comments (5)

dlqqq commented on June 2, 2024

@kevin-bates Hey Kevin, I've finally gotten more time to tackle this issue. I had some thoughts about this that I wanted your input on.

1. Is the (st_dev, st_ino) pair really guaranteed to grant file uniqueness on any multi-FS platform (ignoring NFS for now)? Reading the Linux man pages more closely, they state that an inode number is unique within a filesystem, not within a device. Depending on how precise they are with their nomenclature, this seems to imply it's possible for a single device number to have multiple filesystems. Anecdotally, it looks like others are in agreement that the pair (st_dev, st_ino) will uniquely identify a file on any system. However, given that I don't see this in the Linux man pages, this seems guaranteed to me only if there is no way of creating a separate filesystem on the same device without creating a new partition (which changes the device minor number). Is this true to the best of your knowledge?
2. Is it known whether the device number for a partition is persistent across remounts and reboots? I can't find this behavior documented anywhere, and if this is not the case, we will have serious difficulty supporting multi-FS platforms. I posted an SE question for this; hopefully it will get some responses.
3. I read that blog post on Linux NFS client handling device numbers you had linked in the original PR. Do NFS partitions remount on reboot? If so, that would also mean supporting NFS will be more difficult.
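
For reference, here is a minimal sketch of what the (st_dev, st_ino) pair looks like from Python. The helper name `file_key` is hypothetical and not part of jupyter_server_fileid; the sketch only shows how the pair is read and that it can be compared before and after a rename. It says nothing about persistence across remounts or reboots, which is the open question above.

```python
import os
import tempfile


def file_key(path):
    """Return the (st_dev, st_ino) pair that os.stat reports for path.

    Hypothetical helper for illustration only.
    """
    st = os.stat(path)
    return (st.st_dev, st.st_ino)


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "a.txt")
    dst = os.path.join(tmp, "b.txt")
    with open(src, "w") as f:
        f.write("hello")

    before = file_key(src)
    os.rename(src, dst)  # a move within a single filesystem
    after = file_key(dst)

    # On typical POSIX filesystems the pair is expected to survive the rename,
    # but this is an observation on one mount, not a cross-reboot guarantee.
    print("same key after rename:", before == after)
```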

kevin-bates commented on June 2, 2024

Hi @dlqqq - It's been 30 years since I dealt with this area of the stack so my memory doesn't exactly serve me very well.

I agree there's some ambiguity regarding file system and st_dev + st_ino uniqueness in the referenced man page. That said, it does appear that st_dev can change across remounts and reboots (including NFS) - so it won't be reliable. It's also clear that only st_ino is not sufficient. I think the various links and discussion illustrate the slippery slope that using stat info can introduce, particularly given the myriad of filesystem types and implementations and Windows (😄).

Perhaps a higher-level approach, persisting the root directory, relative path, and hostname, might make supporting this easier. This may also get things closer to ContentsManager-independence.
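
One way to picture that higher-level key is the sketch below. The names `PathKey` and `make_path_key` are hypothetical and do not come from jupyter_server_fileid; the sketch assumes the server knows its root directory and that the hostname from `socket.gethostname()` is an acceptable machine identifier.

```python
import os
import socket
from typing import NamedTuple


class PathKey(NamedTuple):
    """Hypothetical stat-free key: which host, which root, which relative path."""
    hostname: str
    root_dir: str
    rel_path: str


def make_path_key(root_dir, path, hostname=None):
    """Build a PathKey for path, which must live under root_dir."""
    root = os.path.abspath(root_dir)
    full = os.path.abspath(path)
    rel = os.path.relpath(full, root)
    # Reject paths that escape the root directory.
    if rel == os.pardir or rel.startswith(os.pardir + os.sep):
        raise ValueError(f"{path!r} is not under {root_dir!r}")
    return PathKey(hostname or socket.gethostname(), root, rel)


key = make_path_key("/srv/notebooks", "/srv/notebooks/a/b.ipynb", hostname="host1")
print(key.rel_path)
```

A key like this does not depend on st_dev or st_ino at all, but on its own it cannot follow a file that is moved or renamed outside the application.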

To address moves within a filesystem, perhaps (conditionally) capturing st_ino (if the file resides in the file system) and using that as a hint when reconciling the file path might be helpful.

Would it help to drive insertion "on-demand" by inserting entries only when get_id() doesn't find anything (and after checking if the inode entry exists to handle moves)? This might ease the pain of out-of-band updates by only persisting information "touched" by the application and not unconditionally.
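
A rough sketch of that on-demand idea, under stated assumptions: the class `OnDemandIndex`, its table layout, and the use of `sqlite3` and `uuid4` are all hypothetical and are not the existing jupyter_server_fileid implementation. Only the method name `get_id()` comes from the discussion. The inode is stored as text purely to avoid integer-range issues in SQLite.

```python
import os
import sqlite3
import uuid


class OnDemandIndex:
    """Hypothetical index that records a file only when get_id() is asked about it."""

    def __init__(self, db_path=":memory:"):
        self.con = sqlite3.connect(db_path)
        self.con.execute(
            "CREATE TABLE IF NOT EXISTS files ("
            "id TEXT PRIMARY KEY, path TEXT UNIQUE NOT NULL, ino TEXT NOT NULL)"
        )

    def get_id(self, path):
        path = os.path.abspath(path)

        # 1. Known path: return the stored ID without touching anything else.
        row = self.con.execute(
            "SELECT id FROM files WHERE path = ?", (path,)
        ).fetchone()
        if row is not None:
            return row[0]

        # 2. Unknown path: use st_ino as a hint that a known file was moved.
        #    Caveat: a deleted file whose inode number was reused by a new
        #    file would also match here, so this is only a heuristic.
        ino = str(os.stat(path).st_ino)
        row = self.con.execute(
            "SELECT id, path FROM files WHERE ino = ?", (ino,)
        ).fetchone()
        if row is not None and not os.path.exists(row[1]):
            self.con.execute(
                "UPDATE files SET path = ? WHERE id = ?", (path, row[0])
            )
            self.con.commit()
            return row[0]

        # 3. Nothing matched: insert a new entry on demand.
        file_id = str(uuid.uuid4())
        self.con.execute(
            "INSERT INTO files (id, path, ino) VALUES (?, ?, ?)",
            (file_id, path, ino),
        )
        self.con.commit()
        return file_id
```

With this shape, files the application never asks about are never written to the table, and an out-of-band rename is reconciled the next time the new path is looked up.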

dlqqq commented on June 2, 2024

> It's also clear that only st_ino is not sufficient. I think the various links and discussion illustrate the slippery slope that using stat info can introduce, particularly given the myriad of filesystem types and implementations and Windows (😄).

To add to this, FAT32/NTFS don't even really have inodes, but instead just have file indexes which appear, to my horror, possibly mutable over a file's lifetime. This is what is returned by os.stat().st_ino on Windows. But that's a separate issue.

With respect to your other comments, the existing design handles out-of-band updates very intelligently, and it would be really difficult to make such major design changes without losing that functionality. However, I'll keep those comments in mind if I'm unable to find a solution to supporting multi-FS platforms.

Thanks to Stack Overflow, it looks like blkid provides us with a persistent UUID per filesystem. Not sure how this works with NFS, but we'll see. I'm hopeful that this is exactly what we want.
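
As a sketch of how that UUID could be read from Python without shelling out to blkid: on Linux systems running udev, `/dev/disk/by-uuid` usually holds one symlink per filesystem UUID, each pointing at a device node. The helper `uuid_for_device` below is hypothetical, assumes that directory layout, and returns `None` when the directory is missing or nothing matches (which I would expect for NFS mounts, though I have not verified it).

```python
import os


def uuid_for_device(device, by_uuid_dir="/dev/disk/by-uuid"):
    """Return the filesystem UUID whose by-uuid symlink resolves to device.

    Hypothetical helper; assumes the Linux/udev by-uuid symlink layout.
    Returns None if the directory is unavailable or no symlink matches.
    """
    target = os.path.realpath(device)
    try:
        names = os.listdir(by_uuid_dir)
    except OSError:
        return None
    for name in names:
        link = os.path.join(by_uuid_dir, name)
        if os.path.realpath(link) == target:
            return name
    return None
```

A caller would pass a block device path such as a partition node; mapping a file's st_dev back to that device path is a separate step that this sketch does not cover.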

kevin-bates commented on June 2, 2024

> it looks like blkid provides us with a persistent UUID per filesystem.

Cool - so there's your primary key! 😉

dlqqq commented on June 2, 2024

One immediate issue with the blkid approach is that it seems specific only to certain Linux distributions. I don't know of its equivalents for OS X and Windows.
