Comments (5)

dlqqq commented on June 2, 2024

@kevin-bates Hey Kevin, I've finally gotten more time to tackle this issue. I had some thoughts about this that I wanted your input on.

  1. Is the (st_dev, st_ino) pair really guaranteed to identify a file uniquely on any multi-FS platform (ignoring NFS for now)? Reading the Linux man pages more closely, they state that an inode number is unique within a filesystem, not within a device. Depending on how precise they are with their nomenclature, this seems to imply that a single device number could host multiple filesystems. Anecdotally, it looks like others agree that the pair (st_dev, st_ino) will uniquely identify a file on any system. However, given that I don't see this in the Linux man pages, this seems guaranteed to me only if there is no way of creating a separate filesystem on the same device without creating a new partition (which changes the device minor number). Is this true to the best of your knowledge?

  2. Is it known whether the device number for a partition is persistent across remounts and reboots? I can't find this behavior documented anywhere, and if it is not persistent, we will have serious difficulty supporting multi-FS platforms. I posted a Stack Exchange question about this; hopefully it will get some responses.

  3. I read that blog post on Linux NFS client handling device numbers you had linked in the original PR. Do NFS partitions remount on reboot? If so, that would also mean supporting NFS will be more difficult.
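
For reference, the pair discussed in these questions can be read with `os.stat()`. The following is a minimal sketch, not code from jupyter_server_fileid; the helper name `file_key` is hypothetical. It only shows that the pair survives a rename within one directory on typical POSIX filesystems. It says nothing about whether `st_dev` survives a remount or reboot, which is the open question above.

```python
import os
import tempfile


def file_key(path):
    """Return the (st_dev, st_ino) pair that os.stat() reports for path.

    Hypothetical helper for illustration only.
    """
    st = os.stat(path)
    return (st.st_dev, st.st_ino)


# Demo: rename a file inside one temporary directory and compare the pairs.
with tempfile.TemporaryDirectory() as tmp:
    old = os.path.join(tmp, "a.txt")
    new = os.path.join(tmp, "b.txt")
    with open(old, "w") as f:
        f.write("hello")

    before = file_key(old)
    os.rename(old, new)  # a move within the same filesystem
    after = file_key(new)

    same_after_rename = before == after
```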

from jupyter_server_fileid.

kevin-bates commented on June 2, 2024

Hi @dlqqq - It's been 30 years since I dealt with this area of the stack so my memory doesn't exactly serve me very well.

I agree there's some ambiguity regarding file system and st_dev + st_ino uniqueness in the referenced man page. That said, it does appear that st_dev can change across remounts and reboots (including NFS), so it won't be reliable. It's also clear that st_ino alone is not sufficient. I think the various links and discussion illustrate the slippery slope that using stat info can introduce, particularly given the myriad of filesystem types and implementations and Windows (😄).

Perhaps a higher-level approach that persists the root directory, relative path, and hostname would make this easier to support. This may also get things closer to ContentsManager-independence.

To address moves within a filesystem, perhaps (conditionally) capturing st_ino (if the file resides in the file system) and using that as a hint when reconciling the file path might be helpful.

Would it help to drive insertion "on-demand" by inserting entries only when get_id() doesn't find anything (and after checking if the inode entry exists to handle moves)? This might ease the pain of out-of-band updates by only persisting information "touched" by the application and not unconditionally.
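
A rough sketch of this "on-demand" idea, under stated assumptions: the class `OnDemandIndex`, its in-memory dictionaries, and the UUID-based IDs are all hypothetical and are not the jupyter_server_fileid implementation. An entry is inserted only when `get_id()` misses on the path, and `st_ino` is used only as a hint to recognise a move.

```python
import os
import tempfile
import uuid


class OnDemandIndex:
    """Hypothetical in-memory index; all names are illustrative only."""

    def __init__(self):
        self._by_path = {}  # absolute path -> file ID
        self._by_ino = {}   # st_ino -> file ID (a hint, not a unique key)
        self._paths = {}    # file ID -> last known absolute path

    def get_id(self, path):
        path = os.path.abspath(path)
        # 1. Path already indexed: nothing to insert.
        if path in self._by_path:
            return self._by_path[path]

        # 2. Path unknown: check whether the inode was seen before.
        ino = os.stat(path).st_ino
        file_id = self._by_ino.get(ino)
        if file_id is not None:
            old_path = self._paths[file_id]
            if not os.path.exists(old_path):
                # The old path is gone and the inode matches: treat as a move.
                self._by_path.pop(old_path, None)
                self._by_path[path] = file_id
                self._paths[file_id] = path
                return file_id

        # 3. Otherwise insert a fresh entry. This also covers hard links and
        #    reused inode numbers, where the old path still exists.
        file_id = uuid.uuid4().hex
        self._by_path[path] = file_id
        self._by_ino[ino] = file_id
        self._paths[file_id] = path
        return file_id


# Demo: the ID follows a file that is renamed out-of-band.
with tempfile.TemporaryDirectory() as tmp:
    index = OnDemandIndex()
    old = os.path.join(tmp, "notebook.ipynb")
    new = os.path.join(tmp, "renamed.ipynb")
    with open(old, "w") as f:
        f.write("{}")

    first_id = index.get_id(old)  # first touch: entry inserted
    os.rename(old, new)           # move made outside the index
    moved_id = index.get_id(new)  # inode hint matches, old path is gone
    ids_match = first_id == moved_id
```

The sketch ignores the case where an indexed path is deleted and a different file is later created at the same path; a real design would need to reconcile that as well.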

dlqqq commented on June 2, 2024

> It's also clear that only st_ino is not sufficient. I think the various links and discussion illustrate the slippery slope that using stat info can introduce, particularly given the myriad of filesystem types and implementations and Windows (😄).

To add to this, FAT32/NTFS don't even really have inodes, but instead just have file indexes which, to my horror, appear to be possibly mutable over a file's lifetime. This is what is returned by os.stat().st_ino on Windows. But that's a separate issue.

With respect to your other comments, the existing design handles out-of-band updates very intelligently, and it would be really difficult to make such major design changes without losing that functionality. However, I'll keep those comments in mind if I'm unable to find a way to support multi-FS platforms.

Thanks to Stack Overflow, it looks like blkid provides us with a persistent UUID per filesystem. Not sure how this works with NFS, but we'll see. I'm hopeful that this is exactly what we want.
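
A best-effort sketch of looking up such a UUID from Python, with assumptions stated plainly: it does not call blkid. Instead it reads the `/dev/disk/by-uuid` symlinks that udev creates on many Linux systems and matches each block device's `st_rdev` against the file's `st_dev`. The function name `fs_uuid_for_path` is hypothetical. Network filesystems such as NFS, and any system without that directory, are expected to yield `None`.

```python
import os

# Assumption: udev populates this directory; it is absent on many systems.
BY_UUID_DIR = "/dev/disk/by-uuid"


def fs_uuid_for_path(path, by_uuid_dir=BY_UUID_DIR):
    """Return the UUID of the filesystem holding path, or None if unknown.

    Hypothetical helper for illustration only.
    """
    try:
        target_dev = os.stat(path).st_dev
        names = os.listdir(by_uuid_dir)
    except OSError:
        # Directory missing (non-Linux, containers) or path not accessible.
        return None

    for name in names:
        try:
            # os.stat() follows the symlink to the block device node.
            st = os.stat(os.path.join(by_uuid_dir, name))
        except OSError:
            continue
        # For a device node, st_rdev is the device number it represents.
        if st.st_rdev == target_dev:
            return name  # the symlink name is the filesystem UUID
    return None


uuid_here = fs_uuid_for_path(".")
```

If a UUID is found, a pair such as `(uuid_here, os.stat(path).st_ino)` could stand in for `(st_dev, st_ino)`; whether that is robust across all filesystem types is not established here.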

kevin-bates commented on June 2, 2024

> it looks like blkid provides us with a persistent UUID per filesystem.

Cool - so there's your primary key! 😉

dlqqq commented on June 2, 2024

One immediate issue with the blkid approach is that it seems specific only to certain Linux distributions. I don't know of its equivalents for OS X and Windows.