
Comments (7)

dlqqq commented on June 9, 2024

@ellisonbg and I were able to discuss this issue at length. Our proposal is that we can simply have file_id_manager_class be a trait on FileIdExtension, defined as follows:

    from traitlets import Type, default

    # in FileIdExtension
    file_id_manager_class = Type(klass=AbstractFileIdManager)

    @default("file_id_manager_class")
    def _default_file_id_manager_class(self):
      # A local contents manager gets the local file ID manager; anything
      # else falls back to the manager that only listens to emitted events.
      if isinstance(self.settings["contents_manager"], FileContentsManager):
        return LocalFileIdManager
      return ArbitraryFileIdManager

    def initialize_settings(...):
      ...
      self.settings["file_id_manager"] = self.file_id_manager_class(...)

With this proposal, the following user flows are covered out-of-the-box:

  1. User uses default FileContentsManager => automatically default to using LocalFileIdManager because we know their contents manager only interacts with local filesystem.
  2. User installs custom contents manager with no file ID manager specified in Jupyter config => automatically default to ArbitraryFileIdManager which only listens to emitted events and makes no assumptions about the local filesystem
  3. User installs custom contents manager with custom file ID manager specified in Jupyter config => FileIdExtension inherits that trait and uses their custom file ID manager
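The three flows above can be sketched without traitlets as a plain function. This is only an illustration of the selection order, not the extension's actual code: all classes below are empty stand-ins, and `resolve_file_id_manager_class`, `RemoteContentsManager`, and `RemoteFileIdManager` are hypothetical names introduced for the example.

```python
# Empty stand-ins for the classes named in the proposal (assumptions, not the real ones).
class AbstractFileIdManager: ...
class LocalFileIdManager(AbstractFileIdManager): ...
class ArbitraryFileIdManager(AbstractFileIdManager): ...
class FileContentsManager: ...

# Hypothetical third-party classes, used only to exercise flows 2 and 3.
class RemoteContentsManager: ...
class RemoteFileIdManager(AbstractFileIdManager): ...


def resolve_file_id_manager_class(contents_manager, configured_class=None):
    """Return the file ID manager class for a given contents manager."""
    # Flow 3: an explicitly configured class always wins.
    if configured_class is not None:
        return configured_class
    # Flow 1: the default local contents manager implies local tracking.
    if isinstance(contents_manager, FileContentsManager):
        return LocalFileIdManager
    # Flow 2: unknown contents manager, make no filesystem assumptions.
    return ArbitraryFileIdManager


flow_1 = resolve_file_id_manager_class(FileContentsManager())
flow_2 = resolve_file_id_manager_class(RemoteContentsManager())
flow_3 = resolve_file_id_manager_class(RemoteContentsManager(), RemoteFileIdManager)
```

In the proposal itself, the "configured class wins" step is what the traitlets config system would provide; the dynamic default only runs when nothing was configured.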

Packages that provide a custom contents manager would specify their custom file ID manager in their Jupyter config file, or leave it blank. For example, to specify a custom file ID manager YYY for contents manager XXX:

{
  "ServerApp": {
    "contents_manager_class": "XXX"
  },
  "FileIdExtension": {
    "file_id_manager_class": "YYY"
  }
}

The only shortcoming arises when the user has multiple contents managers installed locally: passing --ServerApp.contents_manager_class=XXX alone is insufficient, because the user should also pass the corresponding file ID manager. Our rationale is that, to guarantee that XXX works, the user should pass the Jupyter config file that XXX ships with anyway, since the package providing XXX may require additional configuration beyond setting FileIdExtension.file_id_manager_class.

The benefits to this approach are several:

  • No changes are necessary to Jupyter server
  • Custom contents managers can specify the file ID manager they need out-of-the-box
  • Jupyter server does not require a dependency on jupyter_server_fileid

from jupyter_server_fileid.

dlqqq commented on June 9, 2024

Hmm, if there's no way to retrieve some inode number equivalent from a filesystem (any immutable attribute that's preserved on moves), then the FileIdManager is essentially useless for anything out-of-band and 80% of the logic can be eliminated.

My proposed series of changes to help address this:

  1. add an abstract class AbstractFileIdManager (see #1)
  2. rename FileIdManager => LocalFileIdManager
  3. write a custom implementation ArbitraryFileIdManager that works on arbitrary filesystems, and just listens to contents manager filesystem events exclusively to track changes. No effort is made to track out-of-band filesystem ops.
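A minimal sketch of what steps 1 and 3 might look like, assuming an in-memory store and a small method set. The method names (`index`, `get_path`, `move`, `delete`) and the use of UUIDs as IDs are assumptions made for illustration, not the package's confirmed interface.

```python
import uuid
from abc import ABC, abstractmethod


class AbstractFileIdManager(ABC):
    """Hypothetical interface; the method set is an assumption."""

    @abstractmethod
    def index(self, path): ...

    @abstractmethod
    def get_path(self, file_id): ...

    @abstractmethod
    def move(self, old_path, new_path): ...

    @abstractmethod
    def delete(self, path): ...


class ArbitraryFileIdManager(AbstractFileIdManager):
    """Tracks IDs purely from contents manager events; never inspects a filesystem."""

    def __init__(self):
        self._id_by_path = {}
        self._path_by_id = {}

    def index(self, path):
        # Assign an ID on first sight; return the existing one afterwards.
        if path not in self._id_by_path:
            file_id = str(uuid.uuid4())
            self._id_by_path[path] = file_id
            self._path_by_id[file_id] = path
        return self._id_by_path[path]

    def get_path(self, file_id):
        return self._path_by_id.get(file_id)

    def move(self, old_path, new_path):
        if old_path == new_path:
            return self.index(old_path)
        # Whatever was previously tracked at the destination is overwritten.
        self.delete(new_path)
        file_id = self._id_by_path.pop(old_path, None)
        if file_id is None:
            # The source was never indexed, so treat the destination as new.
            return self.index(new_path)
        self._id_by_path[new_path] = file_id
        self._path_by_id[file_id] = new_path
        return file_id

    def delete(self, path):
        file_id = self._id_by_path.pop(path, None)
        if file_id is not None:
            del self._path_by_id[file_id]
```

Because the ID only follows moves reported through `move`, a rename performed outside the contents manager would simply look like a deletion plus a new file, which matches the "no effort is made to track out-of-band filesystem ops" trade-off.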

One problem with this is that manually specifying a custom contents manager would require you to manually change the file ID manager class as well. Not sure if this is an issue we want to solve. We could either:

  • Have the file ID extension check if the contents manager is local simply by checking self.settings["contents_manager"].__class__.__name__, and then pick the right file ID manager class to instantiate.
  • Add an exclusively_local trait on the abstract contents manager class that informs other extensions of whether it deals exclusively with local filesystems. File ID extension checks for the truthiness of this trait and picks the right file ID manager class to instantiate.
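To compare the two options, here is a rough sketch using empty stand-in classes. `exclusively_local` is modeled as a plain class attribute rather than a real trait, and `pick_by_class_name`, `pick_by_flag`, and `CustomLocalContentsManager` are hypothetical names for the example.

```python
class LocalFileIdManager: ...
class ArbitraryFileIdManager: ...


class ContentsManager:
    # Stand-in for the proposed trait; defaults to "not known to be local".
    exclusively_local = False


class FileContentsManager(ContentsManager):
    exclusively_local = True


class CustomLocalContentsManager(FileContentsManager):
    """Hypothetical subclass that still works only on the local filesystem."""


def pick_by_class_name(contents_manager):
    # Option 1: compare the class name string.
    if contents_manager.__class__.__name__ == "FileContentsManager":
        return LocalFileIdManager
    return ArbitraryFileIdManager


def pick_by_flag(contents_manager):
    # Option 2: trust the flag the contents manager advertises.
    if getattr(contents_manager, "exclusively_local", False):
        return LocalFileIdManager
    return ArbitraryFileIdManager


by_name = pick_by_class_name(CustomLocalContentsManager())
by_flag = pick_by_flag(CustomLocalContentsManager())
```

In this sketch the name check misses the subclass, while the flag is inherited, so the second option degrades more gracefully for derived contents managers.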


ellisonbg commented on June 9, 2024

@dlqqq thanks for writing this up. I have thought about the design a bit more since we talked and I think this approach looks good. This will enable us to evolve the file id stuff separate from Jupyter Server, which I think will help at this point.


ellisonbg commented on June 9, 2024

Thinking a bit more overnight. If there are N RTC clients, those clients would need to poll get_paths() on a regular interval, which causes a lot of issues. I think a better approach is as follows:

  • The file id manager should call sync_all in a subprocess on a regular interval.
  • Each time there is a path change, it should publish an event on the event bus.

This way, each client won't have to poll the server to get all these updates.
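A rough sketch of the publish-on-change idea, under stated assumptions: the signatures of sync_all and get_paths are not shown in this thread, so the sketch works on two plain snapshots mapping file ID to path, and `EventBus`, `diff_paths`, and `sync_and_publish` are hypothetical helpers rather than Jupyter Server APIs.

```python
class EventBus:
    """Tiny in-process stand-in for an event bus."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def publish(self, event):
        for callback in self._subscribers:
            callback(event)


def diff_paths(previous, current):
    """Compare two {file_id: path} snapshots and list what changed."""
    events = []
    for file_id, new_path in current.items():
        old_path = previous.get(file_id)
        if old_path != new_path:
            # old_path is None for files seen for the first time.
            events.append({"id": file_id, "old_path": old_path, "new_path": new_path})
    for file_id, old_path in previous.items():
        if file_id not in current:
            # new_path is None for files that disappeared.
            events.append({"id": file_id, "old_path": old_path, "new_path": None})
    return events


def sync_and_publish(previous, current, bus):
    """Publish one event per changed path; meant to run after each periodic sync."""
    for event in diff_paths(previous, current):
        bus.publish(event)


bus = EventBus()
received = []
bus.subscribe(received.append)
before = {"id1": "a.ipynb", "id2": "b.ipynb"}
after = {"id1": "renamed.ipynb", "id2": "b.ipynb"}
sync_and_publish(before, after, bus)
```

With N subscribers, the server does the diff once per interval and pushes only the changes, instead of each client polling for the full path list.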


kevin-bates commented on June 9, 2024

Hi @ellisonbg - I agree with your last comment (background sync). Was this intended for the conversation on #20?


ellisonbg commented on June 9, 2024

LOL, yes, still waking up. I will move that comment over to #20, thanks.


davidbrochart commented on June 9, 2024

I opened #24.

