Comments (7)
@ellisonbg and I were able to discuss this issue at length. Our proposal is that we can simply make `file_id_manager_class` a trait on `FileIdExtension`, defined as follows:
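```python
from jupyter_server.services.contents.filemanager import FileContentsManager
from traitlets import Type, default

# in FileIdExtension; tag(config=True) lets Jupyter config files override it
file_id_manager_class = Type(klass=AbstractFileIdManager).tag(config=True)

@default("file_id_manager_class")
def _default_file_id_manager_class(self):
    # only pick LocalFileIdManager when the contents manager is known to be local-only
    if isinstance(self.settings["contents_manager"], FileContentsManager):
        return LocalFileIdManager
    return ArbitraryFileIdManager

def initialize_settings(self):
    ...
    self.settings["file_id_manager"] = self.file_id_manager_class(...)
```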
With this proposal, the following user flows are covered out of the box:
- User uses the default `FileContentsManager` => automatically defaults to `LocalFileIdManager`, because we know this contents manager only interacts with the local filesystem.
- User installs a custom contents manager with no file ID manager specified in Jupyter config => automatically defaults to `ArbitraryFileIdManager`, which only listens to emitted events and makes no assumptions about the local filesystem.
- User installs a custom contents manager with a custom file ID manager specified in Jupyter config => `FileIdExtension` picks up that trait value from config and uses the custom file ID manager.
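The three flows above reduce to a small decision function. The sketch below is a plain-Python stand-in with no traitlets or Jupyter Server imports: the stub classes and `pick_file_id_manager_class` are hypothetical names for illustration, and `configured` represents whatever `FileIdExtension.file_id_manager_class` was set to in Jupyter config, or `None` if it was left unset.

```python
from typing import Optional


# Hypothetical stand-ins for the classes discussed in this issue.
class FileContentsManager:
    """Contents manager that only touches the local filesystem."""


class AbstractFileIdManager:
    pass


class LocalFileIdManager(AbstractFileIdManager):
    pass


class ArbitraryFileIdManager(AbstractFileIdManager):
    pass


def pick_file_id_manager_class(
    contents_manager: object,
    configured: Optional[type] = None,
) -> type:
    """Return the file ID manager class to instantiate for this server."""
    # Flow 3: a class set explicitly in Jupyter config always wins.
    if configured is not None:
        if not issubclass(configured, AbstractFileIdManager):
            raise TypeError(f"{configured.__name__} is not a file ID manager")
        return configured
    # Flow 1: the default contents manager is known to be local-only.
    if isinstance(contents_manager, FileContentsManager):
        return LocalFileIdManager
    # Flow 2: unknown contents manager, so make no filesystem assumptions.
    return ArbitraryFileIdManager


# Usage with a hypothetical remote contents manager and its own ID manager.
class S3ContentsManager:
    pass


class S3FileIdManager(AbstractFileIdManager):
    pass


local_choice = pick_file_id_manager_class(FileContentsManager())
custom_choice = pick_file_id_manager_class(S3ContentsManager(), S3FileIdManager)
```

Putting the configured-class check first is what makes the third flow override the other two, matching the trait semantics where an explicit config value replaces the computed default.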
Packages that provide a custom contents manager would specify their custom file ID manager in the Jupyter config file they ship, or leave it unset. For example, to specify a custom file ID manager `YYY` for contents manager `XXX` (each given as a fully qualified import string):
```json
{
  "ServerApp": {
    "contents_manager_class": "XXX"
  },
  "FileIdExtension": {
    "file_id_manager_class": "YYY"
  }
}
```
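In a JSON config file the class traits are plain strings, so `XXX` and `YYY` stand for import paths. traitlets' `Type` trait imports such a string and checks the result against its `klass`; the sketch below approximates that step with `importlib` so the validation is visible. `resolve_class` is a hypothetical helper, not traitlets' actual implementation, and a stdlib class stands in for a real file ID manager.

```python
import importlib


def resolve_class(dotted_path: str, base: type) -> type:
    """Import 'package.module.ClassName' and check that it subclasses `base`."""
    module_name, _, class_name = dotted_path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected 'module.ClassName', got {dotted_path!r}")
    module = importlib.import_module(module_name)
    cls = getattr(module, class_name)
    # Mirror Type(klass=...): reject anything that is not a subclass of the base.
    if not (isinstance(cls, type) and issubclass(cls, base)):
        raise TypeError(f"{dotted_path} is not a subclass of {base.__name__}")
    return cls


# Usage: OrderedDict plays the role of a custom manager, dict the abstract base.
ordered = resolve_class("collections.OrderedDict", dict)
```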
The only shortcoming is that if a user has multiple contents managers installed locally, passing `--ServerApp.contents_manager_class=XXX` alone is insufficient: the user must also pass the corresponding file ID manager. Our rationale is that, to guarantee `XXX` works, the user should load the Jupyter config file it ships with anyway, since the package providing `XXX` may require additional configuration beyond setting `FileIdExtension.file_id_manager_class`.
The benefits of this approach are several:
- No changes are necessary to Jupyter Server
- Custom contents managers can specify the file ID manager they need out of the box
- Jupyter Server does not need a dependency on `jupyter_server_fileid`
from jupyter_server_fileid.
Hmm, if there's no way to retrieve some inode-number equivalent from a filesystem (any immutable attribute that's preserved on moves), then the `FileIdManager` is essentially useless for anything out-of-band, and 80% of the logic can be eliminated.
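The inode number is the kind of immutable, move-preserving attribute this comment refers to. One way to see it is `os.stat(...).st_ino`: a rename within one filesystem keeps the same inode, which is what lets a local-filesystem manager recognise a file that was moved out of band. The helper name below is hypothetical, and the result only holds for same-filesystem renames; a move across devices is a copy plus delete and produces a new inode.

```python
import os
import tempfile


def inode_survives_rename() -> bool:
    """Create a file, rename it in the same directory, and compare inode numbers."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "a.ipynb")
        dst = os.path.join(tmp, "b.ipynb")
        with open(src, "w") as f:
            f.write("{}")
        before = os.stat(src).st_ino
        os.rename(src, dst)  # same filesystem, so this is a metadata-only rename
        after = os.stat(dst).st_ino
        return before == after


same_inode = inode_survives_rename()
```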
My proposed series of changes to help address this:
- add an abstract class `AbstractFileIdManager` (see #1)
- rename `FileIdManager` => `LocalFileIdManager`
- write a custom implementation, `ArbitraryFileIdManager`, that works on arbitrary filesystems and tracks changes exclusively by listening to contents manager filesystem events. No effort is made to track out-of-band filesystem operations.
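A minimal sketch of the first and third steps, assuming an in-memory store and a small method set (`index`, `get_id`, `get_path`, `move`, `delete`) chosen for illustration; the package's actual storage (SQLite), method signatures and directory handling may differ, and moving a directory does not update its children here. `ArbitraryFileIdManager` never reads the filesystem: it only updates its maps when the contents manager reports an operation, so anything done out of band goes unnoticed.

```python
import abc
import uuid
from typing import Dict, Optional


class AbstractFileIdManager(abc.ABC):
    """Interface every file ID manager implements (hypothetical method set)."""

    @abc.abstractmethod
    def index(self, path: str) -> str:
        """Return the ID for `path`, assigning a new one if needed."""

    @abc.abstractmethod
    def get_id(self, path: str) -> Optional[str]: ...

    @abc.abstractmethod
    def get_path(self, file_id: str) -> Optional[str]: ...

    @abc.abstractmethod
    def move(self, old_path: str, new_path: str) -> str: ...

    @abc.abstractmethod
    def delete(self, path: str) -> None: ...


class ArbitraryFileIdManager(AbstractFileIdManager):
    """Tracks IDs only from contents manager events; never reads the filesystem."""

    def __init__(self) -> None:
        self._id_by_path: Dict[str, str] = {}
        self._path_by_id: Dict[str, str] = {}

    def index(self, path: str) -> str:
        file_id = self._id_by_path.get(path)
        if file_id is None:
            file_id = uuid.uuid4().hex
            self._id_by_path[path] = file_id
            self._path_by_id[file_id] = path
        return file_id

    def get_id(self, path: str) -> Optional[str]:
        return self._id_by_path.get(path)

    def get_path(self, file_id: str) -> Optional[str]:
        return self._path_by_id.get(file_id)

    def move(self, old_path: str, new_path: str) -> str:
        # Called when the contents manager emits a rename event.
        if old_path == new_path:
            return self.index(new_path)
        self.delete(new_path)  # a move onto an existing path replaces that file
        file_id = self._id_by_path.pop(old_path, None)
        if file_id is None:
            return self.index(new_path)  # never seen before: index at new location
        self._id_by_path[new_path] = file_id
        self._path_by_id[file_id] = new_path
        return file_id

    def delete(self, path: str) -> None:
        # Called when the contents manager emits a delete event.
        file_id = self._id_by_path.pop(path, None)
        if file_id is not None:
            del self._path_by_id[file_id]
```

A `LocalFileIdManager` would implement the same interface but could also consult something like the inode to reconcile changes made outside the contents manager.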
One problem with this is that manually specifying a custom contents manager would also require you to change the file ID manager class manually. Not sure if this is an issue we want to solve. We could either:
- Have the file ID extension check whether the contents manager is local simply by checking `self.settings["contents_manager"].__class__.__name__`, and then pick the right file ID manager class to instantiate.
- Add an `exclusively_local` trait on the abstract contents manager class that informs other extensions whether it deals exclusively with local filesystems. The file ID extension checks the truthiness of this trait and picks the right file ID manager class to instantiate.
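The two options differ mainly in how they treat subclasses. The sketch below compares them with stub classes; `exclusively_local` is the hypothetical flag from the second option, modelled as a plain class attribute rather than a traitlets trait, and every class here is a stand-in. With name matching, a subclass of `FileContentsManager` that still only touches local files is classified as non-local; the flag is inherited, so that subclass keeps the local classification unless it overrides it.

```python
class ContentsManager:
    """Stand-in for the abstract contents manager base class."""

    # Option 2: hypothetical flag, False by default so unknown managers
    # are never assumed to be local.
    exclusively_local = False


class FileContentsManager(ContentsManager):
    exclusively_local = True


class CustomFileContentsManager(FileContentsManager):
    """Hypothetical subclass that still only touches local files."""


class S3ContentsManager(ContentsManager):
    """Hypothetical remote contents manager."""


def is_local_by_name(cm: object) -> bool:
    # Option 1: exact class-name match; a subclass has a different name.
    return cm.__class__.__name__ == "FileContentsManager"


def is_local_by_trait(cm: object) -> bool:
    # Option 2: subclasses inherit the flag; getattr covers managers
    # written before the flag existed.
    return bool(getattr(cm, "exclusively_local", False))
```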
@dlqqq thanks for writing this up. I have thought about the design a bit more since we talked, and I think this approach looks good. It will let us evolve the file ID functionality separately from Jupyter Server, which I think will help at this point.
Thinking a bit more overnight: if there are N RTC clients, each of them would need to poll `get_paths()` on a regular interval, which scales poorly as N grows. I think a better approach is as follows:
- The file ID manager should call `sync_all` in a subprocess on a regular interval.
- Each time there is a path change, it should publish an event on the event bus.

This way, clients receive path updates from the event bus instead of each polling the server.
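A sketch of this push model, assuming each periodic `sync_all` run produces a `{file_id: path}` snapshot. `EventBus` is a minimal in-process stand-in, not Jupyter Server's actual event API, the event fields are hypothetical, and the scheduling itself (subprocess, thread or timer) is left out. The point is that the diff is computed once on the server and each change is published once, however many clients subscribe.

```python
from typing import Callable, Dict, List, Optional

Snapshot = Dict[str, str]  # file_id -> path, as seen by one sync_all run


class EventBus:
    """Minimal in-process publish/subscribe stand-in."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[dict], None]] = []

    def subscribe(self, callback: Callable[[dict], None]) -> None:
        self._subscribers.append(callback)

    def publish(self, event: dict) -> None:
        for callback in self._subscribers:
            callback(event)


def publish_path_changes(before: Snapshot, after: Snapshot, bus: EventBus) -> int:
    """Publish one event per file whose path changed between two syncs."""
    published = 0
    for file_id in sorted(before.keys() | after.keys()):
        old: Optional[str] = before.get(file_id)
        new: Optional[str] = after.get(file_id)
        if old == new:
            continue  # unchanged: nothing to tell clients
        if old is None:
            action = "created"
        elif new is None:
            action = "deleted"
        else:
            action = "moved"
        bus.publish({"action": action, "id": file_id, "old_path": old, "new_path": new})
        published += 1
    return published


# Usage: each RTC client subscribes once instead of polling get_paths().
bus = EventBus()
received: List[dict] = []
bus.subscribe(received.append)
count = publish_path_changes(
    {"1": "a.ipynb", "2": "b.ipynb"},
    {"1": "docs/a.ipynb", "3": "c.ipynb"},
    bus,
)
```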
Hi @ellisonbg - I agree with your last comment (background sync). Was this intended for the conversation on #20?
LOL, yes, still waking up. I will move those comments over to #20, thanks.
I opened #24.