Comments (15)
Good observation. Yes, that's also reasonable. To summarize: the next steps would be to make journal mode configurable, default to DELETE
for ArbitraryFIDM
and WAL
for LocalFIDM
.
from jupyter_server_fileid.
I couldn't find any logs indicating that file_id stuff is actually in the critical path for file saves.
Thanks for the update @yuvipanda - no worries, pure curiosity on my part (always trying to gain information).
I did find stacktraces implicating the signatures database though (2i2c-org/infrastructure#2245 (comment)), so perhaps that is to blame and not this?
Interesting. Looking at nbformat
's use of SQLite, I see that it doesn't specify the journal_mode
PRAGMA, but, as you state on that issue, the nbsignatures.db
is on NFS, which implies that making the default journal_mode
for ArbitraryFileIdManager
default to DELETE
isn't going to help if there's a general SQLite vs. NFS conflict any way.
Regardless, I think not using WAL on NFS is still the right call
Totally agree - making it completely configurable is more correct, but we may have other NFS-related battles per what we see in nbformat
.
(Have a great vacation! 🌴)
from jupyter_server_fileid.
Thank you for opening your first issue in this project! Engagement like this is essential for open source projects! 🤗
If you haven't done so already, check out Jupyter's Code of Conduct. Also, please try to follow the issue template as it helps other other community members to contribute more effectively.
You can meet the other Jovyans by joining our Discourse forum. There is also an intro thread there where you can stop by and say Hi! 👋
Welcome to the Jupyter community! 🎉
from jupyter_server_fileid.
Very sorry to hear that you're running into compatibility issues. I wasn't aware that the WAL journal mode fails on NFS.
Is fileid required even when not using RTC?
Not at the moment, though there are plans to build new extensions that depend on fileid.
Does the db need to persist across JupyterHub sessions? if not, we can put it in /tmp (like in 2i2c-org/infrastructure#2246).
cc @fcollonval
Well, it depends on the extensions that are using fileid. If RTC is persisting the fileids in their own persistent store (which I believe they do, but am not certain of this), then the fileid database also must be persisted. Otherwise, no.
Does it need the WAL?
It's required when separate threads and processes are calling FileIdManager methods. Otherwise, database reads in the main process block writes in other processes.
However, I don't think it's required just for a fresh JupyterLab 3.6.x instance to work with RTC. The most obvious solution for NFS support is to make the WAL journal mode a configurable defaulting to False
.
This issue may be remedied by migrating away from using sqlite directly and instead opting for aiosqlite, which should handle this case on its own. Assuming aiosqlite doesn't require the WAL journal mode, we can preserve functionality for the multi-process case, while supporting NFS.
from jupyter_server_fileid.
The most obvious solution for NFS support is to make the WAL journal mode a configurable defaulting to
False
.
+1, I think this would be helpful in preventing issues for other JupyterHub users as they migrate to JupyterLab 3.6! We found it pretty quick but might be more difficult and frustrating for others, as this dependency comes through quite a few layers :)
I appreciate all the work on this! \o/
from jupyter_server_fileid.
I was writing this up when these responses came in. Rather than on or off, would it make sense to provide different defaults based on the configured class and allow all values (via configuration)?
Hi @yuvipanda - I agree that
WAL
is the wrong value for theArbitraryFileIdManager
journal mode given this class is supposed to be file-system agnostic.Seems like a
db_journal_mode
trait is desirable here with the default forArbitraryFileIdManager
beingDELETE
(i.e., default journal mode) and the default forLocalFileIdManager
beingWAL
- which is already file-system sensitive. This way operators/admins can "tune" their transaction journaling however they like but OOTB values "just work".
from jupyter_server_fileid.
If I cut a new release of jupyter_server_fileid
to revert the default journal mode, where does the target version need to be bumped? IOW, in which of the JupyterLab transitive dependencies is jupyter_server_fileid
explicitly listed in pyproject.toml
?
from jupyter_server_fileid.
Rather than on or off, would it make sense to provide different defaults based on the configured class and allow all values (via configuration)?
Yes, that seems very reasonable. My apologies, I didn't get as much coffee today as I should have.
from jupyter_server_fileid.
The main idea is that the default should not be the WAL journal mode however. I think we should try to support NFS out-of-the-box given the popularity of JupyterHub.
from jupyter_server_fileid.
Rather than on or off, would it make sense to provide different defaults based on the configured class and allow all values (via configuration)?
Yes, that seems very reasonable. My apologies, I didn't get as much coffee today as I should have.
The main idea is that the default should not be the WAL journal mode however. I think we should try to support NFS out-of-the-box given the popularity of JupyterHub.
OK - even with LocalFileIdManager
? (So DELETE
is the default for both classes?)
I ask for clarification because your previous comments contradict each other. Seems like the default for LocalFIleIdManager
could still be WAL
given it is not the default class and its use is opt-in.
from jupyter_server_fileid.
This change should be fairly straight-forward. I'm currently really busy trying to get an initial release of Jupyter AI setup; can anybody else own this? If not, I can still tackle this by the end of the week.
from jupyter_server_fileid.
I can go ahead and toss this in.
Regarding the dependency resolution/mystery...
It turns out that the package jupyter_server_ydoc
is what is pulling in fileID. However, it also appears that jupyter_server_ydoc
has been both moved and renamed from the juptyer-server org to the jupyterlab org and is now called juptyerlab/jupyter_collaboration
which can be inferred from the releases page.
from jupyter_server_fileid.
Unfortunately, this means all file save operations via contentsmanager fail! So this is in the critical path for all file saves, new notebook creations, etc - even when not using RTC.
@yuvipanda - I'm not immediately seeing the connection between this issue and the general failures to persist content from the included stack trace. Could you please shed light on how the inability to initialize the fileID manager interferes with opening a notebook file?
from jupyter_server_fileid.
@yuvipanda Can you try installing the fileid 0.8.0 release manually and see if this fixes your issue?
from jupyter_server_fileid.
@kevin-bates i did a little more digging, and unfortunately I couldn't find any logs indicating that file_id stuff is actually in the critical path for file saves. I did find stacktraces implicating the signatures database though (2i2c-org/infrastructure#2245 (comment)), so perhaps that is to blame and not this?
Regardless, I think not using WAL on NFS is still the right call, but perhaps this is not as serious as I had feared. Thank you for digging in!
@dlqqq I'm going to be on vacation for a couple of weeks so won't really be able to do much until I'm back.
from jupyter_server_fileid.
Related Issues (20)
- prefix contents manager root to paths stored in ArbitraryFileIdManager HOT 1
- make `BaseFileIdManager` a true ABC
- Improve documentation about getting a LocalFileIdManager HOT 2
- support crtime on ext4
- remove mtime fallback HOT 4
- ArbitraryFileIdManager should have a configurable option to determine its "content root" HOT 3
- pypy compatibility
- Alternative file ID implementation
- create SQLite mixin
- jupyter-fileid script needs click HOT 6
- Tests are failing on archs like ppc64le or s390x HOT 5
- tmpfs support HOT 12
- UNIQUE constraint failed when moving file HOT 9
- use a decorator to automatically commit after public methods
- Newly created notebook is not indexed HOT 5
- How to deal with file deletion? HOT 7
- Allow db_path to be set to ":memory:" HOT 3
- Add simple REST API to fetch a file ID given a file path HOT 1
- Setup Github publishing actions and PyPI with Jupyter Server Bot.
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from jupyter_server_fileid.