Deion We upgraded from lab 3.5 to 3.6, and that seems to h

<a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="/us

Using sqlite in WAL mode causes file saving failure when used on JupyterHub on NFS about jupyter_server_fileid HOT 15 CLOSED

jupyter-server commented on June 2, 2024

Using sqlite in WAL mode causes file saving failure when used on JupyterHub on NFS

from jupyter_server_fileid.

Comments (15)

dlqqq commented on June 2, 2024 2

Good observation. Yes, that's also reasonable. To summarize: the next steps would be to make journal mode configurable, default to DELETE for ArbitraryFIDM and WAL for LocalFIDM.

from jupyter_server_fileid.

kevin-bates commented on June 2, 2024 1

I couldn't find any logs indicating that file_id stuff is actually in the critical path for file saves.

Thanks for the update @yuvipanda - no worries, pure curiosity on my part (always trying to gain information).

I did find stacktraces implicating the signatures database though (2i2c-org/infrastructure#2245 (comment)), so perhaps that is to blame and not this?

Interesting. Looking at nbformat's use of SQLite, I see that it doesn't specify the journal_mode PRAGMA, but, as you state on that issue, the nbsignatures.db is on NFS, which implies that making the default journal_mode for ArbitraryFileIdManager default to DELETE isn't going to help if there's a general SQLite vs. NFS conflict any way.

Regardless, I think not using WAL on NFS is still the right call

Totally agree - making it completely configurable is more correct, but we may have other NFS-related battles per what we see in nbformat.

(Have a great vacation! 🌴)

from jupyter_server_fileid.

welcome commented on June 2, 2024

Thank you for opening your first issue in this project! Engagement like this is essential for open source projects! 🤗

If you haven't done so already, check out Jupyter's Code of Conduct. Also, please try to follow the issue template as it helps other other community members to contribute more effectively.

You can meet the other Jovyans by joining our Discourse forum. There is also an intro thread there where you can stop by and say Hi! 👋

Welcome to the Jupyter community! 🎉

from jupyter_server_fileid.

dlqqq commented on June 2, 2024

Very sorry to hear that you're running into compatibility issues. I wasn't aware that the WAL journal mode fails on NFS.

Is fileid required even when not using RTC?

Not at the moment, though there are plans to build new extensions that depend on fileid.

Does the db need to persist across JupyterHub sessions? if not, we can put it in /tmp (like in 2i2c-org/infrastructure#2246).

cc @fcollonval

Well, it depends on the extensions that are using fileid. If RTC is persisting the fileids in their own persistent store (which I believe they do, but am not certain of this), then the fileid database also must be persisted. Otherwise, no.

Does it need the WAL?

It's required when separate threads and processes are calling FileIdManager methods. Otherwise, database reads in the main process block writes in other processes.

However, I don't think it's required just for a fresh JupyterLab 3.6.x instance to work with RTC. The most obvious solution for NFS support is to make the WAL journal mode a configurable defaulting to False.

This issue may be remedied by migrating away from using sqlite directly and instead opting for aiosqlite, which should handle this case on its own. Assuming aiosqlite doesn't require the WAL journal mode, we can preserve functionality for the multi-process case, while supporting NFS.

from jupyter_server_fileid.

yuvipanda commented on June 2, 2024

The most obvious solution for NFS support is to make the WAL journal mode a configurable defaulting to False.

+1, I think this would be helpful in preventing issues for other JupyterHub users as they migrate to JupyterLab 3.6! We found it pretty quick but might be more difficult and frustrating for others, as this dependency comes through quite a few layers :)

I appreciate all the work on this! \o/

from jupyter_server_fileid.

kevin-bates commented on June 2, 2024

I was writing this up when these responses came in. Rather than on or off, would it make sense to provide different defaults based on the configured class and allow all values (via configuration)?

Hi @yuvipanda - I agree that WAL is the wrong value for the ArbitraryFileIdManager journal mode given this class is supposed to be file-system agnostic.

Seems like a db_journal_mode trait is desirable here with the default for ArbitraryFileIdManager being DELETE (i.e., default journal mode) and the default for LocalFileIdManager being WAL - which is already file-system sensitive. This way operators/admins can "tune" their transaction journaling however they like but OOTB values "just work".

from jupyter_server_fileid.

dlqqq commented on June 2, 2024

If I cut a new release of jupyter_server_fileid to revert the default journal mode, where does the target version need to be bumped? IOW, in which of the JupyterLab transitive dependencies is jupyter_server_fileid explicitly listed in pyproject.toml?

from jupyter_server_fileid.

dlqqq commented on June 2, 2024

@kevin-bates

Rather than on or off, would it make sense to provide different defaults based on the configured class and allow all values (via configuration)?

Yes, that seems very reasonable. My apologies, I didn't get as much coffee today as I should have.

from jupyter_server_fileid.

dlqqq commented on June 2, 2024

The main idea is that the default should not be the WAL journal mode however. I think we should try to support NFS out-of-the-box given the popularity of JupyterHub.

from jupyter_server_fileid.

kevin-bates commented on June 2, 2024

Rather than on or off, would it make sense to provide different defaults based on the configured class and allow all values (via configuration)?

Yes, that seems very reasonable. My apologies, I didn't get as much coffee today as I should have.

The main idea is that the default should not be the WAL journal mode however. I think we should try to support NFS out-of-the-box given the popularity of JupyterHub.

OK - even with LocalFileIdManager? (So DELETE is the default for both classes?)

I ask for clarification because your previous comments contradict each other. Seems like the default for LocalFIleIdManager could still be WAL given it is not the default class and its use is opt-in.

from jupyter_server_fileid.

dlqqq commented on June 2, 2024

This change should be fairly straight-forward. I'm currently really busy trying to get an initial release of Jupyter AI setup; can anybody else own this? If not, I can still tackle this by the end of the week.

from jupyter_server_fileid.

kevin-bates commented on June 2, 2024

I can go ahead and toss this in.

Regarding the dependency resolution/mystery...

It turns out that the package jupyter_server_ydoc is what is pulling in fileID. However, it also appears that jupyter_server_ydoc has been both moved and renamed from the juptyer-server org to the jupyterlab org and is now called juptyerlab/jupyter_collaboration which can be inferred from the releases page.

from jupyter_server_fileid.

kevin-bates commented on June 2, 2024

Unfortunately, this means all file save operations via contentsmanager fail! So this is in the critical path for all file saves, new notebook creations, etc - even when not using RTC.

@yuvipanda - I'm not immediately seeing the connection between this issue and the general failures to persist content from the included stack trace. Could you please shed light on how the inability to initialize the fileID manager interferes with opening a notebook file?

from jupyter_server_fileid.

dlqqq commented on June 2, 2024

@yuvipanda Can you try installing the fileid 0.8.0 release manually and see if this fixes your issue?

from jupyter_server_fileid.

yuvipanda commented on June 2, 2024

@kevin-bates i did a little more digging, and unfortunately I couldn't find any logs indicating that file_id stuff is actually in the critical path for file saves. I did find stacktraces implicating the signatures database though (2i2c-org/infrastructure#2245 (comment)), so perhaps that is to blame and not this?

Regardless, I think not using WAL on NFS is still the right call, but perhaps this is not as serious as I had feared. Thank you for digging in!

@dlqqq I'm going to be on vacation for a couple of weeks so won't really be able to do much until I'm back.

from jupyter_server_fileid.

Using sqlite in WAL mode causes file saving failure when used on JupyterHub on NFS about jupyter_server_fileid HOT 15 CLOSED

Comments (15)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent