Giter Site home page Giter Site logo

Comments (8)

The-Compiler avatar The-Compiler commented on June 19, 2024 1

Well, I finally found something! [QTBUG-124496] QObject::timerEvent may be called before the timer elapsed - Qt Bug Tracker

I wonder if we have a single-shot QTimer running somewhere in the IPC code, which Qt confuses with our timeout timer? Or maybe QLocalS(ocket|erver) has one internally?

That also explains why anecdotally, "clicking around" in the UI made it reproduce faster, because that probably leads to more timers being used.

The bug also says Qt 6.5 in addition to 6.7, but the linked bug says 6.5.6 (or .5 in the comments?). So chances are it was a change that landed in Qt 6.7 but also was backported to 6.5 LTS?

Assuming it's still there in Qt 6.7.1, I suppose it might be a good idea to leave the warning logging in.

from qutebrowser.

The-Compiler avatar The-Compiler commented on June 19, 2024

I can somewhat reliably reproduce via running qutebrowser and this script:

import time
import json
import getpass
import random
from PyQt6.QtNetwork import QLocalSocket

socket = QLocalSocket()

while True:
    name = f"qutebrowser-{getpass.getuser()}"
    socket.connectToServer(name)
    ok = socket.waitForConnected(1000)
    assert ok

    cmd = random.choice([":version", ":tab-close"])
    json_data = {"args": [cmd], "target_arg": None, "version": "3.1.0", "protocol_version": 1}
    line = json.dumps(json_data) + '\n'
    data = line.encode("utf-8")
    socket.writeData(data)
    ok = socket.waitForBytesWritten(1000)
    if not ok:
        print("!", end="", flush=True)
        continue
    assert socket.error() == QLocalSocket.LocalSocketError.UnknownSocketError
    socket.disconnectFromServer()
    if socket.state() != QLocalSocket.LocalSocketState.UnconnectedState:
        ok = socket.waitForDisconnected(1000)
        assert ok
    print(".", end="", flush=True)
    time.sleep(random.uniform(0.01, 0.2))

It seems to be easier to reproduce when I do some clicking in the UI, e.g. switching tabs.

The logging I added in db3d15d might still be a red herring, because the ID is fetched when the timeout function runs, and not when the timeout is started...

from qutebrowser.

The-Compiler avatar The-Compiler commented on June 19, 2024

My best bet at getting a Windows C++ stack trace by inserting a breakpoint() and letting DebugDiag do its thing:

readline/Python stuff:

ntdll!NtDeviceIoControlFile+14	  
KERNELBASE!ConsoleCallServerGeneric+e9	  
KERNELBASE!ReadConsoleInternal+18d	  
KERNELBASE!ReadConsoleW+1a	  
python310!PyOS_Readline+542	  
python310!PyOS_Readline+38d	  
python310!PyOS_Readline+132	  
python310!Py_gitversion+63ba2	  
python310!Py_SetStandardStreamEncoding+807	  
python310!PyEval_EvalFrameDefault+1aa2	  
python310!PyEval_EvalFrameDefault+f65	  
python310!PyEval_EvalFrameDefault+f65	  
python310!PyEval_EvalFrameDefault+f65	  
python310!PyEval_EvalFrameDefault+f65	  
python310!PyEval_EvalFrameDefault+f65	  
python310!PyFunction_Vectorcall+87	  
python310!PyObject_GC_Malloc+1133	  
python310!PySys_GetSizeOf+141	  
python310!PyUnicode_ToUppercase+3ef	  
python310!PyUnicode_ToUppercase+20c	  
python310!PyUnicode_ToUppercase+186	  
python310!PyUnicode_ToUppercase+325	  
python310!PyEval_EvalFrameDefault+9338	  
python310!PyFunction_Vectorcall+87	  
python310!PyObject_GC_Malloc+1133	  
python310!Py_HandleSystemExit+80c	  
python310!Py_HandleSystemExit+58f	  
python310!PyVectorcall_Call+5c	  
python310!PyObject_Call+4f	  
QtCore!PyInit_QtCore+eb2f5	  
QtCore!PyInit_QtCore+eb73a	  
QtCore!PyInit_QtCore+eb7f4	  
QtCore!PyInit_QtCore+ed7ab	  
Qt6Core!QMetaObject::metacall+41	  
Qt6Core!QObject::qt_static_metacall+1842	  
Qt6Core!QMetaObject::activate+84	 

The interesting part:

Qt6Core!QTimer::timerEvent+cf	  
QtCore+66e56	  
Qt6Core!QObject::event+64	  
QtCore+590f3	  
Qt6Widgets!QApplicationPrivate::notify_helper+10e	  
Qt6Widgets!QApplication::notify+18c1	  
QtWidgets!PyInit_QtWidgets+1c3dcf	  
Qt6Core!QCoreApplication::notifyInternal2+11c	  
Qt6Core!QEventDispatcherWin32Private::sendTimerEvent+1a6	  
Qt6Core!QEventDispatcherWin32::event+1fc	  
Qt6Widgets!QApplicationPrivate::notify_helper+10e	  
Qt6Widgets!QApplication::notify+18c1	  
QtWidgets!PyInit_QtWidgets+1c3dcf	  
Qt6Core!QCoreApplication::notifyInternal2+11c	  
Qt6Core!QCoreApplicationPrivate::sendPostedEvents+209	  
Qt6Gui!QWindowsGuiEventDispatcher::sendPostedEvents+f	  
Qt6Core!QEventDispatcherWin32::processEvents+90	  
Qt6Gui!QWindowsGuiEventDispatcher::processEvents+19	  
Qt6Core!QEventLoop::exec+19f	  
Qt6Core!QCoreApplication::exec+15d	  

Init stuff:

QtWidgets!PyInit_QtWidgets+7c3e5	  
python310!PyObject_IsTrue+101	  
python310!PyEval_EvalFrameDefault+3ffe	  
python310!PyFunction_Vectorcall+87	  
python310!PyEval_EvalFrameDefault+707	  
python310!PyFunction_Vectorcall+87	  
python310!PyEval_EvalFrameDefault+1b88	  
python310!PyFunction_Vectorcall+87	  
python310!PyEval_EvalFrameDefault+1b88	  
python310!PyObject_FastCallDictTstate+15d	  
python310!PyEval_EvalCode+82	  
python310!PyNumber_Negative+45f	  
python310!PyNumber_Negative+35f	  
python310!PyEval_EvalFrameDefault+1aa2	  
python310!PyFunction_Vectorcall+87	  
python310!PyEval_EvalFrameDefault+707	  
python310!PyFunction_Vectorcall+87	  
python310!PyVectorcall_Call+5c	  
python310!PyObject_Call+4f	  
python310!PyArg_Parse+1ba	  
python310!Py_RunMain+13d	  
python310!Py_RunMain+15	  
python310!Py_Main+25	  
python+1230	  
kernel32!BaseThreadInitThunk+10	  
ntdll!RtlUserThreadStart+2b

from qutebrowser.

The-Compiler avatar The-Compiler commented on June 19, 2024

I tried to improve the logging in f9ae555 and 245d117... and everything points to Windows/Qt/PyQt just immediately calling the timeout function after starting the timer.

Why? How?? Why (hopefully!) only this timer? I have no clue! I was unable to find anything that sounded remotely related. No bugreports, no changes that look remotely relevant, nothing.

I think this is user-facing: It happens before the IPC connection ever gets read, and then we disconnect, so we'd (even if rarely) sometimes miss IPC commands.

The best I can come up with is:

@@ -168,6 +168,7 @@ def __init__(self, socketname, parent=None):
         self._timer = usertypes.Timer(self, 'ipc-timeout')
         self._timer.setInterval(READ_TIMEOUT)
         self._timer.timeout.connect(self.on_timeout)
+        self._timer_start_time = None
 
         if utils.is_windows:  # pragma: no cover
             self._atime_timer = None
@@ -261,6 +262,7 @@ def handle_connection(self):
         log.ipc.debug("Client connected (socket {}).".format(self._socket_id))
         self._socket = socket
         self._timer.start()
+        self._timer_start_time = time.monotonic()
         socket.readyRead.connect(self.on_ready_read)
         if socket.canReadLine():
             log.ipc.debug("We can read a line immediately.")
@@ -393,11 +395,23 @@ def on_ready_read(self):
 
         if self._socket is not None:
             self._timer.start()
+            self._timer_start_time = time.monotonic()
 
     @pyqtSlot()
     def on_timeout(self):
         """Cancel the current connection if it was idle for too long."""
         assert self._socket is not None
+        assert self._timer_start_time is not None
+        if (
+            time.monotonic() - self._timer_start_time < READ_TIMEOUT / 1000 / 10
+            and qtutils.version_check("6.7.0", exact=True, compiled=False)
+            and utils.is_windows
+        ):
+            # WORKAROUND for unknown Qt bug (?) where the timer triggers immediately
+            # https://github.com/qutebrowser/qutebrowser/issues/8191
+            log.ipc.debug("Ignoring early on_timeout call")
+            return
+
         log.ipc.error("IPC connection timed out "
                       "(socket {}).".format(self._socket_id))
         self._socket.disconnectFromServer()

I hate it, but I spent the better part of my day with this now, and I can't come up with anything better. I think any further spelunking would require building Qt from source and firing up a C++ debugger to see what happens in Qt... but I'm exhausted and would rather move on to something more interesting.

@toofar thoughts?

from qutebrowser.

toofar avatar toofar commented on June 19, 2024

That's very strange. Had a look at recent commits to QTimer and QLocalSocket and nothing jumped out at me. I had a thought that maybe it could be due to calls getting interleaved, like if handle_connection() gets called when we are in the middle of reading data from the socket. But there's nothing I could find in search results warning of anything like that and the docs for QTimer and QLocalSocket don't call it out either.

I guess if identifying and ignoring the situation is the best we can do then we can try that and look into it further if it comes up again.

A few questions:

  1. should we be setting singleShot() on this timer? The default is false right? (I tested and it didn't fix the issue though.)
  2. when the timer fires early, what does timer.remainingTime() report?
  3. assuming it is 0, should we be restarting the time when it has fired early?

I also worry that this will affect other timers, since we don't know what causes it. But since we haven't seen this issue on any of the linux test runs and people are already using 6.7 without complaint, I guess we hope it remains this isolated ¯_(ツ)_/¯

from qutebrowser.

The-Compiler avatar The-Compiler commented on June 19, 2024

That's very strange. Had a look at recent commits to QTimer and QLocalSocket and nothing jumped out at me.

Agreed on both parts. I also looked at changes in qeventdispatcher_win.cpp which has the Windows-specific implementation, and Qt changelogs, and PyQt changelogs, and even diffed PyQt... no dice. Nothing I could find in Qt issues either. I have no idea what's going on there, but it does seem pretty isolated.

I had a thought that maybe it could be due to calls getting interleaved, like if handle_connection() gets called when we are in the middle of reading data from the socket. But there's nothing I could find in search results warning of anything like that and the docs for QTimer and QLocalSocket don't call it out either.

In f9ae555 I tried making sure that every connection is a separate object with its own QTimer. I like the result of that refactoring, but for now I want to do something different than poking at IPC stuff. I'll probably open an issue to finish that later™. However, it didn't help either.

I guess if identifying and ignoring the situation is the best we can do then we can try that and look into it further if it comes up again.

Agreed. I want to wait for Qt 6.7.1 before pushing it. It's now planned for next week apparently... but once it's out, I hope we'll be able to easily try it out with PyQt, as it should be binary compatible. This will probably delay v3.2.0 even further, but eh... Qt 6.7.0 seems to have other annoying issues on Windows, so it's probably for the better.

A few questions:

  1. should we be setting singleShot() on this timer? The default is false right? (I tested and it didn't fix the issue though.)

Good point. It shouldn't matter much because in on_timeout we call self._socket.disconnectFromServer(), and that calls the on_disconnected slot, which stops the timer.

  1. when the timer fires early, what does timer.remainingTime() report?

0

  1. assuming it is 0, should we be restarting the time when it has fired early?

See above - in theory yes, but since it's currently a recurring timer, it does already restart anyways.

I also worry that this will affect other timers, since we don't know what causes it. But since we haven't seen this issue on any of the linux test runs and people are already using 6.7 without complaint, I guess we hope it remains this isolated ¯\(ツ)

With 01cc3df and 31ab343 I tried to add a generic check to all timers that logs a warning. The CI run passed. I'm somewhat tempted to commit it and see how it fares over time (hah!). Then again, in case this issue just magically vanishes with Qt 6.7.1 again, we might as well pretend it was never there.

from qutebrowser.

toofar avatar toofar commented on June 19, 2024

Ohh, good spotting! So on Windows the timers are backed by some Windows API and somehow the Windows things aren't cleared when the Qt ones are and then a new Qt one will re-use a still active windows thing? Hmm, that doesn't sound right. Does stopping a timer fire a timer event and that's what's being picked up by the re-used one? Maybe we could make a new timer object before killing the old one and that would resolve this? We do stop and start the timer a couple of times in this class.

Anyway, it's reassuring to know it's for sure a Windows only problem. And that Thiago is assigned to the ticket.

I tried to add a generic check to all timers

Damn I saw that and though "wait, we are trying to get a release out, have you heard of scope creep!". Then I looked at the commits and remembered we already have a QTimer wrapper 🤣
Looks pretty reasonable to me.

from qutebrowser.

toofar avatar toofar commented on June 19, 2024

I pushed some more stuff to your branch! So the current set of stuff on there is:

  • incrementing socket IDs in the IPC server
  • ignore early timer firing for read timeout timer in IPC server
  • detect and log early timer firing in usertypes.Timer
  • switch all QTimer objects to be usertypes.Timer objects

The changes I made were to

  • refactor that "is this an early timer" check to be only in usertypes.timer and have the IPC server use that one
  • remove the is_windows and Qt6.7.0 checks (the checks should be a no-op if there is no bug, so they are safe?)
  • fix tests and lint and add a couple of unit tests

In my opinion that's all good to go, although I'm not hung up on much of it if you think some of it is risky and want to drop it (and I haven't looked at the socket ID stuff at all). Also I probably left a bunch of typos in there as usual. Personally I would edit the commit history to have less incidental stuff in it though ;) (If only so we don't have to skip over commits when running git blame on the ci workflow file.)

I should have some time tomorrow to follow up on this or have a look at pdf.js, so let me know how you want to move forward.

from qutebrowser.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.