Comments (35)

X547 avatar X547 commented on May 22, 2024 1

https://temp.sh/fTWwm/haiku-rvvm-2022-11-23.7z

from rvvm.

X547 avatar X547 commented on May 22, 2024 1

This also probably should be better handled by the Haiku driver - faulty devices should not cause kernel crashes when they are avoidable, and we want some warning, right?

Filed a bug report: https://dev.haiku-os.org/ticket/18093

from rvvm.

X547 avatar X547 commented on May 22, 2024 1

Yeah, that's it. You should use csrrs/csrrc instructions for setting/clearing bits in IP register. Otherwise you are subject to an interrupt data race in between RMW operation on it.

Seems fixed with the following patch:

diff --git a/headers/private/system/arch/riscv64/arch_cpu_defs.h b/headers/private/system/arch/riscv64/arch_cpu_defs.h
index 67b8c96307..abf993f7c7 100644
--- a/headers/private/system/arch/riscv64/arch_cpu_defs.h
+++ b/headers/private/system/arch/riscv64/arch_cpu_defs.h
@@ -222,6 +222,10 @@ static B_ALWAYS_INLINE uint64 Mip() {
 	uint64 x; asm volatile("csrr %0, mip" : "=r" (x)); return x;}
 static B_ALWAYS_INLINE void SetMip(uint64 x) {
 	asm volatile("csrw mip, %0" : : "r" (x));}
+static B_ALWAYS_INLINE void SetBitsMip(uint64 x) {
+	asm volatile("csrs mip, %0" : : "r" (x));}
+static B_ALWAYS_INLINE void ClearBitsMip(uint64 x) {
+	asm volatile("csrc mip, %0" : : "r" (x));}
 static B_ALWAYS_INLINE uint64 Sip() {
 	uint64 x; asm volatile("csrr %0, sip" : "=r" (x)); return x;}
 static B_ALWAYS_INLINE void SetSip(uint64 x) {
@@ -236,6 +240,10 @@ static B_ALWAYS_INLINE uint64 Mie() {
 	uint64 x; asm volatile("csrr %0, mie" : "=r" (x)); return x;}
 static B_ALWAYS_INLINE void SetMie(uint64 x) {
 	asm volatile("csrw mie, %0" : : "r" (x));}
+static B_ALWAYS_INLINE void SetBitsMie(uint64 x) {
+	asm volatile("csrs mie, %0" : : "r" (x));}
+static B_ALWAYS_INLINE void ClearBitsMie(uint64 x) {
+	asm volatile("csrc mie, %0" : : "r" (x));}
 
 // exception delegation
 static B_ALWAYS_INLINE uint64 Medeleg() {
diff --git a/src/system/boot/platform/riscv/traps.cpp b/src/system/boot/platform/riscv/traps.cpp
index 649ec29ee3..968e9fec03 100644
--- a/src/system/boot/platform/riscv/traps.cpp
+++ b/src/system/boot/platform/riscv/traps.cpp
@@ -128,12 +128,12 @@ MTrap(iframe* frame)
 						enable, frame->a2);
 					*/
 					// dprintf("  mtime: %" B_PRIu64 "\n", gClintRegs->mTime);
-					SetMip(Mip() & ~(1 << sTimerInt));
+					ClearBitsMip(1 << sTimerInt);
 					if (!enable) {
-						SetMie(Mie() & ~(1 << mTimerInt));
+						ClearBitsMie(1 << mTimerInt);
 					} else {
 						gClintRegs->mtimecmp[0] = frame->a2;
-						SetMie(Mie() | (1 << mTimerInt));
+						SetBitsMie(1 << mTimerInt);
 					}
 					frame->a0 = B_OK;
 					return;
@@ -145,8 +145,8 @@ MTrap(iframe* frame)
 			break;
 		}
 		case causeInterrupt + mTimerInt: {
-			SetMie(Mie() & ~(1 << mTimerInt));
-			SetMip(Mip() | (1 << sTimerInt));
+			ClearBitsMie(1 << mTimerInt);
+			SetBitsMip(1 << sTimerInt);
 			return;
 		}
 	}

haiku_loader.riscv.zip

from rvvm.

X547 avatar X547 commented on May 22, 2024 1

Btw, is this bootloader planned to be for Haiku only?

Yes. In theory it could be extended to load a Linux/FreeBSD kernel, construct kernel args from a menu, etc., but that would be out of scope for the Haiku project and would need a fork.

from rvvm.

X547 avatar X547 commented on May 22, 2024
altps2_interrupt_unlocked(irq: 8, irq_enabled: 1, irq_pending: 0)
is_hart_busy(0): 0
riscv_interrupt
AlteraPs2::HandleInterrupt
 05 01 00
altps2_interrupt_unlocked(irq: 8, irq_enabled: 1, irq_pending: 0)
is_hart_busy(0): 0
riscv_interrupt
AlteraPs2::HandleInterrupt
 08 00 00
altps2_interrupt_unlocked(irq: 8, irq_enabled: 1, irq_pending: 0)
is_hart_busy(0): 0
riscv_interrupt
altps2_interrupt_unlocked(irq: 8, irq_enabled: 1, irq_pending: 1)
is_hart_busy(0): 1
altps2_interrupt_unlocked(irq: 8, irq_enabled: 1, irq_pending: 1)
is_hart_busy(0): 1
altps2_interrupt_unlocked(irq: 8, irq_enabled: 1, irq_pending: 1)
is_hart_busy(0): 1
altps2_interrupt_unlocked(irq: 8, irq_enabled: 1, irq_pending: 1)
is_hart_busy(0): 1
altps2_interrupt_unlocked(irq: 8, irq_enabled: 1, irq_pending: 1)
is_hart_busy(0): 1
altps2_interrupt_unlocked(irq: 8, irq_enabled: 1, irq_pending: 1)
is_hart_busy(0): 1
altps2_interrupt_unlocked(irq: 8, irq_enabled: 1, irq_pending: 1)
is_hart_busy(0): 1
altps2_interrupt_unlocked(irq: 8, irq_enabled: 1, irq_pending: 1)
is_hart_busy(0): 1
altps2_interrupt_unlocked(irq: 8, irq_enabled: 1, irq_pending: 1)
is_hart_busy(0): 1
altps2_interrupt_unlocked(irq: 8, irq_enabled: 1, irq_pending: 1)
is_hart_busy(0): 1

from rvvm.

LekKit avatar LekKit commented on May 22, 2024

Can you please share the Haiku image/fw in question to help reproduce it? I haven't seen such a thing on Linux, so otherwise my best guess is blindly looking at the PLIC code/docs.

from rvvm.

X547 avatar X547 commented on May 22, 2024

Where can I upload it? It is about 35 MB.

from rvvm.

LekKit avatar LekKit commented on May 22, 2024

Where can I upload it? It is about 35 MB.

https://temp.sh/, or really any similar service you'd like

I can also give sshfs access to some vm

from rvvm.

X547 avatar X547 commented on May 22, 2024

Does it run? It should boot to desktop, but no mouse/keyboard working (only show PS/2 output in kernel log).

from rvvm.

LekKit avatar LekKit commented on May 22, 2024

Does it run? It should boot to desktop, but no mouse/keyboard working (only show PS/2 output in kernel log).

It runs fine after applying the ATA patch. I have reproduced the issue by actively spamming input during Haiku boot, but if I touch input only after it reaches the desktop it's fine. Will investigate more.

Could it be an issue with Haiku somehow not resetting interrupts when finishing the boot process?

from rvvm.

X547 avatar X547 commented on May 22, 2024

but if I touch input only after it reaches the desktop it's fine

For me it stops receiving interrupts even after reaching the desktop, if I keep moving the mouse for 2-3 minutes.

from rvvm.

X547 avatar X547 commented on May 22, 2024

Could it be an issue with Haiku somehow not resetting interrupts when finishing the boot process?

This is how extern interrupts are processed:
https://github.com/haiku/haiku/blob/34e92438724cdb062dae1765fc7e765b44f51ac7/src/system/kernel/arch/riscv64/arch_int.cpp#L520

from rvvm.

LekKit avatar LekKit commented on May 22, 2024

For me it stops receiving interrupts even after reaching the desktop, if I keep moving the mouse for 2-3 minutes.

Managed to reproduce it. Interestingly, I initially stopped doing any input, but AlteraPs2::HandleInterrupt continued logging for a bit, with a single 00 byte and an interval suspiciously resembling keyboard typematic (even though I wasn't pressing any keys at that moment). After some time everything hung and the PS/2 ring started screaming about overflows. What's even more interesting is that both keyboard & mouse die at the same time, even though they have completely separate altps2 controllers with different IRQs assigned; there is nothing on the RVVM side that could have broken them simultaneously. Look at that, please.

For now I'll keep debugging the state of devices

from rvvm.

X547 avatar X547 commented on May 22, 2024

What's even more interesting is that both keyboard & mouse die at the same time, even though they have completely separate altps2 controllers with different IRQs assigned; there is nothing on the RVVM side that could have broken them simultaneously. Look at that, please.

From my debugging results (see second log), the HART busy flag is sometimes not cleared, causing all PLIC interrupts to stop working.

The timer interrupt still seems alive because the CPU load meter tray icon keeps changing.

from rvvm.

LekKit avatar LekKit commented on May 22, 2024

From my debugging results (see second log), the HART busy flag is sometimes not cleared, causing all PLIC interrupts to stop working.

Ahh I see. Thanks.

@cerg2010cerg2010, could you also have a look at that, please? You're the one who initially wrote these devices, although I have no problem completely adopting them if needed. That PLIC device needs huge refactoring anyways.

The timer interrupt still seems alive because the CPU load meter tray icon keeps changing.

Timer interrupts are unrelated to the PLIC; the aclint-mtimer is responsible for them.

from rvvm.

LekKit avatar LekKit commented on May 22, 2024

Side note: SMP seems completely broken with this setup, 4 cores crash the Haiku bootloader. With 2 cores, second one spins somewhere indefinitely.

from rvvm.

X547 avatar X547 commented on May 22, 2024

Side note: SMP seems completely broken with this setup, 4 cores crash the Haiku bootloader. With 2 cores, second one spins somewhere indefinitely.

That is expected because haiku_loader.riscv is not yet SMP-aware. SMP currently works only with haiku_loader.efi.

from rvvm.

LekKit avatar LekKit commented on May 22, 2024

That is expected because haiku_loader.riscv is not yet SMP-aware. SMP currently works only with haiku_loader.efi.

Alright, fine. Just please add some kind of protection, like picking a boot hart (using an atomic lottery) and placing all the others in a WFI sleep. That is all OpenSBI does; the harts are then woken up by IPI after control is given to the next boot stage.
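
A minimal sketch of such a boot hart lottery, assuming a shared flag that every hart can reach when it enters the loader. The names `sBootHartChosen` and `TryBecomeBootHart` are hypothetical and are not taken from haiku_loader or OpenSBI.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical flag shared by all harts; assumed to live in memory that
// is already initialized when the harts reach this code.
static std::atomic<bool> sBootHartChosen{false};

// Returns true for exactly one caller. exchange() is a single atomic
// swap, so two harts cannot both observe the old value "false".
inline bool TryBecomeBootHart()
{
	return !sBootHartChosen.exchange(true, std::memory_order_acq_rel);
}
```

The hart that gets `true` would continue booting; every other hart would loop on `wfi` until an IPI wakes it up.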

from rvvm.

LekKit avatar LekKit commented on May 22, 2024
WARN: interrupt ctx 00
WARN: claim ctx 00
AlteraPs2::HandleInterrupt
 02 0d 08 01 08 00
WARN: claimcomplete ctx 00
WARN: interrupt ctx 00
WARN: claim ctx 00
AlteraPs2::HandleInterrupt
 02 07 00
WARN: claimcomplete ctx 00
WARN: interrupt ctx 00
WARN: claim ctx 00
AlteraPs2::HandleInterrupt
 02 05 08 03 05 00
WARN: claimcomplete ctx 00
WARN: interrupt ctx 00
*nothing*

It seems like at some point a hart is interrupted, but never bothers reading PLIC claim register.
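
For reference, the claim/complete sequence the log is expected to show can be sketched as below. This is a heavily simplified model with a single interrupt source; `FakePlicContext` and `HandleExternalInterrupt` are hypothetical names, and only the register offset formula comes from the RISC-V PLIC specification.

```cpp
#include <cassert>
#include <cstdint>

// Claim/complete register offset for a context, per the RISC-V PLIC spec.
constexpr uint32_t ClaimCompleteOffset(uint32_t context)
{
	return 0x200004 + 0x1000 * context;
}

// Hypothetical, heavily simplified model of one PLIC context.
struct FakePlicContext {
	uint32_t pendingIrq = 0;     // 0 means "nothing pending"
	uint32_t lastCompleted = 0;  // last ID written back on completion
	bool inService = false;      // true between claim and complete

	// Reading the claim register returns the pending IRQ ID, or 0.
	uint32_t Claim()
	{
		if (pendingIrq == 0)
			return 0;
		uint32_t irq = pendingIrq;
		pendingIrq = 0;
		inService = true;
		return irq;
	}

	// Writing the same ID back signals completion.
	void Complete(uint32_t irq)
	{
		lastCompleted = irq;
		inService = false;
	}
};

// Claim, dispatch to the driver, then complete. Returns false when the
// claim register reads 0, i.e. there was nothing to service.
template <typename Handler>
bool HandleExternalInterrupt(FakePlicContext& plic, Handler handler)
{
	uint32_t irq = plic.Claim();
	if (irq == 0)
		return false;
	handler(irq);
	plic.Complete(irq);
	return true;
}
```

In the hanging case above, the hart never reaches the `Claim()` step, so the model would stay with `pendingIrq` set and nothing would ever complete.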

from rvvm.

X547 avatar X547 commented on May 22, 2024

It seems like at some point a hart is interrupted, but never bothers reading PLIC claim register.

Does it actually call interrupt vector (STVEC)?

from rvvm.

LekKit avatar LekKit commented on May 22, 2024

Does it actually call interrupt vector (STVEC)?

It doesn't. Re-enabling pending interrupts in xIE CSR should dispatch them though, I'll see if that's the case.

from rvvm.

LekKit avatar LekKit commented on May 22, 2024

I don't see signs of writing into sie register at all, however.
This is where it hangs by the way, suspiciously

WARN: interrupt ctx 00
app_server: Finding best mode for 1024x768 (8, 60 Hz, strict) failed
slab memory manager: created area 0xffffffc009001000 (4144)
module: Search for bus_managers/pci/x86/v1 failed.
ahci: failed to get pci x86 module
ps2_hid: init_hardware
ps2_hid: init_driver
module: Search for bus_managers/ps2/v1 failed.

from rvvm.

X547 avatar X547 commented on May 22, 2024

ps2_hid: init_hardware
ps2_hid: init_driver
module: Search for bus_managers/ps2/v1 failed.

That is the i8042 PS/2 controller driver, which is not used with RVVM.

from rvvm.

X547 avatar X547 commented on May 22, 2024

I don't see signs of writing into sie register at all, however.

It is set early and not touched after that: https://github.com/haiku/haiku/blob/34e92438724cdb062dae1765fc7e765b44f51ac7/src/system/kernel/arch/riscv64/arch_cpu.cpp#L40.

from rvvm.

LekKit avatar LekKit commented on May 22, 2024
WARN: interrupt ctx 00
WARN: INTERRUPT_SEXTERNAL priv 1 status 000060aa
WARN: Hart 0x5618f62df400 irq to ffffffc0001533f0, cause 9
WARN: INTERRUPT_SEXTERNAL priv 1 status 000061a8
WARN: claim ctx 00
WARN: PLIC read 00000008 from 00000001
WARN: INTERRUPT_SEXTERNAL priv 1 status 000061a8
AlteraPs2::HandleInterrupt
 03 06 08 04 07 00
WARN: INTERRUPT_SEXTERNAL priv 1 status 000061a8
WARN: PLIC write 00000008 to 00200004
WARN: claimcomplete ctx 00
WARN: interrupt ctx 00
WARN: INTERRUPT_SEXTERNAL priv 3 status 000069a0
*hang*

[image]

from rvvm.

LekKit avatar LekKit commented on May 22, 2024

@X547 It could be a bug in your M-mode bootloader.

WARN: interrupt ctx 00
WARN: INTERRUPT_SEXTERNAL priv 3 status 000068a0
WARN: MIP changed INTERRUPT_SEXTERNAL!

CPU enters M-mode due to M-mode timer interrupt. It writes to mip CSR to push timer interrupt down to S-mode (presumably, as that's what OpenSBI also does), but zeroes pending external interrupt as well. It never fires because of that.

Perhaps you should use atomic CSR operation to set bits, instead of read/write. Otherwise an external interrupt might come in between your read/write operations. This is why atomic CSR operations to set/clear bits exist at all.
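
A rough model of that lost update, assuming a plain variable in place of the real mip CSR. `FakeMip` and both helper functions are hypothetical; only the STIP and SEIP bit positions come from the RISC-V privileged spec.

```cpp
#include <cassert>
#include <cstdint>

// Bit positions in mip as defined by the RISC-V privileged spec.
constexpr uint64_t kSTimerPending = uint64_t{1} << 5;     // STIP
constexpr uint64_t kSExternalPending = uint64_t{1} << 9;  // SEIP

// Hypothetical stand-in for the mip CSR; a plain variable, not hardware.
struct FakeMip {
	uint64_t value = 0;
};

// Models SetMip(Mip() | bit): the read and the write are separate steps,
// and an external interrupt is raised between them.
inline uint64_t SetTimerReadModifyWrite(FakeMip& mip)
{
	uint64_t snapshot = mip.value;          // csrr: SEIP not yet set
	mip.value |= kSExternalPending;         // device raises its interrupt
	mip.value = snapshot | kSTimerPending;  // csrw: stale snapshot wins
	return mip.value;
}

// Models csrs: the bit set is a single step, so the interrupt can only
// land before or after it, never in the middle.
inline uint64_t SetTimerSingleStep(FakeMip& mip)
{
	mip.value |= kSExternalPending;         // device raises its interrupt
	mip.value |= kSTimerPending;            // csrs: other bits untouched
	return mip.value;
}

// Runs both models and checks the outcome described above.
inline void CheckModel()
{
	FakeMip racy, safe;
	assert((SetTimerReadModifyWrite(racy) & kSExternalPending) == 0);
	assert((SetTimerSingleStep(safe) & kSExternalPending) != 0);
}
```

In the read-modify-write model the external interrupt bit is gone after the write, which matches the "MIP changed INTERRUPT_SEXTERNAL!" warning; in the single-step model it survives.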

from rvvm.

LekKit avatar LekKit commented on May 22, 2024

@X547 there are major power outages where I live, and my phone is discharging, so I could be offline soon for god knows how long. I hope the information provided so far about the culprit is enough. Good luck.

from rvvm.

LekKit avatar LekKit commented on May 22, 2024

Back from my laptop, which is still charged a bit.

Tried to attach NVMe at the same time with ATA...
[image]

from rvvm.

X547 avatar X547 commented on May 22, 2024

Tried to attach NVMe at the same time with ATA...

Maybe NVMe driver bug or strange RVVM virtual NVMe device behavior.

from rvvm.

X547 avatar X547 commented on May 22, 2024

Perhaps you should use atomic CSR operation to set bits, instead of read/write. Otherwise an external interrupt might come in between your read/write operations. This is why atomic CSR operations to set/clear bits exist at all.

So it is basically a race condition when setting/clearing flag by reading, changing and writing back?

from rvvm.

LekKit avatar LekKit commented on May 22, 2024

Perhaps you should use atomic CSR operation to set bits, instead of read/write. Otherwise an external interrupt might come in between your read/write operations. This is why atomic CSR operations to set/clear bits exist at all.

So it is basically a race condition when setting/clearing flag by reading, changing and writing back?

Yeah, that's it. You should use csrrs/csrrc instructions for setting/clearing bits in IP register. Otherwise you are subject to an interrupt data race in between RMW operation on it.

Why that didn't happen in Temu/QEMU, I have no idea; either their input devices don't send as many interrupts as PS/2, or their interrupt granularity is lower. Anyway, that's unpredictable and should be fixed (especially for real HW).

from rvvm.

LekKit avatar LekKit commented on May 22, 2024

Tried to attach NVMe at the same time with ATA...

Maybe NVMe driver bug or strange RVVM virtual NVMe device behavior.

It could be an NVMe emulation bug, I'll investigate that (I have some admin command related trouble in mind). This also probably should be better handled by the Haiku driver - faulty devices should not cause kernel crashes when they are avoidable, and we want some warning, right?

from rvvm.

LekKit avatar LekKit commented on May 22, 2024

Yeah, that's it. You should use csrrs/csrrc instructions for setting/clearing bits in IP register. Otherwise you are subject to an interrupt data race in between RMW operation on it.

If you confirm this is fixable on the M-mode bootloader side, we should close this issue (notabug). As for the NVMe troubles, I'll either simply provide patches soon or open another issue.

from rvvm.

LekKit avatar LekKit commented on May 22, 2024

Yes, I also cannot reproduce the issue with the updated bootloader. Great)

Btw, is this bootloader planned to be for Haiku only? It sounds like a cool thing to have for other guests, something like a more advanced firmware than SBI/UBoot (It has UI and multiple storage drivers, I like that. SBI spec could be complicated tho.)

from rvvm.

LekKit avatar LekKit commented on May 22, 2024

Btw, is this bootloader planned to be for Haiku only?

Yes. In theory it could be extended to load a Linux/FreeBSD kernel, construct kernel args from a menu, etc., but that would be out of scope for the Haiku project and would need a fork.

Alright, I see. Good luck on that)
Closing this issue as resolved.

from rvvm.
