Comments (15)

xiaoxiang781216 commented on June 6, 2024

@masayuki2009 could you help look at this issue? The deadlock reproduces 100% of the time on QEMU.

from incubator-nuttx.

masayuki2009 commented on June 6, 2024

I've just enabled CONFIG_DEBUG_FEATURES and CONFIG_DEBUG_ERROR, but it still works.

diff --git a/boards/arm/imx6/sabre-6quad/configs/smp/defconfig b/boards/arm/imx6/sabre-6quad/configs/smp/defconfig
index f871b91a12..335915089f 100644
--- a/boards/arm/imx6/sabre-6quad/configs/smp/defconfig
+++ b/boards/arm/imx6/sabre-6quad/configs/smp/defconfig
@@ -21,6 +21,8 @@ CONFIG_ARCH_STACKDUMP=y
 CONFIG_BOARD_LOOPSPERMSEC=99369
 CONFIG_BOOT_RUNFROMSDRAM=y
 CONFIG_BUILTIN=y
+CONFIG_DEBUG_ERROR=y
+CONFIG_DEBUG_FEATURES=y
 CONFIG_DEBUG_FULLOPT=y
 CONFIG_DEBUG_SYMBOLS=y
 CONFIG_DEV_ZERO=y
@@ -28,7 +30,6 @@ CONFIG_EXAMPLES_HELLO=y
 CONFIG_FS_PROCFS=y
 CONFIG_HAVE_CXX=y
 CONFIG_HAVE_CXXINITIALIZE=y
-CONFIG_HOST_WINDOWS=y
 CONFIG_IMX6_UART1=y
 CONFIG_IMX_DDR_SIZE=1073741824
 CONFIG_INTELHEX_BINARY=y

masayuki2009 commented on June 6, 2024

My defconfig is as follows:

Hmm, you modified many configs in boards/arm/imx6/sabre-6quad/configs/smp/defconfig.
So I tried again with your defconfig and ran NuttX on QEMU on Ubuntu 18.04, but it worked.

liujianqiang1016 commented on June 6, 2024

I just tried the latest community version, and the problem did not reproduce. Maybe my version is a little old. Thank you for your help. @masayuki2009

masayuki2009 commented on June 6, 2024

@liujianqiang1016

I just tried the latest community version, and the problem did not reproduce. Maybe my version is a little old. Thank you for your help. @masayuki2009

Thanks for the update.
I think it relates to #774, which I fixed recently.

xiaoxiang781216 commented on June 6, 2024

@masayuki2009 we found that sabre-6quad:smp always hangs on QEMU if we add some logging, as in the attached patch.
0001-Add-some-log-in-arm_decodeirq-to-repo-the-deadlock-i.zip

masayuki2009 commented on June 6, 2024

@masayuki2009 we found that sabre-6quad:smp always hangs on QEMU if we add some logging, as in the attached patch.
0001-Add-some-log-in-arm_decodeirq-to-repo-the-deadlock-i.zip

@xiaoxiang781216 I've just confirmed the deadlock. I will take a look at what is happening tomorrow.

xiaoxiang781216 commented on June 6, 2024

Sure, thanks for taking the time to investigate the SMP issue.

patacongo commented on June 6, 2024

Hmmm... just looking at the code, it looks like arm_decodeirq() would be a bad place to put any debug output. That is because arm_decodeirq() calls arm_doirq(), which is where the interrupt mode is set:

  /* Current regs non-zero indicates that we are processing an interrupt;
   * CURRENT_REGS is also used to manage interrupt level context switches.
   */

  CURRENT_REGS = regs;

Then it is tested with:

bool up_interrupt_context(void)
{
  return CURRENT_REGS != NULL;
}

Prior to CURRENT_REGS being set, the system has no idea that it is in an interrupt handler. Syslog will think it is in normal task mode and will almost certainly do the wrong thing. I have not tracked down all of that, but I have looked at enough to see that arm_decodeirq() is not a valid place to call syslog() functions. All of these tests would do the incorrect thing:

ramlog.c:  DEBUGASSERT(!up_interrupt_context());
syslog_device.c:  if (up_interrupt_context() || getpid() == 0)
syslog_putc.c:  if (up_interrupt_context() || sched_idletask())
syslog_putc.c:      if (up_interrupt_context())
syslog_write.c:  if (up_interrupt_context() || sched_idletask())
syslog_write.c:          if (up_interrupt_context())
syslog_write.c:  if (!up_interrupt_context() && !sched_idletask())

If you move the _err() call to _arm_doirq(), after CURRENT_REGS is set, it may work. It should also work if you add logic to set CURRENT_REGS in arm_decodeirq() too.
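
The ordering problem described above can be illustrated with a small host-side model. This is only a sketch, not NuttX code: `g_current_regs`, the `model_*` functions and the `log_path` enum are hypothetical names invented for the illustration, and clearing the pointer on handler exit is an assumption of the model. The only thing taken from the thread is the test `CURRENT_REGS != NULL`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for CURRENT_REGS; not the real NuttX variable. */

static uint32_t *g_current_regs = NULL;

/* Same test as the up_interrupt_context() quoted in the thread. */

static bool model_interrupt_context(void)
{
  return g_current_regs != NULL;
}

enum log_path
{
  LOG_PATH_TASK,      /* Logger believes it runs in a normal task */
  LOG_PATH_INTERRUPT  /* Logger believes it runs in an interrupt handler */
};

/* Hypothetical stand-in for the branch that the syslog code selects. */

static enum log_path model_syslog_path(void)
{
  return model_interrupt_context() ? LOG_PATH_INTERRUPT : LOG_PATH_TASK;
}

/* Log first, then set the register pointer: the position of the debug
 * output in the patch that triggered the hang.
 */

static enum log_path model_log_in_decodeirq(uint32_t *regs)
{
  enum log_path path;

  assert(regs != NULL);
  path = model_syslog_path();   /* Pointer is still NULL here */
  g_current_regs = regs;        /* Handler body would run here */
  g_current_regs = NULL;        /* Assumed: cleared on handler exit */
  return path;
}

/* Set the register pointer first, then log: the suggested fix. */

static enum log_path model_log_in_doirq(uint32_t *regs)
{
  enum log_path path;

  assert(regs != NULL);
  g_current_regs = regs;
  path = model_syslog_path();   /* Pointer is non-NULL here */
  g_current_regs = NULL;        /* Assumed: cleared on handler exit */
  return path;
}
```

In this model, `model_log_in_decodeirq()` reports the task path even though it is conceptually running inside an interrupt handler, while `model_log_in_doirq()` reports the interrupt path.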

patacongo commented on June 6, 2024

Another thing to be careful of is doing syslog output in the same logic path that handles serial console output. If you generate serial console output every time a serial interrupt occurs, then you will also get into a different kind of infinite loop. Best to check whether the IRQ is the console IRQ and, if so, skip the syslog output.
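
A rough sketch of that guard, written as a host-side model rather than NuttX code. `MODEL_CONSOLE_IRQ` and `model_run()` are hypothetical; the real console IRQ number is board specific, and the model simply assumes that every debug line written to the serial console raises one further console interrupt.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical console UART IRQ number; the real value is board specific. */

#define MODEL_CONSOLE_IRQ 58

/* Process interrupts until none are pending or max_steps is reached.
 * Returns the number of interrupts handled.
 *
 * Model assumption: each debug line printed from the handler raises one
 * more console interrupt.
 */

static int model_run(int first_irq, bool guard, int max_steps)
{
  int pending = 1;
  int handled = 0;
  int irq = first_irq;

  assert(max_steps > 0);

  while (pending > 0 && handled < max_steps)
    {
      pending--;
      handled++;

      /* With the guard enabled, the console IRQ itself is never logged. */

      if (!guard || irq != MODEL_CONSOLE_IRQ)
        {
          pending++;            /* The log line triggers a console IRQ */
        }

      irq = MODEL_CONSOLE_IRQ;  /* Any follow-up is a console IRQ */
    }

  return handled;
}
```

With the guard, a non-console interrupt is logged once and the follow-up console interrupt is handled silently, so the model settles. Without it, every handled interrupt schedules another one and the loop only stops at `max_steps`.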

masayuki2009 commented on June 6, 2024

@xiaoxiang781216 I've just confirmed the deadlock. I will take a look at what is happening tomorrow.

(gdb) target extended-remote:1234                                                                                                                                                                        
Remote debugging using :1234                                                                                                                                                                             
up_testset () at armv7-a/arm_testset.S:120                                                                                                                                                               
120             mov             r0, #SP_LOCKED                                                                                                                                                           
(gdb) info thread                                                                                                                                                                                        
  Id   Target Id                  Frame                                                                                                                                                                  
* 1    Thread 1 (CPU#0 [running]) up_testset () at armv7-a/arm_testset.S:120                                                                                                                             
  2    Thread 2 (CPU#1 [running]) 0x10802eb0 in spin_trylock_wo_note () at semaphore/spinlock.c:178                                                                                                      
  3    Thread 3 (CPU#2 [halted ]) 0x108081cc in up_idle () at chip/imx_idle.c:59                                                                                                                         
  4    Thread 4 (CPU#3 [halted ]) 0x108081cc in up_idle () at chip/imx_idle.c:59                                                                                                                         
(gdb) where                                                                                                                                                                                              
#0  up_testset () at armv7-a/arm_testset.S:120                                                                                                                                                           
#1  0x10802e70 in spin_lock (lock=lock@entry=0x10829379 <g_cpu_paused+1> "\001") at semaphore/spinlock.c:89                                                                                              
#2  0x10801250 in up_cpu_pause (cpu=cpu@entry=1) at armv7-a/arm_cpupause.c:282                                                                                                                           
#3  0x10813164 in sched_addreadytorun (btcb=btcb@entry=0x10836eb0) at sched/sched_addreadytorun.c:280                                                                                                    
#4  0x10808148 in up_unblock_task (tcb=0x10836eb0) at armv7-a/arm_unblocktask.c:102                                                                                                                      
#5  0x10802e28 in nxsem_post (sem=sem@entry=0x10837f94) at semaphore/sem_post.c:165                                                                                                                      
#6  0x10804068 in nxtask_exitwakeup (status=277052944, tcb=0x10837e10) at task/task_exithook.c:547                                                                                                       
#7  nxtask_exithook (tcb=0x10837e10, status=status@entry=0, nonblocking=nonblocking@entry=0 '\000') at task/task_exithook.c:677                                                                          
#8  0x10803438 in exit (status=0) at task/exit.c:96                                                                                                                                                      
#9  0x1080341c in nxtask_start () at task/task_start.c:151                                                                                                                                               
#10 0x00000000 in ?? ()                                                            
(gdb) thread 2                                                                                                                                                                                           
[Switching to thread 2 (Thread 2)]                                                                                                                                                                       
#0  0x10802eb0 in spin_trylock_wo_note () at semaphore/spinlock.c:178                                                                                                                                    
178       return SP_UNLOCKED;                                                                                                                                                                            
(gdb) where                                                                                                                                                                                              
#0  0x10802eb0 in spin_trylock_wo_note () at semaphore/spinlock.c:178                                                                                                                                    
#1  0x10802638 in irq_waitlock (cpu=1) at irq/irq_csection.c:137                                                                                                                                         
#2  0x10802734 in enter_critical_section () at irq/irq_csection.c:345                                                                                                                                    
#3  0x10806154 in ramlog_addchar (priv=priv@entry=0x10829130 <g_sysdev>, ch=ch@entry=97 'a') at syslog/ramlog.c:230                                                                                      
#4  0x10806478 in ramlog_putc (ch=97) at syslog/ramlog.c:749                                                                                                                                             
#5  0x10814c48 in syslogstream_putc (ch=<optimized out>, this=<optimized out>) at syslog/syslog_stream.c:174                                                                                             
#6  syslogstream_putc (this=0x1082fb8c <g_irqstack_alloc+4004>, ch=97) at syslog/syslog_stream.c:130                                                                                                     
#7  0x10806d60 in vsprintf_internal (arglist=0x0, numargs=0, ap=..., fmt=<optimized out>, stream=0x1082fb8c <g_irqstack_alloc+4004>) at stdio/lib_libvsprintf.c:909                                      
#8  lib_vsprintf (stream=0x1082fb8c <g_irqstack_alloc+4004>, stream@entry=0x1082fb84 <g_irqstack_alloc+3996>, fmt=fmt@entry=0x10820de6 "%s: cpu = %d, irq %d.\n", ap=...) at stdio/lib_libvsprintf.c:1278                                                                                                                                                                                                        
#9  0x10814c1c in nx_vsyslog (priority=priority@entry=3, fmt=0x10820de6 "%s: cpu = %d, irq %d.\n", fmt@entry=0x10807434 <vsyslog+36> "\f\320\215\342\004\360\235\344T\221\202\020\016", ap=0x1082fbac <g_irqstack_alloc+4036>, ap@entry=0x1082fba4 <g_irqstack_alloc+4028>) at syslog/vsyslog.c:148                                                                                                              
#10 0x10807434 in vsyslog (priority=priority@entry=3, fmt=fmt@entry=0x10807434 <vsyslog+36> "\f\320\215\342\004\360\235\344T\221\202\020\016", ap=..., ap@entry=...) at syslog/lib_syslog.c:84           
#11 0x10807458 in syslog (priority=priority@entry=3, fmt=0x10820de6 "%s: cpu = %d, irq %d.\n") at syslog/lib_syslog.c:116                                                                                
#12 0x10800dd8 in arm_decodeirq (regs=0x10833c28 <g_cpu1_idlestack+1832>) at armv7-a/arm_gicv2.c:397

As you can see, thread 2, running on CPU#1, is in interrupt context.
However, as @patacongo pointed out, enter_critical_section() behaves as if it were in the normal tasking environment, because _err() was called before CURRENT_REGS was set.
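
One way to read the two backtraces is as a circular wait. The sketch below is only an interpretation, under two assumptions that the backtraces do not show directly: that CPU#0 holds the critical-section lock CPU#1 is spinning on in irq_waitlock(), and that CPU#1 cannot acknowledge the pause request from up_cpu_pause() while it is stuck there. `MODEL_NCPUS`, `MODEL_NONE` and `model_has_deadlock()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NCPUS 2
#define MODEL_NONE  (-1)

/* waits_on[c] is the CPU that must make progress before CPU c can
 * continue, or MODEL_NONE if CPU c is not blocked.  Entries must be
 * MODEL_NONE or a valid CPU index.
 *
 * Returns true if following the wait chain from some CPU leads back to
 * that same CPU, i.e. the CPUs wait on each other in a cycle.
 */

static bool model_has_deadlock(const int waits_on[MODEL_NCPUS])
{
  int start;

  assert(waits_on != 0);

  for (start = 0; start < MODEL_NCPUS; start++)
    {
      int cpu = waits_on[start];
      int steps = 0;

      while (cpu != MODEL_NONE && steps < MODEL_NCPUS)
        {
          if (cpu == start)
            {
              return true;
            }

          cpu = waits_on[cpu];
          steps++;
        }
    }

  return false;
}
```

Under those assumptions the captured state corresponds to `{1, 0}`: CPU#0 waits for CPU#1 to acknowledge the pause, and CPU#1 waits for the lock held by CPU#0, which the check reports as a cycle.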

patacongo commented on June 6, 2024

As you can see, thread 2, running on CPU#1, is in interrupt context.
However, as @patacongo pointed out, enter_critical_section() behaves as if it were in the normal tasking environment, because _err() was called before CURRENT_REGS was set.

That is not expected to work. That is a coding error that should result in a crash or a hang or other fatal consequence.

masayuki2009 commented on June 6, 2024

@patacongo

If you move the _err() call to _arm_doirq(), after CURRENT_REGS is set, it may work.

I've just reverted the patch that @xiaoxiang781216 attached and added the following code.

--- a/arch/arm/src/armv7-a/arm_doirq.c
+++ b/arch/arm/src/armv7-a/arm_doirq.c
@@ -91,6 +91,14 @@ static inline uint32_t *_arm_doirq(int irq, uint32_t *regs)
 
   CURRENT_REGS = regs;
 
+#if 1
+  int cpu = up_cpu_index();
+  if (cpu != 0)
+    {
+      _err("cpu = %d, irq %d.\n", cpu, irq);
+    }
+#endif
+
   /* Deliver the IRQ */

The hello app then works on QEMU (the smp and ostest apps also work).

qemu-system-arm -M sabrelite -smp 4 -kernel ./nuttx/nuttx -nographic -s
ABCDGHIJKNOPQ

NuttShell (NSH) NuttX-8.2.0
nsh> dmesg
_arm_doirq: cpu = 1, irq 1.
_arm_doirq: cpu = 2, irq 1.
_arm_doirq: cpu = 3, irq 1.
nsh> hello
Hello, World!!
nsh> dmesg
_arm_doirq: cpu = 1, irq 2.

xiaoxiang781216 commented on June 6, 2024

@masayuki2009 and @patacongo, thanks for the explanation.
Maybe we can change up_interrupt_context() to check the hardware status bit instead of CURRENT_REGS; then syslog could be called at any point. But that is another story.

patacongo commented on June 6, 2024

Maybe we can change up_interrupt_context() to check the hardware status bit instead of CURRENT_REGS; then syslog could be called at any point. But that is another story.

That might be much more complicated than you think. You should study the problem before jumping into an unnecessary, alternative solution. Nothing is broken.

Whenever up_interrupt_context() returns true, the code also assumes that CURRENT_REGS is non-NULL. That is the accepted semantics. Those two things cannot be easily separated. up_assert() is a good example. In the interrupt context, it needs CURRENT_REGS. There are probably other, less obvious dependencies.

Also, high priority, nested, zero latency interrupts work very differently. They are not treated as interrupts at all.

I think this is not a good idea. It is, at least, not something that you should do impulsively or without some significant analysis of the consequences.
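
The coupling described above can be stated as an invariant: "in interrupt context" implies "CURRENT_REGS is non-NULL". The sketch below is a host-side model of why a hardware-bit test would break that invariant in the window before the register pointer is set. `struct model_cpu`, `irq_mode` and the `ctx_by_*` functions are hypothetical names, not NuttX APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-CPU state for the model. */

struct model_cpu
{
  bool irq_mode;           /* Stand-in for a hardware status bit */
  uint32_t *current_regs;  /* Stand-in for CURRENT_REGS */
};

/* Current semantics: interrupt context is defined by the pointer. */

static bool ctx_by_regs(const struct model_cpu *cpu)
{
  return cpu->current_regs != NULL;
}

/* Proposed alternative: interrupt context is defined by the hardware. */

static bool ctx_by_hw(const struct model_cpu *cpu)
{
  return cpu->irq_mode;
}

/* A consumer such as an assertion handler reads the saved registers
 * whenever it is told it runs in interrupt context.  This returns true
 * if that read would be safe for the given context test.
 */

static bool model_regs_safe(const struct model_cpu *cpu,
                            bool (*in_irq)(const struct model_cpu *))
{
  assert(cpu != NULL && in_irq != NULL);
  return !in_irq(cpu) || cpu->current_regs != NULL;
}
```

For a CPU that has entered the handler but not yet stored the register pointer, the pointer-based test keeps the invariant trivially, while the hardware-based test reports interrupt context with a NULL pointer, so every consumer of CURRENT_REGS would need its own check.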
