Comments (7)

joneschrisg commented on May 8, 2024

(Seen in Ubuntu 12.04 x64, running rr cross-compiled to x86, inside VMware Workstation 9. So a lot of moving parts.)

from rr.

joneschrisg commented on May 8, 2024

I should note that the symptom seen now is always a replayed rcb (retired conditional branch count) that's exactly 1 more than the recorded rcb. (That's part of what makes me suspect VMware.) That means the original bug here was probably fixed, and this is a different issue. Oh well.

Example failure

Output from recording:
--------------------------------------------------
...................................................................................................................................................................................................................................................................................................................................................................................................................................................................
Signal caught, Counter is 45083337
--------------------------------------------------
Output from replay:
--------------------------------------------------
...................................................................................................................................................................................................................................................................................................................................................................................................................................................................
Signal caught, Counter is 45083338
--------------------------------------------------

joneschrisg commented on May 8, 2024

Reproduces on bare metal.

joneschrisg commented on May 8, 2024

This one is looking pretty scary. The test sets an alarm(1) and then does this:

    for (counter=0; counter >= 0 && !stop; counter++)
        if (counter % 100000 == 0)
            write(STDOUT_FILENO, ".", 1);
    atomic_printf("\nSignal caught, Counter is %d\n", counter);
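
For anyone who wants to poke at the loop outside the rr test suite, here is a minimal standalone sketch of it. This is not the actual test source: the `count_until_alarm` and `on_alarm` names, the `sigaction` setup for `SIGALRM`, and the use of `printf` in place of the test's `atomic_printf` helper are all assumptions. The loop bound also uses `INT_MAX` instead of `counter >= 0` so the sketch does not rely on signed overflow.

```c
#define _POSIX_C_SOURCE 200809L
#include <limits.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Set by the SIGALRM handler; volatile so the loop re-reads it every iteration. */
static volatile sig_atomic_t stop;

static void on_alarm(int sig) {
    (void)sig;
    stop = 1;
}

/* Hypothetical helper: spin until SIGALRM fires, then return the final counter. */
static int count_until_alarm(unsigned seconds) {
    struct sigaction sa;
    int counter;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGALRM, &sa, NULL);

    stop = 0;
    alarm(seconds);
    for (counter = 0; counter < INT_MAX && !stop; counter++) {
        /* Print a progress dot every 100000 iterations; give up if the write fails. */
        if (counter % 100000 == 0 && write(STDOUT_FILENO, ".", 1) < 0)
            break;
    }

    printf("\nSignal caught, Counter is %d\n", counter);
    fflush(stdout);
    return counter;
}
```

Calling `count_until_alarm(1)` from `main` once under recording and once under replay should print the same counter value both times; a mismatch is the symptom discussed here.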

The failure mode is that the counter printed during replay is exactly one greater than the one printed during recording. When that happens, we always record the interrupt at the same instruction. The disassembly for the code looks like this:

    loop_body:
       0x080488b1 <+111>:   mov    0x9c(%esp),%ecx
       // [compute $edx := counter % 100000]
       0x080488d9 <+151>:   mov    %edx,%eax
       // $eax := counter % 100000
       0x080488db <+153>:   test   %eax,%eax
       0x080488dd <+155>:   jne    0x80488fb
       // if (0 != $eax) goto incr_counter
       0x080488df <+157>:   movl   $0x1,0x8(%esp)
       0x080488e7 <+165>:   movl   $0x8048a10,0x4(%esp)
       0x080488ef <+173>:   movl   $0x1,(%esp)
       0x080488f6 <+180>:   call   0x8048690
    incr_counter:
       0x080488fb <+185>:   addl   $0x1,0x9c(%esp)
       // counter++
       0x08048903 <+193>:   cmpl   $0x0,0x9c(%esp)
       0x0804890b <+201>:   js     0x8048916
       // if (sign(counter)) goto done
       0x0804890d <+203>:   mov    0x804a03c,%eax
       0x08048912 <+208>:   test   %eax,%eax
       0x08048914 <+210>:   je     0x80488b1
       // if (0 == stop) goto loop_body
    done:
       0x08048916 <+212>:   mov    0x9c(%esp),%eax
       // atomic_printf(...)

We always fail when the signal is delivered at this $ip (marked [==>]):

    0x080488dd <+155>:  jne    0x80488fb
    //...
    [==>] 0x080488fb <+185>:    addl   $0x1,0x9c(%esp)

So the instruction that should be retired just before the interrupt is a conditional branch.

Will put down some speculations in the next comment.

joneschrisg commented on May 8, 2024

I forgot to add before: I checked out f7c5b32, which is from before we ran through the slack region using CONT with breakpoints, and before we tracked sighandler information with the sighandler table. The same bug reproduces there, apparently about as frequently. So this has likely been a problem "forever".

joneschrisg commented on May 8, 2024

I added an extra "canary counter" that's always incremented just after counter above (yes yes, verified the asm). I've got a failure mode where the canary counter ends up being one more than counter. That shouldn't be possible (except with things rr shouldn't be able to observe, like out-of-order execution), so it's looking like we're somehow running the instruction at the target $ip one extra time.
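
To make the canary argument concrete, here is a rough sketch of the idea, not the code that was actually added to the test. Because the canary is only ever incremented right after the counter, it can be equal to the counter or one behind it, but never ahead, and once the loop exits the two must be equal. The helper name `canary_minus_counter`, the handler, and the `volatile` globals are assumptions made for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <limits.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

static volatile sig_atomic_t canary_stop;
/* volatile keeps both increments as real memory updates, in program order. */
static volatile int counter;
static volatile int canary;

static void canary_on_alarm(int sig) {
    (void)sig;
    canary_stop = 1;
}

/* Hypothetical helper: run the loop until SIGALRM, return canary - counter. */
static int canary_minus_counter(unsigned seconds) {
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = canary_on_alarm;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGALRM, &sa, NULL);

    counter = 0;
    canary = 0;
    canary_stop = 0;
    alarm(seconds);
    while (counter < INT_MAX && !canary_stop) {
        counter++; /* the original counter */
        canary++;  /* always incremented just after counter */
    }
    /* The loop only exits between iterations, so this is 0 on a correct run. */
    return canary - counter;
}
```

On a correct run the difference is always 0, so any nonzero result under replay means some instruction in the loop ran a different number of times than it did during recording.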

joneschrisg commented on May 8, 2024

The problem is indeed executing the interrupted instruction twice. The deliver-signal code is working perfectly, and then we emulate the sigreturn. Entering the sigreturn seems to do the right thing, but on exiting it the canary counter has incremented, and the $ip is set back to the increment-counter instruction. So when we resume from there, we run the instruction again.

This is because the emulation of the sigreturn exit does a PTRACE_SYSEMU_SINGLESTEP, after which we rewind the $ip. Apparently that's wrong for a sigreturn.

So the fix seems to be not doing the singlestep at exit, but, gasp, that fix breaks another test :/.
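
The off-by-one can be illustrated with a toy model. This is purely a sketch of the reasoning above, not rr's code and not real ptrace usage: `struct toy_cpu`, `toy_step`, and both `resume_*` helpers are hypothetical, and the "program" is just an endless run of increment instructions. The point is that rewinding the $ip after a singlestep does not undo the step's side effect, so the interrupted instruction takes effect twice.

```c
/* Toy machine: every instruction is "counter++", and ip just advances. */
struct toy_cpu {
    int ip;
    int counter;
};

/* Execute the one instruction at ip. */
static void toy_step(struct toy_cpu *cpu) {
    cpu->counter++;
    cpu->ip++;
}

/* Recording: sigreturn restores ip, then the interrupted instruction runs once. */
static int resume_once(int interrupted_ip, int counter) {
    struct toy_cpu cpu = { interrupted_ip, counter };
    toy_step(&cpu);
    return cpu.counter;
}

/* Replay as described above: singlestep on sigreturn exit, rewind ip, resume. */
static int resume_with_step_and_rewind(int interrupted_ip, int counter) {
    struct toy_cpu cpu = { interrupted_ip, counter };
    toy_step(&cpu);          /* singlestep while exiting the emulated sigreturn */
    cpu.ip = interrupted_ip; /* rewind ip; the increment is not undone */
    toy_step(&cpu);          /* resuming runs the same instruction again */
    return cpu.counter;
}
```

Starting both helpers from a counter of 45083336, `resume_once` yields 45083337 and `resume_with_step_and_rewind` yields 45083338, which mirrors the recorded and replayed values in the example failure.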
