Comments (7)
(Seen in ubuntu 12.04 x64, running rr cross-compiled to x86, inside VMWare Workstation 9. So a lot of moving parts.)
from rr.
I should note that the symptom seen now is always a replayed rcb that's exactly 1 more than the recorded rcb. (That's part of what makes me suspect VMWare.) That means the original bug here was probably fixed, and this is a different issue. Oh well.
Example failure
Output from recording:
--------------------------------------------------
................ [run of dots truncated; one dot per 100000 iterations]
Signal caught, Counter is 45083337
--------------------------------------------------
Output from replay:
--------------------------------------------------
................ [run of dots truncated; same as in the recording]
Signal caught, Counter is 45083338
--------------------------------------------------
Reproduces on bare metal.
This one is looking pretty scary. The test sets an alarm(1) and then does this:

    for (counter = 0; counter >= 0 && !stop; counter++)
        if (counter % 100000 == 0)
            write(STDOUT_FILENO, ".", 1);
    atomic_printf("\nSignal caught, Counter is %d\n", counter);
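For anyone who wants to poke at the loop outside the rr test suite, here is a minimal standalone sketch of the same idea. It is not the actual test source: `spin_until_alarm` and `handle_alarm` are hypothetical names, `printf` stands in for rr's `atomic_printf` helper, and the `counter < INT_MAX` guard is swapped in for the original `counter >= 0` check, which relies on signed wraparound.

```c
#include <assert.h>
#include <limits.h>
#include <signal.h>
#include <stdio.h>
#include <unistd.h>

/* Set by the SIGALRM handler; volatile so the loop re-reads it every pass. */
static volatile sig_atomic_t stop;

static void handle_alarm(int sig) {
    (void)sig;
    stop = 1;
}

/* Spin until SIGALRM arrives, printing a dot every 100000 iterations.
 * Returns the final counter value, or -1 if setup or a write failed.
 * (Hypothetical helper, not part of rr.) */
int spin_until_alarm(unsigned int seconds) {
    int counter;

    assert(seconds > 0); /* alarm(0) would cancel the alarm, not arm it */
    stop = 0;
    if (signal(SIGALRM, handle_alarm) == SIG_ERR)
        return -1;
    alarm(seconds);

    /* Assumption: counter < INT_MAX replaces the original counter >= 0
     * guard so the sketch avoids signed overflow. */
    for (counter = 0; counter < INT_MAX && !stop; counter++)
        if (counter % 100000 == 0)
            if (write(STDOUT_FILENO, ".", 1) < 0)
                return -1;

    alarm(0); /* cancel any alarm still pending */
    printf("\nSignal caught, Counter is %d\n", counter);
    return counter;
}
```

Calling `spin_until_alarm(1)` once while recording and once while replaying, then comparing the two printed values, mirrors the check described in this comment.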
The failure mode is that the counter printed during replay is exactly one greater than the one printed during recording. When that happens, we always record the interrupt at the same instruction. The disassembly for the code looks like
    loop_body:
      0x080488b1 <+111>: mov 0x9c(%esp),%ecx
      // [compute $edx := counter % 100000]
      0x080488d9 <+151>: mov %edx,%eax                // $eax := counter % 100000
      0x080488db <+153>: test %eax,%eax
      0x080488dd <+155>: jne 0x80488fb                // if (0 != $eax) goto incr_counter
      0x080488df <+157>: movl $0x1,0x8(%esp)
      0x080488e7 <+165>: movl $0x8048a10,0x4(%esp)
      0x080488ef <+173>: movl $0x1,(%esp)
      0x080488f6 <+180>: call 0x8048690
    incr_counter:
      0x080488fb <+185>: addl $0x1,0x9c(%esp)         // counter++
      0x08048903 <+193>: cmpl $0x0,0x9c(%esp)
      0x0804890b <+201>: js 0x8048916                 // if (sign(counter)) goto done
      0x0804890d <+203>: mov 0x804a03c,%eax
      0x08048912 <+208>: test %eax,%eax
      0x08048914 <+210>: je 0x80488b1                 // if (0 == stop) goto loop_body
    done:
      0x08048916 <+212>: mov 0x9c(%esp),%eax
      // atomic_printf(...)
We always fail when the signal is delivered at $ip:

          0x080488dd <+155>: jne 0x80488fb
          //...
    [==>] 0x080488fb <+185>: addl $0x1,0x9c(%esp)
So the instruction that should be retired just before the interrupt is a conditional branch.
Will put down some speculations in the next comment.
I forgot to add before: I checked out f7c5b32, which is from before we ran through the slack region using CONT with breakpoints, and before we tracked sighandler information with the sighandler table. The same bug reproduces there, apparently about as frequently. So this has likely been a problem "forever".
I added an extra "canary counter" that's always incremented just after counter above (yes yes, verified the asm). I've got a failure mode where the canary counter ends up being one more than counter. That shouldn't be possible (except with things rr shouldn't be able to observe, like out-of-order execution), so it's looking like we're somehow running the instruction at the target $ip one extra time.
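A rough sketch of the canary idea, for illustration only. The names `run_canary_loop`, `counter`, and `canary` here are hypothetical, and the real test change was checked at the asm level rather than relying on `volatile` as this sketch does. Run natively, the two values should always come out equal; a difference of one is the symptom described above.

```c
#include <assert.h>
#include <limits.h>
#include <signal.h>
#include <unistd.h>

static volatile sig_atomic_t stop;
/* volatile asks the compiler for a separate load/add/store per counter,
 * so the two increments are not merged into one. */
static volatile int counter;
static volatile int canary;

static void handle_alarm(int sig) {
    (void)sig;
    stop = 1;
}

/* Increment counter and then canary until SIGALRM arrives.
 * Stores both final values and returns 0, or returns -1 on setup failure.
 * (Hypothetical helper, not the rr test itself.) */
int run_canary_loop(unsigned int seconds, int *out_counter, int *out_canary) {
    assert(seconds > 0 && out_counter && out_canary);
    stop = 0;
    counter = 0;
    canary = 0;
    if (signal(SIGALRM, handle_alarm) == SIG_ERR)
        return -1;
    alarm(seconds);

    while (counter < INT_MAX && !stop) {
        counter++; /* the instruction the signal keeps landing on */
        canary++;  /* always incremented just after counter */
    }

    alarm(0);
    *out_counter = counter;
    *out_canary = canary;
    return 0;
}
```

Since both increments sit in the same loop body, any mismatch between the two outputs means one of the increments ran an extra time.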
The problem is indeed executing the interrupted instruction twice. The deliver-signal code is working perfectly, and then we emulate the sigreturn. Entering the sigreturn seems to do the right thing, but on exiting it the canary counter has incremented, and the $ip is set back to the increment-counter instruction. So when we resume from there, we run the instruction again.
This is because the emulation of the sigreturn exit does a PTRACE_SYSEMU_SINGLESTEP, after which we rewind the $ip. Apparently that's wrong for a sigreturn.
So the fix seems to be not doing the singlestep at exit, but, gasp, that fix breaks another test :/.
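To make the double execution concrete, here is a toy model of the sequence described in this comment. It is purely illustrative and is not rr's code: `toy_cpu`, `toy_step`, `toy_sigreturn_exit`, and `toy_run` are made-up names, every "instruction" is just an increment, and no ptrace calls are involved. The point is only that a single-step retires the instruction at the restored $ip, and rewinding the $ip afterwards does not undo its side effect.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy machine: an instruction pointer and one memory cell. */
struct toy_cpu {
    int ip;
    int counter;
};

/* "Execute" the instruction at ip. In this toy, every instruction
 * increments counter, then ip advances by one. */
static void toy_step(struct toy_cpu *cpu) {
    cpu->counter++;
    cpu->ip++;
}

/* Model of leaving the emulated sigreturn. saved_ip is the instruction
 * the signal interrupted. */
static void toy_sigreturn_exit(struct toy_cpu *cpu, int saved_ip,
                               bool singlestep_then_rewind) {
    cpu->ip = saved_ip; /* sigreturn restores the interrupted ip */
    if (singlestep_then_rewind) {
        toy_step(cpu);      /* the step retires the instruction at saved_ip */
        cpu->ip = saved_ip; /* rewinding ip leaves counter already bumped */
    }
}

/* Run a 4-instruction program with a signal delivered before instruction 2.
 * Returns the final counter value. */
int toy_run(bool singlestep_then_rewind) {
    struct toy_cpu cpu = {0, 0};

    toy_step(&cpu);
    toy_step(&cpu);
    assert(cpu.ip == 2); /* signal arrives here */

    toy_sigreturn_exit(&cpu, cpu.ip, singlestep_then_rewind);

    while (cpu.ip < 4) /* resume from wherever ip points */
        toy_step(&cpu);
    return cpu.counter;
}
```

In this model `toy_run(false)` returns 4, while `toy_run(true)` returns 5, the same off-by-one seen between recording and replay.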