Comments (15)
Yeah, there's definitely still some kind of race in the setup/teardown of the signal handlers. I gave justen bad advice on this. I'll have a PR with a fix shortly.
from containerpilot.
I ran it again and it succeeded - so its a race
https://travis-ci.org/justenwalker/containerbuddy/builds/94959813
Is it the order in which sendAndWaitForSignal blocks?
<-sig
runtime.Gosched()
vs.
runtime.Gosched()
<-sig
from containerpilot.
With that ordering it fails consistently when I run it locally. I suspect the mental model we're using here is flawed somehow about how signal handlers should be set up.
from containerpilot.
I've got a good workaround and I'm testing it now on a WIP branch... we'll set up the signal handler in a helper and then mark that as such in a global bool in the test module. This way we verify that we've only set it up once; the state should be safe to reuse between the signal tests so long as each test covers different signals.
from containerpilot.
I don't know why, but this seems to work:
Edit: Scratch that. I increased it to 1000 and it dies.
runtime.Gosched()
<-sig
runtime.Gosched()
signal.Stop(sig) //May not be needed
I added a test that re-runs the test 100 times, and it passes.
Dies with 1000 iterations.
func TestMaintenanceSignalRace(t *testing.T) {
for i := 0; i < 100; i++ {
TestMaintenanceSignal(t)
}
}
from containerpilot.
The only problem with that is that it might only work on your machine (the test that's failing never failed on my machine before, for example, and it still doesn't). If we remove the need to reset the handlers, we can avoid the question entirely. I've got a PR coming to that effect.
from containerpilot.
Figured - thats what I get for coding by coincidence
from containerpilot.
Interesting... apparently I'm totally wrong. It worked the first time I pushed up the commit (on a very dirty work-in-progress branch) and then not afterwards: https://travis-ci.org/tgross/containerbuddy/builds/94967601
from containerpilot.
Working on this is a little tricky because I can't force the bug on my machine.
from containerpilot.
@justenwalker I tried what you'd suggested in https://travis-ci.org/tgross/containerbuddy/builds/94972692 and it results in the same kinds of errors on Travis.
I'm going to do some integration testing on this and make sure that the signal handler is actually working as I expect and that this isn't just a test artifact.
from containerpilot.
Ok, so fortunately in an integration test this seems to work as we'd expect. That doesn't eliminate a possible race in the implementation that happens when signals arrive close together, of course. But it does lead me to believe that the problem is in the test rig.
from containerpilot.
Think I got it: #33
from containerpilot.
@justenwalker I slept on this one a bit and woke up this morning with some new insights. We're making what is probably a classic mistake of trying to force behavior in the test by forcing the test thread to yield or by sleeping. We're trying to fix a concurrency bug within asynchronous signals, and we can't coerce the signals to be received by the signal
package synchronously.
I've spent a little time trying to force synchronicity into this and I think the best approach will be to factor out the signal handlers from the signal receivers such that we can unit test the signal handling code properly without worrying about concurrency, and then just have a simple test that verifies we've wired up the handlers correctly. I'll have a PR inbound for this shortly.
from containerpilot.
I've opened #36 with what I believe to be the correct solution to the problem.
from containerpilot.
Fix merged and build is now marked as passing. Thanks for the assist on this, @justenwalker
from containerpilot.
Related Issues (20)
- Stability issues with signal events under SmartOS/LX HOT 2
- [Question]How to disable default metrics and only response custom defined metrics? HOT 5
- Building inside a docker container HOT 2
- SmartOS and LX brand issues with Go 1.9
- Docs incorrectly say 'initialStatus', should be 'initial_status' HOT 2
- Telemetry custom metrics always zero HOT 4
- Run as user per job HOT 5
- Allow for an ADHoc Sending of a signal to a ContainerPilot job. HOT 1
- Project status HOT 3
- Error parsing environment variable in config template
- Unable to execute job HOT 1
- CP ends up ignoring that it's jobs have been killed
- Local build on SmartOS fails due to upstream changes
- 100% CPU Usage
- github url's in documentation
- Documentation Update: docker-compose --scale "change"
- consul with TLS does not read env vars set by -putenv
- Broken link to blog/wordpress-on-autopilot
- Container Pilot process get hung and cannot recover when health check timeouts continues for more than an hour
- Support consul service meta data HOT 5
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from containerpilot.