Giter Site home page Giter Site logo

Comments (6)

dhiltonp avatar dhiltonp commented on August 22, 2024

I'll have to go over the continue_time related code later to double check for the same bug (and revisit alternative implementations), but for the time being, I've applied your fix.

from hexbright.

AustinSpafford avatar AustinSpafford commented on August 22, 2024

I'm glad to help! :)

By the way, for a more detailed explanation of what happens, below is the
email I sent to my dad just after trying out the fix. As for the project
where this occurred, as part of a recovery from a serious accident, I used
the hexbright to build an animated tron-themed cane to help remove some of
the stigma. It was fantastically successful (garnered tons of praise at PAX
Prime), and I intend to post a video of it and share the source code once
it gets approved by the lawyer for my case (widely-shared information is
apparently something to be careful about).

Thank you for making the hexbright easier to use! Given how people
responded, the end product felt like a magic show. :)

Playing with numbers here told me what to data-storage and time-units to
look for: http://www.wolframalpha.com/input/?i=2%5E32+microseconds

Then I found this throttle in the library I'm using, which uses 32-bit
unsigned storage and microseconds, exactly the suspect I'm looking for!
do {
now = micros();
} while (continue_time > now); // not ready for update

This is a classic bug, usually seen with occasional hangs in programs that
have been running for 49.7 days (2^32 milliseconds). The fixed code is
simply:
do {
now = micros();
} while ((signed long)(continue_time - now) > 0); // not ready for
update

It's an evil bug, because each hour when the values returned by "micros()"
wrap around, there's a random chance that "now" overshoots "continue_time",
wrapping around and then being an entire hour behind it (which is why the
cane appeared to lock up for about an hour). Even worse, when that hour is
up, there's a small chance it will overshoot again and be hung for
another hour.

I'll test it tomorrow, but it should be easier for me to sleep knowing that
I'm damn-sure I found the bug!

Once I test out the fix and become damn certain it's correct, I'll notify
the hexbright programmer so he can fix the bug in their library. :)

-Austin

P.S. While this was a relatively easy bug to fix, it's definitely
rejuvenating to do something like that!

On Fri, Sep 6, 2013 at 5:23 AM, David Hilton [email protected]:

I'll have to go over the continue_time related code later to double check
for the same bug (and revisit alternative implementations), but for the
time being, I've applied your fix.


Reply to this email directly or view it on GitHubhttps://github.com//issues/38#issuecomment-23936129
.

from hexbright.

dhiltonp avatar dhiltonp commented on August 22, 2024

As soon as you sent in the bug I knew what condition you were talking about; I'm mostly surprised as I've never seen the bug in practice.

I've run code for at least 8 hours and not hit this bug. That's got to be due to a slight timing issue. Maybe your code is taking a little too long to run? If you set DEBUG to DEBUG_LOOP in libraries/hexbright/hexbright.h, it will tell you your average loop time.

I've just run a sample program;
In a non-debug mode, continue_time is initially set to 1136 microseconds.
I plugged in the exact formula into a loop that would cycle, and print the difference from maxint if we were within 20000 of maxint. Here is the output (how close continue_time gets to maxint):
starting values: 1136
starting values: 9470
starting values: 17803
countdown to overflow: -19711
countdown to overflow: -11263
countdown to overflow: -2815
overflowed!: 1
overflowed!: 8334
overflowed!: 16667
countdown to overflow: -12287
countdown to overflow: -3839
overflowed!: 1
overflowed!: 8334
overflowed!: 16667
countdown to overflow: -12287
countdown to overflow: -3839
overflowed!: 1
overflowed!: 8334
overflowed!: 16667
countdown to overflow: -12287
countdown to overflow: -3839
overflowed!: 1
overflowed!: 8334
overflowed!: 16667
countdown to overflow: -12287
countdown to overflow: -3839

Here is the relevant code:
const float update_delay = 8.3333333; // in lock-step with the accelerometer
unsigned long continue_time = 1136; // actual startup time
unsigned long maxint = 4294967295;
while(true) {
continue_time = continue_time+(1000*update_delay);
if(continue_time>4294947296) {
Serial.print("countdown to overflow: -");
Serial.println(maxint-continue_time);
} else if (continue_time<20000) {
Serial.print("overflowed!: ");
Serial.println(continue_time-maxint);
}
}

I'm actually quite surprised by the lack of proper overflow - though it does make this analysis quite simple. It does seem to indicate that your program is taking a little too long to execute - effectively, you're missing some of the accelerometer samples and light change operations take a little longer than requested (and you hit this freezing bug).

By the way, there is an onboard reset switch that could have been useful.

from hexbright.

AustinSpafford avatar AustinSpafford commented on August 22, 2024

Indeed, at the time I was running into it because some lazy serial-output
was dragging the flashlight down to 10 Hz. Come to think of it, that
explains quite clearly why I was seeing it reliably once an hour, and why
it wasn't sticking for subsequent hours (the first iteration was lapsing in
my heinously slow output code, but the throttle was breaking out due to the
shear update-frequency). Hmm, it even explains why it seemed to only stick
while in awake-mode (versus sleep-mode), since awake-mode was pouring out
about three times the debug text (especially trying to disambiguate
walking-steps from other motions).

That night I of course went through and trimmed down all the serial output
to be just event-based messages. Besides, the spam of orientation and
light-level data was no longer needed.

However, I learned my lesson that day, since I had not brought the tools
needed to open the cane and get to the flashlight to reset it. It did lead
to a funny situation where I said "I'm sorry, but my cane appears to have
crashed!", much to the amazement of those I was apologizing to.

On Fri, Sep 6, 2013 at 2:43 PM, David Hilton [email protected]:

As soon as you sent in the bug I knew what condition you were talking
about; I'm mostly surprised I've never seen the bug in practice.

I've run code for at least 8 hours and not hit this bug. That's got to be
due to a slight timing issue. Maybe your code is taking a little too long
to run? If you set DEBUG to DEBUG_LOOP in libraries/hexbright/hexbright.h,
it will tell you your average loop time.

I've just run a sample program;
In a non-debug mode, continue_time is initially set to 1136 microseconds.
I plugged in the exact formula into a loop that would cycle, and print the
difference from maxint if we were within 20000 of maxint. Here is the
output (how close continue_time gets to maxint):
starting values: 1136
starting values: 9470
starting values: 17803
countdown to overflow: -19711
countdown to overflow: -11263
countdown to overflow: -2815
overflowed!: 1
overflowed!: 8334
overflowed!: 16667
countdown to overflow: -12287
countdown to overflow: -3839
overflowed!: 1
overflowed!: 8334
overflowed!: 16667

I'm actually quite surprised by the lack of proper overflow - though it
does make this analysis quite simple. It does seem to indicate that your
program is taking a little too long to execute - effectively, you're
missing some of the accelerometer samples and light change operations take
a little longer than requested (and you hit this freezing bug).

By the way, there is an onboard reset switch that could have been useful.


Reply to this email directly or view it on GitHubhttps://github.com//issues/38#issuecomment-23971330
.

from hexbright.

dhiltonp avatar dhiltonp commented on August 22, 2024

So, this is executed as a floating point operation, and it has the reset to 1 problem.
continue_time = continue_time+(1000*update_delay);

This doesn't have the problem, though.
continue_time = continue_time+(int)(1000*update_delay);

I've applied this fix.

from hexbright.

AustinSpafford avatar AustinSpafford commented on August 22, 2024

I'm confused, since continue_time has to wrap around about once an hour, as the restriction is micros() itself. Are you saying that this was in addition to fixing the while loop?

That's a good find regarding the implicit float-to-int cast. I'm not sure how 8333 compares versus the resolution near (float)2^32, but the primary danger would be continue_time failing to increment, rather than it wrapping around (large-float plus tiny-float equals unchanged large-float).

Hmm, attacking the root of the problem, would it make sense to instead redefine update_delay as update_delay_micros?

In any case, thank you for writing the library, and good luck!

On Sep 7, 2013, at 6:12 AM, David Hilton [email protected] wrote:

So, this is executed as a floating point operation, and it has the reset to 1 problem.
continue_time = continue_time+(1000*update_delay);

This doesn't have the problem, though.
continue_time = continue_time+(int)(1000*update_delay);

I've applied this fix.


Reply to this email directly or view it on GitHub.

from hexbright.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.