Giter Site home page Giter Site logo

Comments (10)

patacongo avatar patacongo commented on June 6, 2024

There is a more complete discussion of how ECANCELED is required to support cancellation here: https://cwiki.apache.org/confluence/display/NUTTX/Cancellation+Points I think from that is is clear how nxsem_[timed]wait_uninterruptible breaks cancellation points.

from incubator-nuttx.

patacongo avatar patacongo commented on June 6, 2024

There are three interfaces affected by this issue: nxsem_wait_uninterruptible(), nxsem_timedwait_uninterruptible(), and nxsem_tickwait_uninteruptible. PR #645 handles the latter two cases.

The full solution will require similar modificatinos to nxsem_wait_uninterruptible(). That, however, is a much larger effort because nxsem_wait_uninterruptible is used much more frequently, and also because the return value is usually ignored. Those cases must all be changed to that they handle the -ECANCELED error return value.

Even though there are many files, the change would still be rather easy if it were not for the fact that all of the modified files would also need to go throught nxstyle. Most of the files requiring modificatin are under arch/ and those historically have more coding standard issues.

from incubator-nuttx.

patacongo avatar patacongo commented on June 6, 2024

The final change for to nxsem_wait_uninterruptible() looks like it would require modifying 202 files, 116 of those are under arch/ and would most like require significant time to pass nxstyle cleanly.

Perhaps the best strategy would be to first begin changing callers of nxsem_wait_uninterruptible and make certain that they handle the return value properly. That can be be done a little at a time with no risk and with small effort for each group of files.

When all of the callers handle the return value, then the core change to nxsem_wait_uninterruptle() and be made and the job is finished.

from incubator-nuttx.

patacongo avatar patacongo commented on June 6, 2024

#646 handles the return values from nxsem_wait_uninterruptible() under graphics/, mm/, net/, sched/, wireless/bluetooth. Still do do: fs/, drivers,/, and arch/

from incubator-nuttx.

xiaoxiang781216 avatar xiaoxiang781216 commented on June 6, 2024

@patacongo semaphore has two usage in NuttX:
1.Protect the shared data just like a light pthread_mutex_t
2.Signal something happen just like a light phtread_cond_t
Should we return -ECANCEL/-EINTR just only for item 2?

from incubator-nuttx.

patacongo avatar patacongo commented on June 6, 2024

I refer to those use cases as mutual exclusion sempahores, and signaling semaphores. I think both should work the same: For both the semaphore wait should be terminated and correct error should be returned.

If the API requires returning EINTR if a signal is received, we should return the error. This is required in most POSIX interfaces and don't see how you could avoid that. However, the mutual exclusion semaphores are the just there for the correctness in the design. There is seldom any real competitor for the exclusion semaphore and, in general, they never block and, hence, would never generate the error.

And even if taking the mutual exclusion semaphore does cause the thread to block, it should block form only a very brief moment of time and the likelihood of a signal received or of the task being canceled is very small.

But if there is even a small possibility of that event happening, it WILL happen in an embedded system.

But if during the that brief moment while the thread is blocked, a signal is received, I think that it is correct to return EINTR. This is not so critical only because such an event it is rare. But the specification requires it so we should do it.

But ECANCELED error is ABSOLUTELY critical to return under any circumstance. That cancellation notification is received only once and it if is ignored, then the thread will not cancel it will continue to run. It really must return immediately with ECANCELED for the cancellation to work quickly and reliably, where ever ECANCELED is detected. Ignoring it is a hard bug in any case.

The signaling semaphores are a different story. The may block for a very long time, for example, waiting for the receipt of data that will never come. For example, a read from a serial port will hang indefinitely if nothing is received on the serial port. So if a signal is received or if the thread is canceled, then it is essential to terminate the semaphore wait and return the error condition. But we have no difference of opinion on that.

There is really no difference in the actions that must be taken if a signal is received of if the task is canceled. Both mutual exclusion semaphores and signaling semaphores should wake up and return the error condition. The only real differences is that mutual exclusion semaphores either do not block or, if they do, the do not block as long.

from incubator-nuttx.

patacongo avatar patacongo commented on June 6, 2024

Currently, the only way for an OS interface function to know if a signal was received is to be awakened with an EINTR error. That is why is is critical to always return the EINTR error and conform with the POSIX requirements.

However, if a signal is received while the OS interface is NOT waiting, then there is no way to know if the signal was received. I think that is a limitation in the current design. There probably should be some global indication, perhaps in the TCB, that a signal has been received.

There is already a TCB_FLAG_CANCEL_PENDING in the TCB that will tell us that the if the task has been canceled. That flag is tested in the leave_cancellation_point() function so I don't think that there is any corresponding issue for thread cancellation. We just need to make sure that all waits are aborted if ECANCELED is received and let the error indication ripple all they back to the leave_cancellation_point() function. Then the function will exit cleanly, safely, and quickly.

Interestingly, the ECANCELED error will never be seen by the application. It just triggers the return uwind sequence where all resources are recovered and finally until leave_cancellation_point is called -- then the thread will exit before it returns to applicaton.

You can see all of this working in the board/sim/sim/sim/configs/ostest configuration if you also enable:

CONFIG_CANCELLATION_POINTS=y
CONFIG_PTHREAD_CLEANUP=y

There is a cancel.c test, but the more intesting is the pthread_cleanup.c test. You can see how all this works when the code unwinds with the ECANCELED error and calls leave_cancellation_point. Just single step through pthread_cond_wait() to see this.

PR #749 adds those settingst to that sim otest defconfig

from incubator-nuttx.

patacongo avatar patacongo commented on June 6, 2024

Returning EINTR really only applies to read() and write() (and open()) functions as covered by #669 . In other places, EINTR can be ignored.

open() and close() should also return EINTR. But i don't think that is very meaningful for close since most people ignore the return value from close().

But, on the other hand, ECANCELED can never be ignore under any condition.

from incubator-nuttx.

patacongo avatar patacongo commented on June 6, 2024

Most of the content of this Issue has been preserved as an Addendum to the "Cancellation Points" Wiki Page: https://cwiki.apache.org/confluence/display/NUTTX/Cancellation+Points

from incubator-nuttx.

patacongo avatar patacongo commented on June 6, 2024

And even if taking the mutual exclusion semaphore does cause the thread to block, it should block form only a very brief moment of time and the likelihood of a signal received or of the task being canceled is very small.

There is an exception to this. A driver that reads data could have a sequence like this:

  1. Get exclusive access to the driver by taking a semaphore
  2. Wait on a semaphore for data (might be a long time)
  3. Release the mutual exclusion semaphore.

Now suppose there are two applications, Task A and Task B, that open the driver and try to read from it. The first, Task A, will get the mutual exclusion semaphore at (1) and then will hang waiting for data at (2). But the second, Task B, will hang at (1) for very long time. It is effectively queued and cannot event begin its wait on the signaling semphore until Task B receives its data and releases the mutual exclusion semaphore. This normal behavior and exactly, how this is supposed to work.

But now suppose that Task B receives a signal interrupt or a cancellation event while waiting on the mutual exclusion semaphore. It MUST terminate the wait and return -EINTR or -ECANCEL immediately. Task A may never receive data and if that action is not taken, the signal interrup or cancellation event is lost and the functionality has failed.

from incubator-nuttx.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    šŸ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. šŸ“ŠšŸ“ˆšŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ā¤ļø Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.