Hi Unidata/Steve,
I would really like to see a "-wait" or equivalent option added to PIPE/EXEC actions to effectively limit the number of processes one pqact can have active at one time. This flag would cause pqact to not recycle the slot until that process has exited. I think I discussed this with you many years ago and you were not enthusiastic about it, since a misbehaving process could wedge and effectively jam up pqact as well, with pqact stuck waiting for it to exit...
The issue is that any process following the one-product-per-execution model could effectively DoS a system, as pqact execs off one process per product received. Starting up LDM after a considerable downtime is one example. Another is products that arrive in rapid succession...
Currently, users have two options:
- Allow their process to handle more than one product on stdin, effectively making it long running
- Add some locking mechanism that checks to see if others like it are currently running and then sleeps for a bit waiting for those to exit.
I personally loathe option 2 as having potentially hundreds of scripts writing lock files and sleeping is a race condition waiting to happen. I have written lots of processes that do option 1, but not all are well suited for it. For example, satellite data processors.
A nice aspect of this is that pqact could then log non-zero exit statuses from these '-wait' processes, which would help users debug them. Perhaps some other logging would already kick in if pqact had no available slots for some given amount of time; I am unsure of that one.
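To make the requested semantics concrete, here is a rough Python sketch of what I have in mind. It is only an illustration and not how pqact is implemented: SlotPool, max_slots, run, reap and drain are names I made up, and a real implementation would presumably react to SIGCHLD rather than poll on a timer.

```python
import subprocess
import sys
import time


class SlotPool:
    """Hypothetical stand-in for pqact's child-process slots (not LDM code)."""

    def __init__(self, max_slots):
        self.max_slots = max_slots
        self.children = []   # processes still holding a slot
        self.failures = []   # (pid, exit status) for non-zero exits

    def reap(self):
        """Free the slots of children that have exited and log failures."""
        still_running = []
        for child in self.children:
            status = child.poll()  # None while the child is still running
            if status is None:
                still_running.append(child)
            elif status != 0:
                self.failures.append((child.pid, status))
                print(f"child {child.pid} exited with status {status}",
                      file=sys.stderr)
        self.children = still_running

    def run(self, argv, product):
        """Start argv with one product on stdin, waiting for a free slot."""
        self.reap()
        while len(self.children) >= self.max_slots:
            time.sleep(0.05)  # assumption: polling stands in for SIGCHLD
            self.reap()
        child = subprocess.Popen(argv, stdin=subprocess.PIPE)
        try:
            child.stdin.write(product)
            child.stdin.close()  # one product, then EOF on stdin
        except BrokenPipeError:
            pass  # the child exited without reading its input
        self.children.append(child)
        return child

    def drain(self):
        """Wait for every remaining child, then free the slots."""
        for child in self.children:
            child.wait()
        self.reap()
```

With max_slots=2, a third call to run() does not start its process until one of the first two has exited, which is the back-pressure I am asking for. It also shows the downside you raised: a child that never exits holds its slot forever.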
I think a reasonable expectation is for '-wait' to imply '-close' as well. I'd be happy to provide feedback if there are other edge cases you anticipate. Thanks for your consideration :)