Thanks for the report. I believe I've reproduced this problem in the
past and have a patch that should limit the churn. I'll update later
with a cleaned up patch and more details.
- seth
On May 14, 2012, at 14:55, Jeremy Raymond [email protected] wrote:
In pooler:take_member_from_pool/3, when there are no free pids we try to add some:

    take_member_from_pool(...) ->
        case Free of
            [] when NumInUse =:= Max ->
                % snip
            [] when NumInUse < Max ->
                case add_pids(PoolName, 1, State) of % JR: try to add pids
                    {ok, State1} ->
                        % JR: try to take again; take_member/3 calls take_member_from_pool/3
                        take_member(PoolName, From, State1);
                    {max_count_reached, _} ->
                        % snip

In add_pids/3, if we're unable to add any pids, say because the backend is down, we end up back in the same spot with no free pids. We then try to add pids again, endlessly recursing until we succeed. Each time through, new worker processes are spawned to create the connection via start_n_pids/4's call to supervisor:start_child/2. This is causing me problems: each failed connection to the backend generates errors in the logs, filling up the disk at a fast rate. We need some way to detect this condition and error out rather than spinning. This also blocks the pooler:take_member() call until the caller receives a gen_server timeout error.
Reply to this email directly or view it on GitHub:
#6
from pooler.
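The failure mode reported above can be sketched as follows. This is a minimal, self-contained illustration with made-up names (take_sketch, StartFun), not pooler's actual internals: the key idea is that when the member-start function fails, the code surfaces an error instead of recursing back into itself.

```erlang
%% Sketch of a take_member that errors out instead of looping when the
%% member-start function cannot add any pid. Illustrative names only.
-module(take_sketch).
-export([take_member/2]).

%% Free     :: [term()] -- currently free members
%% StartFun :: fun(() -> {ok, Member} | {error, Reason})
%%             stand-in for add_pids/start_n_pids
take_member([], StartFun) ->
    case StartFun() of
        {ok, Member} ->
            {Member, []};
        {error, _Reason} ->
            %% backend unreachable: surface the error rather than
            %% retrying add_pids in a tight loop
            error_no_pids
    end;
take_member([Member | Rest], _StartFun) ->
    {Member, Rest}.
```

With a free member available the start function is never called; with none, a single failed start attempt returns error_no_pids immediately.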
So I've updated (rebased and added a test) the limit_failed_adds branch. From the commit log:
commit 25dc19d7f0aea96bee4bbe81a14b4ee5222df55d
Author: Seth Falcon [email protected]
Date: Thu Feb 9 16:23:36 2012 -0800
Crash the pooler gen_server if too many failed adds occur
In a situation where pooler is unable to add new pool members, pooler
can end up looping on attempting to add a member, member crashes,
attempting to add again.
This patch adds a failed_adds counter to the pool record. When this
counter reaches the init_count for the pool, pooler itself crashes. The
failed_adds is reset anytime a call to add_pids is made in which there
were no failures (all requested members added).
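The counter behavior described in that commit message could be sketched roughly like this (illustrative module and function names, not the actual patch, which keeps the counter in pooler's pool record):

```erlang
%% Sketch of the failed_adds bookkeeping: the counter accumulates the
%% shortfall of each add attempt and resets to zero whenever a call to
%% add members succeeds completely.
-module(failed_adds_sketch).
-export([record_adds/3]).

%% Requested :: non_neg_integer() -- members we tried to add
%% Added     :: non_neg_integer() -- members actually started
%% Failed    :: non_neg_integer() -- current failed_adds counter
record_adds(Requested, Added, _Failed) when Added =:= Requested ->
    0;  % no failures this round: reset the counter
record_adds(Requested, Added, Failed) ->
    Failed + (Requested - Added).
```

Once the counter reaches init_count, the patch crashes the pooler gen_server rather than letting the add loop continue.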
Will this resolve your issue?
Thanks, this helps solve the runaway process creation issue. What about returning an error instead of crashing? If an error were returned I could do something more reasonable than crashing when calling take_member/0:
    case pooler:take_member() of
        error_no_members ->
            % handle error
        error_no_pids -> % JR: this is new
            % handle error rather than crashing
        Pid ->
            % do some work
    end
Maybe even just returning error_no_members would be appropriate. A backend being temporarily unavailable (network outage or backend upgrade) seems like a normal situation rather than something exceptional and worth crashing over.
That's not a bad idea. I initially opted for a crash because it applies at least a little back-pressure on attempts to add new pids. For example, if the client code you are pooling is noisy, logging-wise, when it crashes on a bad start, then pooler sitting in a tight loop trying to start clients could cause your VM issues such as filling up the logs.
So I wonder if what is really needed is a more explicit mechanism to back-off attempting to start clients when failures are frequent.
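One common shape for such a back-off is an exponential delay keyed on the number of consecutive failed adds. The sketch below is purely illustrative (the base and cap values are made up, and this is not pooler configuration):

```erlang
%% Sketch of start throttling: delay before the next attempt to start
%% a member grows exponentially with consecutive failures, up to a cap.
-module(backoff_sketch).
-export([delay_ms/1]).

-define(BASE_MS, 100).    %% delay after the first failure (assumed)
-define(MAX_MS, 30000).   %% upper bound on the delay (assumed)

%% Failures :: integer() -- consecutive failed add attempts
delay_ms(Failures) when Failures =< 0 ->
    0;
delay_ms(Failures) ->
    %% 100, 200, 400, 800, ... capped at 30000 ms
    min(?MAX_MS, ?BASE_MS bsl (Failures - 1)).
```

The pool would sleep (or schedule a timer) for delay_ms(Failures) before the next start attempt, keeping failing client starts from flooding the logs.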
What would be preferable to me is for pooler to indicate to the client that the backend is down, so the client can do something smart about it, like display a message to the user or wait and retry later. If pooler just crashes, then the linked client crashes, as in my case. If pooler itself tries to wait or back off, it may be slow to return control to the caller, causing potential timeouts on the client end.
What you've described makes sense and I will amend the patch to return
an error and not crash pooler.
I think I may not have explained myself very well, however. In
suggesting a back off mechanism I wasn't imagining that this would
make clients wait -- return an error and let them decide what to do
next. Rather, I'm wondering if some back off would be sensible to
avoid the log storm issue since failing client starts are likely to
trigger a log storm -- so the mechanism that adds clients should
perhaps contain some start throttling.
Also, somewhat unrelated, I'd like to modify pooler to use monitors so
that a pooler crash would not have to crash clients.
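The monitor idea can be sketched as below. This is not pooler's implementation, just an illustration of the primitive: unlike a link, a monitor is one-directional, so the watching process learns when the consumer dies, but the watcher's own crash never propagates an exit signal to the consumer.

```erlang
%% Sketch: track consumers with monitors instead of links, so a crash
%% of the pool process does not take its consumers down with it.
-module(monitor_sketch).
-export([watch/1, unwatch/1]).

%% Monitor a consumer pid. If it dies, the caller receives a
%% {'DOWN', Ref, process, Pid, Reason} message and can reclaim the
%% member that pid had checked out.
watch(ConsumerPid) ->
    erlang:monitor(process, ConsumerPid).

%% Stop watching; [flush] removes any 'DOWN' message already queued.
unwatch(Ref) ->
    erlang:demonitor(Ref, [flush]).
```

A pool using this pattern handles the 'DOWN' messages in handle_info/2 to return members to the free list.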
On May 16, 2012, at 6:06 PM, Jeremy Raymond [email protected] wrote:
What would be preferable to me is for pooler to indicate to the client that the backend is down, so the client can do something smart about it, like display a message to the user or wait and retry later. If pooler just crashes, then the linked client crashes, as in my case. If pooler itself tries to wait or back off, it may be slow to return control to the caller, causing potential timeouts on the client end.
Reply to this email directly or view it on GitHub:
#6 (comment)
I've pushed a different fix to master that I think addresses this issue. It causes take_member to return error_no_members after making one attempt to add a new member when there are no free members and the pool count is less than max_count. The number of retries is configurable.
Please let me know if this resolves the issue you are seeing with a temporary backend outage causing problems.
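With this fix, a caller that wants to ride out a brief outage can wrap take_member in its own bounded retry loop. A minimal sketch (TakeFun is a stand-in for pooler:take_member/0; names are illustrative):

```erlang
%% Sketch of caller-side retry now that take_member returns
%% error_no_members instead of crashing the pool.
-module(retry_sketch).
-export([take_with_retry/3]).

%% TakeFun :: fun(() -> pid() | error_no_members)
%% Retries :: non_neg_integer(), SleepMs :: non_neg_integer()
take_with_retry(_TakeFun, 0, _SleepMs) ->
    error_no_members;  % give up after the retry budget is spent
take_with_retry(TakeFun, Retries, SleepMs) ->
    case TakeFun() of
        error_no_members ->
            timer:sleep(SleepMs),
            take_with_retry(TakeFun, Retries - 1, SleepMs);
        Pid ->
            {ok, Pid}
    end.
```

The client decides the policy (retry count, pause, or surface an error to the user) instead of pooler blocking or crashing on its behalf.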
I think the fix on master addresses the out-of-control issue. With the new code, new member creation is only attempted once per take_member call (though this is configurable if more retries are desired).