
pooler - An OTP Process Pool Application

The pooler application allows you to manage pools of OTP behaviors such as gen_servers, gen_fsms, or supervisors, and provide consumers with exclusive access to pool members using pooler:take_member.

https://github.com/epgsql/pooler/actions/workflows/ci.yml/badge.svg

What pooler does

Protects the members of a pool from being used concurrently

The main pooler interface is pooler:take_member/1 and pooler:return_member/3. The pooler server will keep track of which members are in use and which are free. There is no need to call pooler:return_member if the consumer is a short-lived process; in this case, pooler will detect the consumer’s normal exit and reclaim the member. To achieve this, pooler tracks the calling process of take_member as the consumer of the pool member. Thus pooler assumes that there is no middle-man process calling take_member and handing out the member pid to another worker process.
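For illustration, a minimal sketch of the short-lived-consumer case (the pool name my_pool and the do_work/1 function are hypothetical):

%% The worker takes a member, uses it, and exits normally; pooler detects
%% the exit and reclaims the member, with no explicit return_member call.
spawn(fun() ->
              case pooler:take_member(my_pool) of
                  error_no_members -> ok;     % pool exhausted
                  Pid -> do_work(Pid)         % use the member, then exit
              end
      end).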

Maintains the size of the pool

You specify an initial and a maximum number of members in the pool. Pooler will create new members on demand until the maximum member count is reached. New pool members are added to replace members that crash. If a consumer crashes, the member it was using will be destroyed and replaced. You can configure Pooler to periodically check for and remove members that have not been used recently to reduce the member count back to its initial size.

Manages multiple pools

You can use pooler to manage multiple independent pools and multiple grouped pools. Independent pools allow you to pool clients for different backend services (e.g. postgresql and redis). Grouped pools can optionally be accessed using pooler:take_group_member/1 to provide load balancing of the pools in the group. A typical use of grouped pools is to have each pool contain clients connected to a particular node in a cluster (think database read slaves). Pooler’s take_group_member function will randomly select a pool in the group to fetch a member from. If the randomly selected pool has no free members, pooler will attempt to obtain a member from each pool in the group. If there is no pool with available members, pooler will return error_no_members.

Motivation

The need for pooler arose while writing an Erlang-based application that uses Riak for data storage. Riak’s protocol buffer client is a gen_server process that initiates a connection to a Riak node. A pool is needed to avoid spinning up a new client for each request in the application. Reusing clients also has the benefit of keeping the vector clocks smaller since each client ID corresponds to an entry in the vector clock.

When using the Erlang protocol buffer client for Riak, one should avoid accessing a given client concurrently. This is because each client is associated with a unique client ID that corresponds to an element in an object’s vector clock. Concurrent action from the same client ID defeats the vector clock. For some further explanation, see post 1 and post 2. Note that concurrent access to Riak’s pb client is actually OK as long as you avoid updating the same key at the same time. So the pool needs to have checkout/checkin semantics that give consumers exclusive access to a client.

On top of that, in order to evenly load a Riak cluster and be able to continue in the face of Riak node failures, consumers should spread their requests across clients connected to each node. The client pool provides an easy way to load balance.

Since writing pooler, I’ve seen it used to pool database connections for PostgreSQL, MySQL, and Redis. These uses led to a redesign to better support multiple independent pools.

Usage and API

Pool Configuration via application environment

Pool configuration is specified in the pooler application’s environment. This can be provided in a config file using -config or set at startup using application:set_env(pooler, pools, Pools). Here’s an example config file that creates two pools of Riak pb clients, each talking to a different node in a local cluster, and one pool talking to a PostgreSQL database:

% pooler.config
% Start Erlang as: erl -config pooler
% -*- mode: erlang -*-
% pooler app config
[
 {pooler, [
         {pools, [
                  #{name => rc8081,
                    group => riak,
                    max_count => 5,
                    init_count => 2,
                    start_mfa =>
                     {riakc_pb_socket, start_link, ["localhost", 8081]}},

                  #{name => rc8082,
                    group => riak,
                    max_count => 5,
                    init_count => 2,
                    start_mfa =>
                     {riakc_pb_socket, start_link, ["localhost", 8082]}},

                  #{name => pg_db1,
                    max_count => 10,
                    init_count => 2,
                    start_mfa =>
                     {epgsql, connect, [#{host => "localhost", username => "user", database => "base"}]}}
                 ]}
           %% if you want to enable metrics, set this to a module with
           %% an API conformant to the folsom_metrics module.
           %% If this config is missing, then no metrics are sent.
           %% {metrics_module, folsom_metrics}
        ]}
].

Each pool has a unique name, specified as an atom, an initial and maximum number of members, and an {M, F, A} describing how to start members of the pool. When pooler starts, it will create members in each pool according to init_count. Optionally, you can indicate that a pool is part of a group. You can use pooler to load balance across pools labeled with the same group tag.

Culling stale members

The cull_interval and max_age pool configuration parameters control how (or whether) the pool is returned to its initial size after a traffic burst. Both parameters take a time value, specified as a tuple with the intended units. The following examples are all valid:

%% two minutes, your way
{2, min}
{120, sec}
{120000, ms}

The cull_interval determines how often a check for stale members is made. Checks are scheduled using erlang:send_after/3, which provides a lightweight timing mechanism; the next check is scheduled after the prior check completes.

During a check, pool members that have not been used in more than max_age will be removed until the pool size reaches init_count.

The default value for cull_interval is {1, min}. You can disable culling by specifying a value of {0, min}. The max_age parameter defaults to {30, sec}.
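A sketch of a pool map with explicit culling parameters, assuming they are given alongside the other configuration keys (the values shown are the defaults described above; the surrounding keys are copied from the earlier example):

#{name => pg_db1,
  init_count => 2,
  max_count => 10,
  cull_interval => {1, min},   % {0, min} disables culling
  max_age => {30, sec},
  start_mfa =>
   {epgsql, connect, [#{host => "localhost", username => "user", database => "base"}]}}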

Pool Configuration via pooler:new_pool

You can create pools using pooler:new_pool/1, which accepts a map of pool configuration. Here’s an example:

PoolConfig = #{
    name => rc8081,
    group => riak,
    max_count => 5,
    init_count => 2,
    start_mfa => {riakc_pb_socket, start_link, ["localhost", 8081]}
},
pooler:new_pool(PoolConfig).
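A pool created this way can later be removed with pooler:rm_pool/1, which is also referenced in the issues below. A minimal sketch, reusing PoolConfig from above:

pooler:new_pool(PoolConfig),
%% ... take and return members as usual ...
pooler:rm_pool(rc8081).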

Dynamic pool reconfiguration

Pool configuration can be changed at runtime:

pooler:pool_reconfigure(rc8081, PoolConfig#{max_count => 10, init_count => 4}).

This updates the pool’s state, starting or stopping workers if necessary, joining or leaving the group, rescheduling the cull timer, and so on. The only parameters that can’t be updated are name and start_mfa.

Note, however, that an updated configuration won’t survive a pool crash: the supervisor will restart the pool with its old configuration. This should not normally happen.

Using pooler

Here’s an example session:

pooler:start().
P = pooler:take_member(mysql),
% use P
pooler:return_member(mysql, P, ok).

Once started, the main interaction you will have with pooler is through two functions, take_member/1 and return_member/3 (or return_member/2).

Call pooler:take_member(Pool) to obtain the pid belonging to a member of the pool Pool. When you are done with it, return it to the pool using pooler:return_member(Pool, Pid, ok). If you encountered an error using the member, you can pass fail as the third argument. In this case, pooler will permanently remove that member from the pool and start a new member to replace it. If your process is short-lived, you can omit the call to return_member. In this case, pooler will detect the normal exit of the consumer and reclaim the member.
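A common pattern is to wrap checkout and checkin in a helper so the member is always returned, even when the work throws. This is a sketch, not part of pooler’s API; with_member/2 is a hypothetical helper you would define yourself:

with_member(Pool, Fun) ->
    case pooler:take_member(Pool) of
        error_no_members ->
            {error, no_members};
        Pid when is_pid(Pid) ->
            try Fun(Pid) of
                Result ->
                    pooler:return_member(Pool, Pid, ok),
                    {ok, Result}
            catch
                Class:Reason:Stacktrace ->
                    %% Report the member as failed so pooler destroys and
                    %% replaces it, then rethrow the original error.
                    pooler:return_member(Pool, Pid, fail),
                    erlang:raise(Class, Reason, Stacktrace)
            end
    end.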

If you would like to obtain a member from a randomly selected pool in a group, call pooler:take_group_member(Group). This will return a Pid which must be returned using pooler:return_group_member/2 or pooler:return_group_member/3.
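A minimal sketch, assuming the riak group from the configuration example above:

case pooler:take_group_member(riak) of
    error_no_members ->
        {error, no_members};   % no pool in the group had a free member
    Pid ->
        %% ... use the member, e.g. issue a request to the selected node ...
        pooler:return_group_member(riak, Pid, ok)
end.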

pooler as an included application

In order for pooler to start properly, all applications required to start a pool member must be started before pooler starts. Since pooler does not depend on its members, and since OTP may parallelize application starts for applications with no detectable dependencies, this can cause problems. One way to work around this is to specify pooler as an included application in your app. This means you will start pooler’s top-level supervisor from your app’s top-level supervisor, regaining control over the application start order. To do this, remove pooler from the list of applications in your_app.app and add it to the included_applications key:

{application, your_app,
 [
  {description, "Your App"},
  {vsn, "0.1"},
  {registered, []},
  {applications, [kernel,
                  stdlib,
                  crypto,
                  mod_xyz]},
  {included_applications, [pooler]},
  {mod, {your_app, []}}
 ]}.

Then start pooler’s top-level supervisor with something like the following in your app’s top-level supervisor:

PoolerSup = {pooler_sup, {pooler_sup, start_link, []},
             permanent, infinity, supervisor, [pooler_sup]},
{ok, {{one_for_one, 5, 10}, [PoolerSup]}}.
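For context, a complete top-level supervisor embedding that child spec might look like the following minimal sketch (the module name your_app_sup is hypothetical):

-module(your_app_sup).
-behaviour(supervisor).

-export([start_link/0, init/1]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

init([]) ->
    %% pooler's top-level supervisor, supervised alongside your own children
    PoolerSup = {pooler_sup, {pooler_sup, start_link, []},
                 permanent, infinity, supervisor, [pooler_sup]},
    {ok, {{one_for_one, 5, 10}, [PoolerSup]}}.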

Metrics

You can enable metrics collection by adding a metrics_module entry to pooler’s app config. Metrics are disabled by default. The module specified must have an API matching that of the folsom_metrics module in folsom (to use folsom, specify {metrics_module, folsom_metrics} and ensure that folsom is in your code path and has been started).
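As a sketch of the expected shape, a do-nothing metrics module might look like the following. This assumes pooler reports metrics through a folsom_metrics-style notify/3; verify the exact callback set against your pooler version:

-module(null_metrics).
-export([notify/3]).

%% Accept and discard every metric notification.
notify(_Name, _Value, _Type) ->
    ok.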

When enabled, the following metrics will be tracked:

Metric Label                    Description
pooler.POOL_NAME.take_rate      meter recording the rate at which take_member is called
pooler.error_no_members_count   counter of how many times take_member has returned error_no_members
pooler.killed_free_count        counter of how many members have been killed while in the free state
pooler.killed_in_use_count      counter of how many members have been killed while in the in_use state
pooler.event                    history of various error conditions

Demo Quick Start

  1. Clone the repo:
    git clone https://github.com/epgsql/pooler.git
        
  2. Build and run tests:
    cd pooler; make && make test
        
  3. Start a demo
    make run
    
    Erlang R16B03 (erts-5.10.4) [source] [64-bit] [smp:8:8] [async-threads:10] [kernel-poll:false]
    
    Eshell V5.10.4  (abort with ^G)
    1> pooler:start().
    ok
    2> M = pooler:take_member(pool1).
    <0.44.0>
    3> pooled_gs:get_id(M).
    {"p1",#Ref<0.0.0.38>}
    4> M2 = pooler:take_member(pool1).
    <0.45.0>
    5> pooled_gs:get_id(M2).
    {"p1",#Ref<0.0.0.40>}
    6> pooler:return_member(pool1, M, ok).
    ok
    7> pooler:return_member(pool1, M2, ok).
    ok
        

Implementation Notes

Overview of supervision

(Supervision tree diagram: ./doc/pooler-sup-tree.png)

The top-level supervisor is pooler_sup. It supervises one supervisor for each pool configured in pooler’s app config.

At startup, a pooler_NAME_pool_sup is started for each pool described in pooler’s app config with NAME matching the name attribute of the config.

The pooler_NAME_pool_sup starts the gen_server that registers as pooler_NAME_pool, as well as a pooler_NAME_member_sup that is used to start and supervise the members of this pool. The pooler_starter_sup is used to start temporary workers that manage async member start.

pooler_sup: one_for_one
pooler_NAME_pool_sup: one_for_all
pooler_NAME_member_sup: simple_one_for_one
pooler_starter_sup: simple_one_for_one

Groups of pools are managed using the pg (OTP 23+) or pg2 (OTP below 23) application. This imposes a requirement to set a configuration parameter on the kernel application in an OTP release, like this in sys.config:

% OTP_RELEASE >= 23
{kernel, [{start_pg, true}]}
% OTP_RELEASE < 23
{kernel, [{start_pg2, true}]}

Contribute

All contributions are welcome!

Pooler uses the rebar3 fmt code formatter. Please run make format before committing any code.

In pooler we are trying to maintain high test coverage. Run make test to ensure code coverage does not fall below a threshold (it is automatically validated).

Pooler is quite sensitive to performance regressions. We do not run benchmarks in CI, so to make sure your change does not make pooler slower, please run the benchmarks before and after your changes and check that there are no major regressions on the most recent OTP release. The workflow is:

$ git checkout master
$ rebar3 bench --save-baseline master  # run benchmarks, save results to `master` file
$ git checkout -b <my feature branch>

# <do your code changes>

$ rebar3 bench --baseline master  # run benchmarks on updated code, compare results with `master` results
$ git commit ... && git push ...

Please attach the output of rebar3 bench --baseline master after your changes to the PR description to show that there are no performance regressions, and state the OTP version you ran the benchmarks on.

New release

Our goal is to allow hot code upgrades of pooler, so it ships with an .appup file, and the hot-upgrade procedure is tested in CI.

To cut a new release, do the following steps:

  1. In src/pooler.app.src: update the vsn
  2. In src/pooler.appup.src: replace the contents with upgrade instructions for the new release (a generic sketch follows this list)
  3. In test/relx-base.config: update pooler’s app version to the previous release (or leave it unversioned)
  4. In test/relx-current.config: update pooler’s app version to the new one
  5. In .github/workflows/hot_upgrade.yml: update from_version to the previous release, and maybe bump the OTP version as well
  6. Push, wait for the green build, tag
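A generic sketch of the .appup shape for step 2 (version numbers are hypothetical; the real instructions depend on what changed between releases):

%% src/pooler.appup.src
{"1.1.0",
 [{"1.0.0", [{load_module, pooler}]}],   %% upgrade instructions
 [{"1.0.0", [{load_module, pooler}]}]}.  %% downgrade instructions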

License

Pooler is licensed under the Apache License Version 2.0. See the LICENSE file for details.


pooler's Issues

Alias pool names

Seth,

Hope you are doing well.

I have successfully tested the dynamic pool configuration and it is working fine. In the same context, I came across a potential need for pool name aliases. In our application, there are different functional modules that share a common resource pool (like Riak connections). Each functional module has its own pool usage pattern and requires different performance latencies. At this time, all such functional modules use one common pool under the same pool name. As a result, pool parameters such as the number of connections are configured taking all usage patterns into account. In the future, though, one common pool will not suffice to scale out. It would be nice to have the ability to add alias names, so that each functional module can access the same underlying pool under its own alias. Then, if there is ever a need to create a separate pool for a given functional module, it would be easy for sys admins to configure one, or even roll back to the original common pool, based on point-in-time scaling requirements.

As I mentioned in one of the previous threads, a pool instance is essentially an OTP gen_server, so if several functional modules access the same pool, the mailbox of that pooler gen_server gets bombarded with requests, adding incremental latency to processing requests for pooled resources. One alternative is to create separate pools from the word go, but that would just be a proliferation of pools without the ability to optimize resources as and when needed.

This is not urgent but wanted to share with you regardless.

Thanks,

Murali

pooler:take_member returns same PID all the time

Hello,
I am using the pooler application from tag 1.5.0 with the following pooler configuration:

{pooler, [
           {pools, [
                    [
                     {name, riak_pool},
                     {group, riak},
                     {max_count, 20},
                     {init_count, 10},
                     %% client_option() :: queue_if_disconnected |
                     %% {queue_if_disconnected, boolean()} |
                     %% {connect_timeout, pos_integer()} |
                     %%     auto_reconnect |
                     %% {auto_reconnect, boolean()}.
                     {start_mfa, {riakc_pb_socket, start_link, ["localhost", 8087,
                         [queue_if_disconnected, auto_reconnect]]}}
                    ]
                   ]
            }
           ]
  }

I run queries using this approach:

Pid = connect(),
... using Pid ...
close(Pid),
...

connect and close functions are below:

connect() ->
    Pid = pooler:take_member(riak_pool, 5000),
    lager:debug("Retrieved PID ~p from pooler", [Pid]),
    Pid.
close(Pid) ->
    lager:debug("Returning PID ~p to pooler", [Pid]),
    pooler:return_member(riak_pool, Pid, ok).

In another module, which has the application behaviour, I start pooler:

application:start(pooler)

The problem is that I retrieve the same Pid all the time. I am sure that many processes are invoking the connect and close functions in parallel, but the same Pid is used every time.

Is there a bug in this version or am I doing something in a wrong way?

Regards,
Szymon Wlodarczyk

Build failure with OTP 18.1

mmartin@testcat:~/git/pooler$ make
WARN: Missing plugins: [rebar_lock_deps_plugin]
==> pooler (get-deps)
Pulling rebar_lock_deps_plugin from {git,
"git://github.com/seth/rebar_lock_deps_plugin.git",
{branch,"master"}}
Cloning into 'rebar_lock_deps_plugin'...
Pulling edown from {git,"git://github.com/seth/edown.git",{branch,"master"}}
Cloning into 'edown'...
WARN: Missing plugins: [rebar_lock_deps_plugin]
==> rebar_lock_deps_plugin (get-deps)
WARN: Missing plugins: [rebar_lock_deps_plugin]
==> edown (get-deps)
WARN: Missing plugins: [rebar_lock_deps_plugin]
WARN: Missing plugins: [rebar_lock_deps_plugin]
==> rebar_lock_deps_plugin (compile)
Compiled src/rldp_util.erl
Compiled src/rldp_change_log.erl
Compiled src/rebar_lock_deps_plugin.erl
==> edown (compile)
Compiled src/edown_make.erl
Compiled src/edown_lib.erl
Compiled src/edown_xmerl.erl
/home/mmartin/git/pooler/deps/edown/src/edown_doclet.erl:116: field packages undefined in record doclet_gen
/home/mmartin/git/pooler/deps/edown/src/edown_doclet.erl:118: field filemap undefined in record doclet_gen
Compiling /home/mmartin/git/pooler/deps/edown/src/edown_doclet.erl failed:
ERROR: compile failed while processing /home/mmartin/git/pooler/deps/edown: rebar_abort
make: *** [compile] Error 1

Dynamically adding a pool

Hi,

Is there a way to add a pool programmatically? As of now, it seems pools have to be set via a configuration that is read at start-up of the pooler. Having ability to add pools dynamically will help when pool parameters have to be determined during run-time.

I wasn't sure if there is a compelling reason for not having that feature currently.

Thanks,

Murali

Unexpected :error_no_members

Hello, Seth!

I'm using pooler in my Elixir app to connect to my favorite music player service (MPD). I'm a bit confused by the behavior I'm seeing from pooler, though. With my config, I do see 2 ExMpd.Connection processes spawn at startup. But when both are in use and I request a third, instead of creating a new one and ramping up toward 5 as I expected, the pooler:take_group_member(:mpd) call returns :error_no_members.

Surely I'm just missing something simple, but I'm not sure what it is! The readme says that new workers should be spawned "on demand" ... but I would have thought calling take_group_member would qualify.

config :pooler, pools: [
  [
    name:       :mpd,
    group:      :mpd,
    init_count: 2,
    max_count:  5,
    start_mfa:  {ExMpd.Connection, :start_link, [mpd_host, mpd_port]}
  ]
]

Thanks in advance for any thoughts!

Adam

blocking API

Instead of returning an error when no workers are available, it'd be nice to have an option to block the calling process until a worker becomes available (with a timeout), like poolboy does.

Pooler app.config

hi,

I'm using pooler and I need to pass a record (defined in an .hrl) as an argument in the MFA. Pooler's config is in the priv directory.

How can I pass a record to the MFA?

thanks,
-- buriwoy

intermittent test failure in travis

Seeing this:


pooler_tests: pooler_groups_test_ (take and return one group member (repeated))...*failed*
in function gen_server:call/2 (gen_server.erl, line 180)
in call from pooler_tests:'-pooler_groups_test_/0-lc$^0/1-0-'/1 (test/pooler_tests.erl, line 405)
in call from pooler_tests:'-pooler_groups_test_/0-fun-7-'/0 (test/pooler_tests.erl, line 403)
**exit:{{nodedown,group_1},{gen_server,call,[{error_no_group,group_1},get_id]}}
=======================================================
  Failed: 1.  Skipped: 0.  Passed: 80.
One or more tests were cancelled.
Cover analysis: /home/travis/build/seth/pooler/.eunit/index.html
ERROR: One or more eunit tests failed.
The command "rebar skip_deps=true eunit" exited with 1.
Done. Your build exited with 1.

https://travis-ci.org/seth/pooler/builds/22953645
https://travis-ci.org/seth/pooler/jobs/22953648

Unexpected "error_no_members" when load testing cowboy server

Hi,

I just built a simple cowboy REST service using the pooler configuration defined in the readme (basically setting up the basic config for the pool and adding it to the project under included_applications). The REST service consists of a single endpoint that queries a database:

hello_to_json(Req, State) ->
    case pooler:take_member(pg_db, {1, min}) of
        error_no_members ->
            Body = jsone:encode(#{error => <<"could not get a database connection">>}),
            io:fwrite("Error!!~n"),
            {Body, Req, State};
        Conn ->
            {ok, _, [{Id, Name, Address, _, City}]} = epgsql:squery(Conn, "select * from person"),
            Body =
                jsone:encode(#{id => Id,
                               name => Name,
                               address => Address,
                               city => City}),
            {Body, Req, State}
    end.

My understanding is that, with the current approach, the process should wait for a minute before returning "error_no_members". That's the case when I try it in the shell; however, when I load test the service with wrk (wrk -t4 -c100 -d30s http://127.0.0.1:3000/), the service returns "error_no_members" many times, even though the wait timeout is longer than the duration of the load test.

Thanks in advance!

erlang/otp 20, pooler 1.5.2, cannot compile

==> "pooler"
Compiling /deps/pooler/src/pooler.erl
Line 20: error {undefined_type,{dict,0}} in "/deps/pooler/src/pooler.hrl"
Line 21: error {undefined_type,{queue,0}} in "/deps/pooler/src/pooler.hrl"
ERROR: []
escript: exception error: no function clause matching
lists:flatten({return,{return,[{error,[]}]}}) (lists.erl, line 616)

Max connection lifetime?

Is there any way to achieve a max connection lifetime in the pool, even for active connections? We're running this in a network with firewalls that kill connections after 1 hour regardless of how active they are. We're seeing a lot of errors in pooler because of closed sockets:

{exit,sock_closed,
      [{gen_server,handle_common_reply,8,[{file,"gen_server.erl"},{line,1208}]},
       {proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,241}]}]}

Or, alternatively, is there a way for pooler or epgsql to handle that message and self-close the connection process when the socket is closed remotely?

Resize pool at runtime

Hello,

I apologize if I missed this in the documentation/tests but is there a way to change the size (max_count) of a pool at runtime? Thanks.

pooler:time_as_{millis, micro}/2 tests break with rebar3 on R15B03-1

from #66:

Travis is failing test cases on R15B03-1, which seems to be related to rebar3 mess (I'm guessing it's not loading paths correctly since it cannot find pooler:time_as_{millis, micro}/2). I'm a strong advocate of using build tools which are agnostic to what OTP version your system is running. A build tool shouldn't change behaviour depending on what target you're aiming for.

What is the benefit of running rebar3 for a dependency project such as Pooler? This project has only edown as a dependency and Pooler isn't doing any advanced project management that requires Rebar3. So why not go with a build tool that's small, doesn't rely on specific OTP versions to work correctly and which also can be used in as many projects as possible?

For example, erlang.mk: as far as I know, rebar3 will invoke make for projects that aren't rebar3-based. Even if it doesn't, we can generate a rebar.config [1] and keep it in the project. This should work fine with erlang.mk-based projects and rebar, as well as with any other build tool that invokes make.

I'm currently tinkering with this but would like to hear what peoples' opinions are about it.

[1] https://erlang.mk/guide/compat.html 11.2

"exit with reason killed in context child_terminated" problem

I don't know why, but I get the following error every one or two minutes.

15:51:30.964 [error] Supervisor pooler_pgsql_pool_member_sup had child epgsql started with {epgsql,connect,undefined} at <0.25555.8> exit with reason killed in context child_terminated
15:51:30.965 [error] Supervisor pooler_pgsql_pool_member_sup had child epgsql started with {epgsql,connect,undefined} at <0.9309.8> exit with reason killed in context child_terminated
15:51:30.965 [error] Supervisor pooler_pgsql_pool_member_sup had child epgsql started with {epgsql,connect,undefined} at <0.31360.7> exit with reason killed in context child_terminated
15:51:30.966 [error] Supervisor pooler_pgsql_pool_member_sup had child epgsql started with {epgsql,connect,undefined} at <0.9305.8> exit with reason killed in context child_terminated
15:51:30.966 [error] Supervisor pooler_pgsql_pool_member_sup had child epgsql started with {epgsql,connect,undefined} at <0.10522.7> exit with reason killed in context child_terminated
15:51:30.966 [error] Supervisor pooler_pgsql_pool_member_sup had child epgsql started with {epgsql,connect,undefined} at <0.25559.8> exit with reason killed in context child_terminated
15:51:30.967 [error] Supervisor pooler_pgsql_pool_member_sup had child epgsql started with {epgsql,connect,undefined} at <0.17923.7> exit with reason killed in context child_terminated
15:51:30.969 [error] Supervisor pooler_pgsql_pool_member_sup had child epgsql started with {epgsql,connect,undefined} at <0.18451.9> exit with reason killed in context child_terminated
15:52:30.965 [error] Supervisor pooler_pgsql_pool_member_sup had child epgsql started with {epgsql,connect,undefined} at <0.18430.9> exit with reason killed in context child_terminated
15:52:30.965 [error] Supervisor pooler_pgsql_pool_member_sup had child epgsql started with {epgsql,connect,undefined} at <0.18448.9> exit with reason killed in context child_terminated

Can anyone help me? Many thanks!

Starting members not actually limited to batch size of init_count

This is not a bug per se, just something I wanted to bring to your attention -- on the off chance it's unintentional or has potential to cause anybody any grief.

Pooler grows the member pool by batches of init_count, but that is not strictly enforced. I don't necessarily think it should be enforced, but some interesting behavior might arise in certain cases because it's not.

What I mean by not strictly enforced is:
https://github.com/seth/pooler/blob/master/src/pooler.erl#L560

Here's a sample case to illustrate the point...all members are currently in use, but there is room for growth:

init_count = 10
max_count = 100
free_pids = []
in_use_count = 10
starting_members = []

Somebody tries to take a member. That results in add_members_async(10, ...) starting up another batch of 10, and it returns error_no_members to the caller. All good so far, and now:

starting_members = [...10 elements...]

Immediately thereafter (prior to any of the new batch being fully started/added/accepted), another caller tries to take a member. This gets evaluated:

NumToAdd = max(min(InitCount - NonStaleStartingMemberCount, NumCanAdd), 1)

...which equates to max(min(10 - 10, 80), 1), so NumToAdd = 1. That results in add_members_async(1, ...) starting up another single member, and it returns error_no_members to the caller.

Now we have 11 members being started. This may not be a problem per se, but if (a) it takes long enough for new members to start, and (b) consumers are trying to take members at a fast enough rate, the number of starting members can grow quite large. The worst case scenario (I believe) is that you can have up to max_count - init_count members starting at once.

(a) and (b) is the perfect storm that can potentially overload things.

This may be a non-issue, but I wanted to bring it to your attention. If this ever bites anybody, a fix would be simple...just placing a configurable hard limit on it, i.e. max_starting_count. It could default to something like max(init_count,1) out of the box, and users could raise/lower it as they see fit.

What do you think?

Cull Members not called automatically

The README indicates that, "Pooler will remove members that have not been used in cull_after minutes". I see functions in pooler.erl to cull members, but I don't see it being called automatically anywhere. I don't see cull_after mentioned anywhere other than in docs. Is this a yet to be completed feature?

Errors when using rm_pool

When using rm_pool to remove a pool, the process ends with an error. Given that rm_pool is not a crash, this probably should not be treated as an error?

Example:

2014-07-30 17:08:52.708 [error] <0.255.0> Supervisor 'pooler_howl@localhost:4240_member_sup' had child mdns_client_lib_worker started with {mdns_client_lib_worker,start_link,undefined} at <0.259.0> exit with reason killed in context child_terminated

Clarify intended return value for pooler:take_group_member/1, and add functionality for destroying pool groups

The README states that

If you would like to obtain a member from a randomly selected pool in a group, call pooler:take_group_member(Group). This will return a {Pool, Pid} pair. You will need the Pool value to return the member to its pool.

However, the code for the actual take_group_member/1 function only returns a pid()

Could we clarify which of these is the intended behaviour?


This was motivated by certain scenarios which I faced, whereby I would like to remove all pools in a particular group.

The way to do that would either be to provide some function like destroy_group/1, or for take_group_member/1 to return the pool name, so that we can call rm_pool/1 to destroy the pool as needed.

Any thoughts on which would be the best approach would be great.

Thanks!

Error with pooler 1.2.1 and sqerl

On startup I get:

=ERROR REPORT==== 30-Sep-2014::21:53:53 ===
** Generic server <0.48.0> terminating
** Last message in was {'$gen_cast',stop}
** When Server state == {starter,
                            {pool,sqerl,undefined,10,5,
                                {sqerl_client,start_link,[]},
                                [],0,0,1,
                                {1,min},
                                {30,sec},
                                pooler_sqerl_member_sup,undefined,
                                {dict,0,16,16,8,80,48,
                                    {[],[],[],[],[],[],[],[],[],[],[],[],[],
                                     [],[],[]},
                                    {{[],[],[],[],[],[],[],[],[],[],[],[],[],
                                      [],[],[]}}},
                                {dict,0,16,16,8,80,48,
                                    {[],[],[],[],[],[],[],[],[],[],[],[],[],
                                     [],[],[]},
                                    {{[],[],[],[],[],[],[],[],[],[],[],[],[],
                                      [],[],[]}}},
                                [],
                                {1,min},
                                pooler_no_metrics,folsom},
                            <0.47.0>,
                            {<0.48.0>,<0.53.0>}}
** Reason for termination ==
** {bad_return_value,
       {stop,normal,stop_ok,
           {starter,
               {pool,sqerl,undefined,10,5,
                   {sqerl_client,start_link,[]},
                   [],0,0,1,
                   {1,min},
                   {30,sec},
                   pooler_sqerl_member_sup,undefined,
                   {dict,0,16,16,8,80,48,
                       {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
                       {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}}},
                   {dict,0,16,16,8,80,48,
                       {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
                       {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}}},
                   [],
                   {1,min},
                   pooler_no_metrics,folsom},
               <0.47.0>,
               {<0.48.0>,<0.53.0>}}}}

eredis and pooler - pair not made in heaven

This has occurred too many times for me and I am not sure what could be the cause. Hoping someone else has experienced the issue and can provide some workarounds.

I manage 5 riak connections and 1 redis via pooler on my application. After N weeks all of a sudden eredis connections start reporting the below while riak connections are fine:

2016-04-04 19:17:53 =SUPERVISOR REPORT====
     Supervisor: {local,pooler_redis01_member_sup}
     Context:    child_terminated
     Reason:     killed
     Offender:   [{pid,<0.4356.64>},{name,eredis},{mfargs,{eredis,start_link,undefined}},{restart_type,temporary},{shutdown,brutal_kill},{child_type,worker}]

The kills continue after the first error, and it doesn't really seem able to recover.

My config is as follows:

[
  {pooler, [
    {pools, [
      [{name, riak01},
       {group, riak},
       {max_count, 5000},
       {init_count, 10},
       {start_mfa, {riakc_pb_socket, start_link, ["riak-01.prod.domain.com", 8087]}}],

      [{name, riak02},
       {group, riak},
       {max_count, 5000},
       {init_count, 10},
       {start_mfa, {riakc_pb_socket, start_link, ["riak-02.prod.domain.com", 8087]}}],

      [{name, riak03},
       {group, riak},
       {max_count, 5000},
       {init_count, 10},
       {start_mfa, {riakc_pb_socket, start_link, ["riak-03.prod.domain.com", 8087]}}],

      [{name, riak04},
       {group, riak},
       {max_count, 5000},
       {init_count, 10},
       {start_mfa, {riakc_pb_socket, start_link, ["riak-04.prod.domain.com", 8087]}}],

      [{name, riak05},
       {group, riak},
       {max_count, 5000},
       {init_count, 10},
       {start_mfa, {riakc_pb_socket, start_link, ["riak-05.prod.domain.com", 8087]}}],

      [{name, redis01},
       {group, redis},
       {max_count, 5000},
       {init_count, 10},
       {start_mfa, {eredis, start_link, ["redis.prod.domain.com", 6379]}}]
     ]}
  ]},

Riak and redis are both operating and being used successfully by other services. Open file descriptors for the beam.smp process are about 106, with a limit of 65536. Unfortunately there's not much else in the log. Any help appreciated.

Node Failure Handling,

  • Pooler will halt OTP startup if one of a group's members is unavailable but the configuration specifies a non-zero init worker count. (We are running into problems in production, with riak ts nodes periodically crashing due to GCE NVMe local disk instability.)

  • Depending on the number of active workers (I have a cluster doing about a million riak writes per minute, and saw cascading failures with 2048 connections per node x 6 riak nodes duplicated across 5 elixir servers), node failure can cascade to halt pooler and the OTP tree.

  • In general, are there any recommended strategies for handling group-member failures gracefully? I could hook up process listeners, for example, and automate pool add/remove or something like that, but a mechanism to serve fewer connections from a group member with a recent high failure rate would be nice, if possible.

:error_no_group when function is called from a separate app

I keep getting

** (exit) exited in: :gen_server.call({:error_no_group, :riak})
** (EXIT) no connection to riak

I have app_users, app_db, and app_api. My tests in app_users run fine, but if a test in app_api calls app_users code, I get "no connection to riak". app_db is the only interface to riak; app_users has a dependency on it, and app_api has a dependency on app_users. The only app with pooler in its deps is app_db, which handles connection pooling to the database.

What am I mixing up?

How to perform a task across all workers in a pool?

Hi

I'm using Pooler for resource management of my Riak connections. What I'm missing at the moment is being able to send a message to all workers in a pool.

What I want to achieve is telling each worker to call riakc_pb_socket:ping/1, or any other operation that is done on an interval basis.

I can't find any support for this through the existing APIs, and the only reasonable approach I can think of is asking supervisor:which_children over the member supervisor. The idea of this type of command invocation is not to actually perform any real work, but to make sure work can be done.
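For what it's worth, the approach described above can be sketched like this; ping_all/1 is a hypothetical helper that relies on the pooler_NAME_member_sup registered-name convention from the Implementation Notes in the README:

ping_all(PoolName) ->
    MemberSup = list_to_atom("pooler_" ++ atom_to_list(PoolName) ++ "_member_sup"),
    %% which_children/1 returns {Id, Child, Type, Modules}; Child may be
    %% 'undefined' or 'restarting', so guard on is_pid/1.
    [riakc_pb_socket:ping(Pid)
     || {_Id, Pid, worker, _Mods} <- supervisor:which_children(MemberSup),
        is_pid(Pid)].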

Is there any better option than the one I mentioned?

Thanks,
Sam

metrics?

I can't seem to get folsom to report metrics for pooler. folsom is up and running: I can get folsom_vm_metrics from my application's console, but nothing for pooler. Any idea what I may be missing?

Here's the content of my sys.config:

{pooler,
[
{pools,
[
[
{name, riak_pool},
{group, riak},
{max_count, 500},
{init_count, 20},
{start_mfa, {riakc_pb_socket, start_link, ["nebriak1", 8087]}},
%% if you want to enable metrics, set this to a module with
%% an API conformant to the folsom_metrics module.
%% If this config is missing, then no metrics are sent.
{metrics_module, folsom_metrics}
]
]
}
]
}

Bug in calculating MaxCull

I'm probably missing something, but this looks like a bug to me:

https://github.com/seth/pooler/blob/master/src/pooler.erl#L636

MaxCull = FreeCount - (InitCount - InUseCount)

Shouldn't it just be:

MaxCull = FreeCount - InitCount

My understanding is that FreeCount decrements as InUseCount increments. If MaxCull serves to ensure that you never cull down below InitCount, then I think the way it's currently coded will over-cull.

Clue me in if I'm missing something. Thanks!

Support OTP 18

Currently, pooler.erl:209 uses erlang:now() to get the seconds component of the time. This causes rebar to error out due to deprecation warnings. There are a few options to maintain backwards compatibility; one is to use http://www.erlang.org/doc/apps/erts/time_compat.erl, which exposes functions that use the new OTP 18 time functions if available and the old ones otherwise. Alternatively, you can disable warnings for deprecated functions in rebar.config.

Should add API for take_group_member/2?

Hi,
I using pooler to manage my db clusters(include MySQL cluster and Mongo cluster ...) via group feature.
I noticed that, take_group_member/1 API's doc:

%% @doc Take a member from a randomly selected member of the group
%% `GroupName'. Returns `MemberPid' or `error_no_members'.  If no
%% members are available in the randomly chosen pool, all other pools
%% in the group are tried in order.

That is, if no members are available in any pool, it will return error_no_members. So, should it instead wait for a member of the chosen pool to become available?

Thanks.

Out of control thread spawning when unable to start PIDs

In pooler:take_member_from_pool/3, when there are no free PIDs, we try to add PIDs:

take_member_from_pool(....) ->
    case Free of
        [] when NumInUse =:= Max ->
           % snip
        [] when NumInUse < Max ->
            case add_pids(PoolName, 1, State) of % JR: try to add pids
                {ok, State1} ->
                    take_member(PoolName, From, State1); % JR: try to take again, take_member/3 calls take_member_from_pool/3
                {max_count_reached, _} ->
% snip    

In add_pids/3, if we're unable to add any PIDs (say, because the backend is down), we end up back in the same spot where there are no free PIDs. We then try to add PIDs again, endlessly recursing until we are able to add some. Each time this happens, new worker processes are spawned to create the connection via start_n_pids/4's call to supervisor:start_child/2. This is causing me issues, as each failed connection to the backend generates errors in the logs, filling up the disk at a fast rate.

We need some way to detect this condition and error out rather than spinning. This also blocks the pooler:take_member() call until the caller receives a gen_server timeout error.
