Giter Site home page Giter Site logo

hash-ring's Introduction

libhashring

Build Status

This library provides a consistent hashing implementation that supports SHA-1 and MD5. It is high performance and suitable for rings with a large number of items (replicas or nodes). The following bindings are available:

  • Erlang
  • Java

License (See LICENSE file for full license)

Copyright 2015 Chris Moos

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Overview

hash_ring exposes a simple API to create/remove rings, add/remove nodes, and locate the node for a key on the ring.

A ring is created by using the hash_ring_create() function.

numReplicas, which determines how many times a node is placed on the ring. This parameter gives you flexibility on how consistent the ring is. Larger values will even out the node distribution on the ring. 128 is a sensible default for most users.

hash_fn is the hash function to use. Currently HASH_FUNCTION_MD5 and HASH_FUNCTION_SHA1 are supported. In tests MD5 is on average about 25% faster.

hash_ring_t *ring = hash_ring_create(128, HASH_FUNCTION_SHA1);

Next you can add some nodes to the ring. For example, if you have a cluster of Redis servers, you might do the following:

char *redis01 = "redis01", *redis02 = "redis02", *redis03 = "redis03";

hash_ring_add_node(ring, (uint8_t*)redis01, strlen(redis01));
hash_ring_add_node(ring, (uint8_t*)redis02, strlen(redis02));
hash_ring_add_node(ring, (uint8_t*)redis03, strlen(redis01));

The ring will now have 384 items, 128 per node (with 3 nodes total).

Node hashing

It is helpful to know how a node is hashed onto the ring, especially if you want to write a compatible library for another language or platform.

hash_ring hashes the node's name followed by the ASCII number of the replica. For example, with 2 replicas, the following pseudo code is used to hash the node:

redis01_0_int = SHA1("redis01" + "0") & 0x000000000000000000000000ffffffffffffffff
redis01_1_int = SHA1("redis01" + "1") & 0x000000000000000000000000ffffffffffffffff

The least significant 64-bits of the digest are used to retrieve the integer.

Note: You can use hash_ring_set_mode to use HASH_RING_MODE_LIBMEMCACHED_COMPAT which will add nodes as "node-index" and will hash to a 32-bit integer instead. This is what libmemcached uses.

Compiling

Compile the library and install with:

sudo make install

This will install the library to /usr/local/lib/libhashring.so. The header file hash_ring.h is copied to /usr/local/include/hash_ring.h.

Running the tests

make test

Bindings

To build all the bindings just run:

make bindings

Example

The following code is from the tests, which hashes some known keys and ensures that they are located on the ring properly.

hash_ring_t *ring = hash_ring_create(8, HASH_FUNCTION_SHA1);
char *slotA = "slotA";
char *slotB = "slotB";

char *keyA = "keyA";
char *keyB = "keyBBBB";
char *keyC = "keyB_";

hash_ring_node_t *node;

assert(hash_ring_add_node(ring, (uint8_t*)slotA, strlen(slotA)) == HASH_RING_OK);
assert(hash_ring_add_node(ring, (uint8_t*)slotB, strlen(slotB)) == HASH_RING_OK);

node = hash_ring_find_node(ring, (uint8_t*)keyA, strlen(keyA));
assert(node != NULL && node->nameLen == strlen(slotA) && memcmp(node->name, slotA, strlen(slotA)) == 0);

node = hash_ring_find_node(ring, (uint8_t*)keyB, strlen(keyB));
assert(node != NULL && node->nameLen == strlen(slotA) && memcmp(node->name, slotA, strlen(slotA)) == 0);

node = hash_ring_find_node(ring, (uint8_t*)keyC, strlen(keyC));
assert(node != NULL && node->nameLen == strlen(slotB) && memcmp(node->name, slotB, strlen(slotB)) == 0);

hash_ring_free(ring);

Erlang

If you use rebar you can put the following in your rebar.config to use libhashring:

{deps, [
 {hash_ring, ".*", {git, "https://github.com/chrismoos/hash-ring.git", "master"}}
]}.

Example

hash_ring:start_link(),
Ring = "myring",
?assert(hash_ring:create_ring(Ring, 8) == ok),
?assert(hash_ring:add_node(Ring, <<"slotA">>) == ok),
?assert(hash_ring:add_node(Ring, <<"slotB">>) == ok),

?assert(hash_ring:find_node(Ring, <<"keyA">>) == {ok, <<"slotA">>}),
?assert(hash_ring:find_node(Ring, <<"keyBBBB">>) == {ok, <<"slotA">>}),
?assert(hash_ring:find_node(Ring, <<"keyB_">>) == {ok, <<"slotB">>}),

?assert(hash_ring:remove_node(Ring, <<"slotA">>) == ok),
?assert(hash_ring:find_node(Ring, <<"keyA">>) == {ok, <<"slotB">>}),
?assert(hash_ring:find_node(Ring, <<"keyBBBB">>) == {ok, <<"slotB">>}).

Benchmarks

The following benchmark was done on a MacBook Pro with the following specs:

  • 2.8 Ghz Intel Core 2 Duo

  • 8GB 1067 MHz DDR3

  • Mac OS X 10.6.6

      ----------------------------------------------------
      bench: replicas = 1, nodes = 1, keys: 1000, ring size: 1
      ----------------------------------------------------
      generating and sorting 1 nodes...done
      generating keys...done
      stats: total = 0.04403s, avg/lookup: 0.44027us, min: 0.43374us, max: 0.73869us, ops/sec: 2272727
      ----------------------------------------------------
      bench: replicas = 1, nodes = 8, keys: 1000, ring size: 8
      ----------------------------------------------------
      generating and sorting 8 nodes...done
      generating keys...done
      stats: total = 0.04563s, avg/lookup: 0.45630us, min: 0.45339us, max: 0.50809us, ops/sec: 2192982
      ----------------------------------------------------
      bench: replicas = 1, nodes = 256, keys: 1000, ring size: 256
      ----------------------------------------------------
      generating and sorting 256 nodes...done
      generating keys...done
      stats: total = 0.04917s, avg/lookup: 0.49172us, min: 0.48684us, max: 0.54416us, ops/sec: 2036660
      ----------------------------------------------------
      bench: replicas = 8, nodes = 1, keys: 1000, ring size: 8
      ----------------------------------------------------
      generating and sorting 1 nodes...done
      generating keys...done
      stats: total = 0.04559s, avg/lookup: 0.45587us, min: 0.45276us, max: 0.46053us, ops/sec: 2197802
      ----------------------------------------------------
      bench: replicas = 8, nodes = 32, keys: 1000, ring size: 256
      ----------------------------------------------------
      generating and sorting 32 nodes...done
      generating keys...done
      stats: total = 0.04896s, avg/lookup: 0.48959us, min: 0.48601us, max: 0.54160us, ops/sec: 2044990
      ----------------------------------------------------
      bench: replicas = 8, nodes = 512, keys: 1000, ring size: 4096
      ----------------------------------------------------
      generating and sorting 512 nodes...done
      generating keys...done
      stats: total = 0.05401s, avg/lookup: 0.54006us, min: 0.53625us, max: 0.60199us, ops/sec: 1851852
      ----------------------------------------------------
      bench: replicas = 512, nodes = 8, keys: 1000, ring size: 4096
      ----------------------------------------------------
      generating and sorting 8 nodes...done
      generating keys...done
      stats: total = 0.05420s, avg/lookup: 0.54198us, min: 0.53301us, max: 1.03515us, ops/sec: 1848429
      ----------------------------------------------------
      bench: replicas = 512, nodes = 16, keys: 1000, ring size: 8192
      ----------------------------------------------------
      generating and sorting 16 nodes...done
      generating keys...done
      stats: total = 0.05576s, avg/lookup: 0.55763us, min: 0.55393us, max: 0.59292us, ops/sec: 1795332
      ----------------------------------------------------
      bench: replicas = 512, nodes = 32, keys: 100000, ring size: 16384
      ----------------------------------------------------
      generating and sorting 32 nodes...done
      generating keys...done
      stats: total = 5.73922s, avg/lookup: 0.57392us, min: 0.57195us, max: 0.58613us, ops/sec: 1745201
      ----------------------------------------------------
      bench: replicas = 512, nodes = 128, keys: 10000, ring size: 65536
      ----------------------------------------------------
      generating and sorting 128 nodes...done
      generating keys...done
      stats: total = 0.61953s, avg/lookup: 0.61953us, min: 0.61636us, max: 0.65255us, ops/sec: 1615509
      ----------------------------------------------------
      bench: replicas = 16, nodes = 1024, keys: 1000, ring size: 16384
      ----------------------------------------------------
      generating and sorting 1024 nodes...done
      generating keys...done
      stats: total = 0.05769s, avg/lookup: 0.57693us, min: 0.57307us, max: 0.63851us, ops/sec: 1736111
    

hash-ring's People

Contributors

cablehead avatar chrismoos avatar limitless083 avatar yrashk avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

hash-ring's Issues

should it balance correctly?

I had a test with erlang:phash/2 using the following information:

Nodes = [ "server1", "server2", "server3", "server4", "server5", "server6" ].

Phone = fun() -> trunc((random:uniform() * 89000000000) + 10000000000) end.
GenPhones = fun(N) -> [ Phone() || _ <- lists:seq(1,N) ] end.
Select = fun(Phones) -> [ lists:nth(erlang:phash(N, length(Nodes)), Nodes) || N <- Phones ] end.
Show = fun(A) -> lists:sort(dict:to_list(lists:foldl(fun(I,D) -> dict:update_counter(I,1,D) end, dict:new(), A))) end.

Phones = GenPhones(10000).
Show(Select(Phones)).

The result is this:

[{"server1",1663},  
 {"server2",1706},
 {"server3",1650},
 {"server4",1713},
 {"server5",1629},
 {"server6",1639}]

And I tried with this library... and the code is here:

Nodes = [ "server1", "server2", "server3", "server4", "server5", "server6" ].

Phone = fun() -> trunc((random:uniform() * 89000000000) + 10000000000) end.
GenPhones = fun(N) -> [ Phone() || _ <- lists:seq(1,N) ] end.
Select = fun(Phones,Ring) -> [ element(2,hash_ring:find_node(Ring,integer_to_binary(N))) || N <- Phones ] end.
Show = fun(A) -> lists:sort(dict:to_list(lists:foldl(fun(I,D) -> dict:update_counter(I,1,D) end, dict:new(), A))) end.

{ok, Ring} = hash_ring:start_link().
hash_ring:create_ring(instances, 6).
[ hash_ring:add_node(instances, N) || N <- Nodes ].

Phones = GenPhones(10000).
Show(Select(Phones, instances)).

and the result is here:

[{<<"server1">>,2005},
 {<<"server2">>,2142},
 {<<"server3">>,781},
 {<<"server4">>,1985},
 {<<"server5">>,921},
 {<<"server6">>,2166}]

Looks like the distribution between the nodes is not equitable at all.

crash when remove_node

#0 0x00007fa300006d4b in ?? ()
#1 0x00007fa3aa83dd96 in quicksort () from /data0/neo_mgw_0.4.4/lib/hash_ring-0.1.6/priv/hash_ring_drv.so
#2 0x00007fa3aa83d026 in hash_ring_remove_node () from /data0/neo_mgw_0.4.4/lib/hash_ring-0.1.6/priv/hash_ring_drv.so
#3 0x00007fa3aa83c295 in hash_ring_drv_output () from /data0/neo_mgw_0.4.4/lib/hash_ring-0.1.6/priv/hash_ring_drv.so
#4 0x00000000004924ef in call_driver_output (c_p=0x7fa3b0ec8f50, flags=2064, prt=0x7fa3d8d40bc0, from=55559696966931,

list=140341277983642, refp=0x0) at beam/io.c:1768

allow lib/python/chashring.py to be pip installable

Hi Chris,

I'd like to put together a pull request so the repo can be made available on pypi so chashring is pip installable for python users. Would you be open to this change. I'm pretty sure a setup.py is needed in the root directory of the repo, but I may be mistaken. Will do more research.

1/ Would you be open to this and
2/ Would you like to own the pypi package for chashring, or would you like me to manage that?

thanks!

Weights

sometimes it's needed to do a balance between several nodes but based on weights, for example, we have three nodes (node1, node2 and node3) but we know that node1 is capable to handle 100 requests, node2 is capable to handle 50 requests and node3 is only capable to handle 10 requests.

Do a perfect balance between them it's not the solution because node3 could be down when we try to put there more than 10 requests.

At this moment I'm adding to the ring the name of the node, a separator and the sequence number based on the weight so, in the above example I have to insert: node1_1, node1_2, node1_3, ..., node1_100; node2_1, node2_2, node2_3, ..., node2_50; node3_1, node3_2, node3_3, ..., node3_10. But maybe it could be great if exists an option in the libraries to avoid to do that.

Memleak with hash_ring_add_node

Example code:

#include "hash_ring.h"

#define NUM_REPLICAS 512
#define NODE_NAME "asdf"

int main() {
    hash_ring_t *ring = hash_ring_create(NUM_REPLICAS, HASH_FUNCTION_SHA1);

    hash_ring_add_node(ring, NODE_NAME, sizeof(NODE_NAME));

    hash_ring_free(ring);

    return 0;
}

Valgrind output:

valgrind --leak-check=full ./chrismoos-hash-ring_test
make: Für das Ziel „all“ ist nichts zu tun.
==7480== Memcheck, a memory error detector
==7480== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==7480== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==7480== Command: ./chrismoos-hash-ring_test -d -w=1 -m=2
==7480== 
==7480== 
==7480== HEAP SUMMARY:
==7480==     in use at exit: 3,986 bytes in 512 blocks
==7480==   total heap usage: 1,030 allocs, 518 frees, 20,447 bytes allocated
==7480== 
==7480== 3,986 bytes in 512 blocks are definitely lost in loss record 1 of 1
==7480==    at 0x4C2CEDF: malloc (vg_replace_malloc.c:299)
==7480==    by 0x109DC9: hash_ring_add_items (hash_ring.c:188)
==7480==    by 0x10A314: hash_ring_add_node (hash_ring.c:256)
==7480==    by 0x1098E6: main (chrismoos-hash-ring_test.c:74)
==7480== 
==7480== LEAK SUMMARY:
==7480==    definitely lost: 3,986 bytes in 512 blocks
==7480==    indirectly lost: 0 bytes in 0 blocks
==7480==      possibly lost: 0 bytes in 0 blocks
==7480==    still reachable: 0 bytes in 0 blocks
==7480==         suppressed: 0 bytes in 0 blocks
==7480== 
==7480== For counts of detected and suppressed errors, rerun with: -v
==7480== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)

hash_ring:find_node hashed the same key to different nodes depending on the value of the node.

The following eunit test fails:

-module(test).

-include_lib("eunit/include/eunit.hrl").
-include_lib("../deps/hash_ring/include/hash_ring.hrl").

hash_ring_instances_test() ->
        hash_ring:start_link(),
        hash_ring:create_ring(ring1, 128, ?HASH_RING_FUNCTION_MD5),
        hash_ring:create_ring(ring2, 128, ?HASH_RING_FUNCTION_MD5),
        hash_ring:add_node(ring1, term_to_binary({node1, 1})), %1 here (and 2 below) is just some constant used to produce slightly different binaries
        hash_ring:add_node(ring1, term_to_binary({node2, 1})),
        hash_ring:add_node(ring1, term_to_binary({node3, 1})),

        hash_ring:add_node(ring2, term_to_binary({node1, 2})),
        hash_ring:add_node(ring2, term_to_binary({node2, 2})),
        hash_ring:add_node(ring2, term_to_binary({node3, 2})),

        lists:map(fun(Key) ->
                                          {ok, Bin1} = hash_ring:find_node(ring1, term_to_binary(Key)),
                                          {Node1,_} = binary_to_term(Bin1),
                                          {ok, Bin2} = hash_ring:find_node(ring2, term_to_binary(Key)),
                                          {Node2,_} = binary_to_term(Bin2),
                                          ?assertEqual(Node1, Node2)
                end, lists:seq(1, 1024)).

with:

**error:{assertEqual_failed,[{module,test},
                     {line,23},
                     {expression,"Node2"},
                     {expected,node3},
                     {value,node2}]}

I expect it to pass because the keys are the same and the hashing shouldn't depend on the particular values of the "nodes".

I am using the erlang port of hash_ring but the problem is probably in the C implementation.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.