Comments (9)

vsreekanti commented on August 24, 2024

Sharing a single seed IP is okay. It sounds like the ulimit for open FDs on the machine you are running on is too low. Can you try increasing the ulimit? My hunch is that all 64 clients are talking to all nodes and, as a result, the process is running into the limit on the number of allowed open sockets. That's why only using 16 clients works -- the aggregate number of clients is lower.
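
For reference, on most Linux systems the per-process open-file limit can be checked and raised along these lines (the value 65536 is only an example; pick one that comfortably covers your client and socket count):

  # check the current per-process limit on open file descriptors
  ulimit -n
  # raise it for the current shell session before launching the servers/clients
  ulimit -n 65536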

authwork commented on August 24, 2024

I also encountered this issue: when reading the value of the same key, the client can sometimes read the value and sometimes cannot.

authwork commented on August 24, 2024

None of these issues appear in Anna's local mode, so I suspect they are caused by my cluster-mode configuration.
Would you mind giving a correct example of a cluster configuration, e.g. 4 nodes with a memory replication factor of 3?
@vsreekanti
Many thanks

vsreekanti commented on August 24, 2024

Hm, this is a surprising error. It isn't something I've seen before. Is this error occurring for disk nodes or memory nodes? It's unclear what the file descriptors are being created for, unless you're using a very large number of clients.

authwork commented on August 24, 2024

I am running a cluster of 8 server nodes with 64 clients accessing those replicas.
I only use memory nodes, and the replication factor is set to 8.

@vsreekanti Hello, I have evaluated the system with a cluster of 1 server node and 16 clients from different nodes (the replication factor set to 1). It works normally.
A cluster of 1 server node and 32 clients from different nodes also works normally.
So I believe 8 server nodes should be enough to serve 64 clients, yet the issue remains.

How should we configure the seed_ip?
Currently, I configure all server nodes to share the same seed_ip.

authwork commented on August 24, 2024

@vsreekanti Hello, many thanks for your help. Increasing the ulimit (see here) solved the "Too many files" issue.
Now the cluster of 1 server node and 64 clients works normally.

However, the cluster of 8 server nodes and 64 clients still has some issues, and I am a little confused about a few things:

  1. Is it necessary to run the monitoring nodes in Anna cluster mode?
    In my previous experiments (the cluster of 1 server node and 64 clients) we did not run the monitors and everything worked normally, but I am not sure whether they are needed for the cluster of 8 server nodes and 64 clients.

  2. What should the mgmt_ip of the monitoring node ("127.0.0.1" by default) and the mgmt_ip of the server ("NULL" by default) be set to?

  3. What is the difference between public IPs and private IPs?

Currently, I am assuming that it is not necessary to run the monitors, so I set all monitor-related IPs to "127.0.0.1". As a first step, I configure a replication factor of 8 across the 8 server nodes.
My configuration files are attached as screenshots in the original issue (10.2.x.x is the public IP and 10.4.x.x is the private IP).
Would you please give some suggestions?

Update:
I only reduced the number of replicas to 4, and it seems to work normally on the cluster of 8 server nodes and 64 clients.

==================================================
When I run the benchmark on it, I find a new issue:
Assume we have four replicas R1, R2, R3, R4.
At the very beginning, a client PUTs K1 to R1 and immediately reads it from R2. Since the key-value pair has not yet been propagated from R1 to R2, the client may not be able to read it.

==================================================
When I run the benchmark, I found:

// asynchronous PUT of a last-writer-wins (LWW) value, followed by a blocking receive
client.put_async(key, serialize(val), LatticeType::LWW);
receive(&client);

// asynchronous GET of the same key, again followed by a blocking receive
client.get_async(key);
receive(&client);
  1. Is it necessary to call receive(&client) after each PUT/GET operation? (Just want to be sure.)
  2. How can I improve the speed of a batch of PUT/GET operations?

vsreekanti commented on August 24, 2024
  1. Regarding running the monitoring node, it should not be necessary. However, without it the system will not increase/decrease the number of replicas of each key in response to load changes, in case that is a feature you would like.

  2. You can ignore the mgmt_ip as well -- it's used for Kubernetes autoscaling.

  3. Public IPs and private IPs are used when running in VPCs (e.g., for EC2). All internal communication is done on private IPs, and request handling is done on public IPs. If you are not running in a VPC or don't need KVS access outside the VPC, you can just use the same IP address for both (see the config sketch after this list).
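
For orientation, here is a hedged sketch of how those fields might fit together for a single memory-tier server node, loosely modeled on Anna's conf/anna-config.yml; the exact field names and layout are assumptions and should be checked against the sample config shipped with your checkout:

  # hypothetical per-node config sketch -- field names are assumptions, not the authoritative schema
  monitoring:
      mgmt_ip: "127.0.0.1"        # only meaningful for Kubernetes autoscaling
      ip: 10.4.0.9                # monitoring node's private IP, if you run one
  server:
      seed_ip: 10.4.0.1           # one server's private IP, shared by every node
      public_ip: 10.2.0.2         # request handling (client-facing)
      private_ip: 10.4.0.2        # internal communication (gossip/replication)
      mgmt_ip: "NULL"             # ignorable outside Kubernetes
  replication:
      memory: 4                   # memory-tier replication factor
      minimum: 1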

Regarding R2 not being able to read the KV pair, this is expected behavior. That is the nature of the system's coordination-freeness: you may read stale values (including NULL). If you want more deterministic behavior, you can try making a client sticky to a single replica for a particular key by changing the KVS client, but that is not something we currently support out of the box.
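
If a benchmark needs the write to become visible before it proceeds, one client-side pattern you could try (not an out-of-the-box Anna feature, and it assumes the receive(&client) helper from the snippet earlier in this thread returns the responses it collects) is to simply retry the GET:

  // Hedged sketch: retry the GET until the replica returns a usable value.
  // has_value() is a hypothetical predicate you would write against the
  // KeyResponse messages returned by receive(&client).
  client.get_async(key);
  auto responses = receive(&client);
  while (!has_value(responses, key)) {
    client.get_async(key);        // the earlier PUT may not have propagated to this replica yet
    responses = receive(&client);
  }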

  1. You can make multiple requests and then call receive if that workflow is more convenient for you (see the pipelining sketch after this list).

  2. What do you mean by the speed of a batch of operations? Are you seeing that batches of operations are particularly slow?
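
A hedged sketch of that pipelined pattern, reusing the calls from the snippet earlier in the thread (the batching loop, and the assumption that receive(&client) returns whatever responses have arrived, are illustrative rather than part of Anna's documented API):

  // Pipelined sketch: issue every asynchronous GET up front, then drain the
  // responses, instead of blocking on receive() after each individual request.
  for (const Key& key : keys) {
    client.get_async(key);
  }
  size_t collected = 0;
  while (collected < keys.size()) {
    auto responses = receive(&client);   // assumed to return the responses received so far
    collected += responses.size();
  }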

authwork commented on August 24, 2024
  1. If I make multiple GET requests, how can I match each response to its key? (Perhaps based on the ID returned immediately by each request? See the sketch after this list.)
  2. I just want it to be faster; making multiple requests and then calling receive seems to work like a pipeline.
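
For what it's worth, a hedged sketch of how that matching could be structured, assuming each response exposes the key (or request ID) it answers; response_key() below is a hypothetical helper standing in for whatever accessor the client actually provides:

  // Hypothetical bookkeeping: remember which keys are outstanding and tick them
  // off as responses arrive.
  std::set<Key> outstanding(keys.begin(), keys.end());
  for (const Key& key : keys) {
    client.get_async(key);
  }
  while (!outstanding.empty()) {
    for (const auto& resp : receive(&client)) {
      outstanding.erase(response_key(resp));   // response_key() is a hypothetical helper
    }
  }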

Many thanks, I will experiment further.

authwork commented on August 24, 2024
@vsreekanti In my current cluster configuration (shown above), both PUT and GET operations have long delays. (RF=4 seems to work normally, while the long delays appear with RF=8.)
