puniverse / galaxy
A cache-coherent in-memory data grid
Home Page: http://docs.paralleluniverse.co/galaxy/
License: GNU Lesser General Public License v3.0
Hey, if one of the registered Galaxy nodes receives a SESSION_EXPIRED event from its ZooKeeper client, then all ephemeral information about that node is deleted from the ZooKeeper cluster. Moreover, this breaks the consistency of the Galaxy cluster, because that particular node still thinks "I am alive" while the other nodes are sure it is dead. This situation is easy to reproduce by setting sessionTimeoutMs to a small value such as 500 ms. In any case, there should be a valid failure-recovery strategy for ZooKeeper session expiration.
For more background on ZooKeeper internals, see https://wiki.apache.org/hadoop/ZooKeeper/FAQ#A3
If you can give any advice on where I should look in order to fix this issue, I may try to do it myself. Recreating all the ephemeral nodes seems like it would be enough as the simplest solution.
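The suggested recovery strategy can be sketched without a live ZooKeeper server. In this hypothetical sketch (not the Galaxy API), an in-memory set stands in for the ZooKeeper namespace: the node keeps a local record of every ephemeral znode it created, and after a SESSION_EXPIRED event wipes them server-side, it replays that record once a new session is established.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of ephemeral-node recovery after session expiration.
// "serverSide" stands in for the ZooKeeper server's view of our ephemerals.
public class EphemeralRecovery {
    private final Set<String> serverSide = ConcurrentHashMap.newKeySet();
    // Local record of everything this node registered.
    private final Map<String, byte[]> localRecord = new ConcurrentHashMap<>();

    public void createEphemeral(String path, byte[] data) {
        localRecord.put(path, data);
        serverSide.add(path);
    }

    // Simulates ZooKeeper deleting all of our ephemerals on session expiry.
    public void sessionExpired() {
        serverSide.removeAll(localRecord.keySet());
    }

    // Proposed fix: after obtaining a new session, replay the local record.
    public void onNewSession() {
        localRecord.forEach((path, data) -> serverSide.add(path));
    }

    public boolean exists(String path) {
        return serverSide.contains(path);
    }
}
```

In the real implementation the replay would run from a connection-state listener after the client reconnects with a fresh session; the key point is only that every ephemeral registration must also be remembered locally so it can be re-created.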
lib directory does not exist
high-scale-lib does not exist
Saw the high scalability blog post on Galaxy and got intrigued. You discuss a B+ tree implementation in the posting and I was wondering whether that implementation actually exists or is just a thought experiment?
Would be great to build on top of this and connect it to the Titan graph database.
PS: Sorry for spamming your issue tracker - could not find a mailing list or forum or such.
The method create(String node, boolean ephemeral)
is not thread-safe. The pattern
if (exists(path)) {
return;
}
createNode(path);
does not work in a distributed environment: another client can create the node between the exists() check and the create. To make this pattern safe, we have to catch and ignore KeeperException$NodeExistsException
while creating the node.
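The race-free pattern can be demonstrated without a real ZooKeeper client. In this sketch, FakeZk is a stand-in whose create() fails atomically if the node already exists, like KeeperException.NodeExistsException; the fix is to attempt the create unconditionally and swallow the benign "already exists" failure instead of checking exists() first.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch of the race-free create pattern; FakeZk stands in
// for the ZooKeeper client.
public class SafeCreate {
    static class NodeExistsException extends Exception {}

    static class FakeZk {
        final ConcurrentMap<String, byte[]> nodes = new ConcurrentHashMap<>();
        // Atomic create: fails if the node already exists, like ZooKeeper's.
        void create(String path, byte[] data) throws NodeExistsException {
            if (nodes.putIfAbsent(path, data) != null)
                throw new NodeExistsException();
        }
        boolean exists(String path) { return nodes.containsKey(path); }
    }

    // Race-free: if two clients race on the same path, exactly one create
    // succeeds and the other ignores the benign failure.
    static void createIfAbsent(FakeZk zk, String path) {
        try {
            zk.create(path, new byte[0]);
        } catch (NodeExistsException ignore) {
            // Someone else created it first; that is the desired end state.
        }
    }
}
```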
...so that, in the future, we'll be able to pass in a lock factory that makes Fiber-blocking locks.
If ZK thinks a node is dead, the information may not spread to the cluster at once. The server may think the node has died and start serving its items to other nodes that have noticed the death, while the node itself thinks it's still alive and continues serving nodes that also think it's alive.
This will enable us to control backpressure for remote Quasar channels.
Because we might want more than a 2x replication, we shouldn't use Galaxy's backup mechanism, but rely on the underlying database's replication. The slave nodes should just do nothing, and start serving data once they're master.
The scenario I tried is this:
begin transaction
then, from the other node:
4. root = getRoot
5. get(root) ==> TIMEOUT
When I added, before step 2:
1.5. getX(root)
everything worked fine.
In the logs I saw that the getX reached the first node, but its handling was moved to PendingMessages, probably by shouldHoldMessage. I think the reason is that the line was flagged as MODIFIED.
When I sent 100 messages from 100 parallel actors and then exited, the last messages were not received by the receiver. I know that some of the messages were postponed because the queue was full.
After sleeping 500 ms before exiting, all the messages were received. I guess that when the node shuts down, the postponed messages are not sent.
The ack is sent when the cache gets the message, not after the call to the listener.
If the receiver is not fast enough, there is no way to apply backpressure this way.
The ack should be sent after the listener finishes successfully, and when the listener is asynchronous we should ack only once the listener application explicitly ACKs that the message handling has finished.
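The proposed change can be sketched on the receive path. This is a hypothetical shape, not Galaxy's actual Comm API: the ack is issued only after the listener returns normally, so a slow or failing listener delays (or withholds) the ack, which fills the sender's unacked window and produces the backpressure that ack-on-receipt throws away.

```java
import java.util.function.Consumer;

// Hypothetical sketch (not Galaxy's actual Comm API): ack only after the
// listener has successfully processed the message.
public class AckAfterProcessing {
    public interface Acker { void ack(long msgId); }

    private final Consumer<byte[]> listener;
    private final Acker acker;

    public AckAfterProcessing(Consumer<byte[]> listener, Acker acker) {
        this.listener = listener;
        this.acker = acker;
    }

    public void onMessage(long msgId, byte[] payload) {
        try {
            listener.accept(payload);  // may be slow; the ack is delayed with it
        } catch (RuntimeException e) {
            return;                    // no ack: the sender should retransmit
        }
        acker.ack(msgId);              // ack only after successful handling
    }
}
```

For an asynchronous listener, the Acker callback would instead be handed to the application so it can invoke ack() whenever handling actually completes.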
Use log4j 2
Adding Logback should be trivial, as we're using the slf4j API.
See: Reasons to switch
This is probably a stupid question with a simple answer, but I can't seem to figure out which knob to turn.
How do you send large messages without crashing? I either get a
Channel exception: java.io.IOException Message too long
error for the example below, or the process seems to just hang for even larger sizes.
Reproducing the bug is simple: using the distributed ping-pong example, add a new field with a large enough size to the Message class:
final byte[] bar = new byte[102400];
I am using JGroups in case that is relevant.
Thanks!
[peer2] 09:49:31.972 [RemoteCall: RemoteCall{org.gridkit.zerormi.RmiGateway$CounterAgent#remoteCall(Callable)}.4-EventThread] zookeeper.ClientCnxn [ERROR] {} Error while calling watcher java.lang.AssertionError
[peer2] at co.paralleluniverse.galaxy.cluster.DistributedBranchHelper.nodeUpdated(DistributedBranchHelper.java:179)
[peer2] at co.paralleluniverse.galaxy.cluster.DistributedBranchHelper.access$200(DistributedBranchHelper.java:37)
[peer2] at co.paralleluniverse.galaxy.cluster.DistributedBranchHelper$1.nodeChildUpdated(DistributedBranchHelper.java:69)
[peer2] at co.paralleluniverse.galaxy.zookeeper.ZooKeeperDistributedTree$MyWatcher$1.process(ZooKeeperDistributedTree.java:332)
[peer2] at org.apache.curator.framework.imps.NamespaceWatcher.process(NamespaceWatcher.java:61)
[peer2] at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:522)
[peer2] at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
[peer2]
Node 1 updates A in a transaction.
No one can read A until all the INVACKs return. Can node 1 read it (outside the transaction)?
Like quasar and pulsar.
This will require some consensus algorithm for the slaves, so it might be quite complicated. The problem is that when the master dies, the slaves may not all be in the same state, and once a new master is elected, its slaves may not be in sync with it.
Perhaps a better idea (and who would want 3x hardware anyway?) is to have a new type of node: nodes that aren't assigned a node id and simply monitor the cluster for failures. Once a master dies and a slave takes over, one of these passive nodes (how is it chosen?) will become the new slave (it will have to replicate all the old data; how much burden would that put on the new master?).
When transitioning to requiring Java 8, please upgrade to Caffeine. You know the details, so I won't bore you with the elevator pitch.
ConcurrentLinkedHashMap changes will continue to be minimal, even more so now, and driven by requests from Java 6 users unable to upgrade. Caffeine is ideally the upgrade path for Guava cache users too, which due to Android cannot be significantly modified.
The latest version has significant improvements when judged by a synthetic benchmark. Previous versions were optimized around real-world usage only, so the performance is likely comparable (e.g. as found by Cassandra). This release focused on synthetic workloads (benchmarks) and, depending on the application, may offer a real-world improvement.
19:16:39.270 #11 co.paralleluniverse.galaxy.core.AbstractCluster [INFO ] New node added: node-0000000004
java.lang.Exception: Stack trace
at java.lang.Thread.dumpStack(Thread.java:1342)
at co.paralleluniverse.galaxy.core.AbstractCluster.nodeAdded(AbstractCluster.java:432)
at co.paralleluniverse.galaxy.core.AbstractCluster.access$600(AbstractCluster.java:55)
at co.paralleluniverse.galaxy.core.AbstractCluster$3.nodeChildAdded(AbstractCluster.java:222)
at co.paralleluniverse.galaxy.cluster.DistributedBranchHelper.nodeCompleted(DistributedBranchHelper.java:174)
at co.paralleluniverse.galaxy.cluster.DistributedBranchHelper.access$800(DistributedBranchHelper.java:44)
at co.paralleluniverse.galaxy.cluster.DistributedBranchHelper$Node.testComplete(DistributedBranchHelper.java:310)
at co.paralleluniverse.galaxy.cluster.DistributedBranchHelper$Node.testProperty(DistributedBranchHelper.java:297)
at co.paralleluniverse.galaxy.cluster.DistributedBranchHelper$Node.<init>(DistributedBranchHelper.java:273)
at co.paralleluniverse.galaxy.cluster.DistributedBranchHelper.nodeAdded(DistributedBranchHelper.java:158)
at co.paralleluniverse.galaxy.cluster.DistributedBranchHelper.access$000(DistributedBranchHelper.java:44)
at co.paralleluniverse.galaxy.cluster.DistributedBranchHelper$1.nodeChildAdded(DistributedBranchHelper.java:64)
at co.paralleluniverse.galaxy.zookeeper.ZooKeeperDistributedTree$1.processResult(ZooKeeperDistributedTree.java:87)
at com.netflix.curator.framework.imps.CuratorFrameworkImpl.sendToBackgroundCallback(CuratorFrameworkImpl.java:535)
at com.netflix.curator.framework.imps.CuratorFrameworkImpl.processBackgroundOperation(CuratorFrameworkImpl.java:446)
at com.netflix.curator.framework.imps.GetChildrenBuilderImpl$2.processResult(GetChildrenBuilderImpl.java:136)
at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:600)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:497)
19:16:39.274 #11 co.paralleluniverse.galaxy.zookeeper.ZooKeeperDistributedTree [INFO ] Adding listener NODE node-0000000004 id: -1 on /co.paralleluniverse.galaxy/nodes/node-0000000004
19:16:39.768 #11 co.paralleluniverse.galaxy.zookeeper.ZooKeeperDistributedTree [SEVERE ] Node /co.paralleluniverse.galaxy/nodes/node-0000000004/ip_port getData has failed!
org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /co.paralleluniverse.galaxy/nodes/node-0000000004/ip_port
at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1131)
at com.netflix.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:254)
at com.netflix.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:243)
at com.netflix.curator.RetryLoop.callWithRetry(RetryLoop.java:85)
at com.netflix.curator.framework.imps.GetDataBuilderImpl.pathInForeground(GetDataBuilderImpl.java:239)
at com.netflix.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:231)
at com.netflix.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:39)
at co.paralleluniverse.galaxy.zookeeper.ZooKeeperDistributedTree.get(ZooKeeperDistributedTree.java:195)
at co.paralleluniverse.galaxy.core.AbstractCluster$NodeInfoImpl.nodeChildUpdated(AbstractCluster.java:974)
at co.paralleluniverse.galaxy.core.AbstractCluster$NodeInfoImpl.<init>(AbstractCluster.java:860)
at co.paralleluniverse.galaxy.core.AbstractCluster.createNodeInfo(AbstractCluster.java:624)
at co.paralleluniverse.galaxy.core.AbstractCluster.nodeAdded(AbstractCluster.java:433)
at co.paralleluniverse.galaxy.core.AbstractCluster.access$600(AbstractCluster.java:55)
at co.paralleluniverse.galaxy.core.AbstractCluster$3.nodeChildAdded(AbstractCluster.java:222)
at co.paralleluniverse.galaxy.cluster.DistributedBranchHelper.nodeCompleted(DistributedBranchHelper.java:174)
at co.paralleluniverse.galaxy.cluster.DistributedBranchHelper.access$800(DistributedBranchHelper.java:44)
at co.paralleluniverse.galaxy.cluster.DistributedBranchHelper$Node.testComplete(DistributedBranchHelper.java:310)
at co.paralleluniverse.galaxy.cluster.DistributedBranchHelper$Node.testProperty(DistributedBranchHelper.java:297)
at co.paralleluniverse.galaxy.cluster.DistributedBranchHelper$Node.<init>(DistributedBranchHelper.java:273)
at co.paralleluniverse.galaxy.cluster.DistributedBranchHelper.nodeAdded(DistributedBranchHelper.java:158)
at co.paralleluniverse.galaxy.cluster.DistributedBranchHelper.access$000(DistributedBranchHelper.java:44)
at co.paralleluniverse.galaxy.cluster.DistributedBranchHelper$1.nodeChildAdded(DistributedBranchHelper.java:64)
at co.paralleluniverse.galaxy.zookeeper.ZooKeeperDistributedTree$1.processResult(ZooKeeperDistributedTree.java:87)
at com.netflix.curator.framework.imps.CuratorFrameworkImpl.sendToBackgroundCallback(CuratorFrameworkImpl.java:535)
at com.netflix.curator.framework.imps.CuratorFrameworkImpl.processBackgroundOperation(CuratorFrameworkImpl.java:446)
at com.netflix.curator.framework.imps.GetChildrenBuilderImpl$2.processResult(GetChildrenBuilderImpl.java:136)
at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:600)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:497)
19:16:39.775 #11 co.paralleluniverse.galaxy.core.AbstractCluster [SEVERE ] Exception while reading control tree value.
java.lang.RuntimeException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /co.paralleluniverse.galaxy/nodes/node-0000000004/ip_port
at com.google.common.base.Throwables.propagate(Throwables.java:156)
at co.paralleluniverse.galaxy.zookeeper.ZooKeeperDistributedTree.get(ZooKeeperDistributedTree.java:201)
at co.paralleluniverse.galaxy.core.AbstractCluster$NodeInfoImpl.nodeChildUpdated(AbstractCluster.java:974)
at co.paralleluniverse.galaxy.core.AbstractCluster$NodeInfoImpl.<init>(AbstractCluster.java:860)
at co.paralleluniverse.galaxy.core.AbstractCluster.createNodeInfo(AbstractCluster.java:624)
at co.paralleluniverse.galaxy.core.AbstractCluster.nodeAdded(AbstractCluster.java:433)
at co.paralleluniverse.galaxy.core.AbstractCluster.access$600(AbstractCluster.java:55)
at co.paralleluniverse.galaxy.core.AbstractCluster$3.nodeChildAdded(AbstractCluster.java:222)
at co.paralleluniverse.galaxy.cluster.DistributedBranchHelper.nodeCompleted(DistributedBranchHelper.java:174)
at co.paralleluniverse.galaxy.cluster.DistributedBranchHelper.access$800(DistributedBranchHelper.java:44)
at co.paralleluniverse.galaxy.cluster.DistributedBranchHelper$Node.testComplete(DistributedBranchHelper.java:310)
at co.paralleluniverse.galaxy.cluster.DistributedBranchHelper$Node.testProperty(DistributedBranchHelper.java:297)
at co.paralleluniverse.galaxy.cluster.DistributedBranchHelper$Node.<init>(DistributedBranchHelper.java:273)
at co.paralleluniverse.galaxy.cluster.DistributedBranchHelper.nodeAdded(DistributedBranchHelper.java:158)
at co.paralleluniverse.galaxy.cluster.DistributedBranchHelper.access$000(DistributedBranchHelper.java:44)
at co.paralleluniverse.galaxy.cluster.DistributedBranchHelper$1.nodeChildAdded(DistributedBranchHelper.java:64)
at co.paralleluniverse.galaxy.zookeeper.ZooKeeperDistributedTree$1.processResult(ZooKeeperDistributedTree.java:87)
at com.netflix.curator.framework.imps.CuratorFrameworkImpl.sendToBackgroundCallback(CuratorFrameworkImpl.java:535)
at com.netflix.curator.framework.imps.CuratorFrameworkImpl.processBackgroundOperation(CuratorFrameworkImpl.java:446)
at com.netflix.curator.framework.imps.GetChildrenBuilderImpl$2.processResult(GetChildrenBuilderImpl.java:136)
at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:600)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:497)
Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /co.paralleluniverse.galaxy/nodes/node-0000000004/ip_port
at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1131)
at com.netflix.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:254)
at com.netflix.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:243)
at com.netflix.curator.RetryLoop.callWithRetry(RetryLoop.java:85)
at com.netflix.curator.framework.imps.GetDataBuilderImpl.pathInForeground(GetDataBuilderImpl.java:239)
at com.netflix.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:231)
at com.netflix.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:39)
at co.paralleluniverse.galaxy.zookeeper.ZooKeeperDistributedTree.get(ZooKeeperDistributedTree.java:195)
... 20 more
19:16:39.780 #11 co.paralleluniverse.galaxy.core.AbstractCluster [INFO ] nodes: {node-0000000004=NODE node-0000000004 id: 1 ip_addr: /10.197.82.95 ip_slave_port: 8051, node-0000000000=NODE node-0000000000 id: 0 ip_addr: /10.197.57.247
java.lang.IllegalArgumentException: Data field for DatabaseEntry key cannot be null
com.sleepycat.je.utilint.DatabaseUtil.checkForNullDbt(DatabaseUtil.java:58)
com.sleepycat.je.Cursor.getSearchKeyRange(Cursor.java:1723)
co.paralleluniverse.galaxy.berkeleydb.BerkeleyDB.findAllocation(BerkeleyDB.java:298)
co.paralleluniverse.galaxy.core.MainMemory.handleMessageGet(MainMemory.java:164)
co.paralleluniverse.galaxy.core.MainMemory.receive(MainMemory.java:111)
co.paralleluniverse.galaxy.netty.TcpServerServerComm.receive(TcpServerServerComm.java:114)
co.paralleluniverse.galaxy.netty.AbstractTcpServer$4.messageReceived(AbstractTcpServer.java:137)
co.paralleluniverse.galaxy.netty.ChannelMessageNodeResolver.messageReceived(ChannelMessageNodeResolver.java:36)
co.paralleluniverse.galaxy.netty.OneToOneCodec.handleUpstream(OneToOneCodec.java:63)
org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:462)
org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:443)
org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303)
org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318)
org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
java.lang.Thread.run(Thread.java:745)
shutdownHooks prevent the exit:
2013-07-17 16:36:17
"Thread-2" - Thread t@11
java.lang.Thread.State: TIMED_WAITING
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <1b8cfa0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
at java.util.concurrent.ThreadPoolExecutor.awaitTermination(ThreadPoolExecutor.java:1433)
at org.jboss.netty.util.internal.ExecutorUtil.terminate(ExecutorUtil.java:103)
at org.jboss.netty.channel.socket.oio.OioDatagramChannelFactory.releaseExternalResources(OioDatagramChannelFactory.java:107)
at co.paralleluniverse.galaxy.netty.UDPComm.shutdown(UDPComm.java:367)
at co.paralleluniverse.common.spring.Component.destroy(Component.java:78)
at org.springframework.beans.factory.support.DisposableBeanAdapter.destroy(DisposableBeanAdapter.java:211)
at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.destroyBean(DefaultSingletonBeanRegistry.java:498)
at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.destroySingleton(DefaultSingletonBeanRegistry.java:474)
at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.destroySingletons(DefaultSingletonBeanRegistry.java:442)
- locked <6e670e34> (a java.util.LinkedHashMap)
at org.springframework.context.support.AbstractApplicationContext.destroyBeans(AbstractApplicationContext.java:1066)
at org.springframework.context.support.AbstractApplicationContext.doClose(AbstractApplicationContext.java:1040)
at org.springframework.context.support.AbstractApplicationContext$1.run(AbstractApplicationContext.java:958)
Locked ownable synchronizers:
- None
"main" - Thread t@1
java.lang.Thread.State: WAITING
at java.lang.Object.wait(Native Method)
- waiting on <65778857> (a org.springframework.context.support.AbstractApplicationContext$1)
at java.lang.Thread.join(Thread.java:1258)
at java.lang.Thread.join(Thread.java:1332)
at java.lang.ApplicationShutdownHooks.runHooks(ApplicationShutdownHooks.java:106)
at java.lang.ApplicationShutdownHooks$1.run(ApplicationShutdownHooks.java:46)
at java.lang.Shutdown.runHooks(Shutdown.java:123)
at java.lang.Shutdown.sequence(Shutdown.java:167)
at java.lang.Shutdown.exit(Shutdown.java:212)
- locked <58218a30> (a java.lang.Class)
at java.lang.Runtime.exit(Runtime.java:107)
at java.lang.System.exit(System.java:960)
at co.paralleluniverse.galaxy.example.PeerTKB.run(PeerTKB.java:304)
at co.paralleluniverse.galaxy.example.Peer2.main(Peer2.java:9)
Locked ownable synchronizers:
- None
I have a setup where many (50-1000) messages are sent from one Quasar actor to another.
I receive the following error on the recipient end:
68325 [WARN] UDPComm: Exception caught in channel [id: 0x5c10b3ac, 0.0.0.0/0.0.0.0:7052]: java.lang.RuntimeException java.io.EOFException
java.lang.RuntimeException: java.io.EOFException
at co.paralleluniverse.common.io.Persistables$1.read(Persistables.java:53) ~[galaxy-1.4.jar:1.4]
at co.paralleluniverse.galaxy.core.Message.read(Message.java:553) ~[galaxy-1.4.jar:1.4]
at co.paralleluniverse.galaxy.core.Message.fromByteBuffer(Message.java:207) ~[galaxy-1.4.jar:1.4]
at co.paralleluniverse.galaxy.netty.MessagePacket.fromByteBuffer(MessagePacket.java:181) ~[galaxy-1.4.jar:1.4]
at co.paralleluniverse.galaxy.netty.MessagePacketCodec.decode(MessagePacketCodec.java:54) ~[galaxy-1.4.jar:1.4]
at co.paralleluniverse.galaxy.netty.OneToOneCodec.handleUpstream(OneToOneCodec.java:59) ~[galaxy-1.4.jar:1.4]
at org.jboss.netty.handler.execution.ChannelUpstreamEventRunnable.doRun(ChannelUpstreamEventRunnable.java:43) [netty-3.9.2.Final.jar:?]
at org.jboss.netty.handler.execution.ChannelEventRunnable.run(ChannelEventRunnable.java:67) [netty-3.9.2.Final.jar:?]
at org.jboss.netty.handler.execution.OrderedMemoryAwareThreadPoolExecutor$ChildExecutor.run(OrderedMemoryAwareThreadPoolExecutor.java:314) [netty-3.9.2.Final.jar:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_45-internal]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_45-internal]
at java.lang.Thread.run(Thread.java:745) [?:1.8.0_45-internal]
Caused by: java.io.EOFException
at co.paralleluniverse.common.io.ByteBufferInputStream.readFully(ByteBufferInputStream.java:94) ~[galaxy-1.4.jar:1.4]
at co.paralleluniverse.galaxy.core.Message$MSG.readNoHeader(Message.java:1404) ~[galaxy-1.4.jar:1.4]
at co.paralleluniverse.galaxy.core.Message$LineMessage.read1(Message.java:665) ~[galaxy-1.4.jar:1.4]
at co.paralleluniverse.galaxy.core.Message$1.read(Message.java:471) ~[galaxy-1.4.jar:1.4]
at co.paralleluniverse.common.io.Persistables$1.read(Persistables.java:51) ~[galaxy-1.4.jar:1.4]
... 11 more
If I send many messages using the ping pong example, I receive a similar error:
11:37:30.624 [udpCommReceiveExecutor-3] netty.UDPComm [WARN ] {} Exception caught in channel [id: 0x26fa984c, 0.0.0.0/0.0.0.0:7052]: java.lang.RuntimeException java.io.EOFException java.lang.RuntimeException: java.io.EOFException
at co.paralleluniverse.common.io.Persistables$1.read(Persistables.java:53)
at co.paralleluniverse.galaxy.core.Message.read(Message.java:553)
at co.paralleluniverse.galaxy.core.Message.fromByteBuffer(Message.java:207)
at co.paralleluniverse.galaxy.netty.MessagePacket.fromByteBuffer(MessagePacket.java:181)
11:37:30.624 [udpCommReceiveExecutor-3] netty.UDPComm [DEBUG] {} Exception caught in channel java.lang.RuntimeException: java.io.EOFException
at co.paralleluniverse.common.io.Persistables$1.read(Persistables.java:53)
at co.paralleluniverse.galaxy.core.Message.read(Message.java:553)
at co.paralleluniverse.galaxy.core.Message.fromByteBuffer(Message.java:207)
at co.paralleluniverse.galaxy.netty.MessagePacket.fromByteBuffer(MessagePacket.java:181)
11:37:30.625 [udpCommReceiveExecutor-3] netty.UDPComm [INFO ] {} Channel exception: java.lang.RuntimeException java.io.EOFException
11:37:30.625 [udpCommReceiveExecutor-3] netty.UDPComm [DEBUG] {} Channel exception java.lang.RuntimeException: java.io.EOFException
at co.paralleluniverse.common.io.Persistables$1.read(Persistables.java:53)
at co.paralleluniverse.galaxy.core.Message.read(Message.java:553)
at co.paralleluniverse.galaxy.core.Message.fromByteBuffer(Message.java:207)
at co.paralleluniverse.galaxy.netty.MessagePacket.fromByteBuffer(MessagePacket.java:181)
There are also many other logging messages, but I do not know if they are related.
I can try to come up with a short example if necessary.
If node A owns items X and Y, and node B shares X but not Y, and then A modifies X and Y (resulting in INV X sent to B), and then ownership of Y transfers to node C, and then B gets (multicast) Y (from C), it will read Y's new value but still allow stale reads on X.
Solution: a PUT to a line whose node was previously unknown must purge all I's.
Simply delegate to the ZooKeeper and JGroups implementations.
Both already have an internal counter for allocating ref-ids.
See ZooKeeperCluster.java and JGroupsCluster.java.
The problem is that during item migration (getx/putx), the old owner needs both to transfer ownership to the requesting node (putx) and to notify its slave (inv) that it is no longer the owner of the item. If one of them (slave or peer) does not get the message and the old owner fails, the item may be lost (both peer and slave think they're not the owner) or become conflicted (both think they're the owner).
I suspect that doing this correctly would require a consensus protocol, which might make the whole thing not worthwhile. The only hope is that we could somehow take advantage of the fact that we know that the old owner has failed, i.e. that the node-switch event is received by everyone (we have consensus over that provided by ZooKeeper/JGroups).