puniverse / galaxy
A cache-coherent in-memory data grid
Home Page: http://docs.paralleluniverse.co/galaxy/
License: GNU Lesser General Public License v3.0
Hey, if one of the registered Galaxy nodes receives a SESSION_EXPIRED event from its ZooKeeper client, then all ephemeral information about that node is deleted from the ZooKeeper cluster. Moreover, this breaks the consistency of the Galaxy cluster, because that particular node still thinks "I am alive" while the other nodes are sure it is dead. This situation is easy to reproduce by setting sessionTimeoutMs to a small value such as 500 ms. In any case, there should be a valid failure-recovery strategy for ZooKeeper session expiration.
For more background on ZooKeeper internals, see https://wiki.apache.org/hadoop/ZooKeeper/FAQ#A3
If you can give any advice on where I should look in order to fix this issue, I may try to do it myself. Recreating all the ephemeral nodes seems like it would be enough as the simplest solution.
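The suggested recovery strategy can be sketched without a live ZooKeeper server. In this hypothetical sketch (not the Galaxy API), an in-memory set stands in for the ZooKeeper namespace: the node keeps a local record of every ephemeral znode it created, and after a SESSION_EXPIRED event wipes them server-side, it replays that record once a new session is established.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of ephemeral-node recovery after session expiration.
// "serverSide" stands in for the ZooKeeper server's view of our ephemerals.
public class EphemeralRecovery {
    private final Set<String> serverSide = ConcurrentHashMap.newKeySet();
    // Local record of everything this node registered.
    private final Map<String, byte[]> localRecord = new ConcurrentHashMap<>();

    public void createEphemeral(String path, byte[] data) {
        localRecord.put(path, data);
        serverSide.add(path);
    }

    // Simulates ZooKeeper deleting all of our ephemerals on session expiry.
    public void sessionExpired() {
        serverSide.removeAll(localRecord.keySet());
    }

    // Proposed fix: after obtaining a new session, replay the local record.
    public void onNewSession() {
        localRecord.forEach((path, data) -> serverSide.add(path));
    }

    public boolean exists(String path) {
        return serverSide.contains(path);
    }
}
```

In the real implementation the replay would run from a connection-state listener after the client reconnects with a fresh session; the key point is only that every ephemeral registration must also be remembered locally so it can be re-created.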
lib directory does not exist
high-scale-lib does not exist
Saw the high scalability blog post on Galaxy and got intrigued. You discuss a B+ tree implementation in the posting and I was wondering whether that implementation actually exists or is just a thought experiment?
Would be great to build on top of this and connect it to the Titan graph database.
PS: Sorry for spamming your issue tracker - could not find a mailing list or forum or such.
The method create(String node, boolean ephemeral)
is not thread-safe. The pattern
if (exists(path)) {
return;
}
createNode(path);
does not work in a distributed environment: another client can create the node between the exists() check and the create. To make this pattern safe, we have to catch and ignore KeeperException$NodeExistsException
while creating the node.
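The race-free pattern can be demonstrated without a real ZooKeeper client. In this sketch, FakeZk is a stand-in whose create() fails atomically if the node already exists, like KeeperException.NodeExistsException; the fix is to attempt the create unconditionally and swallow the benign "already exists" failure instead of checking exists() first.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch of the race-free create pattern; FakeZk stands in
// for the ZooKeeper client.
public class SafeCreate {
    static class NodeExistsException extends Exception {}

    static class FakeZk {
        final ConcurrentMap<String, byte[]> nodes = new ConcurrentHashMap<>();
        // Atomic create: fails if the node already exists, like ZooKeeper's.
        void create(String path, byte[] data) throws NodeExistsException {
            if (nodes.putIfAbsent(path, data) != null)
                throw new NodeExistsException();
        }
        boolean exists(String path) { return nodes.containsKey(path); }
    }

    // Race-free: if two clients race on the same path, exactly one create
    // succeeds and the other ignores the benign failure.
    static void createIfAbsent(FakeZk zk, String path) {
        try {
            zk.create(path, new byte[0]);
        } catch (NodeExistsException ignore) {
            // Someone else created it first; that is the desired end state.
        }
    }
}
```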
...so that, in the future, we'll be able to pass in a lock factory that makes Fiber-blocking locks.
If ZK thinks a node is dead, the information may not spread to the cluster at once. The server may think the node has died and start serving its items to other nodes that have noticed the death, while the node itself thinks it's still alive and continues serving nodes that also think it's alive.
This will enable us to control backpressure for remote Quasar channels.
Because we might want more than a 2x replication, we shouldn't use Galaxy's backup mechanism, but rely on the underlying database's replication. The slave nodes should just do nothing, and start serving data once they're master.
The scenario I tried is this:
begin transaction
then, from the other node:
4. root = getRoot
5. get(root) ==> TIMEOUT
When I added, before step 2:
1.5. getX(root)
everything worked fine.
In the logs I saw that the getX reached the first node, but its handling was moved to PendingMessages, probably by shouldHoldMessage. I think the reason is that the line was flagged as MODIFIED.
When I sent 100 messages from 100 parallel actors and then exited, the last messages were not received by the receiver. I know that some of the messages were postponed because the queue was full.
After sleeping 500 ms before exiting, all the messages were received. I guess that when the node shuts down, the postponed messages are not sent.
The ack is sent when the cache gets the message, not after the call to the listener.
If the receiver is not fast enough, there is no way to apply backpressure this way.
The ack should be sent after the listener finishes successfully, and when the listener is asynchronous we should ack only once the listener application explicitly ACKs that the message handling has finished.
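The proposed change can be sketched on the receive path. This is a hypothetical shape, not Galaxy's actual Comm API: the ack is issued only after the listener returns normally, so a slow or failing listener delays (or withholds) the ack, which fills the sender's unacked window and produces the backpressure that ack-on-receipt throws away.

```java
import java.util.function.Consumer;

// Hypothetical sketch (not Galaxy's actual Comm API): ack only after the
// listener has successfully processed the message.
public class AckAfterProcessing {
    public interface Acker { void ack(long msgId); }

    private final Consumer<byte[]> listener;
    private final Acker acker;

    public AckAfterProcessing(Consumer<byte[]> listener, Acker acker) {
        this.listener = listener;
        this.acker = acker;
    }

    public void onMessage(long msgId, byte[] payload) {
        try {
            listener.accept(payload);  // may be slow; the ack is delayed with it
        } catch (RuntimeException e) {
            return;                    // no ack: the sender should retransmit
        }
        acker.ack(msgId);              // ack only after successful handling
    }
}
```

For an asynchronous listener, the Acker callback would instead be handed to the application so it can invoke ack() whenever handling actually completes.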
Use log4j 2
Adding Logback should be trivial, as we're using the slf4j API.
See: Reasons to switch
This is probably a stupid question with a simple answer, but I can't seem to figure out which knob to turn.
How do you send large messages without crashing? I either get a
Channel exception: java.io.IOException Message too long
error for the example below, or the process seems to just hang for even larger sizes.
Reproducing the bug is simple: using the distributed ping-pong example, add a new field with a large enough size to the Message class:
final byte[] bar = new byte[102400];
I am using JGroups in case that is relevant.
Thanks!
[peer2] 09:49:31.972 [RemoteCall: RemoteCall{org.gridkit.zerormi.RmiGateway$CounterAgent#remoteCall(Callable)}.4-EventThread] zookeeper.ClientCnxn [ERROR] {} Error while calling watcher java.lang.AssertionError
[peer2] at co.paralleluniverse.galaxy.cluster.DistributedBranchHelper.nodeUpdated(DistributedBranchHelper.java:179)
[peer2] at co.paralleluniverse.galaxy.cluster.DistributedBranchHelper.access$200(DistributedBranchHelper.java:37)
[peer2] at co.paralleluniverse.galaxy.cluster.DistributedBranchHelper$1.nodeChildUpdated(DistributedBranchHelper.java:69)
[peer2] at co.paralleluniverse.galaxy.zookeeper.ZooKeeperDistributedTree$MyWatcher$1.process(ZooKeeperDistributedTree.java:332)
[peer2] at org.apache.curator.framework.imps.NamespaceWatcher.process(NamespaceWatcher.java:61)
[peer2] at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:522)
[peer2] at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
[peer2]
Node 1 updates A in a transaction.
No one can read A until all the INVACKs return. Can node 1 read it (outside the transaction)?
Like quasar and pulsar.
This will require some consensus algorithm for the slaves, so it might be quite complicated. The problem is that when the master dies, the slaves may not all be in the same state, and once a new master is elected, its slaves may not be in sync with it.
Perhaps a better idea (and who would want 3x hardware anyway?) is to have a new type of node: nodes that aren't assigned a node id and simply monitor the cluster for failures. Once a master dies and a slave takes over, one of these passive nodes (how is it chosen?) will become the new slave (it will have to replicate all the old data; how much burden would that put on the new master?).
When transitioning to requiring Java 8, please upgrade to Caffeine. You know the details, so I won't bore you with the elevator pitch.
ConcurrentLinkedHashMap changes will continue to be minimal, even more so now, and driven by requests from Java 6 users unable to upgrade. Caffeine is ideally the upgrade path for Guava cache users too, which due to Android cannot be significantly modified.
The latest version has significant improvements when judged by a synthetic benchmark. Previous versions were optimized around real-world usage only, so the performance is likely comparable (e.g. as found by Cassandra). This release focused on synthetic workloads (benchmarks) and, depending on the application, may offer a real-world improvement.
19:16:39.270 #11 co.paralleluniverse.galaxy.core.AbstractCluster [INFO ] New node added: node-0000000004
java.lang.Exception: Stack trace
at java.lang.Thread.dumpStack(Thread.java:1342)
at co.paralleluniverse.galaxy.core.AbstractCluster.nodeAdded(AbstractCluster.java:432)
at co.paralleluniverse.galaxy.core.AbstractCluster.access$600(AbstractCluster.java:55)
at co.paralleluniverse.galaxy.core.AbstractCluster$3.nodeChildAdded(AbstractCluster.java:222)
at co.paralleluniverse.galaxy.cluster.DistributedBranchHelper.nodeCompleted(DistributedBranchHelper.java:174)
at co.paralleluniverse.galaxy.cluster.DistributedBranchHelper.access$800(DistributedBranchHelper.java:44)
at co.paralleluniverse.galaxy.cluster.DistributedBranchHelper$Node.testComplete(DistributedBranchHelper.java:310)
at co.paralleluniverse.galaxy.cluster.DistributedBranchHelper$Node.testProperty(DistributedBranchHelper.java:297)
at co.paralleluniverse.galaxy.cluster.DistributedBranchHelper$Node.<init>(DistributedBranchHelper.java:273)
at co.paralleluniverse.galaxy.cluster.DistributedBranchHelper.nodeAdded(DistributedBranchHelper.java:158)
at co.paralleluniverse.galaxy.cluster.DistributedBranchHelper.access$000(DistributedBranchHelper.java:44)
at co.paralleluniverse.galaxy.cluster.DistributedBranchHelper$1.nodeChildAdded(DistributedBranchHelper.java:64)
at co.paralleluniverse.galaxy.zookeeper.ZooKeeperDistributedTree$1.processResult(ZooKeeperDistributedTree.java:87)
at com.netflix.curator.framework.imps.CuratorFrameworkImpl.sendToBackgroundCallback(CuratorFrameworkImpl.java:535)
at com.netflix.curator.framework.imps.CuratorFrameworkImpl.processBackgroundOperation(CuratorFrameworkImpl.java:446)
at com.netflix.curator.framework.imps.GetChildrenBuilderImpl$2.processResult(GetChildrenBuilderImpl.java:136)
at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:600)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:497)
19:16:39.274 #11 co.paralleluniverse.galaxy.zookeeper.ZooKeeperDistributedTree [INFO ] Adding listener NODE node-0000000004 id: -1 on /co.paralleluniverse.galaxy/nodes/node-0000000004
19:16:39.768 #11 co.paralleluniverse.galaxy.zookeeper.ZooKeeperDistributedTree [SEVERE ] Node /co.paralleluniverse.galaxy/nodes/node-0000000004/ip_port getData has failed!
org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /co.paralleluniverse.galaxy/nodes/node-0000000004/ip_port
at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1131)
at com.netflix.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:254)
at com.netflix.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:243)
at com.netflix.curator.RetryLoop.callWithRetry(RetryLoop.java:85)
at com.netflix.curator.framework.imps.GetDataBuilderImpl.pathInForeground(GetDataBuilderImpl.java:239)
at com.netflix.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:231)
at com.netflix.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:39)
at co.paralleluniverse.galaxy.zookeeper.ZooKeeperDistributedTree.get(ZooKeeperDistributedTree.java:195)
at co.paralleluniverse.galaxy.core.AbstractCluster$NodeInfoImpl.nodeChildUpdated(AbstractCluster.java:974)
at co.paralleluniverse.galaxy.core.AbstractCluster$NodeInfoImpl.<init>(AbstractCluster.java:860)
at co.paralleluniverse.galaxy.core.AbstractCluster.createNodeInfo(AbstractCluster.java:624)
at co.paralleluniverse.galaxy.core.AbstractCluster.nodeAdded(AbstractCluster.java:433)
at co.paralleluniverse.galaxy.core.AbstractCluster.access$600(AbstractCluster.java:55)
at co.paralleluniverse.galaxy.core.AbstractCluster$3.nodeChildAdded(AbstractCluster.java:222)
at co.paralleluniverse.galaxy.cluster.DistributedBranchHelper.nodeCompleted(DistributedBranchHelper.java:174)
at co.paralleluniverse.galaxy.cluster.DistributedBranchHelper.access$800(DistributedBranchHelper.java:44)
at co.paralleluniverse.galaxy.cluster.DistributedBranchHelper$Node.testComplete(DistributedBranchHelper.java:310)
at co.paralleluniverse.galaxy.cluster.DistributedBranchHelper$Node.testProperty(DistributedBranchHelper.java:297)
at co.paralleluniverse.galaxy.cluster.DistributedBranchHelper$Node.<init>(DistributedBranchHelper.java:273)
at co.paralleluniverse.galaxy.cluster.DistributedBranchHelper.nodeAdded(DistributedBranchHelper.java:158)
at co.paralleluniverse.galaxy.cluster.DistributedBranchHelper.access$000(DistributedBranchHelper.java:44)
at co.paralleluniverse.galaxy.cluster.DistributedBranchHelper$1.nodeChildAdded(DistributedBranchHelper.java:64)
at co.paralleluniverse.galaxy.zookeeper.ZooKeeperDistributedTree$1.processResult(ZooKeeperDistributedTree.java:87)
at com.netflix.curator.framework.imps.CuratorFrameworkImpl.sendToBackgroundCallback(CuratorFrameworkImpl.java:535)
at com.netflix.curator.framework.imps.CuratorFrameworkImpl.processBackgroundOperation(CuratorFrameworkImpl.java:446)
at com.netflix.curator.framework.imps.GetChildrenBuilderImpl$2.processResult(GetChildrenBuilderImpl.java:136)
at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:600)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:497)
19:16:39.775 #11 co.paralleluniverse.galaxy.core.AbstractCluster [SEVERE ] Exception while reading control tree value.
java.lang.RuntimeException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /co.paralleluniverse.galaxy/nodes/node-0000000004/ip_port
at com.google.common.base.Throwables.propagate(Throwables.java:156)
at co.paralleluniverse.galaxy.zookeeper.ZooKeeperDistributedTree.get(ZooKeeperDistributedTree.java:201)
at co.paralleluniverse.galaxy.core.AbstractCluster$NodeInfoImpl.nodeChildUpdated(AbstractCluster.java:974)
at co.paralleluniverse.galaxy.core.AbstractCluster$NodeInfoImpl.<init>(AbstractCluster.java:860)
at co.paralleluniverse.galaxy.core.AbstractCluster.createNodeInfo(AbstractCluster.java:624)
at co.paralleluniverse.galaxy.core.AbstractCluster.nodeAdded(AbstractCluster.java:433)
at co.paralleluniverse.galaxy.core.AbstractCluster.access$600(AbstractCluster.java:55)
at co.paralleluniverse.galaxy.core.AbstractCluster$3.nodeChildAdded(AbstractCluster.java:222)
at co.paralleluniverse.galaxy.cluster.DistributedBranchHelper.nodeCompleted(DistributedBranchHelper.java:174)
at co.paralleluniverse.galaxy.cluster.DistributedBranchHelper.access$800(DistributedBranchHelper.java:44)
at co.paralleluniverse.galaxy.cluster.DistributedBranchHelper$Node.testComplete(DistributedBranchHelper.java:310)
at co.paralleluniverse.galaxy.cluster.DistributedBranchHelper$Node.testProperty(DistributedBranchHelper.java:297)
at co.paralleluniverse.galaxy.cluster.DistributedBranchHelper$Node.<init>(DistributedBranchHelper.java:273)
at co.paralleluniverse.galaxy.cluster.DistributedBranchHelper.nodeAdded(DistributedBranchHelper.java:158)
at co.paralleluniverse.galaxy.cluster.DistributedBranchHelper.access$000(DistributedBranchHelper.java:44)
at co.paralleluniverse.galaxy.cluster.DistributedBranchHelper$1.nodeChildAdded(DistributedBranchHelper.java:64)
at co.paralleluniverse.galaxy.zookeeper.ZooKeeperDistributedTree$1.processResult(ZooKeeperDistributedTree.java:87)
at com.netflix.curator.framework.imps.CuratorFrameworkImpl.sendToBackgroundCallback(CuratorFrameworkImpl.java:535)
at com.netflix.curator.framework.imps.CuratorFrameworkImpl.processBackgroundOperation(CuratorFrameworkImpl.java:446)
at com.netflix.curator.framework.imps.GetChildrenBuilderImpl$2.processResult(GetChildrenBuilderImpl.java:136)
at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:600)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:497)
Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /co.paralleluniverse.galaxy/nodes/node-0000000004/ip_port
at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1131)
at com.netflix.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:254)
at com.netflix.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:243)
at com.netflix.curator.RetryLoop.callWithRetry(RetryLoop.java:85)
at com.netflix.curator.framework.imps.GetDataBuilderImpl.pathInForeground(GetDataBuilderImpl.java:239)
at com.netflix.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:231)
at com.netflix.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:39)
at co.paralleluniverse.galaxy.zookeeper.ZooKeeperDistributedTree.get(ZooKeeperDistributedTree.java:195)
... 20 more
19:16:39.780 #11 co.paralleluniverse.galaxy.core.AbstractCluster [INFO ] nodes: {node-0000000004=NODE node-0000000004 id: 1 ip_addr: /10.197.82.95 ip_slave_port: 8051, node-0000000000=NODE node-0000000000 id: 0 ip_addr: /10.197.57.247
java.lang.IllegalArgumentException: Data field for DatabaseEntry key cannot be null
com.sleepycat.je.utilint.DatabaseUtil.checkForNullDbt(DatabaseUtil.java:58)
com.sleepycat.je.Cursor.getSearchKeyRange(Cursor.java:1723)
co.paralleluniverse.galaxy.berkeleydb.BerkeleyDB.findAllocation(BerkeleyDB.java:298)
co.paralleluniverse.galaxy.core.MainMemory.handleMessageGet(MainMemory.java:164)
co.paralleluniverse.galaxy.core.MainMemory.receive(MainMemory.java:111)
co.paralleluniverse.galaxy.netty.TcpServerServerComm.receive(TcpServerServerComm.java:114)
co.paralleluniverse.galaxy.netty.AbstractTcpServer$4.messageReceived(AbstractTcpServer.java:137)
co.paralleluniverse.galaxy.netty.ChannelMessageNodeResolver.messageReceived(ChannelMessageNodeResolver.java:36)
co.paralleluniverse.galaxy.netty.OneToOneCodec.handleUpstream(OneToOneCodec.java:63)
org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:462)
org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:443)
org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303)
org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318)
org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
java.lang.Thread.run(Thread.java:745)
shutdownHooks prevent the exit:
2013-07-17 16:36:17
"Thread-2" - Thread t@11
java.lang.Thread.State: TIMED_WAITING
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <1b8cfa0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
at java.util.concurrent.ThreadPoolExecutor.awaitTermination(ThreadPoolExecutor.java:1433)
at org.jboss.netty.util.internal.ExecutorUtil.terminate(ExecutorUtil.java:103)
at org.jboss.netty.channel.socket.oio.OioDatagramChannelFactory.releaseExternalResources(OioDatagramChannelFactory.java:107)
at co.paralleluniverse.galaxy.netty.UDPComm.shutdown(UDPComm.java:367)
at co.paralleluniverse.common.spring.Component.destroy(Component.java:78)
at org.springframework.beans.factory.support.DisposableBeanAdapter.destroy(DisposableBeanAdapter.java:211)
at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.destroyBean(DefaultSingletonBeanRegistry.java:498)
at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.destroySingleton(DefaultSingletonBeanRegistry.java:474)
at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.destroySingletons(DefaultSingletonBeanRegistry.java:442)
- locked <6e670e34> (a java.util.LinkedHashMap)
at org.springframework.context.support.AbstractApplicationContext.destroyBeans(AbstractApplicationContext.java:1066)
at org.springframework.context.support.AbstractApplicationContext.doClose(AbstractApplicationContext.java:1040)
at org.springframework.context.support.AbstractApplicationContext$1.run(AbstractApplicationContext.java:958)
Locked ownable synchronizers:
- None
"main" - Thread t@1
java.lang.Thread.State: WAITING
at java.lang.Object.wait(Native Method)
- waiting on <65778857> (a org.springframework.context.support.AbstractApplicationContext$1)
at java.lang.Thread.join(Thread.java:1258)
at java.lang.Thread.join(Thread.java:1332)
at java.lang.ApplicationShutdownHooks.runHooks(ApplicationShutdownHooks.java:106)
at java.lang.ApplicationShutdownHooks$1.run(ApplicationShutdownHooks.java:46)
at java.lang.Shutdown.runHooks(Shutdown.java:123)
at java.lang.Shutdown.sequence(Shutdown.java:167)
at java.lang.Shutdown.exit(Shutdown.java:212)
- locked <58218a30> (a java.lang.Class)
at java.lang.Runtime.exit(Runtime.java:107)
at java.lang.System.exit(System.java:960)
at co.paralleluniverse.galaxy.example.PeerTKB.run(PeerTKB.java:304)
at co.paralleluniverse.galaxy.example.Peer2.main(Peer2.java:9)
Locked ownable synchronizers:
- None
I have a setup where many (50-1000) messages are sent from one Quasar actor to another.
I receive the following error on the recipient end:
68325 [WARN] UDPComm: Exception caught in channel [id: 0x5c10b3ac, 0.0.0.0/0.0.0.0:7052]: java.lang.RuntimeException java.io.EOFException
java.lang.RuntimeException: java.io.EOFException
at co.paralleluniverse.common.io.Persistables$1.read(Persistables.java:53) ~[galaxy-1.4.jar:1.4]
at co.paralleluniverse.galaxy.core.Message.read(Message.java:553) ~[galaxy-1.4.jar:1.4]
at co.paralleluniverse.galaxy.core.Message.fromByteBuffer(Message.java:207) ~[galaxy-1.4.jar:1.4]
at co.paralleluniverse.galaxy.netty.MessagePacket.fromByteBuffer(MessagePacket.java:181) ~[galaxy-1.4.jar:1.4]
at co.paralleluniverse.galaxy.netty.MessagePacketCodec.decode(MessagePacketCodec.java:54) ~[galaxy-1.4.jar:1.4]
at co.paralleluniverse.galaxy.netty.OneToOneCodec.handleUpstream(OneToOneCodec.java:59) ~[galaxy-1.4.jar:1.4]
at org.jboss.netty.handler.execution.ChannelUpstreamEventRunnable.doRun(ChannelUpstreamEventRunnable.java:43) [netty-3.9.2.Final.jar:?]
at org.jboss.netty.handler.execution.ChannelEventRunnable.run(ChannelEventRunnable.java:67) [netty-3.9.2.Final.jar:?]
at org.jboss.netty.handler.execution.OrderedMemoryAwareThreadPoolExecutor$ChildExecutor.run(OrderedMemoryAwareThreadPoolExecutor.java:314) [netty-3.9.2.Final.jar:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_45-internal]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_45-internal]
at java.lang.Thread.run(Thread.java:745) [?:1.8.0_45-internal]
Caused by: java.io.EOFException
at co.paralleluniverse.common.io.ByteBufferInputStream.readFully(ByteBufferInputStream.java:94) ~[galaxy-1.4.jar:1.4]
at co.paralleluniverse.galaxy.core.Message$MSG.readNoHeader(Message.java:1404) ~[galaxy-1.4.jar:1.4]
at co.paralleluniverse.galaxy.core.Message$LineMessage.read1(Message.java:665) ~[galaxy-1.4.jar:1.4]
at co.paralleluniverse.galaxy.core.Message$1.read(Message.java:471) ~[galaxy-1.4.jar:1.4]
at co.paralleluniverse.common.io.Persistables$1.read(Persistables.java:51) ~[galaxy-1.4.jar:1.4]
... 11 more
If I send many messages using the ping pong example, I receive a similar error:
11:37:30.624 [udpCommReceiveExecutor-3] netty.UDPComm [WARN ] {} Exception caught in channel [id: 0x26fa984c, 0.0.0.0/0.0.0.0:7052]: java.lang.RuntimeException java.io.EOFException java.lang.RuntimeException: java.io.EOFException
at co.paralleluniverse.common.io.Persistables$1.read(Persistables.java:53)
at co.paralleluniverse.galaxy.core.Message.read(Message.java:553)
at co.paralleluniverse.galaxy.core.Message.fromByteBuffer(Message.java:207)
at co.paralleluniverse.galaxy.netty.MessagePacket.fromByteBuffer(MessagePacket.java:181)
11:37:30.624 [udpCommReceiveExecutor-3] netty.UDPComm [DEBUG] {} Exception caught in channel java.lang.RuntimeException: java.io.EOFException
at co.paralleluniverse.common.io.Persistables$1.read(Persistables.java:53)
at co.paralleluniverse.galaxy.core.Message.read(Message.java:553)
at co.paralleluniverse.galaxy.core.Message.fromByteBuffer(Message.java:207)
at co.paralleluniverse.galaxy.netty.MessagePacket.fromByteBuffer(MessagePacket.java:181)
11:37:30.625 [udpCommReceiveExecutor-3] netty.UDPComm [INFO ] {} Channel exception: java.lang.RuntimeException java.io.EOFException
11:37:30.625 [udpCommReceiveExecutor-3] netty.UDPComm [DEBUG] {} Channel exception java.lang.RuntimeException: java.io.EOFException
at co.paralleluniverse.common.io.Persistables$1.read(Persistables.java:53)
at co.paralleluniverse.galaxy.core.Message.read(Message.java:553)
at co.paralleluniverse.galaxy.core.Message.fromByteBuffer(Message.java:207)
at co.paralleluniverse.galaxy.netty.MessagePacket.fromByteBuffer(MessagePacket.java:181)
There are also many other logging messages, but I do not know if they are related.
I can try to come up with a short example if necessary.
If node A owns items X and Y, and node B shares X but not Y, and then A modifies X and Y (resulting in INV X sent to B), and then ownership of Y transfers to node C, and then B gets (multicast) Y (from C), it will read Y's new value but still allow stale reads on X.
Solution: a PUT to a line whose node was previously unknown must purge all I's.
Simply delegate to the ZooKeeper and JGroups implementations.
Both already have an internal counter for allocating ref-ids.
See ZooKeeperCluster.java and JGroupsCluster.java.
The problem is that during item migration (getx/putx), the old owner needs both to transfer ownership to the requesting node (putx) and to notify its slave (inv) that it is no longer the owner of the item. If one of them (slave or peer) does not get the message and the old owner fails, the item may be lost (both peer and slave think they're not the owner) or become conflicted (both think they're the owner).
I suspect that doing this correctly would require a consensus protocol, which might make the whole thing not worthwhile. The only hope is that we could somehow take advantage of the fact that we know that the old owner has failed, i.e. that the node-switch event is received by everyone (we have consensus over that provided by ZooKeeper/JGroups).