devsisters / shardcake Goto Github PK

View Code? Open in Web Editor NEW

373.0 12.0 27.0 15.31 MB

Sharding and location transparency for Scala

Home Page: https://devsisters.github.io/shardcake/

License: Apache License 2.0

Scala 98.96% JavaScript 1.00% Stylus 0.05%

functional-programming messaging scala sharding actor-model

shardcake's Introduction

Shardcake

Shardcake is a Scala open source library that makes it easy to distribute entities across multiple servers and interact with those entities using their ID without knowing their actual location (this is also known as location transparency).

Shardcake exposes a purely functional API and depends heavily on ZIO.

The Documentation explains how to use Shardcake, in particular:

how to get started
the architecture behind shardcake
how to configure it
parts you can customize

shardcake's People

Contributors

Stargazers

Watchers

shardcake's Issues

An alternative to Replier, which can be used to ingest multiple messages

Currently, Replier can be used only once to complete send.
However, in some cases, it is not enough to receive a single response.
Instead, you may need a Queue to consume multiple messages without blocking.

For instance, each pod could expose a web socket, which needs a steady stream of events generated by an entity.
Currently, the only possible way (seemingly) is to execute send in a loop, waiting for the response from the entity, which is blocking busy waiting.

It would be convenient if there were an operation exposing a Queue.
Thus, it could be possible to avoid blocking and there would be a possibility for an entity to respond back continually.

[Feature Request] Graceful early termination without message timeout

An entity may currently terminate itself early using ZIO.interrupt or be externally interrupted, but this comes at the expense of a SendTimeoutException eventually being issued for every message in the entity's queue before the queue shuts down. It would be nice to have at least one fail-fast/recovery mechanism that stops accepting new messages into the queue and allows processing of any remaining messages in the queue so that the senders do not have to wait the full timeout period.

Broad-strokes ideas for termination strategies:

enqueue a custom message and start a configurable termination timeout. the entity may continue processing the queue, eventually handling the termination message and interrupt itself or discard the custom message where it will be interrupted after the termination timeout
register a callback accepting the Dequeue and invoke that upon any kind of termination signal
a conditional method that stops the entity only when the queue is currently empty. this may either require synchronization or be combined with the first two options to cover the simultaneous issuance of a termination and the receipt of a message
an immediate restart, where if the shard still manages the entity it is re-created using the non-empty queue, and may be combined with the first two strategies in the event the entity is no longer managed by the shard

It might be enough to have a single Sharding.terminateEntity[T](typ: EntityType[T], id: String, termination: TerminationStrategy[T]) style method, but perhaps specific methods like Sharding.terminateMessage[T](typ: EntityType[T], id: String, termination: T) and Sharding.terminateRemaining[T](typ: EntityType[T], id: String)(termination: Dequeue[T] => ZIO[...]) and variants make sense

Ability to respond with a stream

The idea was suggested during a Zymposium and I think it might be doable. It would basically be a variant sendStream that returns a stream and pass a Replier of ZStream. We could back it by a gRPC stream for a full streaming experience.

[Feature Request] resume failed singleton job

Is there some way to restart a died singleton job?
It would be great if there is some option that determines whether to restart or not in registerSingleton function, or some api like restartSingleton which can restart died job manually.

[Interruption] terminateEntity does not wait for graceful interruption to complete

It seems like terminateEntity after sending the termination message to the queue does not wait for the created promise.
That means that after the message is succefully delivered, the entity terminates immediatly without waiting for proper interruption.

Example written in typescript:
https://github.com/mattiamanzati/shardcake/blob/dca200d6c6c3d10a18379355cc23c74fab6338cf/test/SampleTests.ts#L238-L290

Shard termination is handled fine instead, as all promises are awaited or timed out.

[Feature Request] Passivation

After talking to a colleague who has a lot of experience with Akka clustering, it seems this feature would be well-received by those looking for Akka clustering alternatives. In Akka, passivation works with persistent entities: if an entity is not being used, it can be stopped to reduce memory consumption. Looking through the documentation, it could be done by:

Setting a timeout on receiving messages (context.setReceiveTimeout in Akka)
Automatically through a config by providing passivation strategy options (e.g. idle entity passivation, limits on the active number of entities, etc)

I don't know if there's an implementation in the works for this, but in case there isn't but there is some interest for it, I'd be happy to take a crack at it.

Allow override of maxSessionIdleTime per entity

Either by config or on registerEntity.

I have many entities that don't need to live long, they are called once and then will lie dormant for quite a long time. On the other hand there are some that are called with a long call timeout because it has to deal with slow interfaces, but increasing globally the idleTimeout for that uses a lot of memory to hold the short-lived ones.

I have solution already, quite simple to do

log when Shard Manager is fully started

When start shard manager, we need a log that it is fully initialized.
Now, shard manager logs Shard Manager loaded when it makes ShardManager.live
But, after making ShardManager.live, it makes api through Server.run.
So I'm confused if it's fully initialized.

When to release newer version, which supports zio-grpc 0.6.0-rc5

I'm trying to migrate my akka-cluster dependency into shardcake.
The latest version of shardcake is 2.0.6, which depends on zio-grpc 0.6.0-rc1.
But as you know, breaking change occurred between 0.6.0-rc1 and latest version (0.6.0-rc5). And I saw the PR which handles this change. (#69)
I want to apply this update to my project. When is the newer version supposed to be released?

[Interruption] if entity terminates, and then shard shuts down, entity termination is not waited

When an entity starts to terminate, it forks daemon the interruption effect and removes the entities from the alive set.
If shard interruption occurs and completes before that entity fully shutdown, entity termination will not complete.

Sample test written in typescript:
https://github.com/mattiamanzati/shardcake/blob/dca200d6c6c3d10a18379355cc23c74fab6338cf/test/SampleTests.ts#L292-L344

Fail when you send a message to an entity type that wasn't registered

See logic in Sharding when we access entityStates, currently we just do nothing when the entity type is not found. An error would be better.

Question about performance tuning

Hello, I'm trying to migrate my project from akka-cluster to shardcake.

I finished migrating my code, and I'm comparing the overall performance of akka & shardcake, using K6.

I expected that there would not be any big difference in performance between these two library, but it seems that the performance when I use akka is about 4~5x better than when I use shardcake.

Is there any configuration that I should adjust further, or is this just shardcake's structural problem?

K6 test result

shardcake version:

akka version:

Infrastructure & Configuration about shardcake

Deployed using k8s
Used single Redis for storage
number of pods: 5
numberOfShards: 50
sendTimeout: 60 seconds

Fix streaming responses

After doing more tests on the streaming responses feature, I found that I missed some issues. See #77

Helpers to know if an entity type is registered

add isEntityTypeRegistered to Sharding
a function that returns a stream of changes could be nice

Documentation about message versioning strategy

What are the options for evolving the messages that entities can process, in particular with regards to how to roll that out (with downtime or online)?

From the architecture overview, it looks like

messages are not persisted (just used for network calls), so if you can shut down the whole system and reboot with a new version, any kind of changes are possible
for rolling upgrades, you'd need to be in some state where new messages can be defined in code already, but as long as they are not used, the older version of the protocol still needs to be able to read the older messages sent by the new protocol. After every pod has been upgraded to the new version of the code, you can start using the new messages.
from the looks of it, adding new entity types and new message case classes should work regardless of serialization method (as well as deleting entities and case classes)
you can have different version of the pods specify different entities as being available in it in pretty much any combination, which should help here
adding new fields to existing case classes with provided default or as optional fields may work depending on the serialization method. How does Kryo behave in this regard? If you add an optional field to a case class, can it still handle messages from the earlier version that does include it?

[Feature request] Cluster stats

Consider adding support for reporting statistics such as the following:

Number of active pods
Number of entities
Number of shards
Number of register/unregister events
Results of health checks
Start and stop (and by extension runtime) times of rebalances

Having the above in something like StatsD or Prometheus would enable visualizing cluster size, status and health very easily.

Problem with using multiple pods

Simple Memory Example

GuildApp

object GuildApp extends ZIOAppDefault {
  val config: ZLayer[Any, SecurityException, Config] =
    ZLayer(
      System
        .env("port")
        .map(_.flatMap(_.toIntOption).fold(Config.default)(port => Config.default.copy(shardingPort = port)))//8001
    )

  val program =
    for {
      _     <- Sharding.registerEntity(Guild, behavior)
      _     <- Sharding.registerScoped
      guild <- Sharding.messenger(Guild)
      _     <- guild.send("guild1")(Join("user1", _)).debug
      _     <- guild.send("guild1")(Join("user2", _)).debug
      _     <- guild.send("guild1")(Join("user3", _)).debug
      _     <- guild.send("guild1")(Join("user4", _)).debug
      _     <- guild.send("guild1")(Join("user5", _)).debug
      _     <- guild.send("guild1")(Join("user6", _)).debug
      _     <- ZIO.never
    } yield ()

  def run: Task[Unit] =
    ZIO
      .scoped(program)
      .provide(
        config,
        ZLayer.succeed(GrpcConfig.default),
        Serialization.javaSerialization,
        Storage.memory,
        ShardManagerClient.liveWithSttp,
        GrpcPods.live,
        Sharding.live,
        GrpcShardingService.live
      )
}

GuildAppPod2

 //... same code 8002
  val program =
    for {
      _     <- Sharding.registerEntity(Guild, behavior)
      _     <- Sharding.registerScoped
      guild <- Sharding.messenger(Guild)
      _     <- guild.send("guild1")(Join("a", _)).debug
      _     <- guild.send("guild1")(Join("b", _)).debug
      _     <- guild.send("guild1")(Join("c", _)).debug
      _     <- guild.send("guild1")(Join("d", _)).debug
      _     <- guild.send("guild1")(Join("e", _)).debug
      _     <- guild.send("guild1")(Join("f", _)).debug
      _     <- ZIO.never
    } yield ()

//... same code

simple.ShardManagerApp

timestamp=2022-12-21T09:03:14.168151Z level=INFO thread=#zio-fiber-7 message="Shard Manager loaded" location=com.devsisters.shardcake.ShardManager.live file=ShardManager.scala line=208
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
timestamp=2022-12-21T09:03:14.749419Z level=INFO thread=#zio-fiber-6 message="Shard Manager server started on port 8080." location=com.devsisters.shardcake.Server.run file=Server.scala line=29
timestamp=2022-12-21T09:03:29.723609Z level=INFO thread=#zio-fiber-50 message="Registering Pod(localhost:8001,1.0.0)" location=com.devsisters.shardcake.ShardManager.register file=ShardManager.scala line=33
timestamp=2022-12-21T09:03:30.188122Z level=INFO thread=#zio-fiber-20 message="ShardsAssigned(localhost:8001,HashSet(69, 138, 101, 249, 234, 88, 170, 115, 5, 269, 202, 217, 276, 120, 247, 10, 56, 142, 153, 174, 185, 42, 24, 288, 37, 25, 257, 52, 14, 184, 110, 125, 196, 157, 189, 20, 46, 93, 284, 152, 228, 289, 57, 78, 261, 29, 216, 164, 179, 106, 238, 121, 84, 211, 253, 147, 280, 61, 221, 293, 132, 1, 265, 74, 206, 89, 133, 116, 243, 292, 248, 270, 220, 102, 233, 6, 60, 117, 85, 201, 260, 160, 192, 165, 33, 28, 38, 297, 70, 275, 21, 137, 92, 229, 252, 197, 65, 97, 285, 224, 156, 9, 188, 53, 169, 141, 77, 193, 212, 96, 109, 256, 124, 225, 173, 13, 129, 41, 134, 73, 2, 266, 205, 128, 237, 105, 244, 298, 166, 148, 264, 45, 161, 17, 149, 32, 34, 279, 64, 180, 296, 176, 191, 22, 44, 286, 291, 59, 118, 281, 204, 259, 27, 71, 12, 54, 144, 49, 236, 181, 86, 159, 187, 172, 113, 219, 274, 81, 230, 76, 7, 245, 39, 98, 271, 208, 103, 140, 213, 91, 66, 155, 198, 108, 240, 251, 130, 278, 223, 135, 299, 267, 167, 35, 226, 3, 241, 80, 162, 255, 209, 112, 123, 194, 145, 48, 63, 295, 18, 282, 150, 95, 263, 50, 67, 199, 16, 127, 31, 177, 182, 154, 11, 72, 175, 143, 43, 99, 87, 203, 218, 104, 250, 231, 40, 26, 258, 158, 186, 171, 139, 23, 55, 114, 8, 75, 207, 272, 82, 290, 119, 58, 235, 246, 214, 287, 151, 300, 36, 146, 30, 51, 190, 273, 168, 262, 183, 19, 210, 107, 268, 79, 195, 94, 283, 239, 242, 4, 294, 126, 136, 15, 68, 62, 178, 277, 131, 47, 163, 200, 122, 83, 215, 222, 232, 100, 90, 111, 254, 227))" location=com.devsisters.shardcake.ShardManager.live file=ShardManager.scala line=207
timestamp=2022-12-21T09:03:43.032295Z level=INFO thread=#zio-fiber-70 message="Registering Pod(localhost:8002,1.0.0)" location=com.devsisters.shardcake.ShardManager.register file=ShardManager.scala line=33
timestamp=2022-12-21T09:03:53.242502Z level=INFO thread=#zio-fiber-75 message="Unregistering localhost:8002" location=com.devsisters.shardcake.ShardManager.unregister file=ShardManager.scala line=56

GuildApp Pod1

timestamp=2022-12-21T09:03:29.249530Z level=INFO thread=#zio-fiber-29 message="Registered entity guild" location=com.devsisters.shardcake.Sharding.live file=Sharding.scala line=422
Success(Set(user1pod1))
Success(Set(user1pod1, user2pod1))
Success(Set(user1pod1, user2pod1, user3pod1))
Success(Set(user1pod1, user2pod1, user3pod1, user4pod1))
Success(HashSet(user5pod1, user3pod1, user2pod1, user4pod1, user1pod1))
Success(HashSet(user5pod1, user3pod1, user2pod1, user4pod1, user6pod1, user1pod1))

GuildAppPod2

timestamp=2022-12-21T09:03:43.008882Z level=INFO thread=#zio-fiber-29 message="Registered entity guild" location=com.devsisters.shardcake.Sharding.live file=Sharding.scala line=422
<FAIL> Fail(com.devsisters.shardcake.errors.SendTimeoutException: Timeout sending message to guild guild1 - Join(a,Replier(5d0f1408-9361-40b9-8a6e-844c0b903a19)),Stack trace for thread "zio-fiber-6":
	at com.devsisters.shardcake.Sharding.messenger.$anon.send(Sharding.scala:252)
	at example.simple.GuildAppPod2.program(GuildAppPod2.scala:22)
	at example.simple.GuildAppPod2.run(GuildAppPod2.scala:33)
	at example.simple.GuildAppPod2.run(GuildAppPod2.scala:34))
timestamp=2022-12-21T09:03:53.289079Z level=ERROR thread=#zio-fiber-0 message="" cause="Exception in thread "zio-fiber-6" com.devsisters.shardcake.errors.SendTimeoutException: com.devsisters.shardcake.errors.SendTimeoutException: Timeout sending message to guild guild1 - Join(a,Replier(5d0f1408-9361-40b9-8a6e-844c0b903a19))
	at com.devsisters.shardcake.Sharding.messenger.$anon.send(Sharding.scala:252)
	at example.simple.GuildAppPod2.program(GuildAppPod2.scala:22)
	at example.simple.GuildAppPod2.run(GuildAppPod2.scala:33)
	at example.simple.GuildAppPod2.run(GuildAppPod2.scala:34)"

Process finished with exit code 1

Expected Result

I think it should work with two pods on the memory state

Complex redis examples

GuildBehavior

  object Guild extends EntityType[GuildMessage]("guild")
  object Lonca extends EntityType[GuildMessage]("lonca")

GuildApp Pod1

import com.devsisters.shardcake._
import com.devsisters.shardcake.interfaces.Serialization
import dev.profunktor.redis4cats.RedisCommands
import example.complex.GuildBehavior.GuildMessage.{ Join, Terminate }
import example.complex.GuildBehavior._
import zio._

object GuildApp extends ZIOAppDefault {
  val config: ZLayer[Any, SecurityException, Config] =
    ZLayer(
      System
        .env("port")
        .map(_.flatMap(_.toIntOption).fold(Config.default)(port => Config.default.copy(shardingPort = port)))//2000
    )

  val program: ZIO[Sharding with Scope with Serialization with RedisCommands[Task, String, String], Throwable, Unit] =
    for {
      _     <- Sharding.registerEntity(Guild, behavior, p => Some(Terminate(p)))
      _     <- Sharding.registerScoped
      guild <- Sharding.messenger(Guild)
      user1 <- Random.nextUUID.map(_.toString)
      user2 <- Random.nextUUID.map(_.toString)
      user3 <- Random.nextUUID.map(_.toString)
      _     <- guild.send("guild1")(Join(user1, _)).debug
      _     <- guild.send("guild1")(Join(user2, _)).debug
      _     <- guild.send("guild1")(Join(user3, _)).debug
      _     <- ZIO.never
    } yield ()

  def run: Task[Unit] =
    ZIO
      .scoped(program)
      .provide(
        config,
        ZLayer.succeed(GrpcConfig.default),
        ZLayer.succeed(RedisConfig.default),
        redis,
        StorageRedis.live,
        KryoSerialization.live,
        ShardManagerClient.liveWithSttp,
        GrpcPods.live,
        Sharding.live,
        GrpcShardingService.live
      )
}

GuildAppTwo Pod2

//... same code  port 3000
  val program: ZIO[Sharding with Scope with Serialization with RedisCommands[Task, String, String], Throwable, Unit] =
    for {
      _     <- Sharding.registerEntity(Lonca, behavior, p => Some(Terminate(p)))
      _     <- Sharding.registerScoped
      lonca <- Sharding.messenger(Lonca)
      user1 <- Random.nextUUID.map(_.toString)
      user2 <- Random.nextUUID.map(_.toString)
      user3 <- Random.nextUUID.map(_.toString)
      _     <- lonca.send("lonca1")(Join(user1, _)).debug
      _     <- lonca.send("lonca2")(Join(user2, _)).debug
      _     <- lonca.send("lonca3")(Join(user3, _)).debug
      _     <- ZIO.never
    } yield ()
//... same code

complex.SharManagerApp

timestamp=2022-12-21T09:33:39.419819Z level=INFO thread=#zio-fiber-7 message="Shard Manager loaded" location=com.devsisters.shardcake.ShardManager.live file=ShardManager.scala line=208
timestamp=2022-12-21T09:33:39.671944Z level=INFO thread=#zio-fiber-6 message="Shard Manager server started on port 8080." location=com.devsisters.shardcake.Server.run file=Server.scala line=29
timestamp=2022-12-21T09:35:47.164573Z level=INFO thread=#zio-fiber-197 message="Registering Pod(localhost:2000,1.0.0)" location=com.devsisters.shardcake.ShardManager.register file=ShardManager.scala line=33
timestamp=2022-12-21T09:35:47.634762Z level=INFO thread=#zio-fiber-32 message="ShardsAssigned(localhost:2000,HashSet(69, 138, 101, 249, 234, 88, 170, 115, 5, 269, 202, 217, 276, 120, 247, 10, 56, 142, 153, 174, 185, 42, 24, 288, 37, 25, 257, 52, 14, 184, 110, 125, 196, 157, 189, 20, 46, 93, 284, 152, 228, 289, 57, 78, 261, 29, 216, 164, 179, 106, 238, 121, 84, 211, 253, 147, 280, 61, 221, 293, 132, 1, 265, 74, 206, 89, 133, 116, 243, 292, 248, 270, 220, 102, 233, 6, 60, 117, 85, 201, 260, 160, 192, 165, 33, 28, 38, 297, 70, 275, 21, 137, 92, 229, 252, 197, 65, 97, 285, 224, 156, 9, 188, 53, 169, 141, 77, 193, 212, 96, 109, 256, 124, 225, 173, 13, 129, 41, 134, 73, 2, 266, 205, 128, 237, 105, 244, 298, 166, 148, 264, 45, 161, 17, 149, 32, 34, 279, 64, 180, 296, 176, 191, 22, 44, 286, 291, 59, 118, 281, 204, 259, 27, 71, 12, 54, 144, 49, 236, 181, 86, 159, 187, 172, 113, 219, 274, 81, 230, 76, 7, 245, 39, 98, 271, 208, 103, 140, 213, 91, 66, 155, 198, 108, 240, 251, 130, 278, 223, 135, 299, 267, 167, 35, 226, 3, 241, 80, 162, 255, 209, 112, 123, 194, 145, 48, 63, 295, 18, 282, 150, 95, 263, 50, 67, 199, 16, 127, 31, 177, 182, 154, 11, 72, 175, 143, 43, 99, 87, 203, 218, 104, 250, 231, 40, 26, 258, 158, 186, 171, 139, 23, 55, 114, 8, 75, 207, 272, 82, 290, 119, 58, 235, 246, 214, 287, 151, 300, 36, 146, 30, 51, 190, 273, 168, 262, 183, 19, 210, 107, 268, 79, 195, 94, 283, 239, 242, 4, 294, 126, 136, 15, 68, 62, 178, 277, 131, 47, 163, 200, 122, 83, 215, 222, 232, 100, 90, 111, 254, 227))" location=com.devsisters.shardcake.ShardManager.live file=ShardManager.scala line=207
timestamp=2022-12-21T09:35:48.172872Z level=INFO thread=#zio-fiber-218 message="Creating listener for channel: RedisChannel(shard_assignments)" location=example.complex.package.redis.logger.$anon.info file=package.scala line=19
timestamp=2022-12-21T09:36:08.975143Z level=INFO thread=#zio-fiber-242 message="Registering Pod(localhost:3000,1.0.0)" location=com.devsisters.shardcake.ShardManager.register file=ShardManager.scala line=33
timestamp=2022-12-21T09:36:09.412721Z level=INFO thread=#zio-fiber-251 message="Unregistering localhost:3000" location=com.devsisters.shardcake.ShardManager.unregister file=ShardManager.scala line=56

GuildApp Pod1

timestamp=2022-12-21T09:35:46.832875Z level=INFO thread=#zio-fiber-43 message="Registered entity guild" location=com.devsisters.shardcake.Sharding.live file=Sharding.scala line=422
timestamp=2022-12-21T09:35:47.682447Z level=INFO thread=#zio-fiber-68 message="Started entity guild1" location=example.complex.GuildBehavior.behavior file=GuildBehavior.scala line=26
Success(Set(1d3bb131-4601-417d-a487-0f5a308b76f6))
Success(Set(1d3bb131-4601-417d-a487-0f5a308b76f6pod1, e4071203-58bd-47c4-9609-6a0b34288ceb))
Success(Set(e4071203-58bd-47c4-9609-6a0b34288cebpod1, 1d3bb131-4601-417d-a487-0f5a308b76f6pod1, 7b22920c-34a2-4197-9474-73312e151002))
timestamp=2022-12-21T09:35:47.911356Z level=INFO thread=#zio-fiber-46 message="Creating listener for channel: RedisChannel(shard_assignments)" location=example.complex.package.redis.logger.$anon.info file=package.scala line=19

GuildAppTwo Pod2

timestamp=2022-12-21T09:36:08.960621Z level=INFO thread=#zio-fiber-43 message="Registered entity lonca" location=com.devsisters.shardcake.Sharding.live file=Sharding.scala line=422
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by com.esotericsoftware.kryo.util.UnsafeUtil (file:/home/burak/.cache/coursier/v1/https/repo1.maven.org/maven2/com/esotericsoftware/kryo-shaded/4.0.2/kryo-shaded-4.0.2.jar) to constructor java.nio.DirectByteBuffer(long,int,java.lang.Object)
WARNING: Please consider reporting this to the maintainers of com.esotericsoftware.kryo.util.UnsafeUtil
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
<FAIL> Fail(io.grpc.StatusException: INTERNAL: Entity type lonca was not registered.,Stack trace for thread "zio-fiber-48":
	at com.devsisters.shardcake.GrpcPods.sendMessage(GrpcPods.scala:76)
	at com.devsisters.shardcake.Sharding.sendToPod(Sharding.scala:216)
	at com.devsisters.shardcake.Sharding.sendToPod(Sharding.scala:234)
	at com.devsisters.shardcake.Sharding.messenger.$anon.sendMessage.trySend(Sharding.scala:265)
	at com.devsisters.shardcake.Sharding.messenger.$anon.sendMessage.trySend(Sharding.scala:262)
	at com.devsisters.shardcake.Sharding.messenger.$anon.send(Sharding.scala:248)
	at com.devsisters.shardcake.Sharding.messenger.$anon.send(Sharding.scala:252))
timestamp=2022-12-21T09:36:09.371320Z level=INFO thread=#zio-fiber-50 message="Creating listener for channel: RedisChannel(shard_assignments)" location=example.complex.package.redis.logger.$anon.info file=package.scala line=19
timestamp=2022-12-21T09:36:09.543638Z level=INFO thread=#zio-fiber-6 message="Releasing PubSub connection: redis://localhost" location=example.complex.package.redis.logger.$anon.info file=package.scala line=19
timestamp=2022-12-21T09:36:09.544986Z level=INFO thread=#zio-fiber-6 message="Releasing PubSub connection: redis://localhost" location=example.complex.package.redis.logger.$anon.info file=package.scala line=19
timestamp=2022-12-21T09:36:09.545296Z level=INFO thread=#zio-fiber-6 message="Releasing Commands connection: redis://localhost" location=example.complex.package.redis.logger.$anon.info file=package.scala line=19
timestamp=2022-12-21T09:36:09.546868Z level=INFO thread=#zio-fiber-6 message="Releasing Redis connection: RedisURI(redis://localhost)" location=example.complex.package.redis.logger.$anon.info file=package.scala line=19
timestamp=2022-12-21T09:36:09.568137Z level=ERROR thread=#zio-fiber-0 message="" cause="Exception in thread "zio-fiber-48" io.grpc.StatusException: io.grpc.StatusException: INTERNAL: Entity type lonca was not registered.
	at com.devsisters.shardcake.GrpcPods.sendMessage(GrpcPods.scala:76)
	at com.devsisters.shardcake.Sharding.sendToPod(Sharding.scala:216)
	at com.devsisters.shardcake.Sharding.sendToPod(Sharding.scala:234)
	at com.devsisters.shardcake.Sharding.messenger.$anon.sendMessage.trySend(Sharding.scala:265)
	at com.devsisters.shardcake.Sharding.messenger.$anon.sendMessage.trySend(Sharding.scala:262)
	at com.devsisters.shardcake.Sharding.messenger.$anon.send(Sharding.scala:248)
	at com.devsisters.shardcake.Sharding.messenger.$anon.send(Sharding.scala:252)"

Process finished with exit code 1

Expected Result

I can add two entities to the same pod but when I want to add them to two different pods I get an error.

Question

In the Redis example, when I send a message to the entity, does it get the data from the cache / memory or does it pull the data from redis?

So, is the need for redis necessary for the pods to be able to map to each other, or is it also necessary for adding and accessing records?

Add a documentation website

recycled pod IP address causes rebalance failure

In one of our small-ish dev clusters, the pod ip address of one of the sharding pods was recycled to another pod that was not part of the shardcake cluster.

We were able to resolve this by restarting the pod with the recycled ip address.

We have decreased rebalanceInterval and rebalanceRetryInterval in an attempt to reduce the likelihood of this happening again.

I think it would be good to use the pod uid as pod identity.

Replace zio-grpc-core with release version

Hi, I noticed that shardcake references a snapshot version of zio-grpc-core on https://github.com/devsisters/shardcake/blob/series/2.x/build.sbt#L6

I'm curious as to what the reason for this is, as snapshot versions are less convenient for regular sbt dependency imports. AFAIK zio-grpc-core already has a release version 0.6 that supports both scala 3/2.13/2.12, see maven repository

Add a more complex example

Pod name in logs

Shard cake shows only the pod ip when printing logs.
When i want to see pod name on debug process, we have to execute k8s command through ip.
It's tiresome.
Why don't you log the name of the pod?

[DX] make registerSingleton accept and automatically provide environment

It would be nice if registerSingleton behaves the same way of registerEntity, by accepting a ZIO with an env, and spread the env requirement to the result ZIO.

see: mattiamanzati/shardcake@8f9bbe9#diff-504aba56bd6bfc9961e6f146a43bea0b654ea96c0549b509879d73d076e18c59

Singleton entities not being started after sharding.refreshAssignments

Can this sequence of events cause a singleton entity not to spawn properly?

The pod with shard 1 fails and is restarted with the same ip address.
The pod calls sharding.refreshAssignments, which runs with .forkDaemon.
Before the pod knows its original assignment, sharding.registerSingleton and sharding.register both runs. The pod thinks it has no assignment.
sharding.refreshAssignments finishes

Release to Sonatype

Delivery guarantees

Hi, could you tell which delivery guarantees the Shardcake offers?

Add more tests

Only ShardManager has tests now. It would be good to have an end-to-end test involving all components.

[Interruption] entityTerminationTimeout is handled only on pod shutdown

I am not sure if this is an intended behaviour, but it seems that entityTerminationTimeout is respected only in pod shutdown.
I think this may lead to misleading situations, where the entity takes a lot to terminate after an idle timeout, and than shutting down the entire pod will hang because will wait for that entity to complete shutdown (or hang on resources for more that entityTerminationTimeout).

shardcake/entities/src/main/scala/com/devsisters/shardcake/internal/EntityManager.scala

Line 166 in 20ca3e5

    
           _        <- ZIO.foreachDiscard(promises)(_.await).timeout(config.entityTerminationTimeout)

Add possibility to broadcast a message to all pods

It would be practical to have the ability to broadcast a message to all pods. It could be a fire and forget messages without an enitityId.

Making multiple pods

Hi,

I'm running a complex example. I didn't understand how to make multiple pods in this example.

Redis server

Make everything that is currently hardcoded configurable

For example, timeouts and other hardcoded durations.

Custom entityID to shardID calculation

First of all, great library!

I've worked on a project where, using Kafka, the entity IDs were not uniformly distributed over the Kafka partitions. This could cause some skew when running close to maximum capacity. In shardcake, using the current algorithm of shardId = abs(entityId.hashCode % numberOfShards) + 1, the situation could arise where the entities are not evenly divided over the shards.

Would it be possible to supply a custom shardID calculation? I'm also looking into a use case where shardId = int(entityId).

[Feature request] Pod metrics

Similar to idea of the PodsHealth trait:
If there was a trait PodsMetrics or similar, which requires a way to define a load factor (maybe define it to be 0.0 <= x <= 1.0) for a Pod.

trait PodsMetrics {
  def loadFactor(podAddress: PodAddress): UIO[Float]
}

By using this load factor, a pod would report how "busy" it is (it could simply report the underlying machine's CPU usage or take into account other metrics).
Such a load factor could then be exposed further and, for example, be used to implement a Kubernetes HPA.
This would enable pods (in both the Kubernetes and Shardcake sense) to be added and removed automatically.

Why GrpcShardingService.layer returns Unit, it breaks type safe

Missing GrpcShardingService.live provide will cause exception at runtime

AWS Discovery

Thanks for open-sourcing shardcake, @ghostdogpr and Devsisters!

I'd like to port a running akka cluster to shardcake, and the largest blocker is the service discovery.

We have an Akka cluster that is running on ECS Fargate instead of Kubernetes so that we're using ECS discovery module provided by Akka Management:

https://doc.akka.io/docs/akka-management/current/discovery/aws.html

Multiple serializers

It proved to be very useful when using Akka to be able to define multiple serializers identified by name as part of the protocol (https://doc.akka.io/docs/akka/current/serialization.html#rolling-updates). We were able to use this to do rolling updates in live system from Kryo with Scala 2.12 to Kryo with Scala 2.13 and eventually an alternative serialization library.

It would be nice to have the same capability in shardcake, for example by providing a Serializer which holds a map of other serializers and defines some header format to identify which one to use.

Question about graceful shutdown

hello, I want to implement graceful shutdown and it seems that we need to use terminateMessage of EntityManager.
I think that once the termination message is offered to queue, then subsequent messages should not be enqueued, so that the function send can return EntityNotManagedByThisPod error after some retry.
I can find out the code setting the state from Left[Queue] to Right[Signal] in terminateEntity, but not in terminateEntities.
Is it intended? or the code should be fixed?

[Feature Request] Proxy only mode

Hi, I wanna run some nodes as proxy only mode. A node running as proxy mode does not host any entities, and can only send messages to entities hosted in other nodes.

devsisters / shardcake Goto Github PK

shardcake's Introduction

Shardcake

shardcake's People

Contributors

Stargazers

Watchers

Forkers

shardcake's Issues

K6 test result

Infrastructure & Configuration about shardcake

Simple Memory Example

Expected Result

Complex redis examples

Expected Result

Question

Recommend Projects

Recommend Topics

Recommend Org