Giter Site home page Giter Site logo

Async Client about aerospike-client-rust HOT 24 OPEN

aerospike avatar aerospike commented on May 20, 2024 1
Async Client

from aerospike-client-rust.

Comments (24)

jonas32 avatar jonas32 commented on May 20, 2024 2

In theory yes, but there is some more into this, for example the TCP sockets. I don't know if there is any abstraction for stuff like that. I only ever saw this in relation to the Tokio or async-std API.
You can also see that in most other database related libs. Most of them explicitly reference Tokio and/or async-std.
I like the way sqlx solved that problem:
https://github.com/launchbadge/sqlx/blob/master/sqlx-rt/src/lib.rs

from aerospike-client-rust.

jonas32 avatar jonas32 commented on May 20, 2024 1

I started to work on the async version...
Porting the current client is a huge pain and causes a lot of overhead. I spent >25 hours, just trying to fix all the problems caused by references and lifetimes and I'm far from getting this to work.
Async rust really does not like the way variables are stored in the client. There are two ways to solve the problem. First one would be by moving the variables instead of referencing, the second one would be smartpointers.
I will go for the first one since this is the way cleaner and more usable way.

This will cause a lot of refactoring in the lower levels of the client and change the API a lot.
I'm going to do this changes anyway, so now the question is should i do that here or should i fork out?

The changes will partly abandon the low memory profile idea of the client. I will have an eye on good memory usage, but this wont be the priority anymore. The memory usage will probably increase, but i don't see the benefit compared to all the trade-offs in adding a lot of boilerplate to keep it as low as possible. Overall performance is probably the better target (the smartpointer way would probably affect this).

In addition, the sync client will also not be a fully independent one, it will basically just wrap the async one and block. This is a common strategy in rust. It will still require a runtime like Tokio, but that is no problem since there is one in place in most of the projects. I underestimated how many basic functions need to be changed to make them run async. There is nearly no function besides msgpack and buffer writing that works without changes.
I managed to get cluster and net more or less running on async, but as soon as i start with commands, it becomes ugly and hacky.

Short overview of what i will do:

  • Crate aerospike-rt -> Runtime abstraction
  • Crate aerospike-core -> Async library
  • Crate aerospike-sync -> Sync wrapper for the core lib
  • Crate aerospike -> The exported API, depending on feature flags

Flags:

  • rt-tokio (default on)
  • rt-async-std (will follow later)
  • rt-actix (will follow later)
  • serialization (default on)

Major changes:

  • Abandon low memory profile focus
  • Timeouts will be removed because they are not supported by async sockets and also not needed in an async env
  • Operations and expressions need a lot of refactoring for async and will switch to moving variables instead of referencing
  • The client API will have a lot of changes to make it more usable (focus on batch reading and operations)
  • The client will probably have buffer caches in a later stage like the golang one has to improve performance when operations are executed multiple times

Since it is closely related to the async runtime, ill try to also implement encryption while building the updated core.

Sorry for skipping the discussion and bringing in facts, but i need progress on this. My last PR is marked ready for review for over 4 months now. If you are ok with this changes, i will do them here with a PR. If you have any comments on this, please share!

from aerospike-client-rust.

khaf avatar khaf commented on May 20, 2024

Sorry for responding so late. I postponed chipping in to study the rust ecosystem a bit, but it seems that it's been months and hasn't materialized!

  1. I believe we should have an async implementation, and then provide a thin compatibility layer for the old sync API on top of it, while deprecating it. I see no point to having sync data API in this day and age.
  2. Here is where I'm not sure. I'd rather have a design with pluggable async backends, but I don't know how viable that is. Any ideas?

from aerospike-client-rust.

jonas32 avatar jonas32 commented on May 20, 2024

I support deprecating the sync API. That makes sense.
I'm also not sure about point 2. Making the client async is already a lot of work, supporting 2 different async platforms means feature flagging the code or splitting it into different crates. Thats a lot of maintaining overhead.
There is probably not much benefit from building this fully modular since async-std and Tokio are the only two really spread runtimes.
I would be fine with both ways, but i would only take care of the base and the Tokio part.
When Tokio is done, we probably got most use-cases covered.
Having the sync and async code base mixed up will end in a mess, so how would you like to separate that?
I would think of either forking to build it clean and maintain both repos for some time or moving one of the APIs to a sub package. "aerospike::sync::Client or aerospike::async::Client" for example.
What do you think about that?

from aerospike-client-rust.

databasedav avatar databasedav commented on May 20, 2024

shouldn't the async client use the abstractions in futures so it's runtime agnostic?

from aerospike-client-rust.

jhecking avatar jhecking commented on May 20, 2024

The sqlx wrapper for Tokio/async-std reminds me of the small abstraction layer we have in the Aerospike C client to support libuv, libev, and libevent for async operations. If it's not clear whether either Tokio or async-std will be the dominant API for async Rust, then it might be worthwhile to support both.

from aerospike-client-rust.

khaf avatar khaf commented on May 20, 2024

The way rust async design and ecosystem has evolved leads me to believe there may not even be just one dominant async library in the future, so it is important for us to design something in the spirit of the original philosophy. At the first glance and on the surface, sqlx implementation seems like a good direction to me as well.

from aerospike-client-rust.

jonas32 avatar jonas32 commented on May 20, 2024

I think we probably can take a lot of stuff from the sqlx abstraction and actix-rt. That means the abstraction will not be a big part of the work.
But the client will need a lot of restructuring.
I would go for this way:
Project structure:

  • Crate aerospike-core-> Everything thats shared between the codebases, like the encoding functions.
  • Crate aerospike-rt -> The internal async abstraction lib
  • Crate aerospike-async -> The async codebase
  • Crate aerospike-sync -> The sync codebase
  • Crate aerospike -> Reexporter for the 2 above, depending on the feature flags
    The async/sync crate could also be a single one, but i would split that for better overview and structure.
    Essentially they will be the same, just that the sync one will be deprecated and not use the rt lib.

Feature flags for actix-rt, tokio and async-std. actix-rt is just another wrapper lib for async-std and Tokio, but its common to only have this one and no explicit Tokio runtime. So most projects also support that one. Its nearly no overhead since the imports are nearly identical to Tokio.

from aerospike-client-rust.

khaf avatar khaf commented on May 20, 2024

I've been working to support the new server v5.6 features into the client, and as part of that - since it would be a breaking change anyway - I was going to remove a few lifetimes. I thought I'd be done in couple of weeks, so it would make sense for you to wait for my changes to get in. In the mean time, maybe we should talk about the changes first so we are on the same page. My changes are focused on removing many of the &str in the library and turning them into String. This will not have much, if any impact on the memory usage.

I did not know you had a PR open, I thought you just had a branch. If it passes the tests, we can get it merged ASAP.

from aerospike-client-rust.

jonas32 avatar jonas32 commented on May 20, 2024

Thanks for the response! Can you give a more exact list on what you are implementing?
While clicking through the base of the Java client, i found many missing parts.
My refactoring is currently mainly in the net and cluster crates. Im far from getting any kind of command to work.
Async rust requires all structs to be Send + Sync + Sized. Now we could do this like its done for the Client, but that unsafe block is a bad practice. The only way to resolve this is removing everything that is not thread safe.

from aerospike-client-rust.

khaf avatar khaf commented on May 20, 2024

I'm trying to implement the partition scans so that the tests pass. The old scan protocol is deprecated, so the client can not run any scan/queries. I also have to implement the new auth protocol.
I have nothing against changing the client to get it working with async, I'd just prefer to do it in phases, so that the async job becomes just about the async instead of a huge changeset that interlocks the whole span of changes.

from aerospike-client-rust.

jonas32 avatar jonas32 commented on May 20, 2024

Totally. I did not know that you started to work on the client again.
I wanted to do a load of side features in the same move, because i would otherwise have to port "deprecated" and outdated code to async, so i can remove it again in a later stage to update it.
New Auth protocol and Rack aware are probably the "deepst" ones in the client.

from aerospike-client-rust.

jonas32 avatar jonas32 commented on May 20, 2024

Please have a look at my #108 PR. If you see anything i missed, please tell me.
Its a stupid 1:1 port from old sync to ne async. No functionality changes.

The only part i dont really understand yet is the threadpool. I would like to get rid of that, since it makes not much sense in async. The only part that prevents me is this "scoped" pools. Can you explain more why the client needs them instead of simple run and die threads?

from aerospike-client-rust.

jhecking avatar jhecking commented on May 20, 2024

The only part that prevents me is this "scoped" pools. Can you explain more why the client needs them instead of simple run and die threads?

ca9940b: Does this help?

from aerospike-client-rust.

jonas32 avatar jonas32 commented on May 20, 2024

Unscoped threads are a good use case for executing queries accross
multiple nodes because the query threads continue running even after the
result set has been passed back to the caller.

Can you explain this further? Why does the client keep them? Is there any technical reason or is it just to skip the overhead of spawning new threads everytime?

However in the case of
the batch read operation, all threads used to fetch records from
different nodes need to complete before the results are passed back to
the caller.

So the point of the scoped ones is basically just to await until all of them are finished before returning anything?

from aerospike-client-rust.

jhecking avatar jhecking commented on May 20, 2024

The main difference between the scoped and unscoped threads is in the lifetimes of the data that you can pass to the thread to process. For unscoped threads, you either have to pass ownership to the thread or you can pass references to data with 'static lifetimes. You can't pass references to any data with non-static lifetimes, since you don't know how long the thread is going to run, so it might outlive any non-static lifetimes. For scoped threads, you can pass references to data with non-static lifetimes, as long as its guaranteed that the lifetime of the data outlives the lifetime of the scope.

We use that for batch reads. All the batch records are held in a single vector. All the batch jobs (one per cluster node) hold a shared reference to this structure, but each has a unique set of offsets its responsible for. (Since each record is owned by exactly one node.) All the batch jobs are executed in parallel. Once all jobs are complete, the same batch records vector is passed back to the caller.

Probably could be done without the scoped threads. You'd have to slice up the batch records into smaller vectors which can be passed to each batch job. Once a job is done it passes ownership back. Then you just have to reassemble all the individual pieces back into a single result vector in the correct order.

I haven't used async/await in Rust yet, so I don't know how this would work in that case.

from aerospike-client-rust.

jonas32 avatar jonas32 commented on May 20, 2024

Im trying to implement that splitting right now.
There is one point that brings in a little complexity. How important is it to keep the order in the return vec?
Right now, the order of batch records is the same that the user gave in. Does the returned one have to keep this order or can it be "unordered"?
Right now, the client will first get the nodes for each batch read and then add the index of the element to the node vec.
Instead, i would like to directly add the batch record instead of its index. That would do the slicing part but then i give up the original ordering.

from aerospike-client-rust.

jhecking avatar jhecking commented on May 20, 2024

I think all Aerospike clients have always kept the order of the results the same, so for consistency the Rust client should probably do the same.

If we decide to break compatibility with other clients, then the batch function should probably just take IntoIter instead of a Vec and return some collection that doesnโ€™t imply order like a HashSet.

from aerospike-client-rust.

jonas32 avatar jonas32 commented on May 20, 2024

We can extend the BatchRead struct with an index field. Then we can rebuild a new vec at the end. That would probably work.
Accepting IntoIter might be a problem because iters are mutable references. I have not yet managed to send such a reference over multiple threads without a lot of overhead and cloning.
However, this will add overhead and slow down the client, since i will have to move around mutable vectors to do that.
If that bloat is fine, ill implement it. Right now its more or less ordered by node.

from aerospike-client-rust.

jhecking avatar jhecking commented on May 20, 2024

I think we should keep the current interface that takes a Vec and returns a Vec with records in the same order. @khaf thoughts?

from aerospike-client-rust.

jonas32 avatar jonas32 commented on May 20, 2024

Did you already have time to look into the async PR?
Whats the status on partition scan and the new auth? We would like to update our infrastructure but the client is preventing that right now.

from aerospike-client-rust.

jonas32 avatar jonas32 commented on May 20, 2024

Sorry for pinging again, but i need at least an answer on the second question because otherwise i will have to do it myself.
I waited for over a month now. This is starting to destroy our time plan. @jhecking @khaf

from aerospike-client-rust.

khaf avatar khaf commented on May 20, 2024

@jonas32 Sorry I was distracted on other projects and @jhecking has very limited time to invest on this project. I'm back at the Rust client full time for the next few weeks. Let me finish a few minor fixes this week and I'll get back to you early next week.

from aerospike-client-rust.

bmuddha avatar bmuddha commented on May 20, 2024

Hello guys, can we expect async client to be released anytime soon? I can see there's a PR open for it, and it seems to be at testing stage

from aerospike-client-rust.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.