Giter Site home page Giter Site logo

Comments (7)

kwakwaversal avatar kwakwaversal commented on May 5, 2024 2

I'm reluctant to call it guaranteed delivery, because once it sends it to ES it won't wait for an acknowledgement back.

I think it would make sense for me to handle to the writes to Solr rather use create a webhook in the server. I would need to update Solr FIFO but I read in #33 (comment) that you suggested webhooks that don't return 2XX that a log is written somewhere. That would break the order of the records that update the relevant documents in Solr which isn't ideal.

As a side note, if records that are not successfully forwarded to a webhook aren't easily queried (and optionally extracted) I think this might be an issue. A log is great obviously, but I would like to see some stats if some webhooks fail for whatever reason. Programmatically polling the server for failed webhooks would be helpful, but then you're changing the simplicity of the server. It does sound more and more like you're going to end up having to write some job queue for the webhooks which is non-trivial. Or at the very least guarantee writes to somewhere so that events are not lost.

The payload values are all strings so it might be more universally useful to add a boolean flag like PARSE_PAYLOAD, and if true, this server could parse the payload into the correct types, dropping the columns key altogether.

Parsing the payload into the correct types might be useful, but I wouldn't say it's necessary. A JSON transformer would allow someone to cherry pick properties so that in a table of 30 columns, it would just return 3 and also be able to rename the columns in the process.

storage is cheaper than compute

Therefore, it may not be the best idea if it's only to save space on a database. Any reason you couldn't cleanse the column from the payload after it has landed in ES?

When I mentioned the payload size, I wasn't talking about storing it, I was literally talking about the payload size that's being sent across interfaces. The network traffic between hosts.

If I have a DB server, and put supabase/realtime on there, then having a consuming websocket server on a different host I'm going to get a lot of data being sent between the hosts. It's quite possible that I am throwing away a lot of that data so it would be more performant for me to revert back to sending NOTIFY from triggers. But then I lose some of the flexibility that supabase/realtime was giving me originally by not having to keep changing me database migrations to add/remove columns from the NOTIFY payload. As discussed, the JSON transformers would really shine here.

In case you didn't see this, there is an 8000 byte limit for NOTIFY payloads

Hah, yeah. I ran into this when trying to send emails stored in the database over the notifies. I mean, it makes sense to maintain performance but at the time it was a head scratcher.

I will explore a JSON transformer with one of our team and let you know. It could take some time but I think it's worth considering as part of our work on #47

Great stuff.

Addendum

For anyone else that might stumble across this issue looking for a way of syncing PostgreSQL to ES/Solr, https://debezium.io/ looks like a good alternative. I was hoping not to add too much extra to the stack because this becomes: ES/Solr, Debezium, Zookeeper, Kafka but I guess that's the price you have to pay for guaranteed delivery in this instance.

from realtime.

kiwicopple avatar kiwicopple commented on May 5, 2024 1

I was literally talking about the payload size that's being sent across interfaces. The network traffic between hosts.

Got it, that makes sense.

Debezium

Yes, great system! Definitely heavy if you want full functionality I think it is the only thing on the market right now.

Thanks again for the context. I will need to chat to Francesco about this use-case. He's not active right now, so unfortunately it could take a while to see progress on this one. Nonetheless, here are the actions I have:

  • Investigate the use of a queuing system, or some other guaranteed delivery
  • explore a JSON transformer

from realtime.

kiwicopple avatar kiwicopple commented on May 5, 2024

Hey @kwakwaversal thanks for the amazing write up.

the checkboxes in the README for this repo suggests you're already in Beta

Oops. Thanks for the heads up! We are standardising release statuses across the whole org. I shifted it back to alpha - I think that it's a fairer representation of the state of the server, but also Supabase as a whole. We will move into Beta once we have some benchmarks and, more importantly, we will hire someone who can support this repo full time (we're a very small team right now)

update or insert records to ES/Solr. Guaranteed delivery is obviously preferred for this.

Very cool. This one will be doable with the webhooks (#33). Once I iron out the issue with reusing replication slots (#22) then it will have guaranteed read from Postgres and send. I'm reluctant to call it guaranteed delivery, because once it sends it to ES it won't wait for an acknowledgement back. If the payload fails to reach ES (for whatever reason), the change event is lost too. This might be something we work on in the future, but it will be a major development (since it requires persistence).

the payload size including the types object, while useful, really adds up over thousands of records (even more if you're receiving the OLD record)

Interesting - also one that I hadn't thought of. I'll dig into the idea of a JSON transformer. The payload values are all strings so it might be more universally useful to add a boolean flag like PARSE_PAYLOAD, and if true, this server could parse the payload into the correct types, dropping the columns key altogether. This will have a an impact on performance, so it's one that we will have to weigh up. You would also lose a lot of detail (is it a smallint or bigint?), but in many cases that won't matter.

My initial thoughts on this:

  • storage is cheaper than compute
  • performance is important

Therefore, it may not be the best idea if it's only to save space on a database. Any reason you couldn't cleanse the column from the payload after it has landed in ES?

I'm assuming the preferred approach to reduce the load on the server and the streaming output from PostrgreSQL is to only create a publication for a subset of tables

Yes, that's correct. If there are tables you don't care about streaming, just don't enable the replication. Let me know if that's a problem for your use-case!

Subscribing to tables instead of creating lots of NOTIFY triggers (as we currently do in production).

In case you didn't see this, there is an 8000 byte limit for NOTIFY payloads, so be careful that your system isn't silently dropping the events. Postgres actually raises an error but it's hard to catch. This is the exact reason why we created Realtime last year - I was using trigger/NOTIFY and discovered this the hard way ..

Next steps

I will explore a JSON transformer with one of our team and let you know. It could take some time but I think it's worth considering as part of our work on #47

from realtime.

rrjanbiah avatar rrjanbiah commented on May 5, 2024

@kwakwaversal

Addendum

For anyone else that might stumble across this issue looking for a way of syncing PostgreSQL to ES/Solr, https://debezium.io/ looks like a good alternative. I was hoping not to add too much extra to the stack because this becomes: ES/Solr, Debezium, Zookeeper, Kafka but I guess that's the price you have to pay for guaranteed delivery in this instance.

I was looking for a debezium alternative (to avoid JVM stack) and stumbled upon this issue. Are you able to succeed with the attempt to use supabase for Elastic search sync? In my case, I have found Elastic search alternative MeiliSearch (crosslinking meilisearch/integration-guides#20)

On other note, for JSON transforming, I think these libraries are better and popular: https://github.com/jmespath/jmespath.js, https://github.com/jsonata-js/jsonata or https://github.com/wankdanker/node-object-mapper (FWIW, https://www.npmtrends.com/json-query-vs-jsonata-vs-JSONPath-vs-jsonpath-vs-jsonpath-plus-vs-jmespath-vs-object-mapper )

from realtime.

kiwicopple avatar kiwicopple commented on May 5, 2024

Thanks for the parsers @rrjanbiah - we might need to find something elixir-native, but these are good prior art to work from.

Our current discussion is to update the event parser to be more inline with AWS's Simple Work Flows. eg: https://states-language.net/#choice-state. This is a full state machine, which should make it this a very versatile server. It's a large task though, so we'll have to spend time figuring out if it's possible and if we can ship it in parts

from realtime.

rrjanbiah avatar rrjanbiah commented on May 5, 2024

@kiwicopple

Thanks for the parsers @rrjanbiah - we might need to find something elixir-native, but these are good prior art to work from.

Looks not maintained, but I found this one https://github.com/stephan83/ex-jmes

Our current discussion is to update the event parser to be more inline with AWS's Simple Work Flows. eg: https://states-language.net/#choice-state. This is a full state machine, which should make it this a very versatile server. It's a large task though, so we'll have to spend time figuring out if it's possible and if we can ship it in parts

Not quite understand here... but since, you're referring to the state machines, you may want to check https://github.com/davidkpiano/xstate as that's quite popular

from realtime.

chasers avatar chasers commented on May 5, 2024

This is a great idea and with Realtime v2 we have an extensions concept where we could additionally insert your realtime feed into another database.

from realtime.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.