
Comments (5)

adranwit commented on September 26, 2024

Add #bqtail slack channel on gophers.slack.com

from bqtail.

adranwit commented on September 26, 2024
  1. The transform is applied when copying data from the transient table: it is simply a SELECT from the temp table, enriched with the transform and side-input expressions. The transform is shared across all splits, so you cannot have a dedicated transform per split, but you can always use a CASE expression to account for variation between destination tables.
    See the example execution plan for a split (expect.json) in https://github.com/viant/bqtail/tree/master/stage/load/test/008_table_split

  2. BqTail runs each BigQuery job asynchronously, so if one fails, two things can happen. If the failure is classified as recoverable (503, connection reset by peer), the cloud function returns an error and is automatically retried (the BqTail CF has retry set). If, however, you get an internal server error, the whole ingestion process is restarted.
    If ingestion fails, the on-success delete action never runs, so the affected data files stay in the bqtail trigger bucket.
    When any ingestion process fails, it is listed as stalled by BqMonitor (after the grace period);
    in that case you can rectify the issue and replay the stalled data file or process.

All that said, for a non-recoverable error you may end up with duplication, so that has to be taken care of downstream.
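To make the failure handling in point 2 concrete, here is a rough sketch of the decision it describes. It is not BqTail's actual code: the function name, the status codes and the message fragment are assumptions taken only from the examples mentioned above.

```python
# Hypothetical classifier; the status codes and message fragments below are
# assumptions based on the examples in this comment, not BqTail's real rules.
RECOVERABLE_STATUS = {503}
RECOVERABLE_FRAGMENTS = ("connection reset by peer",)


def classify_failure(status: int, message: str = "") -> str:
    """Map a failed job to the outcome described in point 2."""
    text = message.lower()
    if status in RECOVERABLE_STATUS or any(f in text for f in RECOVERABLE_FRAGMENTS):
        return "retry"    # the function returns an error and is retried
    if status == 500:
        return "restart"  # internal server error: whole ingestion restarts
    return "stalled"      # left for monitoring and a manual replay


outcomes = [
    classify_failure(503),
    classify_failure(0, "read: connection reset by peer"),
    classify_failure(500),
    classify_failure(400, "invalid schema"),
]
```

Only the "stalled" branch needs manual action: the data files are still in the trigger bucket, so they can be replayed once the cause is fixed.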

  3. Dest.Table in a split has no role to play, so it can be the same as the template.
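As an illustration of the shared transform from point 1, the sketch below builds the kind of SELECT that copies from the transient table, using a CASE expression to vary a value. The helper name, table name and column names are hypothetical, and the real statement BqTail generates may look different (see expect.json for an actual plan).

```python
def build_copy_select(temp_table: str, transform: dict) -> str:
    """Build a SELECT over the transient table.

    transform maps a destination column to the SQL expression that fills it.
    """
    columns = ",\n  ".join(f"{expr} AS {col}" for col, expr in transform.items())
    return f"SELECT\n  {columns}\nFROM {temp_table} t"


# One transform shared by every split; the CASE handles per-destination variation.
shared_transform = {
    "id": "t.id",
    "amount": "CASE WHEN t.kind = 'refund' THEN -t.amount ELSE t.amount END",
}
select_sql = build_copy_select("mydataset.events_tmp", shared_transform)
```

Because the same expression list is applied to every split, any per-destination difference has to live inside the expression itself, which is what the CASE does here.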


ktopcuoglu commented on September 26, 2024
  1. The transform is applied when copying data from the transient table: it is simply a SELECT from the temp table, enriched with the transform and side-input expressions. The transform is shared across all splits, so you cannot have a dedicated transform per split, but you can always use a CASE expression to account for variation between destination tables.
    See the example execution plan for a split (expect.json) in https://github.com/viant/bqtail/tree/master/stage/load/test/008_table_split

Wonderful, I didn't pay attention to the expect files before.

  2. BqTail runs each BigQuery job asynchronously, so if one fails, two things can happen. If the failure is classified as recoverable (503, connection reset by peer), the cloud function returns an error and is automatically retried (the BqTail CF has retry set). If, however, you get an internal server error, the whole ingestion process is restarted.
    If ingestion fails, the on-success delete action never runs, so the affected data files stay in the bqtail trigger bucket.
    When any ingestion process fails, it is listed as stalled by BqMonitor (after the grace period);
    in that case you can rectify the issue and replay the stalled data file or process.

All that said, for a non-recoverable error you may end up with duplication, so that has to be taken care of downstream.

Can we add EventID as a new column to splits?
Then it would be possible to remove the rows inserted by a stalled job to avoid duplication.
(It may require scanning the whole table, but that is not a concern on a flat-rate plan.)

And as I understand from expect.json, a split's inserts run sequentially, right?
When we set async=true, will they all start at the same time (after the transient table is loaded)?


adranwit commented on September 26, 2024

Yes, you can add EventID or any other column with the $EventID expression
and run extra deduplication in the split clause. EventID should stick to the original process; consider adding EventID as a clustering field to manage performance/cost.
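For the cleanup side, a minimal sketch of removing the rows written by one stalled process before replaying it could look like this. The helper, the table name and the event ID value are hypothetical; the sketch only assumes the destination table has an EventID column populated as described above, and uses a BigQuery named query parameter (@event_id).

```python
def build_event_cleanup(table: str, event_id: str, column: str = "EventID"):
    """Return a parameterized DELETE for rows written by one ingestion event."""
    sql = f"DELETE FROM {table} WHERE {column} = @event_id"
    params = {"event_id": event_id}
    return sql, params


cleanup_sql, cleanup_params = build_event_cleanup("mydataset.events_a", "1234567890")
```

With EventID as a clustering field, a filter like this should touch far fewer blocks than a full table scan.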

expect.json is an example of an ingestion process execution plan for the rule and test process.
Async means that the on-success or on-failure section runs only once the preceding task has completed, with success or error respectively.
In async mode the BqTail cloud function never waits for BigQuery job completion; it just submits BQ jobs and quits.
BqDispatcher actively checks for completed BigQuery jobs to notify the BqTail process to run the post action.
Post actions can be nested without limits.
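The submit-and-quit flow above can be sketched as a toy simulation. This is only an in-memory model under stated assumptions, not how BqTail or BqDispatcher are implemented: the Action class, the submit/dispatch functions and the job names are all made up for illustration, and job outcomes are faked with a flag.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Action:
    name: str
    fails: bool = False  # simulated job outcome
    on_success: list = field(default_factory=list)
    on_failure: list = field(default_factory=list)


def submit(action, pending):
    """Like the cloud function in async mode: enqueue the job and return at once."""
    pending.append(action)


def dispatch(pending, log):
    """Like a dispatcher: pick up completed jobs and submit their post actions."""
    while pending:
        job = pending.popleft()
        log.append((job.name, "error" if job.fails else "done"))
        for post in (job.on_failure if job.fails else job.on_success):
            submit(post, pending)
    return log


# Post actions nest: load -> per-split copies -> their own follow-ups.
plan = Action("load", on_success=[
    Action("copy_split_a", on_success=[Action("delete_source")]),
    Action("copy_split_b", fails=True, on_failure=[Action("notify")]),
])
pending = deque()
submit(plan, pending)
log = dispatch(pending, [])
```

In this model both split copies are submitted as soon as the load completes, and each one's follow-up runs only after that particular job has finished.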


adranwit commented on September 26, 2024

I think Dest.Table can be used as a template for the split destination tables.
