Comments (5)
Add #bqtail slack channel on gophers.slack.com
from bqtail.
- The transform is applied when copying data from the transient table: the copy is simply a SELECT FROM the temp table, enriched with the transform and side input expressions. The transform is shared between the splits, so you cannot have a dedicated transform per split. But you can always use a CASE expression to account for variation between destination tables.
See the example execution plan for a split (expect.json) in https://github.com/viant/bqtail/tree/master/stage/load/test/008_table_split
- BqTail runs each BigQuery job asynchronously, which means that if one fails, one of two things can happen. If the failure is classified as recoverable (503, connection reset by peer), the cloud function returns an error and is automatically retried (the BqTail CF has retry set). If, however, you get an internal server error, the whole ingestion process is restarted.
If ingestion fails, the on-success delete action never runs, so the affected data files stay in the bqtail trigger bucket.
When any ingestion process fails, it is listed as stalled by BqMonitor (after the grace period); in that case you can rectify the issue and replay the stalled data file or process.
All that said, for a non-recoverable error you may end up with duplication, so that has to be taken care of downstream.
- Dest.Table in a split has no role to play, so it can be the same as the template.
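To make the shared-transform point concrete, here is a minimal Python sketch of how one transform with a CASE expression could be rendered into the SELECT over the transient table. It is not BqTail's actual code: `build_copy_sql`, the table name `temp.dummy_transient`, and the columns `id` and `category` are all hypothetical, and the sketch assumes `category` does not already exist in the transient table.

```python
from typing import Dict


def build_copy_sql(source_table: str, transform: Dict[str, str]) -> str:
    """Render a SELECT over the transient table with transform columns appended.

    Hypothetical helper for illustration only, not part of BqTail.
    """
    extra = ", ".join(f"{expr} AS {column}" for column, expr in transform.items())
    select_list = f"t.*, {extra}" if extra else "t.*"
    return f"SELECT {select_list} FROM `{source_table}` t"


# One transform shared by every split; the CASE expression carries the variation.
transform = {
    "category": "CASE WHEN MOD(t.id, 2) = 0 THEN 'even' ELSE 'odd' END",
}
sql = build_copy_sql("temp.dummy_transient", transform)
print(sql)
```

The same SELECT would feed every split destination, so any per-destination difference has to live inside the expression itself.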
from bqtail.
- The transform is applied when copying data from the transient table: the copy is simply a SELECT FROM the temp table, enriched with the transform and side input expressions. The transform is shared between the splits, so you cannot have a dedicated transform per split. But you can always use a CASE expression to account for variation between destination tables.
See the example execution plan for a split (expect.json) in https://github.com/viant/bqtail/tree/master/stage/load/test/008_table_split
Wonderful, I didn't pay attention to the expect files before.
- BqTail runs each BigQuery job asynchronously, which means that if one fails, one of two things can happen. If the failure is classified as recoverable (503, connection reset by peer), the cloud function returns an error and is automatically retried (the BqTail CF has retry set). If, however, you get an internal server error, the whole ingestion process is restarted.
If ingestion fails, the on-success delete action never runs, so the affected data files stay in the bqtail trigger bucket.
When any ingestion process fails, it is listed as stalled by BqMonitor (after the grace period); in that case you can rectify the issue and replay the stalled data file or process.
All that said, for a non-recoverable error you may end up with duplication, so that has to be taken care of downstream.
Can we add EventID as a new column to the splits?
Then it would be possible to remove the rows inserted by a stalled job, to avoid duplication
(it may require scanning the whole table, but that is not a concern on a flat-rate plan).
And as I understand from expect.json, the split's inserts run sequentially, right?
When we set async=true, will they all start at the same time (after the transient table is loaded)?
from bqtail.
Yes, you can add EventID or any other column with the $EventID expression
and run extra deduplication in the split clause. EventID should stick to the original process; consider adding EventID as a cluster field to manage performance/cost.
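As a rough illustration of the cleanup idea discussed here, the sketch below builds a parameterized DELETE that removes the rows written under one EventID before a stalled process is replayed. The helper name `cleanup_statement`, the column name `event_id`, the table name, and the EventID value are hypothetical; only the `@name` named-parameter syntax is standard BigQuery SQL, and the statement would still need to be submitted through whichever client you use.

```python
from typing import Dict, Tuple


def cleanup_statement(
    table: str, event_id: str, column: str = "event_id"
) -> Tuple[str, Dict[str, str]]:
    """Return a DELETE statement and its named parameters for one EventID.

    Hypothetical helper for illustration; the column is assumed to have been
    populated through a $EventID transform expression.
    """
    sql = f"DELETE FROM `{table}` WHERE {column} = @event_id"
    return sql, {"event_id": event_id}


sql, params = cleanup_statement("mydataset.events_0", "1234")
print(sql, params)
```

Clustering the destination table on the EventID column, as suggested above, should limit how much data such a statement has to scan.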
expect.json is an example of an ingestion process execution plan for the rule and test process.
Async means that the on-success or on-failure section runs only once the preceding task has completed, with success or with an error respectively.
In async mode the BqTail cloud function never waits for BigQuery job completion; it just submits BQ jobs and quits.
BqDispatcher actively checks for completed BigQuery jobs to notify the BqTail process with the post action.
Post actions can be nested without limits.
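The submit-and-quit flow described above can be pictured with a toy Python model. This is only a conceptual sketch, not how BqTail or BqDispatcher are implemented: `Job`, `ToyDispatcher`, the state names, and the action strings are all made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Job:
    job_id: str
    on_success: List[str] = field(default_factory=list)
    on_failure: List[str] = field(default_factory=list)
    state: str = "RUNNING"  # RUNNING, DONE or FAILED


class ToyDispatcher:
    """Toy stand-in for a dispatcher that watches submitted jobs."""

    def __init__(self) -> None:
        self.pending: Dict[str, Job] = {}
        self.log: List[str] = []

    def submit(self, job: Job) -> None:
        """Register the job and return immediately, without waiting for it."""
        self.pending[job.job_id] = job
        self.log.append(f"submitted {job.job_id}")

    def poll(self) -> None:
        """Fire the post actions of every job that has finished since last poll."""
        for job_id in list(self.pending):
            job = self.pending[job_id]
            if job.state == "RUNNING":
                continue
            actions = job.on_success if job.state == "DONE" else job.on_failure
            for action in actions:
                self.log.append(f"{job_id}: {action}")
            del self.pending[job_id]


dispatcher = ToyDispatcher()
load = Job("load", on_success=["copy to split tables", "delete source file"],
           on_failure=["notify"])
dispatcher.submit(load)
dispatcher.poll()      # job is still running, so no post action fires
load.state = "DONE"
dispatcher.poll()      # job completed, so the on-success actions fire in order
print(dispatcher.log)
```

In this model the submitter never blocks; post actions fire only when a later poll sees the job finished.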
from bqtail.
I think Dest.Table can be used as a template for the split destination tables.
from bqtail.