viant / bqtail
BigQuery Google Storage Based Data Loader
License: Apache License 2.0
Prerequisites:
Version: latest
Trigger bucket: multi-regional
Other buckets: regional
No limit on max instances of the CF
256 MB memory on the BqTail CF
It may be related to the CF autoscaler, because after several hours of running these errors no longer occur.
Hi all,
Currently I am loading schemaless JSON files with some simple code:
Can I implement this flow (or a better one) with bqtail?
PS: I prefer not to use the type autodetection feature of BigQuery load jobs, because it only looks at the first 100 rows, and most of the time that is useless for my data.
best regards,
kursat
The higher the test coverage, the better your project will be ranked.
Check this:
[Run Go Unit Tests made easy](https://medium.com/rungo/unit-testing-made-easy-in-go-25077669318)
Learning Go with Tests
We frequently get an error from the bq-dispatch scheduler job, which invokes the bq-dispatch cloud function. Because we cannot see verbose logs (even after setting logging = TRUE), Google Support cannot diagnose the errors.
{
  "insertId": "nl9h25g10cr1ii",
  "jsonPayload": {
    "url": "https://europe-west3-******.cloudfunctions.net/BqDispatch-2",
    "@type": "type.googleapis.com/google.cloud.scheduler.logging.AttemptFinished",
    "status": "INTERNAL",
    "targetType": "HTTP",
    "jobName": "projects//locations/europe-west3/jobs/BqDispatch-2"
  },
  "httpRequest": {
    "status": 500
  },
  "resource": {
    "type": "cloud_scheduler_job",
    "labels": {
      "project_id": "",
      "location": "europe-west3",
      "job_id": "BqDispatch-2"
    }
  },
  "timestamp": "2023-04-01T04:28:58.419380842Z",
  "severity": "ERROR",
  "logName": "projects/******/logs/cloudscheduler.googleapis.com%2Fexecutions",
  "receiveTimestamp": "2023-04-01T04:28:58.419380842Z"
}
I also found an error message in the cloud function log:
fatal error: concurrent map iteration and map write
I tried to follow the behaviour from the code, but I got lost.
I use the Split and Transform config to append incoming data across different final tables, and I have a few questions:
When are the Transform operations applied? As I understand it, they run when inserting into the target table from the transient table, so I can't modify the transforms per split table, right? Also, do all tables have to have the same schema?
When most of the Split operations succeed but one of them fails for some reason (a temporary internal error, or a permanent error such as a column type mismatch or a Transform error), what is the expected behaviour?
When using a transient table, what is the role of Dest.Table?
When:
  Prefix: /data/clickstream/
  Suffix: .json.gz
Async: true
Dest:
  Table: bqtail.dummy
  Transient:
    Dataset: temp
  Transform:
    event: lower(event)
    userid: cast(replace(userid, '"', '') as int64)
    price: safe_cast(price as float64)
    contentid: safe_cast(contentid as int64)
  Schema:
    Template: clickstream._template_table
    Split:
      ClusterColumns:
        - event
      Mapping:
        - When: event = 'pageview'
          Then: clickstream.pageview
        - When: event = 'addtobasket'
          Then: clickstream.addtobasket
        - When: event not in ('pageview','addtobasket')
          Then: clickstream.other_events
OnSuccess:
  - Action: delete
We have a string field in our JSON files whose values are integers about 99% of the time.
We want to load it as a string into the transient table, then use a safe_cast transform to persist the field to the final table as int64.
tv072: safe_cast(t.tv072 as int64)
tv050: safe_cast(t.tv050 as float64)
(These fields are STRING in the template table.)
"Errors":[{"Message":"Invalid schema update. Field tv072 has changed type from STRING to INTEGER","Reason":"invalid"}]
I think this happens because both the transient table and the target table use the Schema.Template table as their template. How can we fix this?