bqtail's People

Contributors

adranwit, shovelwagon


bqtail's Issues

Q: How do you handle split operations in case of failure?

I tried to follow the behaviour from the code, but I got lost.

I use a split and transform config to append incoming data across different final tables.

  1. When are the Transform operations applied? As I understand it, they run when inserting into the target table from the transient table, so I can't vary the transforms per split table, right? Also, do all the tables have to have the same schema?

  2. When most of the split operations succeed but one of them fails for some reason (a temporary internal error, or a permanent one such as a column type mismatch or a transform error), what is the expected behaviour?

  3. When using a transient table, what is the role of Dest.Table?

When:
  Prefix: /data/clickstream/
  Suffix: .json.gz
Async: true
Dest:
  Table: bqtail.dummy
  Transient:
    Dataset: temp
  Transform:
    event: lower(event)
    userid: cast(replace(userid, '"', '') as int64)
    price: safe_cast(price as float64)
    contentid: safe_cast(contentid as int64)
  Schema:
    Template: clickstream._template_table
    Split:
      ClusterColumns:
        - event
      Mapping:
        - When: event = 'pageview'
          Then: clickstream.pageview
        - When: event = 'addtobasket'
          Then: clickstream.addtobasket
        - When: event not in ('pageview','addtobasket')
          Then: clickstream.other_events
OnSuccess:
  - Action: delete
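To make my mental model of the split concrete: I read the When/Then mapping as an ordered list of predicates that routes each row to a destination table. The Go sketch below is only an illustration of that reading, not bqtail's actual routing code:

```go
package main

import "fmt"

// routeEvent mirrors the When/Then mapping from the rule above:
// each row's event value is matched against the predicates in order
// and routed to the corresponding destination table.
// Illustrative sketch only, not bqtail's implementation.
func routeEvent(event string) string {
	switch event {
	case "pageview":
		return "clickstream.pageview"
	case "addtobasket":
		return "clickstream.addtobasket"
	default: // event not in ('pageview','addtobasket')
		return "clickstream.other_events"
	}
}

func main() {
	for _, e := range []string{"pageview", "addtobasket", "search"} {
		fmt.Printf("%s -> %s\n", e, routeEvent(e))
	}
}
```

Under this reading, a single Transform map is applied once (transient table → targets), which is why question 1 asks whether transforms can differ per split table.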

Q: How can I handle schema-less JSON loads?

Hi all,
currently I am loading schema-less JSON files with simple custom code:

  • try to load the JSON files;
  • parse the missing columns from the error (if any) and add them with type STRING;
  • retry the load job (until there are no missing columns left).

Can I implement this flow (or a better one) with bqtail?

PS: I'd rather not use the autodetect feature of BQ load jobs, because it only looks at the first 100 rows, which is useless for my data most of the time.

Best regards,
kursat

httpRequest: { status: 500 }

We frequently get errors from the Cloud Scheduler job that invokes the bq-dispatch Cloud Function. Because we cannot see verbose logs (even with logging = TRUE set), Google Support cannot diagnose the errors.

{
  "insertId": "nl9h25g10cr1ii",
  "jsonPayload": {
    "url": "https://europe-west3-******.cloudfunctions.net/BqDispatch-2",
    "@type": "type.googleapis.com/google.cloud.scheduler.logging.AttemptFinished",
    "status": "INTERNAL",
    "targetType": "HTTP",
    "jobName": "projects//locations/europe-west3/jobs/BqDispatch-2"
  },
  "httpRequest": {
    "status": 500
  },
  "resource": {
    "type": "cloud_scheduler_job",
    "labels": {
      "project_id": "******",
      "location": "europe-west3",
      "job_id": "BqDispatch-2"
    }
  },
  "timestamp": "2023-04-01T04:28:58.419380842Z",
  "severity": "ERROR",
  "logName": "projects/******/logs/cloudscheduler.googleapis.com%2Fexecutions",
  "receiveTimestamp": "2023-04-01T04:28:58.419380842Z"
}

I also found an error message in the Cloud Function log:

fatal error: concurrent map iteration and map write
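That fatal error is the Go runtime detecting a map being iterated in one goroutine while another goroutine writes to it without synchronization. The sketch below is generic Go showing the pattern and the usual mutex fix, not bqtail's actual code:

```go
package main

import (
	"fmt"
	"sync"
)

// counters guards a plain map with a mutex. Ranging over a map while
// another goroutine writes to it without such a lock is exactly what
// triggers "fatal error: concurrent map iteration and map write".
type counters struct {
	mu sync.Mutex
	m  map[string]int
}

func (c *counters) inc(key string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.m[key]++
}

func (c *counters) total() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	sum := 0
	for _, v := range c.m { // iteration is safe under the same lock
		sum += v
	}
	return sum
}

func main() {
	c := &counters{m: map[string]int{}}
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			c.inc(fmt.Sprintf("job-%d", i%5)) // concurrent writes
			_ = c.total()                     // concurrent iteration
		}(i)
	}
	wg.Wait()
	fmt.Println(c.total()) // 100
}
```

Without the mutex (or a `sync.Map`), the same program would crash intermittently, which matches the sporadic 500s the scheduler reports.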

Sometimes BQ load jobs fail with a File Not Found error

Prerequisites:

  • Version: latest
  • Trigger bucket: multi-regional
  • Other buckets: regional
  • No limit on max CF instances
  • 256 MB on the BqTail CF

  1. After checking the job runs I saw that there were two jobs in the same batch; the first succeeded, and the second, which ran right after it, failed.
  2. The first batch had been moved by the dispatcher with an extra 9-second delay.
  3. There was a 23-second delay between the upload and the bqtail event trigger.

It may be related to the CF autoscaler, because after several hours of running there are no more such errors.

Can't change datatype with transforms

We have a string field in our JSON files whose values are integers 99% of the time.
We want to load it as STRING into the transient table, then use a safe_cast transform to persist the field to the final table as INT64.

tv072: safe_cast(t.tv072 as int64)
tv050: safe_cast(t.tv050 as float64)

(These fields are STRING in the template table.)

"Errors":[{"Message":"Invalid schema update. Field tv072 has changed type from STRING to INTEGER","Reason":"invalid"}]

I think this happens because both the transient and the target table use the Schema.Template table as their template. How can we fix this?
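For context on why safe_cast is the right tool for the ~1% of non-integer values: BigQuery's SAFE_CAST returns NULL instead of raising an error when a conversion fails. The Go sketch below mimics that behaviour for string-to-INT64, purely as an illustration:

```go
package main

import (
	"fmt"
	"strconv"
)

// safeCastInt64 mimics BigQuery's SAFE_CAST(x AS INT64) on a string:
// it returns the parsed value, or nil (NULL) when the cast fails,
// instead of aborting the whole load the way a plain CAST would.
func safeCastInt64(s string) *int64 {
	v, err := strconv.ParseInt(s, 10, 64)
	if err != nil {
		return nil // SAFE_CAST yields NULL on failure
	}
	return &v
}

func main() {
	for _, s := range []string{"42", "abc", "-7"} {
		if v := safeCastInt64(s); v != nil {
			fmt.Printf("%q -> %d\n", s, *v)
		} else {
			fmt.Printf("%q -> NULL\n", s)
		}
	}
}
```

The schema error in the issue, though, comes from the table definitions rather than the cast itself: if the transient and final tables share one template, the transformed INT64 column collides with the template's STRING column.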
