Snowplow sessionization

⛔️ As of January 2023, this package is obsolete and no longer developed. We strongly recommend that you instead use the snowplow/snowplow_web package, maintained by the team at Snowplow.


This dbt package:

  • Rolls up page_view and page_ping events into page views and sessions
  • Performs "user stitching" to tie all historical events associated with an anonymous cookie (domain_userid) to the same user_id

Adapted from Snowplow's web model.

Models

The primary outputs of this package are page views and sessions. There are several intermediate models used to create these two models.

| model | description |
| --- | --- |
| snowplow_page_views | Contains a list of page views with scroll depth, view timing, and optionally useragent and performance data. |
| snowplow_sessions | Contains a rollup of page views indexed by cookie id (domain_sessionid). |

[Graph of the package's models]

Prerequisites

This package takes the Snowplow JavaScript tracker as its foundation. It assumes that all Snowplow events are sent with a web_page context.

Mobile

It is possible to sessionize mobile (app) events by including two predefined contexts with all events: the client session context and the screen view context.

As long as all events are associated with an anonymous user, a session, and a screen/page view, they can be made to fit the same canonical data model as web events fired from the JavaScript tracker. Whether this is the desired outcome will vary significantly; mobile-first analytics often makes different assumptions about user identity, engagement, referral, and inactivity cutoffs.

For specific implementation details, see the Snowplow mobile tracker documentation.

Installation Instructions

Check dbt Hub for the latest installation instructions, or read the docs for more information on installing packages.

Configuration

The variables needed to configure this package are as follows:

| variable | information | required |
| --- | --- | --- |
| snowplow:timezone | Timezone in which analysis takes place; used to calculate local times. | No |
| snowplow:page_ping_frequency | Configured timeout for page pings in the tracker (seconds). Default = 30 | No |
| snowplow:events | Schema and table containing all Snowplow events | Yes |
| snowplow:context:web_page | Schema and table for the web page context | Yes |
| snowplow:context:performance_timing | Schema and table for the performance timing context, or false if none is present | Yes |
| snowplow:context:useragent | Schema and table for the useragent context, or false if none is available | Yes |
| snowplow:pass_through_columns | Additional columns to include in the final models | No |
| snowplow:page_view_lookback_days | Number of days to rescan when merging page views in the same session | Yes |

An example dbt_project.yml configuration:

# dbt_project.yml

...

vars:
  'snowplow:timezone': 'America/New_York'
  'snowplow:page_ping_frequency': 10
  'snowplow:events': "{{ ref('sp_base_events') }}"
  'snowplow:context:web_page': "{{ ref('sp_base_web_page_context') }}"
  'snowplow:context:performance_timing': false
  'snowplow:context:useragent': false
  'snowplow:pass_through_columns': []
  'snowplow:page_view_lookback_days': 1

Database support

Core:

  • Redshift
  • Snowflake
  • BigQuery
  • Postgres

Plugins:

Contributions

Additional contributions to this package are very welcome! Please create issues or open PRs against master. Check out this post on the best workflow for contributing to a package.

Much of tracking can be the Wild West. Snowplow's canonical event model is a major asset in our ability to perform consistent analysis atop predictably structured data, but any detailed implementation is bound to diverge.

To that end, we aim to keep this package rooted in a garden-variety Snowplow web deployment. All PRs should seek to add or improve functionality that is contained within a plurality of Snowplow deployments.

If you need to change implementation-specific details, you have two avenues:

  • Override models from this package with versions that feature your custom logic. Create a model with the same name locally (e.g. snowplow_id_map) and disable the snowplow package's version in dbt_project.yml:
snowplow:
  ...
  identification:
    default:
      snowplow_id_map:
        +enabled: false
  • Fork this repository :)


Issues

Syntax Error in BigQuery snowplow_page_views

When the variable snowplow:pass_through_columns is empty, the Jinja whitespace strippers at lines 170-174 in /macros/adapters/bigquery/pageviews/snowplow_page_views.sql strip too much whitespace, resulting in the following compiled code:

  struct(
        br_renderengine as browser_engine,
        dvce_type as type,
        dvce_ismobile as is_mobile
    ) as devicefrom events
  where event = 'page_view'

Removing one of the strippers will yield the correct syntax:

  struct(
        br_renderengine as browser_engine,
        dvce_type as type,
        dvce_ismobile as is_mobile
    ) as device

  from events
  where event = 'page_view'
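For illustration, here is a minimal reconstruction of the failure mode (hypothetical, not the package's actual source). With whitespace control on both tags around an empty loop, every character between device and from is consumed:

  select
      struct(
          br_renderengine as browser_engine,
          dvce_type as type,
          dvce_ismobile as is_mobile
      ) as device
  {%- for col in var('snowplow:pass_through_columns') -%}
      , {{ col }}
  {%- endfor -%}

  from events
  where event = 'page_view'

When the list is empty, the loop renders nothing, the leading {%- eats the newline after device, and the trailing -%} eats the blank line before from, yielding devicefrom. Changing {%- endfor -%} to {%- endfor %} preserves the newline.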

Performance issue with model snowplow_page_views

I believe v0.7.0 introduced code that dramatically slows down building of the snowplow_page_views model. Merge: 91e42ef
Lines 45-57: https://github.com/fishtown-analytics/snowplow/blob/91e42efaef97a5f52ba9965155b3d50ac6d45f6b/macros/adapters/default/page_views/snowplow_page_views.sql#L45
It was commented as

-- we need to grab all events for any session that has appeared
-- in order to correctly process the session index below

We started collecting Snowplow data about 8 months ago, and there are currently about 73M records in snowplow_web_events. It looks like joining to that table as-is will be painful for any mature project. I suppose this part of the code could be optimized (e.g. taking only events from the last n days). As far as I know, sessions in Snowplow last 30 minutes by default, so 99% of the time checking events from the last several days will be fine. Am I correctly following the idea behind introducing that code?
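A sketch of the kind of bound the reporter is proposing (hypothetical; Redshift-style dateadd, reusing the package's page_view_lookback_days variable; sessions_this_batch is an illustrative name for the sessions found in the new data):

  -- hypothetical sketch: bound the full-table rescan to a recent window
  select e.*
  from snowplow_web_events as e
  join sessions_this_batch as s using (domain_sessionid)
  where e.collector_tstamp >
      dateadd(day, -{{ var('snowplow:page_view_lookback_days', 1) }}, current_timestamp)

Sessions that genuinely span the window boundary would still be truncated, which is presumably why the package rescans everything; a configurable lookback trades that edge case for performance.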

Note undesired behavior from setting enabled: true in root project

From dbt Slack:

A user had the following config in dbt_project.yml:

models:
  ...
  snowplow:
    enabled: true

Setting enabled: true in the root project overrides the dynamic, target-dependent enabled values set from within the snowplow package. Removing that line made this magically work again.
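Schematically, the package's own dbt_project.yml sets those values along these lines (an illustrative sketch, not the package's exact source):

  # inside the snowplow package's dbt_project.yml (illustrative)
  models:
    snowplow:
      page_views:
        default:
          enabled: "{{ (target.type != 'bigquery') | as_bool }}"
        bigquery:
          enabled: "{{ (target.type == 'bigquery') | as_bool }}"

Because root-project configs take precedence over package configs, a blanket enabled: true in the root project wipes out this target-dependence.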

I don't think there's anything we could or should do to "lock" those enabled values within a package, since it's expected (and good) behavior that the user can override locally. What I will do is document that setting a top-level enabled config can have surprising results.

Reminder to self to add a note to the README :)

Use enabled config to implement cross-database support

Right now, Snowplow models are encoded as macros in order to implement cross-database support. Instead, we should experiment with making models specific to each supported adapter. I think broadly, this falls into:

  • default (snowflake/redshift/postgres)
  • bigquery

The models directory might look like:

models:
 - common/
 - default/
 - bigquery/

Every model would need to have a config like:

{{ config(enabled=(var('adapter') == 'default')) }}
or
{{ config(enabled=(var('adapter') == 'bigquery')) }}

I don't think we can configure this at a folder level in dbt_project.yml, but it might be worth exploring if there's some clever way to implement that.

The result of this experimentation might be that we want to keep the adapter macro pattern, but I'm interested in exploring alternative ways of supporting cross-db model implementations in packages!

Deduplication in snowplow_web_page_context drops all events that have a duplicate

If I understand correctly, the deduplication implemented here
https://github.com/fishtown-analytics/snowplow/blob/3795d06f365213ca4930d2447bd1580cb7031557/models/page_views/default/snowplow_web_page_context.sql#L43
drops all events that have a duplicated event_id (named root_id there), instead of keeping only the first one. That seems strange to me; is there a reason?

Also, for bigquery it would be quite easy to implement an incremental model for this since all the timestamps are there, right? Should I try to submit a PR?
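For reference, a keep-first variant would look something like this (a sketch; root_tstamp is assumed available for ordering, as in standard Snowplow context tables):

  select root_id, root_tstamp, id
  from (
      select
          *,
          row_number() over (partition by root_id order by root_tstamp) as n
      from web_page_context
  ) as numbered
  where n = 1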

Checking the br_family field against a value is killing a lot of results with NULL values

I have faced an issue with NULL values in br_family on Redshift. There is a condition in snowplow_page_views.sql at line 307: where a.br_family != 'Robot/Spider'. When br_family is NULL, that condition filters the row out, because NULL can be neither equal nor unequal to anything. As a result, a lot of page views with a null br_family are totally lost in the resulting tables of the dbt model.
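A null-safe version of that condition (sketch):

  where a.br_family != 'Robot/Spider' or a.br_family is null

Equivalently, coalesce(a.br_family, '') != 'Robot/Spider' keeps the null rows while still excluding the bots.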

Package tidy up

Copied from dbt-labs/segment#42

Repo checklist:

  • A license
  • Issue and PR templates (as of #79)
  • A codeowner (as of #79)
  • The hub URL listed on the repo
  • If relevant, a note in the README saying this package is not actively supported
  • Installation instructions, including:
    • Link to packages docs
    • Variables that need to be included in the dbt_project.yml file
    • Listed database support
  • A link to our package contribution guide (as of #79)
  • A require-dbt-version parameter in the dbt_project.yml
  • Models are documented
  • Macros are documented (as of #79)
  • Integration tests
  • Move model logic into model files, rather than entirely in adapter macros
  • A Zap to the #package-releases channel in dbt Slack

"Unrecognized name: root_id" and "No matching signature for function TIMESTAMP"

Hi, I'm using the GCP Snowplow setup. This puts events into a single BigQuery table, with this schema: https://gist.github.com/wub/9d4b9d27f15f8e7c87906e55cb913f68

When I run with this configuration in dbt 0.14.0:

models:
  snowplow:
    vars:
      'snowplow:timezone': 'Pacific/Auckland'
      'snowplow:page_ping_frequency': 10
      'snowplow:events': "dataset_name.events"
      'snowplow:context:web_page': "dataset_name.events"
      'snowplow:context:performance_timing': false
      'snowplow:context:useragent': false
      'snowplow:pass_through_columns': []
    base:
      optional:
        enabled: false
    page_views:
      optional:
        enabled: false

I get this error:

Found 7 models, 22 tests, 0 snapshots, 0 analyses, 163 macros, 0 operations, 1 seed files, 0 sources

21:34:12 | Concurrency: 2 threads (target='dev')
21:34:12 |
21:34:13 | 1 of 5 START table model dataset_name.snowplow_web_page_context [RUN]
21:34:13 | 2 of 5 START incremental model dataset_name.snowplow_id_map.... [RUN]
21:34:17 | 2 of 5 ERROR creating incremental model dataset_name.snowplow_id_map [ERROR in 2.85s]
21:34:19 | 1 of 5 ERROR creating table model dataset_name.snowplow_web_page_context [ERROR in 3.23s]
21:34:19 | 3 of 5 SKIP relation dataset_name.snowplow_page_views.......... [SKIP]
21:34:19 | 4 of 5 SKIP relation dataset_name.snowplow_sessions_tmp........ [SKIP]
21:34:19 | 5 of 5 SKIP relation dataset_name.snowplow_sessions............ [SKIP]
21:34:19 |
21:34:19 | Finished running 3 incremental models, 2 table models in 10.23s.

Completed with 2 errors:

Database Error in model snowplow_id_map (models\identification\snowplow_id_map.sql)
  No matching signature for function TIMESTAMP for argument types: TIMESTAMP. Supported signatures: TIMESTAMP(STRING, [STRING]); TIMESTAMP(DATE, [STRING]); TIMESTAMP(DATETIME, [STRING]) at [101:13]
  compiled SQL at target\compiled\snowplow\identification\snowplow_id_map.sql

Database Error in model snowplow_web_page_context (models\page_views\snowplow_web_page_context.sql)
  Unrecognized name: root_id at [35:9]
  compiled SQL at target\compiled\snowplow\page_views\snowplow_web_page_context.sql

Done. PASS=0 ERROR=2 SKIP=3 TOTAL=5
  • I understand that it's probably dodgy to simply set "dataset_name.events" as the context table
  • Is this related to #52 or #41?
  • Should I override the given models to try and make this work, or am I simply running it wrong?

Any info/direction would be greatly appreciated - thanks!
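For context on both errors: as the reporter notes, the GCP setup writes contexts inline on the events table as repeated records rather than into separate tables keyed by root_id. A sketch of extracting the web page context from the flat table (the column name is an assumption based on the standard BigQuery loader naming):

  select
      event_id,
      contexts_com_snowplowanalytics_snowplow_web_page_1_0_0[safe_offset(0)].id as page_view_id
  from dataset_name.events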

Fails to exclude page views from pingdom

I have encountered a case where page hits from Pingdom are not being excluded.

In this particular case the useragent from Pingdom looks like this:
Mozilla/5.0 (Unknown; Linux x86_64) AppleWebKit/534.34 (KHTML, like Gecko) PingdomTMS/0.8.5 Safari/534.34

The regex is only set up to pick up pingdom and PingdomBot.
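A minimal fix sketch: instead of enumerating bot names, match any casing of "pingdom" anywhere in the useragent:

  where lower(useragent) not like '%pingdom%'

This also catches the PingdomTMS/0.8.5 string above.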

Compilation Error in model snowplow_web_ua_parser_context (models/page_views/optional/snowplow_web_ua_parser_context.sql) at path ['enabled']: None is not of type 'boolean' Code: 10004

When trying to install snowplow using the following packages.yml file:

packages:
  - package: fishtown-analytics/snowplow
    version: 0.12.0

this error is generated:

Installing fishtown-analytics/snowplow@0.12.0
Installed from version 0.12.0

Installing fishtown-analytics/dbt_utils@0.6.6
Installed from version 0.6.6

Compilation Error in model snowplow_web_ua_parser_context (models/page_views/optional/snowplow_web_ua_parser_context.sql)
at path ['enabled']: None is not of type 'boolean' Code: 10004

Incorrect mapping for user agent and performance timing contexts

I have faced a problem with the snowplow model in dbt:

Database Error in model snowplow_web_timing_context (models/page_views/optional/snowplow_web_timing_context.sql)
  column "event_id" specified in USING clause does not exist in left table
  compiled SQL at target/compiled/snowplow/page_views/optional/snowplow_web_timing_context.sql

For the timing context I'm just using the standard context: https://github.com/snowplow/snowplow/wiki/1-General-parameters-for-the-Javascript-tracker#performanceTiming with the schema https://github.com/snowplow/iglu-central/blob/master/sql/org.w3/performance_timing_1.sql
I suspect there is a bug in using event_id as the join key, because the schema uses root_id as the reference to the event. Also, here is an example of processing timing: https://discourse.snowplowanalytics.com/t/measuring-page-load-times-with-the-performance-timing-context-tutorial/100 and it is done with root_id.

There is one more issue: the non-existent field page_view_id. I suspect the non-optional views were amended according to the new field name: https://github.com/fishtown-analytics/snowplow/blob/274c7969d61a552cdc996d00801d7f43cc9b6c03/macros/adapters/default/page_views/snowplow_web_page_context.sql#L35 In the optional views it was not changed, so they still use page_view_id, not just id.
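A sketch of the join the reporter expects, keyed on root_id (field names from the performance_timing schema linked above; the table name is illustrative):

  select e.*, t.dom_loading, t.dom_interactive
  from events as e
  join performance_timing_context as t
      on e.event_id = t.root_id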

Add Spark support

I could have a WIP PR for this very quickly. The changes to the default implementations are actually very simple:

  • Use cast(column as datatype) instead of column::datatype
  • Implement spark__convert_timezone and spark__dateadd
  • Call snowplow.dateadd, which then calls dbt_utils.dateadd or spark__dateadd as appropriate (sketched below)
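A sketch of that dispatch (snowplow.dateadd and spark__dateadd are named in the list above; the branching logic is an assumption):

  {% macro dateadd(datepart, interval, from_date_or_timestamp) %}
      {% if target.type == 'spark' %}
          {{ return(spark__dateadd(datepart, interval, from_date_or_timestamp)) }}
      {% else %}
          {{ return(dbt_utils.dateadd(datepart, interval, from_date_or_timestamp)) }}
      {% endif %}
  {% endmacro %}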

Thoughts:

  • Should Snowplow encode the logic for spark__dateadd, or pull it from (e.g.) spark-utils? I wouldn't want that dependency, however, in the 99% of cases when people aren't running this package on Spark. This will all be easier once we reimplement adapter macros.
  • It's trivial to implement this for Delta, which supports merging on a unique key, but what about standard Spark (insert_overwrite) incrementals? Do we need to add a truncated date column to these models, to pass to partition_by? I don't think we can partition on a column expression

Snowflake web page context is unnecessarily separated and is not incremental

Describe the feature

The default models assume that the web_page_context data will be landed in a separate table from the raw events. On Snowflake that is not the case; the data is inside a variant field. Could some models for Snowflake be skipped or made unnecessary by just flattening the field out in the source events, something like CONTEXTS_COM_SNOWPLOWANALYTICS_SNOWPLOW_WEB_PAGE_1[0]['id'] as PAGE_VIEW_ID? Has that been considered?

The snowplow_web_page_context model contains this comment:

-- This one is a little tougher to make incremental
-- because there's no timestamp field here. We could
-- relocate the event collector_tstamp (by root_id)
-- onto snowplow_base_web_page_context, but that would
-- likely mitigate any performance gains!

I have found it necessary anyway to dedupe the data in my raw events table, because if I don't then the snowplow_sessions_tmp model fails on merge.

Describe alternatives you've considered

To achieve that deduping in my raw events, this was recommended (by helpful people on the Slack channel):

QUALIFY
  row_number() over (partition by EVENT_ID order by COLLECTOR_TSTAMP, DERIVED_TSTAMP) = 1;

That works but is incredibly slow so I have had to implement it as incremental, using the collector_tstamp as a watermark.
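A sketch of that incremental watermark (Snowflake; source table name illustrative):

  {{ config(materialized='incremental', unique_key='event_id') }}

  select *
  from raw_snowplow.events

  {% if is_incremental() %}
  where collector_tstamp > (select max(collector_tstamp) from {{ this }})
  {% endif %}

  qualify row_number() over (partition by event_id order by collector_tstamp, derived_tstamp) = 1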

My concern is with the web_page_context model being separated from the events, not incremental, only to be later recombined into a number of other models.

If you think there is merit to this I can look at trying to put in Snowflake specific changes to optimise the model flow for that system. Otherwise if this has already been considered and ruled out could you let me know why it might be a bad idea?

Additional context

This is Snowflake specific.

Who will this benefit?

Those with huge amounts of ever-growing events from Snowplow pipelines.

snowflake parity

To Do:

  • investigate field definitions for the useragent and performance_timing contexts
  • change the field definitions in snowplow_web_events for min_tstamp and max_tstamp to use timestamp_ntz
  • create a dbt_utils macro for SIMILAR TO (Redshift) vs. RLIKE (Snowflake)
  • remove page_view, users, sessions time dimensions

Map inferred_user_id in snowplow_page_views

Rationale

The Snowplow package has two first-class outputs: snowplow_sessions and snowplow_page_views. Both should include the stitched user identity, inferred_user_id, that is the product of snowplow_id_map.

We frequently find ourselves building reports on top of both tables, and we should be able to perform counts of distinct visitors that agree between them.
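A sketch of the stitching join (column names assumed from the package's outputs; snowplow_id_map ties domain_userid to user_id):

  select
      pv.*,
      coalesce(map.user_id, pv.domain_userid) as inferred_user_id
  from snowplow_page_views as pv
  left join snowplow_id_map as map using (domain_userid)

The same join applied to snowplow_sessions would let distinct-visitor counts agree between the two tables.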

When are the todos in bigquery going to be implemented? Columns missing

Describe the bug

The final output tables for BigQuery are missing the useragent and perf_timing columns. See the view for snowplow_page_views.sql

Screenshots and log output

System information

The contents of your packages.yml file:

packages:
  - package: dbt-labs/dbt_utils
    version: ["1.0.0"]
  - package: calogica/dbt_expectations
    version: [">=0.5.0", "<0.9.0"]
  - package: dbt-labs/snowplow
    version: ["0.15.0"]

Which database are you using dbt with?

  • postgres
  • redshift
  • bigquery
  • snowflake
  • other (specify: ____________)

The output of dbt --version:

Core:
  - installed: 1.3.2
  - latest:    1.4.0 - Update available!

  Your version of dbt-core is out of date!
  You can find instructions for upgrading here:
  https://docs.getdbt.com/docs/installation

Plugins:
  - bigquery: 1.3.0 - Up to date!

The operating system you're using:

The output of python --version:

Testing for late-arriving facts

Include the following event_update cases in the package integration tests:

  • new event, same session_id as earlier event
  • new event, same page_view_id as earlier event (?)
  • same event_id as earlier event (?)

Our approach to these edge cases has been in flux over the past few releases. I think we have a good grasp of what the desired behavior is as of 0.7.1. Let's include these cases in our integration tests to ensure the modeling operates as expected.

Out of range exception

Due to the code regarding timing: https://github.com/fishtown-analytics/snowplow/blob/692938e572e3eca5cb019d2adf877f0d1b7f0147/macros/adapters/default/page_views/snowplow_page_views.sql#L272
there is a possible issue. The value is cast to integer; however, in the source table
https://github.com/fishtown-analytics/snowplow/blob/692938e572e3eca5cb019d2adf877f0d1b7f0147/models/page_views/optional/snowplow_web_timing_context.sql#L110
there is a calculation with bigint values, so the resulting value is also bigint. I get this error:

Value out of range for 4 bytes.
DETAIL:
-----------------------------------------------
error: Value out of range for 4 bytes.
code: 8001
context: Input:2311553464.
query: 1121021
location: funcs_int.hpp:97
process: query0_106_1121021 [pid=12067]
-----------------------------------------------
compiled SQL at target/compiled/snowplow/page_views/snowplow_page_views.sql

I figured out that it is about the field dom_loading_to_interactive_time_in_ms, but it can happen with any field.

I'm going to prepare a PR to fix the casting to int. The problem is that the table is incremental, so where should I place a migration script from the int-based table to a bigint table?
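The cast change itself is a one-liner (sketch; field names from the timing context, exact expression assumed):

  -- before: overflows once the delta exceeds 2^31 - 1 milliseconds
  (dom_interactive - dom_loading)::int as dom_loading_to_interactive_time_in_ms
  -- after
  (dom_interactive - dom_loading)::bigint as dom_loading_to_interactive_time_in_ms

For an existing incremental table, one option is a one-off full refresh (dbt run --full-refresh --models snowplow_page_views) so the table is rebuilt with the wider column.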

Maybe this is an incorrect way of filtering duplicates?

I'm not sure, but maybe this is not the correct way to filter duplicates:
https://github.com/fishtown-analytics/snowplow/blob/274c7969d61a552cdc996d00801d7f43cc9b6c03/models/page_views/optional/snowplow_web_ua_parser_context.sql#L43
Searching for several rows with same id:

duplicated as (

    select id
    from prep
    group by 1
    having count(*) > 1

)

The next step excludes them all:
where page_view_id not in (select id from duplicated)
Let's assume we have 3 rows with the same id: all of them will be dropped. But wouldn't it be more correct to filter out 2 of them and use 1 in the result set?

NoneType has no len() error.

Describe the bug

I'm getting a cryptic error message after removing a module: TypeError: object of type 'NoneType' has no len()

Steps To Reproduce

dbt clean
dbt run
(I'm sure the problem is something to do with my project.yml but I'm not sure what exactly is throwing the error... will report back when I can.)

Expected behavior

An error message that will help me fix my configs/project.yml

Screenshots and log output

2021-01-15 11:19:42.774812 (MainThread): Traceback (most recent call last):
File "/Users/JoshuaLayton-Wood/venvs/dbt/lib/python3.7/site-packages/dbt/main.py", line 124, in main
results, succeeded = handle_and_check(args)
File "/Users/JoshuaLayton-Wood/venvs/dbt/lib/python3.7/site-packages/dbt/main.py", line 202, in handle_and_check
task, res = run_from_args(parsed)
File "/Users/JoshuaLayton-Wood/venvs/dbt/lib/python3.7/site-packages/dbt/main.py", line 255, in run_from_args
results = task.run()
File "/Users/JoshuaLayton-Wood/venvs/dbt/lib/python3.7/site-packages/dbt/task/runnable.py", line 399, in run
self._runtime_initialize()
File "/Users/JoshuaLayton-Wood/venvs/dbt/lib/python3.7/site-packages/dbt/task/runnable.py", line 118, in _runtime_initialize
super()._runtime_initialize()
File "/Users/JoshuaLayton-Wood/venvs/dbt/lib/python3.7/site-packages/dbt/task/runnable.py", line 78, in _runtime_initialize
self.load_manifest()
File "/Users/JoshuaLayton-Wood/venvs/dbt/lib/python3.7/site-packages/dbt/task/runnable.py", line 65, in load_manifest
self.manifest = get_full_manifest(self.config)
File "/Users/JoshuaLayton-Wood/venvs/dbt/lib/python3.7/site-packages/dbt/perf_utils.py", line 31, in get_full_manifest
adapter.connections.set_query_header,
File "/Users/JoshuaLayton-Wood/venvs/dbt/lib/python3.7/site-packages/dbt/parser/manifest.py", line 678, in load_manifest
return ManifestLoader.load_all(config, macro_manifest, macro_hook)
File "/Users/JoshuaLayton-Wood/venvs/dbt/lib/python3.7/site-packages/dbt/parser/manifest.py", line 348, in load_all
loader.load(macro_manifest=macro_manifest)
File "/Users/JoshuaLayton-Wood/venvs/dbt/lib/python3.7/site-packages/dbt/parser/manifest.py", line 204, in load
self.parse_project(project, macro_manifest, old_results)
File "/Users/JoshuaLayton-Wood/venvs/dbt/lib/python3.7/site-packages/dbt/parser/manifest.py", line 177, in parse_project
self.parse_with_cache(path, parser, old_results)
File "/Users/JoshuaLayton-Wood/venvs/dbt/lib/python3.7/site-packages/dbt/parser/manifest.py", line 133, in parse_with_cache
parser.parse_file(block)
File "/Users/JoshuaLayton-Wood/venvs/dbt/lib/python3.7/site-packages/dbt/parser/base.py", line 429, in parse_file
self.parse_node(file_block)
File "/Users/JoshuaLayton-Wood/venvs/dbt/lib/python3.7/site-packages/dbt/parser/base.py", line 402, in parse_node
self.render_update(node, config)
File "/Users/JoshuaLayton-Wood/venvs/dbt/lib/python3.7/site-packages/dbt/parser/base.py", line 377, in render_update
self.render_with_context(node, config)
File "/Users/JoshuaLayton-Wood/venvs/dbt/lib/python3.7/site-packages/dbt/parser/base.py", line 289, in render_with_context
parsed_node.raw_sql, context, parsed_node, capture_macros=True
File "/Users/JoshuaLayton-Wood/venvs/dbt/lib/python3.7/site-packages/dbt/clients/jinja.py", line 577, in get_rendered
return render_template(template, ctx, node)
File "/Users/JoshuaLayton-Wood/venvs/dbt/lib/python3.7/site-packages/dbt/clients/jinja.py", line 528, in render_template
return template.render(ctx)
File "/Users/JoshuaLayton-Wood/venvs/dbt/lib/python3.7/site-packages/jinja2/environment.py", line 1090, in render
self.environment.handle_exception()
File "/Users/JoshuaLayton-Wood/venvs/dbt/lib/python3.7/site-packages/jinja2/environment.py", line 832, in handle_exception
reraise(*rewrite_traceback_stack(source=source))
File "/Users/JoshuaLayton-Wood/venvs/dbt/lib/python3.7/site-packages/jinja2/_compat.py", line 28, in reraise
raise value.with_traceback(tb)
File "", line 111, in top-level template code
TypeError: object of type 'NoneType' has no len()

System information

Which database are you using dbt with?

  • [x] bigquery

The output of dbt --version:

0.18.0

The operating system you're using:
macOS
The output of python --version:
3.7.3

Additional context

I've recently removed the snowplow module and copied some of the models into my own models folder. Reversing these steps gets rid of the problem.

Update snowplow models to accept non-js event trackers as first-class citizens

Currently these open source models rely on functionality only provided by the JS tracker: domain_sessionidx and domain_sessionid. We need to reimplement sessionization to allow other trackers that don't provide that functionality to also participate as first-class citizens in these models (currently they get filtered out).

Support new snowplow data model

Describe the feature

It looks like Snowplow has switched to a new way of processing raw data (https://github.com/snowplow/data-models) and calls the new data model V1. The current implementation of the dbt module does not suit it anymore. The new data model should be supported/adopted by the dbt snowplow module.

Describe alternatives you've considered

A possible workaround: have the dbt module create all the snowplow tables by parsing JSON fields in the EVENTS table, e.g. create the table com_snowplowanalytics_snowplow_web_page_1 by parsing the field EVENTS.contexts_com_snowplowanalytics_snowplow_web_page_1.
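A sketch of that workaround on Snowflake (table and field names from the description; the root_id/root_tstamp mapping is an assumption):

  create table com_snowplowanalytics_snowplow_web_page_1 as
  select
      e.event_id as root_id,
      e.collector_tstamp as root_tstamp,
      ctx.value:id::varchar as id
  from events as e,
      lateral flatten(input => e.contexts_com_snowplowanalytics_snowplow_web_page_1) as ctx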

Who will this benefit?

All users who use dbt and Snowplow at the same time on their projects.
