liz-acosta / snowplow

This project forked from snowplow/snowplow

Cloud-native web, mobile and event analytics, running on AWS and GCP

Home Page: http://snowplowanalytics.com

CSS 0.11% HTML 3.50% JavaScript 7.35% Scala 51.63% Thrift 3.55% PLpgSQL 26.07% Python 7.79%

Snowplow

Snowplow is an enterprise-strength marketing and product analytics platform. It does three things:

  1. Identifies your users, and tracks the way they engage with your website or application
  2. Stores your users' behavioural data in a scalable "event data warehouse" you control: Amazon Redshift, Google BigQuery, Snowflake or Elasticsearch
  3. Lets you analyze that behavioural data with a wide range of tools, including big data tools (e.g. Spark) via EMR or more traditional tools such as Looker, Mode, Superset and Re:dash

To find out more, please check out the Snowplow website and the docs website.

If you wish to have everything set up and managed for you, take a look at our commercial offering, Snowplow Insights.

Snowplow technology 101

The repository structure follows the conceptual architecture of Snowplow, which consists of six loosely-coupled sub-systems connected by five standardized data protocols/formats:

[Figure: Snowplow architecture diagram]

To briefly explain these six sub-systems:

  • Trackers fire Snowplow events. Currently we have 12 trackers, covering web, mobile, desktop, server and IoT
  • Collector receives Snowplow events from trackers. Currently we have one official collector implementation with different sinks: Apache Kafka, Amazon Kinesis, NSQ
  • Enrich cleans up the raw Snowplow events, enriches them and puts them into storage. Currently we have several implementations, built for different environments (GCP, AWS, Apache Kafka) and one core library
  • Storage is where the Snowplow events live. Currently we store the Snowplow events in a flat-file structure on S3, and in the Redshift, Postgres, Snowflake and BigQuery databases
  • Data modeling is where event-level data is joined with other data sets and aggregated into smaller data sets, and business logic is applied. This produces a clean set of tables which make it easier to perform analysis on the data. We have data models for Redshift and Looker
  • Analytics are performed on the Snowplow events or on the aggregate tables.
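
As a rough illustration of how these sub-systems hand data to one another, here is a minimal Python sketch. Every name in it (`RawEvent`, `EnrichedEvent`, `track`, `Collector`, `enrich`, `model_page_views`) is hypothetical and is not part of any Snowplow component. The real pipeline uses the Snowplow tracker protocol, stream sinks such as Kafka or Kinesis, and warehouse tables rather than in-memory lists.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional
from urllib.parse import urlparse


@dataclass
class RawEvent:
    """Hypothetical tracker payload (not the real Snowplow tracker protocol)."""
    user_id: str
    event_type: str
    page_url: Optional[str] = None


@dataclass
class EnrichedEvent:
    """Hypothetical cleaned-up event with one derived field."""
    user_id: str
    event_type: str
    page_host: str


def track(user_id: str, event_type: str, page_url: Optional[str] = None) -> RawEvent:
    """Tracker: fire an event."""
    return RawEvent(user_id, event_type, page_url)


class Collector:
    """Collector: receive events from trackers and append them to a sink."""

    def __init__(self) -> None:
        self.sink: List[RawEvent] = []  # stands in for Kafka, Kinesis or NSQ

    def receive(self, event: RawEvent) -> None:
        self.sink.append(event)


def enrich(raw: RawEvent) -> Optional[EnrichedEvent]:
    """Enrich: drop malformed events and derive extra fields."""
    if not raw.user_id or not raw.page_url:
        return None
    return EnrichedEvent(raw.user_id, raw.event_type, urlparse(raw.page_url).netloc)


def model_page_views(storage: List[EnrichedEvent]) -> Dict[str, int]:
    """Data modeling: aggregate event-level rows into a smaller per-user table."""
    counts: Dict[str, int] = defaultdict(int)
    for event in storage:
        if event.event_type == "page_view":
            counts[event.user_id] += 1
    return dict(counts)


collector = Collector()
collector.receive(track("u1", "page_view", "https://example.com/a"))
collector.receive(track("u1", "page_view", "https://example.com/b"))
collector.receive(track("u2", "page_view", "https://example.com/a"))
collector.receive(track("", "page_view", "https://example.com/a"))  # malformed

# Storage: keep only the events that survived enrichment.
storage = [e for e in map(enrich, collector.sink) if e is not None]

# Analytics would then run on `storage` or on the aggregate table.
page_views = model_page_views(storage)
print(page_views)  # {'u1': 2, 'u2': 1}
```

The point of the sketch is the loose coupling: each stage only depends on the shape of the data it receives, which is why the real components can live in separate repositories.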

For more information on the current Snowplow architecture, please see the Technical architecture.

About this repository

This repository used to be an umbrella repository for all loosely-coupled Snowplow components. However, since June 2020 all components have been extracted into their own dedicated repositories (more info here), and this repository now serves as an entry point for OSS users and as a historical artifact.

Components that have been extracted to their own repo are still here as git submodules.

Please report issues and open PRs directly in the repo of the relevant component:

Trackers

Loaders

Testing

Parsing enriched event

Need help?

We want to make it super-easy for Snowplow users and contributors to talk to us and connect with each other, to share ideas, solve problems and help make Snowplow awesome. Here are the main channels we're currently running; we'd love to hear from you on any of them:

Discourse

This is for all Snowplow users: engineers setting up Snowplow, data modelers structuring the data and data consumers building insights. You can find guides, recipes, questions and answers from Snowplow users, including the Snowplow team.

We welcome all questions and contributions!

Twitter

@SnowplowData for official news or @SnowplowLabs for engineering-heavy conversations.

GitHub

If you spot a bug, please raise an issue in the GitHub repo of the relevant component. Likewise, if you have developed a cool new feature or an improvement, please open a pull request; we'll be glad to integrate it into the codebase!

If you want to brainstorm a potential new feature, then Discourse is probably a better place to start.

Email

[email protected]

If you want to talk directly to us (e.g. about a commercially sensitive issue), email is the easiest way.

Copyright and license

Snowplow is copyright 2012-2020 Snowplow Analytics Ltd.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this software except in compliance with the License.

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
