sfguide-twitter-auto-ingest's Introduction

Snowflake Guide: Auto-Ingest Twitter Data into Snowflake

➡️ Complete this end-to-end tutorial on guides.snowflake.com

This demo shows how to auto-ingest streaming and event-driven data from Twitter into Snowflake using Snowpipe. By completing it, you will have built a Docker image containing a Python application that listens for and saves live tweets; those tweets are uploaded to AWS S3, which serves as the stage from which Snowpipe loads them into Snowflake.

The lessons learned in this demo can be applied to any streaming or event-driven data source.

The core topics covered in this demo include:

  1. Data Loading: Load Twitter streaming data in an event-driven, real-time fashion into Snowflake with Snowpipe
  2. Semi-structured data: Query semi-structured data (JSON) without needing transformations
  3. Secure Views: Create a Secure View to allow data analysts to query the data
  4. Snowpipe: Overview and configuration
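In Snowflake, raw JSON can be stored in a VARIANT column and queried directly with path notation such as raw:user.screen_name (raw being a hypothetical column name). This is what allows querying without upfront transformations. The Python sketch below is only an analogy for that idea and not how Snowflake evaluates paths. A hypothetical get_path helper walks a dotted path through a parsed tweet and returns None for missing keys. The sample tweet is illustrative and trimmed to a few fields.

```python
import json
from typing import Any, Optional


def get_path(doc: Any, path: str) -> Optional[Any]:
    """Follow a dotted path such as "user.screen_name" through nested dicts.

    Returns None as soon as a step is missing, loosely like a Snowflake
    path expression returning NULL for an absent key.
    """
    current = doc
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


# Illustrative tweet payload, trimmed to a few fields.
raw = json.loads('{"id": 1, "text": "hello", "user": {"screen_name": "example_user"}}')

print(get_path(raw, "user.screen_name"))  # prints: example_user
print(get_path(raw, "user.location"))     # prints: None
```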

Architecture:

[Figure: Twitter to Snowflake Auto-Ingest Architecture]
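The listener side of this architecture can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the repository's code. Tweets are buffered in memory and written as newline-delimited JSON, a layout Snowflake's JSON file format can load as one object per row. Each file is named with a hypothetical twitter/<keyword>/ key prefix. The S3 upload itself is only indicated in a comment, so the sketch runs offline.

```python
import json
import uuid
from datetime import datetime, timezone
from typing import Iterable, List, Optional, Tuple


def to_ndjson(tweets: Iterable[dict]) -> str:
    """Serialize tweets as newline-delimited JSON: one compact object per line."""
    return "".join(json.dumps(t, separators=(",", ":")) + "\n" for t in tweets)


def s3_key(keyword: str, when: datetime) -> str:
    """Build a hypothetical object key under a per-keyword prefix.

    A timestamp plus a random suffix keeps file names unique, so one batch
    never overwrites another.
    """
    return f"twitter/{keyword}/{when:%Y%m%d_%H%M%S}_{uuid.uuid4().hex[:8]}.json"


class TweetBuffer:
    """Collects tweets and returns an (S3 key, file body) pair per full batch."""

    def __init__(self, keyword: str, batch_size: int = 100) -> None:
        self.keyword = keyword
        self.batch_size = batch_size
        self._pending: List[dict] = []

    def add(self, tweet: dict) -> Optional[Tuple[str, str]]:
        self._pending.append(tweet)
        if len(self._pending) < self.batch_size:
            return None  # keep buffering until the batch is full
        batch, self._pending = self._pending, []
        key = s3_key(self.keyword, datetime.now(timezone.utc))
        # The real pipeline would upload the body here, for example with
        # boto3.client("s3").put_object(Bucket=bucket, Key=key, Body=body).
        return key, to_ndjson(batch)


buffer = TweetBuffer("wednesdaymotivation", batch_size=2)
buffer.add({"id": 1, "text": "first"})
print(buffer.add({"id": 2, "text": "second"}))
```

Batching is a design choice. Fewer, larger files generally mean fewer load operations for Snowpipe, at the cost of slightly higher latency before tweets appear in the table.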

INSTRUCTIONS

PREREQUISITES

You will need:

SETUP SCRIPT

1. Download the repository

Clone this repository locally:

git clone https://github.com/Snowflake-Labs/demo-twitter-auto-ingest

Navigate to the repository you just cloned:

cd demo-twitter-auto-ingest

2. Add your AWS and Twitter keys

Use your text editor of choice to edit the following files:

  • Dockerfile (lines 9 to 16)
  • 0_setup_twitter_snowpipe.sql (lines 23 to 25)

As you will see in the files, you will also need to specify your AWS S3 bucket (where the data will be stored) and a default search keyword.
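Suppose the Dockerfile passes these values to the application as environment variables. That is an assumption; check lines 9 to 16 for how the repository actually does it. In that case, a Python entry point could collect and validate them up front. In the sketch below, AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are the standard names boto3 reads. Every other variable name, including DEFAULT_KEYWORD, is hypothetical.

```python
import os
from typing import Dict, Mapping, Optional

# AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY are the standard names boto3 reads;
# the remaining names are hypothetical and should match your Dockerfile.
REQUIRED = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_S3_BUCKET",
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_TOKEN_SECRET",
)


def load_settings(cli_keyword: Optional[str] = None,
                  env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Collect required settings, failing fast if any are missing or empty.

    The search keyword comes from the command line when given, otherwise
    from the hypothetical DEFAULT_KEYWORD variable.
    """
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("missing settings: " + ", ".join(missing))
    settings = {name: env[name] for name in REQUIRED}
    keyword = cli_keyword or env.get("DEFAULT_KEYWORD")
    if not keyword:
        raise RuntimeError("no search keyword given")
    settings["KEYWORD"] = keyword
    return settings
```

A script could call load_settings(sys.argv[1] if len(sys.argv) > 1 else None). Assuming the image's entry point forwards the docker run argument to the script, a keyword given on the command line then overrides the default. Failing fast at startup surfaces a missing key immediately, rather than as an authentication error once the stream starts.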

3. Build the image

While in your demo-twitter-auto-ingest directory, run:

docker build . -t snowflake-twitter

This command builds an image from the Dockerfile in the current directory and tags it as snowflake-twitter.

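With Docker's classic builder, the last two lines of the output should look similar to the following (BuildKit, the default builder in recent Docker releases, reports progress in a different format):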

Successfully built c1c0b7262436
Successfully tagged snowflake-twitter:latest

Note: In the above example, c1c0b7262436 is the image ID; yours will likely be different.

4. Run the image

$ docker run --name <YOUR_CONTAINER_NAME> snowflake-twitter:latest <YOUR_TWITTER_KEYWORD>

Example (searching for #wednesdaymotivation):

$ docker run --name twitter-wednesdaymotivation snowflake-twitter:latest wednesdaymotivation

At this point you should see tweets coming in (each "." represents two tweets).
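The "one dot per two tweets" convention can be illustrated with a small counter. The hypothetical DotProgress class below is a sketch of the idea, not the repository's listener. It writes a "." on every second call to on_tweet.

```python
import sys
from typing import TextIO


class DotProgress:
    """Prints one '.' for every `per_dot` tweets (two by default)."""

    def __init__(self, per_dot: int = 2, out: TextIO = sys.stdout) -> None:
        self.per_dot = per_dot
        self.out = out
        self.count = 0

    def on_tweet(self) -> None:
        self.count += 1
        if self.count % self.per_dot == 0:
            self.out.write(".")
            self.out.flush()  # show progress immediately instead of buffering
```

The explicit flush matters inside a container. When stdout is not attached to a terminal, Python buffers its output, so the dots might otherwise appear in bursts.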

5. Configure Snowpipe in Snowflake

  • Log into your Snowflake demo account and load the 0_setup_twitter_snowpipe.sql script (edited in step 2).
  • Execute the script one statement at a time.
  • Make sure to configure event notifications in AWS S3 as described here.
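Auto-ingest works by having S3 send event notifications to the SQS queue that Snowflake creates for the pipe. The queue's ARN appears in the notification_channel column of SHOW PIPES. If you prefer to script this step instead of using the S3 console, the sketch below builds the configuration in the shape boto3's put_bucket_notification_configuration expects. The bucket, prefix, ARN, and Id values are placeholders. That call replaces the bucket's entire existing notification configuration, so merge in any existing entries first.

```python
from typing import Any, Dict


def snowpipe_notification_config(queue_arn: str, prefix: str = "") -> Dict[str, Any]:
    """Build an S3 notification config that sends new-object events to Snowpipe's SQS queue."""
    queue_config: Dict[str, Any] = {
        "Id": "snowpipe-auto-ingest",  # hypothetical identifier
        "QueueArn": queue_arn,
        "Events": ["s3:ObjectCreated:*"],
    }
    if prefix:
        # Only notify for objects under the path the stage points at.
        queue_config["Filter"] = {
            "Key": {"FilterRules": [{"Name": "prefix", "Value": prefix}]}
        }
    return {"QueueConfigurations": [queue_config]}


# Applying it needs AWS credentials, so it is left commented out:
# import boto3
# boto3.client("s3").put_bucket_notification_configuration(
#     Bucket="<YOUR_BUCKET>",
#     NotificationConfiguration=snowpipe_notification_config("<notification_channel ARN>", "twitter/"),
# )
```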

6. Stop your container

Once you have finished with the setup, stop your container so that you don't exceed your Twitter API rate limits.

Because the container runs in the foreground of your current Terminal tab, open a new tab (you can use the shortcut ⌘T) and run the following command:

docker stop <YOUR_CONTAINER_NAME>

Note: the container has a "safety" timeout of 15 minutes.
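A timeout of this kind can be sketched as a loop that checks a monotonic deadline between events. The run_with_timeout helper below is hypothetical, not the container's actual code. Here next_event stands in for whatever work the listener does per tweet.

```python
import time
from typing import Callable


def run_with_timeout(next_event: Callable[[], None], timeout_s: float = 15 * 60) -> int:
    """Call `next_event` repeatedly until `timeout_s` seconds have elapsed.

    Returns the number of events handled. time.monotonic() is used so that
    wall-clock adjustments cannot extend or shorten the run.
    """
    deadline = time.monotonic() + timeout_s
    handled = 0
    while time.monotonic() < deadline:
        next_event()
        handled += 1
    return handled
```

This only works if next_event returns regularly. A blocking stream never hands control back to the loop. In that case you would stop it from another thread instead, for example with a threading.Timer that calls the stream's disconnect method.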

sfguide-twitter-auto-ingest's People

Contributors

jdanielmyers

sfguide-twitter-auto-ingest's Issues

Update dockerfile to use specific python and tweepy

I was getting errors while running the Docker image. I fixed them by pinning specific Python and tweepy versions.

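Top of Dockerfile:

FROM python:3.6

RUN pip install -U pip
RUN pip install --no-cache-dir boto3
RUN pip install --no-cache-dir awscli
# Pin tweepy 3.x: the demo uses StreamListener, which tweepy 4.0 removed
RUN pip install --no-cache-dir tweepy==3.7
# datetime ships with the Python standard library, so this install is not needed for it
RUN pip install --no-cache-dir datetime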

Deprecated API in tweepy

  1. tweepy.StreamListener has been replaced with tweepy.StreamingClient, which requires a bearer_token argument.
  2. tweepy.Stream now takes the individual credentials (consumer key and secret, access token and secret) instead of an auth object. However, the listener isn't used, causing a 404 error.
  3. myStream.filter(track=[keyword]) #, is_async=True): filter() no longer accepts is_async (tweepy 4.0 renamed it to threaded).
