
This project forked from sungard-labs/dataflow-whitepaper


Accompanying repository for FIS/SunGard's whitepaper on using the Dataflow SDK to transform options market data

License: MIT License


dataflow-whitepaper's Introduction

FIS Google Cloud Platform Whitepapers

About

Welcome to the source repository for various code references found in several FIS Advanced Technology white papers.

In December 2015, FIS Advanced Technology released the white paper "Transforming Options Market Data with the Dataflow SDK" to give software engineers insight into Google Cloud Dataflow's optimal programming model and execution environment.

In August 2016, FIS Advanced Technology produced a sequel to the original 2015 Bigtable whitepaper. It details the introduction of Google Cloud Dataflow and Google Cloud BigQuery to the Market Reconstruction Tool's solution architecture, and takes a deeper look at the material covered in our Google NEXT 2016 talk, "Analyzing 25 billion stock market events in an hour with NoOps on GCP".

September of 2016 saw the release of "Market Reconstruction 2.0: Visualization at Scale", illustrating the team's experience designing the user interface for a securities transaction regulatory database expected to grow to 35 petabytes over 6 years.

White papers

Running the example Dataflow options market data transformation project

To build the project:

mvn clean install

Out of the box, the repository is configured to run a standalone Dataflow job on the local workstation, using input data that ships with the repository (input/zvzzt.input.txt).

The example can be run locally either by executing:

cd bin && ./run

or by calling Maven with:

mvn clean install && mvn -Plocal exec:exec

Running the project on Google Cloud Platform / BigQuery

Once you have activated a Google account on Google Cloud Platform, you will need your Project ID and at least one GCS bucket (for storing deployment artifacts and input files).

Log your shell into GCP:

gcloud auth login

If you do not already have a Google Cloud Storage bucket, you can create one with the following command:

gsutil mb gs://<pick_a_bucket_name>

Copy input specimen to Google Cloud Storage:

gsutil cp input/zvzzt.input.txt gs://<pick_a_bucket_name>

Ensure that a destination dataset exists in your BigQuery account. For example, this command creates a dataset called dataflow_project within BigQuery for your account:

bq mk <dataflow_project>
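Before creating the dataset, it can help to check the name you picked. The following is a minimal sketch, assuming the BigQuery rule that dataset names contain only letters, digits, and underscores. The helper name is_valid_dataset_name is hypothetical and not part of this repository.

```shell
# Hypothetical helper (not part of this repository): succeeds when the name
# is non-empty and uses only letters, digits, and underscores.
is_valid_dataset_name() {
  case "$1" in
    "" | *[!A-Za-z0-9_]*) return 1 ;;
    *) return 0 ;;
  esac
}

# Example check before calling `bq mk`; the name below is illustrative.
name="dataflow_project"
if is_valid_dataset_name "$name"; then
  echo "ok: $name"
else
  echo "invalid dataset name: $name" >&2
fi
```

A name such as dataflow-project would be rejected by this check because of the hyphen.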

Execute the following, substituting your own values for PROJECT and BQDEST in bin/run:

cd bin && ./run gs://<pick_a_bucket_name>/zvzzt.input.txt
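The steps above can be collected into one script. This is only a sketch, meant to be run from the repository root after gcloud auth login: the bucket and dataset names are placeholders, and the step, run_job, and gcp_steps helpers and the DRY_RUN switch are hypothetical conveniences, not part of the repository. With DRY_RUN=1 (the default here) the commands are printed rather than executed, so the sequence can be reviewed first.

```shell
# Placeholder values (assumptions); replace with your own bucket and dataset.
BUCKET="my-example-bucket"
DATASET="dataflow_project"
INPUT="input/zvzzt.input.txt"

# DRY_RUN=1 prints each command instead of running it.
DRY_RUN="${DRY_RUN:-1}"
step() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Mirrors `cd bin && ./run <gcs path>` from the instructions above.
run_job() {
  (cd bin && ./run "$1")
}

gcp_steps() {
  step gsutil mb "gs://$BUCKET"               # create the bucket
  step gsutil cp "$INPUT" "gs://$BUCKET"      # upload the input specimen
  step bq mk "$DATASET"                       # create the destination dataset
  step run_job "gs://$BUCKET/zvzzt.input.txt" # launch the job
}

gcp_steps
```

Setting DRY_RUN=0 executes the same commands for real. PROJECT and BQDEST still need to be set in bin/run beforehand.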

The Pipeline will automatically create the table if it does not exist, although it cannot create the initial dataset.

To execute the job on Google Cloud Platform using Maven, edit the associated values for your project ID and account within pom.xml and then run:

mvn clean install && mvn -Pgcp exec:exec

Remember that a job running on Google Cloud Platform cannot use local files; files must be read from and written to GCS (gs:// paths).
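A small guard can catch a local path before a job is submitted. This is a sketch only; is_gcs_path is a hypothetical helper, not something provided by the repository or by the Google Cloud SDK, and the bucket name is a placeholder.

```shell
# Hypothetical helper: succeeds only for paths of the form gs://<something>.
is_gcs_path() {
  case "$1" in
    gs://?*) return 0 ;;
    *) return 1 ;;
  esac
}

# Classify one GCS path and one local path.
for path in "gs://my-example-bucket/zvzzt.input.txt" "input/zvzzt.input.txt"; do
  if is_gcs_path "$path"; then
    echo "GCS path: $path"
  else
    echo "local path (not usable on GCP): $path"
  fi
done
```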

Errata

Please open a GitHub issue for any discrepancies or inconsistencies you discover; we will correct them and publish the corrections here.

See Also

License

MIT. See license text in LICENSE.

Copyrights and Names

Copyright © FIS 2016. Licensed under the MIT license.

dataflow-whitepaper's People

Contributors

salsferrazza

