ibm-cloud-streaming-retail-demo / kafka-producer-for-simulated-data

Producer for sending simulated data into Kafka

License: Apache License 2.0

Introduction

The purpose of this project is to take the simulated data set created by the dataset-generator and continuously produce that data to Kafka.

This project is a Cloud Foundry application.
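As a rough illustration of what "continuously produce" could look like, the sketch below replays a gzip-compressed dataset and hands each record to a callback. It assumes the file holds one JSON object per line, which this README does not confirm; `iter_records`, `produce_passes`, and the `send` callback are hypothetical names, not part of this project. A real producer would loop indefinitely and send to Kafka; the loop here is bounded so the example terminates.

```python
import gzip
import io
import json


def iter_records(gz_bytes):
    """Yield one dict per non-empty line of gzip-compressed JSON lines.

    Assumption: one JSON object per line (not confirmed by the project).
    """
    with gzip.open(io.BytesIO(gz_bytes), mode="rt", encoding="utf-8") as lines:
        for line in lines:
            line = line.strip()
            if line:
                yield json.loads(line)


def produce_passes(gz_bytes, send, passes):
    """Replay the dataset `passes` times, passing each record to `send`."""
    sent = 0
    for _ in range(passes):
        for record in iter_records(gz_bytes):
            send(record)  # in a real producer this would publish to Kafka
            sent += 1
    return sent


# Tiny in-memory stand-in for OnlineRetail.json.gz (made-up content)
sample = gzip.compress(b'{"InvoiceNo": 1}\n{"InvoiceNo": 2}\n')
collected = []
total = produce_passes(sample, collected.append, passes=2)
```

Passing the send step in as a callback keeps the replay logic testable without a broker.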

Dependencies

  • This project depends on the datasets OnlineRetail.json.gz and OnlineRetailCustomers.csv.gz, which are output by the dataset-generator.

Prerequisites

Deploy

  • If you haven't already done so, create two Message Hub topics called transactions_load and customers_load in your Message Hub instance. The default topic creation settings should be ok to start with.
# clone this project
git clone https://github.com/ibm-cloud-streaming-retail-demo/kafka-producer-for-simulated-data
cd kafka-producer-for-simulated-data

# change the applications.name and applications.route values in the manifest.yml to values that
# should be unique to you

# copy `OnlineRetail.json.gz` and `OnlineRetailCustomers.csv.gz` to this folder
cp ../dataset-generator/OnlineRetail.json.gz .
cp ../dataset-generator/OnlineRetailCustomers.csv.gz .

# deploy this application
cf push [your_app_name]

# bind the Message Hub instance to this application
cf bind-service [your_app_name] [your_name_for_your_messagehub_service]

# restage this application
cf restage [your_app_name]

Scaling

TODO (each Cloud Foundry instance generates a unique dataset)

Example data

An example of the transactions data published to Message Hub:

InvoiceNo | StockCode | Description | Quantity | InvoiceDate | UnitPrice | CustomerID | Country | LineNo | InvoiceTime | StoreID | TransactionID
5370812 | 15056BL | EDWARDIAN PARASOL BLACK | 12 | 1515542400000 | 5.95 | 15332 | Lithuania | 3 | 00:00:00 | 0 | 537081230180110

The main changes between this dataset and the dataset created by the dataset-generator project are:

  • InvoiceDate has been converted to a Unix timestamp (milliseconds since the epoch)
  • The following fields have been added:
    • LineNo - the invoice line item number
    • InvoiceTime - the invoice time (format HH:MM:SS)
    • StoreID - the store ID (each Cloud Foundry instance will have a unique store ID - see TODO)
    • TransactionID - the unique transaction ID, derived from InvoiceNo (TODO)
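The InvoiceDate and InvoiceTime formats listed above can be sketched as follows. This is a minimal illustration, not the project's code: it assumes timestamps are interpreted as UTC, and `to_epoch_millis` and `from_epoch_millis` are hypothetical helper names.

```python
from datetime import datetime, timezone


def to_epoch_millis(dt):
    """Convert a timezone-aware datetime to milliseconds since the Unix epoch."""
    return int(dt.timestamp() * 1000)


def from_epoch_millis(millis):
    """Convert milliseconds since the Unix epoch to a UTC datetime."""
    return datetime.fromtimestamp(millis / 1000, tz=timezone.utc)


# Assumption: the invoice timestamp is in UTC
invoice_dt = datetime(2018, 1, 10, 0, 0, 0, tzinfo=timezone.utc)

invoice_date = to_epoch_millis(invoice_dt)        # 1515542400000
invoice_time = invoice_dt.strftime("%H:%M:%S")    # "00:00:00"
round_trip = from_epoch_millis(invoice_date)
```

Under the UTC assumption, the InvoiceDate value in the example row (1515542400000) corresponds to 2018-01-10 00:00:00.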

Developing

Copy the file etc/message_hub_vcap.json_template to etc/message_hub_vcap.json and populate it with your Message Hub instance values. These can be found in the 'Credentials' section of the IBM Console.

cd kafka-producer-for-simulated-data
virtualenv venv
source venv/bin/activate
pip3 install -r requirements.txt

./bin/run_locally.sh

Or, using docker:

docker run -it --rm -v "$PWD":/usr/src/myapp -w /usr/src/myapp amancevice/pandas:0.20.2-python3 bash -c "pip3 install -r requirements.txt && ./bin/run_locally.sh"
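For orientation, the sketch below shows one way credentials like those in etc/message_hub_vcap.json might be mapped to Kafka client settings. The credential field names (`kafka_brokers_sasl`, `user`, `password`) are assumptions about what a Message Hub credentials block contains, and the layout of this project's template file is not shown in this README. The returned keys follow the keyword arguments accepted by kafka-python's `KafkaProducer`; whether this project uses that library is also an assumption. `kafka_client_config` is a hypothetical helper.

```python
import json


def kafka_client_config(credentials):
    """Build SASL_SSL client settings from a credentials dict.

    Assumption: the dict has 'kafka_brokers_sasl', 'user' and 'password' keys.
    """
    return {
        "bootstrap_servers": credentials["kafka_brokers_sasl"],
        "security_protocol": "SASL_SSL",
        "sasl_mechanism": "PLAIN",
        "sasl_plain_username": credentials["user"],
        "sasl_plain_password": credentials["password"],
    }


# Placeholder values for illustration only; use your own instance's credentials
example_credentials = json.loads("""
{
  "kafka_brokers_sasl": ["broker-0.example.invalid:9093"],
  "user": "example-user",
  "password": "example-password"
}
""")

config = kafka_client_config(example_credentials)
```

With kafka-python installed, a dict like this could be passed as `KafkaProducer(**config)`.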

Verifying data is being produced

To verify that the topic is receiving data, you can consume the data with:

cd kafka-producer-for-simulated-data
virtualenv venv
source venv/bin/activate

./bin/run_consumer.sh
  • IMPORTANT for OSX users: if you receive the error SSL: CERTIFICATE_VERIFY_FAILED, you may need to look up how to fix it, for example by running /Applications/Python\ 3.6/Install\ Certificates.command

Or, using docker:

docker run -it --rm -v "$PWD":/usr/src/myapp -w /usr/src/myapp amancevice/pandas:0.20.2-python3 bash -c "pip3 install -r requirements.txt && ./bin/run_consumer.sh"
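When inspecting consumed transactions, a quick check that each record carries the fields listed under "Example data" can help. The sketch below assumes a consumed message has already been decoded into a Python dict; how messages are serialized on the topic is not stated in this README, and the value types in the sample are guesses. `missing_fields` is a hypothetical helper.

```python
# Field names taken from the example transactions row in this README
EXPECTED_FIELDS = (
    "InvoiceNo", "StockCode", "Description", "Quantity", "InvoiceDate",
    "UnitPrice", "CustomerID", "Country", "LineNo", "InvoiceTime",
    "StoreID", "TransactionID",
)


def missing_fields(record):
    """Return the expected field names that are absent from `record`."""
    return [name for name in EXPECTED_FIELDS if name not in record]


# Values copied from the example row; the types are assumptions
sample_record = {
    "InvoiceNo": "5370812",
    "StockCode": "15056BL",
    "Description": "EDWARDIAN PARASOL BLACK",
    "Quantity": 12,
    "InvoiceDate": 1515542400000,
    "UnitPrice": 5.95,
    "CustomerID": "15332",
    "Country": "Lithuania",
    "LineNo": 3,
    "InvoiceTime": "00:00:00",
    "StoreID": 0,
    "TransactionID": "537081230180110",
}

complete = missing_fields(sample_record)
incomplete = missing_fields({"InvoiceNo": "5370812"})
```

An empty list means every expected field is present.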

Description

TODO
