
arunkpatra / athena

Data driven and AI led insights for the Gift Card business.

License: MIT License

Java 99.58% Dockerfile 0.21% Shell 0.21%
breakage escheatment uplift analytics gift-cards notifications

athena's Introduction

Join the chat at https://gitter.im/athena-chat/community

Athena

Highlights of this Repo

  • Demo Site: Endpoint hosting the REST APIs that this repository delivers. The REST APIs execute live queries on an Amazon Redshift cluster. REST API Demo URL (Current Status: Shutdown)
  • Scaling Athena: Thoughts on how to scale this application to extreme levels, beyond the expectations of the current programming challenge.
  • Technology Stack: This repo uses a variety of technologies, including Amazon Redshift, S3, Elastic Beanstalk, Java 11, Spring Data, Spring Boot, Swagger, Gradle, JUnit, Docker and Kubernetes. See Technologies used by Athena.
  • Engineering Best Practices demonstrated: See Engineering best practices demonstrated in this repo.
  • End-to-end implementation: This repo implements a set of use cases with working automated tests. See What software components this repo implements.
  • GC Breakage Forecast Approaches: Multiple approaches to implementing a hyper-scale gift card breakage forecast platform. See Discussion on multiple approaches to forecast gift card breakage.

Background

US shoppers spent around $40 billion last year on gift cards, and around $1.2 billion of this was left unredeemed. Under existing regulation in various states, governments can claim this unredeemed money (escheatment) from the businesses that sold the gift cards. It is therefore best to ensure that customers redeem their gift cards as much as possible. Redemption increases the chances of uplift, helps acquire new customers and promote new business, and lets the business keep the revenue from gift card sales in the first place. Technically speaking, in accounting terms, gift card sales revenue is nothing more than a liability on the books until the consumer actually redeems the gift card!
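As a quick sanity check on these figures, the implied breakage rate is about 3% of gift card sales. The Java sketch below shows that arithmetic. The class and method names are hypothetical and not part of this repo.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Worked arithmetic for the figures quoted above (illustrative only):
// breakage rate = unredeemed value / total value sold.
final class BreakageMath {

    static BigDecimal breakageRate(BigDecimal totalSold, BigDecimal unredeemed) {
        if (totalSold.signum() <= 0) {
            throw new IllegalArgumentException("totalSold must be positive");
        }
        // Keep four decimal places so the rate can be shown as a percentage with two decimals.
        return unredeemed.divide(totalSold, 4, RoundingMode.HALF_UP);
    }

    public static void main(String[] args) {
        BigDecimal sold = new BigDecimal("40000000000");      // ~$40 billion spent on gift cards
        BigDecimal unredeemed = new BigDecimal("1200000000"); // ~$1.2 billion left unredeemed
        BigDecimal rate = breakageRate(sold, unredeemed);
        System.out.println("Breakage rate: " + rate.movePointRight(2) + "%"); // Breakage rate: 3.00%
    }
}
```

`BigDecimal` avoids floating-point rounding surprises when working with money.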

Problem Statement: What business wants

  1. Minimize escheatment - It's just a risk and doesn't help my business. Solve the breakage problem to start with.
  2. Maximize uplift - That's a major success criterion for my business.

Business expectation: What business needs to meet stated goals

  1. Tools to predict breakage values each quarter by card type. I need this for planning and to better protect my gift card business.
  2. Tools to notify customers and hence prevent breakage, ideally creating uplift as well. This is a key success factor for Athena.

Solution Approach: How do we meet business expectations

  1. Use historical breakage data and/or train a model to predict breakage numbers. Find out when redemption chances become remote.
  2. Try to formulate a 'smart' strategy for triggering customer notifications so that the chances of actual redemption improve. A 'dumb' strategy would be to simply trigger notifications one or two weeks prior to expiry. The smartness factor is a function of multiple parameters, which might include customer profile, situational aspects, in-store/online offers, etc. We must try everything possible not only to prevent breakage, but also to create uplift. Historically, 65% of customers are likely to spend 38% more than the gift card value. Tapping this opportunity should be a key success factor for Athena.
  3. Build a solution that scales with data volumes, is flexible enough to accommodate changing escheatment regulations, and enables the smart insights mentioned above.
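The 65%/38% figure above can be turned into a naive expected-uplift multiplier. If it means that 65% of redeeming customers spend 38% more than the card value, then each rescued dollar of card value yields about 0.65 × 0.38 ≈ $0.25 of extra spend. The sketch below uses that reading. Names are hypothetical, and real uplift would also depend on the 'smart' factors listed above.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Naive expected-uplift estimate built only from the historical figures quoted above,
// assuming they mean: 65% of redeeming customers spend 38% more than the card value.
final class UpliftEstimate {
    static final BigDecimal OVERSPEND_SHARE = new BigDecimal("0.65"); // share of customers who overspend
    static final BigDecimal OVERSPEND_RATIO = new BigDecimal("0.38"); // extra spend relative to card value

    // Expected uplift = card value redeemed thanks to notifications x 0.65 x 0.38, rounded to cents.
    static BigDecimal expectedUplift(BigDecimal redeemedValue) {
        return redeemedValue.multiply(OVERSPEND_SHARE)
                .multiply(OVERSPEND_RATIO)
                .setScale(2, RoundingMode.HALF_UP);
    }

    public static void main(String[] args) {
        // If notifications rescue $10,000 of otherwise-broken gift card value:
        System.out.println(expectedUplift(new BigDecimal("10000"))); // 2470.00
    }
}
```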

Key Use Cases: Where the rubber meets the road

  1. As a gift card issuer, I want to see the breakage probability and projected breakage value by card type at a given point in time. A simple UI is just fine. If you can aggregate by brand, that's awesome. If you can show me an overall predicted escheatment value for my business, that would be great.

  2. As a gift card issuer, I want the system to tell me the best possible time to send notifications to customers so that I can maximize uplift. It would be awesome to see a predicted uplift value for the action taken.
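Use case 1 boils down to an aggregation. Projected breakage per card is its outstanding balance multiplied by its breakage probability, summed by card type, by brand, or across the whole business. The in-memory Java sketch below shows that roll-up, assuming hypothetical `CardBalance` fields. In Athena itself these numbers would presumably come from Redshift queries instead.

```java
import java.math.BigDecimal;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical outstanding-balance record; field names are illustrative only.
final class CardBalance {
    final String cardType;
    final String brand;
    final BigDecimal outstanding;         // unredeemed balance on the card
    final BigDecimal breakageProbability; // between 0 and 1, from whichever forecast is in use

    CardBalance(String cardType, String brand, BigDecimal outstanding, BigDecimal breakageProbability) {
        this.cardType = cardType;
        this.brand = brand;
        this.outstanding = outstanding;
        this.breakageProbability = breakageProbability;
    }

    // Expected breakage for one card = outstanding balance x probability of breakage.
    BigDecimal projectedBreakage() {
        return outstanding.multiply(breakageProbability);
    }
}

final class BreakageReport {

    // Sums projected breakage per grouping key (card type, brand, ...), sorted by key.
    static Map<String, BigDecimal> projectedBy(List<CardBalance> cards, Function<CardBalance, String> key) {
        return cards.stream().collect(Collectors.groupingBy(
                key,
                TreeMap::new,
                Collectors.reducing(BigDecimal.ZERO, CardBalance::projectedBreakage, BigDecimal::add)));
    }

    // Business-wide projected breakage; how much of it is escheatable depends on state regulation.
    static BigDecimal total(List<CardBalance> cards) {
        return cards.stream()
                .map(CardBalance::projectedBreakage)
                .reduce(BigDecimal.ZERO, BigDecimal::add);
    }

    public static void main(String[] args) {
        List<CardBalance> cards = List.of(
                new CardBalance("Closed loop", "Brand A", new BigDecimal("100"), new BigDecimal("0.10")),
                new CardBalance("Open loop", "Brand B", new BigDecimal("50"), new BigDecimal("0.20")));
        System.out.println(projectedBy(cards, c -> c.cardType)); // per card type
        System.out.println(projectedBy(cards, c -> c.brand));    // per brand
        System.out.println(total(cards));                        // whole business
    }
}
```

Passing the grouping key as a function lets the same roll-up serve the card type, brand and overall views.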

Solution Architecture: What does the blueprint of the solution look like?

See ARCHITECTURE.md for a detailed discussion.

Development Tips

See DEVELOPMENT.md

Screenshot: The proof is in the pudding

The APIs developed as part of the programming challenge:

Athena Swagger UI

Mantra for Success

Think big, start small... The design and code developed as part of the programming challenge address a thin sliver of the broad objectives mentioned earlier.

athena's Issues

Data load for analytics

Ask

Populate data for analytics.

Approach

Load a decent amount of data so that tests cover positive, negative and edge cases. The focus right now is not scale, but validation and demonstration of insights.

Outcome

Load S3 with the following:

  1. Load around 50 different gift cards covering open-loop, closed-loop and semi-open-loop cards.
  2. Load merchant data
  3. Load transaction log data
  4. Load customer data
  5. Pre-calculate historical breakage rates by card for a few years and populate the historical breakage rate data (this is derived data).
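Step 5 produces derived data. One simple way to define it, assumed here rather than taken from the repo, is a cohort breakage rate: for cards of a given kind sold in a given year, the share of loaded value that has not been redeemed. The Java sketch below uses hypothetical names.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical per-card, per-sale-year summary derived from the transaction log.
final class CohortSummary {
    final String cardId;
    final int saleYear;
    final BigDecimal soldValue;     // value loaded onto cards sold in saleYear
    final BigDecimal redeemedValue; // value from that cohort redeemed to date

    CohortSummary(String cardId, int saleYear, BigDecimal soldValue, BigDecimal redeemedValue) {
        this.cardId = cardId;
        this.saleYear = saleYear;
        this.soldValue = soldValue;
        this.redeemedValue = redeemedValue;
    }
}

final class HistoricalBreakage {

    // Breakage rate of a cohort = (sold - redeemed) / sold, rounded to four decimals.
    // Returns cardId -> (saleYear -> rate); assumes one summary row per (card, year).
    static Map<String, Map<Integer, BigDecimal>> rates(List<CohortSummary> cohorts) {
        Map<String, Map<Integer, BigDecimal>> byCard = new TreeMap<>();
        for (CohortSummary c : cohorts) {
            if (c.soldValue.signum() <= 0) {
                continue; // nothing sold that year, so the rate is undefined
            }
            BigDecimal rate = c.soldValue.subtract(c.redeemedValue)
                    .divide(c.soldValue, 4, RoundingMode.HALF_UP);
            byCard.computeIfAbsent(c.cardId, id -> new TreeMap<>()).put(c.saleYear, rate);
        }
        return byCard;
    }
}
```

Cohorts that have not yet reached expiry will overstate breakage, so in practice only matured sale years would likely be used.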

Implement skinny REST APIs to float insights

Ask

Float up insights acquired from Redshift via REST APIs

Approach

  1. Use the Spring Boot REST infrastructure
  2. Use the required Redshift JDBC driver
  3. Document the APIs using Swagger

Outcome

  1. A Swagger UI endpoint from which the APIs can be called
  2. Ensure unit tests exist and that coverage is reported

Agree on key use cases

Do these make sense, vis-a-vis objectives and schedule constraints?

  1. As a card issuer, I need to see which cards are likely to have breakage, when, and the overall breakage value.
  2. As a card issuer, I need to know the schedule on which notifications should be triggered, and then approve the notification triggers for customers (system-assisted, but perhaps human-approved).
  3. See a simple user interface that shows me the predicted escheatment risk value and the possible uplift from timely triggers.
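For item 2, the simplest baseline is the 'dumb' strategy from the solution approach: propose triggers one and two weeks before expiry, then hand them to a person for approval. The sketch below covers only the scheduling half and assumes fixed 14- and 7-day lead times. The class name and parameters are hypothetical, and the approval step is not modelled.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

// Baseline ("dumb") notification schedule: a fixed number of days before expiry.
final class NaiveNotificationSchedule {

    // Returns proposed trigger dates in the order of leadDays, skipping any already in the past.
    static List<LocalDate> triggerDates(LocalDate expiry, LocalDate today, int... leadDays) {
        List<LocalDate> dates = new ArrayList<>();
        for (int days : leadDays) {
            LocalDate trigger = expiry.minusDays(days);
            if (!trigger.isBefore(today)) {
                dates.add(trigger);
            }
        }
        return dates;
    }

    public static void main(String[] args) {
        LocalDate expiry = LocalDate.of(2024, 3, 31);
        LocalDate today = LocalDate.of(2024, 3, 20);
        // The 14-day trigger (2024-03-17) has already passed, so only the 7-day one remains.
        System.out.println(triggerDates(expiry, today, 14, 7)); // [2024-03-24]
    }
}
```

A 'smart' strategy would replace the fixed lead times with ones derived from customer profile and offers, but it could keep the same output shape.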

User interface

A simple ReactJS UI would do for now. If nothing else (and if time is a constraint), Swagger UI alone is the fallback.

Agree on physical outcomes

Do the following outcomes of the exercise sound reasonable?

  • An engine/model that can look at data and provide breakage predictions? The thinking behind the model is what matters, not the actual data or training; in any case, real training data is unavailable. Is a rudimentary implementation of the engine fine for now?
  • Are the technology components to float up the data and present it in a simple UI app acceptable?
  • Is a robust architecture and broad vision for the platform what matters at this point? Is it acceptable to show the bigger picture but implement only the achievable parts of it for the moment (stubbing out the rest)?

Data model for programming challenge

Ask

Create a simplified data model to get insights from transaction log data.

Approach

Consider the following entities:

  • Card Type: Information about a gift card type
  • Card Data: Information about a specific card (a sold card, e.g. plastic)
  • Merchant: Information about a merchant
  • Customer: Customer data
  • Transaction log: Sufficiently denormalized transaction log data fit for insight extraction

Notes

  • We purposely denormalize the transaction log data model to enable efficient queries from an OLAP standpoint.
  • All other entities are treated as reference data: some form of truth that holds over time.
  • The data model presented here stems purely from an OLAP standpoint and is therefore not normalized at all. It is assumed that some system of record is the ultimate owner/origin of the data and holds it in a fully normalized format. It is probably reasonable to assume that data flows into the data warehouse (DW) from those systems and has been sufficiently pre-processed to facilitate OLAP operations.
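To make the denormalization note concrete, the sketch below flattens one raw transaction, which carries only foreign keys, into a transaction log row. That row already contains the descriptive attributes that OLAP queries group by. All entity and field names are hypothetical simplifications, not the repo's actual DDL.

```java
import java.math.BigDecimal;
import java.time.LocalDate;
import java.util.Map;
import java.util.Objects;

// Hypothetical, simplified reference entities (names are illustrative, not the repo's schema).
final class CardType {
    final String id, name, businessModel;
    CardType(String id, String name, String businessModel) {
        this.id = id; this.name = name; this.businessModel = businessModel;
    }
}

final class Merchant {
    final String id, name;
    Merchant(String id, String name) { this.id = id; this.name = name; }
}

final class Customer {
    final String id, segment;
    Customer(String id, String segment) { this.id = id; this.segment = segment; }
}

// A raw transaction as a system of record might emit it: foreign keys plus facts.
final class RawTransaction {
    final String cardId, cardTypeId, merchantId, customerId;
    final LocalDate date;
    final BigDecimal amount;
    RawTransaction(String cardId, String cardTypeId, String merchantId, String customerId,
                   LocalDate date, BigDecimal amount) {
        this.cardId = cardId; this.cardTypeId = cardTypeId; this.merchantId = merchantId;
        this.customerId = customerId; this.date = date; this.amount = amount;
    }
}

// Denormalized row: descriptive attributes are copied in so OLAP queries can group and
// filter by card type, business model, merchant or customer segment without joins.
final class TransactionLogRow {
    final String cardId, cardTypeName, businessModel, merchantName, customerSegment;
    final LocalDate date;
    final BigDecimal amount;
    TransactionLogRow(String cardId, String cardTypeName, String businessModel, String merchantName,
                      String customerSegment, LocalDate date, BigDecimal amount) {
        this.cardId = cardId; this.cardTypeName = cardTypeName; this.businessModel = businessModel;
        this.merchantName = merchantName; this.customerSegment = customerSegment;
        this.date = date; this.amount = amount;
    }
}

final class Denormalizer {
    static TransactionLogRow flatten(RawTransaction t, Map<String, CardType> cardTypes,
                                     Map<String, Merchant> merchants, Map<String, Customer> customers) {
        // Fail fast on dangling references rather than loading half-empty rows into the warehouse.
        CardType ct = Objects.requireNonNull(cardTypes.get(t.cardTypeId), "unknown card type " + t.cardTypeId);
        Merchant m = Objects.requireNonNull(merchants.get(t.merchantId), "unknown merchant " + t.merchantId);
        Customer c = Objects.requireNonNull(customers.get(t.customerId), "unknown customer " + t.customerId);
        return new TransactionLogRow(t.cardId, ct.name, ct.businessModel, m.name, c.segment, t.date, t.amount);
    }
}
```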

Outcomes

  • DDL for the tables
  • Some sample data to start with that can be pushed into S3

Layout Broad Solution Architecture

Compile:

  • Architecturally significant use cases
  • Architectural constraints
  • Technology options and evaluation parameters
  • Technology fitment
  • Logical and physical architectures

Programming Challenge

Why?

  1. The immediate goal is to gain some preliminary insights from the data. The data is transactional data and some reference data, essentially written once but read many times for analytical purposes.

How?

  1. It makes sense to load transactional data (and reference data as well) into S3, and then use a variety of tools to look at the data.
  2. We use Amazon Redshift to start with. We copy data from S3 into Redshift and do some exploratory data analysis (EDA). Later on, we will attempt to use Redshift Spectrum instead of copying data into Redshift.
  3. We will consume the Redshift queries in a thin API layer (REST APIs).
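The S3-to-Redshift load in item 2 is typically done with Redshift's `COPY` command. The Java sketch below builds such a statement for CSV files under an S3 prefix, using IAM role authorization. The table name, bucket and role ARN are placeholders. Running the statement would need a JDBC connection to the cluster, which is not shown.

```java
import java.util.regex.Pattern;

// Builds a Redshift COPY statement that loads header-prefixed CSV files from an S3 prefix.
final class RedshiftCopy {
    // Conservative check on [schema.]table so the table name cannot smuggle in extra SQL.
    private static final Pattern IDENTIFIER =
            Pattern.compile("[A-Za-z_][A-Za-z0-9_]*(\\.[A-Za-z_][A-Za-z0-9_]*)?");

    static String copyFromS3(String table, String s3Uri, String iamRoleArn) {
        if (!IDENTIFIER.matcher(table).matches()) {
            throw new IllegalArgumentException("bad table name: " + table);
        }
        if (!s3Uri.startsWith("s3://")) {
            throw new IllegalArgumentException("expected an s3:// URI: " + s3Uri);
        }
        return "COPY " + table
                + " FROM '" + quote(s3Uri) + "'"
                + " IAM_ROLE '" + quote(iamRoleArn) + "'"
                + " FORMAT AS CSV IGNOREHEADER 1;";
    }

    // Escape single quotes for use inside a SQL string literal.
    private static String quote(String s) {
        return s.replace("'", "''");
    }

    public static void main(String[] args) {
        System.out.println(copyFromS3("athena.transaction_log",
                "s3://example-bucket/athena/transaction_log/",
                "arn:aws:iam::123456789012:role/ExampleRedshiftCopyRole"));
    }
}
```

Loading by prefix lets Redshift pick up every file under it in one command.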

What?

  1. Model card data, customer data, merchant data and transaction log data.
  2. Do EDA to get some meaningful insights.
  3. Expose insights via REST APIs. Spring Boot stack.
  4. Time permitting, build a ReactJS UI.
  5. The overarching objective is to have a fully working model that works end to end and can be demonstrated. This demonstration should exhibit sound engineering practices, architectural maturity, design, logical thinking and coding capabilities.

Exclusions

  1. The model in this challenge is not expected to work at massive scale. The analytical queries can be tuned progressively for scale at a later time.

What next?

See #18

Insights derived in programming challenge

Ask

Try to derive the following insights.

  • Top selling cards by quantity and gross volume
  • Cards in the 90th percentile by gross sales volumes
  • Top selling cards by business model
  • Highest grossing merchants
  • Top cards by breakage as of today
  • Gross breakage by merchant as of today
  • Cards that are about to have breakage for a customer
  • Breakage forecast for Merchant, categorized by aspects like card category, customer segment, business model, card medium etc.
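As an illustration of the second insight, 'cards in the 90th percentile by gross sales volume' can be read as cards whose gross volume is at or above the 90th-percentile value. The in-memory Java sketch below uses the nearest-rank definition. That definition is an assumption: Redshift's `PERCENTILE_CONT` interpolates instead, so SQL-side results can differ slightly. Method names are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

final class SalesPercentile {

    // Nearest-rank percentile: the smallest value such that at least p% of values are <= it.
    static long nearestRank(long[] values, int p) {
        if (values.length == 0 || p <= 0 || p > 100) {
            throw new IllegalArgumentException("need values and 0 < p <= 100");
        }
        long[] sorted = values.clone();
        Arrays.sort(sorted);
        int rank = (p * sorted.length + 99) / 100; // ceil(p / 100 * n) in integer arithmetic
        return sorted[rank - 1];
    }

    // Cards whose gross sales volume is at or above the p-th percentile, largest first.
    static Map<String, Long> atOrAbovePercentile(Map<String, Long> grossByCard, int p) {
        long threshold = nearestRank(
                grossByCard.values().stream().mapToLong(Long::longValue).toArray(), p);
        return grossByCard.entrySet().stream()
                .filter(e -> e.getValue() >= threshold)
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue,
                        (a, b) -> a, LinkedHashMap::new));
    }
}
```

With ten cards grossing 10 through 100, the threshold is 90, so the two largest cards qualify.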

Approach

  • Query Redshift data for now (Spectrum will be considered later).
  • Optimize queries progressively, looking at distkey- and sortkey-level optimizations subsequently.

Outcome

  • Set of tested queries and sample results
  • Fine to test with a 'nano scale' dataset to start with.
  • Load up a reasonable amount of data next, so that the insights are not badly skewed or look childish.

Next Steps: What more can be done beyond the programming challenge?

Ask

List the items you would like/need to do if given more time.

Next Steps

  1. The analytical queries are not tuned for massive scale. They would need to be tuned for scale and leverage Redshift's query optimization techniques, such as carefully chosen distkeys and sortkeys.
  2. Use Amazon Redshift Spectrum to query S3 data directly, with Redshift acting as a conduit between the business application and the OLAP engine.
  3. Tweak the data model so that it mimics the actual GC business model more closely. There are opportunities to merge certain entities to enable better OLAP queries.
  4. Build a UI that floats up data and presents powerful visualizations.
  5. Implement a more advanced algorithm for breakage forecasting.
  6. Productionize the APIs: security, exception handling, logging, monitoring, tracing, scaling, stress testing, deployment, containerization, etc.
  7. An ambitious goal would be to develop a robust breakage forecast machine learning model trained on actual production data. This is of significant commercial value.

Thoughts on cost

  1. Point 1 requires a massive amount of realistic test data, and some effort.
  2. Points 2 through 4 in the list above are fairly straightforward.
  3. Point 5 requires effort and a more comprehensive data model and business research.
  4. Point 6 is necessary to build production worthy code. It takes a non-trivial amount of time and effort.
  5. Point 7 is a significantly complex effort, but has the highest commercial value. It's probably a valuable goal for the GC business and can be a real income generator.

Elevator pitch

  1. Elevator pitch
  2. A three-minute video; just a screen recording is fine for now?

Thoughts and approach on predicting breakage

Ask

What are the possible approaches for predicting breakage for a given card?

Thoughts

Multiple approaches could be used, of which two stand out. We discuss both.

  • One is based on historical data analysis (looking at past trends).
  • Another is based on machine learning models.
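For the historical approach, one simple possibility is an exponentially weighted moving average (EWMA) of past yearly breakage rates, so that recent years count more. The rate is then multiplied by the current outstanding balance to get a projected breakage value. This choice is an assumption; the wiki page may describe something different. The sketch uses hypothetical names, and `alpha` controls how quickly older years are discounted.

```java
import java.util.List;

// One possible historical-trend forecaster (an assumption, not necessarily the repo's algorithm):
// an EWMA over yearly breakage rates ordered oldest year first.
final class EwmaBreakageForecast {

    static double forecast(List<Double> yearlyRates, double alpha) {
        if (yearlyRates.isEmpty()) {
            throw new IllegalArgumentException("need at least one year of history");
        }
        if (alpha <= 0 || alpha > 1) {
            throw new IllegalArgumentException("alpha must be in (0, 1]");
        }
        double estimate = yearlyRates.get(0);
        for (int i = 1; i < yearlyRates.size(); i++) {
            // Blend the newer year's rate with the running estimate.
            estimate = alpha * yearlyRates.get(i) + (1 - alpha) * estimate;
        }
        return estimate;
    }

    // Projected breakage value = forecast rate x current outstanding balance.
    static double projectedBreakage(List<Double> yearlyRates, double alpha, double outstandingBalance) {
        return forecast(yearlyRates, alpha) * outstandingBalance;
    }

    public static void main(String[] args) {
        List<Double> rates = List.of(0.04, 0.035, 0.03); // made-up sample rates, oldest first
        System.out.printf("forecast rate: %.4f%n", forecast(rates, 0.5));
        System.out.printf("projected breakage on $1,000,000: %.2f%n", projectedBreakage(rates, 0.5, 1_000_000));
    }
}
```

With `alpha = 1` the forecast is simply the latest year's rate. Smaller values smooth out one-off years.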

Outcome

See the wiki page on Gift Card Breakage Forecast Approach for a detailed discussion.
