
ratestask

Rates Task

Development section

Learned Python and PostgreSQL in 2 days.

I followed this tutorial to get started:

https://scotch.io/tutorials/build-a-restful-api-with-flask-the-tdd-way

As of Sunday, 4 March, I had done the following:

  1. Completed GET API
  2. Started POST API
  3. Wrote 12 test cases that run successfully

Today, Tuesday, 6 March, I completed the Development Task.

I spent half a day on the following:

  1. Refactored the code along OOP lines (only the exception handling caused some trouble; I could not cleanly hide that logic inside a class)
  2. Used a batch insert to add the records to the table (COPY could be used instead, with the rows built in an in-memory buffer; see the sketch after this list)
  3. All 16 unit tests ran successfully
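
For illustration, here is a hedged sketch of the two insertion approaches mentioned in item 2, using psycopg2 (the connection parameters, table and column names are assumptions, not the actual schema):

    import io

    import psycopg2
    from psycopg2.extras import execute_values

    # Placeholder connection; real settings would come from configuration.
    conn = psycopg2.connect(dbname="rates", user="postgres")
    rows = [("2016-01-01", "CNSGH", "NORKR", 1244)]  # illustrative data only

    with conn, conn.cursor() as cur:
        # Batch insert: many rows in a single statement / round trip.
        execute_values(
            cur,
            "INSERT INTO prices (day, orig_code, dest_code, price) VALUES %s",
            rows,
        )

        # COPY alternative: build the rows in an in-memory buffer and stream them
        # (this would normally replace the INSERT above, not run in addition to it).
        buf = io.StringIO("".join("\t".join(map(str, r)) + "\n" for r in rows))
        cur.copy_from(buf, "prices", columns=("day", "orig_code", "dest_code", "price"))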

Instructions to set up the environment:

  1. Install Python 3 (I installed Anaconda for macOS)
  2. Install Flask
  3. Run this: export FLASK_APP="run.py" (I used Virtualenv from the tutorial; a sketch of this entry point follows the list)
  4. Install PostgreSQL
  5. Install Psycopg2
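
Step 3 points FLASK_APP at run.py; below is a minimal sketch of what such an entry point might look like (the create_app factory is an assumption for illustration, not the repository's actual code):

    # run.py -- hypothetical entry point; the real file may differ.
    from flask import Flask


    def create_app():
        """Application factory, so tests can build their own app instance."""
        app = Flask(__name__)
        # The rates routes / blueprints would be registered here.
        return app


    app = create_app()

    if __name__ == "__main__":
        app.run()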

To run the tests, run the following command:

python test_rates.py
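
A hedged sketch of what one of those test cases might look like (the class name, the factory import, and the expected status code are assumptions; the real test_rates.py may differ):

    # test_rates.py -- illustrative only.
    import unittest

    from run import create_app  # assumes the factory sketched above


    class RatesTestCase(unittest.TestCase):
        def setUp(self):
            self.client = create_app().test_client()

        def test_rates_requires_query_parameters(self):
            # Assumes the /rates handler returns 400 when the required
            # parameters are missing (see the handler sketch further below).
            response = self.client.get("/rates")
            self.assertEqual(response.status_code, 400)


    if __name__ == "__main__":
        unittest.main()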

To run the application:

flask run

To test a GET request, run:

curl "http://127.0.0.1:5000/rates?date_from=2016-01-01&date_to=2016-01-10&origin=CNSGH&destination=north_europe_main"
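
For reference, here is a hedged sketch of how such a GET endpoint could be wired up with Flask and psycopg2. The SQL is simplified (the real task also resolves region slugs such as north_europe_main to port codes, which this sketch omits), and all names are assumptions rather than the repository's actual code:

    # Hypothetical GET /rates handler; the real implementation may differ.
    import psycopg2
    from flask import Flask, jsonify, request

    app = Flask(__name__)


    def get_connection():
        # Placeholder connection parameters.
        return psycopg2.connect(dbname="rates", user="postgres")


    @app.route("/rates")
    def rates():
        params = {k: request.args.get(k)
                  for k in ("date_from", "date_to", "origin", "destination")}
        if not all(params.values()):
            return jsonify({"error": "missing query parameter"}), 400

        query = """
            SELECT day, AVG(price) AS average_price
            FROM prices
            WHERE day BETWEEN %(date_from)s AND %(date_to)s
              AND orig_code = %(origin)s
              AND dest_code = %(destination)s
            GROUP BY day
            ORDER BY day
        """
        with get_connection() as conn, conn.cursor() as cur:
            cur.execute(query, params)
            result = [{"day": str(day), "average_price": float(avg)}
                      for day, avg in cur.fetchall()]
        return jsonify(result)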

Theoretical section

How would you design the system?

In my opinion, the table that data is inserted into should be kept as small as possible, since we require high performance for our INSERT statements.

I would keep only the data for the last couple of weeks, or even days, as the business requirements dictate.

The historical data should be kept in another database with a main focus on reads, i.e. SELECT statements. The purpose of this second database is to provide analytical information for our end users.

Having two databases with different purposes would help us separate the logic and allow us to tailor each database to our needs. The first database is the OLTP database; it can be a relational SQL database or a NoSQL database.

I would use a NoSQL database as the OLTP database handling the insert, update, and delete queries, since it copes better when you have to regularly insert and delete a lot of records at once.

A NoSQL database is also easier to scale horizontally by sharding. A relational SQL database can be partitioned as well, but that adds more operational burden and is not as straightforward. Relational databases usually scale vertically, which is not cost effective.

I would use a queue such as RabbitMQ to handle all the incoming data batches: each batch id is stored in the queue, and batches are processed by fetching their ids from it. The queue gives us a degree of fault tolerance in case of a sudden peak load, so nothing is lost or dropped when all the batch processing instances are busy.
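
A hedged sketch of the producer side of that idea using pika (the queue name and payload format are arbitrary choices for illustration):

    # Hypothetical producer that enqueues batch ids for the worker fleet.
    import json

    import pika

    connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    channel = connection.channel()

    # Durable queue so pending batch ids survive a broker restart.
    channel.queue_declare(queue="rate_batches", durable=True)


    def enqueue_batch(batch_id: str) -> None:
        """Publish a batch id; a pool of workers fetches and processes it."""
        channel.basic_publish(
            exchange="",
            routing_key="rate_batches",
            body=json.dumps({"batch_id": batch_id}),
            properties=pika.BasicProperties(delivery_mode=2),  # persistent message
        )


    enqueue_batch("batch-2016-01-01")
    connection.close()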

To handle the application logic I would use a fleet of instances that can scale horizontally. I would align the scaling logic with the queue-length metric: whenever the number of batch ids in the queue is higher than a threshold, I would launch a new instance to process the batches.

When data is read from the OLTP database, I would create read replicas (in the case of a relational database). I would also use an in-memory cache such as Redis for frequent SELECT queries.
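
A hedged sketch of that read-side caching with redis-py (the key naming and the five-minute TTL are arbitrary choices for illustration):

    # Hypothetical read-through cache in front of the analytical database.
    import json

    import redis

    cache = redis.Redis(host="localhost", port=6379)


    def cached_rates(key, fetch_from_db):
        """Return the cached result for `key`, or compute and cache it."""
        hit = cache.get(key)
        if hit is not None:
            return json.loads(hit)

        result = fetch_from_db()                    # e.g. the SELECT shown earlier
        cache.setex(key, 300, json.dumps(result))   # keep for 5 minutes
        return result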

What parts of the system do you expect to become the bottlenecks as the load grows?

As the load grows, I expect the batch processing instances, the database itself, and the network throughput to become bottlenecks. I would watch CPU, memory, disk I/O and throughput, and network performance.

How can those bottlenecks be addressed in the future?

As I mentioned above, these issues can be addressed by using a horizontally scalable design for the batch processing instances and by splitting the database into historical and operational parts.

Caching, queues, and keeping the operational table small would also help address these bottlenecks.

Additional questions

  1. The batch updates have started to become very large, but the requirements for their processing time are strict.

Being a cloud architect, I would definitely build my environment in AWS. My architecture could handle very large batch updates: I could change the instance type, i.e. scale vertically, to handle a larger batch.

  2. Code updates need to be pushed out frequently. This needs to be done without the risk of stopping a data update already being processed, nor a data response being lost.

The bottleneck here is the batch processing logic, as the instances may be unable to handle new batches properly while code is being updated; they might drop them and leave them unprocessed. As I mentioned above, a queue can be used for this purpose, so a batch can wait in the queue or be processed in parallel.

  3. For development and staging purposes, you need to start up a number of scaled-down versions of the system.

Since my architecture uses horizontally scalable batch processing instances, it can be scaled down very easily. I could change the instance type to a cheaper one, and even a single instance would be enough for the DEV and QA environments.
