Shop that is designed to be scalable.
Please make sure to read the important sections at the end of the page
- Frontend (/buy) -> cm-server -> Kafka (produce)
- Frontend (/getAllUserBuys/{user}) -> cm-server -> cm-api (/buyList/{user})
- cm-api -> Kafka (consume purchase)
- cm-api -> MongoDB (insert and find purchases for user)
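The flow above can be illustrated end-to-end with a small Python sketch, where an in-memory queue stands in for Kafka and a plain list stands in for MongoDB (function names are hypothetical; only the endpoint paths and payload fields come from this README):

```python
from collections import deque

# In-memory stand-ins for Kafka and MongoDB (illustration only).
kafka_topic = deque()      # cm-server produces here; cm-api consumes
purchases_collection = []  # cm-api inserts consumed purchases here

def cm_server_buy(purchase: dict) -> dict:
    """POST /buy: cm-server produces the purchase event to Kafka."""
    kafka_topic.append(purchase)
    return {"status": "Ok!", "message": "Message successfully sent to Kafka!"}

def cm_api_consume() -> None:
    """cm-api consumes purchase events and inserts them into MongoDB."""
    while kafka_topic:
        purchases_collection.append(kafka_topic.popleft())

def cm_api_buy_list(username: str) -> list[dict]:
    """GET /buyList/{user}: find purchases for the given user."""
    return [p for p in purchases_collection if p["username"] == username]

cm_server_buy({"username": "joe", "userid": "005",
               "price": "1195.30", "timestamp": "2024-05-31T15:54:32.204Z"})
cm_api_consume()
print(cm_api_buy_list("joe"))
```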
- Each purchase consists of the following fields
- username: name of the user purchasing
- userid: id of the user purchasing
- price: price of the item
- timestamp: when the purchase was sent
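In Python, a purchase with these fields could be built and serialized into the JSON body sent to /buy (the concrete values below are taken from the testing examples later in this README):

```python
import json

# Purchase payload; field names match the spec above.
purchase = {
    "username": "joe",                        # name of the user purchasing
    "userid": "005",                          # id of the user purchasing
    "price": "1195.30",                       # price of the item
    "timestamp": "2024-05-31T15:54:32.204Z",  # when the purchase was sent
}

# JSON body the frontend would POST to /buy.
body = json.dumps(purchase)
print(body)
```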
Kafka and Kafka UI are installed and configured with helm using a KRaft configuration with three controllers and one broker. See kafka/README.md for more info.
Strimzi was considered as a possible solution for Kafka, but in this case the Bitnami chart was good enough for the task.
MongoDB is installed and configured with helm using PSA configuration. See mongodb/README.md for more info.
Install Ingress-Nginx to handle external requests to cm-server
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo update
helm upgrade --install ingress-nginx ingress-nginx --repo https://kubernetes.github.io/ingress-nginx --namespace ingress-nginx --create-namespace
Customer-Management API for scalable-shop
- Provide an API that's queried by customer-management server
- Consume events from Kafka that were produced by cm-server service
cm-api is installed and configured with helm. See cm-api/README.md for more info.
Customer-Management Server for scalable-shop
- Provide Endpoints that are accessed by the customer frontend
- Produce events for Kafka based on buy data sent via the customer frontend that are consumed by the cm-api service
- Make requests to the cm-api service, e.g. /buyList/{user}
cm-server is installed and configured with helm. See cm-server/README.md for more info.
Set Environment
export INGRESS_HOST=localhost
curl http://${INGRESS_HOST}/scalable-shop-cm-server/healthz
Expected Response:
Success
curl http://${INGRESS_HOST}/scalable-shop-cm-server/buy \
--header 'Content-Type: application/json' \
--data '{
"username": "joe",
"userid": "005",
"price": "1195.30",
"timestamp": "2024-05-31T15:54:32.204Z"
}'
Expected Response:
{
"status": "Ok!",
"message": "Message successfully sent to Kafka!"
}
curl http://${INGRESS_HOST}/scalable-shop-cm-server/getAllUserBuys/joe
Expected response:
{
"username": "joe",
"purchases": [
{
"_id": "665ca845678cb66959d42f4a",
"username": "joe",
"userid": "005",
"price": "1195.30",
"timestamp": "2024-05-31T15:54:32.204Z"
}
]
}
- We assume that the timestamp is sent by the client and reflects when the request was sent. However, if it should instead reflect when the request is received, the timestamp should be generated by the server.
- Based on the initial spec for this project, it seems that the getAllUserBuys and buyList endpoints should return similar results. Ideally, we should use kebab-case paths for APIs, e.g. buy-list, and consistent naming to avoid confusion later. However, the getAllUserBuys and buyList names were kept for consistency with the spec.
- We did not over-generalize controllers (the Kafka controller can either consume or produce, depending on the service) -- but maintaining a single shared Kafka controller might be better.
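If the timestamp should mark when the request is received rather than when it was sent, the server could generate it on arrival; a minimal sketch producing the same ISO-8601 UTC format (with a trailing Z) used in the examples above:

```python
from datetime import datetime, timezone

def server_timestamp() -> str:
    """Generate an ISO-8601 UTC timestamp at the moment the request is
    received, in the same millisecond/'Z' format used by /buy payloads."""
    now = datetime.now(timezone.utc)
    return now.isoformat(timespec="milliseconds").replace("+00:00", "Z")

print(server_timestamp())
```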
While some effort was made to improve security and availability, this project is a PoC and should definitely not be used as-is in production. To make it production-ready, the following points should be considered.
- Caching and more specialized message-lookup scenarios were not part of the design
- It is probably better to store users in a separate collection with userid and username, and then look up the username by userid, for example
- We could also consider a design where we consistently maintain each user's list of purchase ids, so we don't need to search for the purchases associated with a user on every request.
- Currently, logs are verbose by default and no logging controller is used. Generally, different log levels with varying verbosity should be supported, and production logs should be minimized by default to avoid log spam
- NodePort services are used for debugging purposes. In a production environment the service type should be changed to hide internal services.
- Healthchecks are incomplete (e.g. in the PoC we don't check dependent services like MongoDB and Kafka), so a service reports itself as healthy even when it can't handle requests
- We should check all data before it's submitted to avoid duplicate purchases, e.g. purchases with the same values sent at the exact same time
- Every request and any attached data submitted from the client side should be validated to prevent injection and other common attacks.
- TLS is not used at all, but in production environments connections should be encrypted with TLS
- API authentication should be used to ensure that requests come from a trusted source
- Dedicated non-root users or IAM Authorization should be used for database and queue connections
- Secret values should not be placed in the deployment as env vars; they should come from a Secret (even though at the moment most clusters use Opaque secrets)
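The validation and de-duplication points above could be combined; a hypothetical sketch (not part of the current codebase) that validates a purchase payload before it is produced to Kafka and derives a stable idempotency key so an identical retry can be detected:

```python
import hashlib

REQUIRED_FIELDS = ("username", "userid", "price", "timestamp")

def validate_purchase(payload: dict) -> None:
    """Reject payloads with missing, empty, or non-string fields."""
    for field in REQUIRED_FIELDS:
        value = payload.get(field)
        if not isinstance(value, str) or not value.strip():
            raise ValueError(f"invalid or missing field: {field}")
    float(payload["price"])  # price must be numeric; raises ValueError otherwise

def idempotency_key(payload: dict) -> str:
    """Derive a stable key from the purchase fields, so a purchase with the
    same values sent at the exact same time maps to the same key and can be
    dropped as a duplicate."""
    raw = "|".join(payload[field] for field in REQUIRED_FIELDS)
    return hashlib.sha256(raw.encode()).hexdigest()

purchase = {"username": "joe", "userid": "005",
            "price": "1195.30", "timestamp": "2024-05-31T15:54:32.204Z"}
validate_purchase(purchase)
print(idempotency_key(purchase))
```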