asterion-digital / asterion-as-code



Deploying asterion digital infrastructure to aws and Raspberry Pis using pulumi

Python 73.10% Smarty 14.83% Mustache 12.07%

aws k3s kubernetes pulumi wordpress

asterion-as-code's People


shawngerrard gyle bridgecrew-perf6 bridgecrew-perf7

asterion-as-code's Issues

Implement aws sso with external identity provider (gsuite)

Linked to issue #12 - rather than relying on manual methods, we can utilize aws sso to authenticate with the active identities in our gsuite directory.

We should investigate identity federation in order to better centralize and automate authentication moving forward. In the first instance, we should be looking for a way to deliver this through pulumi.

Move aws accounts into a holding org unit on pulumi destroy of org-asterion

When destroying the pulumi stack for org-asterion, pulumi attempts to remove the aws member accounts. This triggers aws to attempt transforming the member accounts into stand-alone accounts, which raises exceptions because member accounts created through pulumi are missing the configuration (such as credit card details) needed to become stand-alone accounts. These exceptions currently block the org-asterion stack from being brought down.

Duplicating or deleting accounts also renders their email addresses inert, so both should be avoided.

I propose that we configure the aws account resources created in main.py so that when pulumi destroy is initiated, an aws ou is created in organizations as a holding bay for all accounts to be shifted to prior to destruction.

main.py will also need further modification so that when the stack is brought up, the holding bay ou contents are placed into a list and checked so that we reuse these resources instead of creating duplicates.

The additional benefit here is that we may be able to run some kind of scheduled jobs in the future over resources in this ou for further vetting or for initiating the aws account deletion processes.
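The reuse check proposed above can be sketched in plain Python, independent of pulumi. This is a minimal sketch, assuming each account is represented as a dict with `name` and `email` keys and that the contents of the holding ou have already been listed; the function name `plan_accounts` and the data shape are hypothetical and do not come from the existing main.py.

```python
from typing import Dict, List, Tuple

Account = Dict[str, str]


def plan_accounts(
    desired: List[Account], holding: List[Account]
) -> Tuple[List[Account], List[Account]]:
    """Split desired accounts into ones to reuse and ones to create.

    Accounts are matched on email address (case-insensitive), because
    an email address can only belong to one aws account.
    """
    held_by_email = {acct["email"].lower(): acct for acct in holding}
    reuse: List[Account] = []
    create: List[Account] = []
    for acct in desired:
        held = held_by_email.get(acct["email"].lower())
        if held is not None:
            # Already parked in the holding ou: move it back instead of recreating.
            reuse.append(held)
        else:
            create.append(acct)
    return reuse, create
```

Accounts returned in the first list would be moved out of the holding ou, and only the second list would result in new account resources.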

Data backup processes

We need backup processes to store all application data in the Asterion AWS infrastructure in the event that a critical failure occurs, causing data to become corrupted.

The best solution would be a job template that can be integrated with existing git workflows, which will run a process to store all data in an archive file (7z, rar, tar, iso, etc.) prior to commencing actions to replace parts of our infrastructure.

The destination can just be local storage for now, but the final intention should be something comparable to an NFS share that is accessible from all nodes/clusters.
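The archive step could look something like the following, using only the Python standard library. This is a sketch under assumptions: tar with gzip is picked from the formats listed above, the destination is a local directory, and the function name `create_backup` and the timestamped file naming are hypothetical.

```python
import tarfile
from datetime import datetime, timezone
from pathlib import Path


def create_backup(source_dir: str, dest_dir: str) -> Path:
    """Archive source_dir into a timestamped .tar.gz file inside dest_dir."""
    source = Path(source_dir).resolve()
    if not source.is_dir():
        raise FileNotFoundError(f"backup source not found: {source}")

    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)

    # UTC timestamp keeps archive names unique and sortable.
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    archive = dest / f"{source.name}-{stamp}.tar.gz"

    with tarfile.open(archive, "w:gz") as tar:
        # arcname stores paths relative to the source folder name
        # rather than the absolute path on the machine doing the backup.
        tar.add(source, arcname=source.name)
    return archive
```

A workflow job would call this before any step that replaces infrastructure, and fail the run if it raises.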

Update aws secrets into github for ci workflows

Our CI workflows are currently failing as we do not have aws credentials set up as secrets within this github repository. This issue is reliant on work within issues #15 and #12 to centralize company org accounts and activate auditing within aws.

Set up company root email accounts for aws registration

We should begin the process of creating shared group credentials and services.

Currently, I can see the need for three primary group email accounts that will be mapped to specific permission/role levels in aws iam:

root email account that can manage everything across all aws organizations/accounts, e.g. [email protected].
administrator email account that can manage everything across all aws accounts/services within specific aws organizations, e.g. [email protected].
customer email account that can manage automated communication to external parties, e.g. [email protected].

Many services depend on email as a primary means of identification and communication. We have a company email service with g-suite that we could utilize for this purpose.

Merge `org-asterion` and `org-asterion-dev` into a multi-stack single project

As per the title. This allows each stack to be more easily maintained and is critical for usability.

Some values will need to be stored in Pulumi.dev.yaml:

awsAccountId: store the aws account id of the dev environment.
accountAlias: must contain the string "asterion" followed by the environment/stack name, e.g. "asteriondev".
iamUsersToAdd: a string list of usernames to create aws iam users.
masterAccountId: store the aws account id of the master/root account. It may be more appropriate/secure to obtain this value from aws during program execution and export the value rather than preset it in the config.
masterRootId: store the id of the master root profile.
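The rules for these values can be checked before the stack runs. The sketch below is plain Python and assumes the stack config has already been loaded into a dict; the function name `validate_stack_config` and the error messages are hypothetical, and only the rules stated in the list above are enforced.

```python
from typing import Dict, List

REQUIRED_KEYS = (
    "awsAccountId",
    "accountAlias",
    "iamUsersToAdd",
    "masterAccountId",
    "masterRootId",
)


def validate_stack_config(stack: str, config: Dict[str, object]) -> List[str]:
    """Return a list of problems found in a stack's config (empty if none)."""
    errors = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in config]

    # accountAlias must contain "asterion" followed by the stack name.
    alias = config.get("accountAlias")
    expected = f"asterion{stack}"
    if alias is not None and (not isinstance(alias, str) or expected not in alias):
        errors.append(f"accountAlias must contain {expected!r}, got {alias!r}")

    # iamUsersToAdd must be a list of username strings.
    users = config.get("iamUsersToAdd")
    if users is not None and not (
        isinstance(users, list) and all(isinstance(u, str) for u in users)
    ):
        errors.append("iamUsersToAdd must be a list of strings")

    return errors
```

Calling `validate_stack_config("dev", config)` at the top of the program and aborting on a non-empty result would surface a bad Pulumi.dev.yaml before any aws resources are touched.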

**Note:** The pulumi service cannot be utilized to manage global variables. However, a workaround could be achieved by exporting a pulumi output json object containing hardcoded values.

org.py, ou.py, and users.py will need to be modified to be repeatable by placing the relevant, existing code into callable functions that are more scalable. The scalability must derive from the pulumi stack model.

Modify git workflows to trigger upon action within specific folders/files

We need our git CI workflows to trigger when actions occur (e.g - push, pr) over specific files/folders (e.g infra-aws) rather than when these actions occur over the entire repository. This will ensure that in our mono-repo implementation, the git workflows will only trigger when necessary.
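GitHub Actions supports a `paths` filter on `push` and `pull_request` triggers for this purpose. The decision it makes can be illustrated in Python; this is only a sketch of the rule, not how the workflow itself would be written, and the function name `should_trigger` is hypothetical.

```python
from pathlib import PurePosixPath
from typing import Iterable


def should_trigger(changed_files: Iterable[str], watched_dirs: Iterable[str]) -> bool:
    """Return True if any changed file lives in (or is) a watched folder."""
    watched = [PurePosixPath(d) for d in watched_dirs]
    for changed in changed_files:
        path = PurePosixPath(changed)
        # Compare whole path components so "infra-aws-old/" does not
        # accidentally match a watched "infra-aws" folder.
        if any(w == path or w in path.parents for w in watched):
            return True
    return False
```

For example, a push touching only `README.md` would not trigger the infra-aws workflow, while a push touching `infra-aws/org.py` would.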

Set up and test aws budgets

To manage our resource usage in the cloud, we would need to have controls in place to avoid needless or accidental overspend. For our ec2 instances, aws provides budgets to fulfil that capability.

We need to set up aws budgets to:

Limit aws spending on ec2 instances to a reasonable daily limit. Initially, this should be limited to a single ec2 instance daily spend.
Limit aws spending on all other services to nil.
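The two limits can be expressed as a simple check over daily spend per service. The sketch below is plain Python and does not call aws; the function name `find_overspend`, the service keys, and the sample figures are hypothetical, and the actual limit value is still to be agreed by the team.

```python
from typing import Dict, List


def find_overspend(daily_spend: Dict[str, float], ec2_daily_limit: float) -> List[str]:
    """List services whose daily spend exceeds the agreed limits.

    ec2 is allowed up to ec2_daily_limit; every other service is
    limited to nil.
    """
    violations = []
    for service, amount in sorted(daily_spend.items()):
        limit = ec2_daily_limit if service == "ec2" else 0.0
        if amount > limit:
            violations.append(f"{service}: spent {amount:.2f}, limit {limit:.2f}")
    return violations
```

With a limit of 2.00, a day with 1.50 spent on ec2 and 0.10 on s3 would report only the s3 spend.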

This should also be discussed amongst the team as to how these controls should be configured moving forward.

Define an aws organization in pulumi

We need to be able to define the asterion organization in pulumi when deploying our infrastructure in aws.

This organization must have access to all features, be easily identifiable, and will eventually define default member accounts and adopt tag policies to control the permissions across all aws services for this organization.

Refer to aws organizations in pulumi for notes on the pulumi-python implementation.

Follow and review aws infrastructure guides

Welcome!

As a good starter, let's "hit three end-points with one api call", so to speak🤣, by doing the following:

Following the instructions on the main page to install our pre-requisites.
Following the instructions on infra-aws to install our pre-requisites.
Amending the instructions above as necessary to ensure a smoother process moving forward.

Let's see if you can get to the end successfully! 😈

Data recovery processes

Using the same rationale as issue #7, we need recovery processes to replace existing data with the extracted data from our backup archive file, and seamlessly integrate this with any of our existing kubernetes workloads on AWS.

This could be a git job that initially needs to be manually run.
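A manually run restore step could start from something like the following. This is a sketch under assumptions: the backup is a tar archive, the restore target is a local directory, and the function name `restore_backup` is hypothetical. Integration with the kubernetes workloads is not covered here.

```python
import tarfile
from pathlib import Path
from typing import List


def restore_backup(archive_path: str, target_dir: str) -> List[str]:
    """Extract a backup archive into target_dir and return the member names."""
    target = Path(target_dir).resolve()
    target.mkdir(parents=True, exist_ok=True)

    with tarfile.open(archive_path, "r:*") as tar:
        members = tar.getmembers()
        for member in members:
            # Refuse archives whose entries would land outside target_dir.
            resolved = (target / member.name).resolve()
            if resolved != target and target not in resolved.parents:
                raise ValueError(f"unsafe path in archive: {member.name}")
        tar.extractall(target)

    return [member.name for member in members]
```

The path check runs before anything is extracted, so a rejected archive leaves the target directory unchanged.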

Self-hosted secrets manager

We need a secrets manager to centrally manage, automate, and standardize authentication with external service providers.

There are multiple solutions out there, but some basic requirements:

Passwords have to be encrypted at rest.
Scalable technically and economically.
Ideally open-source, cloud native.
Access to secrets can be restricted.
Support for changing secrets or secret rotation.
Ability to automatically interact with pulumi, aws, kubernetes, etc.

First steps TBD.

Asterion on-premise infrastructure hardware

As a company, we need to self-host services, and as an initial iteration we're running K3S on Raspberry Pis (arm64) with networking and persistent storage.

To set this up, we'll need to make some hardware purchases. These should include:

Create aws org unit and account between root and infra-aws levels and move iam users

As per this aws document about service control policies, we need to make some adjustments to the structure of the current asterion aws organization to conform with the aws principles of effective permissions.

This involves creating another org unit and account at a new level between root and the infra stacks so that SCPs can be applied at that level. The document linked above makes the point that SCPs cannot be applied at a root level.

This should be prioritized low as this use case would only apply if the volume of asterion users scaled horizontally.

Environment controls in git workflows to destroy aws pulumi stacks

To avoid accidental overuse of resources, we should introduce controls to destroy or keep pulumi stacks active, depending on environment or activity, when a git workflow is triggered.

For instance, if not requiring a stack to be active, such as a stack produced in a dev environment, the stack may need to be destroyed upon workflow completion.

We can implement some type of conditional control within the git workflow to control this at a rudimentary level. We could also create a boolean environment secret to indicate to our workflow when to destroy stacks or when to keep them alive.
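The conditional control could be as small as the function below, called at the end of a workflow run. It is a sketch only: the secret name `KEEP_STACK_ALIVE`, the function name `should_destroy_stack`, and the default of destroying only the dev stack are all assumptions for illustration.

```python
import os
from typing import Mapping, Optional


def should_destroy_stack(stack: str, env: Optional[Mapping[str, str]] = None) -> bool:
    """Decide whether a workflow should destroy the stack when it finishes."""
    env = os.environ if env is None else env
    flag = env.get("KEEP_STACK_ALIVE", "").strip().lower()

    # An explicit boolean flag always wins.
    if flag in ("1", "true", "yes"):
        return False
    if flag in ("0", "false", "no"):
        return True

    # No flag set: only tear down dev stacks by default.
    return stack == "dev"
```

The workflow would expose the environment secret to the job and run the destroy step only when this returns `True`.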

Deploy nginx ingress controller to k3s cluster

The raspberry pi infrastructure requires http/s traffic incoming to the existing k3s cluster to be routed through nginx to our existing applications (wordpress, mariadb).

The intention is to use load-balancing principles to mitigate future bottlenecks and improve performance when heavy traffic accesses the interfaces/end-points of our applications.

Define aws accounts and aws iam users/roles/policies in pulumi

This issue depends on the completion of #12 and #15.

The email accounts created in #12 will also require associated members set up in aws iam and then applied to all required organizations that will be housed within the solution of #15. You can consult the pulumi documentation to determine how to implement this in pulumi.