
infrastructure's Introduction

AWS CDK repo for Finch

Prerequisites

Deployment Steps

Step 1: Clone the infrastructure repo and open a terminal.
Step 2: Before deploying, check whether the key pair runner-key exists in your AWS EC2 console. If it does not, create an SSH key pair for connecting to the EC2 instances: go to AWS console > EC2, in the left tab “Network & Security” > “Key Pairs”, click “Create key pair”, and name it runner-key (the downloaded private key file is runner-key.pem).
Or create the key pair with the AWS CLI command below.
aws ec2 create-key-pair --key-name runner-key --query 'KeyMaterial' --output text > runner-key.pem
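OpenSSH refuses to use a private key file that other users can read, so runner-key.pem usually needs its permissions tightened before you use it with ssh. Running `chmod 400 runner-key.pem` does this. The small helper below is a hypothetical equivalent (the name `secure_key_file` is illustrative, not part of this repo):

```python
import os
import stat


def secure_key_file(path: str) -> int:
    """Restrict a private key file to owner read-only (same as `chmod 400`).

    Returns the resulting permission bits so callers can verify the change.
    """
    os.chmod(path, stat.S_IRUSR)  # 0o400: owner may read, nobody else has access
    return stat.S_IMODE(os.stat(path).st_mode)
```

Usage: `secure_key_file("runner-key.pem")` should return `0o400` on macOS or Linux.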

Step 3: For the first deployment to an environment, bootstrap the accounts:
  • Bootstrap the pipeline account: cdk bootstrap aws://PIPELINE-ACCOUNT-NUMBER/REGION
  • Bootstrap each beta/prod account so that it trusts the pipeline account: cdk bootstrap --cloudformation-execution-policies 'arn:aws:iam::aws:policy/AdministratorAccess' --trust PIPELINE-ACCOUNT-NUMBER aws://STAGE-ACCOUNT-NUMBER/REGION
Then run cdk deploy with the pipeline account credentials to deploy the pipeline stack. From then on, the pipeline deploys all the application stacks on every commit.
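When several stage accounts need bootstrapping, it can help to generate the commands from one list of account numbers. The sketch below only builds the argv lists from Step 3. The account numbers and region in the example are hypothetical placeholders, and the function name is illustrative:

```python
from typing import List

ADMIN_POLICY = "arn:aws:iam::aws:policy/AdministratorAccess"


def bootstrap_commands(pipeline_account: str, stage_accounts: List[str],
                       region: str) -> List[List[str]]:
    """Build the `cdk bootstrap` argv lists for the pipeline and stage accounts."""
    # The pipeline account is bootstrapped without extra options.
    commands = [["cdk", "bootstrap", f"aws://{pipeline_account}/{region}"]]
    for account in stage_accounts:
        # Each beta/prod account trusts the pipeline account to deploy into it.
        commands.append([
            "cdk", "bootstrap",
            "--cloudformation-execution-policies", ADMIN_POLICY,
            "--trust", pipeline_account,
            f"aws://{account}/{region}",
        ])
    return commands


# Hypothetical account numbers, for illustration only.
for argv in bootstrap_commands("111111111111", ["222222222222"], "us-west-2"):
    print(" ".join(argv))
```

Each argv list can be passed to `subprocess.run(argv, check=True)` with the matching account's credentials active.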

Self-hosted Runners (Mac Arm64 and Mac Amd64)

The stack ASGRunnerStack provisions EC2 Mac instances through an Auto Scaling group. The runner configurations can be edited in config/runner-config.json.
When a runner instance is initialized, a user data script (scripts/setup-runner.sh) runs to set up the instance. The script downloads and installs the GitHub Actions runner application, which connects our self-hosted runner to GitHub Actions. It then registers the instance with our GitHub repos and starts the runner service.
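The registration part of that flow can be sketched as the commands run from the extracted runner directory. This sketch assumes the standard layout of the actions/runner release package (`config.sh` and `svc.sh`). The repository URL, token, runner name, and labels are placeholders, and scripts/setup-runner.sh may pass different options:

```python
from typing import List, Optional


def runner_setup_commands(repo_url: str, registration_token: str, name: str,
                          labels: Optional[List[str]] = None) -> List[List[str]]:
    """Return, in order, the commands that register and start a self-hosted runner.

    Hypothetical helper: it only builds argv lists and runs nothing itself.
    """
    configure = ["./config.sh", "--unattended", "--url", repo_url,
                 "--token", registration_token, "--name", name]
    if labels:
        configure += ["--labels", ",".join(labels)]
    return [
        configure,                # register this host with the repository
        ["./svc.sh", "install"],  # install the runner as a service
        ["./svc.sh", "start"],    # start the service so it picks up jobs
    ]
```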

Connect to the runners for troubleshooting

After a runner is linked to the GitHub repo, you can connect to it for troubleshooting:
  • Note down the name of the runner (usually ip172-31-xx-xxx).
  • Sign in to the AWS account that hosts the instance and find the instance whose private IP address matches that name.
  • Connect either over SSH with the key saved in AWS Secrets Manager, or with Session Manager: in the AWS EC2 console, select the instance and click Connect > Session Manager.
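The name-to-IP lookup described above can be scripted. The sketch below assumes the runner name is the EC2 hostname with the dots of the private IP replaced by hyphens, and it accepts both `ip-172-...` and `ip172-...`. The helper names are hypothetical. The returned argv uses the `private-ip-address` filter of `aws ec2 describe-instances`:

```python
import ipaddress
import re

# Matches "ip-172-31-5-10" or "ip172-31-5-10" (optionally followed by a domain suffix).
_RUNNER_NAME = re.compile(r"^ip-?(\d{1,3})-(\d{1,3})-(\d{1,3})-(\d{1,3})")


def private_ip_from_runner_name(name: str) -> str:
    """Recover the private IPv4 address encoded in an EC2-style runner name."""
    match = _RUNNER_NAME.match(name)
    if match is None:
        raise ValueError(f"not an EC2-style runner name: {name!r}")
    # IPv4Address raises a ValueError subclass if any octet is out of range.
    return str(ipaddress.IPv4Address(".".join(match.groups())))


def describe_instances_argv(private_ip: str) -> list:
    """AWS CLI call that finds the instance with the given private IP."""
    return ["aws", "ec2", "describe-instances",
            "--filters", f"Name=private-ip-address,Values={private_ip}"]
```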

S3 Bucket and CloudFront Distributions

The S3 buckets store project artifacts and dependencies that need to be publicly accessible. To make content delivery faster and more secure, we also put CloudFront in front of the S3 buckets.
The construct CloudfrontCdn creates a new CloudFront distribution in front of an existing S3 bucket and adds an origin access identity (OAI), which allows the distribution to read the content of the bucket. Users can then access the bucket objects through the CloudFront domain instead of the S3 bucket URL and benefit from CloudFront features such as caching.
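The access an OAI provides can be pictured as a bucket policy statement like the one below. This is a sketch: the bucket name and OAI ID are hypothetical, and CloudfrontCdn may grant the equivalent permission through CDK rather than emitting exactly this JSON:

```python
import json


def oai_read_policy(bucket_name: str, oai_id: str) -> dict:
    """Bucket policy letting one CloudFront origin access identity read every object."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "AllowCloudFrontOAIRead",
            "Effect": "Allow",
            # Principal format for an OAI, identified by its ID.
            "Principal": {
                "AWS": f"arn:aws:iam::cloudfront:user/CloudFront Origin Access Identity {oai_id}"
            },
            "Action": "s3:GetObject",
            "Resource": f"arn:aws:s3:::{bucket_name}/*",
        }],
    }


print(json.dumps(oai_read_policy("example-artifacts-bucket", "E2EXAMPLE"), indent=2))
```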

  • Get the distribution domain from the AWS console.
  • In your browser, enter the CloudFront URL followed by the path to a file to download it. For example, *.cloudfront.net/path/to/file.
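The steps above can be sketched as a URL builder. The function name and the example domain `d111111abcdef8.cloudfront.net` are placeholders. Keys containing spaces or other unsafe characters are percent-encoded, while "/" is kept so that key prefixes stay path segments:

```python
from urllib.parse import quote


def cloudfront_url(distribution_domain: str, object_key: str) -> str:
    """Join a CloudFront distribution domain and an S3 object key into a download URL."""
    path = quote(object_key.lstrip("/"), safe="/")
    return f"https://{distribution_domain}/{path}"
```

For example, `cloudfront_url("d111111abcdef8.cloudfront.net", "path/to/file")` gives `https://d111111abcdef8.cloudfront.net/path/to/file`.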

Unit and Integration Tests

The unit tests and integration tests are both executed as post-deployment steps in the pipeline's Beta stage. You can also run them with the commands below.

npm run test
npm run integration

Format your code with the command npm run prettier-format.

Host Licensing

When using an Auto Scaling group with macOS instances on EC2, a host license has to be created in AWS License Manager. Create a self-managed license named MacHostLicense, set the license type to sockets, and save its ARN in config/runner-config.json.

Access Tokens

Three access tokens are required in total. One lets the pipeline access the runfinch/infrastructure code and pick up code updates. The other two, one each for runfinch/finch and runfinch/finch-core, provide the GitHub runner keys used to create and register the runners automatically.
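For the per-repo tokens, GitHub's REST API can exchange an access token for a short-lived runner registration token. The sketch below assumes the repo-level endpoint `POST /repos/{owner}/{repo}/actions/runners/registration-token`. Whether this infrastructure calls that exact endpoint is an assumption, and the helper names are hypothetical:

```python
import json
import urllib.request

GITHUB_API = "https://api.github.com"


def registration_token_request(owner: str, repo: str, access_token: str) -> urllib.request.Request:
    """Build the REST call that returns a runner registration token for one repo."""
    return urllib.request.Request(
        f"{GITHUB_API}/repos/{owner}/{repo}/actions/runners/registration-token",
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {access_token}",
        },
    )


def fetch_registration_token(owner: str, repo: str, access_token: str) -> str:
    """Send the request; the JSON response carries the token under "token"."""
    with urllib.request.urlopen(registration_token_request(owner, repo, access_token)) as resp:
        return json.load(resp)["token"]
```

Only `fetch_registration_token` touches the network, so `registration_token_request` can be checked offline.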

infrastructure's People

Contributors

austinvazquez, azhouwd, chandrushetty, dependabot[bot], ginglis13, kern--, kevinliaws, monirul, pendo324, sam-berning, vsiravar, weikequ

infrastructure's Issues

Tune dependency automation to be less noisy

My recommendation is to consider the following options to make dependabot updates less churnful:

  1. Apply an update group to the npm package ecosystem. (Ref)
  2. Downgrade the frequency from daily to weekly.

image scanning: monitoring

The Lambda function that sends notifications about Inspector V2 image scan findings can be improved by adding CloudWatch monitoring and alarms that alert on failed and/or missing invocations of the function.

ci: Build action has been consistently failing macos-arm64-build in finch-core

The Build action has been consistently failing for the last month: https://github.com/runfinch/finch-core/actions/workflows/release.yaml

GitHub-hosted runners use passwordless sudo. However, runners provisioned via the Finch infrastructure don't allow passwordless sudo. (EDIT: this is observed consistently on the macOS 12 runner for arm64 https://github.com/runfinch/finch-core/actions/runs/6010202363/job/16301161154)

The last log lines in the step "Make and release deps" before the timeout were:

if [ "Darwin" != "Linux" -a ! -e "/opt/homebrew/bin/nerdctl" ]; then ln -sf nerdctl.lima "/opt/homebrew/bin/nerdctl"; fi
if [ "Darwin" != "Linux" -a ! -e "/opt/homebrew/bin/apptainer" ]; then ln -sf apptainer.lima "/opt/homebrew/bin/apptainer"; fi
sudo may prompt for password to run FileMonitor
Error: The operation was canceled. # <-- timeout, cancelled workflow

This message is coming from https://github.com/runfinch/finch-core/blob/08a4ca2a9285f1dd2fac3bd4701087b1b2fdec87/bin/lima-and-qemu.pl#L46

Still looking to verify, but the smoking gun is that the script hangs on a password prompt.

On my machine (macOS Ventura 13.4, M1 chip):

./bin/lima-and-qemu.pl                                            
ls: /opt/homebrew/bin/limactl: No such file or directory
Missing argument in sprintf at ./bin/lima-and-qemu.pl line 213.
sudo may prompt for password to run FileMonitor
Password:
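One way to make such a job fail fast instead of hanging until the workflow times out is to probe sudo non-interactively before the step that needs it. `sudo -n` exits with an error rather than prompting when a password would be required. The sketch below is a hypothetical pre-flight check. The usual remedy on the runner itself is a NOPASSWD sudoers rule for the runner user:

```python
import shutil
import subprocess


def has_passwordless_sudo(timeout: float = 10.0) -> bool:
    """Return True if `sudo` can run a command without prompting for a password."""
    if shutil.which("sudo") is None:
        return False
    try:
        # -n: never prompt; sudo exits non-zero if a password would be needed.
        result = subprocess.run(["sudo", "-n", "true"], stdin=subprocess.DEVNULL,
                                capture_output=True, timeout=timeout)
    except (subprocess.TimeoutExpired, OSError):
        return False
    return result.returncode == 0


if not has_passwordless_sudo():
    print("passwordless sudo is not available; sudo would prompt for a password")
```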

Add lifecycle policies for S3 buckets, ECR repo

Currently, there are no lifecycle policies for the S3 buckets or ECR repo created by this stack. Ideally we should discuss some reasonable policy and define it for resources in this repository.
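As a starting point for that discussion, the sketch below builds an ECR lifecycle policy and an S3 lifecycle configuration (in the shape boto3's `put_bucket_lifecycle_configuration` expects). All retention numbers are placeholders, not a recommendation. The S3 rule only touches noncurrent versions and incomplete uploads, so current public artifacts would stay downloadable:

```python
import json


def ecr_lifecycle_policy(keep_images: int = 50, untagged_days: int = 14) -> str:
    """ECR lifecycle policy text: expire stale untagged images, cap the total image count."""
    rules = [
        {
            "rulePriority": 1,
            "description": f"Expire untagged images after {untagged_days} days",
            "selection": {
                "tagStatus": "untagged",
                "countType": "sinceImagePushed",
                "countUnit": "days",
                "countNumber": untagged_days,
            },
            "action": {"type": "expire"},
        },
        {
            # A tagStatus "any" rule must have the highest rulePriority (evaluated last).
            "rulePriority": 2,
            "description": f"Keep only the newest {keep_images} images",
            "selection": {
                "tagStatus": "any",
                "countType": "imageCountMoreThan",
                "countNumber": keep_images,
            },
            "action": {"type": "expire"},
        },
    ]
    return json.dumps({"rules": rules}, indent=2)


def s3_lifecycle_configuration(noncurrent_days: int = 30) -> dict:
    """S3 lifecycle rules that clean up old object versions and abandoned uploads."""
    return {
        "Rules": [{
            "ID": "expire-noncurrent-versions",
            "Filter": {"Prefix": ""},  # apply to the whole bucket
            "Status": "Enabled",
            "NoncurrentVersionExpiration": {"NoncurrentDays": noncurrent_days},
            "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7},
        }]
    }
```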

Dedicated hosts should have auto placement

Dedicated hosts should be created with auto placement turned on. This allows hosts to be reused when a deployment fails after the hosts have already been created, since macOS EC2 dedicated hosts cannot be released until their 24-hour minimum allocation period has passed.
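In CloudFormation terms, this means setting `AutoPlacement` to `"on"` on the `AWS::EC2::Host` resource. CDK exposes the same property as `autoPlacement` on `CfnHost`. The sketch below emits a template fragment. The logical ID, availability zone, and `mac2.metal` default are hypothetical choices:

```python
import json


def mac_dedicated_host(logical_id: str, availability_zone: str,
                       instance_type: str = "mac2.metal") -> dict:
    """CloudFormation resource for an EC2 Mac dedicated host with auto placement enabled."""
    return {
        logical_id: {
            "Type": "AWS::EC2::Host",
            "Properties": {
                # "on": the host also accepts untargeted launches of a matching
                # instance type, so a host left over from a failed deploy is reused.
                "AutoPlacement": "on",
                "AvailabilityZone": availability_zone,
                "InstanceType": instance_type,
            },
        }
    }


print(json.dumps({"Resources": mac_dedicated_host("MacHost", "us-west-2a")}, indent=2))
```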

Automatically detect and cut CRs for GHA dependency updates

This repo makes use of dependabot for code dependency updates. However, our infrastructure relies on other software, in particular GitHub Actions runners (https://github.com/actions/runner/releases).

Success criteria: a scheduled GitHub action utilizing the create-pull-request action that will 1/ check https://github.com/actions/runner/releases for a new release of the GitHub actions runner and 2/ cut a PR to update the version and sha for both x86 and ARM.
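The first half of that check could look like the sketch below. It queries the GitHub REST "latest release" endpoint and compares versions numerically, because comparing "2.99.0" and "2.311.0" as strings gives the wrong answer. Where the pinned version lives in this repo is not stated here, so `pinned_version` is an assumed input. `sha256_hex` shows how checksums could be computed from downloaded release archives:

```python
import hashlib
import json
import urllib.request
from typing import Tuple

LATEST_RELEASE_URL = "https://api.github.com/repos/actions/runner/releases/latest"


def parse_version(tag: str) -> Tuple[int, ...]:
    """Turn a release tag like "v2.311.0" into a comparable tuple (2, 311, 0)."""
    return tuple(int(part) for part in tag.lstrip("v").split("."))


def latest_runner_tag() -> str:
    """Ask the GitHub REST API for the newest actions/runner release tag."""
    req = urllib.request.Request(LATEST_RELEASE_URL,
                                 headers={"Accept": "application/vnd.github+json"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)["tag_name"]


def needs_update(pinned_version: str, latest_tag: str) -> bool:
    """True when the pinned runner version is older than the latest release."""
    return parse_version(latest_tag) > parse_version(pinned_version)


def sha256_hex(data: bytes) -> str:
    """SHA-256 checksum of a downloaded release archive, as a hex string."""
    return hashlib.sha256(data).hexdigest()
```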

Changing the config sometimes has unintended consequences to the cloudformation stack

As each ASG is a stack made from the combination of a macOS version, the arch, and the repo, changing any of the attributes to a new combination will result in a new stack being created and the old stack being abandoned. See this PR for example. This PR will orphan the 12.6/arm/finch-core and 12.6/x86/finch-core stacks and create a new stack with 13.2/arm/finch-core and 13.2/x86/finch-core.

The old stacks should be cleaned up somehow when renamed like this.
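A cleanup step could start by diffing the stack names implied by the old and new configs. The naming scheme and the `macos`/`arch`/`repo` keys below are hypothetical stand-ins for whatever config/runner-config.json and ASGRunnerStack actually use:

```python
from typing import Dict, Iterable, List


def stack_name(macos_version: str, arch: str, repo: str) -> str:
    """Hypothetical naming scheme: one ASG stack per (macOS version, arch, repo)."""
    return f"ASGRunnerStack-{macos_version.replace('.', '-')}-{arch}-{repo}"


def orphaned_stacks(old_config: Iterable[Dict[str, str]],
                    new_config: Iterable[Dict[str, str]]) -> List[str]:
    """Stacks whose attribute combination disappeared from the config."""
    def names(config: Iterable[Dict[str, str]]) -> set:
        return {stack_name(c["macos"], c["arch"], c["repo"]) for c in config}
    return sorted(names(old_config) - names(new_config))
```

With the 12.6 to 13.2 change from the PR above, both 12.6 finch-core stacks are reported. Each name could then be passed to `aws cloudformation delete-stack --stack-name`, after confirming that its hosts can be released.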

image scanning lambda: unit test coverage

Right now there are no unit tests for the rootfs image scanning Lambda. While this is non-critical infrastructure and the function is fairly simple, unit test coverage for the function would surely be an enhancement to the overall testing profile of this package.

A solution could define a class like this with a boto3 resource to be easily mocked:

import os

from boto3 import resource


class SnsTopicClass:
    """
    AWS SNS Topic Resource Class
    """

    def __init__(self, sns_topic_resource):
        # The boto3 resource is injected, so tests can pass in a mock instead.
        self.resource = sns_topic_resource["resource"]
        self.topic_arn = sns_topic_resource["topic_arn"]
        self.topic = self.resource.Topic(self.topic_arn)


# [1] Globally scoped resources
# Initialize the resources once per Lambda execution environment by using global scope.
_LAMBDA_SNS_TOPIC_RESOURCE = {
    "resource": resource("sns"),
    "topic_arn": os.environ.get("SNS_ARN", "NONE"),
}
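Because the class receives the boto3 resource as an argument, a unit test can pass it a `MagicMock` and never touch AWS. The sketch repeats the class so that it is self-contained. It adds a hypothetical `notify` helper that calls the SNS `Topic.publish` action. The topic ARN and message text are placeholders:

```python
import unittest
from unittest.mock import MagicMock


class SnsTopicClass:
    """AWS SNS Topic Resource Class (same constructor as above)."""

    def __init__(self, sns_topic_resource):
        self.resource = sns_topic_resource["resource"]
        self.topic_arn = sns_topic_resource["topic_arn"]
        self.topic = self.resource.Topic(self.topic_arn)

    def notify(self, subject, message):
        """Hypothetical helper: publish one finding notification to the topic."""
        return self.topic.publish(Subject=subject, Message=message)


class SnsTopicClassTest(unittest.TestCase):
    def test_publish_goes_to_configured_topic(self):
        fake_resource = MagicMock()  # stands in for boto3's resource("sns")
        arn = "arn:aws:sns:us-west-2:111111111111:example-topic"
        topic = SnsTopicClass({"resource": fake_resource, "topic_arn": arn})

        topic.notify("Inspector finding", "CVE found in rootfs image")

        fake_resource.Topic.assert_called_once_with(arn)
        fake_resource.Topic.return_value.publish.assert_called_once_with(
            Subject="Inspector finding", Message="CVE found in rootfs image")
```

Saved as a test module, this runs with `python -m unittest` without credentials or network access.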

Alternatively, given that this project is mainly TypeScript, the Lambda function could be rewritten in TypeScript to benefit from the testing tooling the project already has. This would avoid maintaining a separate Python virtual environment for testing the function, both locally for developers and in the pipeline defined in this project.
