
datagov-brokerpak-eks's People

Contributors

adborden, bengerman13, fuhuxia, mogul, nickumia-reisys, srinirei


datagov-brokerpak-eks's Issues

Security Policy violation Repository Administrators

This issue was automatically created by Allstar.

Security Policy Violation
Users are not allowed to be administrators of this repository.
Instead a team should be added as administrator.

To add a team as administrator: from the main page of the repository, go to Settings -> Manage Access.
(For more information, see https://docs.github.com/en/organizations/managing-access-to-your-organizations-repositories)


Issue created by GSA-TTS Allstar

This issue will auto resolve when the policy is in compliance.

Issue created by Allstar. See https://github.com/ossf/allstar/ for more information. For questions specific to the repository, please contact the owner or maintainer.

Migrate from PodSecurityPolicy (PSP) to Pod Security Standards (PSS)

Key points:

  • Kubernetes version 1.21 to 1.25
  • PodSecurityPolicy (PSP) to built-in Kubernetes Pod Security Standards (PSS)

This might be a no-op, but I'm creating an issue since I haven't worked with EKS recently enough to remember the setup. The following is an email from AWS (a sketch of the namespace-level PSS labels follows the email below):

  • What is changing?
    PodSecurityPolicy (PSP) was deprecated [1] in Kubernetes version 1.21 and has been removed in Kubernetes version 1.25 [2]. If you are using PSPs in your cluster, then you must migrate from PSP to the built-in Kubernetes Pod Security Standards (PSS) or to a policy as code solution before upgrading your cluster to version 1.25 to avoid interruption to your workloads.
  • What actions can customers take?
    PSP resources were used to specify a set of requirements that pods had to meet before they could be created. Since PSPs have been removed in Kubernetes version 1.25, you must replace those security controls. Two solutions can fill this need:
  1. Kubernetes Pod Security Standards (PSS)
  2. Policy-as-code solutions from the Kubernetes ecosystem

In response to the PSP deprecation and the ongoing need to control pod security out-of-the-box, the Kubernetes community created a built-in solution with PSS [3] and Pod Security Admission (PSA) [4]. The PSA webhook implements the controls defined in the PSS. To review best practices for migrating PSPs to the built-in Pod Security Standards, see references [5] and [6].

Policy-as-code solutions provide guardrails to guide cluster users, and prevent unwanted behaviors, through prescribed and automated controls. Policy-as-code solutions typically use Kubernetes Dynamic Admission Controllers to intercept the Kubernetes API server request flow, via a webhook call, and mutate and validate request payloads, based on policies written and stored as code. There are several open source policy-as-code solutions available for Kubernetes. To review best practices for migrating PSPs to a policy-as-code solution, see reference [7].

You can run the following command to view the PSPs in your cluster: kubectl get psp. If you see the eks.privileged PSP in your cluster, it will be automatically migrated to PSS by Amazon EKS. No action is needed on your part.

To summarize, if you are using PSP in your cluster, then you must migrate from PSP to the built-in Kubernetes PSS or to a policy as code solution before upgrading your cluster to version 1.25 to avoid interruptions to your workloads. EKS offers best practices for pod security and guidance for implementing pod security standards [8]. You can find details on PSP Migration in EKS documentation [1].

If you have any questions or concerns, please reach out to AWS Support [9].

[1] https://docs.aws.amazon.com/eks/latest/userguide/pod-security-policy-removal-faq.html
[2] https://docs.aws.amazon.com/eks/latest/userguide/kubernetes-versions.html#kubernetes-release-calendar
[3] https://kubernetes.io/docs/concepts/security/pod-security-standards/
[4] https://kubernetes.io/docs/concepts/security/pod-security-admission/
[5] https://aws.github.io/aws-eks-best-practices/security/docs/pods/#pod-security-standards-pss-and-pod-security-admission-psa
[6] https://kubernetes.io/docs/tasks/configure-pod-container/migrate-from-psp/
[7] https://aws.github.io/aws-eks-best-practices/security/docs/pods/#policy-as-code-pac
[8] https://aws.amazon.com/blogs/containers/implementing-pod-security-standards-in-amazon-eks/
[9] https://aws.amazon.com/support

Sincerely,
Amazon Web Services
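
If the brokerpak does need to opt its namespaces into PSS explicitly, a minimal sketch of the namespace-level Pod Security Admission labels follows, assuming the namespaces are (or could be) managed through the Terraform kubernetes provider; the namespace name and chosen levels are illustrative, not decided:

```hcl
# Hypothetical sketch: enforce the "baseline" Pod Security Standard on a
# namespace via Pod Security Admission labels, while warning/auditing against
# "restricted". Adjust the namespace and levels to whatever we actually provision.
resource "kubernetes_namespace" "workloads" {
  metadata {
    name = "workloads" # placeholder namespace name

    labels = {
      "pod-security.kubernetes.io/enforce" = "baseline"
      "pod-security.kubernetes.io/warn"    = "restricted"
      "pod-security.kubernetes.io/audit"   = "restricted"
    }
  }
}
```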

Limit EKS log retention to 180 days

User Story

In order to avoid ballooning storage requirements for our logs, the data.gov team wants to cap EKS instance log retention at 180 days.

Acceptance Criteria

[ACs should be clearly demoable/verifiable whenever possible. Try specifying them using BDD.]

  • GIVEN [a contextual precondition]
    [AND optionally another precondition]
    WHEN [a triggering event] happens
    THEN [a verifiable outcome]
    [AND optionally another verifiable outcome]

Background

[Any helpful contextual notes or links to artifacts/evidence, if needed]

Security Considerations (required)

[Any security concerns that might be implicated in the change. "None" is OK, just be explicit here!]

Sketch

[Notes or a checklist reflecting our understanding of the selected approach]
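
A possible sketch, assuming the brokerpak creates (or imports) the CloudWatch log group that EKS control-plane logging writes to, so retention can be set explicitly; the variable name is illustrative:

```hcl
variable "cluster_name" { type = string }

# Sketch: cap retention of the EKS control-plane log group at 180 days.
# EKS writes control-plane logs to /aws/eks/<cluster-name>/cluster; if the
# group already exists it would need to be imported rather than created here.
resource "aws_cloudwatch_log_group" "eks_control_plane" {
  name              = "/aws/eks/${var.cluster_name}/cluster"
  retention_in_days = 180
}
```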

Route logs for EKS control plane and pods into Cloudtrail

User Story

In order to aggregate, analyze, and alert on logs from EKS instances, we want to configure EKS instances to send control plane and pod logs to Cloudtrail when we provision them.

Acceptance Criteria

[ACs should be clearly demoable/verifiable whenever possible. Try specifying them using BDD.]

  • GIVEN I have provisioned an instance of the EKS service
    AND I am authenticated with the AWS Console for the SSB account
    WHEN I look at Cloudtrail
    THEN I see logs corresponding to the EKS instance control plane
    AND I see logs corresponding to the EKS data plane (workloads in Fargate)

Background

Necessary for meeting compliance controls. See the AU family of controls in particular.

Security Considerations (required)

Once this story is complete, we will be able to demonstrate full visibility of logging from provisioned EKS clusters via Cloudtrail. This meets some of the NIST compliance requirements for auditing.

Sketch

Here's how to do it for EKS and how to do it for pods. There are Terraform docs for this too.
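
A rough sketch of both halves in Terraform, assuming the brokerpak's existing cluster resource and kubernetes provider; the Fargate log-router ConfigMap follows the standard aws-observability mechanism, and the output settings shown are illustrative:

```hcl
# Control-plane side: on the brokerpak's existing aws_eks_cluster resource, set
#   enabled_cluster_log_types = ["api", "audit", "authenticator",
#                                "controllerManager", "scheduler"]
# so API server, audit, authenticator, controller manager, and scheduler logs
# flow to CloudWatch Logs.

variable "cluster_name" { type = string }
variable "region"       { type = string }

# Data-plane side: Fargate pods log through the built-in Fluent Bit log router,
# configured by an aws-logging ConfigMap in the aws-observability namespace.
resource "kubernetes_namespace" "aws_observability" {
  metadata {
    name   = "aws-observability"
    labels = { "aws-observability" = "enabled" }
  }
}

resource "kubernetes_config_map" "aws_logging" {
  metadata {
    name      = "aws-logging"
    namespace = kubernetes_namespace.aws_observability.metadata[0].name
  }

  data = {
    "output.conf" = <<-EOT
      [OUTPUT]
          Name cloudwatch_logs
          Match *
          region ${var.region}
          log_group_name /aws/eks/${var.cluster_name}/pods
          auto_create_group true
    EOT
  }
}
```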

Security Policy violation SECURITY.md

This issue was automatically created by Allstar.

Security Policy Violation
Security policy not enabled.
A SECURITY.md file can give users information about what constitutes a vulnerability and how to report one securely so that information about a bug is not publicly visible. Examples of secure reporting methods include using an issue tracker with private issue support, or encrypted email with a published key.

To fix this, add a SECURITY.md file that explains how to handle vulnerabilities found in your repository. Go to https://github.com/GSA-TTS/datagov-brokerpak-eks/security/policy to enable.

For more information, see https://docs.github.com/en/code-security/getting-started/adding-a-security-policy-to-your-repository.


Issue created by GSA-TTS Allstar

This issue will auto resolve when the policy is in compliance.

Issue created by Allstar. See https://github.com/ossf/allstar/ for more information. For questions specific to the repository, please contact the owner or maintainer.

Verify provisioned clusters pass the EKS CIS benchmark

User Story

In order to give auditors confidence that provisioned EKS clusters are following best-practices, we should be able to demonstrate that a provisioned cluster can pass the CIS EKS benchmark.

Acceptance Criteria

  • GIVEN I have installed the tree kubectl plugin
    WHEN I use kubectl tree on pods and nodes
    THEN I see that resources containing scanning results are present
  • WHEN I run kubectl get CISKubeBenchReport <nodename> -o wide
    THEN I see a report indicating no tests failed
  • WHEN I run kubectl get CISKubeBenchReport <nodename> -o yaml
    THEN I see a detailed report
  • WHEN I run make clean build up demo-up demo-test
    THEN I see that there is a test for a CISKubeBenchReport with zero FAIL results

Background

[Any helpful contextual notes or links to artifacts/evidence, if needed]

See also this GSA ISE hardening guide for EKS

Security Considerations (required)

This change will ensure that any new deployment of the eks-brokerpak will only deploy CIS-compliant instances of AWS EKS. This will bolster confidence in the configuration of the EKS instances we create.

Sketch

  1. Install the Aquasec starboard-operator
  2. Add lines at the end of the tests that check that the AWS EKS CIS benchmark had zero FAIL results
  3. Document how someone can check these reports on any existing instance

Note that AWS Security Hub can ingest kube-bench results. We may want to set this up if it turns out that we need to continuously report on existing instances, but it's probably out of scope for this story. Let's wait to see if it's required, and write that separate story when it's time.
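
A sketch of step 1 using the Terraform helm provider; the repository URL and chart name are from memory of the Aqua Security charts and should be verified against their current layout:

```hcl
# Sketch: install the Starboard operator, which runs kube-bench against each
# node and publishes CISKubeBenchReport resources that the demo tests can
# then query for FAIL counts.
resource "helm_release" "starboard_operator" {
  name             = "starboard-operator"
  repository       = "https://aquasecurity.github.io/helm-charts/"
  chart            = "starboard-operator"
  namespace        = "starboard-system"
  create_namespace = true
}
```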

Optimize ingress to use just one ALB per cluster

User Story

In order to reduce the cost of operating EKS cluster instances, and reduce dependence on a single k8s provider, the team would like to provision just a single ALB per AWS EKS cluster, rather than one per individual ingress.

Acceptance Criteria

[ACs should be clearly demoable/verifiable whenever possible. Try specifying them using BDD.]

  • GIVEN I have provisioned an EKS instance
    AND I have deployed two ingresses
    AND I am authenticated with the AWS console
    WHEN I look at the AWS EC2 "Load Balancer" list
    THEN I see just one LB associated with the EKS cluster

Background

People using cloud.gov to provision and bind the k8s service should, as much as possible, not need to care about or refer to the underlying implementation when they use the service. This leaves the provider of the k8s service the flexibility to use a different implementation (e.g. GCP or Azure instead of AWS) without customers of the service blocking the migration.

Normally, using AWS EKS requires customers to know about and use the AWS-specific ingress annotations in order to make their deployments accessible to the outside world. By using a second-level ingress controller based on the widely used and well-documented nginx-ingress, we can offer a cross-provider way of specifying ingress, using labels and tags that will not need to change.

Using a secondary controller has the added benefit of requiring just a single AWS Load Balancer instance for all the workloads in the cluster, no matter how many. This in turn cuts down on the government's cost to run the service.

Security Considerations (required)

The secondary nginx-ingress controller is only accessible from within Fargate, and for traffic to reach that service, it must traverse the AWS ALB first. So no additional network exposure is implied here.

Because customers of the service cannot specify tags or labels that the ALB controller will act on, there is no way for customers to introduce a separate ingress to the cluster.

Sketch

There is precedent for setting up this architecture with AWS EKS and Fargate.
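
To make the shape concrete, here is a sketch using the helm and kubernetes Terraform providers; chart values, names, and annotations are illustrative rather than the final configuration:

```hcl
# Sketch: run ingress-nginx behind a single ALB. The nginx controller is
# exposed as a ClusterIP service, and exactly one Ingress carrying the ALB
# annotations points at it; workload Ingresses then use the "nginx" class
# and never trigger creation of additional load balancers.
resource "helm_release" "ingress_nginx" {
  name             = "ingress-nginx"
  repository       = "https://kubernetes.github.io/ingress-nginx"
  chart            = "ingress-nginx"
  namespace        = "ingress-nginx"
  create_namespace = true

  set {
    name  = "controller.service.type"
    value = "ClusterIP"
  }
}

resource "kubernetes_ingress_v1" "alb_to_nginx" {
  metadata {
    name      = "alb-to-nginx" # placeholder name
    namespace = "ingress-nginx"
    annotations = {
      "alb.ingress.kubernetes.io/scheme"      = "internet-facing"
      "alb.ingress.kubernetes.io/target-type" = "ip" # required for Fargate pods
    }
  }

  spec {
    ingress_class_name = "alb"

    default_backend {
      service {
        name = "ingress-nginx-controller"
        port {
          number = 80
        }
      }
    }
  }
}
```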

[Cost Improvements] Consolidate NAT Gateways per AZ

Email from AWS Support:

Hello,

We have observed that your Amazon VPC resources are using a shared NAT Gateway across multiple Availability Zones (AZ). To ensure high availability and minimize inter-AZ data transfer costs, we recommend utilizing separate NAT Gateways in each AZ and routing traffic locally within the same AZ.

Each NAT Gateway operates within a designated AZ and is built with redundancy in that zone only. As a result, if the NAT Gateway or AZ experiences failure, resources utilizing that NAT Gateway in other AZ(s) also get impacted. Additionally, routing traffic from one AZ to a NAT Gateway in a different AZ incurs additional inter-AZ data transfer charges. We recommend choosing a maintenance window for architecture changes in your Amazon VPC.
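
A sketch of the per-AZ arrangement in Terraform; the subnet wiring assumes one public and one private subnet per AZ, keyed by AZ name, which may not match the brokerpak's actual VPC module:

```hcl
# Sketch: one NAT gateway per AZ, with each private subnet's default route
# pointing at the NAT gateway in its own AZ, so egress never crosses AZs.
variable "vpc_id"                   { type = string }
variable "public_subnet_ids_by_az"  { type = map(string) } # e.g. { "us-east-1a" = "subnet-..." }
variable "private_subnet_ids_by_az" { type = map(string) }

resource "aws_eip" "nat" {
  for_each = var.public_subnet_ids_by_az
  domain   = "vpc"
}

resource "aws_nat_gateway" "per_az" {
  for_each      = var.public_subnet_ids_by_az
  subnet_id     = each.value
  allocation_id = aws_eip.nat[each.key].id
}

resource "aws_route_table" "private" {
  for_each = var.private_subnet_ids_by_az
  vpc_id   = var.vpc_id
}

resource "aws_route" "private_default" {
  for_each               = var.private_subnet_ids_by_az
  route_table_id         = aws_route_table.private[each.key].id
  destination_cidr_block = "0.0.0.0/0"
  nat_gateway_id         = aws_nat_gateway.per_az[each.key].id
}

resource "aws_route_table_association" "private" {
  for_each       = var.private_subnet_ids_by_az
  subnet_id      = each.value
  route_table_id = aws_route_table.private[each.key].id
}
```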

Security Policy violation SECURITY.md

This issue was automatically created by Allstar.

Security Policy Violation
Security policy not enabled.
A SECURITY.md file can give users information about what constitutes a vulnerability and how to report one securely so that information about a bug is not publicly visible. Examples of secure reporting methods include using an issue tracker with private issue support, or encrypted email with a published key.

To fix this, add a SECURITY.md file that explains how to handle vulnerabilities found in your repository. Go to https://github.com/GSA-TTS/datagov-brokerpak-eks/security/policy to enable.

For more information, see https://docs.github.com/en/code-security/getting-started/adding-a-security-policy-to-your-repository.


This issue will auto resolve when the policy is in compliance.

Issue created by Allstar. See https://github.com/ossf/allstar/ for more information. For questions specific to the repository, please contact the owner or maintainer.

2048.yml vulnerabilities

Date of report: 12/06/2022
Severity: Moderate and Low (not active in production)

Due dates are based on severity as described in RA-5: 15 days for Critical, 30 days for High, and 90 days for Moderate and lower. A consolidated sketch of the remediations follows the findings below.

  • Container is running without root user control (Moderate)
    • Detailed paths
    • This issue is...
      • Container is running without root user control
    • The impact of this is...
      • Container could be running with full administrative privileges
    • You can resolve it by...
      • Set securityContext.runAsNonRoot to true
  • Container does not drop all default capabilities (Moderate)
    • Detailed paths
      • Introduced through: [DocId: 0] › input › spec › template › spec › containers[app-2048] › securityContext › capabilities › drop
    • This issue is...
      • All default capabilities are not explicitly dropped
    • The impact of this is...
      • Containers are running with potentially unnecessary privileges
    • You can resolve it by...
      • Add ALL to securityContext.capabilities.drop list, and add only required capabilities in securityContext.capabilities.add
  • Container is running without liveness probe (Low)
    • Detailed paths
      • Introduced through: [DocId: 0] › spec › template › spec › containers[app-2048] › livenessProbe
    • This issue is...
      • Liveness probe is not defined
    • The impact of this is...
      • Kubernetes will not be able to detect if application is able to service requests, and will not restart unhealthy pods
    • You can resolve it by...
      • Add livenessProbe attribute
  • Container is running with writable root filesystem (Low)
    • Detailed paths
      • Introduced through: [DocId: 0] › input › spec › template › spec › containers[app-2048] › securityContext › readOnlyRootFilesystem
    • This issue is...
      • readOnlyRootFilesystem attribute is not set to true
    • The impact of this is...
      • Compromised process could abuse writable root filesystem to elevate privileges
    • You can resolve it by...
      • Set securityContext.readOnlyRootFilesystem to true
  • Container has no CPU limit (Low)
    • Detailed paths
      • Introduced through: [DocId: 0] › input › spec › template › spec › containers[app-2048] › resources › limits › cpu
    • This issue is...
      • Container has no CPU limit
    • The impact of this is...
      • Without a CPU limit, a container can consume compute time with no benefit (e.g. inefficient code), which might lead to unnecessary costs. It is advisable to also configure CPU requests to ensure application stability.
    • You can resolve it by...
      • Add resources.limits.cpu field with required CPU limit value
  • Container is running without memory limit (Low)
    • Detailed paths
      • Introduced through: [DocId: 0] › input › spec › template › spec › containers[app-2048] › resources › limits › memory
    • This issue is...
      • Memory limit is not defined
    • The impact of this is...
      • Containers without memory limits are more likely to be terminated when the node runs out of memory
    • You can resolve it by...
      • Set resources.limits.memory value
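
The fixture itself is plain Kubernetes YAML, but to consolidate the remediations in one place, here is the hardened container shape expressed through the Terraform kubernetes provider; the image, probe, and limit values are illustrative (and the image must actually support running as non-root), so only the highlighted fields are the point:

```hcl
# Sketch: all of the findings above addressed on the 2048 container. In the
# YAML fixture the same fields are securityContext.runAsNonRoot,
# securityContext.capabilities.drop, securityContext.readOnlyRootFilesystem,
# livenessProbe, and resources.limits.
resource "kubernetes_deployment_v1" "app_2048" {
  metadata {
    name      = "deployment-2048" # placeholder
    namespace = "default"
  }

  spec {
    replicas = 1

    selector {
      match_labels = { "app.kubernetes.io/name" = "app-2048" }
    }

    template {
      metadata {
        labels = { "app.kubernetes.io/name" = "app-2048" }
      }

      spec {
        container {
          name  = "app-2048"
          image = "public.ecr.aws/l6m2t8p7/docker-2048:latest" # illustrative image

          security_context {
            run_as_non_root           = true  # fix: root user control
            read_only_root_filesystem = true  # fix: writable root filesystem
            capabilities {
              drop = ["ALL"]                  # fix: default capabilities not dropped
            }
          }

          liveness_probe {                    # fix: missing liveness probe
            http_get {
              path = "/"
              port = 80
            }
          }

          resources {                         # fix: missing CPU/memory limits
            limits = {
              cpu    = "100m"
              memory = "128Mi"
            }
          }
        }
      }
    }
  }
}
```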

Funnel all app ingress through TLS

User Story

In order to ensure security from the outside world to our brokered cluster, we want to provision TLS certificates with ACM and have the ingress ALB configured to use them.

Acceptance Criteria

[ACs should be clearly demoable/verifiable whenever possible. Try specifying them using BDD.]

  • GIVEN I have provisioned an EKS instance
    AND I have deployed a sample workload (eg the 2048 game)
    WHEN I visit the URL listed in the kubernetes ingress for the sample workload
    THEN I see that I am redirected from http:// to https://
    AND I see that there is a valid certificate in place for the TLS connection.

Background

Federal compliance requires that we use TLS for any connection over the internet.

Security Considerations (required)

Implementing this story helps us comply with the SC family of NIST controls.

Sketch

Here are the docs on setting up cert auto-discovery and redirecting HTTP to HTTPS.
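
A sketch of the moving parts, assuming the AWS Load Balancer Controller fronts the cluster and the ingress domain already lives in Route53; the domain is a placeholder and the annotations would be merged into the single ALB-facing Ingress:

```hcl
variable "domain_name" { type = string } # e.g. "*.k8s.example.gov" (placeholder)

# Sketch: request a DNS-validated ACM certificate for the ingress domain.
resource "aws_acm_certificate" "ingress" {
  domain_name       = var.domain_name
  validation_method = "DNS"

  lifecycle {
    create_before_destroy = true
  }
}

# ALB annotations: listen on 80 and 443, redirect HTTP to HTTPS, and use the
# ACM certificate. If the Ingress host matches the certificate's domain, the
# controller can also auto-discover the certificate instead of taking the ARN.
locals {
  alb_tls_annotations = {
    "alb.ingress.kubernetes.io/listen-ports"    = jsonencode([{ HTTP = 80 }, { HTTPS = 443 }])
    "alb.ingress.kubernetes.io/ssl-redirect"    = "443"
    "alb.ingress.kubernetes.io/certificate-arn" = aws_acm_certificate.ingress.arn
  }
}
```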

Create a DNS entry in Route53 for each ingress

User Story

In order to make deployments addressable by the outside world, the EKS brokerpak should manage DNS entries in Route53 pointing to each ingress.

Acceptance Criteria

[ACs should be clearly demoable/verifiable whenever possible. Try specifying them using BDD.]

  • GIVEN a provisioned EKS service instance
    AND a valid kubeconfig.yml for using the service instance
    AND the domain_name for the service instance
    AND I run kubectl --kubeconfig kubeconfig.yml apply -f terraform/provision/2048_fixture.yml
    AND I wait two minutes
    WHEN I visit https://ingress-2048.<k8sdomain>
    THEN I see the 2048 game.

Background

[Any helpful contextual notes or links to artifacts/evidence, if needed]

Security Considerations (required)

  • We must limit the ServiceAccount that external-dns uses to a role that can only manage records in the Route53 zone that corresponds to the specific cluster, and only for the domains expected of that cluster.
  • We also need to ensure these endpoints end up in a list exported for scanning by NetSparker (by dumping the zone).

Sketch

[Notes or a checklist reflecting our understanding of the selected approach]
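
A sketch of the scoping described under Security Considerations: the IAM policy for the external-dns ServiceAccount's role allows record changes only in this cluster's hosted zone (the IRSA/role wiring itself is omitted):

```hcl
variable "hosted_zone_id" { type = string }

# Sketch: least-privilege policy for external-dns. The list calls are
# read-only and broad by necessity, but ChangeResourceRecordSets is limited
# to the one zone that belongs to this cluster.
data "aws_iam_policy_document" "external_dns" {
  statement {
    actions   = ["route53:ChangeResourceRecordSets"]
    resources = ["arn:aws:route53:::hostedzone/${var.hosted_zone_id}"]
  }

  statement {
    actions = [
      "route53:ListHostedZones",
      "route53:ListResourceRecordSets",
    ]
    resources = ["*"]
  }
}

resource "aws_iam_policy" "external_dns" {
  name   = "external-dns-${var.hosted_zone_id}" # illustrative name
  policy = data.aws_iam_policy_document.external_dns.json
}
```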

Ensure all inter-pod traffic uses TLS

User Story

In order to have TLS on every network hop between the outside world and individual pods, we want EKS clusters configured to use AWS App Mesh and cert-manager.

Acceptance Criteria

  • GIVEN I have provisioned an EKS instance
    AND I have deployed the 2048 fixture
    AND I have accessed the 2048 application using my browser
    WHEN I run kubectl -n default exec -it ${2048_POD_NAME} -c envoy -- curl -s localhost:9901/stats | grep ssl.handshake
    THEN I see a non-zero count of ssl_handshake entries between the 2048 pod and the nginx-ingress pod.

Background

[Any helpful contextual notes or links to artifacts/evidence, if needed]

Security Considerations (required)

This work will help us meet our compliance requirements. See section 10.9.6.

Sketch

For this story, we only need to work up through step 4.1 of the referenced blog post... That is, we want to demonstrate mTLS between the nginx-ingress pod and the 2048 pod.

We can work up through step 5 (TLS between the ALB controller and nginx-ingress controller) in a separate/future story.

We're now considering 4 options going forward:

  1. Remove nginx-ingress to get as close to the AWS-supported configuration as possible (adds ALB costs)
  2. Try the new solr-operator support for inter-node TLS (solves for Solr, further work needed in future for other k8s services)
  3. Try the AWS+Kong documented method that uses Kong as the ingress controller (keeps single ALB)
  4. Keep trying to debug existing path

See also https://docs.aws.amazon.com/app-mesh/latest/userguide/getting-started-kubernetes.html
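
For orientation only, a sketch of where strict TLS is declared on the mesh side if we stay on the App Mesh path; the cert-manager/SDS certificate wiring and Envoy sidecar injection from the referenced post are not shown, and the names and ACM-based certificate are placeholders:

```hcl
variable "certificate_arn" { type = string } # placeholder; cert-manager would supply certs differently

# Sketch: a mesh plus a virtual node whose listener requires TLS, i.e. every
# inbound connection to the 2048 pod's Envoy must complete a TLS handshake.
resource "aws_appmesh_mesh" "main" {
  name = "eks-brokerpak-mesh" # placeholder name
}

resource "aws_appmesh_virtual_node" "app_2048" {
  name      = "app-2048"
  mesh_name = aws_appmesh_mesh.main.name

  spec {
    listener {
      port_mapping {
        port     = 80
        protocol = "http"
      }

      tls {
        mode = "STRICT" # refuse plaintext connections

        certificate {
          acm {
            certificate_arn = var.certificate_arn
          }
        }
      }
    }

    service_discovery {
      dns {
        hostname = "app-2048.default.svc.cluster.local"
      }
    }
  }
}
```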

Security Policy violation Repository Administrators

This issue was automatically created by Allstar.

Security Policy Violation
Users are not allowed to be administrators of this repository.
Instead a team should be added as administrator.

To add a team as administrator: from the main page of the repository, go to Settings -> Manage Access.
(For more information, see https://docs.github.com/en/organizations/managing-access-to-your-organizations-repositories)


This issue will auto resolve when the policy is in compliance.

Issue created by Allstar. See https://github.com/ossf/allstar/ for more information. For questions specific to the repository, please contact the owner or maintainer.
