ministryofjustice / staff-device-dns-dhcp-infrastructure

Staff Device DHCP and DNS Terraform infrastructure

Home Page: https://github.com/ministryofjustice/cloud-operations#dhcp--dns

License: MIT License

HCL 91.47% Makefile 1.84% Shell 6.68%

staff-device-dns-dhcp-infrastructure's Introduction

DNS / DHCP AWS Infrastructure

Introduction

This repository contains the Terraform code to build the AWS infrastructure for the Ministry of Justice's DNS and DHCP platform. The infrastructure is implemented in AWS and applied using AWS CodePipelines specified in the Shared Services management account.

The running applications are defined and run as Docker containers using AWS Fargate.

Related Repositories

This repository defines the system infrastructure only. Specific components and applications are defined in their own logical external repositories.

Other Documentation

Architecture

CI/CD

staff-device-dns-dhcp-infrastructure's Issues

[gitmoji] Reduce noise in Terraform plan by adding lifecycle ignore_changes to desired_count

User Story

As a DevOps Engineer
I want a clean Terraform plan without unnecessary noise
So that the plan is easier to review.

The plan currently always reports the following change:

module.dns.aws_ecs_service.service will be updated in-place

  ~ resource "aws_ecs_service" "service" {
      ~ desired_count = 2 -> 5

This happens because the Terraform code hardcodes the desired count, so any difference between that value and the service's current count shows up as a change. The attribute can be ignored with something like this:

lifecycle {
  ignore_changes = [desired_count]
}

This removes the change from the plan and so reduces the noise.

Value / Purpose

Reduce noise and make the plan easier to review

Useful Contacts

James Green

Additional Information

No response

Definition of Done

Example

  • Documentation has been written / updated
  • A successful plan without the desired count

Collaborator review date expires soon for user emileswarts

Hi there

The user @emileswarts has their access to this repository maintained in code here: https://github.com/ministryofjustice/github-collaborators

The review_after date is due to expire within one month. Please update it via a PR if they still require access.

If you have any questions, please post in #ask-operations-engineering on Slack.

Failure to update the review_date will result in the collaborator being removed from the repository via our automation.

🔓 Allow TCP port 53 against MoJO-DNS

User Story

As an Active Directory Domain Controller
I need to talk on TCP rather than UDP
So that I can do a lookup against MoJO-DNS

Value / Purpose

Only UDP/53 is enabled which is causing name resolution to fail.
Close down old DNS servers and use MoJO-DNS

Useful Contacts

Touhid, Rich, Dom Robinson

Additional Information

There is an email chain in which Rich/Touhid are cc'd
For testing of the change see slack thread here

Definition of Done

  • Another team member has reviewed
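A minimal sketch of what allowing both protocols on port 53 could look like in Terraform. The variable names and the use of a standalone `aws_security_group_rule` are assumptions for illustration; the source does not show how the repository actually defines its DNS security group.

```hcl
# Hypothetical inputs: the real security group and client ranges are not shown in the source.
variable "dns_container_security_group_id" {
  description = "ID of the security group attached to the DNS containers (assumed name)"
  type        = string
}

variable "dns_client_cidr_blocks" {
  description = "Networks allowed to query DNS, e.g. the domain controllers' subnets (assumed name)"
  type        = list(string)
}

# One ingress rule per protocol, so TCP/53 is allowed alongside the existing UDP/53.
resource "aws_security_group_rule" "dns_in" {
  for_each = toset(["udp", "tcp"])

  description       = "Allow DNS queries over ${each.key} from client networks"
  type              = "ingress"
  from_port         = 53
  to_port           = 53
  protocol          = each.key
  cidr_blocks       = var.dns_client_cidr_blocks
  security_group_id = var.dns_container_security_group_id
}
```

If the service sits behind a load balancer, its listener would also need to accept TCP for the change to take effect end to end.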

Add database bastion for service

Access to the service RDS database is required to analyse a service issue.
Create a module that builds an admin database bastion using the existing pattern, but applicable to the service DB.
Note: this DB is in the other VPC.

A branch protection setting is not enabled: administrators require review

Hi there
The default branch protection setting called administrators require review is not enabled for this repository
See repository settings/Branches/Branch protection rules
Either add a new Branch protection rule or edit the existing branch protection rule and select the Require a pull request before merging option
See the repository standards: https://github.com/ministryofjustice/github-repository-standards
See the report: https://operations-engineering-reports.cloud-platform.service.justice.gov.uk/github_repositories
Please contact Operations Engineering on Slack #ask-operations-engineering, if you need any assistance

User access removed, access is now via a team

Hi there

The user emileswarts had Direct Member access to this repository and access via a team.

Access is now only via a team.

You may have less access, as it now depends on the team's access to the repo.

If you have any questions, please post in #ask-operations-engineering on Slack.

This issue can be closed.

🤓 Add Terraform validate GH Action

User Story

As a CloudOps engineer
I want to know if terraform code is valid
So that I don't have to take manual steps to prove it is viable

Value / Purpose

Save time with terraform PRs

Useful Contacts

No response

Additional Information

No response

Definition of Done

Example

  • Documentation has been written / updated
  • README has been updated
  • User docs have been updated
  • Another team member has reviewed
  • Tests are green

Critical notification recipient list needs updating

User Story

As a DevOps engineer, I want visibility of system notifications. We need to update the recipient list so that communications go to the right channel and critical notifications are not missed.

Value / Purpose

Ensure system notifications are going to the correct channel

Useful Contacts

No response

Additional Information

TF_VAR_critical_notification_recipients=["[email protected]","[email protected]"]

Definition of Done

  • Update the value of the recipient list
  • Another team member has reviewed
  • Tests are green
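For context, a `TF_VAR_critical_notification_recipients` environment variable populates a Terraform input variable of the same name. The sketch below shows one way such a list could be consumed. Delivering through SNS email subscriptions, and the resource and topic names, are assumptions; the source does not show how the recipients are wired up.

```hcl
# The variable name matches the TF_VAR_ entry in the issue; the rest is illustrative.
variable "critical_notification_recipients" {
  description = "Email addresses that receive critical system notifications"
  type        = list(string)
}

# Hypothetical topic; the real notification mechanism may differ.
resource "aws_sns_topic" "critical_notifications" {
  name = "example-critical-notifications"
}

# One email subscription per recipient. Each address has to confirm
# the subscription before it starts receiving messages.
resource "aws_sns_topic_subscription" "critical_email" {
  for_each = toset(var.critical_notification_recipients)

  topic_arn = aws_sns_topic.critical_notifications.arn
  protocol  = "email"
  endpoint  = each.value
}
```

Using `for_each` over the set means that removing one address from the list only removes that recipient's subscription.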

Azure AD Signing Certificates need to be updated.

azure_federation_metadata_url = "https://login.microsoftonline.com/0bb413d7-160d-4839-868a-f3d46537f6af/federationmetadata/2007-06/federationmetadata.xml?appid=ce2b341e-0cee-4fee-86a4-4d853ab6e5e7"

│ Error: updating Cognito Identity Provider (eu-west-2_TwUO4rgUL:Azure): InvalidParameterException: Signing certificates are expired
│
│   with module.authentication.aws_cognito_identity_provider.cognito_identity_provider[0],
│   on modules/authentication/main.tf line 53, in resource "aws_cognito_identity_provider" "cognito_identity_provider":
│   53: resource "aws_cognito_identity_provider" "cognito_identity_provider" {
│
╵
make: *** [Makefile:39: apply] Error 1

Fix connection and public IP issues and add documentation. JIRA

During the development of the NACs RDS Bastion feature, it was noted that this implementation had some issues.

  1. associate_public_ip_address | this was incorrectly set.
  2. security_group_ids | the VPC endpoints security group had been omitted.
  3. ssm_session_manager_endpoints | the endpoints for the services required some logic, as they are shared with the load testing bastion.

To make the engineer experience easier, some additional outputs have been added for consistency across the two projects.

The documentation has been added.

Security: Restrict access to DHCP/DNS Portal to staff

User Story

As a cloud ops team member
I would like to restrict access to the dns-dhcp-admin portal to MoJ staff only
So that the security posture of our portal is improved

Value / Purpose

To help secure our DHCP/DNS portal.

Useful Contacts

Rich, Rachel

Additional Info

Do we need to include third parties who access the portal from outside, e.g. DOM1, CGI Helpdesk, etc.?

Definition of Done (DoD)

The DHCP/DNS portal for all environments:

  • https://dhcp-dns-admin.staff.service.justice.gov.uk/
  • https://dhcp-dns-admin.prep.staff.service.justice.gov.uk/
  • https://dhcp-dns-admin.dev.staff.service.justice.gov.uk/

is only available from MoJ VPN-connected devices:

  • MoJO-VPN
  • Alpha-VPN

✨Update DHCP service to enable option 234

User Story

As an EUC service
I need all devices connected to a defined FITS site in the DHCP service to have DHCP Option 234 defined with a generic GUID per site
So that the EUC service can provide delivery optimisation configuration and reduce the bandwidth on the WAN

Value / Purpose

This change enables the MoJ OFFICIAL EUC devices to better share application and Windows Update payloads across the local site network without going out to the WAN and Internet.

Useful Contacts

Matt White, Chandra Singh, Charlie Coverdale

Additional Information

Historical SPIKE carried out by Cloud Ops - https://dsdmoj.atlassian.net/browse/PTTP-8933
PTTP DAA team backlog item to reduce load - https://dsdmoj.atlassian.net/browse/PTTP-8931

Definition of Done

  • DHCP config generation has the ability to define and manage DHCP option 234
  • DHCP option 234 is defined once for a FITS site and cascades down to all subnets linked to that site
  • DHCP option 234 is unique for each FITS site and does not overlap (configuration should validate this on changes)
  • DHCP option 234 is defined automatically when a new site is created
  • DHCP option 234 is generated for all existing FITS sites in the service.
  • Value of DHCP option 234 is shown in the admin UI so that users of the DHCP admin UI can map the relationship between FITS site and DHCP option

Remove Generate TF_VARs script

User Story

As an engineer
I expect consistent development environment parameters to be generated
So that testing is consistent between CI and local development.

To achieve this, we need to remove the deprecated generate_tfvars script and Makefile target.

Value / Purpose

Consistency

Useful Contacts

No response

Additional Information

No response

Definition of Done

Example

  • Documentation has been written / updated
  • README has been updated
  • User docs have been updated
  • Another team member has reviewed
  • Tests are green

👩‍💻 Add `generate-tfvars` to Makefile

User Story

As a user of the repo
I want to be able to easily pull tfvars from somewhere
So that I can quickly and easily go about my job without having to pull 20+ variables from numerous places.

Value / Purpose

No response

Useful Contacts

No response

Additional Information

Example of this is here.

Definition of Done

  • Add content to Parameter store
  • Add command to Makefile
  • Test functionality

🐛 publish_terraform_outputs.sh times out

The script below never completes. It's unclear whether this affects only me or other users as well.

Having stepped through the script manually on the command line, I found that the terraform_outputs variable is never set and therefore cannot be passed to the command that follows it.

#!/bin/bash
set -euo pipefail

terraform_outputs=$(terraform output -json terraform_outputs)

aws ssm put-parameter --name "/terraform_dns_dhcp/$ENV/outputs" \
  --description "Terraform outputs that other pipelines or processes depend on" \
  --value "$terraform_outputs" \
  --type String \
  --overwrite
...
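The sketch below does not diagnose the hang. It shows one possible alternative, in which Terraform writes the same parameter itself so that no separate `terraform output` step is needed. The `env` variable and the `published_outputs` local are hypothetical names, and the placeholder value stands in for whatever the real `terraform_outputs` output contains.

```hcl
variable "env" {
  description = "Environment name used in the parameter path (hypothetical variable name)"
  type        = string
}

locals {
  # Placeholder content: in practice this map would reference real
  # resources or module outputs rather than a literal value.
  published_outputs = {
    example_vpc_id = "vpc-00000000"
  }
}

# Same parameter name, description and type as the script's put-parameter call.
resource "aws_ssm_parameter" "terraform_outputs" {
  name        = "/terraform_dns_dhcp/${var.env}/outputs"
  description = "Terraform outputs that other pipelines or processes depend on"
  type        = "String"
  value       = jsonencode(local.published_outputs)
}
```

One trade-off of this approach is that the parameter becomes part of Terraform state, so it is updated on every apply rather than by a separate pipeline step.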

🐛 missing container CloudWatch logging

Describe the bug.

During an investigation into ticket: https://app.zenhub.com/workspaces/nvvs-devops-622a0b371800e400133bb924/issues/gh/ministryofjustice/staff-device-dns-dhcp-infrastructure/293

It was observed that a container was logging BIND requests to CloudWatch. Below is a snippet of the logs:

`
2023-10-12T07:01:35.791Z 12-Oct-2023 07:01:35.786 success resolving 'use-of-force.service.justice.gov.uk/A' (in '.'?) after reducing the advertised EDNS UDP packet size to 512 octets
2023-10-12T07:01:35.815Z
2023-10-12T07:01:37.138Z
2023-10-12T07:14:04.024Z
2023-10-12T07:34:04.024Z
2023-10-12T07:54:04.024Z
2023-10-12T08:14:04.028Z
2023-10-12T08:23:22.117Z
`

Reviewing the above logs shows a gap between 2023-10-12T07:01:37.138Z and the next successful resolve at 2023-10-12T08:23:22.117Z. This gap is very significant because it is a busy time for our service.

Container this behaviour was noticed on:
"
Region: eu-west-2
LogGroup Name: staff-device-production-dns-server-log-group
LogStream Name: eu-west-2-docker-logs/dns-server/aa5840ef64dd4700992550e1d4dfee2d
"

To Reproduce

No response

Expected Behaviour

No response

Environment

- OS:
- Browser:
- Browser Version:

Additional context

No response

Security Groups - Unrestricted Access

Several security groups have unrestricted access.

Groups:

  • staff-device-production-dhcp-admin-database

  • staff-device-production-dhcp-dhcp-container

  • staff-device-production-dhcp-dhcp-database-in

  • staff-device-production-dns-dns-container

  • staff-device-production-dns-dns-container

  • Update Trusted Advisor

According to the AWS security advisory report, these security groups need tighter IP restrictions.
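As an illustration of tightening such a rule, the sketch below allows database traffic only from a named source security group instead of an open CIDR range. The variable names and port 3306 are assumptions; the source does not state the database engine or how the groups are defined.

```hcl
# Hypothetical inputs standing in for the real security group IDs.
variable "dhcp_database_security_group_id" {
  description = "Security group attached to the DHCP database (assumed name)"
  type        = string
}

variable "dhcp_container_security_group_id" {
  description = "Security group attached to the DHCP containers (assumed name)"
  type        = string
}

# Ingress is limited to members of the container security group,
# rather than to 0.0.0.0/0 or another broad CIDR block.
resource "aws_security_group_rule" "database_in_from_containers" {
  description              = "Database access from DHCP containers only"
  type                     = "ingress"
  from_port                = 3306 # assumed MySQL-compatible port
  to_port                  = 3306
  protocol                 = "tcp"
  source_security_group_id = var.dhcp_container_security_group_id
  security_group_id        = var.dhcp_database_security_group_id
}
```

Referencing a security group rather than IP ranges keeps the rule valid when container IP addresses change.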

Update VPC Module to the Latest

User Story

The VPC module for this repository is out of date. As part of the decision not to centralise the VPC module, this work can continue to use the community VPC module and update it to the latest version where possible.
staff-device-dns-dhcp-infrastructure is currently pinned locally to 3.14.0.

Value / Purpose

Consistency across the repos for Terraform versions and uniformity
Early issue detection

Useful Contacts

No response

Additional Information

No response

Definition of Done

  • Upgrade VPC module to the latest
  • Another team member has reviewed
  • Tests are green
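A sketch of what moving the version pin might look like. It assumes the community module in question is `terraform-aws-modules/vpc/aws` and that a 5.x release is the target; neither is confirmed by the source. The name, CIDR ranges and subnets are illustrative placeholders.

```hcl
module "vpc" {
  source = "terraform-aws-modules/vpc/aws" # assumed to be the community module referred to above

  # Previously pinned to "3.14.0". A pessimistic constraint accepts
  # newer releases within the chosen major version only.
  version = "~> 5.0" # assumed target version

  # Placeholder values, not taken from the repository.
  name = "example-dns-dhcp-vpc"
  cidr = "10.0.0.0/16"

  azs             = ["eu-west-2a", "eu-west-2b"]
  private_subnets = ["10.0.1.0/24", "10.0.2.0/24"]
}
```

A major-version jump can rename or remove module inputs, so reviewing the resulting plan for unexpected replacements would be a sensible part of the upgrade.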

Secrets accessible in plain text in dhcp-server task definition

Describe the bug.

This ticket is to be run as a POC; we may need to create more tickets for rolling this out to other repos.

Currently, secrets are stored in plain text in the ECS task definition. These secrets should be moved into Secrets Manager and referenced through the task definition's secrets field instead, so that they are not populated as plain text.

https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task_definition_parameters.html

E.G:
"secrets": [
  {
    "name": "environment_variable_name",
    "valueFrom": "arn:aws:ssm:region:aws_account_id:parameter/parameter_name"
  }
]

https://docs.aws.amazon.com/AmazonECS/latest/developerguide/secrets-envvar-secrets-manager.html

To Reproduce

No response

Expected Behaviour

This should be pulled from secrets manager instead.
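A minimal sketch of how a task definition could reference a secret through `secrets` rather than a plain-text `environment` entry. All names, sizes and the `DB_PASSWORD` variable are hypothetical; the repository's real task definition is not shown in the source.

```hcl
variable "execution_role_arn" {
  description = "Task execution role; it needs permission to read the secret (assumed name)"
  type        = string
}

variable "dhcp_server_image" {
  description = "Container image for the DHCP server (assumed name)"
  type        = string
}

# Hypothetical secret; its value would be set outside of the task definition.
resource "aws_secretsmanager_secret" "db_password" {
  name = "example/dhcp-server/db-password"
}

resource "aws_ecs_task_definition" "dhcp_server" {
  family                   = "example-dhcp-server"
  requires_compatibilities = ["FARGATE"]
  network_mode             = "awsvpc"
  cpu                      = 256
  memory                   = 512
  execution_role_arn       = var.execution_role_arn

  container_definitions = jsonencode([
    {
      name      = "dhcp-server"
      image     = var.dhcp_server_image
      essential = true

      # The value is injected at container start; only the ARN is
      # stored in the task definition.
      secrets = [
        {
          name      = "DB_PASSWORD"
          valueFrom = aws_secretsmanager_secret.db_password.arn
        }
      ]
    }
  ])
}
```

The `valueFrom` field accepts either a Secrets Manager secret ARN, as here, or an SSM Parameter Store ARN, as in the example quoted in the issue.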

🐛 Unhealthy container occurred on 12-10-23 at 7am UTC

Describe the bug.

An ECS container task for the DNS service went unhealthy at 7am on 12-10-23.

An investigation into how this container went unhealthy needs to occur.

It has also come to our attention that the team was not alerted to this unhealthy container, so this also needs to be investigated.

Useful links:

https://eu-west-2.console.aws.amazon.com/ec2/home?region=eu-west-2#TargetGroup:targetGroupArn=arn:aws:elasticloadbalancing:eu-west-2:037161842252:targetgroup/staff-device-production-dns/53bd525bd9a1ef0e

https://eu-west-2.console.aws.amazon.com/cloudwatch/home?region=eu-west-2#metricsV2?graph=~(metrics~(~(~'AWS*2fNetworkELB~'UnHealthyHostCount~'TargetGroup~'targetgroup*2fstaff-device-production-dns*2f53bd525bd9a1ef0e~'LoadBalancer~'net*2fstaff-device-production-dns*2f754e69ecf996b619~(label~'staff-device-production-dns~region~'eu-west-2)))~period~60~region~'eu-west-2~stat~'Maximum~title~'Unhealthy*20Hosts*20*28Maximum*29~yAxis~(left~(min~0))~start~'2023-10-12T06*3a55*3a00.000Z~end~'2023-10-12T07*3a05*3a59.000Z~view~'timeSeries~stacked~false)

https://eu-west-2.console.aws.amazon.com/cloudwatch/home?region=eu-west-2#logsV2:log-groups/log-group/staff-device-production-dns-server-log-group/log-events$3Fstart$3D1697092200000$26end$3D1697094000000

A ticket to AWS has also been raised:

Case ID 14047008101

To Reproduce

No response

Expected Behaviour

  • Containers should not go unhealthy
  • Team should be alerted in the event a container does go unhealthy
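The alerting expectation could be sketched as a CloudWatch alarm on the same metric that the CloudWatch link above graphs (`UnHealthyHostCount` in `AWS/NetworkELB`, `Maximum` over 60 seconds). The variable names, alarm name and the SNS topic used for notification are assumptions for illustration.

```hcl
# Hypothetical inputs; the real values would come from the load balancer resources.
variable "dns_target_group_arn_suffix" {
  description = "ARN suffix of the DNS target group, e.g. targetgroup/<name>/<id>"
  type        = string
}

variable "dns_load_balancer_arn_suffix" {
  description = "ARN suffix of the DNS network load balancer, e.g. net/<name>/<id>"
  type        = string
}

variable "critical_notifications_topic_arn" {
  description = "SNS topic that notifies the team (assumed to exist)"
  type        = string
}

# Fires as soon as any target is reported unhealthy within a one-minute period.
resource "aws_cloudwatch_metric_alarm" "dns_unhealthy_hosts" {
  alarm_name          = "example-dns-unhealthy-hosts"
  alarm_description   = "At least one DNS container is failing load balancer health checks"
  namespace           = "AWS/NetworkELB"
  metric_name         = "UnHealthyHostCount"
  statistic           = "Maximum"
  period              = 60
  evaluation_periods  = 1
  comparison_operator = "GreaterThanThreshold"
  threshold           = 0

  dimensions = {
    TargetGroup  = var.dns_target_group_arn_suffix
    LoadBalancer = var.dns_load_balancer_arn_suffix
  }

  alarm_actions = [var.critical_notifications_topic_arn]
}
```

A single evaluation period favours fast alerting; raising `evaluation_periods` would trade speed for fewer alerts on brief health check failures.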

Environment

- OS:
- Browser:
- Browser Version:

Additional context

No response
