DoSomething.org Infrastructure

This is DoSomething.org's infrastructure as code, built using Terraform. We use it to manage and provision resources in Fastly, Heroku, and AWS (EC2, RDS, SQS, S3, IAM users, amongst others). It's a work in progress.

Installation

Install Terraform 0.12. On macOS, this is easy with Homebrew:

brew install terraform

Create a Terraform Cloud account with your work email & ask for an invite to our organization in #dev-infrastructure. Don't forget to enable two-factor auth! Then, create your API token and place it in your ~/.terraformrc file, like so:

credentials "app.terraform.io" {
  token = "xxxxxx.atlasv1.zzzzzzzzzzzzz"
}

Run make init from this directory to install a githook to check formatting before you commit changes. You can run make format at any time to format your code, or install the Terraform extension for your editor.

Alright, now you're ready to build some infrastructure!! πŸ—

Usage

Terraform allows us to create & modify infrastructure declaratively. The files in this repository define what infrastructure (apps, databases, queues, domains, etc.) we should have, and Terraform figures out what changes it needs to make to get there based on what currently exists.

We separate our configuration into workspaces. We also build reusable modules in the applications/ and components/ directories that can be used to provision the same type of thing in multiple places.
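
For example, a workspace might instantiate one of those shared modules roughly like this (a minimal sketch; the module name, source path, and arguments here are hypothetical):

module "northstar_qa" {
  source = "../applications/northstar"

  name        = "dosomething-northstar-qa"
  environment = "qa"
}

Each environment then differs only in the arguments it passes, rather than repeating the full resource definitions.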

See Terraform's Getting Started guide & documentation for more details.

Plan

We use workspaces to separate different contexts (e.g. the main application vs. our data stack) and environments (production, QA, and development). Each workspace exists as a top-level folder in this repository.

To make changes in a workspace, first cd into the workspace's directory and run terraform init to pull down dependencies. Then, make your changes to the Terraform configuration files with your text editor.

You can make a plan to find out how your changes will affect the current state of the system:

terraform plan

Once you're satisfied with Terraform's plan for your changes, commit your work & make a pull request. Your pull request will automatically run a plan for all workspaces (even if they're not affected by your change).

Apply

After your pull request is reviewed and merged, you can then apply your change to update the actual infrastructure. Terraform Cloud will make your changes, update the remote state, and ensure nobody else makes any changes until you're done.

To apply pending changes to a workspace, visit Terraform Cloud and open the latest run for the workspace you want to modify. Review the plan & then choose "Confirm & Apply" to make the change.

Security Vulnerabilities

We take security very seriously. Any vulnerabilities should be reported to [email protected], and will be promptly addressed. Thank you for taking the time to responsibly disclose any issues you find.

License

Β© DoSomething.org. This config is free software, and may be redistributed under the terms specified in the LICENSE file. The name and logo for DoSomething.org are trademarks of Do Something, Inc and may not be used without permission.

Issues

QUESTION: Fastly pass-through property for vote.ds?

Coming out of the National Voter Reg Day pre-mortem, we've asked ourselves whether the easiest way to redirect vote.ds as a contingency is to position a Fastly layer now, set with a minimal pass-through configuration. If we do need to do any redirects, and with those any URL/param transformations, we could do them via Fastly.

Migrate Footlocker/Scholarship Apps to Heroku

A few times a year we run into issues with the whitelabel scholarship apps, whether related to Supervisor, code file updates, or general DevOps: Supervisor/email send issues, DB exports (although these are much better than they used to be), and sometimes deployment issues.

These take up a fair amount of DevOps time, and are unpredictable in whether resolution is simple, complicated, or even doable. Almost every fix is somewhat DDF (Duct-tape Driven Fix).

Migrating to Heroku would simplify this process, especially since the servers we're using are infrequently updated and set up in a fairly non-standard way.

Would love to hear @DFurnes's and @katiecrane's thoughts on level of effort and complexity here, as well as thoughts on the best timing and the tradeoffs of a code migration effort vs. addressing issues as they continue to come up.

Update Rogue Prod and Dev `binlog_format` to be ROW instead of MIXED

We need to update the default parameter group to have the binlog_format variable set to ROW instead of MIXED, to allow for DMS streaming replication of Rogue's Campaigns table into Quasar.

Currently Rogue Prod and Dev are on MariaDB 10.2. Talking to @DFurnes, these aren't captured in Terraform yet, so I'm going to manually do the following:

  • Notify #announce-tech of update/potential small downtime window.
  • Set upgrade to MariaDB 10.3 for Rogue Dev and Prod
  • Create parameter group with binlog_format set to ROW based on the MariaDB 10.3 param family.
  • Confirm on Sunday evening that the MariaDB upgrade has taken, and swap out MariaDB parameter group and reboot if necessary.
  • Notify #announce-tech if there's any downtime and when work is complete.

@DFurnes is going to capture these changes in Terraform, likely slated for next sprint.
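
When that happens, the parameter group change might look roughly like the sketch below (the group name and family are assumptions; the real values should match what exists in RDS):

resource "aws_db_parameter_group" "rogue" {
  name   = "rogue-mariadb-10-3"
  family = "mariadb10.3"

  # Matches the manual change described above: enable row-based binlogs for DMS.
  parameter {
    name  = "binlog_format"
    value = "ROW"
  }
}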

Simplify front-end (Phoenix & Ashes) Fastly configs.

BUG REQUEST

Current Behavior

We simplified a lot of our Fastly config in #39, and moved the majority of properties into Terraform so that we can track changes in code. The two outstanding services that still need to be cleaned up are DoSomething.org (the O.G), and thor.dosomething.org.

These contain routing rules for Phoenix & Ashes, including relatively sizable dictionaries for redirects and backend assignments. As we've moved more stuff into Phoenix, this has created an increasing workload (for devops & now the product team) since every new URL needs to be manually assigned.

Desired Behavior

First of all, let's move these properties into Terraform!

Since we're creating the majority of (…or all??) new content on Phoenix, I'd like to flip the "default" backend to that application. This should remove all the work of pool assignments in one fell swoop.

I'd also like to see if we can simplify redirect logic, since currently PMs need to create redirects for every distinct URL that a user may visit (including query strings, like UTMs!). In nearly every case, we only really care about the path when creating a URL redirect.
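
Once these properties are in Terraform, flipping the default could look something like the sketch below (all names, addresses, and conditions are invented for illustration). In the Fastly provider, a backend with no request_condition acts as the default, so Phoenix would catch everything that isn't explicitly routed to Ashes:

resource "fastly_service_v1" "dosomething" {
  name = "DoSomething.org"

  domain {
    name = "www.dosomething.org"
  }

  # No request_condition, so Phoenix becomes the default backend.
  backend {
    name    = "phoenix"
    address = "dosomething-phoenix.herokuapp.com"
    port    = 443
  }

  # Only legacy paths are explicitly routed to Ashes.
  condition {
    name      = "ashes_paths"
    type      = "REQUEST"
    statement = "req.url ~ \"^/legacy/\""
  }

  backend {
    name              = "ashes"
    address           = "ashes.dosomething.org"
    port              = 443
    request_condition = "ashes_paths"
  }
}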

Relevant Screenshots + Links

N/A

Make it clearer what environment an app is in.

It can sometimes be confusing to see what environment you're in. Even with our simplified naming scheme, you still have to look at the URL to see where you're at, e.g. activity-qa vs. activity. This makes it easier than it should be to make a change in the wrong place, or wonder why your local changes aren't applying because you're refreshing QA! Oof!

I introduced the idea of "environment badges" in GraphQL last year, but never had a chance to roll them out further. I'd like to build a small library to make it easy to add the same feature everywhere:

(Screenshot: environment badge example.)

Figure Out Where to Put Infrastructure Documentation

I'd love to get thoughts from everyone tagged on this ticket on where to house cross-eng infrastructure documentation. Dave and I have discussed this a little bit, but I think it warrants a standardized approach in 2019. Examples of documentation would be:

  1. How do you connect to MongoDB databases from your dev machine?
  2. How do you connect to MariaDB databases from your dev machine?
  3. What are the normal caching rules for our Fastly properties?

While something like point 3 can probably exist in the infrastructure repo, I think it'd be great to have at-a-glance infrastructure info that anyone can look at for our most pertinent information. Do we want to use GitBooks in this repo? Readme in a dedicated infra-doc repo? Something else entirely?

Allow some crawlers on servers outside the US access to our environments.

(Screenshot: Facebook crawler request coming from outside the US.)

When trying to debug and test meta tags in our HTML pages, we ran into an issue where a www-dev.dosomething.org URL was returning the "Sorry" page for all users outside of the US, due to our current approach to dealing with GDPR.

This poses a problem for testing, since crawlers located outside the US can't read our pages properly and get redirected instead.
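
If the GDPR block lives in a Fastly condition, one illustrative option (purely a sketch, not a decision; the condition name and logic are assumptions about how the block is implemented) would be to exempt known crawler user agents inside that condition, within the relevant fastly_service_v1 resource:

condition {
  name = "gdpr_block_non_us"
  type = "REQUEST"

  # Skip the "Sorry" page for Facebook's crawler, even when it comes from outside the US.
  statement = "client.geo.country_code != \"US\" && req.http.User-Agent !~ \"facebookexternalhit\""
}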

Simplify how we configure New Relic agent in Heroku.

We've historically configured New Relic on our Heroku apps by attaching a "placeholder" free-tier New Relic add-on and then replacing the auto-generated NEW_RELIC_LICENSE_KEY environment variable with our own. While this works, it results in some inconsistent behavior –

  1. New Relic must be installed on QA environments, since that's where builds happen. If the application that a build is promoted from doesn't have the add-on, New Relic won't be installed. (And moving forward, we likely won't want New Relic on QA environments to reduce usage!)
  2. Each app gets a standalone unused New Relic account, which has led to confusion.

When setting this up for Longshot, I was able to simplify setup to make this more reliable:

For future reference, the steps followed:

  1. Add the ext-newrelic PHP extension to the application, instead of using a Heroku add-on. This has the advantage of not clobbering environment variables or provisioning a parallel "mini-account" in Heroku that goes unused. (DoSomething/longshot#923)
  2. Add support for per-environment newrelic.enabled, so we can install the New Relic agent in the initial Heroku build but only enable it when the same compiled slug is promoted to a production app. (DoSomething/longshot#924)
  3. Set the proper environment variables per-application, via with_newrelic arg. (#59)

If we continue using New Relic, we should take this approach in other applications as well.
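
In Terraform terms, the per-app wiring could look roughly like the sketch below (the variable mirrors the with_newrelic arg above, but the names, app, and config var keys are illustrative):

variable "with_newrelic" {
  description = "Whether this app should report to New Relic."
  default     = false
}

variable "newrelic_license_key" {
  description = "Shared New Relic license key, supplied via Terraform Cloud."
}

resource "heroku_app" "example" {
  name   = "dosomething-example"
  region = "us"

  # Only set the New Relic variables when the app opts in.
  config_vars = var.with_newrelic ? {
    NEW_RELIC_APP_NAME    = "dosomething-example"
    NEW_RELIC_LICENSE_KEY = var.newrelic_license_key
  } : {}
}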

Migrate Infrastructure Jenkins Jobs to Code/Heroku

BUG

Current Behavior

  • A miscellaneous set of tasks remains in the production Jenkins environment, handling things like weekly DB refreshes from Prod to QA.

Desired Behavior

  • Most of these tasks are just wrapper scripts around bash jobs or other similar utilities. They can and should be moved to code in a repo and run via the Heroku scheduler in one place.

Why This Matters

  • We're not capturing this critical part of infrastructure in code. It provides simple but vital services, and should be run with production grade attention to detail, documentation, and tracking.

Do not serve cached edge content to authenticated requests.

We added caching to Northstar's user profile endpoint in an attempt to improve broadcast performance. Rafa flagged the other day that he was seeing caching on authenticated requests too, though, which is not expected behavior.

From some further investigation this morning, I've confirmed that once a profile is cached in Fastly, we will continue to return that cached "public" profile, even for requests that come in with a privileged authentication token. That's not helpful!

To fix this, we should add a "pass" cache setting for requests with an Authorization header.
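
In the Northstar service's Terraform config, that could be expressed as a request condition plus a request setting with the pass action, roughly like this (names are arbitrary; the blocks live inside the service's fastly_service_v1 resource):

condition {
  name      = "has_authorization_header"
  type      = "REQUEST"
  statement = "req.http.Authorization"
}

# Bypass the cache entirely for authenticated requests.
request_setting {
  name              = "pass_authenticated_requests"
  request_condition = "has_authorization_header"
  action            = "pass"
}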

Revisit how we run extra Rogue RDS backups.

We created a Jenkins job to perform an additional daily backup of the Rogue RDS instance in DoSomething/internal#369, in response to the issues we ran into with DoSomething/internal#357. This "Rogue DB Backup" job ran successfully until November 12th, when it silently failed because we'd hit our cap of 100 manual snapshots.

I noticed this when trying to take final snapshots before terminating databases in #49, and got this error: cannot create more than 100 manual snapshots (Service: AmazonRDS; Status Code: 400; Error Code: SnapshotQuotaExceeded). Bummer!

The short-term fix was to delete the 49 snapshots that had been piling up over the past few months to get us back safely under the limit. Longer-term, we'd like to investigate whether there are better ways to do this, like RDS's point-in-time restores (or at least adding some alerting to that Jenkins job).

Upgrade Heroku apps to Heroku-18 stack.

Heroku released their Heroku-18 stack a while back, running on Ubuntu 18.04. This is the default for newly created apps, but existing apps aren't automatically upgraded.

When we have some time, it'd be nice to test that this is a safe upgrade for each application that's still running on Heroku-16 (Ubuntu 16.04) and standardize this across the board using the stack option on our heroku_app resources in Terraform.
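
Per app, the change should be a one-liner on the existing resource, something like the sketch below (the app name is a placeholder):

resource "heroku_app" "northstar_qa" {
  name   = "dosomething-northstar-qa"
  region = "us"

  # Pin the stack explicitly so the upgrade from heroku-16 is tracked in code.
  stack = "heroku-18"
}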

Update Cert for HRBlock

The SSL cert for caps.hrblock.com is set to expire on 12/9 and needs to be updated. The ideal solution would be to use Let's Encrypt with auto-renew so we don't have to worry about it anymore, but there are some issues with that. A few entries that run before all others are preventing us from using a standalone webserver behind HAProxy to verify the caps URL:

redirect prefix https://longshot-qa.dosomething.org code 301 if { hdr(host) -i longshot-qa.dosomething.org }
redirect prefix https://footlockerscholarathletes.com code 301 if { hdr(host) -i footlockerscholarathletes.com }
redirect prefix https://footlockerscholarathletes.com code 301 if { hdr(host) -i www.footlockerscholarathletes.com }
redirect prefix https://caps.hrblock.com code 301 if { hdr(host) -i caps.hrblock.com }

I need to add these entries to allow Let's Encrypt to validate the URL and enable auto renew:

# Test URI to see if it's a Let's Encrypt request
acl letsencrypt-acl path_beg /.well-known/acme-challenge/
use_backend letsencrypt-backend if letsencrypt-acl  

# LE Backend
backend letsencrypt-backend
    server letsencrypt 127.0.0.1:8888

The issue is that because HAProxy always runs redirect rules before use_backend and sends all requests straight to the Heroku app, Let's Encrypt isn't able to use our standalone server to respond properly to the verification. Are we able to move or remove those entries, or accomplish their purpose in another way? @sheyd @DFurnes If we can, we'll be able to use the same approach for Footlocker as well.

Here's the command to set up the certs initially:

sudo certbot certonly --standalone -d caps.hrblock.com \
    --non-interactive --agree-tos --email [email protected] \
    --http-01-port=8888

Here's what the auto-renew script will look like for reference:

#!/usr/bin/env bash

# Renew the certificate
certbot renew --force-renewal --tls-sni-01-port=8888

# Concatenate new cert files, with less output (avoiding the use of tee and its output to stdout)
bash -c "cat /etc/letsencrypt/live/caps.hrblock.com/fullchain.pem /etc/letsencrypt/live/caps.hrblock.com/privkey.pem > /etc/ssl/demo.scalinglaravel.com/caps.hrblock.com.pem"

# Reload  HAProxy
service haproxy reload  

There are some additional setup steps not listed here; I'm following this tutorial as a guide: https://serversforhackers.com/c/letsencrypt-with-haproxy

Heroku provider doesn't support auto-scaling.

We're currently using Heroku Autoscaling to automatically spin up extra Northstar dynos when server load increases (usually during an SMS broadcast). Unfortunately, this is not supported by Terraform's Heroku provider, so any plan run during a broadcast marks this resource as "dirty" and attempts to scale back to the configured quantity.
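
One possible stopgap (an assumption on my part, not something we've settled on) is to let Heroku's autoscaler own the dyno count and tell Terraform to ignore it; heroku_app.northstar here is assumed to be defined elsewhere:

resource "heroku_formation" "northstar_web" {
  app      = heroku_app.northstar.name
  type     = "web"
  size     = "Performance-M"
  quantity = 1

  lifecycle {
    # The autoscaler changes quantity at runtime; don't try to scale it back on apply.
    ignore_changes = [quantity]
  }
}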

We've also seen somewhat iffy results with autoscaling (detailed in DoSomething/devops#435) since it's based on response time (rather than request queueing or throughput), and so will tend to thrash between over- and under-provisioning as response time stabilizes and deteriorates.

Simplify Fastly config & track changes in code.

BUG REQUEST

Background

From Matt's write-up in the Q3-Q4 technology memo:

We’ve built up a pretty large catalog of Fastly services, in many cases using a separate service per-environment and per-app. This increases the number of places a change must be made, and the chance that things might get out of sync between environments.

We’ve also had trouble with discoverability and the review process around updating VCLs for applications, as it’s not always clear to application developers when a change has been made or how it will impact their application. We should investigate simplifying our systems to rely on fewer separate configs (re: Fastly ALTITUDE talks from USA Today & Conde Nast), and investigate better processes or tooling around making changes more visible.

Current Behavior

We have 19(!!) Fastly properties, of which DoSomething.org (Phoenix & Ashes), API/Northstar, Rogue, GraphQL, and CatchAll (some redirects) receive the most use.

We also have 4 services for different voting app instances, a search property for Solr, Ashes Staging & Thor, and a property with vanity redirects for two campaigns. The remaining 7 don't seem to be receiving any traffic and can probably be deleted.

Desired Behavior

It'd be great to consolidate these into fewer properties so we can roll out changes across the board more easily (e.g. gzipping or geolocation headers). We should also have separate QA & production configs for our other services (and ideally, an easy way to promote changes from QA to prod).

Finally, it's not always clear what changes are made to our configs and why (e.g. if we forget to drop a note in #deploys), or whether a draft config is safe to push to production or should be discarded (which should be aided by Morgan's new discussions for the DoSomething.org and API/Northstar properties).

Suggested Solution

An easy first step is to audit our existing configs and clean them up! This includes removing unnecessary origins, conditions, custom VCLs, etc. We can also probably delete a bunch of those unused properties, or consolidate ones that are infrequently changed (like the voting apps).

I'd also like to experiment with configuring & tracking our Fastly config (and other infrastructure!) in code with Terraform. This builds off some of the work we've done with CloudFormation on Bertly, but lets us configure more things (like Fastly, but also AWS, Heroku, DNS, Papertrail, etc.) in one place.

Relevant Screenshots + Links

N/A

Rename Blink instances to match new naming scheme.

FEATURE

Current Behavior

We currently have some inconsistencies with domains & environment naming. Let's update those to be more consistent for Blink, following the discussion in #366.

Desired Behavior

We should have the following Heroku apps in a dosomething-blink pipeline:

(Does it make sense to add "dev" environments for Blink?)

Why This Matters

This is a step towards reducing confusion about apps & environments!

Countries Affected (optional)

N/A

OS/Browser (optional)

N/A

Relevant Screenshots + Links

N/A

Rename Gambit instances to match new naming scheme.

FEATURE

Current Behavior

We currently have some inconsistencies with domains & environment naming. Let's update those to be more consistent for Gambit, following the discussion in #366.

Desired Behavior

We should have the following Heroku apps in a gambit-conversations pipeline:

And the following Heroku apps in a gambit-campaigns pipeline:

(Does it make sense to add "dev" environments for Gambit?)

Why This Matters

This is a step towards reducing confusion about apps & environments!

Countries Affected (optional)

N/A

OS/Browser (optional)

N/A

Relevant Screenshots + Links

N/A

Incident: Footlocker Scholarship timeouts.

INCIDENT

What's gone wrong?

On Friday morning, the Foot Locker Scholarship site went down with a Heroku "application error" page. New Relic showed lots of request queueing and we were seeing a slew of H12s in Papertrail.

Restarting the Heroku dyno fixed the issue, although the underlying cause is still unclear.

Timeline

As the incident develops, the "incident lead" should continue to fill in this timeline:

Follow-up:

  • Jen created Ghost Inspector monitoring tests for Footlocker & New Relic.
  • Dave added a New Relic uptime monitor to our alert policy for this application.
  • Dave reached out to New Relic support to figure out why our alert policy didn't trigger.

Relevant Screenshots + Links

Add S3 buckets to Terraform.

We use Amazon S3 for storage in many of our applications. We should move the rest of these buckets, permissions & IAM roles, and environment variable config into Terraform. As a stretch goal, we also have tons of empty & unused buckets that'd be nice to clean up so the S3 admin panel is less intimidating.

Incident: Users unable to create posts on Phoenix.

INCIDENT

What's gone wrong?

We've been receiving support tickets that users are unable to report back on Phoenix, receiving an "Unauthenticated" message in the uploader when they try to submit:

(Screenshot: "Unauthenticated" error in the report back uploader.)

Timeline

  • Deployed Rogue on Thursday at 10:47am EST (diff).
  • Deployed Northstar on Friday at 2:49pm EST (diff).
  • Deployed Phoenix on Friday at 1:22pm EST (diff), and at 2:49pm EST (diff).
  • Help tickets began coming in Friday at 12:31am EST (an unrelated scholarship question) and 9:33pm EST, and continued through the weekend. Hannah found and compiled them, and raised them in #team-product on Monday at 1:50am EST.
  • Matt CC'd the issue in #dev-phoenix at 7:21am.
  • Mendel jumped in and started digging into Rogue errors at 9:46am.
  • Dave saw Hannah's message in #team-product at 10:10am & created this issue. πŸ€“
  • Dave rolled back Phoenix to v206 at 10:35am, resolving the issue in production.
  • Mendel figured out the underlying issue at 10:58am, and pushed up a fix in DoSomething/legacy-website#1182 at 11:17am. This fixed things up on QA.
  • We re-deployed master with that fix at 2:20pm, and ran through manual testing of signup, photo/text/share post, quiz, and article flows on production to make sure no new issues appeared.
  • We reached out to members who were affected by the bug via email on Monday at 6:08pm.

Relevant Screenshots + Links

Improve Northstar performance & concurrency.

This is a follow-up ticket from DoSomething/devops#383.

Background

We successfully moved Northstar from AWS (an m4.xlarge EC2 instance) to Heroku (1-5 autoscaled Performance-M dynos). This simplified our operations & also allows us to scale up-and-down based on demand more easily. I've monitored performance and made some adjustments since then:

From July 17th 2018's broadcast:

We ran a couple of broadcasts yesterday - a smaller 11:00am broadcast, and a 2:00-8:30pm full-list broadcast (which is continuing today). Both were set to run at 75rps, and seem to have happily run at between 70-85rps(!!). Here's what New Relic had to say:

(Screenshot: New Relic response time graph for the broadcast.)

And here's that same response time graph overlaid over auto-scaling events:

(Screenshot: the same response time graph overlaid with auto-scaling events.)

It's interesting to see how much scaling fluctuates between 2-4 dynos when under load, which perhaps suggests we should drop our desired p95 response time a little more (currently at 750ms, curious to try 500ms again).

And the following day when we finished that broadcast:

I think one problem we're running into with auto-scaling is that it's entirely based on response time, so if we get good performance at 3 dynos, we scale down to 2 until things bog down, and then back up... for example, here's the hour from 12:50-1:50pm with a target 500ms p95 response time:

(Screenshot: 12:50-1:50pm with a 500ms p95 response time target.)

Current Behavior

I think we've found the sweet spot for p95 response time (750ms). This seems to let us spin up quickly enough when throughput spikes, but still happily provisions down to 1 dyno when throughput drops.

I think the remaining scaling issues we're seeing come from alternately over- and under-provisioning, since Heroku's autoscaling is based on the past hour's performance (so we scale up when things get too slow, but then as soon as they're under control we scale back down again... gah!!)

While things seem to be working mostly okay, we do still see some slow requests & 503s when in one of those underprovisioned "dips" (until we scale back up again).

Desired Behavior

We may want to consider handling scaling ourselves (based on throughput), or further tuning performance to get more out of each individual dyno.

Why This Matters

We want to make sure we're getting the best performance bang for our buck! Specifically, we want to make sure that we can get the most throughput when we need to make a ton of API requests to Northstar during a broadcast.

Checklist: Migrate campaign metadata from Ashes & remove campaign run IDs.

Overview

In order to retire Ashes, we need to move legacy campaigns to the Rogue database. This document has more information about what data will be migrated to Rogue and how we will deprecate runs. Below is a checklist of the order of operations for the migration to successfully run.

Order of Operations

First, we need to create the new Campaign IDs table in Rogue (based on old IDs/Run IDs):

  • Team Bleed to merge PR to migrate all legacy campaigns from Ashes to Rogue.
  • Morgan to complete taking the Campaigns table from Rogue and piping it into Quasar
  • Team Bleed to run/test migration script to get all legacy campaigns into Rogue on QA.
  • ALL TEAMS: Check that data looks good & everything still works as expected on QA!
  • Run migration script to get all legacy campaigns into Rogue on production.
  • ALL TEAMS: Check that data looks good & everything still works as expected on production!

Once we have that table, we'll update signups & posts from old runs to their new canonical IDs:

  • Team Bleed to write script to update all signup and post campaign_ids in Rogue's signups and posts table according to new data in campaigns table.
  • Remove logic from Rogue which queries on campaign_run_id on QA! (DoSomething/rogue#788)
  • Deploy the Rogue fix for campaign_run_id and new signups to production. (DoSomething/rogue#788) Whoops, never mind!
  • Update Rogue Admin to read campaign data from the above table, so we don't get errors when trying to query Ashes for these new consolidated campaign IDs. This has been tested by @DFurnes and @katiecrane on QA (pre-script) and all looks well!
  • Team Bleed to run the script to update signups/post's campaign_id on QA.
  • Team Storm will then update the signup/post campaign IDs on Quasar QA based on these SQL queries.
  • Team Bleed: Check that data looks good & everything still works as expected on QA.
  • Team Storm: Check that data looks good & everything still works as expected on Quasar QA.
  • Team Storm: Check that data looks good & everything still works as expected on Looker QA.
  • Team Bleed to run the script to update signups/post's campaign_id on production.
  • Team O'Doyle to deploy DoSomething/gambit-conversations#456, which updates it to filter by campaign_id when checking posts/signups (it was previously filtering only by campaign_run_id).
  • Team Bleed to update Rogue to ignore campaign_run_id values when querying, creating, and updating signups or posts (by deploying DoSomething/rogue#798 to production).
  • Team Storm will then update the signup/post campaign IDs on Quasar Production based on these SQL queries.
  • Team Bleed: Check that data looks good & everything still works as expected on production.
  • Team Storm: Check that data looks good & everything still works as expected on Quasar production.
  • Team Storm: Check that data looks good & everything still works as expected on Looker production.
  • ALL TEAMS: Check that data looks good & everything still works as expected on production!

Once this script has run, we can update front-ends to exclude Run IDs anytime:

  • Phoenix will stop sending campaign_run_id on the next production deploy.
  • Gambit to exclude campaign_run_id from all Rogue requests (altering the signups filtering to use campaign_id instead of campaign_run_id when checking for signup.why_participated).

Other things to think about in 2019:

  • Galleries previously showed all runs for a multi-run campaign; now they'll just show the "latest" campaign. We may need to rethink this for pre-seeding campaigns.

Migrate custom VCLs into Fastly snippets.

The upcoming 0.4.0 release of Terraform's Fastly provider includes support for VCL snippets. This is a welcome change from maintaining a completely custom VCL, and we should consider migrating these once that release is shipped.
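
Once we're on that provider version, a snippet could be declared directly on the service, roughly like this (the name, type, and content are placeholders):

snippet {
  name     = "mark_requests"
  type     = "recv"
  priority = 100

  # Arbitrary example: tag requests with a header in vcl_recv.
  content = "set req.http.X-Managed-By = \"terraform\";"
}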

Convert HAProxy instance to a Fastly property w/ Fastly Anycast.

ISSUE

Current Behavior

  • The Footlocker cert expired, and I've renewed it using Let's Encrypt. We'd like to continue using Let's Encrypt going forward for all our properties because it's free and because it auto-renews. The certs have to be renewed every 90 days, which would be a non-issue if we were pointing directly at the webserver (a cron job would handle it), but because we're using HAProxy we have to go through somewhat extraordinary measures to use the Let's Encrypt auto-renew features.

Desired Behavior

  • Per @sheyd's suggestion, we should migrate the old HAProxy box to Nginx so we can take advantage of the auto-renew features and eliminate these issues for good. This will also have the added benefit of cleaning up that config and removing cruft that's no longer necessary.

Why This Matters

  • We don't want to have to think about certs anymore!

Properties Affected (optional)

  • Including but not limited to Footlocker, HRBlock, and other properties that go through the current standalone HAProxy box

NOTE: Footlocker will expire 1/29/19 and HRBlock will expire 12/9/18

Standardize more common patterns with modules.

We tend to duplicate a lot of the same boilerplate for each service (e.g. Rogue and Northstar look pretty much identical), and likewise for the same app across environments (see Northstar Dev, Northstar QA, and Northstar Production).

We should consider standardizing some of these common patterns with reusable DoSomething-specific modules (like a heroku_php_app module that sets up the app, buildpack, standard environment variables, domain, log drain, and just has arguments for things we may customize per-app).
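
A hypothetical heroku_php_app module interface might look like the sketch below (everything here is invented for illustration; the real module would expose whichever arguments we actually customize per-app):

variable "name" {}
variable "domain" {}
variable "environment" {}
variable "papertrail_url" {}

resource "heroku_app" "app" {
  name   = var.name
  region = "us"

  buildpacks = ["heroku/php"]

  config_vars = {
    APP_ENV = var.environment
  }
}

resource "heroku_domain" "domain" {
  app      = heroku_app.app.name
  hostname = var.domain
}

resource "heroku_drain" "papertrail" {
  app = heroku_app.app.name
  url = var.papertrail_url
}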

Move all Longshot environments for Foot Locker & HR Block to RDS

FEATURE OVERVIEW

User Story

As DS, we want every Longshot environment for Foot Locker (internal and external) and HR Block to be on RDS, so that RDS handles automated backups, upgrades, and availability.

Additional Information

This was done for Foot Locker production here

Related ticket

This was also resurfaced in a Longshot post-mortem as a next step here

Tentative soft launch date for HR Block is Nov 6, hard launch is Nov 7

Why This Matters

We want more stability with the Longshot app across environments

Definition of Done

Given that I'm a DS dev
When I look at any Longshot environment
Then it's on RDS and things are working as expected

Additional Things to Consider for Done

(1) Test needed?
(2) Documentation needed?

Add S3 Bucket Permissions for Blockbuster App to Terraform

We use the dosomething-blockbuster S3 bucket for the Gala and Summit every year. We want to be able to toggle public access on/off, as well as define the IAM role and permissions for write access to the bucket, in Terraform so we don't have to manually muck about with the web console.
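
A starting point might look like the sketch below (the bucket name matches the issue; the ACL, user, and policy details are assumptions):

resource "aws_s3_bucket" "blockbuster" {
  bucket = "dosomething-blockbuster"

  # Flip to "public-read" for the event, then back to "private" afterwards.
  acl = "private"
}

resource "aws_iam_user" "blockbuster" {
  name = "blockbuster-app"
}

resource "aws_iam_user_policy" "blockbuster_write" {
  name = "blockbuster-s3-write"
  user = aws_iam_user.blockbuster.name

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect   = "Allow"
      Action   = ["s3:PutObject", "s3:GetObject"]
      Resource = "${aws_s3_bucket.blockbuster.arn}/*"
    }]
  })
}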

Automate dependency updates across applications.

We have some recurring cards to run npm update && composer update in Northstar and Rogue, but it's easy to forget to do this until GitHub sends out a security alert (and that's only for JavaScript dependencies for now). We should look into something like Greenkeeper (just npm), DependenCI (just Composer, killer name) or Snyk (everything) to automate this.

Add RDS databases to Terraform.

We use Amazon RDS for most of our application databases because it offers good pricing, performance, and automated backups. We currently configure and hook these up to applications by hand, but this is prone to errors. We should audit these & manage them with Terraform.
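
Bringing an existing database under management might look roughly like this (engine, sizes, and identifiers are placeholders; real values must match the running instance):

variable "rogue_db_username" {}
variable "rogue_db_password" {}

resource "aws_db_instance" "rogue" {
  identifier     = "rogue-production"
  engine         = "mariadb"
  engine_version = "10.3"
  instance_class = "db.m4.large"

  allocated_storage       = 100
  backup_retention_period = 7
  multi_az                = true

  username = var.rogue_db_username
  password = var.rogue_db_password

  # Guard against an accidental destroy of a production database.
  deletion_protection = true
}

The existing instance could then be adopted with terraform import aws_db_instance.rogue rogue-production rather than recreated.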

Add SQS queues to Terraform.

We use Amazon SQS for queuing in many of our applications (Northstar, Rogue, and Chompy) because it's affordable, simple, and reliable. Like other third-party services, though, we configure queues & hook them up to applications by hand!

We should move these queues, IAM roles, and environment variable config into Terraform.
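
As a sketch, one queue plus its wiring into an app's environment could look like this (the queue, app, and config var names are illustrative):

resource "aws_sqs_queue" "northstar" {
  name = "northstar-production"

  # Give workers a minute to process each job before it becomes visible again.
  visibility_timeout_seconds = 60
}

resource "heroku_app" "northstar" {
  name   = "dosomething-northstar"
  region = "us"

  config_vars = {
    # aws_sqs_queue.id is the queue URL, so the app never needs it configured by hand.
    SQS_DEFAULT_QUEUE = aws_sqs_queue.northstar.id
  }
}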

Add DNS records to Terraform.

We should add our DNS records to Terraform so we can manage subdomains & hook them up to the right backend in code. I could've sworn DNSMadeEasy was a third-party provider last time I looked, but it turns out it's included out-of-the-box.
