openaustralia / infrastructure


Automated setup and configuration for most of OpenAustralia Foundation's servers


infrastructure's Introduction

OpenAustralia.org

This is the master OpenAustralia.org repository. Here you'll find issue tracking for the whole project and instructions for deploying it. This repository doesn't contain much code; the code lives in the submodules.

The key sub-projects are:

The web application: openaustralia/twfy
The parser: openaustralia/openaustralia-parser

Development

OpenAustralia.org is currently deployed on Ubuntu 12.04 and has a number of quite old dependencies. This means it can be a bit difficult to get it running on a modern machine (if you'd like to try anyway there's an old website that has the details).

The easiest way to get a development copy running is to use Vagrant, VirtualBox, and Ansible with the Vagrantfile in the infrastructure repository (not this repository).

Ansible doesn't currently create a ~vagrant/.my.cnf so you'll have to create one by hand, pinching DB details from /srv/www/production/shared/config/general.
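A minimal sketch of creating that file, assuming the standard MySQL option file layout. The function name `write_my_cnf` and every value passed to it are placeholders; the real user, password, host and database name have to be copied from the config file mentioned above. The database name goes under `[mysql]` rather than `[client]` because options in `[client]` are read by every client program, and some of them reject a `database` option.

```shell
# Write a MySQL client option file so `mysql` can connect without prompting.
# All values are placeholders supplied by the caller (hypothetical helper).
write_my_cnf() (
  target="$1"; user="$2"; password="$3"; host="$4"; database="$5"
  umask 077   # a newly created file is readable by its owner only
  cat > "$target" <<EOF
[client]
user=$user
password=$password
host=$host

[mysql]
database=$database
EOF
)
```

On the Vagrant machine this could be called as `write_my_cnf ~vagrant/.my.cnf USER PASSWORD HOST DATABASE`, with the four values replaced by the real ones.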

Then:

# Setup the database on the Vagrant machine
bundle exec cap -S stage=development deploy:setup_db

# Load MPs into the database
bundle exec cap -S stage=development parse:members

# Download, parse, and load speeches for an example day
vagrant ssh --command '/srv/www/production/current/openaustralia-parser/parse-speeches.rb 2017-08-08'

Yay, you've done it! Visit http://openaustralia.org.au.test and you should see your development copy of OpenAustralia.org.au

Deployment

OpenAustralia.org is deployed using Capistrano from this repository. Once you've made changes to the web application or the parser and those have been pushed to GitHub you'll first need to update their submodules in this repository.

You do this by adding and committing, just like you would with any other change in Git. Here's what it looks like to update both the parser and the web application's submodules:

$ git status
On branch master
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

  modified:   openaustralia-parser (new commits)
  modified:   twfy (new commits)

no changes added to commit (use "git add" and/or "git commit -a")
$ git add --patch
diff --git a/openaustralia-parser b/openaustralia-parser
index 08291a1..e7aa61c 160000
--- a/openaustralia-parser
+++ b/openaustralia-parser
@@ -1 +1 @@
-Subproject commit 08291a110bd044e9b3b23deeeaff5a87489d59c3
+Subproject commit e7aa61c30fa0352fbf20247119b3a7abb6cb12e8
Stage this hunk [y,n,q,a,d,/,e,?]? y

diff --git a/twfy b/twfy
index 08dcf7a..ee01ada 160000
--- a/twfy
+++ b/twfy
@@ -1 +1 @@
-Subproject commit 08dcf7a702e483292248efeeaa8c2e439b00a85c
+Subproject commit ee01ada5fa07d3f8bc4a95620c401f238b5b1e70
Stage this hunk [y,n,q,a,d,/,e,?]? y

$ git commit --message="Update to HEAD of submodules"
[master 95051d1] Update to HEAD of submodules
 2 files changed, 2 insertions(+), 2 deletions(-)
$ git push origin master

Once this is pushed to GitHub you're ready to deploy:

bundle exec cap -S stage=production deploy

If you've updated data about members you'll need to parse that and import it. This happens automatically once a day or you can run it using this Capistrano task:

bundle exec cap -S stage=production parse:members

For other things, like attempting to parse a day's speeches after a parsing error, you'll need to log into the server to run the script(s) manually.

Updating images

OpenAustralia attempts to grab the official profile photo for each MP from the APH website. However, it's common for the profile page to go up some time before the profile photo is ready. When this happens, we cache the photoless page. It's necessary to manually purge the cache in order to detect that a photo has been added.

The cached html files live in /srv/www/production/shared/html_cache/member_images. To clear out the cache for everyone with the surname Abbot, cd to that directory and ls *Abbot*. If you're sure you've got the right list of files, you can use rm to really get rid of them.
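The list-then-delete routine above can be sketched as a small helper. This is only an illustration: the function name `purge_member_cache` is made up, and it assumes the cached file names contain the member's surname, as the `ls *Abbot*` example implies. By default it only lists the matches; it deletes them when the third argument is the literal word `delete`.

```shell
# List cached member pages whose file name contains a surname.
# Delete them only when the third argument is "delete" (hypothetical helper).
purge_member_cache() (
  cache_dir="$1"; surname="$2"; action="${3:-list}"
  cd "$cache_dir" || exit 1
  for f in *"$surname"*; do
    [ -e "$f" ] || continue   # the glob matched nothing
    echo "$f"
    if [ "$action" = "delete" ]; then
      rm -- "$f"
    fi
  done
)
```

Run `purge_member_cache /srv/www/production/shared/html_cache/member_images Abbot` first to review the list, then repeat the command with `delete` appended once the list looks right.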

You'll then need to:

$ cd /srv/www/production/current/openaustralia-parser/
$ ./member-images.rb

to load the new images.

The new images should be picked up by TVFY the next day.

Copyright & License

infrastructure's Issues

Don't send production and staging cron outputs to the same place

Currently for theyvoteforyou, openaustralia and planningalerts, cron output is sent to separate Slack channels by project. So, for example, openaustralia staging and production cron output goes to the same Slack channel. This makes things less useful because every time you look at a log you have to figure out whether it's production or staging.

So, handle cron output differently for production and staging:

We could send them to different channels
We could only send them to slack for production (I think this is my preferred choice)

Change url of openaustralia foundation site to oaf.org.au

Right now it's openaustraliafoundation.org.au. It should stay that way until after the migration is complete. After that's done it would be good to change it to oaf.org.au to be consistent with the dominant form of the email address that we use.

Also, openaustraliafoundation should then redirect to oaf.org.au.

Move righttoknow over to more generic installation

Right now it's being installed from an openaustralia fork of righttoknow. There's really very little reason now to maintain that fork as we haven't been doing active development on Alaveteli for a good few years.

So, it would make more sense to use more of a default setup as mysociety maintains it.

Set up some kind of output for cron jobs

Before we had everything going out via email. Do we still want to do this?

Configure and test email setup for theyvoteforyou

Convert openaustralia myisam tables to innodb

After migrating to RDS:

DB Instance main-database contains MyISAM tables that have not been migrated to InnoDB. These
tables can impact your ability to perform point-in-time restores. Consider converting these tables to
InnoDB. Please refer to
http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Overview.BackingUpAndRestoringAmazonRDSInstances.html#Overview.BackupDeviceRestrictions
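One possible way to prepare the conversion, sketched under assumptions: the helper name `innodb_alter_statements` and the database name `your_db` are placeholders, and nothing here is taken from the repository. The helper only prints statements so they can be reviewed before anything is run against the database.

```shell
# Read table names (one per line) on stdin and print one ALTER TABLE
# statement per table. Nothing is executed against a database here.
innodb_alter_statements() {
  while IFS= read -r table; do
    [ -n "$table" ] || continue   # skip blank lines
    printf 'ALTER TABLE `%s` ENGINE=InnoDB;\n' "$table"
  done
}

# Possible usage (your_db is a placeholder):
#   mysql -N -B -e "SELECT table_name FROM information_schema.tables
#                   WHERE table_schema = 'your_db' AND engine = 'MyISAM'" \
#     | innodb_alter_statements > convert_to_innodb.sql
```

Writing the statements to a file first leaves room to check each table, for example any with full-text indexes, before applying the change.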

Add blog.openaustralia.org handling back

We've currently missed that in the openaustralia migration

Migrate all services off of octopus computing

Unfortunately free hosting, so generously donated by Octopus Computing for many years, is coming to an end. We need to migrate all of our services off of Octopus by the end of February.

This doesn't give us a huge amount of time, so there's clearly a limited amount of re-architecting that we can do. It makes sense to revive this project, which was to split out the different websites on to different VMs. That will loosen some of the dependencies between the sites and hopefully make it easier for volunteers to help maintain the infrastructure: a volunteer helping out with only one project won't need to be given access to everything. There's also less to learn and less that can go wrong, in theory at least.

Some principles on how we could do this:

Unless there's a very specific and clear reason keep all changes to the minimum (e.g. don't change versions of support software unless it's needed. Don't change the architecture just because we can)
Separate each site on to its own VM
Centralise the databases - this is different from the approach originally taken in this repo - so that the architecture is closer to using a managed database (which would be great if we could because running databases is no fun). Alternatively, if we don't go for a managed solution in the long run, it's easier to set up a replicated database in a leader/follower setup.
Manage servers using Ansible
Migrate each service off one by one (to spread out the load of any downtime) and get feedback on our decisions as quickly as possible.

Option to disable cron jobs

This should be there for all the sites so that during migration we can disable the cron jobs on the new server, migrate the database, disable cron jobs on the old server and then enable cron jobs on the new server.

Investigate package update options for Ansible servers

We're moving to boxes built using Ansible and one thing @mlandauer hasn't quite worked out is how to deal with package updates effectively.

Consider doing the AWS infrastructure setup with terraform

Ansible is kind of hacky (even more than usual) when it comes to setting up EC2 instances and the associated bits and bobs.

Would it make more sense to investigate whether something like Terraform is a more appropriate tool?

If possible make web request to ip address return the site too

Basically setup the default site for nginx

Disable honeybadger when running in development on vagrant

Switch theyvoteforyou over to RDS on ec2

Create script to generate SSL certificates for development

Currently we're generating them by hand, encrypting them and checking them into the repository.

This works but won't allow someone who doesn't have the ansible vault password to do local development which is less than ideal.

We don't want to check in the private key for the CA as that would allow anyone to sign a certificate for any domain, effectively allowing someone to man-in-the-middle any of my traffic (or anyone else who trusts that CA root certificate).

So, a much better solution would be for every individual developer to have their own unique root CA. So, let's write a script to do all the hard work.
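A rough sketch of what such a script might look like, using only the `openssl` command line tool. The function name `make_dev_cert`, the file names and the validity periods are illustrative assumptions, not something taken from this repository. It also assumes the system's default OpenSSL configuration marks certificates created with `req -x509` as CA certificates, which is the usual default.

```shell
# Create a per-developer root CA (once) and use it to sign a certificate
# for one development domain. Hypothetical helper; requires the openssl CLI.
make_dev_cert() (
  out_dir="$1"; domain="$2"
  mkdir -p "$out_dir" && cd "$out_dir" || exit 1

  # 1. Root CA key and self-signed root certificate (skipped if present).
  #    ca.key stays on the developer's machine and is never checked in.
  if [ ! -f ca.key ]; then
    openssl genrsa -out ca.key 2048 2>/dev/null || exit 1
    openssl req -x509 -new -key ca.key -sha256 -days 3650 \
      -subj "/CN=Local development CA" -out ca.pem || exit 1
  fi

  # 2. Private key and signing request for the development domain.
  openssl genrsa -out "$domain.key" 2048 2>/dev/null || exit 1
  openssl req -new -key "$domain.key" -subj "/CN=$domain" \
    -out "$domain.csr" || exit 1

  # 3. Sign the request with the root CA, adding the domain as a SAN.
  printf 'subjectAltName=DNS:%s\n' "$domain" > "$domain.ext"
  openssl x509 -req -in "$domain.csr" -CA ca.pem -CAkey ca.key \
    -CAcreateserial -out "$domain.crt" -days 825 -sha256 \
    -extfile "$domain.ext" || exit 1
)
```

For example, `make_dev_cert ~/dev-ca openaustralia.org.au.test` would leave `ca.pem` (the certificate to trust locally) alongside the `.key` and `.crt` for the site. Because each developer generates their own `ca.key`, no shared private key needs to live in the repository.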

Create OAF "certificate authority" and sign development SSL certs with it

We need to update the (self-signed) SSL certificates for this, so we might as well make things a little bit better. If we create a certificate authority and sign SSL certificates with it for development domains, then we only need to install the CA certificate, and all certificates signed by the CA will be trusted, which makes things much less of a hassle.

See here for some information on how to do this:
https://deliciousbrains.com/ssl-certificate-authority-for-local-https-development/

Handle redirection of php urls for theyvoteforyou

See this snippet from the apache setup on Kedumba:

# Turn off any php handling for this so that urls ending in .php get passed to Rails
    RemoveHandler .php
    php_flag engine off

We need to do something similar for nginx

Fix RSS generation

See the output of running the cron job morningupdate:

PHP Warning: Module 'newrelic' already loaded in Unknown on line 0
PHP Warning: Module 'newrelic' already loaded in Unknown on line 0
Start time: 2018-03-05 09:05:01 AEDT
Parsing from APH to XML and loading into the database
[DEPRECATION] requiring "RMagick" is deprecated. Use "rmagick" instead
PHP Warning: Module 'newrelic' already loaded in Unknown on line 0

parse-speeche: 0% | | ETA: --:--:--
parse-speeche: 50% |ooooooooooooooooooooo | ETA: 00:00:02
parse-speeche: 100% |oooooooooooooooooooooooooooooooooooooooooo| ETA: 00:00:00
parse-speeche: 100% |oooooooooooooooooooooooooooooooooooooooooo| Time: 00:00:05
PHP Warning: Module 'newrelic' already loaded in Unknown on line 0
Xapian indexing
PHP Warning: Module 'newrelic' already loaded in Unknown on line 0
xapian indexing debate 2018-02-05
xapian indexing lords 2018-02-05
xapian indexing debate 2018-02-06
xapian indexing lords 2018-02-06
xapian indexing debate 2018-02-07
xapian indexing lords 2018-02-07
xapian indexing debate 2018-02-08
xapian indexing lords 2018-02-08
xapian indexing debate 2018-02-12
xapian indexing lords 2018-02-12
xapian indexing debate 2018-02-13
xapian indexing lords 2018-02-13
xapian indexing debate 2018-02-14
xapian indexing lords 2018-02-14
xapian indexing debate 2018-02-15
xapian indexing lords 2018-02-15
xapian indexing debate 2018-02-26
xapian indexing debate 2018-02-27
xapian indexing debate 2018-02-28
xapian indexing debate 2018-03-01

Running rssgenerate
Can't locate XML/RSS.pm in @INC (you may need to install the XML::RSS module) (@INC contains: /srv/www/staging/releases/20180304001345/twfy/scripts/../../perllib /srv/www/staging/releases/20180304001345/twfy/scripts /etc/perl /usr/local/lib/x86_64-linux-gnu/perl/5.22.1 /usr/local/share/perl/5.22.1 /usr/lib/x86_64-linux-gnu/perl5/5.22 /usr/share/perl5 /usr/lib/x86_64-linux-gnu/perl/5.22 /usr/share/perl/5.22 /usr/local/lib/site_perl /usr/lib/x86_64-linux-gnu/perl-base .) at ./make_rss.pl line 9.
BEGIN failed--compilation aborted at ./make_rss.pl line 9.
PHP Warning: Module 'newrelic' already loaded in Unknown on line 0
PHP Notice: Undefined variable: returl in /srv/www/staging/releases/20180304001345/twfy/www/includes/easyparliament/page.php on line 889
PHP Notice: Undefined variable: returl in /srv/www/staging/releases/20180304001345/twfy/www/includes/easyparliament/page.php on line 911
...

And we can see in the apache logs that the RSS feed files are missing

Move over to rbenv for all rails projects

Something for the backlog. It would be good to update all rails projects to use rbenv rather than a mixture of rbenv and rvm.

Enforce ssl for openaustralia.org.au

Because it's the right thing to do...

Election Leaflets currently only set up for development

You need to configure the site address in the application configuration so we can't set it up transparently like PA or TVFY.

Move over CoreLogic (RP Data) weekly PlanningAlerts data feed

Separate servers for staging

Do we want varnish for theyvoteforyou?

Currently on kedumba it's a rather tortuous setup: SSL requests get served by apache which in turn makes plain http requests to varnish which in turn requests things from apache.

This is very horrible. Do we really need this?

Do we want to run ssh on non-standard port again by default?

It's certainly not security - just might reduce noise of random people knocking loudly on our door

Add cuttlefish.oaf.org.au to newrelic app monitoring

For some reason it's not there

Estimate cost of migrating to AWS using managed db (RDS)

Open-source server monitoring

Chances are we're not going to be able to use newrelic for all our server monitoring because we'll have too many servers. So, we'll probably have to set up our own monitoring infrastructure. That will be fun.

Make sure that mysql server is always setup to store databases (or is it tables?) in separate files

By default it does something unhelpful

Backups for RDS

This is a companion to #32

Add cron jobs for theyvoteforyou

These should be disabled by default in development and test (with the exception of backups - see #12) because otherwise bad things will happen

Setup Elastic IP address for theyvoteforyou instance

Right now we've got the DNS setup in a way that is dependent on the instances being around for a long time and maintaining their IP address.

As far as I'm aware there are two approaches we could take:

Use the Elastic Load Balancer
or use Elastic IP Addresses

I'm guessing (without looking) that having a separate IP address for each site is going to be too expensive and as far as I'm aware we can have several domains go through a single load balancer.

Alaveteli + Ansible + Vagrant = ERROR: Decryption failed

Hi,

Ansible n00b here trying to setup nuvasuparati.info with your Ansible + Alaveteli config. Now I am setting up the dev environment and I have an error. I made a clone and removed all non-foi stuff but I think I might be missing some local encrypted variables.

What am I doing wrong?

$ cat ~/.infrastructure_ansible_vault_pass.txt
secret
$ vagrant up righttoknow.org.au.dev
......
Cleaning up downloaded VirtualBox Guest Additions ISO...
==> righttoknow.org.au.dev: Checking for guest additions in VM...
==> righttoknow.org.au.dev: Checking for host entries
==> righttoknow.org.au.dev: adding to (/etc/hosts) : 192.168.10.10
righttoknow.org.au.dev  # VAGRANT: d92aac01fdc0306fab4d852efb4544a7
(righttoknow.org.au.dev) / ade91b15-c094-4d57-b495-c4cfc98c3a8b
[sudo] password for andrei:
==> righttoknow.org.au.dev: Setting hostname...
==> righttoknow.org.au.dev: Configuring and enabling network interfaces...
==> righttoknow.org.au.dev: Mounting shared folders...
    righttoknow.org.au.dev: /vagrant => ~/infrastructure
==> righttoknow.org.au.dev: Running provisioner: ansible...
PYTHONUNBUFFERED=1 ANSIBLE_FORCE_COLOR=true
ANSIBLE_HOST_KEY_CHECKING=false ANSIBLE_SSH_ARGS='-o
UserKnownHostsFile=/dev/null -o ControlMaster=auto -o
ControlPersist=60s' ansible-playbook
--private-key=~/infrastructure/.vagrant/machines/righttoknow.org.au.dev/virtualbox/private_key
--user=vagrant --connection=ssh --limit='righttoknow.org.au.dev'
--inventory-file=~/infrastructure/.vagrant/provisioners/ansible/inventory
--sudo -vv --skip-tags=dns site.yml
ERROR: Decryption failed
Ansible failed to complete successfully. Any error output should be
visible above. Please fix these errors and try again.

Thank you for publishing your infrastructure!

Test that everything works across reboots

run-with-lockfile.sh is missing, what am I doing wrong?

Hi,

For some reason this task fails. I cloned commonlib and ansible passed but what is the correct way to do it? I disabled non-alaveteli sites so this might be the cause.

#mkdir -p /srv/www/current/
#cd /srv/www/current/
#git clone https://github.com/mysociety/commonlib.git

I ran using my fork of infrastructure, the secret is.... well.... secret in ~/.infrastructure_ansible_vault_pass.txt :)

Add test.theyvoteforyou.org.au.dev

So that we'll actually have a server setup that we will use in production too.

Switch over theyvoteforyou deployment back to capistrano

I think we're currently using capistrano for deployments on kedumba for theyvoteforyou so we should use that for the new server setup too to make sure that we're not missing anything in the deployment.

Switch production/test SSL certificates over to Let's Encrypt

This is again to match how things are now done on kedumba

Figure out production requirements on ec2 for theyvoteforyou

Buy reserved instances for EC2 and RDS for the upcoming year

Remove backups

Once we've moved over to RDS (see #32) we won't need any backups for the server itself because everything of permanent value is stored in the database.

Stop applications from reporting to new relic without SSL

See https://rpm.newrelic.com/accounts/154944/ssl_upgrade

Do backups in development (vagrant) version too

We'll obviously need to ensure that the backups on S3 are namespaced by the machine name (as we've tended to do anyway).

By doing this we're more likely to ensure that the right thing is happening in production by being able to test everything in development.

Test ssl setup with ssllabs

Come up with a migration plan for theyvoteforyou and document it here

Setup for cuttlefish.oaf.org.au?

I think this is already being managed on a separate server by ansible. Need to check this and need to check that the DNS is being managed somewhere.

Test background jobs on theyvoteforyou

Add cuttlefish.oaf.org.au application to new relic monitoring

For some reason it's not there. The server is showing up though

Update DNS setup for theyvoteforyou

Spec servers

Work out what specs we need for each of these new servers:

Production

OpenAustralia.org.au

RAM: 2 GB
Disk: 80 GB

Latest gzipped MySQL backup: 269 MB * (30 days + 12 months + 52 weeks) = 25 GB
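The retention arithmetic used here (30 daily, 12 monthly and 52 weekly copies of the latest gzipped backup) can be reproduced with a small helper. The function name `backup_space_mb` is made up for illustration, and the sketch assumes every retained backup is the same size as the latest one.

```shell
# Estimate disk space (in MB) needed to keep 30 daily + 12 monthly +
# 52 weekly copies of one gzipped backup of the given size (hypothetical helper).
backup_space_mb() {
  awk -v size_mb="$1" 'BEGIN { printf "%.0f\n", size_mb * (30 + 12 + 52) }'
}

backup_space_mb 269   # OpenAustralia.org.au: prints 25286 (about 25 GB)
backup_space_mb 9.1   # They Vote For You: prints 855
```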

The running database is about 1.5 GB.

henare@kedumba:~$ du -sh /srv/www/www.openaustralia.org/
11G     /srv/www/www.openaustralia.org/
henare@kedumba:~$

That's just over 45 GB. Plus system and growth: 80 GB

OpenAustraliaFoundation.org.au

(what jamison already has)

RAM: 1 GB
Disk: 10 GB

PlanningAlerts

RAM: 3 GB
Disk: 80 GB

Disk

henare@kedumba:~$ du -h /srv/www/www.planningalerts.org.au/
6.7G    /srv/www/www.planningalerts.org.au/
henare@kedumba:~$ du -h /var/lib/automysqlbackup/*/pa_prod
3.2G    /var/lib/automysqlbackup/daily/pa_prod
3.0G    /var/lib/automysqlbackup/monthly/pa_prod
15G     /var/lib/automysqlbackup/weekly/pa_prod
henare@kedumba:~$

Up to 30 files could normally be in /var/lib/automysqlbackup/daily/pa_prod so I think a truer number is 15GB for this directory too (latest backup is 508 MB gzipped * 30 = ~15GB). That also makes the monthly directory 6 GB on the current backup size.

Adding up all the sizes from MySQL show table status; shows the database uses around 3.5 GB.

That's about 45GB. Plus system and growth: 80GB

Election Leaflets

RAM: 1 GB
Disk: 20 GB

Based on the latest backup database size, 6.8 MB, total required for automysqlbackup is 658 MB.

henare@kedumba:~$ du -sh /srv/www/www.electionleaflets.org.au/
11G     /srv/www/www.electionleaflets.org.au/
henare@kedumba:~$

Plus system and given we're not planning to add much to this: 20 GB.

Right To Know

RAM: 3 GB
Disk: 30 GB

henare@kedumba:~$ sudo du -sh /srv/www/www.righttoknow.org.au/
13G     /srv/www/www.righttoknow.org.au/
henare@kedumba:~$

postgres=# SELECT pg_size_pretty(pg_database_size('alaveteli_production'));
 pg_size_pretty
----------------
 101 MB
(1 row)

postgres=#

On the new server we'll also have autopostgresbackup so that'll take some space.

This could have quite a bit of growth soon. Let's say 30GB.

They Vote For You

RAM: 2 GB
Disk: 10 GB

Latest gzipped MySQL backup 9.1 MB, automysqlbackup should need 855 MB.

henare@kedumba:~$ du -sh /srv/www/theyvoteforyou.org.au/
3.3G    /srv/www/theyvoteforyou.org.au/
henare@kedumba:~$

morph.io

(existing Linode)

RAM: 4 GB
Disk: 95 GB (62 used)

cuttlefish.oaf.org.au