
MSM Health Equity Tracker Backend

Codebase for Health Equity Tracker.


Contributing

To contribute to this project:

  1. Fork the repository on github
  2. On your development machine, clone your forked repo and add the official repo as a remote.
    • Tip: by convention, the official repo is added with the name upstream. This can be done with the command git remote add upstream git@github.com:SatcherInstitute/<repo>.git
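
For example (a sketch; replace <your-username> and <repo> with real values):

    # Clone your fork, then add the official repo as "upstream"
    git clone git@github.com:<your-username>/<repo>.git
    cd <repo>
    git remote add upstream git@github.com:SatcherInstitute/<repo>.git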

When you're ready to make changes:

  1. Pull the latest changes from the official repo.
    • Tip: If your official remote is named upstream, run git pull upstream master
  2. Create a local branch, make changes, and commit to your local branch. Repeat until changes are ready for review.
  3. [Optional] Rebase your commits so you end up with a few commits that have clear commit messages.
  4. Push your branch to your remote fork, use the github UI to open a pull request (PR), and add reviewer(s).
  5. Push new commits to your remote branch as you respond to reviewer comments.
    • Note: once a PR is under review, don't rebase changes you've already pushed to the PR. This can confuse reviewers.
  6. When ready to submit, use the "Squash and merge" option. This maintains linear history and ensures your entire PR is merged as a single commit, while being simple to use in most cases. If there are conflicts, pull the latest changes from master, merge them into your PR, and try again.

Note that there are a few downsides to "Squash and merge":

  • The official repo's history will not show individual commits from collaborators if the PR was developed on a collaborative branch.
  • Working off the same branch or a dependent branch duplicates commits on the dependent branch and can cause repeated merge conflicts. To work around this, if you have a PR my_branch_1 and you want to start work on a new PR that is dependent on my_branch_1, you can do the following:
    1. Create a new local branch my_branch_2 based on my_branch_1. Continue to develop on my_branch_2.
    2. If my_branch_1 is updated (including by merging changes from master), switch to my_branch_2 and run git rebase -i my_branch_1 to incorporate the changes into my_branch_2 while maintaining the branch dependency.
    3. When review is done, squash and merge my_branch_1. Don't delete my_branch_1 yet.
    4. From local client, go to master branch and pull from master to update the local master branch with the squashed change.
    5. From local client, run git rebase --onto master my_branch_1 my_branch_2. This tells git to move all the commits between my_branch_1 and my_branch_2 onto master. You can now delete my_branch_1.
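
The sequence above, condensed into commands (a sketch using the same branch names):

    # 1. Branch off my_branch_1 and develop on my_branch_2
    git checkout -b my_branch_2 my_branch_1
    # 2. After my_branch_1 is updated, rebase my_branch_2 onto it
    git checkout my_branch_2
    git rebase -i my_branch_1
    # 3-4. After my_branch_1 is squashed and merged, update local master
    git checkout master
    git pull upstream master
    # 5. Move my_branch_2's commits onto master; my_branch_1 can then be deleted
    git rebase --onto master my_branch_1 my_branch_2
    git branch -D my_branch_1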

Read more about the forking workflow here. For details on "Squash and merge", see here.

One-time development setup

Install Cloud SDK (Quickstart)
Install Terraform (Getting started)
Install Docker Desktop (Get Docker)

gcloud config set project <project-id>

Testing

Unit tests can be run using pytest. Running pytest will recursively look for and execute test files.

pip install pytest
pytest

To test from the packaged version of the ingestion library, run pip install -e python/ingestion before testing.

Python environment setup

  1. Create a virtual environment in your project directory, for example: python3 -m venv .venv
  2. Activate the venv: source .venv/bin/activate
  3. Install pip-tools and other packages as needed: pip install pip-tools
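
Run together, the setup looks like:

    # Create and activate a virtual environment, then install tooling
    python3 -m venv .venv
    source .venv/bin/activate
    pip install pip-tools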

Testing Pub/Sub triggers

To test a Cloud Run service triggered by a Pub/Sub topic, run:

gcloud pubsub topics publish projects/<project-id>/topics/<your_topic_name> --message "your_message" --attribute=KEY1=VAL1,KEY2=VAL2

See Documentation for details.

Shared python code

Most python code should go in the /python directory, which contains packages that can be installed into any service. Each sub-directory of /python is a package with an __init__.py file, a setup.py file, and a requirements.in file. Shared code should go in one of these packages. If a new sub-package is added:

  1. Create a folder /python/<new_package>. Inside, add:

    • An empty __init__.py file
    • A setup.py file with options: name=<new_package>, package_dir={'<new_package>': ''}, and packages=['<new_package>']
    • A requirements.in file with the necessary dependencies
  2. For each service that depends on /python/<new_package>, follow instructions at Adding an internal dependency
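
A sketch of the scaffolding in step 1 (run from the repo root; <new_package> is a placeholder):

    # Scaffold the package directory
    mkdir -p python/<new_package>
    touch python/<new_package>/__init__.py python/<new_package>/requirements.in
    # Then create python/<new_package>/setup.py with the options from step 1:
    #   setup(name='<new_package>', package_dir={'<new_package>': ''}, packages=['<new_package>'])
    # and install it locally:
    pip install ./python/<new_package>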

To work with the code locally, run pip install ./python/<package> from the root project directory. If your IDE complains about imports after changing code in /python, re-run pip install ./python/<package>.

Adding a new root-level python directory

Note: generally this should only be done for a new service. Otherwise, please add python code to the python/ directory.

When adding a new root-level python directory, be sure to update .github/workflows/linter.yml to ensure the directory is linted and type-checked.

Adding python dependencies

Adding an external dependency

  1. Add the dependency to the appropriate requirements.in file.

    • If the dependency is used by /python/<package>, add it to the /python/<package>/requirements.in file.
    • If the dependency is used directly by a service, add it to the <service_directory>/requirements.in file.
  2. For each service that needs the dependency (for deps in /python/<package> this means every service that depends on /python/<package>):

    • Run cd <service_directory>, then pip-compile requirements.in where <service_directory> is the root-level directory for the service. This will generate a requirements.txt file.
    • Run pip install -r requirements.txt to ensure your local environment has the dependencies, or run pip install <new_dep> directly. Note: you'll first need to have completed the Python environment setup described above.
  3. Update the requirements.txt for unit tests
    pip-compile python/tests/requirements.in -o python/tests/requirements.txt
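
For example, adding a hypothetical dependency requests to a shared package and regenerating the pinned files:

    # Add the dep to the package's requirements.in
    echo "requests" >> python/<package>/requirements.in
    # Regenerate requirements.txt for each dependent service
    cd <service_directory>
    pip-compile requirements.in
    pip install -r requirements.txt
    cd ..
    # Update the unit test requirements
    pip-compile python/tests/requirements.in -o python/tests/requirements.txt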

Adding an internal dependency

If a service adds a dependency on /python/<some_package>:

  • Add -r ../python/<some_package>/requirements.in to the <service_directory>/requirements.in file. This will ensure that any deps needed for the package get installed for the service.
  • Follow step 2 of Adding an external dependency to generate the relevant requirements.txt files.
  • Add the line RUN pip install ./python/<some_package> to <service_directory>/Dockerfile
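
Putting those steps together (a sketch; <some_package> and <service_directory> are placeholders):

    # Reference the package's deps from the service (run from the repo root)
    echo "-r ../python/<some_package>/requirements.in" >> <service_directory>/requirements.in
    # Regenerate and install the pinned deps
    cd <service_directory>
    pip-compile requirements.in
    pip install -r requirements.txt
    # Finally, add this line to <service_directory>/Dockerfile:
    #   RUN pip install ./python/<some_package>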

Launch the data ingestion pipeline on your local machine

Set up

  • Install Docker
  • Install Docker Compose
  • Set environment variables
    • PROJECT_ID
    • GCP_KEY_PATH (See documentation on creating and downloading keys.)
    • DATASET_NAME
    • GCS_LANDING_BUCKET
    • GCS_MANUAL_UPLOADS_BUCKET
    • MANUAL_UPLOADS_DATASET
    • MANUAL_UPLOADS_PROJECT
    • EXPORT_BUCKET
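
For example, in your shell (all values below are illustrative placeholders):

    export PROJECT_ID=<project-id>
    export GCP_KEY_PATH=/path/to/key.json
    export DATASET_NAME=my_dataset
    export GCS_LANDING_BUCKET=my-landing-bucket
    export GCS_MANUAL_UPLOADS_BUCKET=my-manual-uploads-bucket
    export MANUAL_UPLOADS_DATASET=my_manual_uploads_dataset
    export MANUAL_UPLOADS_PROJECT=<project-id>
    export EXPORT_BUCKET=my-export-bucket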

Getting Started

From inside the airflow/dev/ directory:

  1. Build the Docker containers

    make build

  2. Stand up the multi-container environment

    make run

  3. At the UI link below, you should see the list of DAGs pulled from the dags/ folder. These files will automatically update the Airflow webserver when changed.

  4. To run them manually, select the desired DAG, toggle it to On, and click Trigger Dag.

  5. When finished, turn down the containers

    make kill

More info on Apache Airflow in general.

Airflow UI link

Developing locally with BigQuery

To upload to BigQuery from your local development environment, use these setup directions with an experimental Cloud project. This may be useful when iterating quickly if your Cloud Run ingestion job isn’t able to upload to BigQuery for some reason such as JSON parsing errors.

Deploying your own instance with terraform

Before deploying, make sure you have installed Terraform and a Docker client (e.g. Docker Desktop). See Set up above.

  1. Create your own terraform.tfvars file in the same directory as the other terraform files. For each variable declared in prototype_variables.tf that doesn't have a default, add your own value for testing. Typically your values should be unique; prefixing them with your name or LDAP works well. Some variables have specific requirements, such as project ids, code paths, and image paths.

  2. Configure Docker to use credentials through gcloud: gcloud auth configure-docker

  3. On the command line, navigate to your project directory and initialize terraform.

    cd path/to/your/project
    terraform init
  4. Build and push your Docker images to Google Container Registry. This step uses the run_ingestion service as an example, but you will need to repeat this step for any service you've made changes to.

    • Select any unique identifier for your-ingestion-image-name.
    • Run:
    # Build the images locally
    docker build -t gcr.io/<project-id>/<your-ingestion-image-name> -f run_ingestion/Dockerfile .
    
    # Upload the image to Google Container Registry
    docker push gcr.io/<project-id>/<your-ingestion-image-name>
    • Note that the frontend docker build command must append: --build-arg="DEPLOY_CONTEXT=development"
  5. Deploy via Terraform.

    # Get the latest image digests
    export TF_VAR_ingestion_image_name=$(gcloud container images describe gcr.io/<project-id>/<your-ingestion-image-name> \
    --format="value(image_summary.digest)")
    # ... repeat for every service that was re-built and pushed in step 4.
    
    # Switch to the config directory to deploy to terraform
    cd config
    
    # Deploy via terraform, providing the paths to the latest images so it knows to redeploy
    # Append the appropriate environment variables for each service that was re-built and pushed in step 4.
    terraform apply -var="ingestion_image_name=<your-ingestion-image-name>@$TF_VAR_ingestion_image_name"

    Alternatively, if you aren't familiar with bash or are on Windows, you can run the above gcloud container images describe commands manually and copy/paste the output into your tfvars file for the ingestion_image_name and gcs_to_bq_image_name variables.

  6. To redeploy, e.g. after making changes to a Cloud Run service, repeat steps 4-5. Make sure you run the docker commands from your base project dir and the terraform commands from the config/ directory.

Terraform deployment notes

Terraform doesn't automatically diff the contents of Cloud Run services, so simply calling terraform apply after making code changes won't upload your new changes. This is why steps 4 and 5 are needed above. Here is an alternative:

Use terraform taint to mark a resource as requiring redeploy, e.g. terraform taint google_cloud_run_service.ingestion_service. You can then set the ingestion_image_name variable in your tfvars file to <your-ingestion-image-name> and gcs_to_bq_image_name to <your-gcs-to-bq-image-name>. Then replace step 5 above with just terraform apply. Step 4 is still required.
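
For example (a sketch; resource and variable names as referenced above):

    # Mark the service for redeploy
    terraform taint google_cloud_run_service.ingestion_service
    # With ingestion_image_name and gcs_to_bq_image_name set in terraform.tfvars:
    terraform apply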

Accessing the Airflow UI Deployed by Terraform

  1. Go to Cloud Console.

  2. Search for Composer

  3. A list of environments should be present. Look for data-ingestion-environment

  4. Click into the details, and navigate to the environment configuration tab.

  5. One of the properties listed is Airflow web UI link.

Test and Production Environments

A note on Airflow DAGS

All files in the airflow/dags directory will be uploaded to the test airflow environment. Please only put DAG files in this directory.

frontend

The frontend consists of

  1. health-equity-tracker/frontend/: A React app that contains all code and static resources needed in the browser (html, JS, CSS, images). This app was bootstrapped with Create React App. Documentation on Create React App can be found here.
  2. health-equity-tracker/frontend_server/: A lightweight server that serves the React app as static files and forwards data requests to the data server.
  3. health-equity-tracker/data_server/: A data server that responds to data requests by serving data files that have been exported from the data pipeline.

In addition, we have a Storybook project that also lives in health-equity-tracker/frontend/. Storybook is a library that allows us to explore and develop UI components in isolation. Stories for each UI component are contained in the same directory as the component in a subfolder called "storybook". The current master branch version of Storybook can be seen here: https://het-storybook.netlify.app

Frontend React App Environments

The frontend React App runs in different environments. We use configuration files (frontend/.env.prod, frontend/.env.staging, etc) to control settings in different environments. These include things like the data server URL and logging settings. These can be overridden for local development using a frontend/.env.development file.

Running just the React App locally

One Time Setup

Switch to the frontend/ directory, then install dependencies using NPM.

Note: you will need compatible versions of Node.js and npm installed locally. See the "engines" field in frontend/package.json for the required/minimum versions of each. It's recommended to use Node Version Manager (nvm) if you need multiple versions of Node.js/npm installed on your machine.

cd frontend && npm install

Trouble-shooting Install

If you encounter errors during install that mention gyp, that refers to a Node.js native addon build tool that is required for some modules. Follow the instructions on the gyp github repo for installation and setting up required dependencies (e.g. Python and certain build tools like Xcode Command Line Tools for macOS).

Running the React App

Since the frontend is a static site that just connects to an API for data requests, most frontend development happens independently of server-side changes. If you're only changing client-side behavior, you only need to run the React App. The simplest way to do this is to connect the frontend to the test website server. First, copy frontend/.env.example into frontend/.env.development. This file is already set up to point to the test website server.

To start a local development server, switch to the frontend/ directory and run:

npm run start:development

The site should now be visible at http://localhost:3000. Any changes to source code will cause a live reload of the site.

Note: you can also run npm start without a .env.development file. This will read environment variables from your terminal.

Note: when new environment variables are added, be sure to update the .env.example file so developers can reference it for their own .env.development files.

Available Overrides for local development

Environment variables in frontend/.env.development can be tweaked as needed for local development.

The REACT_APP_BASE_API_URL can be changed for different setups:

  • You can deploy the frontend server to your own GCP project
  • You can run the frontend server locally (see below)
  • You can run Docker locally (see below)
  • You can set it to an empty string or remove it to make the frontend read files from the /public/tmp directory. This allows testing behavior by simply dropping local files into that directory.

You can also force specific dataset files to be read from the /public/tmp directory by setting the REACT_APP_FORCE_STATIC environment variable to a comma-separated list of filenames. For example, REACT_APP_FORCE_STATIC=my_file1.json,my_file2.json would force my_file1.json and my_file2.json to be served from /public/tmp even if REACT_APP_BASE_API_URL is set to a real server url.
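
For example, a frontend/.env.development pointing at a locally running frontend server while forcing two files to be served statically might look like (values are illustrative):

    REACT_APP_BASE_API_URL=http://localhost:8080
    REACT_APP_FORCE_STATIC=my_file1.json,my_file2.json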

Running the Frontend Server locally

If you need to run the frontend server locally to test server-side changes, copy frontend_server/.env.example into frontend_server/.env.development, and update DATA_SERVER_URL to point to a specific data server url, similar to above.

To run the frontend server locally, navigate to the frontend_server/ directory and run:

node -r dotenv/config server.js dotenv_config_path=.env.development

This will start the server at http://localhost:8080. However, since it mostly serves static files from the build/ directory, you will either need to

  1. run the React app separately and set its REACT_APP_BASE_API_URL to http://localhost:8080 (see above), or
  2. go to the frontend/ directory and run npm run build:development. Then copy the frontend/build/ directory to frontend_server/build/
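
Option 2 as commands, run from the root project directory (a sketch):

    # Build the React app, then copy the output into the frontend server
    cd frontend
    npm run build:development
    cp -r build ../frontend_server/build
    cd ..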

Similarly to the frontend React app, the frontend server can be configured for local development by changing environment variables in frontend_server/.env.development. Copy frontend_server/.env.example to get started.

Running the Frontend Server with Docker locally

If you need to test Dockerfile changes or run the frontend in a way that more closely mirrors the production environment, you can run it using Docker. This will build both the frontend React app and the frontend server.

Run the following commands from the root project directory:

  1. Build the frontend Docker image: docker build -t <some-identifying-tag> -f frontend_server/Dockerfile . --build-arg="DEPLOY_CONTEXT=development"
  2. Run the frontend Docker image: docker run -p 49160:8080 -d <some-identifying-tag>
  3. Navigate to http://localhost:49160.

When building with Docker, changes will not automatically be applied; you will need to rebuild the Docker image.

Running the Frontend Server in your own GCP project

Refer to Deploying your own instance with terraform for instructions on deploying the frontend server to your own GCP project.

Running Storybook locally

To run storybook locally, switch to the frontend/ directory and run:

npm run storybook:development

Storybook local development also uses frontend/.env.development for configuration. However, storybook environment variables must start with STORYBOOK_ instead of REACT_APP_. Most environment variables have an equivalent STORYBOOK_ version.

Tests

To run unit tests, switch to the frontend/ directory and run:

npm test

This will run tests in watch mode, so you can keep tests running while you develop.

Build

To create a "production" build do:

npm run build:${DEPLOY_CONTEXT}

This will use the frontend/.env.${DEPLOY_CONTEXT} file for environment variables and outputs bundled files in the frontend/build/ directory. These are the files that are used for hosting the app in production environments.

Ejecting Create React App

Note: this is a one-way operation. Once you eject, you can’t go back!

Don't do this unless there's a strong need to. See https://create-react-app.dev/docs/available-scripts/#npm-run-eject for further information.

License

MIT
