councildataproject / cookiecutter-cdp-deployment Goto Github PK

View Code? Open in Web Editor NEW

26.0 4.0 9.0 8.14 MB

Cookiecutter template for creating new CDP instances.

License: Mozilla Public License 2.0

Python 39.42% HTML 25.09% JavaScript 6.36% TeX 26.44% Just 2.69%

cookiecutter-template cdp-deployments civic-tech open-government local-government government-data

cookiecutter-cdp-deployment's Introduction

cookiecutter-cdp-deployment

Cookiecutter template for creating new Council Data Project deployments.

Council Data Project

Council Data Project is an open-source project dedicated to providing journalists, activists, researchers, and all members of each community we serve with the tools they need to stay informed and hold their Council Members accountable.

For more information about Council Data Project, please visit our website.

About

This repository is a "cookiecutter template" for an entirely new Council Data Project (CDP) Instance. By following the steps defined in the Usage section, our tools will create and manage all the database, file storage, and processing infrastructure needed to serve the CDP web application.

While our tools will setup and manage all processing and storage infrastructure, you (or your team) must provide and maintain the custom Python code to gather event information and handle billing for the costs of the deployment.

For more information about costs and billing, see Cost.

CDP Instance Features

Plain text search of past events and meeting items
(search for "missing middle housing" or "bike lanes")
Filter and sort event and meeting item search results
(filter by date range, committee, etc.)
Automatic timestamped-transcript generation
(jump right to a specific public comment or debate)
Meeting item and amendment tracking
(check for amendment passage, upcoming meetings, etc.)
Share event at timepoint
(jump right to the point in the meeting you want to share)
Full event minutes details
(view all documents and presentations related to each event)

See the current Seattle CDP Instance for a live example.

Note: Some features are dependent on how much data is provided during event gather. More information see our ingestion models documentation.

Usage

Regardless of your deployment strategy, you may find reading the Things to Know section helpful prior to deployment.

Note: while this cookiecutter will help you setup a repository and CDP infrastructure, you will still need to write your own custom data ingestion function. Writing a basic data ingestion function ranges from taking a couple of hours to a couple of days depending on how much data you want to provide to our system.

Deploying Under the councildataproject.org Domain

If you want your deployment under the councildataproject.org domain (i.e. https://councildataproject.org/seattle), you will need to fill out the "New Instance Deployment" Issue Form.

The Council Data Project team will help you along in the process on the issue from there.

Deploying Under Your Own Domain

If you want to host your deployment under a different domain (i.e. Your-Org-Name.github.io/your-municipality), you will need to install cookiecutter and use this template.

Follow along with the video walkthrough

Before you begin, please note that you will need to install or have available the following:

gcloud
gsutil
Python 3.10+ (Any Python version greater than or equal to 3.10)

Once all tools are installed, the rest of the infrastructure setup process should take an hour or two.

In a terminal with Python 3.10+ installed:

pip install cookiecutter
cookiecutter gh:CouncilDataProject/cookiecutter-cdp-deployment

Follow the prompts in your terminal and fill in the details for the instance deployment. At the end of the process a new directory will have been created with all required files and further instructions to set up your new deployment.

For more details and examples on each parameter of this cookiecutter template, see Cookiecutter Parameters.

Follow the steps in the "Initial Repository Setup" section of the README.md file within the generated SETUP directory.

For more details on what is created from using this cookiecutter template, see Cookiecutter Repo Generation.

The short summary of setup tasks remaining are:

The creation of a new GitHub repository for the instance.
Logging in or creating an account for Google Cloud.
Initialize the basic infrastructure.
Assign a billing account to the created Google Cloud project.
Generate credentials for the Google Project for use in automated scripts.
Attach credentials as secrets to the GitHub repository.
Push the cookiecutter generated files to the GitHub repository.
Setup web hosting through GitHub Pages.
Enable open access for data stored by Google Cloud and Firebase.
Write a data ingestion function for your municipality (it may be useful to build off of cdp-scrapers).

You can also see an example generated repository and the full steps listed here.

Cookiecutter Parameters

Parameter	Description	Example 1	Example 2
municipality	The name of the municipality (town, city, county, etc.) that this CDP Instance will store data for.	Seattle	King County
iana_timezone	The IANA Timezone string of the municipality that this CDP instance is for.	America/Los_Angeles	America/Chicago
governing_body_type	What type of governing body this instance is for.	city council	county council
municipality_slug	The name of the municipality cleaned for use in the web application and parts of repository naming.	seattle	king-county
python_municipality_slug	The name of the municipality cleaned for use in specifically Python parts of the application.	seattle	king_county
infrastructure_slug	The name of the municipality cleaned for use in specifically application infrastructure. Must be globally unique to GCP.	cdp-seattle-abasjkqy	cdp-king-county-uiqmsbaw
maintainer_or_org_full_name	The full name of the primary maintainer or organization that will be managing this instance deployment.	Eva Maxfield Brown	Council Data Project
hosting_github_username_or_org	The GitHub username or organization that will host this instance's repository. (Used in the web application's domain name)	evamaxfield	CouncilDataProject
hosting_github_repo_name	A specific name to give to the repository. (Used in the web application's full address)	cdp-seattle	king-county
hosting_github_url	From the provided information, the expected URL of the GitHub repository.	https://github.com/evamaxfield/cdp-seattle	https://github.com/CouncilDataProject/king-county
hosting_web_app_address	From the provided information, the expected URL of the web application.	https://evamaxfield.github.io/cdp-seattle	https://councildataproject.org/king-county
firestore_region	The desired region to host the firestore instance. (Firestore docs)	us-west1	europe-central2
event_gather_timedelta_lookback_days	The number of days to look back from the current date every time the event scraper runs.	2	6
event_gather_cron	The event gather CRON configuration. (GitHub Actions CRON Details)	26 0,6,12,18 * * *	17 3,9,15,21 * * *
event_gather_runner_timeout_minutes	Minutes to wait before creating a CML runner attempt will fail.	15	16
event_gather_runner_max_attempts	Number of times to attempt to create a CML runner.	4	36
event_gather_runner_retry_wait_seconds	Number of seconds to wait between CML runner create attempts.	600	600

Things to Know

Much of Council Data Project processing and resource management can be handled for free and purely on GitHub. However we do rely on a select few resources outside of GitHub to manage all services and applications.

The only service that will require a billing account to manage payment for resources used, is Google Cloud. Google Cloud will manage all databases, file storage, and heavy-compute such as speech-to-text for transcription. You can see more about the average monthly cost of running a CDP Instance in Cost.

For more details see Cookiecutter Repo Generation. After creating the repo, the following steps will have instructions and links specific to your deployment in the generated repository's README.

Cookiecutter Repo Generation

Cookiecutter is a Python package to generate templated projects. This repository is a template for cookiecutter to generate a CDP deployment repository which contains following:

A directory structure for your project
A directory for your web application to build and deploy from
A directory for infrastructure management
A directory for your Python event gather function and it's requirements
Continuous integration
- Preconfigured for your web application to fully deploy
- Preconfigured to deploy all required CDP infrastructure
- Preconfigured to run CDP pipelines using GitHub Actions

To generate a new repository from this template, in a terminal with Python 3.5+ installed, run:

pip install cookiecutter
cookiecutter gh:CouncilDataProject/cookiecutter-cdp-deployment

Note: This will only create the basic repository. You will still need to set up a Google Cloud account.

Google Cloud

All of your deployments data and some data processing will be done using Google Cloud Platform (GCP).

Your deployment's provided and generated data (meeting dates, committee names, councilmember details, etc) will live in Firestore.
Your deployment's generated files (audio clips, transcripts, etc.) will live in Filestore.
The audio from the provided video will be processed using Whisper on Google Compute Engine.
- We additionally use Faster-Whisper.

Cost

CDP was created and maintained by a group of people working on it in their free time. We didn't want to pay extreme amounts of money so why should you?

To that end, we try to make CDP as low cost as possible. Many of the current features are entirely free as long as the repo is open source:

Free Resources and Infrastructure:

Event Processing (GitHub Actions)
Event and Legislation Indexing (GitHub Actions)
Web Hosting (GitHub Pages)

The backend resources and processing are the only real costs and depend on usage. The more users that use your web application, the more the database and file storage cost. The CDP-Seattle monthly averages below are for the most utilized months of its existence so take these as close to upper-bounds.

Billed Resources and Infrastructure:

Cloud Firestore Pricing CDP-Seattle monthly average: ~$40.00
Google Storage Pricing CDP-Seattle monthly average: ~$1.00
Google Compute Engine Pricing (Using n1-standard-4 in us-central) CDP-Seattle monthly average: ~$20.00

Total Average Monthly Cost: $61.00

This is the ongoing cost of storing new meetings as they occur once your instance is deployed. You may have an additonal upfront cost if you are seeding your database with older videos and using speech-to-text to transcribe them.

Future Processing Features

As we add more features to CDP that require additional processing or resources we will continue to try to minimize their costs wherever possible. Further, if a feature is optional, we will create a flag that maintainers can set to include or exclude the additional processing or resource usage. See Upgrades and New Features for more information.

Upgrades and New Features

In general, all upgrades, bugfixes, new features, and more will be delivered to your CDP repository via Dependabot.

After releasing a new version of cdp-backend or cdp-frontend, GitHub and Dependabot will automatically create a pull request to your instance repository which updates the version requirements of the pipelines, infrastructure, and/or web application.

These pull requests will contain the release notes for the each version that it upgrades through, i.e. if it upgrades from 3.0.7 to 3.0.9, it will contain the release notes for both 3.0.8 and 3.0.9. This should help you as a maintainer understand what each upgrade is fixing or adding.

An example of such an automated pull request can be seen here.

Finally, in the case that an upgrade requires some additional work for the maintainer, i.e. "regenerate the latest cookiecutter," or "run this script" -- we will explicitly say so in our release notes. Those additional tasks are usually quite simple we just haven't fully automated them yet.

An example of why we may ask for the maintainer to run a script after merging, would be to backfill the data needed for a new feature. For example, if we update our data model to allow for some new feature, data moving forward may be fine but data from the past will be missing values and it may be optional but recommended to run the backfill script to have the new feature available for all historical data.

Citation

If you have found CDP software, data, or ideas useful in your own work, please consider citing us:

Brown et al., (2021). Council Data Project: Software for Municipal Data Collection, Analysis, and Publication. Journal of Open Source Software, 6(68), 3904, https://doi.org/10.21105/joss.03904

@article{Brown2021,
  doi = {10.21105/joss.03904},
  url = {https://doi.org/10.21105/joss.03904},
  year = {2021},
  publisher = {The Open Journal},
  volume = {6},
  number = {68},
  pages = {3904},
  author = {Eva Maxfield Brown and To Huynh and Isaac Na and Brian Ledbetter and Hawk Ticehurst and Sarah Liu and Emily Gilles and Katlyn M. f. Greene and Sung Cho and Shak Ragoler and Nicholas Weber},
  title = {{Council Data Project: Software for Municipal Data Collection, Analysis, and Publication}},
  journal = {Journal of Open Source Software}
}

License

MIT

cookiecutter-cdp-deployment's People

Contributors

Stargazers

Watchers

Forkers

tohuynh isometimescode isaacna chainsawriot nniiicc smai-f rajataggarwal91 whargrove gregoryfoster

cookiecutter-cdp-deployment's Issues

[Instance]: Denver

Municipality Name

Denver

Governing Body Type

city council

Maintainer GitHub Name

JacksonMaxfield

Legistar Client Id

denver

Municipality Timezone

America/Denver

Municipality Slug

No response

Firestore Region

No response

Code of Conduct

I agree to follow this project's Code of Conduct

Create Github Action Bot 2 for CDP Instance Creation

Use Case

Step 3 of the automated process to optimize the CDP instance creation

Solution

Github action bot that creates the repo

Triggered by comment we add when we see
- bot 1 success message and
- user comment choosing to proceed
Adds files to hosting repo
Adds maintainer as collaborator to CDP repo
Adds success comment when completed with hosting URL

Update event processing Github workflow

Use Case

CDP backend had a large refactor, and one of the changes was that the inputs to the event processing script changed. The arguments to the script now include a config file and multi-threading toggle.

Solution

We should update the github workflow to reflect the above changes.

Create Github Issue Template for CDP Instance Creation

Use Case

Step 1 of the automated process for optimizing the CDP instance creation

Solution

Github issue template with the fields the user must fill out in order for us to kick off the process

Info about prerequisites: How they plan to pay for the cost of the instance
Link to documentation about how to figure out if your city uses Legistar + where to find Legistar ID
Required field: Municipality

Current process also asks for municipality slug and python municipality slug but we can combine all those into this single field
Required field: Maintainer name
Required field: Hosting Github repo name
Current process also asks for Github URL but we can combine the username and repo fields to get the URL
Optional field: Legistar client ID

[Instance]: Seattle

Municipality Name

Seattle

Governing Body Type

city council

Maintainer GitHub Name

JacksonMaxfield

Legistar Client Id

seattle

Municipality Timezone

America/Los_Angeles

Municipality Slug

No response

Firestore Region

No response

Code of Conduct

I agree to follow this project's Code of Conduct

[Instance]: Philadelphia

Municipality Name

Philadelphia

Governing Body Type

city council

Maintainer GitHub Name

JacksonMaxfield

Legistar Client Id

phila

Municipality Timezone

America/New_York

Municipality Slug

No response

Firestore Region

No response

Code of Conduct

I agree to follow this project's Code of Conduct

Update instance workflows with new Google Credentials Action

All of our instances need to have some workflows upgraded, every GitHub Action that uses google creds setup has this deprecation warning:

 Warning: google-github-actions/setup-gcloud is pinned at "master". We strongly advise against pinning to "@master" as it may be unstable. Please update your GitHub Action YAML from:

    uses: 'google-github-actions/setup-gcloud@master'

to:

    uses: 'google-github-actions/setup-gcloud@v0'

Alternatively, you can pin to any git tag or git SHA in the repository.
/usr/bin/tar xz --warning=no-unknown-keyword --overwrite -C /home/runner/work/_temp/687e7ca6-695c-41a1-a812-811c67aed9da -f /home/runner/work/_temp/2267538c-ec54-464b-bc5b-4ad56fbd0a73
Successfully set default project


Warning: "service_account_key" has been deprecated. Please switch to using google-github-actions/auth which supports both Workload Identity Federation and Service Account Key JSON authentication. For more details, see https://github.com/google-github-actions/setup-gcloud#authorization

So this is both, upgrade all of our setup-gcloud to @v0 and move to using a different JSON / service acount key auth

Add Process Special Event Workflow

Feature Description

A clear and concise description of the feature you're requesting.

Ability to process a file / video entirely using GitHub Actions.

Use Case

Please provide a use case to help us understand your request in context.

No need to install or setup cdp-backend or other credentials, anyone with admin creds to the repo actions could trigger a job to process a file (special event, forum, debate, etc.) this would make it easy for multiple people to help add events to the system.

Solution

Please describe your ideal solution.

@isaacna added a script to cdp-backend that allows for processing a local file (debate, forum, special event, etc) and storing to infrastructure.

Add a GitHub Action that utilizes the bin script but also downloads the remote file to the local runner before hand. As @isaacna's script assumes the file is local to begin with iirc.

Similar to other scripts, for the job to be available for use it must also have a dummy "run on push" trigger just to make it available in the GitHub UI.

Alternatives

Please describe any alternatives you've considered, even if you've dismissed them.

Update "receiving backend and frontend updates" section of README

I think the README docs are okay in general but can be improved for sure. Would be great to talk to recent additions to the scraper dev team and see what does and doesn't make sense in the README.

I know specifically we should update the receiving updates part of the README because we have dependabot working really well now.

Add option to choose infrastructure server locations

Use Case

Please provide a use case to help us understand your request in context

Currently the CDPStack object in the infra directory defaults to using us-west2 for all infrastructure creation but it would be nice if we added a cookiecutter option with a list of server locations or user enters their own.

Solution

Please describe your ideal solution

Add a cookiecutter list option for: infrastructure_location with options such as:

[
    "us-west1",
    "us-west2",
    "us-east1",
]

etc.

Will need to pass the chosen option down to the CDPStack object.
List of Firestore supported regions

Alternatives

Please describe any alternatives you've considered, even if you've dismissed them

[Instance]: Jacksonville

Municipality Name

Jacksonville

Governing Body Type

city council

Maintainer GitHub Name

evamaxfield

Legistar Client Id

jaxcityc

Municipality Timezone

America/New_York

Municipality Slug

No response

Firestore Region

No response

Code of Conduct

I agree to follow this project's Code of Conduct

Potentially add timezome to cookiecutter parameters

Up for discussion, @tohuynh is moving from moment.js to our own datetime localization functions for a whole bunch of reasons, as a part of that, there is the addition of a timezone property in the website config, I propose that the timezone should simply be a cookiecutter parameter.

Switch from Pulumi to Terraform

This is a large-ish project and involves some work on the https://github.com/CouncilDataProject/cdp-backend side of things.

Convert the CDPStack object from using Pulumi to using Terraform CDK
https://github.com/hashicorp/terraform-cdk
This repo specifically needs to come up with a way to initialize a GCS backend, so that the main CDP stack can simply operate using that backend. See an example of GCS backend in typescript: here

The goal of this work is to remove extra account setup and management; if we remove Pulumi in favor of Terraform, not only are we using a more "stable" / widely adopted tool, but we also can remove more "admin" work of account management in favor of simply a single tool.

Plus we (CDP) has an issue in on the Terraform google provider right now to add a firestore security rules resource, if we switch, we will get that resource faster whenever it is resolved. Until it is resolved we can use terraform local exec which Pulumi doesn't have a mirror for.

Steps to complete this issue:

Rewrite the Pulumi code to Terraform CDK code (this should be decently easy), BUT, adopt a GCS backend for state storage in the cdp-backend infrastructure module
Come up with strategy for having CI/CD on cookiecutter generated repos deploy a GCS backend, THEN deploy infra
Come up with CI/CD workflow for synthesizing the CDK code -> terraform, then init, validate, apply (this is also easy)
Update docs on cdp-backend dev deployment information

Create Generalized Legistar Scrapers

Use Case

Currently each new CDP deployment would need to create their own event scraper.

Solution

Create a generalized Legistar scraper since Legistar is the most commonly used legislation tracking tool and reduce the work required to deploy a new CDP instance in those cities

[Instance]: Phoenix

Municipality Name

Phoenix

Governing Body Type

city council

Maintainer GitHub Name

JacksonMaxfield

Legistar Client Id

phoenix

Municipality Timezone

America/Phoenix

Municipality Slug

No response

Firestore Region

No response

Code of Conduct

I agree to follow this project's Code of Conduct

[Instance]: Alameda

Municipality Name

The City of Alameda

Governing Body Type

city council

Maintainer GitHub Name

phildini

Legistar Client Id

alameda

Municipality Timezone

America/Los_Angeles

Municipality Slug

alameda

Firestore Region

No response

Code of Conduct

I agree to follow this project's Code of Conduct

edit by @JacksonMaxfield, testing if this instance can be deployed with new LegistarScraper setup

Instance Deployment Process Language Changes

Feature Description

Text changes for the comments created during the CDP deployment process for greater user clarity

Use Case

External users who are trying to create a new CDP deployment may get confused during the process due to some of the current language used

Solution

Bot 1 comment changes

"Prior to triggering this deployment" -> "To proceed with the deployment process, please do the following"
Add "in cdp-backend/dev-infrastructure" to the commands that must be run there
'Comment "/cdp-deploy"' -> 'Comment "/cdp-deploy" on this issue'

Bot 2 comment changes

adding this to the description: "The instance is setting itself up right now and the process will take around 10 minutes to complete. Once completed, a CDP maintainer will comment on this issue with your instance's website link. See (link) for more details on the deployment setup progress. Your CDP instance will be populated with data within 6 hours of website creation."
"Deployment Status - Completed" -> "Deployment Status - Repository Created"
adding a step in Final Steps saying to manually add a comment with "Deployment Status - Completed" and the instance link when it is fully set up

[Instance]: Charlotte

Municipality Name

Charlotte

Governing Body Type

city council

Maintainer GitHub Name

evamaxfield

Legistar Client Id

charlottenc

Municipality Timezone

America/New_York

Municipality Slug

No response

Firestore Region

No response

Code of Conduct

I agree to follow this project's Code of Conduct

rerunning bot

[Instance]: Milwaukee

Municipality Name

Milwaukee

Governing Body Type

city council

Maintainer GitHub Name

evamaxfield

Legistar Client Id

milwaukee

Municipality Timezone

America/Chicago

Municipality Slug

No response

Firestore Region

No response

Code of Conduct

I agree to follow this project's Code of Conduct

[Instance]: King County

Municipality Name

King County

Governing Body Type

county council

Maintainer GitHub Name

JacksonMaxfield

Legistar Client Id

No response

Municipality Timezone

No response

Municipality Slug

No response

Firestore Region

No response

Code of Conduct

I agree to follow this project's Code of Conduct

Configure .github/dependabot.yml

Use Case

Please provide a use case to help us understand your request in context

To both act as a method for dissemination of feature updates on cdp-backend and cdp-frontend AND to act as an okay method for getting GitHub actions running.

Solution

Please describe your ideal solution

Configure .github/dependabot.yml to pull in new releases actively, and since the only dependencies on the backend and frontend are our own repos, the only dependabot PRs should be directly from us.

Alternatives

Please describe any alternatives you've considered, even if you've dismissed them

[Instance]: Test Deployment

Municipality Name

Test Deployment

Maintainer GitHub Name

JacksonMaxfield

Legistar Client Id

No response

Municipality Timezone

No response

Municipality Slug

No response

Firestore Region

No response

Code of Conduct

I agree to follow this project's Code of Conduct

Add details about setting up Firebase JS connection to generated README

The user should simply need to fill in the parameters to this function: https://firebase.google.com/docs/reference/js/firebase#initializeapp

With the values they see on the Firebase general project settings page
(i.e. https://console.firebase.google.com/u/0/project/cdp-seattle/settings/general)

[Instance]: King County

Municipality Name

King County

Maintainer GitHub Name

JacksonMaxfield

Legistar Client Id

No response

Municipality Timezone

No response

Municipality Slug

No response

Firestore Region

No response

Code of Conduct

I agree to follow this project's Code of Conduct

[Instance]: Dallas

Municipality Name

Dallas

Governing Body Type

city council

Maintainer GitHub Name

JacksonMaxfield

Legistar Client Id

cityofdallas

Municipality Timezone

America/Chicago

Municipality Slug

No response

Firestore Region

No response

Code of Conduct

I agree to follow this project's Code of Conduct

Paper.md - log of pushed changes

Abstract

I added content to provide context for why the software exists and what problem it is trying to solve
Can probably simplify even more

Statement of need

This is a necessary section header in the paper - the paper won't validate against https://whedon.theoj.org/ without this section
I added a request for describing the instance concept in CDP at the beginning
Each of the tool descriptions needs to be more specific - see specific notes in the draft just pushed

Related work

- Write one very small paragraph about on going research with CDO - justice interrupted, Zoom distanced meetings, - Cite Brown + Weber iConf paper too

References

We should cite other work that is close but not similar - eg
- Taminiau, J., & Byrne, J. (2020). City‐scale urban sustainability: Spatiotemporal mapping of distributed solar power for New York City. Wiley Interdisciplinary Reviews: Energy and Environment, 9(5), e374. (which makes use of councilmatic)
- https://joss.theoj.org/papers/10.21105/joss.00411 (a joss paper (the only one) that transforms open gov data

Two other things:

Read the submission requirements one last time before you write https://joss.readthedocs.io/en/latest/submitting.html#what-should-my-paper-contain
When in doubt be succinct - this is as much about a paper as it is about creating long-form metadata for us to cite the software in future papers

Add `from_dt` and `to_dt` options to event gather pipeline action

Feature Description

With the addition of from_dt and to_dt as options to the event gather script in cdp_backend, we should add this in the github actions workflow

Solution

Some options:

Add as section in the readme detailing how to add these optional params in the github action file
Add another config file with extra options for script args
Add these as cookiecutter field options

JOSS Publication Draft

The criteria for the paper submission can be found here: https://joss.readthedocs.io/en/latest/submitting.html

I added a paper template with to this repo as papers.md

For author info:

Add affiliations for @tohuynh and @isaacna
Add ORCID (if you don't have one go get one here)

[Instance]: San Jose

Municipality Name

San Jose

Governing Body Type

city council

Maintainer GitHub Name

evamaxfield

Legistar Client Id

sanjose

Municipality Timezone

America/Los_Angeles

Municipality Slug

No response

Firestore Region

No response

Code of Conduct

I agree to follow this project's Code of Conduct

rerun

[Instance]: Portland, OR

Municipality Name

Portland

Governing Body Type

city council

Maintainer GitHub Name

dphoria

Legistar Client Id

No response

Municipality Timezone

America/Los_Angeles

Municipality Slug

portland

Firestore Region

us-west1

Code of Conduct

I agree to follow this project's Code of Conduct

Replace references to councildataproject.github.io with cookiecutter gh user

Description

A clear description of the bug

In general there has been an assumption that this organization (CouncilDataProject) manages all deployments but that isn't the case. There needs to be thought put into how assumed repo naming and bugs that are exposed when someone uses this template and uses their own org / account for publish.

Specifically this issue addresses all the various councildataproject.github.io strings in the template as they should be replaced with the {{ cookiecutter.gh_user_or_org }}.

I.E. if I (JacksonMaxfield) create a CDP instance under my own account it would be jacksonmaxfield.github.io/example.

Which leads to the naming problem. Should the repo name be customizable as a part of the cookiecutter process.

I.E. jacksonmaxfield.github.io/cdp-example (or jacksonmaxfield.github.io/cdp-seattle)

I made the assumption that just the municipality slug with append the URL but I made that assumption thinking they would all be prefexed by councildataproject so without it, it may be confusing.

To Resolve

Once we have a working web app that pulls data from the database, someone use this template and use their own GH account for publishing, fix all issues in the template until it fully publishes.

Upgrade web building and testing to node 16

We moved the cdp-frontend library to node 16 in CouncilDataProject/cdp-frontend#212

That change should be mirrored here.

cc @tohuynh

Store content hash string on the session db model

Feature Description

A clear and concise description of the feature you're requesting.

Currently if we want to look up the content hash for a session we need to go to through any transcript for the session, then the transcripts file ref, then do a split on "-" and take the first part of the split.

Use Case

Please provide a use case to help us understand your request in context.

Would make it much faster to get the audio file for a session if we stored the content hash on the session model itself.

Solution

Please describe your ideal solution.

Update the db model and the pipeline to store the content hash. Also write a script that updates old sessions with their session hashes using the transcript file ref to split process as described above.

Alternatives

Please describe any alternatives you've considered, even if you've dismissed them.

Specify where and how to add analytics

Feature Description

A clear and concise description of the feature you're requesting.

Add documentation to repo / commented out lines of HTML (index.html) that act as the local to add analytics script

Use Case

Please provide a use case to help us understand your request in context.

Support new deployment maintainers if they want to capture analytics

Solution

Please describe your ideal solution.

Prefer https://plausible.io/

Alternatives

Please describe any alternatives you've considered, even if you've dismissed them.

[Instance]: Richmond

Municipality Name

Richmond

Governing Body Type

city council

Maintainer GitHub Name

evamaxfield

Legistar Client Id

richmondva

Municipality Timezone

America/New_York

Municipality Slug

No response

Firestore Region

No response

Code of Conduct

I agree to follow this project's Code of Conduct

rerun

Add example data JSON file(s) to upload using ORMs for staging data.

Use Case

Please provide a use case to help us understand your request in context

Add a script for taking JSON file(s) and uploading them to a database using the upcoming ORMs (See CouncilDataProject/cdp-backend#2).
Specifically this will be used to upload example data to a staging database and bucket.

Solution

Please describe your ideal solution

We potentially need to have several sets of examples datasets.

Entirely minimal dataset, looking through the database models, only upload data that is required
Minimal + random sample of optional values
All database model fields filled

Because of multiple levels of dataset, it may be best to have a single "all possible data" and then programmatically create the other levels of the dataset by just removing keys from the dataset when they are not required.

There should additionally be tests that the example data stored in the repo is always matching the current models.

Alternatives

Please describe any alternatives you've considered, even if you've dismissed them

There is a potential argument to be made that this dataset is entirely programmatically created with some randomness to reduce data stored in git. But I think that just gets almost close to just creating a JSON file so seems like extra work.

[Instance]: Chicago

Municipality Name

Chicago

Governing Body Type

city council

Maintainer GitHub Name

JacksonMaxfield

Legistar Client Id

chicago

Municipality Timezone

America/Chicago

Municipality Slug

No response

Firestore Region

No response

Code of Conduct

I agree to follow this project's Code of Conduct

Describe a archival study to be done with CDP infra

For JOSS paper (currently on branch: admin/joss-paper) we need to describe a few examples of how this software can be used for research.

I will add some examples about the chapter sectioning / text segmentation and alignment problem, the overtime allowance project, and maybe one other, but @nniiicc can you either on this issue or on that branch describe a data archival problem that CDP can help solve or contribute to.

[Instance]: Boston

Municipality Name

Boston

Governing Body Type

city council

Maintainer GitHub Name

JacksonMaxfield

Legistar Client Id

boston

Municipality Timezone

America/New_York

Municipality Slug

No response

Firestore Region

No response

Code of Conduct

I agree to follow this project's Code of Conduct

trying with recent setup-python v3 action change -- again

Add documentation to generated repo's README with API connection details and examples in Python

Really this is just "show how to connect with FireO with the infrastructure slug"

A quickstart would be nice.

Install on multiple python environments

Description

Currently the GH workflow is pinned to python 3.9. It's causing some problems in my environment.

Record video of cookiecutter usage and add to README

Use Case

Please provide a use case to help us understand your request in context

Have a working example of the setup process as a video for easier digest of how to create an instance

Solution

Please describe your ideal solution

Setting up of:

new google account
google project
google / github service account (role: Owner)
pulumi account
pulumi service account
run cookiecutter
setup github repo
git init and push
wait for actions to build
set gh-pages

[Instance]: Albuquerque

Municipality Name

Albuquerque

Governing Body Type

city council

Maintainer GitHub Name

evamaxfield

Legistar Client Id

cabq

Municipality Timezone

America/Denver

Municipality Slug

albuquerque

Firestore Region

No response

Code of Conduct

I agree to follow this project's Code of Conduct

rerunning

Break the deployment cookiecutter

As one of the main goals of the project is to be as easily deployable as possible, we need to continuous "stress test" our deployment process. Any and all comments on what is good, bad, confusing, or even not-helpful to documenting and actually carrying out a deployment process should be placed as comments on this GitHub Issue.

In short, you should start at one-of (your choice):

And attempt to fully create your own repo with a get_events function and has all GitHub Actions succeed.

Do not be afraid to be critical and / or nitpicky. This is exactly what we want.

Create GH Action to utilize `get_events` function and upload fake events during EventPipeline run

Use Case

Please provide a use case to help us understand your request in context

To ensure that there is always some example data for use for testing or just toying around.

Solution

Please describe your ideal solution

After CouncilDataProject/cdp-backend#11 is merged and #2 is resolved, adding a bin script or function or something to take a JSON or JSONs and upload them to some CDP database. No need for any pipeline really just rip through and upload.

Alternatives

Please describe any alternatives you've considered, even if you've dismissed them

[Instance]: Denver

Municipality Name

Denver

Governing Body Type

city council

Maintainer GitHub Name

JacksonMaxfield

Legistar Client Id

denver

Municipality Timezone

America/Denver

Municipality Slug

No response

Firestore Region

No response

Code of Conduct

I agree to follow this project's Code of Conduct

[Instance]: Fort Worth

Municipality Name

Fort Worth

Governing Body Type

city council

Maintainer GitHub Name

evamaxfield

Legistar Client Id

fortworthgov

Municipality Timezone

America/Chicago

Municipality Slug

No response

Firestore Region

No response

Code of Conduct

I agree to follow this project's Code of Conduct

rerunning

Consider explicitly pinning cdp-backend and cdp-frontend versions

Feature Description

We should consider explicitly pinning the package versions for cdp-backend and cdp-frontend.

Use Case

I noticed that dependabot wasn't creating PR's in our deployments. I checked the dependabot run, and it does detect the new package versions correctly, but doesn't generate a PR. According to this post, it's because we use ~= instead of explicitly pinning the version with ==. In this case, dependabot isn't really doing anything.

According to the following posts, explicit pinning favors reproducibility over re-usability. However, with dependabot I think that the re-usability disadvantage of explicit pinning is reduced.

I think for the vast majority of our dependencies, we can keep ~=, but for cdp-backend and cdp-frontend we should pin them with == because they're pretty core to the deployment, and a bug in a version bump could propagate to all deployments when re-installing packages during each GH action run. Dependabot would then create PR's for the core package bumps.

Solution

Add explicit pins on cdp-backend and cdp-frontend.

Add dependabot upgrades for infra

Dependabot isn't configured for infra section of generated repo https://github.com/CouncilDataProject/cookiecutter-cdp-deployment/blob/main/%7B%7B%20cookiecutter.hosting_github_repo_name%20%7D%7D/.github/dependabot.yml

Simply add the details for infra for cdp-backend

Create Github Action Bot 1 for CDP Instance Creation

Use Case

Step 2 of the automated process to optimize the CDP instance creation

Solution

Github action bot that takes the data from the issue template and tries to run the event scraper

Triggered by: /cdp-check-form comment added by us or user

Adds comment to the issue about the minimum amount of data required for CDP
- This comment should only be added during the first run and should not be re-added if the user reruns bot 1 to test a scraper
If Legistar client ID not provided, adds comment about how to write a custom scraper
- This comment should only be added during the first run and should not be readded if the user reruns bot 1 to test a scraper
Attempts to run event scraper using cdp-backend dry run test
- If successful (minimum data returned), adds a comment with
- Success message
- Example of event data found for user to verify it looks good
- if unsuccessful, adds a comment with
- the error message
- If it’s a Legistar connection issue, add next step of contacting city clerk to verify Legistar ID and access permissions
- info on how to create a custom scraper/modify the existing scraper and ask to rerun bot 1 to test completed scraper