compute-tooling / compute-studio
A Utility for Sharing Computational Models
Home Page: https://compute.studio
License: Other
This issue will describe where we need to place links to our TOS, privacy policy, and developer license agreement (if we end up needing one).
Assigned to Matt to finish writing the issue, and then I'll switch assignment to Hank.
Users currently own simulations; we could also allow them to own specifications.
It would be helpful for specs to be segmentable at the full-spec, section, subsection, and parameter levels.
For example, I could save (and name) a spec for an EITC maximum credit, or the EITC section, or Refundable Credits, or (soon), Individual Income Tax, or Policy, or Behavior (and its children), or the full specification including both Policy and Behavior.
The spec I save would show up in a My Specs (or similar) section of my home page.
A follow-on enhancement would be to allow specification segments to be swapped in and out elegantly in the GUI. The first iteration could be an "upload a specification" option that adds the uploaded specification as an adjustment to the existing (possibly user-modified) specification, overwriting any merge conflicts.
Interested in @hdoupe's feedback sometime. For now adding a speculative label.
I ran a Tax-Calculator simulation today on CompModels.org. I love the Bokeh plot that is included. However, I downloaded the results, and all the CSV files that came in the zip file were missing titles. I had to download the JSON results in order to figure out the correspondence between the filenames and the CSV titles. There must be a way to have the titles included in the CSV files. The CSVs could have more descriptive names, or a crosswalk text or JSON file could be included with the CSVs.
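One lightweight fix would be to write a crosswalk JSON into the zip alongside the CSVs. This is a sketch with a hypothetical helper name and output format, not C/S's actual packaging code:

```python
import io
import json
import zipfile


def zip_with_crosswalk(csv_outputs):
    """Bundle CSV outputs into an in-memory zip, adding a crosswalk.json
    that maps each filename back to its human-readable title.

    csv_outputs: list of dicts like
        {"title": ..., "filename": ..., "data": ...}
    """
    buff = io.BytesIO()
    with zipfile.ZipFile(buff, "w") as z:
        for out in csv_outputs:
            z.writestr(out["filename"], out["data"])
        # One small extra file resolves the filename/title correspondence
        # without requiring a separate JSON download.
        crosswalk = {out["filename"]: out["title"] for out in csv_outputs}
        z.writestr("crosswalk.json", json.dumps(crosswalk, indent=2))
    buff.seek(0)
    return buff
```

Embedding the title as a comment row at the top of each CSV would also work, but a crosswalk file keeps the CSVs themselves machine-readable without extra parsing.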
COMP needs to be able to run without Stripe. I investigated using the stripe-mock project, but it isn't advanced enough to accommodate our needs: it does not keep state and simply serves up static JSON responses. This project may become more useful to us as it progresses or as our Stripe-related needs change.
Another approach to this issue is to configure COMP so that it can run locally without the billing app. This should be feasible. To accomplish this, a requires_stripe option could be added to the billing.json config file. That way users can run the test suite without any issues either.
My biggest concern is that this will mean that people without Stripe access will be developing on a project that differs from the production app in some sensitive areas. On the other hand, I'd expect people who are developing without Stripe access to be working in areas that do not affect the billing app. Further, the billing app does have a test suite that will be run with the Stripe access tokens before the production app is deployed.
The Celery queue length metric always returns 0, which in turn throws off the compute ETA estimates. I've spent a considerable amount of time trying to fix this but have not had any luck so far. It seems that the tasks are consumed as soon as they are queued, which means that the queue is always of length 0. Another solution could be to keep a list of pending task IDs for each app in the Redis store. When jobs are submitted to the workers, they would be pushed to the job list, and when they finish, they would be removed from it. This could be thrown off by some kind of worker failure where a task is never removed from the list. Perhaps that problem could be resolved by cross-referencing the job lists with the info in the celery inspect module.
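The pending-job list described above could be sketched like this. The key names and the store wiring are assumptions, not the current implementation; in production the store would be a redis-py client, which exposes the same three set commands used here:

```python
class PendingJobTracker:
    """Track pending task IDs per app so queue length can be computed
    directly, instead of relying on the always-zero Celery queue metric.

    `store` is anything exposing Redis-style set commands (sadd, srem,
    scard); a redis.Redis client in production, an in-memory stand-in here.
    """

    def __init__(self, store=None):
        self.store = store if store is not None else _InMemorySets()

    def submit(self, app, task_id):
        # Push the task onto the app's pending list when it is queued.
        self.store.sadd(f"pending-jobs:{app}", task_id)

    def finish(self, app, task_id):
        # Remove the task when it completes (or fails).
        self.store.srem(f"pending-jobs:{app}", task_id)

    def queue_length(self, app):
        # Queue length is just the cardinality of the pending set.
        return self.store.scard(f"pending-jobs:{app}")


class _InMemorySets:
    """Minimal stand-in implementing the three Redis set commands used above."""

    def __init__(self):
        self.sets = {}

    def sadd(self, key, member):
        self.sets.setdefault(key, set()).add(member)

    def srem(self, key, member):
        self.sets.get(key, set()).discard(member)

    def scard(self, key):
        return len(self.sets.get(key, set()))
```

To handle the worker-failure case, a periodic task could reconcile each pending set against the worker state reported by Celery's inspect API and drop orphaned IDs.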
www.compmodels.com should accommodate https.
This might be a nice message for people to think about while they wait for their sims:
This is a public simulation. Thank you for contributing to COMP’s free database of simulation results.
When your simulation is complete, it will be published at www.compmodels.org/_____.
sim:
traceback:
Traceback (most recent call last):
File "/home/distributed/api/celery_app/__init__.py", line 79, in f
outputs = s3like.write_to_s3like(task_id, outputs)
File "/opt/conda/lib/python3.7/site-packages/s3like/__init__.py", line 139, in write_to_s3like
buff, OBJ_STORAGE_BUCKET, ziplocation, ExtraArgs={"ACL": "public-read"}
File "/opt/conda/lib/python3.7/site-packages/boto3/s3/inject.py", line 539, in upload_fileobj
return future.result()
File "/opt/conda/lib/python3.7/site-packages/s3transfer/futures.py", line 106, in result
return self._coordinator.result()
File "/opt/conda/lib/python3.7/site-packages/s3transfer/futures.py", line 265, in result
raise self._exception
File "/opt/conda/lib/python3.7/site-packages/s3transfer/tasks.py", line 126, in __call__
return self._execute_main(kwargs)
File "/opt/conda/lib/python3.7/site-packages/s3transfer/tasks.py", line 150, in _execute_main
return_value = self._main(**kwargs)
File "/opt/conda/lib/python3.7/site-packages/s3transfer/upload.py", line 692, in _main
client.put_object(Bucket=bucket, Key=key, Body=body, **extra_args)
File "/opt/conda/lib/python3.7/site-packages/botocore/client.py", line 357, in _api_call
return self._make_api_call(operation_name, kwargs)
File "/opt/conda/lib/python3.7/site-packages/botocore/client.py", line 661, in _make_api_call
raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (ServiceUnavailable) when calling the PutObject operation (reached max retries: 4): Service is unavailable at this time
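Transient ServiceUnavailable responses like this one can often be absorbed by retrying with exponential backoff beyond boto3's default retry budget. A generic stdlib sketch (not the fix actually deployed; the wrapper name is mine):

```python
import time


def retry_with_backoff(fn, retryable=(Exception,), max_attempts=6,
                       base_delay=0.5, sleep=time.sleep):
    """Call fn(), retrying on `retryable` exceptions with exponential
    backoff: 0.5s, 1s, 2s, ... Re-raises the last exception if all
    attempts fail.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except retryable:
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

The upload in the traceback could then be wrapped as, roughly, `retry_with_backoff(lambda: client.upload_fileobj(buff, bucket, key), retryable=(ClientError,))`; boto3's own retry configuration (the `retries` option on `botocore.config.Config`) is another route to the same end.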
For Tax-Brain and similarly structured simulation applications, it would be valuable to be able to specify both the baseline specification and the alternate specification.
From a user interface perspective, I think this could be accomplished by adding big tabs to the top of the editable parameter section, and maybe changing the color of the section popout when switching back and forth between the baseline and alternate.
We should also probably think of a user as owning specifications as well as simulations, and, for instance, add "my specifications" to the users' home page.
When we print logs of modified parameters, we'll want to separate out baseline and alternate.
On adjusted alternate specifications, we'll want to distinguish the difference from the default and the difference from the user-set baseline, which may itself have been adjusted.
I'm assuming one of the main entry paths to the TaxBrain web app is the lower-right-hand icon on the OSPC "Portfolio" page:
When people using the Chrome browser click on that icon, they get the following page:
@hdoupe, I don't think this is the impression you want to be giving people.
Based on @donboyd5's recommendations to make the parameters more visible to show off the features and capabilities of the models, I think a useful step would be to add expandability to the parameter navbar in the left column of the parameter inputs page to include parameters as well as sections and subsections. My inclination is to keep the default level of expansion the same because it could be overwhelming to see the full list by default.
@hdoupe, what do you think?
(edit, see following comment)
The Tax-Calculator error messages are saved with the following structure:
{
  "warnings": { // same structure as "errors" below },
  "errors": {
    "param name": {
      "some year": "error message string",
      "another year": "error message string"
    },
    "another param name": {
      "some year": "error message string",
      "another year": "error message string"
    }
  }
}
The error messages are then added to the form as one parameter name: string error message pair for each error message on that parameter. COMP expects a single parameter name: list of error strings pair for each parameter. The result is this:
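The fix amounts to flattening Tax-Calculator's nested {param: {year: message}} structure into the {param: [messages]} shape COMP expects. A minimal sketch (the function name is mine, not the actual patch):

```python
def flatten_errors(errors_by_param):
    """Convert {"param": {"year": "msg", ...}, ...} into
    {"param": ["msg", ...], ...}, ordering messages by year."""
    return {
        param: [msg for _, msg in sorted(by_year.items())]
        for param, by_year in errors_by_param.items()
    }
```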
I plan to push a bug fix that resolves this issue in particular. However, it highlights two larger underlying issues:
The inputs structure mirrors the taxcalc/policy_current_law.json file in the Tax-Calculator repo. But this file is subject to changes that are outside of COMP developers' control. My policy thus far is to mirror that structure, since there is no spec that defines it. Because there is no spec for projects to keep up with, several flavors of the Tax-Calculator-style inputs structure have developed (Tax-Brain, OG-USA). This puts some pressure on COMP to remain compatible with them all. So far, this has been easy, but that could change.
Testing the contrib.taxcalcstyle package is difficult unless taxcalc is installed. I will be looking into ways that this issue can be solved over the next couple weeks.
@MattHJensen has provided a list of UI improvements:
Am I missing anything? If I am or if you have any other ideas for how things could be made better, feel free to drop them in this issue. I'll see how far I can get with these today.
What do you think about using roadmap.md to track to dos (and maybe OKRs), rather than the list of open GH issues?
It seems like content this important should be version controlled.
We could open PRs to add or annotate items, and we can discuss the additions/annotations in the PR discussions.
We could use markdown's headline tags to assign hierarchy.
This approach may give the community some ownership over planning and encourage them to make contributions to the code base.
It would force those with suggestions to consider prioritization.
Issues would be reserved for bug fixes and discussions like this one.
While working on #230, I looked into using Kind for testing the compute cluster. Due to how interrelated all of the different services are, it's difficult to test each component individually. Testing them individually would involve mocking out all kinds of services and callbacks and would be more trouble than it's worth. It would also miss out on testing the interactions between the different components. Kind makes it possible to spin up a Kubernetes cluster (or something that looks like one) within a single Docker container.
This would make it possible to spin up the full C/S system, including the webapp, without having to spin up nodes on GKE, push development images to an online container registry, and do a full deploy. That entire process makes the development feedback loop on the compute cluster extremely slow and painful. Kind would also make it possible to add automated tests for each PR, which is something that C/S badly needs.
I'm seeing this on Tax-Brain but guessing it has to do with Compute Studio. Steps to repro:
Expected: Sticks with differences table.
Actual: Reverts to distribution table.
Note that it remembers the selection of differences table if going back to 2020.
When I arrived at My Simulations, my first natural click was to PSLModels/TaxBrain, but that took me to a new simulation page. The + box on the right was what I was looking for to see my simulations. I'd personally find it more intuitive to move this to the left side to get users to the main purpose of the page more quickly.
This would allow publishers to view their inputs and outputs before initially publishing on compute.studio or before pushing updates to existing models on compute.studio. I'm thinking of a single-page app where you can upload a JSON file created by a get_inputs function and a JSON file created by a run_model function.
Right now, Compute Studio reads in a PNG file from memory to display on the web.
Is CS ok in caching all these PNG files separately from the HTML files for the results pages?
If not, one solution would be to base64-encode the PNG and embed it directly in the HTML.
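The inline encoding would look roughly like this (a sketch with a hypothetical helper name, not the current rendering code):

```python
import base64


def png_to_img_tag(png_bytes):
    """Embed raw PNG bytes in an <img> tag as a base64 data URI,
    so the image ships inside the HTML instead of as a separately
    cached PNG file."""
    encoded = base64.b64encode(png_bytes).decode("ascii")
    return f'<img src="data:image/png;base64,{encoded}">'
```

The trade-off is page size: base64 inflates the image by about a third, and the bytes can no longer be cached independently of the HTML.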
A few suggestions to improve COMP's interface:
Bug - If a user changes a dropdown input parameter, runs the model, then visits the edit inputs page, the user cannot edit the dropdown parameter that they originally changed.
Show the hover-over "i" button on the inputs page only when the parameter has a description.
Give model publishers the option to have a section of input parameters closed by default on the inputs page.
On the edit inputs page, consider changing the first section title from "Model Parameters as JSON" to "Parameter Adjustments" or similar to avoid using jargon.
I ran a simple simulation today of the Tax-Calculator module on CompModels.org. I increased the maximum taxable income for the payroll tax to $200,000. I was surprised that it took nearly 5 minutes to run this Tax-Calculator simulation. This seems like a significant (~2.5x) slowdown compared to the old TaxBrain application. Is this a function of the new platform using slower hardware, pipelines, etc.? Or is there some new complexity in Tax-Calculator that requires more compute time?
PSLmodels/Tax-Brain can be run with two files: a less accurate but public file and a more accurate but private file. This issue proposes an approach for giving COMP access to the private file.
The private file can be stored in an S3 bucket under the Open Source Policy Center's AWS account. Then, the Open Source Policy Center would give COMP read access to this bucket, using a similar approach to how PaperTrail handles storing log data in an S3 bucket of the owner's choosing. On each simulation, COMP would read the data like this:
import gzip

import boto3
import pandas as pd

client = boto3.client(
    "s3",
    aws_access_key_id=aws_access_key_id,
    aws_secret_access_key=aws_secret_access_key,
)

# Stream the gzipped CSV straight from S3 into a DataFrame;
# the sensitive data is never written to disk.
obj = client.get_object(Bucket='bucket-name', Key='path/to/file.csv')
gz = gzip.GzipFile(fileobj=obj['Body'])
data = pd.read_csv(gz)
Note that the sensitive data is never stored and is streamed directly to a pandas dataframe. Alternatively, the data stream could be passed directly to Tax-Brain and Tax-Brain could handle loading it into a pandas dataframe.
This data can then be passed to Tax-Brain in the simulation function like this:
import taxbrain

def run(start_year, data_source, use_full_sample, user_mods, puf_file):
    return taxbrain.tbi.run_tbi_model(
        start_year, data_source, use_full_sample, user_mods, puf_file
    )
@andersonfrailey and @MattHJensen what do you think about this approach?
A simple version of the model inputs page should be displayed at www.compmodels.com/owner/Model-Name until the model publishing process is complete.
What do you think about adding optional buttons to the simulation pages for "Questions" and "Feedback"? For PSL models, these could link to the new topic page for the respective Discourse categories.
It will be annoying for the user to sign up for a new site, but the projects ought to appreciate the traffic, and my guess is that C/S rolling our own username-integrated solution would be a distraction.
The current outputs pages are very large which results in slow load times, "script unresponsive" dialogs, and high amounts of memory use. Instead of rendering all of the data for all of the outputs at once, we could show a thumbnail for each output and only render the output when the user clicks on it.
There are three components to this:
This is a problem that will be solved entirely in the compute-studio-storage project. The gist of it is that jinja2 will be used to render a temporary HTML file for each of the renderable outputs, and pyppeteer will be used to take screenshots of the generated files.
Currently, each simulation is stored as two zip files, one for renderable outputs and another for downloadable outputs. These files are all stored in a bucket on Google Cloud Storage. Compute Studio keeps a "remote" result in its database and uses this to find the outputs that it needs:
{
  "outputs": {
    "renderable": {
      "outputs": [
        {
          "title": "",
          "filename": ".json",
          "media_type": "bokeh"
        },
        {
          "title": "Aggregate Results",
          "filename": "Aggregate Results.json",
          "media_type": "bokeh"
        },
        {
          "title": "Tables",
          "filename": "Tables.json",
          "media_type": "bokeh"
        }
      ],
      "ziplocation": "[job_id::uuid]_renderable.zip"
    }
  }
}
We're going to need a PNG or JPG file for each of the renderable outputs, and we will need to be able to access that file individually. To do this, each renderable output needs an ID that can be used as the file name of the corresponding thumbnail:
{
  "outputs": {
    "renderable": {
      "outputs": [
        {
          "id": "file_id::uuid",
          "title": "",
          "filename": ".json",
          "media_type": "bokeh"
        },
        {
          "id": "file_id::uuid",
          "title": "Aggregate Results",
          "filename": "Aggregate Results.json",
          "media_type": "bokeh"
        },
        {
          "id": "file_id::uuid",
          "title": "Tables",
          "filename": "Tables.json",
          "media_type": "bokeh"
        }
      ],
      "ziplocation": "[job_id::uuid]_renderable.zip"
    }
  }
}
Then, each thumbnail will be located at a link like: https://storage.cloud.google.com/cs-outputs-dev/file_id.png and can be rendered using this:
<img src="https://storage.cloud.google.com/cs-outputs-dev/file_id.png">
When the page first loads, the JavaScript client will need the "remote" result to get a link to each of the renderable outputs' thumbnails. Next, when the user clicks on one of the thumbnails, the client will download the zipfile containing all of the renderable outputs and extract the selected output from the zip file and render it either on the webpage or in a pop-up window of some sort. Right now, Tax-Brain's zip files download in about 0.7 seconds, and my guess is that extracting the file and rendering it will take under half a second, resulting in about 1.2 seconds of waiting. Once the zipfile is downloaded, it will be cached by the JavaScript client, and the only cost for clicking another output will be extracting it and rendering it. I hope that these operations can be done in half a second or less. I'm not very familiar with using JavaScript with zipfiles; thus, this approach and the time estimations are somewhat theoretical.
https://docs.compute.studio/publish/functions has this snippet:
import matchups

def get_inputs(meta_params_dict):
    meta_params = MetaParams()
    meta_params.adjust(meta_params_dict)
    params = MatchupsParams()
Should the MetaParams() and MatchupsParams() calls have a matchups. prefix?
In PSLmodels/Tax-Brain#42, @donboyd5 suggested showing the subsections of parameters in the inputs page sidebar. This will take some care to implement without cluttering up the sidebar, but I wanted to open the issue in the comp-ce repo to keep a record of the feature request.
Would be preferable to take the user back to the sim they were working on, with the run-sim confirmation pop-up open (step 4, above).
COMP should support a richer set of model outputs, and it should be less opinionated about the outputs it supports. This can be done by relying on the rich ecosystem of projects that specialize in visualizing data like Bokeh and supporting data formats like pictures and videos. For now, these could be pictures, videos, interactive plots, or tables. The models could return this data using the following format:
{
  "renderable": [
    {
      "media_type": "bokeh",
      "title": "some title",
      "data": {
        "javascript": "...",
        "html": "..."
      }
    },
    {
      "media_type": "table",
      "title": "some title",
      "data": {
        "html": "<table>...</table>"
      }
    },
    {
      "media_type": "picture",
      "title": "some title",
      "data": {
        "picture": "some binary picture data",
        "extension": "JPEG/PNG"
      }
    },
    {
      "media_type": "video",
      "title": "some title",
      "data": {
        "video": "some binary video data",
        "extension": "mp4/other"
      }
    }
  ],
  "downloadable": [
    {
      "media_type": "CSV",
      "title": "some title",
      "data": {
        "CSV": "..."
      }
    },
    {
      "media_type": "HDF5",
      "title": "some title",
      "data": {
        "HDF5": "some HDF5 data"
      }
    }
  ]
}
Each output object is going to have the following structure:
{
  "media_type": "picture",
  "title": "some title",
  "data": {
    "picture": "some binary picture data",
    "extension": "JPEG/PNG"
  }
}
This describes the output's type, title, and some arguments that will be needed to parse the data. Internally, an object_storage_link attribute will be added to each object. This will link to its location in an object storage provider like DigitalOcean Spaces or AWS S3.
Note that Bokeh is supported individually, but in the future, it could be supported within a wider category of interactive outputs. My preference is to move into the interactive plots output type slowly while we figure out how things should work.
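Model publishers could sanity-check their results against this format with a small validator. This is illustrative only; the accepted media types are just the ones listed in the format above, and the function name is mine:

```python
RENDERABLE_TYPES = {"bokeh", "table", "picture", "video"}
DOWNLOADABLE_TYPES = {"CSV", "HDF5"}


def validate_outputs(result):
    """Check that a model's result dict matches the renderable/downloadable
    format sketched above. Raises ValueError on the first problem found."""
    for category, allowed in (("renderable", RENDERABLE_TYPES),
                              ("downloadable", DOWNLOADABLE_TYPES)):
        for output in result.get(category, []):
            media_type = output.get("media_type")
            if media_type not in allowed:
                raise ValueError(
                    f"{category}: unknown media_type {media_type!r}")
            if "title" not in output or "data" not in output:
                raise ValueError(
                    f"{category}: each output needs 'title' and 'data'")
```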
I plan to dogfood this data format with Matchups over the next few days. @andersonfrailey if you are willing to work with me on this, I can do some experimenting off of your branch that adds a bokeh plot to Tax-Brain (PSLmodels/Tax-Brain#26).
In running https://www.compmodels.org/PSLmodels/Tax-Brain/41142 (a CPS-based reform eliminating the payroll tax cap), Comp hung at "Estimated 2 minutes remaining" for several minutes. When I refreshed the page, I got this:
And now compmodels.org shows that error on the homepage and for all compmodels pages.
I was notified this morning about a bug in Compute Studio's password reset flow: the default Django site name (example.com) was used in reset emails instead of compute.studio. When the webapp was first deployed on Heroku, I must have neglected to fill this field out. I fixed the bug by updating the site name and restarting the webapp processes. I'm sorry for the inconvenience to users who have been affected by this.
@donboyd5 thank you for the bug report.
Some users have reported receiving errors when submitting their inputs on Tax-Brain. This is because their inputs are not being validated and returned by the compute cluster before the request times out. Thus, the page hangs and eventually an error page is shown.
To resolve this, I am going to bump the timeout time on requests from 2.5 seconds to 4 seconds. I am also going to make the process of validating user inputs asynchronous.
This will be similar to the process for actually running the simulation: user submits inputs, a dialog notifies them that the inputs have been submitted and are being validated, and either the simulation will be kicked off or the errors will be shown to the user.
The PSLmodels/Tax-Brain page load time feels very slow. COMP has to process about 220 parameters to build up the inputs form for that page. I tinkered with showing a spinner while the page is loading so that the user isn't just staring at a blank page. However, I was unable to have much luck with that approach. It seems like most of the time is spent building the form on the backend and not rendering the form in the browser. One approach that could solve this problem is to load a blank page and then load all of the form data from a REST API call. Some type of loading symbol could be used on the blank page while the form is built and the API call is completed.
COMP uses Stripe to handle its billing infrastructure. Right now, the Stripe API is down and results from sims are not saved because an error is thrown on a bad response from the Stripe API.
On the Publishing Guide the link in "The second part documents..." is broken.
Currently the default is "my models". Might also move "my simulations" to the left of "my models".
I think it would be very cool for Compute Studio to have a similar build and deploy process as JupyterHub's zero-to-k8s project. Right now, we are pretty far away from being able to do this. Here's a list of TODO items that need to be completed before we can think about having something comparable:
TODO:
Put it all together
I'm trying to insert an image into the README on the publish details page. I can embed an image via markdown, but I can't change its size. I also cannot insert HTML into the README that would give me more flexibility in sizing and positioning the image.
Are there any suggestions? I thought one could generally use html syntax within a markdown doc.
Any help is appreciated. Thanks!
@hdoupe Are there any instructions for testing an app locally? E.g., I'm seeing some formatting issues that I'd like to adjust with the results page for CCC, but don't want to burden you with installing new versions of my packages. If I could follow instructions to run these locally, I could fine tune the formatting before you need to put a new model version on CS.
cc @rickecon
And add the last sentence from the following paragraph to the paragraph in docs/PRIVACY.md
I like the Preview tab on the Compute Studio model pages, which shows the JSON format for the parameter changes. This could be useful if the user is running the models from source as well.
However, I doubt most users are doing this, and there is no detail about what the Preview tab is, making it confusing.
I might suggest moving this from the top to the bottom of the page or, preferably, having an option on the box on the left side to "Download Policy Reform JSON" (or something like that).
The Behavior section header is showing up twice.