compute-tooling / compute-studio
A Utility for Sharing Computational Models
Home Page: https://compute.studio
License: Other
This issue will describe where we need to place links to our TOS, privacy policy, and developer license agreement (if we end up needing one).
Assigned to Matt to finish writing the issue, and then I'll switch assignment to Hank.
Users currently own simulations; we could also allow them to own specifications.
It would be helpful for specs to be segmentable at the full-spec, section, subsection, and parameter levels.
For example, I could save (and name) a spec for an EITC maximum credit, or the EITC section, or Refundable Credits, or (soon), Individual Income Tax, or Policy, or Behavior (and its children), or the full specification including both Policy and Behavior.
The spec I save would show up in a My Specs (or similar) section of my home page.
A follow-on enhancement would be to allow specification segments to be swapped in and out elegantly in the GUI. The first iteration could be an "upload a specification" option that adds the uploaded specification as an adjustment to the existing (possibly user-modified) specification, overwriting any merge conflicts.
Interested in @hdoupe's feedback sometime. For now adding a speculative label.
I ran a Tax-Calculator simulation today on CompModels.org. I love the Bokeh plot that is included. However, I downloaded the results, and all the CSV files that came in the zip file were missing titles. I had to download the JSON results in order to figure out the correspondence between the filenames and the CSV titles. There must be a way to have the titles included in the CSV files. The CSVs could have more descriptive names, or a crosswalk text or JSON file could be included with the CSVs.
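One lightweight fix would be to write a crosswalk JSON into the zip alongside the CSVs. This is a sketch with a hypothetical helper name and output format, not C/S's actual packaging code:

```python
import io
import json
import zipfile


def zip_with_crosswalk(csv_outputs):
    """Bundle CSV outputs into an in-memory zip, adding a crosswalk.json
    that maps each filename back to its human-readable title.

    csv_outputs: list of dicts like
        {"title": ..., "filename": ..., "data": ...}
    """
    buff = io.BytesIO()
    with zipfile.ZipFile(buff, "w") as z:
        for out in csv_outputs:
            z.writestr(out["filename"], out["data"])
        # One small extra file resolves the filename/title correspondence
        # without requiring a separate JSON download.
        crosswalk = {out["filename"]: out["title"] for out in csv_outputs}
        z.writestr("crosswalk.json", json.dumps(crosswalk, indent=2))
    buff.seek(0)
    return buff
```

Embedding the title as a comment row at the top of each CSV would also work, but a crosswalk file keeps the CSVs themselves machine-readable without extra parsing.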
COMP needs to be able to run without Stripe. I investigated using the stripe-mock project, but it isn't advanced enough to accommodate our needs: it does not keep state and simply serves up static JSON responses. This project may become more useful to us as it progresses or as our Stripe-related needs change.
Another approach to this issue is to configure COMP so that it can run locally without the billing app. This should be feasible. To accomplish this, a requires_stripe option could be added to the billing.json config file. That way users can run the test suite without any issues either.
My biggest concern is that this will mean that people without Stripe access will be developing on a project that differs from the production app in some sensitive areas. On the other hand, I'd expect people who are developing without Stripe access to be working in areas that do not affect the billing app. Further, the billing app does have a test suite that will be run with the Stripe access tokens before the production app is deployed.
The Celery queue length metric always returns 0, which in turn throws off the compute ETA estimates. I've spent a considerable amount of time trying to fix this but have not had any luck so far. It seems that the tasks are consumed as soon as they are queued, which means that the queue is always of length 0. Another solution could be to keep a list of pending task IDs for each app in the Redis store. When jobs are submitted to the workers, they would be pushed to the job list, and when they finish, they would be removed from it. This could be thrown off by some kind of worker failure where a task is never removed from the list. Perhaps that problem could be resolved by cross-referencing the job lists with the info in the celery inspect module.
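The pending-job list described above could be sketched like this. The key names and the store wiring are assumptions, not the current implementation; in production the store would be a redis-py client, which exposes the same three set commands used here:

```python
class PendingJobTracker:
    """Track pending task IDs per app so queue length can be computed
    directly, instead of relying on the always-zero Celery queue metric.

    `store` is anything exposing Redis-style set commands (sadd, srem,
    scard); a redis.Redis client in production, an in-memory stand-in here.
    """

    def __init__(self, store=None):
        self.store = store if store is not None else _InMemorySets()

    def submit(self, app, task_id):
        # Push the task onto the app's pending list when it is queued.
        self.store.sadd(f"pending-jobs:{app}", task_id)

    def finish(self, app, task_id):
        # Remove the task when it completes (or fails).
        self.store.srem(f"pending-jobs:{app}", task_id)

    def queue_length(self, app):
        # Queue length is just the cardinality of the pending set.
        return self.store.scard(f"pending-jobs:{app}")


class _InMemorySets:
    """Minimal stand-in implementing the three Redis set commands used above."""

    def __init__(self):
        self.sets = {}

    def sadd(self, key, member):
        self.sets.setdefault(key, set()).add(member)

    def srem(self, key, member):
        self.sets.get(key, set()).discard(member)

    def scard(self, key):
        return len(self.sets.get(key, set()))
```

To handle the worker-failure case, a periodic task could reconcile each pending set against the worker state reported by Celery's inspect API and drop orphaned IDs.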
www.compmodels.com should accommodate https.
This might be a nice message for people to think about while they wait for their sims:
This is a public simulation. Thank you for contributing to COMP’s free database of simulation results.
When your simulation is complete, it will be published at www.compmodels.org/_____.
sim:
traceback:
Traceback (most recent call last):
File "/home/distributed/api/celery_app/__init__.py", line 79, in f
outputs = s3like.write_to_s3like(task_id, outputs)
File "/opt/conda/lib/python3.7/site-packages/s3like/__init__.py", line 139, in write_to_s3like
buff, OBJ_STORAGE_BUCKET, ziplocation, ExtraArgs={"ACL": "public-read"}
File "/opt/conda/lib/python3.7/site-packages/boto3/s3/inject.py", line 539, in upload_fileobj
return future.result()
File "/opt/conda/lib/python3.7/site-packages/s3transfer/futures.py", line 106, in result
return self._coordinator.result()
File "/opt/conda/lib/python3.7/site-packages/s3transfer/futures.py", line 265, in result
raise self._exception
File "/opt/conda/lib/python3.7/site-packages/s3transfer/tasks.py", line 126, in __call__
return self._execute_main(kwargs)
File "/opt/conda/lib/python3.7/site-packages/s3transfer/tasks.py", line 150, in _execute_main
return_value = self._main(**kwargs)
File "/opt/conda/lib/python3.7/site-packages/s3transfer/upload.py", line 692, in _main
client.put_object(Bucket=bucket, Key=key, Body=body, **extra_args)
File "/opt/conda/lib/python3.7/site-packages/botocore/client.py", line 357, in _api_call
return self._make_api_call(operation_name, kwargs)
File "/opt/conda/lib/python3.7/site-packages/botocore/client.py", line 661, in _make_api_call
raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (ServiceUnavailable) when calling the PutObject operation (reached max retries: 4): Service is unavailable at this time
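Transient ServiceUnavailable responses like this one can often be absorbed by retrying with exponential backoff beyond boto3's default retry budget. A generic stdlib sketch (not the fix actually deployed; the wrapper name is mine):

```python
import time


def retry_with_backoff(fn, retryable=(Exception,), max_attempts=6,
                       base_delay=0.5, sleep=time.sleep):
    """Call fn(), retrying on `retryable` exceptions with exponential
    backoff: 0.5s, 1s, 2s, ... Re-raises the last exception if all
    attempts fail.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except retryable:
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

The upload in the traceback could then be wrapped as, roughly, `retry_with_backoff(lambda: client.upload_fileobj(buff, bucket, key), retryable=(ClientError,))`; boto3's own retry configuration (the `retries` option on `botocore.config.Config`) is another route to the same end.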
For Tax-Brain and similarly structured simulation applications, it would be valuable to be able to specify both the baseline specification and the alternate specification.
From a user interface perspective, I think this could be accomplished by adding big tabs to the top of the editable parameter section, and maybe changing the color of the section popout when switching back and forth between the baseline and alternate.
We should also probably think of a user as owning specifications as well as simulations, and, for instance, add "my specifications" to the users' home page.
When we print logs of modified parameters, we'll want to separate out baseline and alternate.
On adjusted alternate specifications, we'll want to distinguish the difference from the default and the difference from the user-set baseline, which may itself have been adjusted.
I'm assuming one of the main entry paths to the TaxBrain web app is the lower-right-hand icon on the OSPC "Portfolio" page:
When people using the Chrome browser click on that icon, they get the following page:
@hdoupe, I don't think this is the impression you want to be giving people.
Based on @donboyd5's recommendations to make the parameters more visible to show off the features and capabilities of the models, I think a useful step would be to add expandability to the parameter navbar in the left column of the parameter inputs page to include parameters as well as sections and subsections. My inclination is to keep the default level of expansion the same because it could be overwhelming to see the full list by default.
@hdoupe, what do you think?
(edit, see following comment)
The Tax-Calculator error messages are saved with the following structure:
{
  "warnings": { // same structure as "errors" below },
  "errors": {
    "param name": {
      "some year": "error message string",
      "another year": "error message string"
    },
    "another param name": {
      "some year": "error message string",
      "another year": "error message string"
    }
  }
}
The error messages are then added to the form as one parameter name: string error message pair for each error message on that parameter. COMP expects a single parameter name: list of error strings pair for each parameter. The result is this:
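The fix amounts to flattening Tax-Calculator's nested {param: {year: message}} structure into the {param: [messages]} shape COMP expects. A minimal sketch (the function name is mine, not the actual patch):

```python
def flatten_errors(errors_by_param):
    """Convert {"param": {"year": "msg", ...}, ...} into
    {"param": ["msg", ...], ...}, ordering messages by year."""
    return {
        param: [msg for _, msg in sorted(by_year.items())]
        for param, by_year in errors_by_param.items()
    }
```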
I plan to push a bug fix that resolves this issue in particular. However, it highlights two larger underlying issues:
The inputs structure mirrors the taxcalc/policy_current_law.json file in the Tax-Calculator repo. But this file is subject to changes that are outside of COMP developers' control. My policy thus far is to mirror that structure, since there is no spec that defines it. Because there is no spec for projects to keep up with, several flavors of the Tax-Calculator-style inputs structure have developed (Tax-Brain, OG-USA). This puts some pressure on COMP to remain compatible with them all. So far, this has been easy, but that could change.
Testing the contrib.taxcalcstyle package is difficult unless taxcalc is installed. I will be looking into ways that this issue can be solved over the next couple weeks.
@MattHJensen has provided a list of UI improvements:
Am I missing anything? If I am or if you have any other ideas for how things could be made better, feel free to drop them in this issue. I'll see how far I can get with these today.
What do you think about using roadmap.md to track to dos (and maybe OKRs), rather than the list of open GH issues?
It seems like content this important should be version controlled.
We could open PRs to add or annotate items, and we can discuss the additions/annotations in the PR discussions.
We could use markdown's headline tags to assign hierarchy.
This approach may give the community some ownership over planning and encourage them to make contributions to the code base.
It would force those with suggestions to consider prioritization.
Issues would be reserved for bug fixes and discussions like this one.
While working on #230, I looked into using Kind for testing the compute cluster. Due to how interrelated all of the different services are, it's difficult to test each component individually. Testing them individually would involve mocking out all kinds of services and callbacks and would be more trouble than it's worth. It would also miss out on testing the interactions between the different components. Kind makes it possible to spin up a Kubernetes cluster (or something that looks like one) within a single Docker container.
This would make it possible to spin up the full C/S system, including the webapp, without having to spin up nodes on GKE, push development images to an online container registry, and do a full deploy. That entire process makes the development feedback loop on the compute cluster extremely slow and painful. Kind would also make it possible to add automated tests for each PR, which is something that C/S badly needs.
I'm seeing this on Tax-Brain but guessing it has to do with Compute Studio. Steps to repro:
Expected: Sticks with differences table.
Actual: Reverts to distribution table.
Note that it remembers the selection of differences table if going back to 2020.
When I arrived at My Simulations, my first natural click was to PSLModels/TaxBrain, but that took me to a new simulation page. The + box on the right was what I was looking for to see my simulations. I'd personally find it more intuitive to move this to the left side to get users to the main purpose of the page more quickly.
This would allow publishers to view their inputs and outputs before initially publishing on compute.studio or before pushing updates to existing models on compute.studio. I'm thinking of a single-page app where you can upload a JSON file created by a get_inputs function and a JSON file created by a run_model function.
Right now, Compute Studio reads in a PNG file from memory to display on the web.
Is CS ok in caching all these PNG files separately from the HTML files for the results pages?
If not, one solution would be to base64-encode the PNG and embed it directly in the HTML.
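The inline encoding would look roughly like this (a sketch with a hypothetical helper name, not the current rendering code):

```python
import base64


def png_to_img_tag(png_bytes):
    """Embed raw PNG bytes in an <img> tag as a base64 data URI,
    so the image ships inside the HTML instead of as a separately
    cached PNG file."""
    encoded = base64.b64encode(png_bytes).decode("ascii")
    return f'<img src="data:image/png;base64,{encoded}">'
```

The trade-off is page size: base64 inflates the image by about a third, and the bytes can no longer be cached independently of the HTML.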
A few suggestions to improve COMP's interface:
Bug - If a user changes a dropdown input parameter, runs the model, then visits the edit inputs page, the user cannot edit the dropdown parameter that they originally changed.
Show the hover-over "i" button on the inputs page only when the parameter has a description.
Give model publishers the option to have a section of input parameters closed by default on the inputs page.
On the edit inputs page, consider changing the first section title from "Model Parameters as JSON" to "Parameter Adjustments" or similar to avoid using jargon.
I ran a simple simulation today of the Tax-Calculator module on CompModels.org. I increased the maximum taxable income for the payroll tax to $200,000. I was surprised that it took nearly 5 minutes to run this Tax-Calculator simulation. This seems like a significant (~2.5x) slowdown compared to the old TaxBrain application. Is this a function of the new platform using slower hardware, pipelines, etc.? Or is there some new complexity in Tax-Calculator that requires more compute time?
PSLmodels/Tax-Brain can be run with two files: a less accurate but public file and a more accurate but private file. This issue proposes an approach for giving COMP access to the private file.
The private file can be stored in an S3 bucket under the Open Source Policy Center's AWS account. Then, the Open Source Policy Center would give COMP read access to this bucket, using a similar approach to how PaperTrail handles storing log data in an S3 bucket of the owner's choosing. On each simulation, COMP would read the data like this:
import gzip

import boto3
import pandas as pd

client = boto3.client(
    "s3",
    aws_access_key_id=aws_access_key_id,
    aws_secret_access_key=aws_secret_access_key,
)

# Stream the gzipped CSV straight from S3 into a DataFrame;
# the sensitive data is never written to disk.
obj = client.get_object(Bucket='bucket-name', Key='path/to/file.csv')
gz = gzip.GzipFile(fileobj=obj['Body'])
data = pd.read_csv(gz)
Note that the sensitive data is never stored and is streamed directly to a pandas dataframe. Alternatively, the data stream could be passed directly to Tax-Brain and Tax-Brain could handle loading it into a pandas dataframe.
This data can then be passed to Tax-Brain in the simulation function like this:
import taxbrain

def run(start_year, data_source, use_full_sample, user_mods, puf_file):
    return taxbrain.tbi.run_tbi_model(
        start_year, data_source, use_full_sample, user_mods, puf_file
    )
@andersonfrailey and @MattHJensen what do you think about this approach?
A simple version of the model inputs page should be displayed at www.compmodels.com/owner/Model-Name until the model publishing process is complete.
What do you think about adding optional buttons to the simulation pages for "Questions" and "Feedback"? For PSL models, these could link to the new topic page for the respective Discourse categories.
It will be annoying for the user to sign up for a new site, but the projects ought to appreciate the traffic, and my guess is that C/S rolling our own username-integrated solution would be a distraction.
The current outputs pages are very large which results in slow load times, "script unresponsive" dialogs, and high amounts of memory use. Instead of rendering all of the data for all of the outputs at once, we could show a thumbnail for each output and only render the output when the user clicks on it.
There are three components to this:
This is a problem that will be solved entirely in the compute-studio-storage project. The gist of it is that jinja2 will be used to render a temporary HTML file for each of the renderable outputs, and pyppeteer will be used to take screenshots of the generated files.
Currently, each simulation is stored as two zip files, one for renderable outputs and another for downloadable outputs. These files are all stored in a bucket on Google Cloud Storage. Compute Studio keeps a "remote" result in its database and uses this to find the outputs that it needs:
{
  "outputs": {
    "renderable": {
      "outputs": [
        {
          "title": "",
          "filename": ".json",
          "media_type": "bokeh"
        },
        {
          "title": "Aggregate Results",
          "filename": "Aggregate Results.json",
          "media_type": "bokeh"
        },
        {
          "title": "Tables",
          "filename": "Tables.json",
          "media_type": "bokeh"
        }
      ],
      "ziplocation": "[job_id::uuid]_renderable.zip"
    }
  }
}
We're going to need a PNG or JPG file for each of the renderable outputs, and we will need to be able to access that file individually. To do this, each renderable output needs an ID that can be used as the file name of the corresponding thumbnail:
{
  "outputs": {
    "renderable": {
      "outputs": [
        {
          "id": "file_id::uuid",
          "title": "",
          "filename": ".json",
          "media_type": "bokeh"
        },
        {
          "id": "file_id::uuid",
          "title": "Aggregate Results",
          "filename": "Aggregate Results.json",
          "media_type": "bokeh"
        },
        {
          "id": "file_id::uuid",
          "title": "Tables",
          "filename": "Tables.json",
          "media_type": "bokeh"
        }
      ],
      "ziplocation": "[job_id::uuid]_renderable.zip"
    }
  }
}
Then, each thumbnail will be located at a link like: https://storage.cloud.google.com/cs-outputs-dev/file_id.png and can be rendered using this:
<img src="https://storage.cloud.google.com/cs-outputs-dev/file_id.png">
When the page first loads, the JavaScript client will need the "remote" result to get a link to each of the renderable outputs' thumbnails. Next, when the user clicks on one of the thumbnails, the client will download the zipfile containing all of the renderable outputs and extract the selected output from the zip file and render it either on the webpage or in a pop-up window of some sort. Right now, Tax-Brain's zip files download in about 0.7 seconds, and my guess is that extracting the file and rendering it will take under half a second, resulting in about 1.2 seconds of waiting. Once the zipfile is downloaded, it will be cached by the JavaScript client, and the only cost for clicking another output will be extracting it and rendering it. I hope that these operations can be done in half a second or less. I'm not very familiar with using JavaScript with zipfiles; thus, this approach and the time estimations are somewhat theoretical.
https://docs.compute.studio/publish/functions has this snippet:
import matchups

def get_inputs(meta_params_dict):
    meta_params = MetaParams()
    meta_params.adjust(meta_params_dict)
    params = MatchupsParams()
Should the MetaParams() and MatchupsParams() calls have a matchups. prefix?
In PSLmodels/Tax-Brain#42, @donboyd5 suggested showing the subsections of parameters in the inputs page sidebar. This will take some care to implement without cluttering up the sidebar, but I wanted to open the issue in the comp-ce repo to keep a record of the feature request.
Would be preferable to take the user back to the sim they were working on, with the run-sim confirmation pop-up open (step 4, above).
COMP should support a richer set of model outputs, and it should be less opinionated about the outputs it supports. This can be done by relying on the rich ecosystem of projects that specialize in visualizing data like Bokeh and supporting data formats like pictures and videos. For now, these could be pictures, videos, interactive plots, or tables. The models could return this data using the following format:
{
  "renderable": [
    {
      "media_type": "bokeh",
      "title": "some title",
      "data": {
        "javascript": "...",
        "html": "..."
      }
    },
    {
      "media_type": "table",
      "title": "some title",
      "data": {
        "html": "<table>...</table>"
      }
    },
    {
      "media_type": "picture",
      "title": "some title",
      "data": {
        "picture": "some binary picture data",
        "extension": "JPEG/PNG"
      }
    },
    {
      "media_type": "video",
      "title": "some title",
      "data": {
        "video": "some binary video data",
        "extension": "mp4/other"
      }
    }
  ],
  "downloadable": [
    {
      "media_type": "CSV",
      "title": "some title",
      "data": {
        "CSV": "..."
      }
    },
    {
      "media_type": "HDF5",
      "title": "some title",
      "data": {
        "HDF5": "some HDF5 data"
      }
    }
  ]
}
Each output object is going to have the following structure:
{
  "media_type": "picture",
  "title": "some title",
  "data": {
    "picture": "some binary picture data",
    "extension": "JPEG/PNG"
  }
}
This describes the output's type, title, and some arguments that will be needed to parse the data. Internally, an object_storage_link attribute will be added to each object. This will link to its location in an object storage provider like DigitalOcean Spaces or AWS S3.
Note that Bokeh is supported individually, but in the future, it could be supported within a wider category of interactive outputs. My preference is to move into the interactive plots output type slowly while we figure out how things should work.
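Model publishers could sanity-check their results against this format with a small validator. This is illustrative only; the accepted media types are just the ones listed in the format above, and the function name is mine:

```python
RENDERABLE_TYPES = {"bokeh", "table", "picture", "video"}
DOWNLOADABLE_TYPES = {"CSV", "HDF5"}


def validate_outputs(result):
    """Check that a model's result dict matches the renderable/downloadable
    format sketched above. Raises ValueError on the first problem found."""
    for category, allowed in (("renderable", RENDERABLE_TYPES),
                              ("downloadable", DOWNLOADABLE_TYPES)):
        for output in result.get(category, []):
            media_type = output.get("media_type")
            if media_type not in allowed:
                raise ValueError(
                    f"{category}: unknown media_type {media_type!r}")
            if "title" not in output or "data" not in output:
                raise ValueError(
                    f"{category}: each output needs 'title' and 'data'")
```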
I plan to dogfood this data format with Matchups over the next few days. @andersonfrailey if you are willing to work with me on this, I can do some experimenting off of your branch that adds a bokeh plot to Tax-Brain (PSLmodels/Tax-Brain#26).
In running https://www.compmodels.org/PSLmodels/Tax-Brain/41142 (a CPS-based reform eliminating the payroll tax cap), Comp hung at "Estimated 2 minutes remaining" for several minutes. When I refreshed the page, I got this:
And now compmodels.org shows that error on the homepage and for all compmodels pages.
I was notified this morning about a bug in Compute Studio's password reset flow: the default Django site name (example.com) was used in reset emails instead of compute.studio. When the webapp was first deployed on Heroku, I must have neglected to fill this field out. I fixed the bug by updating the site name and restarting the webapp processes. I'm sorry for the inconvenience to users who have been affected by this.
@donboyd5 thank you for the bug report.
Some users have reported receiving errors when submitting their inputs on Tax-Brain. This is because their inputs are not being validated and returned by the compute cluster before the request times out. Thus, the page hangs and eventually an error page is shown.
To resolve this, I am going to bump the timeout time on requests from 2.5 seconds to 4 seconds. I am also going to make the process of validating user inputs asynchronous.
This will be similar to the process for actually running the simulation: user submits inputs, a dialog notifies them that the inputs have been submitted and are being validated, and either the simulation will be kicked off or the errors will be shown to the user.
The PSLmodels/Tax-Brain page load time feels very slow. COMP has to process about 220 parameters to build up the inputs form for that page. I tinkered with showing a spinner while the page is loading so that the user isn't just staring at a blank page. However, I was unable to have much luck with that approach. It seems like most of the time is spent building the form on the backend and not rendering the form in the browser. One approach that could solve this problem is to load a blank page and then load all of the form data from a REST API call. Some type of loading symbol could be used on the blank page while the form is built and the API call is completed.
COMP uses Stripe to handle its billing infrastructure. Right now, the Stripe API is down and results from sims are not saved because an error is thrown on a bad response from the Stripe API.
On the Publishing Guide the link in "The second part documents..." is broken.
Currently the default is "my models". Might also move "my simulations" to the left of "my models".
I think it would be very cool for Compute Studio to have a similar build and deploy process as JupyterHub's zero-to-k8s project. Right now, we are pretty far away from being able to do this. Here's a list of TODO items that need to be completed before we can think about having something comparable:
TODO:
Put it all together
I'm trying to insert an image into the README on the publish details page. I can embed an image via markdown, but I can't change its size. I also cannot insert HTML into the README that would give me more flexibility in sizing and positioning the image.
Are there any suggestions? I thought one could generally use html syntax within a markdown doc.
Any help is appreciated. Thanks!
@hdoupe Are there any instructions for testing an app locally? E.g., I'm seeing some formatting issues that I'd like to adjust with the results page for CCC, but don't want to burden you with installing new versions of my packages. If I could follow instructions to run these locally, I could fine tune the formatting before you need to put a new model version on CS.
cc @rickecon
And add the last sentence from the following paragraph to the paragraph in docs/PRIVACY.md
I like the Preview tab on the Compute Studio model pages, which shows the JSON format for the parameter changes. This could be useful if the user is running the models from source as well.
However, I doubt most users are doing this, and there is no detail about what the Preview tab is, making it confusing.
I might suggest moving this from the top to the bottom of the page or, preferably, having an option on the box on the left side to "Download Policy Reform JSON" (or something like that).
The Behavior section header is showing up twice.