netflix / metaflow-service

:rocket: Metadata tracking and UI service for Metaflow!

Home Page: http://www.metaflow.org

License: Apache License 2.0


metaflow-service's Introduction

Metaflow Service

Metadata service implementation for Metaflow.

This provides a thin wrapper around a database and keeps track of metadata associated with Metaflow entities such as Flows, Runs, Steps, Tasks, and Artifacts.

For more information, see Metaflow's admin docs

Getting Started

The service depends on the following environment variables:

  • MF_METADATA_DB_HOST [defaults to localhost]
  • MF_METADATA_DB_PORT [defaults to 5432]
  • MF_METADATA_DB_USER [defaults to postgres]
  • MF_METADATA_DB_PSWD [defaults to postgres]
  • MF_METADATA_DB_NAME [defaults to postgres]

Optionally, you can also override the host and port the service runs on:

  • MF_METADATA_PORT [defaults to 8080]
  • MF_MIGRATION_PORT [defaults to 8082]
  • MF_METADATA_HOST [defaults to 0.0.0.0]
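
For example, a local launch against a non-default database might export the following before starting the service (a sketch; hostnames and credentials are placeholders):

export MF_METADATA_DB_HOST=db.example.internal
export MF_METADATA_DB_PORT=5432
export MF_METADATA_DB_USER=metaflow
export MF_METADATA_DB_PSWD=metaflow
export MF_METADATA_DB_NAME=metaflow
export MF_METADATA_PORT=8080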

Create triggers to broadcast any database changes via pg_notify on channel NOTIFY:

  • DB_TRIGGER_CREATE
    • [metadata_service defaults to 0]
    • [ui_backend_service defaults to 1]
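
To observe these notifications, a minimal listener sketch (assuming psycopg2 is installed; connection values are placeholders) can subscribe to the NOTIFY channel the triggers broadcast on:

import select
import psycopg2
import psycopg2.extensions

conn = psycopg2.connect(host="localhost", port=5432, user="postgres",
                        password="postgres", dbname="postgres")
conn.set_isolation_level(psycopg2.extensions.ISOLATION_LEVEL_AUTOCOMMIT)
cur = conn.cursor()
cur.execute('LISTEN "NOTIFY";')  # channel name per the triggers described above

while True:
    # Wait up to 5 seconds for the connection socket to become readable
    if select.select([conn], [], [], 5) != ([], [], []):
        conn.poll()
        while conn.notifies:
            note = conn.notifies.pop(0)
            print(note.channel, note.payload)
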
To install and run the metadata service locally:

pip3 install ./
python3 -m services.metadata_service.server

Swagger UI: http://localhost:8080/api/doc

Using docker-compose

The easiest way to run this project is with docker-compose, and there are two options:

  • docker-compose.yml
    • Assumes that the Docker images are pre-built; local changes are not included automatically
    • See the Docker build section for how to pre-build the images
  • docker-compose.development.yml
    • Development version
    • Includes automatic Dockerfile builds and mounts local ./services folder inside the container

Running docker-compose.yml:

docker-compose up -d

Running docker-compose.development.yml (recommended during development):

docker-compose -f docker-compose.development.yml up
  • Metadata service is available at port :8080.
  • Migration service is available at port :8082.
  • UI service is available at port :8083.

To access the container, run:

docker exec -it metadata_service /bin/bash

Within the container, you can curl the service directly:

curl localhost:8080/ping

Using published image on DockerHub

The latest release of the image is available on DockerHub:

docker pull netflixoss/metaflow_metadata_service

Be sure to set the proper environment variables when running the image:

docker run -e MF_METADATA_DB_HOST='<instance_name>.us-east-1.rds.amazonaws.com' \
-e MF_METADATA_DB_PORT=5432 \
-e MF_METADATA_DB_USER='postgres' \
-e MF_METADATA_DB_PSWD='postgres' \
-e MF_METADATA_DB_NAME='metaflow' \
-it -p 8082:8082 -p 8080:8080 netflixoss/metaflow_metadata_service

Running tests

Tests are run using Tox and pytest.

Run the following command to execute the tests in a Dockerized environment:

docker-compose -f docker-compose.test.yml up -V --abort-on-container-exit

The above command makes sure a PostgreSQL database is available for the tests.

Usage without Docker:

The test suite requires a PostgreSQL database, along with the following environment variables for connecting the tested services to the DB.

  • MF_METADATA_DB_HOST=db_test
  • MF_METADATA_DB_PORT=5432
  • MF_METADATA_DB_USER=test
  • MF_METADATA_DB_PSWD=test
  • MF_METADATA_DB_NAME=test
# Run all tests
tox

# Run unit tests only
tox -e unit

# Run integration tests only
tox -e integration

# Run both unit & integrations tests in parallel
tox -e unit,integration -p
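
For example, against a local PostgreSQL reachable under the values listed above:

export MF_METADATA_DB_HOST=db_test
export MF_METADATA_DB_PORT=5432
export MF_METADATA_DB_USER=test
export MF_METADATA_DB_PSWD=test
export MF_METADATA_DB_NAME=test
tox -e unit,integration -p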

Executing flows against a local Metadata service

With the metadata service up and running at http://localhost:8080, you can use it as the metadata provider when executing flows with the Metaflow client locally:

METAFLOW_SERVICE_URL=http://localhost:8080 METAFLOW_DEFAULT_METADATA="service" python3 basicflow.py run

Alternatively, you can configure a default profile with the service URL for the Metaflow client to use. See Configuring Metaflow for instructions.
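
As a sketch, a minimal profile in ~/.metaflowconfig/config.json (the default location the Metaflow client reads) might look like:

{
    "METAFLOW_DEFAULT_METADATA": "service",
    "METAFLOW_SERVICE_URL": "http://localhost:8080"
}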

Migration Service

The migration service is a tool to help users manage underlying DB migrations and launch the most recent compatible version of the metadata service.

Note that it is possible to run the two services independently, and a Dockerfile is supplied for each service. The default Dockerfile, however, combines the two services.

Also note that at runtime the migration service and the metadata service are completely disjoint and do not communicate with each other.

Migrating to the latest db schema

Note: you may need to do a rolling restart to get the latest version of the image if you don't have it already.

You can manage the migration either via the provided API or with the utility CLI in migration_tools.py; a combined walkthrough is sketched after the list below.

  • Check the status and note the version you are on
    • API: /db_schema_status
    • CLI: python3 migration_tools.py db-status
  • See if there are migrations to be run
    • If there are any migrations to be run, is_up_to_date should be false, and the migrations to be applied will be listed under unapplied_migrations
  • Take a backup of the DB
    • In case anything goes wrong, it is a good idea to take a backup of the DB first
  • Migrations may cause downtime, depending on what is being run as part of the migration
  • Note that concurrent updates are not supported; it may be advisable to reduce your cluster to a single node
  • Upgrade the DB schema
    • API: /upgrade
    • CLI: python3 migration_tools.py upgrade
  • Check the status again to verify you are on the up-to-date version
    • API: /db_schema_status
    • CLI: python3 migration_tools.py db-status
    • Note that is_up_to_date should now be True and migration_in_progress should be False
  • Do a rolling restart of the metadata service cluster
    • For the migration to take effect, a full restart of the containers is required
  • The latest available version of the service should then be ready
    • CLI: python3 migration_tools.py metadata-service-version
  • If you had previously scaled down your cluster, it should be safe to return it to the desired number of containers
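
Putting the steps together, a minimal walkthrough might look like this (a sketch; it assumes the migration service is reachable on the default port 8082):

# 1. Check the current status; note the version and any unapplied migrations
python3 migration_tools.py db-status          # or: curl localhost:8082/db_schema_status

# 2. Back up the DB and, if advisable, scale the cluster down to a single node

# 3. Apply the migrations
python3 migration_tools.py upgrade            # or the /upgrade API endpoint

# 4. Verify: is_up_to_date should be True and migration_in_progress False
python3 migration_tools.py db-status

# 5. Do a rolling restart of the metadata service cluster, then confirm the version
python3 migration_tools.py metadata-service-version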

Under the Hood: What is going on in the Docker Container

Within the published metaflow_metadata_service image, the migration service is packaged along with the latest version of the metadata service compatible with every version of the DB. This means that multiple versions of the metadata service come bundled with the image, each installed under a different virtualenv.

When the container spins up, the migration service is launched first and determines which virtualenv to activate based on the schema version of the DB. This in turn determines which version of the metadata service will run.
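
Conceptually, the startup amounts to something like the following (a sketch of the behavior described above, not the actual entrypoint script; the module path, status field, and venv path are hypothetical):

# 1) Start the migration service (module path assumed):
python3 -m services.migration_service.server &

# 2) Read the DB schema version from its status endpoint (field name hypothetical):
version=$(curl -s localhost:8082/db_schema_status | jq -r '.current_version')

# 3) Activate the virtualenv bundled for that version and run that service build:
source "/opt/metadata_service_envs/${version}/bin/activate"   # path hypothetical
python3 -m services.metadata_service.server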

Release

See the release docs

Get in Touch

There are several ways to get in touch with us:

metaflow-service's People

Contributors

alau, crk-codaio, darinyu, ferras, hallsop, jackie-ob, jbvaningen, msavela, oavdeev, obgibson, pjoshi30, rikishk, rohanrebello, romain-intel, rswigginton, saikonen, sarthak, savingoyal, tfurmston, valaydave


metaflow-service's Issues

`metaflow.exception.MetaflowNotFound` while the Flow exists on the S3 server (on-prem)

Problem

I have deployed a Metaflow service with the development docker-compose file and extended the environment variables with the ones needed to configure Metaflow, since, as far as I can tell, those are used to retrieve artifacts for the UI for the different runs.
It then seems that Metaflow can't access the files on the S3 storage (metaflow.exception.MetaflowNotFound error).

Details

(If I don't include the METAFLOW_... envs, then I receive an "AWS credential error ...", which seems valid, as I have my own S3 endpoint.)

These variables are new compared to the existing setup:

      - AWS_ACCESS_KEY_ID=<ID>
      - AWS_SECRET_ACCESS_KEY=<SECRET>
      - METAFLOW_DEFAULT_METADATA="service"
      - METAFLOW_DEFAULT_DATASTORE="s3"
      - METAFLOW_DATASTORE_SYSROOT_S3="s3://testbucket/metaflow-testbucket"
      - METAFLOW_S3_ENDPOINT_URL="http://192.168.99.99"
      - METAFLOW_S3_VERIFY_CERTIFICATE=false

Now, when I run with these, the error is the following:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/local/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/root/services/ui_backend_service/data/cache/client/cache_server.py", line 302, in <module>
    cli(auto_envvar_prefix='MFCACHE')
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/root/services/ui_backend_service/data/cache/client/cache_server.py", line 298, in cli
    Scheduler(store, max_actions).loop()
  File "/root/services/ui_backend_service/data/cache/client/cache_server.py", line 199, in __init__
    maxtasksperchild=512,  # Recycle each worker once 512 tasks have been completed
  File "/usr/local/lib/python3.7/multiprocessing/context.py", line 119, in Pool
    context=self.get_context())
  File "/usr/local/lib/python3.7/multiprocessing/pool.py", line 176, in __init__
    self._repopulate_pool()
  File "/usr/local/lib/python3.7/multiprocessing/pool.py", line 241, in _repopulate_pool
    w.start()
  File "/usr/local/lib/python3.7/multiprocessing/process.py", line 112, in start
    self._popen = self._Popen(self)
  File "/usr/local/lib/python3.7/multiprocessing/context.py", line 277, in _Popen
    return Popen(process_obj)
  File "/usr/local/lib/python3.7/multiprocessing/popen_fork.py", line 20, in __init__
    self._launch(process_obj)
  File "/usr/local/lib/python3.7/multiprocessing/popen_fork.py", line 74, in _launch
    code = process_obj._bootstrap()
  File "/usr/local/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/usr/local/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/root/services/ui_backend_service/data/cache/client/cache_worker.py", line 29, in execute_action
    execute(tempdir, action_cls, request)
  File "/root/services/ui_backend_service/data/cache/client/cache_worker.py", line 56, in execute
    invalidate_cache=req.get('invalidate_cache', False))
  File "/root/services/ui_backend_service/data/cache/get_log_file_action.py", line 140, in execute
    results = {**existing_keys}
  File "/usr/local/lib/python3.7/contextlib.py", line 130, in __exit__
    self.gen.throw(type, value, traceback)
  File "/root/services/ui_backend_service/data/cache/utils.py", line 130, in streamed_errors
    get_traceback_str()
  File "/root/services/ui_backend_service/data/cache/utils.py", line 124, in streamed_errors
    yield
  File "/root/services/ui_backend_service/data/cache/get_log_file_action.py", line 131, in execute
    task = Task(pathspec, attempt=attempt)
  File "/usr/local/lib/python3.7/site-packages/metaflow/client/core.py", line 947, in __init__
    super(Task, self).__init__(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/metaflow/client/core.py", line 361, in __init__
    self._object = self._get_object(*ids)
  File "/usr/local/lib/python3.7/site-packages/metaflow/client/core.py", line 391, in _get_object
    raise MetaflowNotFound("%s does not exist" % self)

metaflow.exception.MetaflowNotFound: Task('HelloFlow/5/start/12', attempt=0) does not exist

If I check my S3 with s3cmd, I can see that a directory exists at this path.

When I run the flow, the files are stored perfectly; I did not notice any problems, and I can also see the run in the UI.

(I understand that Metaflow was not primarily created for on-prem usage, but it would be a blast to use it without AWS. I would be grateful for an on-prem setup guide.)

"goose: not found" in migrations service when running separately from metadata service

I am trying to get the metadata service and migration service set up as separate services. I built two separate images using Dockerfile.metadata_service and Dockerfile.migration_service at the 2.0.3 git tag. On doing a GET of /db_schema_status on the migration service, I get

{"detail": "Exception('unable to get db version via goose: /bin/sh: 1: goose: not found\\n')"}

A cursory examination of the three Dockerfiles (including the one that combines the two services) shows that the combined Dockerfile installs goose but the individual Dockerfiles do not. I also noticed libpq-dev in the combined Dockerfile but not in the individual ones.
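
For reference, the combined image's build log (quoted in a later issue on this page) builds goose in a separate Go stage and copies the binary in; a sketch of the steps the standalone migration Dockerfile appears to be missing:

FROM golang:1.16.3 AS goose
RUN go get -u github.com/pressly/goose/cmd/goose

FROM python:3.7
# The migration service shells out to goose to read/apply DB schema versions
COPY --from=goose /go/bin/goose /usr/local/bin/
# libpq-dev is present in the combined image but not in the standalone ones
RUN apt-get update && apt-get install -y libpq-dev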

Implement a messaging service for UI real-time updates, for read-replica support

Suggested improvement
Introduce a separate service to act as a pub/sub message queue for real-time updates related to the metadata service.
The metadata service would publish to this message queue upon every successful insert and update of records, on a fire-and-forget basis. The ui_backend_service would then subscribe to this service's relevant topics and receive near-real-time events related to the metadata service.

A separate service is preferred over building the feature into the metadata service, to avoid impacting metadata service performance with the additional load that broadcasts might incur. No strong opinions on whether this should be custom-built or an off-the-shelf solution.
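
Purely as an illustration, with Redis standing in for the off-the-shelf broker (the channel naming and payload shape are hypothetical), the fire-and-forget publish could look like:

import json
import redis

broker = redis.Redis(host="localhost", port=6379)

def publish_event(table: str, operation: str, record: dict) -> None:
    """Fire-and-forget: a broker failure must never fail the DB write path."""
    try:
        broker.publish(f"metadata.{table}",
                       json.dumps({"op": operation, "data": record}))
    except redis.RedisError:
        pass  # deliberately swallowed; real-time updates are best-effort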

Motivation
Currently the ui_backend relies heavily on PostgreSQL features for broadcasting metadata-related changes. One limitation of this approach lies in the replication strategies the ui_backend can employ.
With event notifications offloaded to a separate service, the ui_backend could easily run off a read replica and require less setup when deploying.

This would align well with suggestions like #2.

Metaflow UI service: unknown bug

"id": "MetaflowS3Exception",
"traceback": "Traceback (most recent call last):\n File "/usr/local/lib/python3.7/runpy.py", line 193, in _run_module_as_main\n "main", mod_spec)\n File "/usr/local/lib/python3.7/runpy.py", line 85, in _run_code\n exec(code, run_globals)\n File "/root/services/ui_backend_service/data/cache/client/cache_server.py", line 307, in \n cli(auto_envvar_prefix='MFCACHE')\n File "/usr/local/lib/python3.7/site-packages/click/core.py", line 1128, in call\n return self.main(*args, **kwargs)\n File "/usr/local/lib/python3.7/site-packages/click/core.py", line 1053, in main\n rv = self.invoke(ctx)\n File "/usr/local/lib/python3.7/site-packages/click/core.py", line 1395, in invoke\n return ctx.invoke(self.callback, **ctx.params)\n File "/usr/local/lib/python3.7/site-packages/click/core.py", line 754, in invoke\n return __callback(*args, **kwargs)\n File "/root/services/ui_backend_service/data/cache/client/cache_server.py", line 301, in cli\n Scheduler(store, max_actions).loop()\n File "/root/services/ui_backend_service/data/cache/client/cache_server.py", line 199, in init\n maxtasksperchild=512, # Recycle each worker once 512 tasks have been completed\n File "/usr/local/lib/python3.7/multiprocessing/context.py", line 119, in Pool\n context=self.get_context())\n File "/usr/local/lib/python3.7/multiprocessing/pool.py", line 176, in init\n self._repopulate_pool()\n File "/usr/local/lib/python3.7/multiprocessing/pool.py", line 241, in _repopulate_pool\n w.start()\n File "/usr/local/lib/python3.7/multiprocessing/process.py", line 112, in start\n self._popen = self._Popen(self)\n File "/usr/local/lib/python3.7/multiprocessing/context.py", line 277, in _Popen\n return Popen(process_obj)\n File "/usr/local/lib/python3.7/multiprocessing/popen_fork.py", line 20, in init\n self._launch(process_obj)\n File "/usr/local/lib/python3.7/multiprocessing/popen_fork.py", line 74, in _launch\n code = process_obj._bootstrap()\n File "/usr/local/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap\n self.run()\n File "/usr/local/lib/python3.7/multiprocessing/process.py", line 99, in run\n self._target(*self._args, **self._kwargs)\n File "/usr/local/lib/python3.7/multiprocessing/pool.py", line 121, in worker\n result = (True, func(*args, **kwds))\n File "/root/services/ui_backend_service/data/cache/client/cache_worker.py", line 29, in execute_action\n execute(tempdir, action_cls, request)\n File "/root/services/ui_backend_service/data/cache/client/cache_worker.py", line 56, in execute\n invalidate_cache=req.get('invalidate_cache', False))\n File "/root/services/ui_backend_service/data/cache/get_log_file_action.py", line 144, in execute\n results = {**existing_keys}\n File "/usr/local/lib/python3.7/contextlib.py", line 130, in exit\n self.gen.throw(type, value, traceback)\n File "/root/services/ui_backend_service/data/cache/utils.py", line 130, in streamed_errors\n get_traceback_str()\n File "/root/services/ui_backend_service/data/cache/utils.py", line 124, in streamed_errors\n yield\n File "/root/services/ui_backend_service/data/cache/get_log_file_action.py", line 141, in execute\n content = log_provider.get_log_content(task, logtype)\n File "/root/services/ui_backend_service/data/cache/get_log_file_action.py", line 275, in get_log_content\n return get_log_content(task, logtype)\n File "/root/services/ui_backend_service/data/cache/get_log_file_action.py", line 197, in get_log_content\n for datetime, line in task.loglines(stream)\n File "/root/services/ui_backend_service/data/cache/get_log_file_action.py", line 196, 
in \n (_datetime_to_epoch(datetime), line)\n File "/usr/local/lib/python3.7/site-packages/metaflow/client/core.py", line 1388, in loglines\n ds_type, ds_root, stream, attempt, *self.path_components\n File "/usr/local/lib/python3.7/site-packages/metaflow/client/filecache.py", line 81, in get_logs_stream\n return task_ds.load_logs(LOG_SOURCES, stream, attempt_override=attempt)\n File "/usr/local/lib/python3.7/site-packages/metaflow/datastore/task_datastore.py", line 45, in method\n return f(self, *args, **kwargs)\n File "/usr/local/lib/python3.7/site-packages/metaflow/datastore/task_datastore.py", line 776, in load_logs\n r = self._load_file(paths.keys(), add_attempt=False)\n File "/usr/local/lib/python3.7/site-packages/metaflow/datastore/task_datastore.py", line 924, in _load_file\n for key, path, meta in load_results:\n File "/usr/local/lib/python3.7/site-packages/metaflow/plugins/datastores/s3_storage.py", line 139, in iter_results\n r = s3.get(p, return_missing=True, return_info=True)\n File "/usr/local/lib/python3.7/site-packages/metaflow/plugins/datatools/s3/s3.py", line 897, in get\n path, addl_info = self._one_boto_op(_download, url)\n File "/usr/local/lib/python3.7/site-packages/metaflow/plugins/datatools/s3/s3.py", line 1315, in _one_boto_op\n "S3 operation failed.\n" "Key requested: %s\n" "Error: %s" % (url, error)\n\nmetaflow.plugins.datatools.s3.s3.MetaflowS3Exception: S3 operation failed.\nKey requested: s3://dev-test/tzy/ParameterFlow/argo-avalon.user.tanzheyue.parameterflow-jmrxv/start/t-e051c938/0.runtime_stdout.log\nError: An error occurred () when calling the GetObject operation: \n",

As shown in the error report above, the UI can display artifacts and DAGs, but cannot load stdout and stderr. I have verified that the key is not constructed correctly: the code requests the key 0.runtime_stdout.log, but 0.task_stdout.log is what the task actually generated.

[Metaflow UI] stdout and stderr logs timeout/fail to load

When using the Metaflow UI the stdout/stderr panes no longer successfully load, and the requests to load them return with a 504 gateway timeout.


Example url being requested by UI for stderr logs:
/api/flows/<flow_name>/runs/59510/steps/start/tasks/539228/logs/err?attempt_id=0&_limit=500&_page=1&_order=-row

I believe the issue is caused by a very expensive join query in async def get_task_by_request(self, request) in ui_backend_service/api/log.py. Looking at the code, this function call and the underlying join query seem unnecessary, given that the UI already passes all the task parameters needed to uniquely identify the task in the task table directly, including the attempt.
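
A sketch of the suggested direct lookup (table and column names taken from the service's own notify-trigger logs quoted later on this page; the exact query shape is hypothetical):

# Instead of the join, fetch the task row by its unique key, all of which
# already arrive in the request path and query string:
query = (
    "SELECT * FROM tasks_v3 "
    "WHERE flow_id = %s AND run_number = %s AND step_name = %s AND task_id = %s"
)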

How to run this project

Hello, I want to run this project to try out the UI service with metaflow-ui. I tried the following approaches:

$ docker compose up -d
Error response from daemon: No such image: metadata_service:latest

Is the image private or something like that? I tried to build the metadata service from the Dockerfile (docker image build -t "metadata-service" -f "Dockerfile.metadata_service" .) and then use it with docker compose, but it didn't work. However, the production image doesn't seem to include the UI service, at least from what I understand.

I also tried the development build:

$ MF_METADATA_PORT=8090 docker compose -f docker-compose.development.yml up -d

I needed to switch ports because 8080 is occupied by Jenkins. This worked; however, when I try to view any page, it returns 404, even when I go inside the container and execute the request there.

Can you help me figure out what I'm doing wrong? I just need to get the UI backend service working so I can use the Metaflow UI.

  • Repo: Tried both current master and last release repo.
  • Docker version: 23.0.0
  • System: Rocky Linux 8

`500` error encountered on `flows/<flow_id>/runs` requests due to size of payload

I have a flow running in production on a 2-minute schedule, for which artifacts/metadata are being stored. I am trying to interact with those artifacts via the client API. However, requests to /flows/<flow_id>/runs (via instantiating Flow("FlowName")) are now failing with

Metadata request: (/flows/<flow_id>/runs) failed (code 500): {"message": "Internal server error"}

Requests to other endpoints, like flows/<flow_id>/runs/<run_id>, are going through just fine. After taking a closer look at API Gateway, I generated the same request through the console there, and got

Execution failed due to configuration error: Integration response of reported length 28729085 is larger than allowed maximum of 10485760 bytes.
Tue May 31 22:49:46 UTC 2022 : Method completed with status: 500

Basically, the response payload is larger than the non-configurable 10 MB limit on API Gateway.

I can get around this by requesting individual runs directly, but it would be great to still be able to use the Flow APIs to interact with the child runs of a flow via the client. (Perhaps by adding the ability to pass filtering params in the requests, like fetching the last n runs?) I am also curious whether this is something others have run into when deploying flows to production, where many runs are produced and stored, and whether there are any workarounds or things I am missing.
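
The UI backend already accepts paging parameters on its endpoints (see the log URL in the issue above); if the metadata service's run listing grew similar filtering, a request might look like this (hypothetical; the endpoint does not currently accept these parameters):

curl "http://<service_host>/flows/<flow_id>/runs?_limit=50&_page=1"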

Error messages not clear when failures are due to DB connection problems

Metaflow 2.0.2 executing HelloFlow for user
Validating your flow...
The graph looks good!
Running pylint...
Pylint not found, so extra checks are disabled.
Metaflow service error:
Metadata request (/flows/HelloFlow) failed (code 500): "{"err_msg": "asynchronous connection attempt underway"}"

Use f-strings

f-strings can be used, given that Python 3.6+ is being used.
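
For instance, a log line like the one in the previous issue could move from %-formatting to an f-string:

version, user = "2.0.2", "jane"
# printf-style formatting:
print("Metaflow %s executing HelloFlow for user %s" % (version, user))
# equivalent f-string, available from Python 3.6 onward:
print(f"Metaflow {version} executing HelloFlow for user {user}")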

Trouble running `docker-compose.development.yml`

Hey all,

I am trying to run the whole metaflow-service, but when running the development file, all containers run properly except ui_backend_1.
I've cloned the repo and then ran:

docker-compose -f docker-compose.development.yml up

but I am facing the following error:

ui_backend_1  | INFO:CacheAsyncClient:cache_data/artifact_search:Message: Traceback (most recent call last):
ui_backend_1  | INFO:CacheAsyncClient:cache_data/artifact_search:Message:   File "/usr/local/lib/python3.7/runpy.py", line 193, in _run_module_as_main
ui_backend_1  | INFO:CacheAsyncClient:cache_data/artifact_search:Message:     "__main__", mod_spec)
ui_backend_1  | INFO:CacheAsyncClient:cache_data/artifact_search:Message:   File "/usr/local/lib/python3.7/runpy.py", line 85, in _run_code
ui_backend_1  | INFO:CacheAsyncClient:cache_data/artifact_search:Message:     exec(code, run_globals)
ui_backend_1  | INFO:CacheAsyncClient:cache_data/artifact_search:Message:   File "/root/services/ui_backend_service/data/cache/client/cache_server.py", line 16, in <module>
ui_backend_1  | INFO:CacheAsyncClient:cache_data/artifact_search:Message:     import click
ui_backend_1  | INFO:CacheAsyncClient:cache_data/artifact_search:Message: ModuleNotFoundError: No module named 'click'
ui_backend_1  | Traceback (most recent call last):
ui_backend_1  |   File "/usr/local/bin/ui_backend_service", line 33, in <module>
ui_backend_1  |     sys.exit(load_entry_point('metadata-service', 'console_scripts', 'ui_backend_service')())
ui_backend_1  |   File "/root/services/ui_backend_service/ui_server.py", line 135, in main
ui_backend_1  |     loop.run_until_complete(handler.setup())
ui_backend_1  |   File "/usr/local/lib/python3.7/asyncio/base_events.py", line 587, in run_until_complete
ui_backend_1  |     return future.result()
ui_backend_1  |   File "/usr/local/lib/python3.7/site-packages/aiohttp/web_runner.py", line 279, in setup
ui_backend_1  |     self._server = await self._make_server()
ui_backend_1  |   File "/usr/local/lib/python3.7/site-packages/aiohttp/web_runner.py", line 375, in _make_server
ui_backend_1  |     await self._app.startup()
ui_backend_1  |   File "/usr/local/lib/python3.7/site-packages/aiohttp/web_app.py", line 417, in startup
ui_backend_1  |     await self.on_startup.send(self)
ui_backend_1  |   File "/usr/local/lib/python3.7/site-packages/aiosignal/__init__.py", line 36, in send
ui_backend_1  |     await receiver(*args, **kwargs)  # type: ignore
ui_backend_1  |   File "/root/services/ui_backend_service/data/cache/store.py", line 66, in start_caches
ui_backend_1  |     await self.artifact_cache.start_cache()
ui_backend_1  |   File "/root/services/ui_backend_service/data/cache/store.py", line 154, in start_cache
ui_backend_1  |     await self.cache.start()
ui_backend_1  |   File "/root/services/ui_backend_service/data/cache/client/cache_async_client.py", line 118, in request_and_return
ui_backend_1  |     await req
ui_backend_1  |   File "/root/services/ui_backend_service/data/cache/client/cache_async_client.py", line 67, in check
ui_backend_1  |     await ret.wait()
ui_backend_1  |   File "/root/services/ui_backend_service/data/cache/client/cache_async_client.py", line 113, in wait
ui_backend_1  |     async for obj in self.wait_iter(_repeat(), timeout):
ui_backend_1  |   File "/root/services/ui_backend_service/data/cache/client/cache_async_client.py", line 102, in wait_iter
ui_backend_1  |     raise CacheServerUnreachable()
ui_backend_1  | services.ui_backend_service.data.cache.client.cache_client.CacheServerUnreachable
Entire trace
docker-compose -f docker-compose.development.yml up
WARNING: The CUSTOM_QUICKLINKS variable is not set. Defaulting to a blank string.
WARNING: The NOTIFICATIONS variable is not set. Defaulting to a blank string.
WARNING: The PLUGINS variable is not set. Defaulting to a blank string.
Docker Compose is now in the Docker CLI, try `docker compose up`

Creating network "metaflow-service_default" with the default driver
Pulling db (postgres:11)...
11: Pulling from library/postgres
1cb79db8a9e7: Pull complete
f6bae7873dd7: Pull complete
8f7722dc50a7: Pull complete
e8622b8cb6f3: Pull complete
d6d74bba3a57: Pull complete
874d4d2a09fd: Pull complete
2d87c3a4038c: Pull complete
f955a6cf127b: Pull complete
c607e7071388: Pull complete
1026a7bbc62f: Pull complete
ca1ba5d59f0e: Pull complete
915b4028528f: Pull complete
7dd826953df8: Pull complete
Digest: sha256:bf8c2cbc7372069108f4e2e526da0eda81c96ba6129c42165ab1a1941fc0151a
Status: Downloaded newer image for postgres:11
Building migration
[+] Building 65.7s (18/18) FINISHED
 => [internal] load build definition from Dockerfile.migration_service                                                                                                                                                                                                     0.0s
 => => transferring dockerfile: 455B                                                                                                                                                                                                                                       0.0s
 => [internal] load .dockerignore                                                                                                                                                                                                                                          0.0s
 => => transferring context: 2B                                                                                                                                                                                                                                            0.0s
 => [internal] load metadata for docker.io/library/python:3.7                                                                                                                                                                                                              1.6s
 => [internal] load metadata for docker.io/library/golang:1.16.3                                                                                                                                                                                                           1.5s
 => [auth] library/golang:pull token for registry-1.docker.io                                                                                                                                                                                                              0.0s
 => [auth] library/python:pull token for registry-1.docker.io                                                                                                                                                                                                              0.0s
 => [internal] load build context                                                                                                                                                                                                                                          0.0s
 => => transferring context: 39.32kB                                                                                                                                                                                                                                       0.0s
 => [stage-1 1/8] FROM docker.io/library/python:3.7@sha256:48e1422053164310266a9a85e4f3886733a5f1dc025238dba229068806aff4d6                                                                                                                                               25.6s
 => => resolve docker.io/library/python:3.7@sha256:48e1422053164310266a9a85e4f3886733a5f1dc025238dba229068806aff4d6                                                                                                                                                        0.0s
 => => sha256:fee1d9b288b819d7a9114da2279230b6712dac68e505b0491c7d40f78c48c1f9 2.22kB / 2.22kB                                                                                                                                                                             0.0s
 => => sha256:0c6b8ff8c37e92eb1ca65ed8917e818927d5bf318b6f18896049b5d9afc28343 54.92MB / 54.92MB                                                                                                                                                                           4.9s
 => => sha256:412caad352a3ecbb29c080379407ae0761e7b9b454f7239cbfd1d1da25e06b29 5.15MB / 5.15MB                                                                                                                                                                             0.4s
 => => sha256:e6d3e61f7a504fa66d7275123969e9917570188650eb84b2280a726b996040f6 10.87MB / 10.87MB                                                                                                                                                                           0.8s
 => => sha256:48e1422053164310266a9a85e4f3886733a5f1dc025238dba229068806aff4d6 1.86kB / 1.86kB                                                                                                                                                                             0.0s
 => => sha256:eb092e58e9248e0dc5baf7c8604e9c25e8a4cfeb89eb52b6c7163c8c0eb2f05c 9.18kB / 9.18kB                                                                                                                                                                             0.0s
 => => sha256:461bb1d8c517c7f9fc0f1df66c9dc34c85a23421c1e1c540b2e28cbb258e75f5 54.57MB / 54.57MB                                                                                                                                                                           4.4s
 => => sha256:808edda3c2e855dc13af758b35cefbcc417ad1ab4fead7f72234b09aeda893a0 196.53MB / 196.53MB                                                                                                                                                                         9.9s
 => => sha256:724cfd2dc19be12b837643ea67bd5ad7a6fd98049a88f02ec70eca30fa03a5a1 6.29MB / 6.29MB                                                                                                                                                                             5.0s
 => => sha256:773d554492cc49b691accb57e00f233a2217ca78abf2c16bc51e4064c7c77ca7 14.85MB / 14.85MB                                                                                                                                                                           5.8s
 => => extracting sha256:0c6b8ff8c37e92eb1ca65ed8917e818927d5bf318b6f18896049b5d9afc28343                                                                                                                                                                                  4.8s
 => => sha256:5ecddf626bb9ed1c8363e84afc5a04e50144ec20f6595d4df9a82b6d22e62724 236B / 236B                                                                                                                                                                                 5.2s
 => => sha256:eff4d08aa2149de6167d219d1574d89451658355eb697f6d6cf55a3155b4fe2f 2.35MB / 2.35MB                                                                                                                                                                             5.6s
 => => extracting sha256:412caad352a3ecbb29c080379407ae0761e7b9b454f7239cbfd1d1da25e06b29                                                                                                                                                                                  0.4s
 => => extracting sha256:e6d3e61f7a504fa66d7275123969e9917570188650eb84b2280a726b996040f6                                                                                                                                                                                  0.5s
 => => extracting sha256:461bb1d8c517c7f9fc0f1df66c9dc34c85a23421c1e1c540b2e28cbb258e75f5                                                                                                                                                                                  2.9s
 => => extracting sha256:808edda3c2e855dc13af758b35cefbcc417ad1ab4fead7f72234b09aeda893a0                                                                                                                                                                                  8.3s
 => => extracting sha256:724cfd2dc19be12b837643ea67bd5ad7a6fd98049a88f02ec70eca30fa03a5a1                                                                                                                                                                                  0.3s
 => => extracting sha256:773d554492cc49b691accb57e00f233a2217ca78abf2c16bc51e4064c7c77ca7                                                                                                                                                                                  0.7s
 => => extracting sha256:5ecddf626bb9ed1c8363e84afc5a04e50144ec20f6595d4df9a82b6d22e62724                                                                                                                                                                                  0.0s
 => => extracting sha256:eff4d08aa2149de6167d219d1574d89451658355eb697f6d6cf55a3155b4fe2f                                                                                                                                                                                  0.4s
 => [goose 1/2] FROM docker.io/library/golang:1.16.3@sha256:f7d3519759ba6988a2b73b5874b17c5958ac7d0aa48a8b1d84d66ef25fa345f1                                                                                                                                               0.2s
 => => resolve docker.io/library/golang:1.16.3@sha256:f7d3519759ba6988a2b73b5874b17c5958ac7d0aa48a8b1d84d66ef25fa345f1                                                                                                                                                     0.0s
 => => sha256:d5dc529b0ee7ad7b3289de938fb53b2cd495900c8d587f15fd86532788151f67 7.00kB / 7.00kB                                                                                                                                                                             0.0s
 => => sha256:f7d3519759ba6988a2b73b5874b17c5958ac7d0aa48a8b1d84d66ef25fa345f1 2.36kB / 2.36kB                                                                                                                                                                             0.0s
 => => sha256:dfa3cef088454200d6b48e2a911138f7d5d9afff77f89243eea6342f16ddcfb0 1.79kB / 1.79kB                                                                                                                                                                             0.0s
 => [goose 2/2] RUN go get -u github.com/pressly/goose/cmd/goose                                                                                                                                                                                                          47.5s
 => [stage-1 2/8] COPY --from=goose /go/bin/goose /usr/local/bin/                                                                                                                                                                                                          0.1s
 => [stage-1 3/8] ADD services/__init__.py /root/services/__init__.py                                                                                                                                                                                                      0.0s
 => [stage-1 4/8] ADD services/utils /root/services/utils                                                                                                                                                                                                                  0.0s
 => [stage-1 5/8] ADD services/migration_service /root/services/migration_service                                                                                                                                                                                          0.0s
 => [stage-1 6/8] ADD setup.py setup.cfg /root/                                                                                                                                                                                                                            0.0s
 => [stage-1 7/8] WORKDIR /root                                                                                                                                                                                                                                            0.0s
 => [stage-1 8/8] RUN pip install --editable .                                                                                                                                                                                                                            15.7s
 => exporting to image                                                                                                                                                                                                                                                     0.4s
 => => exporting layers                                                                                                                                                                                                                                                    0.4s
 => => writing image sha256:114e2b6aa424bbdb881aa793d65efab8e540842ef8df31b4f0f4be400aeb54bf                                                                                                                                                                               0.0s
 => => naming to docker.io/library/metaflow-service_migration                                                                                                                                                                                                              0.0s

Use 'docker scan' to run Snyk tests against images to find vulnerabilities and learn how to fix them
WARNING: Image for service migration was built because it did not already exist. To rebuild this image you must use `docker-compose build` or `docker-compose up --build`.
Building metadata
[+] Building 21.1s (13/13) FINISHED
 => [internal] load build definition from Dockerfile.metadata_service                                                                                                                                                                                                      0.0s
 => => transferring dockerfile: 351B                                                                                                                                                                                                                                       0.0s
 => [internal] load .dockerignore                                                                                                                                                                                                                                          0.0s
 => => transferring context: 2B                                                                                                                                                                                                                                            0.0s
 => [internal] load metadata for docker.io/library/python:3.7                                                                                                                                                                                                              0.4s
 => CACHED [1/8] FROM docker.io/library/python:3.7@sha256:48e1422053164310266a9a85e4f3886733a5f1dc025238dba229068806aff4d6                                                                                                                                                 0.0s
 => [internal] load build context                                                                                                                                                                                                                                          0.0s
 => => transferring context: 151.19kB                                                                                                                                                                                                                                      0.0s
 => [2/8] ADD services/__init__.py /root/services/                                                                                                                                                                                                                         0.0s
 => [3/8] ADD services/data /root/services/data                                                                                                                                                                                                                            0.0s
 => [4/8] ADD services/utils /root/services/utils                                                                                                                                                                                                                          0.0s
 => [5/8] ADD services/metadata_service /root/services/metadata_service                                                                                                                                                                                                    0.0s
 => [6/8] ADD setup.py setup.cfg /root/                                                                                                                                                                                                                                    0.0s
 => [7/8] WORKDIR /root                                                                                                                                                                                                                                                    0.0s
 => [8/8] RUN pip install --editable .                                                                                                                                                                                                                                    19.6s
 => exporting to image                                                                                                                                                                                                                                                     0.8s
 => => exporting layers                                                                                                                                                                                                                                                    0.7s
 => => writing image sha256:85bac02c09fd0d29e0e1c2e50f05337129e1acf5e9a1d14ffdb821d1e3a762ff                                                                                                                                                                               0.0s
 => => naming to docker.io/library/metaflow-service_metadata                                                                                                                                                                                                               0.0s

Use 'docker scan' to run Snyk tests against images to find vulnerabilities and learn how to fix them
WARNING: Image for service metadata was built because it did not already exist. To rebuild this image you must use `docker-compose build` or `docker-compose up --build`.
Building ui_backend
[+] Building 27.0s (15/15) FINISHED
 => [internal] load build definition from Dockerfile.ui_service                                                                                                                                                                                                            0.0s
 => => transferring dockerfile: 815B                                                                                                                                                                                                                                       0.0s
 => [internal] load .dockerignore                                                                                                                                                                                                                                          0.0s
 => => transferring context: 2B                                                                                                                                                                                                                                            0.0s
 => [internal] load metadata for docker.io/library/python:3.7                                                                                                                                                                                                              0.4s
 => [internal] load build context                                                                                                                                                                                                                                          0.0s
 => => transferring context: 714.83kB                                                                                                                                                                                                                                      0.0s
 => CACHED [ 1/10] FROM docker.io/library/python:3.7@sha256:48e1422053164310266a9a85e4f3886733a5f1dc025238dba229068806aff4d6                                                                                                                                               0.0s
 => [ 2/10] ADD services/__init__.py /root/services/__init__.py                                                                                                                                                                                                            0.0s
 => [ 3/10] ADD services/data /root/services/data                                                                                                                                                                                                                          0.0s
 => [ 4/10] ADD services/utils /root/services/utils                                                                                                                                                                                                                        0.0s
 => [ 5/10] ADD services/metadata_service /root/services/metadata_service                                                                                                                                                                                                  0.0s
 => [ 6/10] ADD services/ui_backend_service /root/services/ui_backend_service                                                                                                                                                                                              0.0s
 => [ 7/10] ADD setup.py setup.cfg /root/                                                                                                                                                                                                                                  0.0s
 => [ 8/10] WORKDIR /root                                                                                                                                                                                                                                                  0.0s
 => [ 9/10] RUN /root/services/ui_backend_service/download_ui.sh                                                                                                                                                                                                           1.0s
 => [10/10] RUN pip install --editable .                                                                                                                                                                                                                                  24.2s
 => exporting to image                                                                                                                                                                                                                                                     1.1s
 => => exporting layers                                                                                                                                                                                                                                                    1.1s
 => => writing image sha256:3ed79b083b1ef4210b92e2416359e238f20daab0ede18e2ab44c6756a2d0d22d                                                                                                                                                                               0.0s
 => => naming to docker.io/library/metaflow-service_ui_backend                                                                                                                                                                                                             0.0s

Use 'docker scan' to run Snyk tests against images to find vulnerabilities and learn how to fix them
WARNING: Image for service ui_backend was built because it did not already exist. To rebuild this image you must use `docker-compose build` or `docker-compose up --build`.
Creating metaflow-service_db_1 ... done
Creating metaflow-service_ui_backend_1 ... done
Creating metaflow-service_migration_1  ... done
Creating metaflow-service_metadata_1   ... done
Attaching to metaflow-service_db_1, metaflow-service_migration_1, metaflow-service_ui_backend_1, metaflow-service_metadata_1
db_1          | The files belonging to this database system will be owned by user "postgres".
db_1          | This user must also own the server process.
db_1          |
db_1          | The database cluster will be initialized with locale "en_US.utf8".
db_1          | The default database encoding has accordingly been set to "UTF8".
db_1          | The default text search configuration will be set to "english".
db_1          |
db_1          | Data page checksums are disabled.
db_1          |
ui_backend_1  | INFO:AsyncPostgresDB:ui:Setting up notify trigger for flows_v3
ui_backend_1  |    Keys: ['flow_id']
db_1          | fixing permissions on existing directory /var/lib/postgresql/data ... ok
ui_backend_1  | INFO:AsyncPostgresDB:ui:Setting up notify trigger for runs_v3
ui_backend_1  |    Keys: ['flow_id', 'run_number', 'last_heartbeat_ts']
db_1          | creating subdirectories ... ok
ui_backend_1  | INFO:AsyncPostgresDB:ui:Setting up notify trigger for steps_v3
ui_backend_1  |    Keys: ['flow_id', 'run_number', 'step_name']
db_1          | selecting default max_connections ... 100
ui_backend_1  | INFO:AsyncPostgresDB:ui:Setting up notify trigger for tasks_v3
ui_backend_1  |    Keys: ['flow_id', 'run_number', 'step_name', 'task_id']
db_1          | selecting default shared_buffers ... 128MB
ui_backend_1  | INFO:AsyncPostgresDB:ui:Setting up notify trigger for artifact_v3
ui_backend_1  |    Keys: ['flow_id', 'run_number', 'step_name', 'task_id', 'attempt_id', 'name']
db_1          | selecting default timezone ... Etc/UTC
db_1          | selecting dynamic shared memory implementation ... posix
ui_backend_1  | INFO:AsyncPostgresDB:ui:Setting up notify trigger for metadata_v3
ui_backend_1  |    Keys: ['flow_id', 'run_number', 'step_name', 'task_id', 'field_name', 'value']
db_1          | creating configuration files ... ok
ui_backend_1  | INFO:AsyncPostgresDB:ui:Connection established.
ui_backend_1  |    Pool min: 1 max: 10
ui_backend_1  |
db_1          | running bootstrap script ... ok
ui_backend_1  | INFO:AsyncPostgresDB:ui:cache:Connection established.
ui_backend_1  |    Pool min: 1 max: 10
ui_backend_1  |
db_1          | performing post-bootstrap initialization ... ok
db_1          | syncing data to disk ... ok
db_1          |
db_1          | Success. You can now start the database server using:
db_1          |
db_1          |     pg_ctl -D /var/lib/postgresql/data -l logfile start
db_1          |
ui_backend_1  | INFO:AsyncPostgresDB:ui:notify:Connection established.
ui_backend_1  |    Pool min: 1 max: 10
ui_backend_1  |
db_1          |
db_1          | WARNING: enabling "trust" authentication for local connections
db_1          | You can change this by editing pg_hba.conf or using the option -A, or
db_1          | --auth-local and --auth-host, the next time you run initdb.
ui_backend_1  | INFO:ListenNotify:Connection acquired
ui_backend_1  | INFO:AsyncPostgresDB:ui:heartbeat:Connection established.
ui_backend_1  |    Pool min: 1 max: 10
ui_backend_1  |
db_1          | waiting for server to start....2022-02-18 15:35:34.430 UTC [48] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
db_1          | 2022-02-18 15:35:34.442 UTC [49] LOG:  database system was shut down at 2022-02-18 15:35:34 UTC
ui_backend_1  | INFO:AsyncPostgresDB:ui:websocket:Connection established.
ui_backend_1  |    Pool min: 1 max: 10
ui_backend_1  |
ui_backend_1  | INFO:AutoCompleteApi:0 cached tags in memory consuming 0 Mb
ui_backend_1  | INFO:AsyncPostgresDB:global:Connection established.
ui_backend_1  |    Pool min: 1 max: 10
ui_backend_1  |
ui_backend_1  | INFO:root:Metadata service available at http://0.0.0.0:8083/metadata
db_1          | 2022-02-18 15:35:34.448 UTC [48] LOG:  database system is ready to accept connections
db_1          |  done
db_1          | server started
db_1          |
db_1          | /usr/local/bin/docker-entrypoint.sh: ignoring /docker-entrypoint-initdb.d/*
db_1          |
ui_backend_1  | INFO:Plugin:Init plugins
ui_backend_1  | INFO:Plugin:Plugins ready: []
db_1          | waiting for server to shut down...2022-02-18 15:35:34.573 UTC [48] LOG:  received fast shutdown request
db_1          | .2022-02-18 15:35:34.575 UTC [48] LOG:  aborting any active transactions
db_1          | 2022-02-18 15:35:34.578 UTC [48] LOG:  background worker "logical replication launcher" (PID 55) exited with exit code 1
db_1          | 2022-02-18 15:35:34.578 UTC [50] LOG:  shutting down
db_1          | 2022-02-18 15:35:34.593 UTC [48] LOG:  database system is shut down
db_1          |  done
db_1          | server stopped
db_1          |
db_1          | PostgreSQL init process complete; ready for start up.
db_1          |
db_1          | 2022-02-18 15:35:34.685 UTC [1] LOG:  listening on IPv4 address "0.0.0.0", port 5432
db_1          | 2022-02-18 15:35:34.685 UTC [1] LOG:  listening on IPv6 address "::", port 5432
db_1          | 2022-02-18 15:35:34.687 UTC [1] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
db_1          | 2022-02-18 15:35:34.699 UTC [67] LOG:  database system was shut down at 2022-02-18 15:35:34 UTC
db_1          | 2022-02-18 15:35:34.704 UTC [1] LOG:  database system is ready to accept connections
metadata_1    | INFO:AsyncPostgresDB:global:Connection established.
metadata_1    |    Pool min: 1 max: 10
metadata_1    |
ui_backend_1  | INFO:CacheAsyncClient:cache_data/artifact_search:Message: Traceback (most recent call last):
ui_backend_1  | INFO:CacheAsyncClient:cache_data/artifact_search:Message:   File "/usr/local/lib/python3.7/runpy.py", line 193, in _run_module_as_main
ui_backend_1  | INFO:CacheAsyncClient:cache_data/artifact_search:Message:     "__main__", mod_spec)
ui_backend_1  | INFO:CacheAsyncClient:cache_data/artifact_search:Message:   File "/usr/local/lib/python3.7/runpy.py", line 85, in _run_code
ui_backend_1  | INFO:CacheAsyncClient:cache_data/artifact_search:Message:     exec(code, run_globals)
ui_backend_1  | INFO:CacheAsyncClient:cache_data/artifact_search:Message:   File "/root/services/ui_backend_service/data/cache/client/cache_server.py", line 16, in <module>
ui_backend_1  | INFO:CacheAsyncClient:cache_data/artifact_search:Message:     import click
ui_backend_1  | INFO:CacheAsyncClient:cache_data/artifact_search:Message: ModuleNotFoundError: No module named 'click'
ui_backend_1  | Traceback (most recent call last):
ui_backend_1  |   File "/usr/local/bin/ui_backend_service", line 33, in <module>
ui_backend_1  |     sys.exit(load_entry_point('metadata-service', 'console_scripts', 'ui_backend_service')())
ui_backend_1  |   File "/root/services/ui_backend_service/ui_server.py", line 135, in main
ui_backend_1  |     loop.run_until_complete(handler.setup())
ui_backend_1  |   File "/usr/local/lib/python3.7/asyncio/base_events.py", line 587, in run_until_complete
ui_backend_1  |     return future.result()
ui_backend_1  |   File "/usr/local/lib/python3.7/site-packages/aiohttp/web_runner.py", line 279, in setup
ui_backend_1  |     self._server = await self._make_server()
ui_backend_1  |   File "/usr/local/lib/python3.7/site-packages/aiohttp/web_runner.py", line 375, in _make_server
ui_backend_1  |     await self._app.startup()
ui_backend_1  |   File "/usr/local/lib/python3.7/site-packages/aiohttp/web_app.py", line 417, in startup
ui_backend_1  |     await self.on_startup.send(self)
ui_backend_1  |   File "/usr/local/lib/python3.7/site-packages/aiosignal/__init__.py", line 36, in send
ui_backend_1  |     await receiver(*args, **kwargs)  # type: ignore
ui_backend_1  |   File "/root/services/ui_backend_service/data/cache/store.py", line 66, in start_caches
ui_backend_1  |     await self.artifact_cache.start_cache()
ui_backend_1  |   File "/root/services/ui_backend_service/data/cache/store.py", line 154, in start_cache
ui_backend_1  |     await self.cache.start()
ui_backend_1  |   File "/root/services/ui_backend_service/data/cache/client/cache_async_client.py", line 118, in request_and_return
ui_backend_1  |     await req
ui_backend_1  |   File "/root/services/ui_backend_service/data/cache/client/cache_async_client.py", line 67, in check
ui_backend_1  |     await ret.wait()
ui_backend_1  |   File "/root/services/ui_backend_service/data/cache/client/cache_async_client.py", line 113, in wait
ui_backend_1  |     async for obj in self.wait_iter(_repeat(), timeout):
ui_backend_1  |   File "/root/services/ui_backend_service/data/cache/client/cache_async_client.py", line 102, in wait_iter
ui_backend_1  |     raise CacheServerUnreachable()
ui_backend_1  | services.ui_backend_service.data.cache.client.cache_client.CacheServerUnreachable

Any idea what went wrong here?

Thanks for your help 😃

sslmode customization

New Feature:

Would it be possible to pass the PostgreSQL sslmode as a parameter on the connection string? At the moment it is always disabled, which prevents connections to databases where SSL is required. It would be good to pass this as another parameter and allow switching between disable and require, for example.
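
A minimal sketch of one way to do this, assuming a new environment variable (name hypothetical) appended to the DSN the service builds in postgres_async_db.py:

import os

def build_dsn(host, port, user, password, dbname):
    # Base DSN, roughly as the service assembles it today.
    dsn = "dbname=%s user=%s host=%s port=%s password=%s" % (
        dbname, user, host, port, password)
    # Hypothetical env var; accepts libpq values such as
    # "disable", "require", "verify-ca", "verify-full".
    sslmode = os.environ.get("MF_METADATA_DB_SSL_MODE")
    if sslmode:
        dsn += " sslmode=%s" % sslmode
    return dsn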

Thanks!!

Migration documentation incomplete

I've just gone through the process of updating the metadata service I've been running and performing the DB migration.

I noticed some things are missing from the docs (found out by going through the Python code):

  • There's no mention of the environment variable MF_MIGRATION_ENDPOINTS_ENABLED, which can be used to disable the migration endpoints
  • To perform a migration upgrade, the needed request is a PATCH to /upgrade, not a GET; this is not mentioned in the documentation (see the sketch below)
  • The documentation mentions the availability of python3 migration_tools.py db-status, but this migration_tools.py script is not bundled in the public Docker image from Docker Hub

I managed to solve these issues myself, but I had to spend quite some time inspecting the image and the migration service code. Updating the documentation would help people have an easier time performing their migrations.
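
For anyone hitting the same gap, a minimal sketch of triggering the upgrade endpoint from Python, assuming the migration service is reachable on the default port from this README:

import urllib.request

# The upgrade endpoint expects PATCH, not GET.
req = urllib.request.Request("http://localhost:8082/upgrade", method="PATCH")
with urllib.request.urlopen(req) as resp:
    print(resp.status, resp.read().decode())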

Limit number of retry attempts when connecting to the database

On startup, infinite attempts are made to connect to Postgres, and when an attempt fails there is no error message.

As I understand it, there is an attempt at a retry mechanism that limits the number of Postgres connection attempts to 3, after which the exception is supposed to be raised. Unfortunately, this does not work as currently implemented and the exception is never raised.

There is also a TODO associated with adding a proper error message. All of this can be found in postgres_async_db.py in the _init() method.

Proposed fix:

        # Proposed retry loop for AsyncPostgresDB._init() in postgres_async_db.py.
        # This runs inside a coroutine, so asyncio.sleep is used instead of
        # time.sleep to avoid blocking the event loop (needs `import asyncio`).
        retries = 3
        attempt = 0
        while True:
            try:
                self.pool = await aiopg.create_pool(dsn)
                for table in self.tables:
                    await table._init()
            except Exception as e:
                attempt += 1
                print("Could not connect to database, attempt %d out of %d\n Cause: %s"
                      % (attempt, retries, e))
                if attempt == retries:
                    raise e
                await asyncio.sleep(1)  # brief pause before retrying
                continue
            break

unable to reach metadata_service after using docker compose

Description of steps by @aleade

I cloned the most recent git repo
I did docker pull netflixoss/metaflow_metadata_service
I changed the docker-compose file only to change the image and the volumes path (last line)
version: "3"
services:
metadata:
  image: "netflixoss/metaflow_metadata_service"
  container_name: "metadata_service"
  ports:
    - "${MF_METADATA_PORT:-5004}:${MF_METADATA_PORT:-5004}"
  volumes:
    - ./metadata_service:/code
I did docker-compose up -d
I did docker exec -it metadata_service /bin/bash
I did curl localhost:8080/ping

Harden service against ID length overflows

Right now, various Metaflow IDs (e.g. flow ids, run ids, etc.) are passed straight from the client request to the Postgres SQL query without any validation.

With the advent of the tag mutation CLI, it is now more likely for the service to receive invalid (not necessarily malicious) IDs. E.g. a run id string generated by a local metadata service from an epoch timestamp in milliseconds will overflow on Postgres when treated as a run number.

This issue suggests we harden all IDs referenced in a client request (whether part of the URL params or the body), and have the service respond with appropriate error responses to the user (rather than raw Postgres errors).
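
A minimal validation sketch (not the service's actual code), assuming the run number column is a 64-bit integer; the bound should match the real column type:

PG_BIGINT_MAX = 2**63 - 1  # upper bound of Postgres BIGINT

def validate_run_number(raw: str) -> int:
    """Reject non-numeric or out-of-range run numbers before they reach SQL."""
    try:
        run_number = int(raw)
    except ValueError:
        raise ValueError("run number is not an integer: %r" % raw)
    if not 0 <= run_number <= PG_BIGINT_MAX:
        raise ValueError("run number out of range: %r" % raw)
    return run_number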

Trouble running locally the service, ui - `relation "public.flows_v3" does not exist`

I tried to follow the instructions in the metaflow-ui repo: https://github.com/Netflix/metaflow-ui/blob/master/docs/README.md to deploy Metaflow locally (on my MacBook).

From the docs:

# Set up metaflow service
$ git clone https://github.com/Netflix/metaflow-service.git && cd metaflow-service
# Running docker-compose.development.yml (recommended during development):
$ docker-compose -f docker-compose.development.yml up

# Run docker container using custom default API endpoint
$ docker run -p 3000:3000 -e METAFLOW_SERVICE=http://localhost:8083/ metaflow-ui:latest

But the service fails because of the following:

psycopg2.errors.UndefinedTable: relation "public.flows_v3" does not exist

Here you can find the whole trace from the start:

Creating metaflow-service_db_1 ... done
Creating metaflow-service_ui_backend_1 ... done
Creating metaflow-service_migration_1  ... done
Creating metaflow-service_metadata_1   ... done
Attaching to metaflow-service_db_1, metaflow-service_ui_backend_1, metaflow-service_migration_1, metaflow-service_metadata_1
db_1          | The files belonging to this database system will be owned by user "postgres".
db_1          | This user must also own the server process.
db_1          |
db_1          | The database cluster will be initialized with locale "en_US.utf8".
db_1          | The default database encoding has accordingly been set to "UTF8".
db_1          | The default text search configuration will be set to "english".
db_1          |
db_1          | Data page checksums are disabled.
db_1          |
db_1          | fixing permissions on existing directory /var/lib/postgresql/data ... ok
db_1          | creating subdirectories ... ok
db_1          | selecting default max_connections ... 100
db_1          | selecting default shared_buffers ... 128MB
db_1          | selecting default timezone ... Etc/UTC
db_1          | selecting dynamic shared memory implementation ... posix
db_1          | creating configuration files ... ok
db_1          | running bootstrap script ... ok
db_1          | performing post-bootstrap initialization ... ok
db_1          | syncing data to disk ... ok
db_1          |
db_1          | WARNING: enabling "trust" authentication for local connections
db_1          | You can change this by editing pg_hba.conf or using the option -A, or
db_1          | --auth-local and --auth-host, the next time you run initdb.
db_1          |
db_1          | Success. You can now start the database server using:
db_1          |
db_1          |     pg_ctl -D /var/lib/postgresql/data -l logfile start
db_1          |
db_1          | waiting for server to start....2022-04-07 13:42:03.280 UTC [49] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
db_1          | 2022-04-07 13:42:03.304 UTC [50] LOG:  database system was shut down at 2022-04-07 13:42:02 UTC
db_1          | 2022-04-07 13:42:03.312 UTC [49] LOG:  database system is ready to accept connections
db_1          |  done
db_1          | server started
db_1          |
db_1          | /usr/local/bin/docker-entrypoint.sh: ignoring /docker-entrypoint-initdb.d/*
db_1          |
db_1          | 2022-04-07 13:42:03.602 UTC [49] LOG:  received fast shutdown request
db_1          | waiting for server to shut down....2022-04-07 13:42:03.605 UTC [49] LOG:  aborting any active transactions
db_1          | 2022-04-07 13:42:03.611 UTC [49] LOG:  background worker "logical replication launcher" (PID 56) exited with exit code 1
db_1          | 2022-04-07 13:42:03.611 UTC [51] LOG:  shutting down
db_1          | 2022-04-07 13:42:03.637 UTC [49] LOG:  database system is shut down
db_1          |  done
db_1          | server stopped
db_1          |
db_1          | PostgreSQL init process complete; ready for start up.
db_1          |
db_1          | 2022-04-07 13:42:03.729 UTC [1] LOG:  listening on IPv4 address "0.0.0.0", port 5432
db_1          | 2022-04-07 13:42:03.729 UTC [1] LOG:  listening on IPv6 address "::", port 5432
db_1          | 2022-04-07 13:42:03.741 UTC [1] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
db_1          | 2022-04-07 13:42:03.767 UTC [68] LOG:  database system was shut down at 2022-04-07 13:42:03 UTC
db_1          | 2022-04-07 13:42:03.776 UTC [1] LOG:  database system is ready to accept connections
metadata_1    | INFO:AsyncPostgresDB:global:Connection established.
metadata_1    |    Pool min: 1 max: 10
metadata_1    |
ui_backend_1  | INFO:AsyncPostgresDB:ui:Setting up notify trigger for flows_v3
ui_backend_1  |    Keys: ['flow_id']
db_1          | 2022-04-07 13:42:06.024 UTC [79] ERROR:  relation "public.flows_v3" does not exist
db_1          | 2022-04-07 13:42:06.024 UTC [79] STATEMENT:
db_1          | 	            CREATE TRIGGER notify_ui_flows_v3 AFTER INSERT OR UPDATE OR DELETE ON public.flows_v3
db_1          | 	                FOR EACH ROW EXECUTE PROCEDURE public.notify_ui_flows_v3();
db_1          |
ui_backend_1  | ERROR:AsyncPostgresDB:ui:Exception occurred
ui_backend_1  | Traceback (most recent call last):
ui_backend_1  |   File "/root/services/data/postgres_async_db.py", line 85, in _init
ui_backend_1  |     await table._init(create_triggers=create_triggers)
ui_backend_1  |   File "/root/services/data/postgres_async_db.py", line 162, in _init
ui_backend_1  |     await PostgresUtils.setup_trigger_notify(db=self.db, table_name=self.table_name, keys=self.trigger_keys)
ui_backend_1  |   File "/root/services/data/postgres_async_db.py", line 433, in setup_trigger_notify
ui_backend_1  |     commands=_commands
ui_backend_1  |   File "/root/services/data/postgres_async_db.py", line 365, in create_trigger_if_missing
ui_backend_1  |     await cur.execute(command)
ui_backend_1  |   File "/usr/local/lib/python3.7/site-packages/aiopg/connection.py", line 426, in execute
ui_backend_1  |     await self._conn._poll(waiter, timeout)
ui_backend_1  |   File "/usr/local/lib/python3.7/site-packages/aiopg/connection.py", line 881, in _poll
ui_backend_1  |     await asyncio.wait_for(self._waiter, timeout)
ui_backend_1  |   File "/usr/local/lib/python3.7/asyncio/tasks.py", line 442, in wait_for
ui_backend_1  |     return fut.result()
ui_backend_1  |   File "/usr/local/lib/python3.7/site-packages/aiopg/connection.py", line 788, in _ready
ui_backend_1  |     state = self._conn.poll()
ui_backend_1  | psycopg2.errors.UndefinedTable: relation "public.flows_v3" does not exist
ui_backend_1  |
ui_backend_1  | /usr/local/lib/python3.7/site-packages/aiopg/pool.py:479: ResourceWarning: Unclosed 1 connections in <aiopg.pool.Pool object at 0x7f98f418fad0>
ui_backend_1  |   f"Unclosed {left} connections in {self!r}", ResourceWarning
ui_backend_1  | INFO:AsyncPostgresDB:ui:Setting up notify trigger for flows_v3
ui_backend_1  |    Keys: ['flow_id']
db_1          | 2022-04-07 13:42:07.061 UTC [80] ERROR:  relation "public.flows_v3" does not exist
db_1          | 2022-04-07 13:42:07.061 UTC [80] STATEMENT:
db_1          | 	            CREATE TRIGGER notify_ui_flows_v3 AFTER INSERT OR UPDATE OR DELETE ON public.flows_v3
db_1          | 	                FOR EACH ROW EXECUTE PROCEDURE public.notify_ui_flows_v3();
db_1          |
ui_backend_1  | ERROR:AsyncPostgresDB:ui:Exception occurred
ui_backend_1  | Traceback (most recent call last):
ui_backend_1  |   File "/root/services/data/postgres_async_db.py", line 85, in _init
ui_backend_1  |     await table._init(create_triggers=create_triggers)
ui_backend_1  |   File "/root/services/data/postgres_async_db.py", line 162, in _init
ui_backend_1  |     await PostgresUtils.setup_trigger_notify(db=self.db, table_name=self.table_name, keys=self.trigger_keys)
ui_backend_1  |   File "/root/services/data/postgres_async_db.py", line 433, in setup_trigger_notify
ui_backend_1  |     commands=_commands
ui_backend_1  |   File "/root/services/data/postgres_async_db.py", line 365, in create_trigger_if_missing
ui_backend_1  |     await cur.execute(command)
ui_backend_1  |   File "/usr/local/lib/python3.7/site-packages/aiopg/connection.py", line 426, in execute
ui_backend_1  |     await self._conn._poll(waiter, timeout)
ui_backend_1  |   File "/usr/local/lib/python3.7/site-packages/aiopg/connection.py", line 881, in _poll
ui_backend_1  |     await asyncio.wait_for(self._waiter, timeout)
ui_backend_1  |   File "/usr/local/lib/python3.7/asyncio/tasks.py", line 442, in wait_for
ui_backend_1  |     return fut.result()
ui_backend_1  |   File "/usr/local/lib/python3.7/site-packages/aiopg/connection.py", line 788, in _ready
ui_backend_1  |     state = self._conn.poll()
ui_backend_1  | psycopg2.errors.UndefinedTable: relation "public.flows_v3" does not exist
ui_backend_1  |
ui_backend_1  | /usr/local/lib/python3.7/site-packages/aiopg/pool.py:479: ResourceWarning: Unclosed 1 connections in <aiopg.pool.Pool object at 0x7f98f418fe50>
ui_backend_1  |   f"Unclosed {left} connections in {self!r}", ResourceWarning
ui_backend_1  | INFO:AsyncPostgresDB:ui:Setting up notify trigger for flows_v3
ui_backend_1  |    Keys: ['flow_id']
db_1          | 2022-04-07 13:42:08.083 UTC [81] ERROR:  relation "public.flows_v3" does not exist
db_1          | 2022-04-07 13:42:08.083 UTC [81] STATEMENT:
db_1          | 	            CREATE TRIGGER notify_ui_flows_v3 AFTER INSERT OR UPDATE OR DELETE ON public.flows_v3
db_1          | 	                FOR EACH ROW EXECUTE PROCEDURE public.notify_ui_flows_v3();
db_1          |
ui_backend_1  | ERROR:AsyncPostgresDB:ui:Exception occurred
ui_backend_1  | Traceback (most recent call last):
ui_backend_1  |   File "/root/services/data/postgres_async_db.py", line 85, in _init
ui_backend_1  |     await table._init(create_triggers=create_triggers)
ui_backend_1  |   File "/root/services/data/postgres_async_db.py", line 162, in _init
ui_backend_1  |     await PostgresUtils.setup_trigger_notify(db=self.db, table_name=self.table_name, keys=self.trigger_keys)
ui_backend_1  |   File "/root/services/data/postgres_async_db.py", line 433, in setup_trigger_notify
ui_backend_1  |     commands=_commands
ui_backend_1  |   File "/root/services/data/postgres_async_db.py", line 365, in create_trigger_if_missing
ui_backend_1  |     await cur.execute(command)
ui_backend_1  |   File "/usr/local/lib/python3.7/site-packages/aiopg/connection.py", line 426, in execute
ui_backend_1  |     await self._conn._poll(waiter, timeout)
ui_backend_1  |   File "/usr/local/lib/python3.7/site-packages/aiopg/connection.py", line 881, in _poll
ui_backend_1  |     await asyncio.wait_for(self._waiter, timeout)
ui_backend_1  |   File "/usr/local/lib/python3.7/asyncio/tasks.py", line 442, in wait_for
ui_backend_1  |     return fut.result()
ui_backend_1  |   File "/usr/local/lib/python3.7/site-packages/aiopg/connection.py", line 788, in _ready
ui_backend_1  |     state = self._conn.poll()
ui_backend_1  | psycopg2.errors.UndefinedTable: relation "public.flows_v3" does not exist
ui_backend_1  |
ui_backend_1  | Traceback (most recent call last):
ui_backend_1  |   File "/usr/local/bin/ui_backend_service", line 33, in <module>
ui_backend_1  |     sys.exit(load_entry_point('metadata-service', 'console_scripts', 'ui_backend_service')())
ui_backend_1  |   File "/root/services/ui_backend_service/ui_server.py", line 129, in main
ui_backend_1  |     the_app = app(loop, DBConfiguration())
ui_backend_1  |   File "/root/services/ui_backend_service/ui_server.py", line 54, in app
ui_backend_1  |     loop.run_until_complete(async_db._init(db_conf=db_conf, create_triggers=DB_TRIGGER_CREATE))
ui_backend_1  |   File "/usr/local/lib/python3.7/asyncio/base_events.py", line 587, in run_until_complete
ui_backend_1  |     return future.result()
ui_backend_1  |   File "/root/services/data/postgres_async_db.py", line 97, in _init
ui_backend_1  |     raise e
ui_backend_1  |   File "/root/services/data/postgres_async_db.py", line 85, in _init
ui_backend_1  |     await table._init(create_triggers=create_triggers)
ui_backend_1  |   File "/root/services/data/postgres_async_db.py", line 162, in _init
ui_backend_1  |     await PostgresUtils.setup_trigger_notify(db=self.db, table_name=self.table_name, keys=self.trigger_keys)
ui_backend_1  |   File "/root/services/data/postgres_async_db.py", line 433, in setup_trigger_notify
ui_backend_1  |     commands=_commands
ui_backend_1  |   File "/root/services/data/postgres_async_db.py", line 365, in create_trigger_if_missing
ui_backend_1  |     await cur.execute(command)
ui_backend_1  |   File "/usr/local/lib/python3.7/site-packages/aiopg/connection.py", line 426, in execute
ui_backend_1  |     await self._conn._poll(waiter, timeout)
ui_backend_1  |   File "/usr/local/lib/python3.7/site-packages/aiopg/connection.py", line 881, in _poll
ui_backend_1  |     await asyncio.wait_for(self._waiter, timeout)
ui_backend_1  |   File "/usr/local/lib/python3.7/asyncio/tasks.py", line 442, in wait_for
ui_backend_1  |     return fut.result()
ui_backend_1  |   File "/usr/local/lib/python3.7/site-packages/aiopg/connection.py", line 788, in _ready
ui_backend_1  |     state = self._conn.poll()
ui_backend_1  | psycopg2.errors.UndefinedTable: relation "public.flows_v3" does not exist
ui_backend_1  |
metaflow-service_ui_backend_1 exited with code 1

Metaflow UI log ERROR

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/root/services/ui_backend_service/data/cache/client/cache_server.py", line 307, in <module>
    cli(auto_envvar_prefix='MFCACHE')
  File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.11/site-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/root/services/ui_backend_service/data/cache/client/cache_server.py", line 301, in cli
    Scheduler(store, max_actions).loop()
  File "/root/services/ui_backend_service/data/cache/client/cache_server.py", line 196, in __init__
    self.pool = multiprocessing.Pool(
  File "/usr/local/lib/python3.11/multiprocessing/context.py", line 119, in Pool
    return Pool(processes, initializer, initargs, maxtasksperchild,
  File "/usr/local/lib/python3.11/multiprocessing/pool.py", line 215, in __init__
    self._repopulate_pool()
  File "/usr/local/lib/python3.11/multiprocessing/pool.py", line 306, in _repopulate_pool
    return self._repopulate_pool_static(self._ctx, self.Process,
  File "/usr/local/lib/python3.11/multiprocessing/pool.py", line 329, in _repopulate_pool_static
    w.start()
  File "/usr/local/lib/python3.11/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/usr/local/lib/python3.11/multiprocessing/context.py", line 281, in _Popen
    return Popen(process_obj)
  File "/usr/local/lib/python3.11/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/usr/local/lib/python3.11/multiprocessing/popen_fork.py", line 71, in _launch
    code = process_obj._bootstrap(parent_sentinel=child_r)
  File "/usr/local/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/local/lib/python3.11/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.11/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/root/services/ui_backend_service/data/cache/client/cache_worker.py", line 29, in execute_action
    execute(tempdir, action_cls, request)
  File "/root/services/ui_backend_service/data/cache/client/cache_worker.py", line 51, in execute
    res = action_cls.execute(
  File "/root/services/ui_backend_service/data/cache/get_log_file_action.py", line 133, in execute
    with streamed_errors(stream_output):
  File "/usr/local/lib/python3.11/contextlib.py", line 155, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/root/services/ui_backend_service/data/cache/utils.py", line 130, in streamed_errors
    get_traceback_str()
  File "/root/services/ui_backend_service/data/cache/utils.py", line 124, in streamed_errors
    yield
  File "/root/services/ui_backend_service/data/cache/get_log_file_action.py", line 136, in execute
    current_hash = log_provider.get_log_hash(task, logtype)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/services/ui_backend_service/data/cache/get_log_file_action.py", line 270, in get_log_hash
    return get_log_size(task, logtype)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/services/ui_backend_service/data/cache/get_log_file_action.py", line 177, in get_log_size
    return task.stderr_size if logtype == STDERR else task.stdout_size
                                                      ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/metaflow/client/core.py", line 1317, in stdout_size
    return self._get_logsize("stdout")
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/metaflow/client/core.py", line 1433, in _get_logsize
    meta_dict = self.metadata_dict
                ^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/metaflow/client/core.py", line 1135, in metadata_dict
    m.name: m.value for m in sorted(self.metadata, key=lambda m: m.created_at)
    ^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/metaflow/client/core.py", line 1059, in metadata
    all_metadata = self._metaflow.metadata.get_object(
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/metaflow/metadata/metadata.py", line 425, in get_object
    pre_filter = cls._get_object_internal(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/metaflow/plugins/metadata/service.py", line 280, in _get_object_internal
    v, _ = cls._request(None, url, "GET")
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/metaflow/plugins/metadata/service.py", line 468, in _request
    raise ServiceException(

metaflow.plugins.metadata.service.ServiceException: Metadata request (/flows/ParquetCheck/runs/argo-parquetcheck.user.zhangxinyu19.parquetcheck-g8vjm/steps/start/tasks/t-2aa87376/metadata) failed (code 500): "{"err_msg": {"type": "timeout error"}}"

Should ignore errors when loading run parameters

When displaying run details (parameters), some parameters may not be displayable -- this should be ignored as opposed to throwing an error:

Here is a sample error:

Traceback (most recent call last):
  File "/apps/python3.10/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/apps/python3.10/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/apps/mliui/services/ui_backend_service/data/cache/client/cache_server.py", line 307, in <module>
    cli(auto_envvar_prefix='MFCACHE')
  File "/apps/python3.10/lib/python3.10/site-packages/click/core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "/apps/python3.10/lib/python3.10/site-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "/apps/python3.10/lib/python3.10/site-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/apps/python3.10/lib/python3.10/site-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/apps/mliui/services/ui_backend_service/data/cache/client/cache_server.py", line 301, in cli
    Scheduler(store, max_actions).loop()
  File "/apps/mliui/services/ui_backend_service/data/cache/client/cache_server.py", line 196, in __init__
    self.pool = multiprocessing.Pool(
  File "/apps/python3.10/lib/python3.10/multiprocessing/context.py", line 119, in Pool
    return Pool(processes, initializer, initargs, maxtasksperchild,
  File "/apps/python3.10/lib/python3.10/multiprocessing/pool.py", line 215, in __init__
    self._repopulate_pool()
  File "/apps/python3.10/lib/python3.10/multiprocessing/pool.py", line 306, in _repopulate_pool
    return self._repopulate_pool_static(self._ctx, self.Process,
  File "/apps/python3.10/lib/python3.10/multiprocessing/pool.py", line 329, in _repopulate_pool_static
    w.start()
  File "/apps/python3.10/lib/python3.10/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/apps/python3.10/lib/python3.10/multiprocessing/context.py", line 281, in _Popen
    return Popen(process_obj)
  File "/apps/python3.10/lib/python3.10/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/apps/python3.10/lib/python3.10/multiprocessing/popen_fork.py", line 71, in _launch
    code = process_obj._bootstrap(parent_sentinel=child_r)
  File "/apps/python3.10/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/apps/python3.10/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/apps/python3.10/lib/python3.10/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/apps/mliui/services/ui_backend_service/data/cache/client/cache_worker.py", line 29, in execute_action
    execute(tempdir, action_cls, request)
  File "/apps/mliui/services/ui_backend_service/data/cache/client/cache_worker.py", line 51, in execute
    res = action_cls.execute(
  File "/apps/mliui/services/ui_backend_service/data/cache/get_data_action.py", line 122, in execute
    results[target_key] = cacheable_exception_value(ex)
  File "/apps/mliui/services/ui_backend_service/data/cache/utils.py", line 104, in cacheable_exception_value
    return json.dumps([False, ex.__class__.__name__, str(ex), get_traceback_str()])
  File "/apps/mliui/services/ui_backend_service/data/cache/get_data_action.py", line 120, in execute
    results[target_key] = json.dumps(result)
  File "/apps/python3.10/lib/python3.10/json/__init__.py", line 231, in dumps
    return _default_encoder.encode(obj)
  File "/apps/python3.10/lib/python3.10/json/encoder.py", line 199, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/apps/python3.10/lib/python3.10/json/encoder.py", line 257, in iterencode
    return _iterencode(o, 0)
  File "/apps/python3.10/lib/python3.10/json/encoder.py", line 179, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '

TypeError: Object of type RootLogger is not JSON serializable
Status: 500
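
A minimal sketch of the suggested behaviour, assuming a dict of parameter values like the one get_data_action.py serializes (helper name hypothetical): swallow per-parameter serialization failures instead of failing the whole request:

import json

def safe_json_dumps(params: dict) -> str:
    """Serialize what we can; replace non-JSON-serializable parameter
    values with a placeholder instead of raising."""
    cleaned = {}
    for name, value in params.items():
        try:
            json.dumps(value)  # probe serializability
            cleaned[name] = value
        except TypeError:
            cleaned[name] = "<parameter value not displayable>"
    return json.dumps(cleaned)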

support other databases

Looks like the service is tied to Postgres. It could probably switch to SQLAlchemy or another package to support a good list of RDBMSs.

My initial aim was to use Aurora Serverless, which doesn't have Postgres compatibility yet.

Missing dependencies for gs datastore

If you configure METAFLOW_DEFAULT_DATASTORE: "gs" (and of course set up a GCS bucket as datastore root) and then run the metaflow service, metaflow fails on some google package imports.

AFAIK this is impacting DAG and log viewing in the UI, for anyone using GCP.

Example error (copied from UI)

Traceback (most recent call last):
  File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/root/services/ui_backend_service/data/cache/client/cache_server.py", line 307, in <module>
    cli(auto_envvar_prefix='MFCACHE')
  File "/opt/latest/lib/python3.7/site-packages/click/core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "/opt/latest/lib/python3.7/site-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "/opt/latest/lib/python3.7/site-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/latest/lib/python3.7/site-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/root/services/ui_backend_service/data/cache/client/cache_server.py", line 301, in cli
    Scheduler(store, max_actions).loop()
  File "/root/services/ui_backend_service/data/cache/client/cache_server.py", line 199, in __init__
    maxtasksperchild=512,  # Recycle each worker once 512 tasks have been completed
  File "/usr/lib/python3.7/multiprocessing/context.py", line 119, in Pool
    context=self.get_context())
  File "/usr/lib/python3.7/multiprocessing/pool.py", line 176, in __init__
    self._repopulate_pool()
  File "/usr/lib/python3.7/multiprocessing/pool.py", line 241, in _repopulate_pool
    w.start()
  File "/usr/lib/python3.7/multiprocessing/process.py", line 112, in start
    self._popen = self._Popen(self)
  File "/usr/lib/python3.7/multiprocessing/context.py", line 277, in _Popen
    return Popen(process_obj)
  File "/usr/lib/python3.7/multiprocessing/popen_fork.py", line 20, in __init__
    self._launch(process_obj)
  File "/usr/lib/python3.7/multiprocessing/popen_fork.py", line 74, in _launch
    code = process_obj._bootstrap()
  File "/usr/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/usr/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/root/services/ui_backend_service/data/cache/client/cache_worker.py", line 29, in execute_action
    execute(tempdir, action_cls, request)
  File "/root/services/ui_backend_service/data/cache/client/cache_worker.py", line 56, in execute
    invalidate_cache=req.get('invalidate_cache', False))
  File "/root/services/ui_backend_service/data/cache/get_log_file_action.py", line 143, in execute
    results = {**existing_keys}
  File "/usr/lib/python3.7/contextlib.py", line 130, in __exit__
    self.gen.throw(type, value, traceback)
  File "/root/services/ui_backend_service/data/cache/utils.py", line 130, in streamed_errors
    get_traceback_str()
  File "/root/services/ui_backend_service/data/cache/utils.py", line 124, in streamed_errors
    yield
  File "/root/services/ui_backend_service/data/cache/get_log_file_action.py", line 136, in execute
    current_hash = log_provider.get_log_hash(task, logtype)
  File "/root/services/ui_backend_service/data/cache/get_log_file_action.py", line 270, in get_log_hash
    return get_log_size(task, logtype)
  File "/root/services/ui_backend_service/data/cache/get_log_file_action.py", line 177, in get_log_size
    return task.stderr_size if logtype == STDERR else task.stdout_size
  File "/opt/latest/lib/python3.7/site-packages/metaflow/client/core.py", line 1317, in stdout_size
    return self._get_logsize("stdout")
  File "/opt/latest/lib/python3.7/site-packages/metaflow/client/core.py", line 1438, in _get_logsize
    return self._log_size(stream, meta_dict)
  File "/opt/latest/lib/python3.7/site-packages/metaflow/client/core.py", line 1525, in _log_size
    ds_type, ds_root, stream, attempt, *self.path_components
  File "/opt/latest/lib/python3.7/site-packages/metaflow/client/filecache.py", line 148, in get_log_size
    return task_ds.get_log_size(LOG_SOURCES, logtype)
  File "/opt/latest/lib/python3.7/site-packages/metaflow/datastore/task_datastore.py", line 45, in method
    return f(self, *args, **kwargs)
  File "/opt/latest/lib/python3.7/site-packages/metaflow/datastore/task_datastore.py", line 413, in get_log_size
    sizes = [self._storage_impl.size_file(p) for p in paths]
  File "/opt/latest/lib/python3.7/site-packages/metaflow/datastore/task_datastore.py", line 413, in <listcomp>
    sizes = [self._storage_impl.size_file(p) for p in paths]
  File "/opt/latest/lib/python3.7/site-packages/metaflow/plugins/datastores/gs_storage.py", line 195, in size_file
    import google.api_core.exceptions

ModuleNotFoundError: No module named 'google'

It seems to me that the UI backend service is missing google-cloud-storage from its requirements.txt. I applied this change and built a version of the metaflow service image myself; this resolved the issue for me.

Schema "public" hardcode in postgres_async_db.py

Although the function setup_trigger_notify allows you to pass the PostgreSQL schema as a parameter, the schema parameter is not passed at the line where it is invoked (line 164 of the same file). I would suggest creating an env variable for it and passing it through; if the variable is empty, the public schema in Postgres would be used (see the sketch below).

I installed the Metaflow service on a custom schema, but when I tried to start the Metaflow UI it was failing because the tables were not on the public schema. I implemented the proposed solution and it did work.
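
A minimal sketch of the suggested change, assuming a new env variable (name hypothetical) and the existing setup_trigger_notify helper in services/data/postgres_async_db.py:

import os

from services.data.postgres_async_db import PostgresUtils  # existing helper

# Empty or unset falls back to Postgres's default "public" schema.
DB_SCHEMA = os.environ.get("MF_METADATA_DB_SCHEMA") or "public"

async def init_trigger(table):
    # Forward the schema instead of relying on the hardcoded default.
    await PostgresUtils.setup_trigger_notify(
        db=table.db,
        table_name=table.table_name,
        keys=table.trigger_keys,
        schema=DB_SCHEMA,
    )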

Thanks a lot for your support in advance and great job!!

Metadata DB Password should be quoted

When launching a Metaflow stack using the recommended CloudFormation template, an RDS database password is created using AWS Secrets Manager. This is currently configured to exclude the characters "@/\.

Unfortunately, this still allows potentially problematic characters like # and ' to be included in the password.

If ' is included as the first character in the password, the Metadata Service fails to start as follows:

/migration_service/migration_server.py:17: DeprecationWarning: loop argument is deprecated
  app = web.Application(loop=loop)
printing connection exception: invalid dsn: unterminated quoted string in connection info string

printing connection exception: invalid dsn: unterminated quoted string in connection info string

printing connection exception: invalid dsn: unterminated quoted string in connection info string

/migration_service/migration_server.py:28: DeprecationWarning: Application.make_handler(...) is deprecated, use AppRunner API instead
  handler = the_app.make_handler()
serving on ('0.0.0.0', 8082)
Error handling request
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/aiohttp/web_protocol.py", line 418, in start
    resp = await task
  File "/usr/local/lib/python3.7/dist-packages/aiohttp/web_app.py", line 458, in _handle
    resp = await handler(request)
  File "/migration_service/api/admin.py", line 53, in version
    version = await ApiUtils.get_latest_compatible_version()
  File "/migration_service/api/utils.py", line 56, in get_latest_compatible_version
    is_present = await PostgresUtils.is_present("flows_v3")
  File "/migration_service/data/postgres_async_db.py", line 10, in is_present
    with (await AsyncPostgresDB.get_instance().pool.cursor()) as cur:
AttributeError: 'NoneType' object has no attribute 'cursor'
/bin/sh: 1: metadata_service: not found

The password should be escaped when setting up the DSN, otherwise the service can fail to start, resulting in a Metaflow stack that does not work.
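
A minimal sketch of escaping the password according to libpq connection-string rules (backslash-escape \ and ', then wrap the value in single quotes) before interpolating it into the DSN; the example password is hypothetical:

def quote_dsn_value(value: str) -> str:
    """Quote a value for use in a libpq key=value connection string."""
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return "'%s'" % escaped

dsn = "dbname=%s user=%s host=%s port=%d password=%s" % (
    "metaflow", "postgres", "localhost", 5432,
    quote_dsn_value("'tricky#pass"),  # leading quote no longer breaks the DSN
)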

Integrating with metaflow runs

How do I configure/integrate Metaflow with this metadata service, either locally or when deployed in AWS?

Based on metaflow configure's usage, I can't see anything related to configuring just the metadata service:

[Screenshot: metaflow configure usage, 2020-01-11]
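
For reference, a minimal sketch of pointing a Metaflow client at the service; the same keys can live in ~/.metaflowconfig/config.json, and the URL here is a placeholder:

import os

# Use the service-backed metadata provider instead of local metadata.
os.environ["METAFLOW_DEFAULT_METADATA"] = "service"
# Endpoint of this metadata service (placeholder URL).
os.environ["METAFLOW_SERVICE_URL"] = "http://localhost:8080"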

Introduce tests for the migration service.

Suggested Improvement
Introduce tests for the migration service that cover basic acceptance behaviour:

  • Migrations can be applied, and rolled back successfully (no broken migrations exist)
  • Migration service can be introduced after-the-fact to an existing metadata service deployment (No conflicts with migrations with CREATE TABLE that the metadata service has applied)

Motivation
Currently introducing any migrations to the service relies heavily on manual testing and is prone to introducing unexpected issues. We also have the existing issue that the migration service cannot be applied successfully against an existing schema that was created by the metadata service. The migrations should preferably be somewhat idempotent (opting for CREATE IF NOT EXISTS) so the service can be an easy opt-in choice later down the line.

Improve heartbeat failure messaging

  1. Expose document explaining how heartbeats are used to mark runs and tasks as failed. This document can be at metaflow.org and in the READMEs of this repository. This will be in addition to https://github.com/Netflix/metaflow-service/blob/master/services/ui_backend_service/docs/environment.md#heartbeat-intervals
  2. When a task or run fails because of a missing heartbeat, show that fact in MFGUI.
  3. Have a default minimum heartbeat and a maximum heartbeat time. If the task/run misses the minimum heartbeat, show it as "pending" and only show it as "failed" when it misses the maximum heartbeat time. This functionality will have to consider resumes and multiple attempts.

The reason for this issue is that some runs/tasks are being marked as "failed" when they have not started yet, and some runs/tasks are still marked as "running" when they have failed but not reached the heartbeat threshold yet.
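
A hypothetical sketch of the two-threshold idea in point 3; the names and thresholds are illustrative only, not the service's actual implementation:

import time

MIN_HEARTBEAT_S = 60    # silence below this: still shown as "running"
MAX_HEARTBEAT_S = 600   # silence beyond this: shown as "failed"

def heartbeat_status(last_heartbeat_ts, now=None):
    """Map heartbeat silence onto running/pending/failed."""
    now = time.time() if now is None else now
    silence = now - last_heartbeat_ts
    if silence < MIN_HEARTBEAT_S:
        return "running"
    if silence < MAX_HEARTBEAT_S:
        return "pending"  # missed the minimum but not yet the maximum
    return "failed"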

artifact_v3 table is missing an index for retrieving SFN artifacts

Attempting to retrieve artifacts for SFN executions through the artifacts endpoint will run a query with run_id and task_name rather than run_number and task_id. The existing table is only indexed on the following PK: (flow_id, run_number, step_name, task_id, attempt_id, name), so these queries cannot use an index.
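
A hedged sketch of the kind of covering index this suggests; it assumes artifact_v3 carries run_id and task_name columns for SFN executions (verify against the actual schema):

# DDL kept as a Python constant, in the style the service uses for SQL.
CREATE_SFN_ARTIFACT_INDEX = """
CREATE INDEX IF NOT EXISTS artifact_v3_idx_sfn
ON artifact_v3 (flow_id, run_id, step_name, task_name, attempt_id, name)
"""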

Running metaflow-service and UI locally sees artifacts and DAG not displayable

After setting up metaflow-service and the UI locally and running an example flow, I can see the flow information. However, all artifacts are NOT displayed; I see 'NoneType' object has no attribute 'get'. But locally in a notebook, I am able to get the artifacts, e.g. Task('Flow/8/start/23', attempt=0)['df'].data.

Having a look at the Chrome inspector, I can see a 500 internal error related to CORS:

Request URL: http://localhost:8083/flows/RevenuePredictionFlow/runs/8/dag
Request Method: GET
Status Code: 500 Internal Server Error
Remote Address: [::1]:8083
Referrer Policy: strict-origin-when-cross-origin

But in the code, I can see there is already a permissive setting for it:

headers["Access-Control-Allow-Origin"] = "*"

I think this is why the artifacts are not displayable. Can anyone help with that and point out where I might have gone wrong?

Thank you!

From v2.2.4+ SSL connection break

From v2.2.4 onwards, SSL breaks the connection: on AWS RDS it is required to change the parameter group setting rds.force_ssl from 0 to 1, apply it manually to the database, and reboot the database; otherwise the system won't come up and run healthy again.

Can't start with v2.4.5 in a fresh database

When the version netflixoss/metaflow_metadata_service:v2.4.5 is used, it can't migrate a fresh database. I need to go back to 2.0.6 (an image version that is over two years old) and bump through each version. It would be nice if it were always possible to do a fresh install from an image.

500 encountered from metaflow service

@russellbrooks reported the following issue

Metaflow service error:
Metadata request (/flows/TuningXGB/runs/572/steps/start/tasks/1821/metadata) failed (code 500): {"message":"Internal server error"}

For context, this was encountered as part of a hyperparameter tuning framework (each flow run is a model training evaluation) after ~6hrs with 125 runs successfully completed. Everything is executed on Batch with 12 variants being trained in parallel, then feeding those results back to a bayesian optimizer to generate the next.

The cloudwatch logs from Batch indicate that the step completed successfully, and the Metaflow service error was encountered on a separate on-demand EC2 instance that's running the optimizer and executes the flows using asyncio.create_subprocess_shell. Looking at API Gateway, the request rates seemed reasonable and its configured without throttling. RDS showed plenty of CPU credits and barely seemed phased throughout the workload. Each run was executed with --retry but this error seems to have short-circuited that logic and resulted in a hard-stop.

Move on from Python 3.7 before security updates are discontinued.

The main way the services are released is through the published Docker image. This, along with all of the service codebase, currently runs and is tested on Python 3.7, which is scheduled for EOL in the coming summer (https://devguide.python.org/versions/).

As we do not have any backward-compatibility requirements for the execution of the codebase, due to the way it is released, we could consider moving to a more recent version of Python, along with updating the base image for the Docker build.

Simplify UI backend caching and improve scalability

Suggested improvement
The caching solution used by the UI backend could be changed to one that performs simple function caching and uses a cache store that is shareable across multiple instances (for example Redis). There are some promising async caching solutions like https://github.com/aio-libs/aiocache which offer these features.

Motivation
The current caching layer implementation for the ui_backend_service is instance-specific, which adds unnecessary overhead to horizontal scaling. There is also a lot of boilerplate associated with computing values, caching them, and accessing the cached values. Function caching would also make multi-layer caches easier to implement compared to the current CacheAction approach.
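
A minimal sketch of the function-caching style this suggests, using aiocache's decorator with a Redis backend; the function, endpoint, and TTL are illustrative, and the exact decorator arguments should be checked against the aiocache version in use:

from aiocache import cached, Cache

# A Redis-backed cache is shareable across ui_backend_service instances.
@cached(ttl=60, cache=Cache.REDIS, endpoint="127.0.0.1", port=6379)
async def get_run_dag(flow_id, run_number):
    # Hypothetical expensive lookup that benefits from shared caching.
    ...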

[Tag mutation project] Service-side tag validations

Metaflow client API will enforce these rules:

  • Max size of any single tag
  • Max size of a tag set (e.g. when first initializing a tag set at start of a run, or when doing tag mutation)
  • Tag values must be UTF-8 compatible

We should apply the same rules on the server side for defense in depth (see the sketch below).
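
A minimal sketch of server-side checks mirroring these client rules; the limits are hypothetical placeholders, not the values the client enforces:

MAX_TAG_BYTES = 500      # hypothetical per-tag limit
MAX_TAGS_PER_SET = 50    # hypothetical tag-set limit

def validate_tags(tags):
    """Raise ValueError on any tag set the client would also reject."""
    if len(tags) > MAX_TAGS_PER_SET:
        raise ValueError("too many tags: %d" % len(tags))
    for tag in tags:
        if not isinstance(tag, str):
            raise ValueError("tag is not a string: %r" % (tag,))
        try:
            encoded = tag.encode("utf-8")  # rejects lone surrogates etc.
        except UnicodeEncodeError:
            raise ValueError("tag is not UTF-8 encodable: %r" % (tag,))
        if len(encoded) > MAX_TAG_BYTES:
            raise ValueError("tag too long: %r" % (tag,))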

Building with `Dockerfile.ui_service` fails due to old version of pip

The following error is received when attempting to build the Dockerfile.ui_service file using docker-compose.development.yml

ERROR: Could not build wheels for pygit2 which use PEP 517 and cannot be installed directly
WARNING: You are using pip version 19.1.1, however version 22.0.4 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
The command '/bin/sh -c pip install --editable .' returned a non-zero code: 1
ERROR: Service 'ui_backend' failed to build

The version of pip needs to be upgraded in the Dockerfile to fix this issue (e.g. a RUN pip install --upgrade pip step before the RUN pip install --editable . step).

Metaflow Deployed AWS Batch Job Causes Security Hub To Raise Risk Items

Hi, we use the metaflow --with batch option to create Metaflow batch jobs on AWS. Recently, after new security measures were introduced and Security Hub was enabled, it is raising high-risk items. Below are the items:

  • ECS.1 Amazon ECS task definitions should have secure networking modes and user definitions.
  • ECS.4 ECS containers should run as non-privileged.
  • ECS.5 ECS containers should be limited to read-only access to root filesystems.

Details for these items can be found in the AWS docs.

Are there plans to update metaflow to adhere to these aws security checks?

Bad error handling for big migrations bumps

When the netflixoss/metaflow_metadata_service image changes version and the system breaks, you get a database error, but the real problem is that the migration jump is too big. It would be a huge help to handle this error with output saying that the migration jump is too big and which migration version you are on today, so it is more transparent.

[bug] unable to set up ssl connection to AWS rds

I was trying to use the docker image to set up an SSL connection to RDS, and I hit some issues:

  1. I realised that in run_goose.py there are environment variables for setting up SSL. However, these environment variables are not set in the docker-compose file.
  2. With the environment variable MF_METADATA_DB_SSL_ROOT_CERT in run_goose.py, we need to pass in the local file path of the certificate. Would it be possible to set up a folder in this repo to store some common CA certificates for databases, so that we can pass in the certificates more easily?

Running metaflow-service and metaflow-ui locally leads to hanging stucked state

I am trying to follow documentation here: https://github.com/Netflix/metaflow-ui/blob/master/docs/README.md#running-metaflow-metaflow-service-and-metaflow-ui-locally

However, when running docker-compose -f docker-compose.development.yml up, the script gets stuck and I see the logs below:

WARNING: The CUSTOM_QUICKLINKS variable is not set. Defaulting to a blank string.
WARNING: The NOTIFICATIONS variable is not set. Defaulting to a blank string.
WARNING: The PLUGINS variable is not set. Defaulting to a blank string.
Starting metaflow-service_db_1 ... done
Recreating metaflow-service_migration_1 ... done
Recreating metaflow-service_metadata_1   ... done
Recreating metaflow-service_ui_backend_1 ... done
Attaching to metaflow-service_db_1, metaflow-service_migration_1, metaflow-service_metadata_1, metaflow-service_ui_backend_1
db_1          | 
db_1          | PostgreSQL Database directory appears to contain a database; Skipping initialization
db_1          | 
db_1          | 2022-10-10 15:26:45.003 UTC [1] LOG:  listening on IPv4 address "0.0.0.0", port 5432
db_1          | 2022-10-10 15:26:45.004 UTC [1] LOG:  listening on IPv6 address "::", port 5432
migration_1   | 2022/10/10 15:26:45 goose: no migrations to run. current version: 20220503175500
db_1          | 2022-10-10 15:26:45.007 UTC [1] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
db_1          | 2022-10-10 15:26:45.018 UTC [27] LOG:  database system was shut down at 2022-10-10 15:24:57 UTC
db_1          | 2022-10-10 15:26:45.025 UTC [1] LOG:  database system is ready to accept connections
metaflow-service_migration_1 exited with code 0
metadata_1    | INFO:AsyncPostgresDB:global:Connection established.
metadata_1    |    Pool min: 1 max: 10
metadata_1    | 
ui_backend_1  | INFO:AsyncPostgresDB:ui:Setting up notify trigger for flows_v3
ui_backend_1  |    Keys: ['flow_id']
ui_backend_1  | INFO:AsyncPostgresDB:ui:Setting up notify trigger for runs_v3
ui_backend_1  |    Keys: ['flow_id', 'run_number', 'last_heartbeat_ts']
ui_backend_1  | INFO:AsyncPostgresDB:ui:Setting up notify trigger for steps_v3
ui_backend_1  |    Keys: ['flow_id', 'run_number', 'step_name']
ui_backend_1  | INFO:AsyncPostgresDB:ui:Setting up notify trigger for tasks_v3
ui_backend_1  |    Keys: ['flow_id', 'run_number', 'step_name', 'task_id']
ui_backend_1  | INFO:AsyncPostgresDB:ui:Setting up notify trigger for artifact_v3
ui_backend_1  |    Keys: ['flow_id', 'run_number', 'step_name', 'task_id', 'attempt_id', 'name']
ui_backend_1  | INFO:AsyncPostgresDB:ui:Setting up notify trigger for metadata_v3
ui_backend_1  |    Keys: ['flow_id', 'run_number', 'step_name', 'task_id', 'field_name', 'value']
ui_backend_1  | INFO:AsyncPostgresDB:ui:Connection established.
ui_backend_1  |    Pool min: 1 max: 10
ui_backend_1  | 
ui_backend_1  | INFO:AsyncPostgresDB:ui:cache:Connection established.
ui_backend_1  |    Pool min: 1 max: 10
ui_backend_1  | 
ui_backend_1  | INFO:AsyncPostgresDB:ui:notify:Connection established.
ui_backend_1  |    Pool min: 1 max: 10
ui_backend_1  | 
ui_backend_1  | INFO:ListenNotify:Connection acquired
ui_backend_1  | INFO:AsyncPostgresDB:ui:heartbeat:Connection established.
ui_backend_1  |    Pool min: 1 max: 10
ui_backend_1  | 
ui_backend_1  | INFO:AsyncPostgresDB:ui:websocket:Connection established.
ui_backend_1  |    Pool min: 1 max: 10
ui_backend_1  | 
ui_backend_1  | INFO:AutoCompleteApi:0 cached tags in memory consuming 0 Mb
ui_backend_1  | INFO:AsyncPostgresDB:global:Connection established.
ui_backend_1  |    Pool min: 1 max: 10
ui_backend_1  | 
ui_backend_1  | INFO:root:Metadata service available at http://0.0.0.0:8083/metadata
ui_backend_1  | INFO:Plugin:Init plugins
ui_backend_1  | INFO:Plugin:Plugins ready: []
ui_backend_1  | INFO:CacheAsyncClient:cache_data/artifact_search:Message: CACHE [2022-10-10T15:26:47.720159] IO ERROR: (None) Cache initialized with 0 permanents objects, 0 disposable objects, totaling 0 bytes.
ui_backend_1  | INFO:CacheAsyncClient:cache_data/artifact_search:{'op': 'worker_terminate', 'keys': 1, 'stream_key': None, 'idempotency_token': 'a2f7330460012044da92a7cbd06ced73685afd96'}
ui_backend_1  | INFO:CacheAsyncClient:cache_data/artifact_search:Pending stream keys: []
ui_backend_1  | INFO:CacheAsyncClient:cache_data/dag:Message: CACHE [2022-10-10T15:26:48.470189] IO ERROR: (None) Cache initialized with 0 permanents objects, 0 disposable objects, totaling 0 bytes.
ui_backend_1  | INFO:CacheAsyncClient:cache_data/dag:{'op': 'worker_terminate', 'keys': 1, 'stream_key': None, 'idempotency_token': '9a4620847aea07945ef8b9a0ff9ec3cf6e26f687'}
ui_backend_1  | INFO:CacheAsyncClient:cache_data/dag:Pending stream keys: []
ui_backend_1  | INFO:CacheAsyncClient:cache_data/log:Message: CACHE [2022-10-10T15:26:49.075326] IO ERROR: (None) Cache initialized with 0 permanents objects, 0 disposable objects, totaling 0 bytes.
ui_backend_1  | INFO:CacheAsyncClient:cache_data/log:{'op': 'worker_terminate', 'keys': 1, 'stream_key': None, 'idempotency_token': 'd2ca0055884473b601dfe3c78c31c1917a550ad9'}
ui_backend_1  | INFO:CacheAsyncClient:cache_data/log:Pending stream keys: []
ui_backend_1  | INFO:CacheStore:Preloading 0 runs

And I cannot access http://0.0.0.0:8083/metadata either.

The next step is to run docker run -p 3000:3000 -e METAFLOW_SERVICE=http://localhost:8083/ metaflow-ui:latest, but it is not clear to me where I should run this.


Thanks in advance for any hints and help!
