polyaxon / charts

Helm charts for creating reproducible and maintainable deployments of Polyaxon with Kubernetes.

Home Page: https://charts.polyaxon.com

License: Apache License 2.0

Smarty 89.31% Shell 7.57% Python 1.70% Mustache 1.42%
deep-learning distributed-systems gitops helm helm-chart helm-charts k8s kubernetes machine-learning mlops polyaxon pytorch scikit-learn tensorflow

charts's Introduction

Reproduce, Automate, Scale your data science

Welcome to Polyaxon, a platform for building, training, and monitoring large-scale deep learning applications. We are making a system to solve reproducibility, automation, and scalability for machine learning applications.

Polyaxon can be deployed into any data center or cloud provider, or it can be hosted and managed by Polyaxon. It supports all the major deep learning frameworks, such as TensorFlow, MXNet, Caffe, and Torch.

Polyaxon makes it faster, easier, and more efficient to develop deep learning applications by managing workloads with smart container and node management. And it turns GPU servers into shared, self-service resources for your team or organization.


Install

TL;DR;

  • Install CLI

    # Install Polyaxon CLI
    $ pip install -U polyaxon
  • Create a deployment

    # Create a namespace
    $ kubectl create namespace polyaxon
    
    # Add Polyaxon charts repo
    $ helm repo add polyaxon https://charts.polyaxon.com
    
    # Deploy Polyaxon
    $ polyaxon admin deploy -f config.yaml
    
    # Access API
    $ polyaxon port-forward

Please check the Polyaxon installation guide.

Quick start

TL;DR;

  • Start a project

    # Create a project
    $ polyaxon project create --name=quick-start --description='Polyaxon quick start.'
  • Train and track logs & resources

    # Upload code and start experiments
    $ polyaxon run -f experiment.yaml -u -l
  • Dashboard

    # Start Polyaxon dashboard
    $ polyaxon dashboard
    
    Dashboard page will now open in your browser. Continue? [Y/n]: y


  • Notebook
    # Start Jupyter notebook for your project
    $ polyaxon run --hub notebook


  • Tensorboard
    # Start TensorBoard for a run's output
    $ polyaxon run --hub tensorboard -P uuid=UUID


Please check our quick start guide to start training your first experiment.

Distributed job

Polyaxon supports and simplifies distributed jobs. Depending on the framework you are using, you need to deploy the corresponding operator, adapt your code to enable the distributed training, and update your polyaxonfile.

Here are some examples of using distributed training:

Hyperparameters tuning

Polyaxon has a concept called experiment groups for suggesting hyperparameters and managing their results, very similar to Google Vizier. An experiment group in Polyaxon defines a search algorithm, a search space, and a model to train.
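
To make the three ingredients concrete, here is a minimal sketch of an exhaustive grid search in plain Python. It is not Polyaxon's API or implementation: `grid_search`, `toy_train`, and the parameter names `lr` and `batch_size` are hypothetical, and the "model" is a stand-in function that returns a score to maximize.

```python
import itertools
from typing import Callable, Dict, List, Tuple


def grid_search(space: Dict[str, List], train: Callable[..., float]) -> Tuple[dict, float]:
    """Search algorithm: try every combination in `space`, keep the best score."""
    names = sorted(space)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(space[name] for name in names)):
        params = dict(zip(names, values))
        score = train(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_train(lr: float, batch_size: int) -> float:
    """Hypothetical stand-in for a training run; higher is better."""
    return -abs(lr - 0.01) - abs(batch_size - 64) / 1000


# Search space: the candidate values for each hyperparameter.
space = {"lr": [0.001, 0.01, 0.1], "batch_size": [32, 64, 128]}

best_params, best_score = grid_search(space, toy_train)
print(best_params, best_score)
```

Swapping `grid_search` for a random or Bayesian strategy changes only the search algorithm; the search space and the training function stay the same.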

Parallel executions

You can run your processing or model training jobs in parallel. Polyaxon provides a mapping abstraction to manage concurrent jobs.
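
The idea of mapping one job definition over many inputs with bounded concurrency can be sketched with Python's standard library. This only illustrates the concept and is not how Polyaxon schedules jobs; `process`, `run_mapping`, and the `concurrency` parameter are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List


def process(item: int) -> int:
    """Hypothetical unit of work, e.g. one processing or training job."""
    return item * item


def run_mapping(func: Callable, values: Iterable, concurrency: int = 2) -> List:
    """Apply `func` to every value, running at most `concurrency` at a time."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # Executor.map returns results in the order of the inputs.
        return list(pool.map(func, values))


results = run_mapping(process, [1, 2, 3, 4], concurrency=2)
print(results)
```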

DAGs and workflows

Polyaxon DAGs is a tool that provides a container-native engine for running machine learning pipelines. A DAG manages multiple operations with dependencies. Each operation is defined by a component runtime. This means that operations in a DAG can be jobs, services, distributed jobs, parallel executions, or nested DAGs.
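
As a rough illustration of what "operations with dependencies" implies for execution order, the sketch below uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9+). The operation names are made up, and this says nothing about how Polyaxon's own engine is built.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical pipeline: each operation maps to the operations it depends on.
dependencies: Dict[str, Set[str]] = {
    "preprocess": set(),
    "train": {"preprocess"},
    "evaluate": {"train"},
    "report": {"train", "evaluate"},
}


def execution_order(deps: Dict[str, Set[str]]) -> List[str]:
    """Return an order in which every operation runs after its dependencies."""
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(order)
```

A cycle in `dependencies` makes `static_order()` raise `graphlib.CycleError`, which is the reason such pipelines must be acyclic.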

Architecture

Polyaxon architecture

Documentation

Check out our documentation to learn more about Polyaxon.

Dashboard

Polyaxon comes with a dashboard that shows the projects and experiments created by you and your team members.

To start the dashboard, run the following command in your terminal:

$ polyaxon dashboard -y

Project status

Polyaxon is stable and it's running in production mode at many startups and Fortune 500 companies.

Contributions

Please follow the contribution guidelines: Contribute to Polyaxon.

Research

If you use Polyaxon in your academic research, we would be grateful if you could cite it.

Feel free to contact us. We would love to learn about your project and see how we can support your custom needs.

charts's People

Contributors

alecrubin, antonfriberg, boniek83, dxist, elyase, faezs, gzcf, j-kohn, jmvizcainoio, mmourafiq, naetherm, nathansmyth, polyaxon-ci, polyaxon-team, ricardofbarros, rxminus, shotarok, vfdev-5, wbuchwalter, zeyaddeeb

charts's Issues

Helm chart fails if tls is enabled

I tried to follow the SSL guide here with the example ingress configuration:

serviceType: ClusterIP
ingress:
  enabled: true
  hostName: polyaxon.acme.com
  tls:
  - secretName: polyaxon.acme-tls
    hosts:
      - polyaxon.acme.com

Unfortunately helm upgrade using version 0.4.3 fails with the following error:

Error: YAML parse error on polyaxon/templates/ing.yaml: error converting YAML to JSON: yaml: line 39: did not find expected '-' indicator
Error: UPGRADE FAILED: YAML parse error on polyaxon/templates/ing.yaml: error converting YAML to JSON: yaml: line 39: did not find expected '-' indicator

It seems like this line is causing the trouble.
For now, switching back to HTTP works for me. Or am I missing something?

install polyaxon by using helm (offline) error

Hello! I am interested in Polyaxon and want to run the demo by installing it in minikube, but I ran into a problem at the very beginning.
Steps I took to install Polyaxon in minikube:

  • start minikube
    minikube start --cpus 4 --memory 8192 --disk-size=40g --driver=hyperkit

  • download the polyaxon-charts and generate polyaxon-1.0.8.tgz

  • using helm install polyaxon-1.0.8.tgz offline
    helm install polyaxon-1.0.8.tgz

but it fails with:

Error: render error in "polyaxon/templates/streams-deployment.yaml": template: polyaxon/templates/streams-deployment.yaml:63:3: executing "polyaxon/templates/streams-deployment.yaml" at <include "config.artifactsStore.mount" .>: error calling include: template: polyaxon/templates/partials/_stores.tpl:16:39: executing "config.artifactsStore.mount" at <eq .Values.artifactsStore.kind "host_path">: error calling eq: invalid type for comparison

Has anyone else run into the same problem?

version information:

  • my os is macos 10.15.3.
  • minikube version is v1.9.2
  • helm version is
Client: &version.Version{SemVer:"v2.16.6", GitCommit:"dd2e5695da88625b190e6b22e9542550ab503a47", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.16.6", GitCommit:"dd2e5695da88625b190e6b22e9542550ab503a47", GitTreeState:"clean"}
  • polyaxon charts version is the newest by now.

THX! BEST WISHES~

polyaxon-rabbitmq-ha-0 state is CrashLoopBackOff

Hello All,

It seems that the polyaxon-rabbitmq-ha-0 is not starting.

I used the following command to install the chart.

sudo polyaxon admin deploy -f config.yml

the log of the container is:

2020-03-08 02:19:38.666 [info] <0.8.0> Feature flags: list of feature flags found:
2020-03-08 02:19:38.666 [info] <0.8.0> Feature flags:   [ ] drop_unroutable_metric
2020-03-08 02:19:38.666 [info] <0.8.0> Feature flags:   [ ] empty_basic_get_metric
2020-03-08 02:19:38.666 [info] <0.8.0> Feature flags:   [ ] implicit_default_bindings
2020-03-08 02:19:38.666 [info] <0.8.0> Feature flags:   [ ] quorum_queue
2020-03-08 02:19:38.666 [info] <0.8.0> Feature flags:   [ ] virtual_host_metadata
2020-03-08 02:19:38.666 [info] <0.8.0> Feature flags: feature flag states written to disk: yes
2020-03-08 02:19:38.698 [info] <0.266.0> ra: meta data store initialised. 0 record(s) recovered
2020-03-08 02:19:38.698 [info] <0.271.0> WAL: recovering ["/var/lib/rabbitmq/mnesia/rabbit@polyaxon-rabbitmq-ha-0.polyaxon-rabbitmq-ha-discovery.polyaxon.svc.cluster.local/quorum/rabbit@polyaxon-rabbitmq-ha-0.polyaxon-rabbitmq-ha-discovery.polyaxon.svc.cluster.local/00000003.wal"]
2020-03-08 02:19:38.699 [info] <0.275.0>
 Starting RabbitMQ 3.8.0 on Erlang 22.1.5
 Copyright (C) 2007-2019 Pivotal Software, Inc.
 Licensed under the MPL.  See https://www.rabbitmq.com/

  ##  ##      RabbitMQ 3.8.0
  ##  ##
  ##########  Copyright (C) 2007-2019 Pivotal Software, Inc.
  ######  ##
  ##########  Licensed under the MPL.  See https://www.rabbitmq.com/

  Doc guides: https://rabbitmq.com/documentation.html
  Support:    https://rabbitmq.com/contact.html
  Tutorials:  https://rabbitmq.com/getstarted.html
  Monitoring: https://rabbitmq.com/monitoring.html

  Logs: <stdout>

  Config file(s): /etc/rabbitmq/rabbitmq.conf

  Starting broker...2020-03-08 02:19:38.699 [info] <0.275.0>
 node           : rabbit@polyaxon-rabbitmq-ha-0.polyaxon-rabbitmq-ha-discovery.polyaxon.svc.cluster.local
 home dir       : /var/lib/rabbitmq
 config file(s) : /etc/rabbitmq/rabbitmq.conf
 cookie hash    : z9Efk6foMzTWv7yMOCE7Sg==
 log(s)         : <stdout>
 database dir   : /var/lib/rabbitmq/mnesia/rabbit@polyaxon-rabbitmq-ha-0.polyaxon-rabbitmq-ha-discovery.polyaxon.svc.cluster.local
2020-03-08 02:19:38.712 [info] <0.275.0> Running boot step pre_boot defined by app rabbit
2020-03-08 02:19:38.712 [info] <0.275.0> Running boot step rabbit_core_metrics defined by app rabbit
2020-03-08 02:19:38.713 [info] <0.275.0> Running boot step rabbit_alarm defined by app rabbit
2020-03-08 02:19:38.715 [info] <0.281.0> Memory high watermark set to 244 MiB (256000000 bytes) of 64408 MiB (67536941056 bytes) total
2020-03-08 02:19:38.717 [info] <0.283.0> Enabling free disk space monitoring
2020-03-08 02:19:38.717 [info] <0.283.0> Disk free limit set to 50MB
2020-03-08 02:19:38.719 [info] <0.275.0> Running boot step code_server_cache defined by app rabbit
2020-03-08 02:19:38.719 [info] <0.275.0> Running boot step file_handle_cache defined by app rabbit
2020-03-08 02:19:38.719 [info] <0.286.0> Limiting to approx 65436 file handles (58890 sockets)
2020-03-08 02:19:38.720 [info] <0.287.0> FHC read buffering:  OFF
2020-03-08 02:19:38.720 [info] <0.287.0> FHC write buffering: ON
2020-03-08 02:19:38.720 [info] <0.275.0> Running boot step worker_pool defined by app rabbit
2020-03-08 02:19:38.720 [info] <0.276.0> Will use 16 processes for default worker pool
2020-03-08 02:19:38.720 [info] <0.276.0> Starting worker pool 'worker_pool' with 16 processes in it
2020-03-08 02:19:38.720 [info] <0.275.0> Running boot step database defined by app rabbit
2020-03-08 02:19:38.720 [info] <0.275.0> Node database directory at /var/lib/rabbitmq/mnesia/rabbit@polyaxon-rabbitmq-ha-0.polyaxon-rabbitmq-ha-discovery.polyaxon.svc.cluster.local is empty. Assuming we need to join an existing cluster or initialise from scratch...
2020-03-08 02:19:38.720 [info] <0.275.0> Configured peer discovery backend: rabbit_peer_discovery_k8s
2020-03-08 02:19:38.720 [info] <0.275.0> Will try to lock with peer discovery backend rabbit_peer_discovery_k8s
2020-03-08 02:19:38.721 [info] <0.275.0> Peer discovery backend does not support locking, falling back to randomized delay
2020-03-08 02:19:38.721 [info] <0.275.0> Peer discovery backend rabbit_peer_discovery_k8s supports registration.
2020-03-08 02:19:38.721 [info] <0.275.0> Will wait for 1820 milliseconds before proceeding with registration...
2020-03-08 02:19:40.588 [info] <0.275.0> Failed to get nodes from k8s - {failed_connect,[{to_address,{"kubernetes.default.svc.cluster.local",443}},
                 {inet,[inet],nxdomain}]}
2020-03-08 02:19:40.589 [error] <0.274.0> CRASH REPORT Process <0.274.0> with 0 neighbours exited with reason: no case clause matching {error,"{failed_connect,[{to_address,{\"kubernetes.default.svc.cluster.local\",443}},\n                 {inet,[inet],nxdomain}]}"} in rabbit_mnesia:init_from_config/0 line 140 in application_master:init/4 line 138
2020-03-08 02:19:40.589 [info] <0.43.0> Application rabbit exited with reason: no case clause matching {error,"{failed_connect,[{to_address,{\"kubernetes.default.svc.cluster.local\",443}},\n                 {inet,[inet],nxdomain}]}"} in rabbit_mnesia:init_from_config/0 line 140
{"Kernel pid terminated",application_controller,"{application_start_failure,rabbit,{bad_return,{{rabbit,start,[normal,[]]},{'EXIT',{{case_clause,{error,\"{failed_connect,[{to_address,{\\"kubernetes.default.svc.cluster.local\\",443}},\n                 {inet,[inet],nxdomain}]}\"}},[{rabbit_mnesia,init_from_config,0,[{file,\"src/rabbit_mnesia.erl\"},{line,140}]},{rabbit_mnesia,init_with_lock,3,[{file,\"src/rabbit_mnesia.erl\"},{line,120}]},{rabbit_mnesia,init,0,[{file,\"src/rabbit_mnesia.erl\"},{line,87}]},{rabbit_boot_steps,'-run_step/2-lc$^1/1-1-',1,[{file,\"src/rabbit_boot_steps.erl\"},{line,55}]},{rabbit_boot_steps,run_step,2,[{file,\"src/rabbit_boot_steps.erl\"},{line,59}]},{rabbit_boot_steps,'-run_boot_steps/1-lc$^0/1-0-',1,[{file,\"src/rabbit_boot_steps.erl\"},{line,28}]},{rabbit_boot_steps,run_boot_steps,1,[{file,\"src/rabbit_boot_steps.erl\"},{line,29}]},{rabbit,start,2,[{file,\"src/rabbit.erl\"},{line,975}]}]}}}}}"}
Kernel pid terminated (application_controller) ({application_start_failure,rabbit,{bad_return,{{rabbit,start,[normal,[]]},{'EXIT',{{case_clause,{error,"{failed_connect,[{to_address,{\"kubernetes.defau

Crash dump is being written to: /var/log/rabbitmq/erl_crash.dump...done

Environment:

  • kubectl
Client Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.6", GitCommit:"72c30166b2105cd7d3350f2c28a219e6abcd79eb", GitTreeState:"clean", BuildDate:"2020-01-18T23:31:31Z", GoVersion:"go1.13.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.6", GitCommit:"72c30166b2105cd7d3350f2c28a219e6abcd79eb", GitTreeState:"clean", BuildDate:"2020-01-18T23:23:21Z", GoVersion:"go1.13.5", Compiler:"gc", Platform:"linux/amd64"}
  • helm
Client: &version.Version{SemVer:"v2.15.1", GitCommit:"cf1de4f8ba70eded310918a8af3a96bfe8e7683b", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.15.1", GitCommit:"cf1de4f8ba70eded310918a8af3a96bfe8e7683b", GitTreeState:"clean"}
  • polyaxon-cli
---
Metadata-Version: 2.1
Name: polyaxon-cli
Version: 0.6.0
Summary: Command Line Interface (CLI) for Polyaxon.
Home-page: https://github.com/polyaxon/polyaxon-cli
Author: Mourad Mourafiq
Author-email: [email protected]
Installer: pip
License: MIT
Location: /usr/local/lib/python3.5/dist-packages
Requires: polyaxon-client, raven, pathlib, polyaxon-deploy, click, tabulate, click-completion, polyaxon-dockerizer
Classifiers:
  Programming Language :: Python
  Programming Language :: Python :: 2
  Programming Language :: Python :: 2.7
  Programming Language :: Python :: 3
  Programming Language :: Python :: 3.5
  Programming Language :: Python :: 3.6
  Programming Language :: Python :: 3.7
  Operating System :: OS Independent
  Intended Audience :: Developers
  Intended Audience :: Science/Research
  Topic :: Scientific/Engineering :: Artificial Intelligence
Entry-points:
  [console_scripts]
  polyaxon = polyaxon_cli.main:cli

Can't install on 1.16.2 from master (with 1.16 fix)

Hi,
Doing:

    git clone https://github.com/polyaxon/polyaxon-chart
    helm dependency update ./polyaxon
    helm install ./polyaxon --name=polyaxon --namespace=polyaxon -f ../polyaxon/config.yaml  --dry-run --debug

I get :

[debug] Created tunnel using local port: '38219'

[debug] SERVER: "127.0.0.1:38219"

[debug] Original chart version: ""
[debug] CHART PATH: /.../polyaxon-chart/polyaxon

Error: unable to recognize "": no matches for kind "Deployment" in version "extensions/v1beta1"
   

That's on k8s 1.16.2 using microk8s

kubectl version 
Client Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.2", GitCommit:"c97fe5036ef3df2967d086711e6c0c405941e14b", GitTreeState:"clean", BuildDate:"2019-10-17T17:16:09Z", GoVersion:"go1.12.10", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.2", GitCommit:"c97fe5036ef3df2967d086711e6c0c405941e14b", GitTreeState:"clean", BuildDate:"2019-10-15T19:09:08Z", GoVersion:"go1.12.10", Compiler:"gc", Platform:"linux/amd64"}

This is with master, where HEAD is:

commit 5b5b5f8b52619f98defadc4fcb647abeb0db3993 (HEAD -> master, origin/master, origin/HEAD)
Author: Rinat Shigapov <[email protected]>
Date:   Tue Oct 22 20:00:50 2019 +0300

    add K8S 1.16 support (#51)
    
    * add k8s 1.16 support
    
    * streams is unready when api server uses SSL+NodePort
    
    * revert postgres dependency

CrashLoopBackOff polyaxon-redis-*-0

Hello All,

It seems that both polyaxon-redis-master-0 and polyaxon-redis-slave-0 are not starting.

> kubectl get pods -n polyaxon

NAME                                            READY   STATUS             RESTARTS   AGE
polyaxon-docker-registry-78c9b9c9dd-f8lgk       1/1     Running            0          7m36s
polyaxon-polyaxon-api-59b45bccc6-kl8nj          2/2     Running            0          7m35s
polyaxon-polyaxon-beat-776b89ccfd-tm7cx         2/2     Running            0          7m36s
polyaxon-polyaxon-events-7774c88844-2hzlf       1/1     Running            0          7m35s
polyaxon-polyaxon-hpsearch-cf5ffd5f5-pgsl9      1/1     Running            0          7m36s
polyaxon-polyaxon-k8s-events-6d77b8c499-4m78j   1/1     Running            0          7m36s
polyaxon-polyaxon-monitors-d55dbf7dd-mdvw8      1/1     Running            0          7m36s
polyaxon-polyaxon-scheduler-5767dc68cd-zwvwh    1/1     Running            0          7m35s
polyaxon-postgresql-0                           1/1     Running            0          7m35s
polyaxon-redis-master-0                         0/1     CrashLoopBackOff   6          7m35s
polyaxon-redis-slave-0                          0/1     CrashLoopBackOff   6          7m35s

I used the following commands to install the chart.

# Create a namespace
$ kubectl create namespace polyaxon

# Add Polyaxon charts repo
$ helm repo add polyaxon https://charts.polyaxon.com

# Deploy Polyaxon
$ helm install polyaxon/polyaxon \
    --name=polyaxon \
    --namespace=polyaxon \
    -f config.yaml

The config.yaml file looks like this:

rbac:
  enabled: false

serviceType: NodePort

broker: redis
rabbitmq-ha:
  enabled: false

The log of the polyaxon-redis-master-0 pod is:

> kubectl logs -n polyaxon polyaxon-redis-master-0

1:C 06 May 2020 16:26:51.332 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1:C 06 May 2020 16:26:51.333 # Redis version=5.0.4, bits=64, commit=00000000, modified=0, pid=1, just started
1:C 06 May 2020 16:26:51.333 # Configuration loaded
1:M 06 May 2020 16:26:51.334 * Running mode=standalone, port=6379.
1:M 06 May 2020 16:26:51.334 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
1:M 06 May 2020 16:26:51.334 # Server initialized
1:M 06 May 2020 16:26:51.334 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
1:M 06 May 2020 16:26:51.334 * Reading RDB preamble from AOF file...
1:M 06 May 2020 16:26:51.335 * Reading the remaining AOF tail...
1:M 06 May 2020 16:26:52.096 # Bad file format reading the append only file: make a backup of your AOF file, then use ./redis-check-aof --fix <filename>

The log of the polyaxon-redis-slave-0 pod is:

> kubectl logs -n polyaxon polyaxon-redis-slave-0

INFO  ==> ** Starting Redis **
1:C 06 May 2020 16:29:27.315 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1:C 06 May 2020 16:29:27.315 # Redis version=5.0.4, bits=64, commit=00000000, modified=0, pid=1, just started
1:C 06 May 2020 16:29:27.315 # Configuration loaded
1:S 06 May 2020 16:29:27.316 * Running mode=standalone, port=6379.
1:S 06 May 2020 16:29:27.316 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
1:S 06 May 2020 16:29:27.316 # Server initialized
1:S 06 May 2020 16:29:27.316 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
1:S 06 May 2020 16:29:27.316 * Reading RDB preamble from AOF file...
1:S 06 May 2020 16:29:27.316 * Reading the remaining AOF tail...
1:S 06 May 2020 16:29:27.348 # Bad file format reading the append only file: make a backup of your AOF file, then use ./redis-check-aof --fix <filename>

Environment:

  • kubectl
Client Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.6", GitCommit:"72c30166b2105cd7d3350f2c28a219e6abcd79eb", GitTreeState:"clean", BuildDate:"2020-01-18T23:31:31Z", GoVersion:"go1.13.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.6", GitCommit:"72c30166b2105cd7d3350f2c28a219e6abcd79eb", GitTreeState:"clean", BuildDate:"2020-01-18T23:23:21Z", GoVersion:"go1.13.5", Compiler:"gc", Platform:"linux/amd64"}
  • helm
Client: &version.Version{SemVer:"v2.15.1", GitCommit:"cf1de4f8ba70eded310918a8af3a96bfe8e7683b", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.15.1", GitCommit:"cf1de4f8ba70eded310918a8af3a96bfe8e7683b", GitTreeState:"clean"}
  • polyaxon-cli
---
Metadata-Version: 2.1
Name: polyaxon-cli
Version: 0.6.0
Summary: Command Line Interface (CLI) for Polyaxon.
Home-page: https://github.com/polyaxon/polyaxon-cli
Author: Mourad Mourafiq
Author-email: [email protected]
Installer: pip
License: MIT
Location: /usr/local/lib/python3.5/dist-packages
Requires: polyaxon-client, raven, pathlib, polyaxon-deploy, click, tabulate, click-completion, polyaxon-dockerizer
Classifiers:
  Programming Language :: Python
  Programming Language :: Python :: 2
  Programming Language :: Python :: 2.7
  Programming Language :: Python :: 3
  Programming Language :: Python :: 3.5
  Programming Language :: Python :: 3.6
  Programming Language :: Python :: 3.7
  Operating System :: OS Independent
  Intended Audience :: Developers
  Intended Audience :: Science/Research
  Topic :: Scientific/Engineering :: Artificial Intelligence
Entry-points:
  [console_scripts]
  polyaxon = polyaxon_cli.main:cli

no matches for kind "Deployment" in version "extensions/v1beta1"

Hi all

Unable to deploy helm chart with helm 3

Observing

helm install polyaxon --namespace polyaxon polyaxon/polyaxon
Error: unable to build kubernetes objects from release manifest: unable to recognize "": no matches for kind "Deployment" in version "extensions/v1beta1"

Expected:
polyaxon to be deployed on k8s

k8s version

Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.1", GitCommit:"4485c6f18cee9a5d3c3b4e523bd27972b1b53892", GitTreeState:"clean", BuildDate:"2019-07-18T09:18:22Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.2+k3s1", GitCommit:"cdab19b09a84389ffbf57bebd33871c60b1d6b28", GitTreeState:"clean", BuildDate:"2020-01-27T18:09:26Z", GoVersion:"go1.13.6", Compiler:"gc", Platform:"linux/amd64"}

helm version

version.BuildInfo{Version:"v3.0.2", GitCommit:"19e47ee3283ae98139d98460de796c1be1e3975f", GitTreeState:"clean", GoVersion:"go1.13.5"}

Same error even with

postgres:
    enabled: false

Also unable to install version 0.5.6

h install polyaxon --namespace polyaxon polyaxon/polyaxon --version 0.5.6
Error: unable to build kubernetes objects from release manifest: [unable to recognize "": no matches for kind "Deployment" in version "extensions/v1beta1", unable to recognize "": no matches for kind "StatefulSet" in version "apps/v1beta1", unable to recognize "": no matches for kind "StatefulSet" in version "apps/v1beta2"]

Would appreciate if someone could help troubleshoot.
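
For context on the errors above: Kubernetes 1.16 stopped serving Deployment under extensions/v1beta1, apps/v1beta1, and apps/v1beta2, and StatefulSet under apps/v1beta1 and apps/v1beta2; both kinds are served under apps/v1. A small sketch like the following can flag affected manifests before installing. It assumes the rendered manifests have already been parsed into dictionaries; `REMOVED_IN_1_16`, `find_removed`, and the `rendered` sample are hypothetical, and the table covers only the kinds named in the error messages.

```python
from typing import Dict, List, Tuple

# (kind, apiVersion) pairs no longer served as of Kubernetes 1.16,
# mapped to the apiVersion that replaces them.
REMOVED_IN_1_16: Dict[Tuple[str, str], str] = {
    ("Deployment", "extensions/v1beta1"): "apps/v1",
    ("Deployment", "apps/v1beta1"): "apps/v1",
    ("Deployment", "apps/v1beta2"): "apps/v1",
    ("StatefulSet", "apps/v1beta1"): "apps/v1",
    ("StatefulSet", "apps/v1beta2"): "apps/v1",
}


def find_removed(manifests: List[dict]) -> List[Tuple[str, str, str]]:
    """Return (kind, old apiVersion, replacement) for each affected manifest."""
    findings = []
    for manifest in manifests:
        key = (manifest.get("kind"), manifest.get("apiVersion"))
        if key in REMOVED_IN_1_16:
            findings.append((key[0], key[1], REMOVED_IN_1_16[key]))
    return findings


# Hypothetical sample standing in for parsed chart output.
rendered = [
    {"kind": "Deployment", "apiVersion": "extensions/v1beta1"},
    {"kind": "StatefulSet", "apiVersion": "apps/v1beta2"},
    {"kind": "Service", "apiVersion": "v1"},
]

findings = find_removed(rendered)
for kind, old, new in findings:
    print(f"{kind}: {old} -> {new}")
```

Changing the apiVersion alone may not be enough: under apps/v1, `spec.selector` is required for Deployments and StatefulSets, so templates that omitted it also need that field.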

Following the instructions in the README for experiments and jobs fails on the YAML

With the yaml

---
version: 1
kind: job
environment:
  persistence:
    data:
        mountPath: "/data"
        existingClaim: "pvc-data"
        readOnly: false
  resources:
    cpu:
      limits: 1
      requests: 1
    memory:
      requests: 5120
      limits: 5120

build:
  dockerfile: polyaxon/Dockerfile
run:
  cmd:
    - python polyaxon/temp.py
    - sleep 10

I get the error: Polyaxonfile is not valid. Error message `{'environment': {'persistence': {'data': ['Not a valid list.']}}}`.
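
The error message indicates that `environment.persistence.data` is expected to be a list, while the YAML above gives it a mapping. The sketch below mimics that shape check in plain Python to show the difference. It is not Polyaxon's validator: `validate_persistence` is hypothetical, and the list entry `"pvc-data"` in the passing example is an assumption about what the entries should be, to be checked against the Polyaxon documentation for your version.

```python
def validate_persistence(spec: dict) -> dict:
    """Return an error dict shaped like the one in the issue, or {} if the shape is fine."""
    data = spec.get("environment", {}).get("persistence", {}).get("data")
    if data is not None and not isinstance(data, list):
        return {"environment": {"persistence": {"data": ["Not a valid list."]}}}
    return {}


# Shape used in the issue: `data` is a mapping, so the check fails.
failing = {
    "environment": {
        "persistence": {
            "data": {"mountPath": "/data", "existingClaim": "pvc-data", "readOnly": False}
        }
    }
}

# Assumed alternative: `data` is a list (the entry name here is hypothetical).
passing = {"environment": {"persistence": {"data": ["pvc-data"]}}}

failing_errors = validate_persistence(failing)
passing_errors = validate_persistence(passing)
print(failing_errors)
print(passing_errors)
```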

Private registry for docker images

I'm behind a proxy and I need to push every image to a private registry. I think changing the values to something like the below would solve this issue:

Parameter                Description                                       Default
global.imageRegistry     Global Docker image registry                      nil
global.imagePullSecrets  Global Docker registry secret names as an array   []
api.image.registry       API image registry                                docker.io
api.image.repository     API image name                                    polyaxon/polyaxon-api
api.image.tag            API image tag                                     0.5.6

Certificate error when running helm repo add

Hi -

I'm getting the following error when running helm repo add polyaxon https://charts.polyaxon.com:

Error: Looks like "https://charts.polyaxon.com" is not a valid chart repository or cannot be reached: Get https://charts.polyaxon.com/index.yaml: x509: certificate signed by unknown authority

What's strange is that this seems to be dependent upon the environment from which I'm running helm repo add. I can add the repo just fine when running locally or on AWS, but when running on a particular on-prem server, I get the error. All of these environments are running the same OS and the same versions of Kubernetes/Helm.

Running helm repo add incubator https://kubernetes-charts-incubator.storage.googleapis.com/ as a general helm test, it works fine. I also notice that wget https://charts.polyaxon.com/index.yaml and curl https://charts.polyaxon.com/index.yaml throw a similar cert error to the helm command.

I know very little about SSL/TLS, so I'm unsure how to get around this issue. The one thing I've tried is to pass a cert explicitly using helm's --cert-file argument, which did not work.

Any thoughts on what to do here?

gitlab oauth

I can see that the chart was compatible with gitlab auth in the past, but now I'm not sure if it is compatible or only ldap.

Can you clarify it for me?

Thank you

Juanma

ImagePullBackOff for in cluster Docker Registry

Hello All,

I can't start jobs/experiments/notebooks due to them freezing in the starting phase.

After looking into it a bit, I found out that the pods are failing with an ImagePullBackOff.

A quick describe yielded the following events:

Events:
  Type     Reason     Age                     From                 Message
  ----     ------     ----                    ----                 -------
  Normal   Scheduled  7m19s                   default-scheduler    Successfully assigned polyaxon/plx-notebook-cb72a35b76ea4c2fa8a65e795960346f-87f76cd6b-sxvhs to oodapow-pc
  Normal   Pulling    7m18s                   kubelet, oodapow-pc  Pulling image "polyaxon/polyaxon-init:0.6.1"
  Normal   Pulled     7m17s                   kubelet, oodapow-pc  Successfully pulled image "polyaxon/polyaxon-init:0.6.1"
  Normal   Created    7m17s                   kubelet, oodapow-pc  Created container polyaxon-init-job
  Normal   Started    7m16s                   kubelet, oodapow-pc  Started container polyaxon-init-job
  Warning  Failed     6m34s (x3 over 7m16s)   kubelet, oodapow-pc  Error: ErrImagePull
  Warning  Failed     5m55s (x5 over 7m15s)   kubelet, oodapow-pc  Error: ImagePullBackOff
  Normal   Pulling    5m41s (x4 over 7m16s)   kubelet, oodapow-pc  Pulling image "127.0.0.1:31813/quick-start_1:5ef3f22108184273bf4a696378c6a2d5"
  Warning  Failed     5m41s (x4 over 7m16s)   kubelet, oodapow-pc  Failed to pull image "127.0.0.1:31813/quick-start_1:5ef3f22108184273bf4a696378c6a2d5": rpc error: code = Unknown desc = failed to resolve image "127.0.0.1:31813/quick-start_1:5ef3f22108184273bf4a696378c6a2d5": no available registry endpoint: failed to do request: Head https://127.0.0.1:31813/v2/quick-start_1/manifests/5ef3f22108184273bf4a696378c6a2d5: http: server gave HTTP response to HTTPS client
  Normal   BackOff    2m15s (x19 over 7m15s)  kubelet, oodapow-pc  Back-off pulling image "127.0.0.1:31813/quick-start_1:5ef3f22108184273bf4a696378c6a2d5"

Environment:

  • kubectl
Client Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.6", GitCommit:"72c30166b2105cd7d3350f2c28a219e6abcd79eb", GitTreeState:"clean", BuildDate:"2020-01-18T23:31:31Z", GoVersion:"go1.13.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.6", GitCommit:"72c30166b2105cd7d3350f2c28a219e6abcd79eb", GitTreeState:"clean", BuildDate:"2020-01-18T23:23:21Z", GoVersion:"go1.13.5", Compiler:"gc", Platform:"linux/amd64"}
  • helm
Client: &version.Version{SemVer:"v2.15.1", GitCommit:"cf1de4f8ba70eded310918a8af3a96bfe8e7683b", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.15.1", GitCommit:"cf1de4f8ba70eded310918a8af3a96bfe8e7683b", GitTreeState:"clean"}
  • polyaxon-cli
---
Metadata-Version: 2.1
Name: polyaxon-cli
Version: 0.6.0
Summary: Command Line Interface (CLI) for Polyaxon.
Home-page: https://github.com/polyaxon/polyaxon-cli
Author: Mourad Mourafiq
Author-email: [email protected]
Installer: pip
License: MIT
Location: /usr/local/lib/python3.5/dist-packages
Requires: polyaxon-client, raven, pathlib, polyaxon-deploy, click, tabulate, click-completion, polyaxon-dockerizer
Classifiers:
  Programming Language :: Python
  Programming Language :: Python :: 2
  Programming Language :: Python :: 2.7
  Programming Language :: Python :: 3
  Programming Language :: Python :: 3.5
  Programming Language :: Python :: 3.6
  Programming Language :: Python :: 3.7
  Operating System :: OS Independent
  Intended Audience :: Developers
  Intended Audience :: Science/Research
  Topic :: Scientific/Engineering :: Artificial Intelligence
Entry-points:
  [console_scripts]
  polyaxon = polyaxon_cli.main:cli
  • config.yml
rbac:
  enabled: false

serviceType: NodePort

broker: redis
rabbitmq-ha:
  enabled: false

Error: requirements.lock is out of sync with requirements.yaml

Issue: I was trying to install Polyaxon with Helm in an automated environment, and the install failed with the message:
"""Error: requirements.lock is out of sync with requirements.yaml"""

While the issue can easily be solved by hand in manual deployments,
it would be good for automation if the lock file in the repository were updated.

Should allow TLS in ingress

We should have an option for injecting TLS.

ing.yaml

spec:
  tls:
  - hosts:
    - polyaxon.domain.com
    secretName: polyaxon-tls

Failed to delete helm polyaxon

Hi guys,

I'm having difficulty deleting the Polyaxon Helm release. The command is:

helm delete polyaxon --purge

But it throws the error: Error: jobs.batch "polyaxon-clean-experiments" already exists

I hope someone can help me with this. Thank you
