WeBankFinTech / Prophecis
Prophecis is a one-stop cloud native machine learning platform.
Home Page: https://github.com/WeBankFinTech/Prophecis
License: Apache License 2.0
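While preparing the AppConn plugin package and the initialization SQL for the DSS/Linkis platform, as required by the installation and deployment document, I ran into problems.
Release 0.3.2 does not ship the corresponding plugin package, and in the 0.3.2 source the appconn pom depends on DSS 1.0.1 and Linkis 1.0.3. When I switched to the versions released in July, DSS 1.1.0 and Linkis 1.1.1, compilation failed with:
MLSSOpenRequestRef.java[XX,XX] error: cannot find symbol.
In addition, the dss_application metadata table that the appconn initialization SQL operates on no longer exists in the DSS 1.1.0 / Linkis 1.1.1 metadata schema.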
Hello!
wedatasphere/prophecis:metrics-0.2.0
wedatasphere/prophecis:jobmonitor-0.2.0
wedataspere/prophecis:minio-2020-06-14
wedatasphere/prophecis:lcm-0.2.0
wedatasphere/prophecis:trainer-0.2.0
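None of these images can be pulled. Do they exist in the registry?
I have been following Prophecis for a while. Is there an approximate release date for v0.2.x and v0.3.x?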
wget https://get.helm.sh/helm-v3.2.1-linux-amd64.tar.gz
tar -xzvf helm-v3.2.1-linux-amd64.tar.gz
cd linux-amd64/
mv helm /usr/bin/
helm version
helm repo list
helm repo add aliyuncs https://apphub.aliyuncs.com
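wget https://github.com/istio/istio/releases/download/1.8.2/istio-1.8.2-linux-amd64.tar.gz
# Extract to /opt so that the PATH entry below resolves
tar -xzvf istio-1.8.2-linux-amd64.tar.gz -C /opt
# Set the istioctl environment variable
export PATH=$PATH:/opt/istio-1.8.2/bin
# Deploy
istioctl install
# Verify that the related Pods are Running
kubectl -n istio-system get pods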
wget https://github.com/SeldonIO/seldon-core/archive/refs/tags/v1.13.0.tar.gz
# Extract the source archive before entering the chart directory
tar -xzvf v1.13.0.tar.gz
cd seldon-core-1.13.0/helm-charts
helm install seldon-core seldon-core-operator --set usageMetrics.enabled=true --namespace seldon-system --set istio.enabled=true
# If the image pull fails
docker pull registry.cn-shenzhen.aliyuncs.com/shikanon/google_containers.spartakus-amd64:v1.1.0
docker tag registry.cn-shenzhen.aliyuncs.com/shikanon/google_containers.spartakus-amd64:v1.1.0 gcr.io/google_containers/spartakus-amd64:v1.1.0
helm list -n seldon-system
helm del seldon-core -n seldon-system
yum install -y nfs-utils rpcbind
systemctl start rpcbind
systemctl enable rpcbind
systemctl start nfs-server
systemctl enable nfs-server
vim /root/.docker/config.json
# Add the following configuration
{
  "auths": {
    "": {
      "auth": ""
    }
  },
  "HttpHeaders": {
    "User-Agent": "Docker-Client/20.10.8-ce (linux)"
  }
}
mkdir -p /data/bdap-ss/mlss-data/tmp
mkdir -p /mlss/di/jobs/prophecis
mkdir -p /cosdata/mlss-test
vim /etc/exports
/data/bdap-ss/mlss-data/tmp xx.xx.xx.0/24(rw,sync,no_root_squash)
/mlss/di/jobs/prophecis xx.xx.xx.0/24(rw,sync,no_root_squash)
/cosdata/mlss-test xx.xx.xx.0/24(rw,sync,no_root_squash)
exportfs -arv
showmount -e xx.xx.xx.xx
mkdir -p /data/bdap-ss/mlss-data/tmp
mkdir -p /mlss/di/jobs/prophecis
mkdir -p /cosdata/mlss-test
mount xx.xx.xx.xx:/data/bdap-ss/mlss-data/tmp /data/bdap-ss/mlss-data/tmp
mount xx.xx.xx.xx:/mlss/di/jobs/prophecis /mlss/di/jobs/prophecis
mount xx.xx.xx.xx:/cosdata/mlss-test /cosdata/mlss-test
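The three /etc/exports entries above follow one pattern, so they can be generated instead of typed by hand. The following is a minimal sketch, not part of the Prophecis installer: the function name `nfs_export_lines` and the subnet 192.0.2.0/24 (a documentation-only range) are assumptions to be replaced with your own values.

```shell
#!/bin/sh
# Print one /etc/exports line per shared directory.
# Arguments: the client subnet in CIDR form, then one or more directories.
# The export options mirror the ones used in the steps above.
nfs_export_lines() {
    subnet=$1
    shift
    for dir in "$@"; do
        printf '%s %s(rw,sync,no_root_squash)\n' "$dir" "$subnet"
    done
}

# Example with a placeholder subnet; substitute the subnet of your cluster nodes.
nfs_export_lines 192.0.2.0/24 \
    /data/bdap-ss/mlss-data/tmp \
    /mlss/di/jobs/prophecis \
    /cosdata/mlss-test
```

The output can be appended to /etc/exports on the NFS server, followed by `exportfs -arv` as in the steps above.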
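(1) Duplicate files:
Under /install/Prophecis/templates/di, move learner-configmap.yml and learner-rsa-keys.yml to /install/Prophecis/templates/services, then delete the /install/Prophecis/templates/di folder.
(2) Image addresses
In the installation configuration files, replace every image address that starts with uat.sf.dockerhub.stgwebank/webank/prophecis with wedatasphere/prophecis.
(3) Modify the SQL database settings
Under install/sql, in the database creation files
prophecis.sql and prophecis-data.sql, the database name in the first two lines
CREATE DATABASE IF NOT EXISTS `mlss_gzpc_bdap_uat_01` /*!40100 DEFAULT CHARACTER SET utf8 */;
USE `mlss_gzpc_bdap_uat_01`;
must be changed to your own database name. It has to match db -> name in the MySQL configuration described in the next item.
Then copy the contents of prophecis.sql and prophecis-data.sql, in that order, into a database SQL editor and execute them to create the tables and data.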
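Steps (2) and (3) above are plain text substitutions, so they can be scripted. The following is a minimal sketch using `sed` as stream filters. The function names are hypothetical, and the sketch assumes the new database name contains only letters, digits and underscores, since it is spliced into a `sed` expression unescaped.

```shell
#!/bin/sh
# Step (2): point every image reference at the public wedatasphere/prophecis repository.
# Reads text on stdin and writes the rewritten text to stdout.
rewrite_image_prefix() {
    sed 's#uat\.sf\.dockerhub\.stgwebank/webank/prophecis#wedatasphere/prophecis#g'
}

# Step (3): replace the sample database name with your own.
# $1 is the new database name; it must match db -> name in values.yaml.
rewrite_db_name() {
    sed "s/mlss_gzpc_bdap_uat_01/$1/g"
}

# Example usage (left commented out; paths depend on your checkout):
#   rewrite_image_prefix < values.yaml > values.yaml.new
#   rewrite_db_name prophecis_db < install/sql/prophecis.sql > prophecis.local.sql
```

Writing to a new file rather than editing in place keeps the original available for comparison with `diff` before installing.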
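The following parts of /install/Prophecis/values.yaml need to be changed:
# Change to your own MySQL settings, including the username and password
db:
  server: 127.0.0.1
  port: 3306
  name: prophecis_db
  user: prophecis
  pwd: prophecis@wedatasphere
# Web address that users visit; change it to the IP of the host node
gateway:
  address: 127.0.0.1
  port: 30778
# Super administrator username and password; change as needed, they must match the t_superadmin table
admin:
  user: hadoop
  password: hadoop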
kubectl create namespace prophecis
kubectl label nodes xx.xx.xx.xx mlss-node-role=platform
# If there are GPU compute nodes, label them NVIDIAGPU
kubectl label nodes xx.xx.xx.xx hardware-type=NVIDIAGPU
## Install the Notebook Controller component
helm install notebook-controller ./notebook-controller
## Install the MinIO component
helm install minio-prophecis --namespace prophecis ./MinioDeployment
## Install the Prophecis component
helm install prophecis ./Prophecis
# List and delete releases
helm list --all
helm del prophecis --namespace default
helm del notebook-controller --namespace default
helm del minio-prophecis --namespace prophecis
Can it be used in production?
I replaced the executable in cc-apiserver-v0.3.0 with an mlss-controlcenter-go binary compiled from the GitHub source, but at runtime the page reports the following two errors:
path /cc/v1/groups/group/storage was not found
Error when checking namespace from cc, {"code":404,"message":"path /cc/v1/groups/users/roles/1/namespaces was not found"}
Is the mlss-controlcenter-go in the published cc-apiserver-v0.3.0 image consistent with the source on GitHub?
The missing images in Prophecis/install/value.yaml:
wedatasphere/prophecis:mllabis-v0.3.2 --> wedatasphere/prophecis:mllabis-v0.3.0
wedatasphere/prophecis:metrics-v0.3.2 --> wedatasphere/prophecis:metrics-v0.3.0
wedatasphere/prophecis:mf-server-v0.3.2 --> wedatasphere/prophecis:mf-server-v0.3.0
go build -v -o bin/main
webank/DI/lcm/service/lcm
webank/DI/lcm/service/lcm
service/lcm/splitTraining.go:39:42: not enough arguments in call to learner.CreateServiceSpec
have (string, string)
want (string, string, kubernetes.Interface)
service/lcm/splitTraining.go:78:17: t.helper undefined (type splitTraining has no field or method helper)
service/lcm/splitTraining.go:117:50: too many arguments in call to newConstructLearnerContainer
service/lcm/split_training.go:36:6: method redeclared: splitTraining.jobSpecForLearner
method(splitTraining) func(string) ("k8s.io/api/batch/v1".Job, error)
method(splitTraining) func("k8s.io/api/core/v1".Service) (*"k8s.io/api/batch/v1".Job, error)
service/lcm/split_training.go:36:24: splitTraining.jobSpecForLearner redeclared in this block
previous declaration at service/lcm/splitTraining.go:70:6
service/lcm/split_training.go:85:24: splitTraining.Start redeclared in this block
previous declaration at service/lcm/splitTraining.go:37:6
service/lcm/split_training.go:122:33: cannot use serviceSpec (type *"k8s.io/api/core/v1".Service) as type string in argument to t.jobSpecForLearner
service/lcm/split_training.go:149:25: (*splitTraining).NewCreateFromBOM redeclared in this block
previous declaration at service/lcm/splitTraining.go:128:6
service/lcm/split_training.go:179:29: (*splitTraining).NewCreateFromBOM.func1 redeclared in this block
previous declaration at service/lcm/splitTraining.go:137:33
service/lcm/split_training.go:200:25: (*splitTraining).CreateFromBOMForTFJob redeclared in this block
previous declaration at service/lcm/splitTraining.go:179:6
service/lcm/split_training.go:179:29: too many errors
The Prophecis page opens, but admin cannot log in.
The page returns error 502.
Oct 15 18:08:30 ai-master kubelet: W1015 18:08:30.651802 10459 kubelet_pods.go:863] Unable to retrieve pull secret prophecis/hubsecret
for prophecis/ffdl-trainingdata-8946c74fd-5prbp due to secret "hubsecret" not found. The image pull may not succeed.
Oct 15 18:08:51 ai-master kubelet: W1015 18:08:51.652099 10459 kubelet_pods.go:863] Unable to retrieve pull secret prophecis/hubsecret
for prophecis/di-storage-796c9596c-l6ws9 due to secret "hubsecret" not found. The image pull may not succeed.
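The kubelet warnings above say that the pods reference an image pull secret named hubsecret in the prophecis namespace and that this secret does not exist. The following is a minimal sketch for listing such missing secrets from a kubelet log. The function name `missing_pull_secrets` is hypothetical, and the registry values in the commented `kubectl` command are placeholders. The warning alone does not prove that a pull failed, so creating the secret only helps when the registry actually requires credentials.

```shell
#!/bin/sh
# Read kubelet log lines on stdin and print each distinct
# <namespace>/<secret> that kubelet could not retrieve.
missing_pull_secrets() {
    sed -n 's/.*Unable to retrieve pull secret \([^ ]*\).*/\1/p' | sort -u
}

# Example usage (commented out; log location varies by distribution):
#   journalctl -u kubelet | missing_pull_secrets
#
# If the registry needs credentials, the secret can be created with:
#   kubectl create secret docker-registry hubsecret \
#       --namespace prophecis \
#       --docker-server=<registry> \
#       --docker-username=<user> \
#       --docker-password=<password>
```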
Two images cannot be pulled: wedataspere/prophecis:minio-2020-06-14 and wedatasphere/prophecis:fluent-bit-1.2.1. Do these images currently exist in the registry?
You can build your own notebook runtime image:
1. Dockerfile:
FROM jupyter/scipy-notebook:notebook-6.4.10
USER root
RUN echo "Asia/Shanghai" > /etc/timezone
ENTRYPOINT ["sh","-c", "jupyter lab --notebook-dir=/home/jovyan --ip=0.0.0.0 --no-browser --allow-root --port=8888 --NotebookApp.token='' --NotebookApp.password='' --NotebookApp.allow_origin='*' --NotebookApp.base_url=${NB_PREFIX}"]
2. Build:
docker build -t "xxxx/jupyter/scipy-notebook:notebook-6.4.10" .
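We may need to deploy on arm64. Because Python is an interpreted language and its native libraries are architecture-dependent, we would like to ask whether arm64 can be supported.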
vim /etc/kubernetes/manifests/kube-apiserver.yaml
#spec:
Your configuration uses port 41012, yet you also require Kubernetes to be configured with ports below 40000. Isn't that a conflict?
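Assuming the question is about a NodePort service and the kube-apiserver `--service-node-port-range` flag (default 30000-32767, set in the static pod manifest edited above), a quick check shows whether a given port fits a given range. This is a minimal sketch: the function name `port_in_range` is hypothetical, and the range 30000-40000 is only an illustration of "within 40000", not a value confirmed by the Prophecis documentation.

```shell
#!/bin/sh
# Succeeds (exit status 0) when port $1 lies inside the range $2, written as MIN-MAX.
port_in_range() {
    port=$1
    min=${2%-*}    # text before the dash
    max=${2#*-}    # text after the dash
    [ "$port" -ge "$min" ] && [ "$port" -le "$max" ]
}

# Check the port from the question against an illustrative range.
if port_in_range 41012 30000-40000; then
    echo "41012 fits"
else
    echo "41012 is outside 30000-40000"
fi
```

Under these assumptions the port falls outside the range, so either the range would have to be widened to include 41012 or the service port changed.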
I tested on Kubernetes v1.20.0 and v1.18.6, and both show errors. Details below.
Running helm install notebook-controller . in the folder Prophecis/helm-charts/k8s 1.18.6/notebook-controller
shows:
Error: template: MLSS/templates/notebook-controller-0.5.1.yaml:115:24: executing "MLSS/templates/notebook-controller-0.5.1.yaml" at <.Values.aide.controller.notebook.repository>: nil pointer evaluating interface {}.controller
Running helm install notebook-controller . in the folder Prophecis/helm-charts/notebook-controller
shows:
Error: unable to build kubernetes objects from release manifest: [unable to recognize "": no matches for kind "Deployment" in version "apps/v1beta1", unable to recognize "": no matches for kind "StatefulSet" in version "apps/v1beta2"
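The second error is what Kubernetes 1.16 and later return for manifests that still use apps/v1beta1 or apps/v1beta2, since those API versions are no longer served. The following is a minimal sketch for locating affected chart templates. The function name `find_removed_apps_apis` is hypothetical, and the match is line-based, so an apiVersion produced by a template expression would not be detected.

```shell
#!/bin/sh
# List files under directory $1 that declare an apps/v1beta1 or
# apps/v1beta2 apiVersion. Exit status is non-zero when none are found.
find_removed_apps_apis() {
    grep -rlE '^[[:space:]]*apiVersion:[[:space:]]*apps/v1beta[12][[:space:]]*$' "$1"
}

# Example usage (commented out; the path depends on your checkout):
#   find_removed_apps_apis Prophecis/helm-charts/notebook-controller
```

Changing the version string to apps/v1 is not always enough on its own: apps/v1 Deployments and StatefulSets require an explicit spec.selector that matches the pod template labels.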
[root@node Prophecis]# kubectl logs -f bdap-ui-deployment-595f6c44bf-jmkb5 -n prophecis
/docker-entrypoint.sh: /docker-entrypoint.d/ is not empty, will attempt to perform configuration
/docker-entrypoint.sh: Looking for shell scripts in /docker-entrypoint.d/
/docker-entrypoint.sh: Launching /docker-entrypoint.d/10-listen-on-ipv6-by-default.sh
10-listen-on-ipv6-by-default.sh: info: /etc/nginx/conf.d/default.conf is not a file or does not exist
/docker-entrypoint.sh: Launching /docker-entrypoint.d/20-envsubst-on-templates.sh
/docker-entrypoint.sh: Configuration complete; ready for start up
nginx: [emerg] host not found in resolver "kube-dns.kube-system.svc.cluster.local" in /etc/nginx/conf.d/ui.conf:46
[root@node Prophecis]#
storage-deployment.yaml
kind: Deployment
apiVersion: apps/v1
metadata:
  name: di-storage
  namespace: prophecis
  labels:
    app.kubernetes.io/managed-by: Helm
    environment: prophecis
    service: di-storage
  annotations:
    deployment.kubernetes.io/revision: '1'
    meta.helm.sh/release-name: prophecis
    meta.helm.sh/release-namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      environment: prophecis
      service: di-storage
  template:
    metadata:
      creationTimestamp: null
      labels:
        environment: prophecis
        service: di-storage
        version: storage-v0.3.2
    spec:
      volumes:
        - name: di-config
          configMap:
            name: di-config
            defaultMode: 420
        - name: timezone-volume
          hostPath:
            path: /usr/share/zoneinfo/Asia/Shanghai
            type: File
        - name: oss-storage
          hostPath:
            path: tmp
            type: Directory
      containers:
        - name: di-storage-rpc-server
          image: 'wedatasphere/prophecis:storage-v0.3.2'
          command:
            - /bin/sh
            - '-c'
          args:
            - DLAAS_PORT=8443 /main
          ports:
            - containerPort: 8443
              protocol: TCP
          env:
            - name: DLAAS_POD_NAMESPACE
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: metadata.namespace
            - name: DLAAS_ENV
              value: prophecis
            - name: DLAAS_LOGLEVEL
              value: DEBUG
            - name: DLAAS_PUSH_METRICS_ENABLED
              value: 'true'
            - name: LINKIS_ADDRESS
              value: '127.0.0.1:8088'
            - name: LINKIS_TOKEN_CODE
              value: BML-AUTH
            - name: MONGO_ADDRESS
              value: mongo.prophecis.svc.cluster.local
            - name: MONGO_USERNAME
              value: mlssopr
            - name: MONGO_PASSWORD
              value: mlssopr
            - name: MONGO_DATABASE
              value: mlsstest
            - name: MONGO_Authentication_Database
              value: admin
            - name: DLAAS_OBJECTSTORE_TYPE
              valueFrom:
                secretKeyRef:
                  name: storage-secrets
                  key: DLAAS_OBJECTSTORE_TYPE
            - name: DLAAS_OBJECTSTORE_AUTH_URL
              valueFrom:
                secretKeyRef:
                  name: storage-secrets
                  key: DLAAS_OBJECTSTORE_AUTH_URL
            - name: DLAAS_OBJECTSTORE_USER_NAME
              valueFrom:
                secretKeyRef:
                  name: storage-secrets
                  key: DLAAS_OBJECTSTORE_USER_NAME
            - name: DLAAS_OBJECTSTORE_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: storage-secrets
                  key: DLAAS_OBJECTSTORE_PASSWORD
            - name: DLAAS_ELASTICSEARCH_SCHEME
              value: http
            - name: DLAAS_ELASTICSEARCH_ADDRESS
              value: 'http://elasticsearch.prophecis.svc.cluster.local:9200'
            - name: DLAAS_ELASTICSEARCH_ADDRESS
              valueFrom:
                secretKeyRef:
                  name: trainingdata-secrets
                  key: DLAAS_ELASTICSEARCH_ADDRESS
            - name: DLAAS_ELASTICSEARCH_USERNAME
              valueFrom:
                secretKeyRef:
                  name: trainingdata-secrets
                  key: DLAAS_ELASTICSEARCH_USERNAME
            - name: DLAAS_ELASTICSEARCH_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: trainingdata-secrets
                  key: DLAAS_ELASTICSEARCH_PASSWORD
          resources:
            limits:
              cpu: 500m
              memory: 1Gi
          volumeMounts:
            - name: di-config
              mountPath: /etc/mlss/
            - name: timezone-volume
              mountPath: /etc/localtime
            - name: oss-storage
              mountPath: /data/oss-storage
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          imagePullPolicy: Always
      restartPolicy: Always
      terminationGracePeriodSeconds: 30
      dnsPolicy: ClusterFirst
      nodeSelector:
        mlss-node-role: platform
      securityContext: {}
      imagePullSecrets:
        - name: hubsecret
      schedulerName: default-scheduler
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 25%
      maxSurge: 25%
  revisionHistoryLimit: 10
  progressDeadlineSeconds: 600
After logging in, this API returns an error.
API: /mf/v1/services
Error message: Missing or malformed MLSS-UserID header.
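The image I built myself always returns 403 Forbidden after deployment. Could you share the Dockerfile used to build the UI image? Thanks.
As the title says: can this connect directly to Hadoop-ecosystem data stores such as Hive, Impala, and MySQL?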
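Deployed Prophecis version: v0.3.0. Kubernetes version: 1.18.6. All pods are in the Running state.
1. The deployment document says Prophecis uses LDAP for unified authentication, but it does not require installing an LDAP directory service. Are there specific users that must be created in LDAP?
2. The deployment document asks for a super administrator username and password that match the t_superadmin table. That table has a name field; doesn't it need a field to store the password? Do users created in LDAP have to correspond to the super administrator in t_superadmin?
3. An error appears at login. Is the cause that user authentication against the LDAP directory server failed?
4. After clearing the browser cache and reopening the login page, I can log in, but clicking through many pages produces a "network service exception" error.
Is this related to Auth_type: LDAP?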
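The mllabis module contains only the notebook-server source, and it does not match the code used in the published image (wedatasphere/prophecis:mllabis-v0.1.1). In addition, I could not find the source for notebook-controller.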
Will there be a 0.3.X version release?
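None of the images in DevelopmentGuide.md can be opened.
Documentation for an open-source product should be clear and readable, with detailed explanations of the important procedures. It should not read like the cryptic notes-to-self that many programmers post on CSDN. Overall the documentation is poor, and this will significantly hurt the adoption of Prophecis.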
Client: Docker Engine - Community
Version: 19.03.9
API version: 1.40
Go version: go1.13.10
Git commit: 9d988398e7
Built: Fri May 15 00:25:27 2020
OS/Arch: linux/amd64
Experimental: false
Server: Docker Engine - Community
Engine:
Version: 19.03.9
API version: 1.40 (minimum version 1.12)
Go version: go1.13.10
Git commit: 9d988398e7
Built: Fri May 15 00:24:05 2020
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.4.13
GitCommit: 9cc61520f4cd876b86e77edfeb88fbcd536d1f9d
runc:
Version: 1.0.3
GitCommit: v1.0.3-0-gf46b6ba
docker-init:
Version: 0.18.0
GitCommit: fec3683
-k8s version:
Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.4", GitCommit:"e6c093d87ea4cbb530a7b2ae91e54c0842d8308a", GitTreeState:"clean", BuildDate:"2022-02-16T12:38:05Z", GoVersion:"go1.17.7", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.20", GitCommit:"1f3e19b7beb1cc0110255668c4238ed63dadb7ad", GitTreeState:"clean", BuildDate:"2021-06-16T12:51:17Z", GoVersion:"go1.13.15", Compiler:"gc", Platform:"linux/amd64"}
After creating a Notebook, its status stays at waiting. How should I handle this, and where can I see the logs?