openkruise / rollouts
Enhanced Rollouts features for application automation.
License: Apache License 2.0
Documentation: https://openkruise.io/zh/rollouts/user-manuals/api-specifications/
Used in the example:
matches:
- headers:
  - type: Exact # or "RegularExpression"
    key: <matched-header-key>
    value: <matched-header-value, or regular expression>
报错:Error from server (BadRequest): error when creating "demo_rollout.yaml": Rollout in version "v1alpha1" cannot be handled as a Rollout: strict decoding error: unknown field "spec.strategy.canary.steps[0].matches[0].headers[0].key"
In the source code: rollouts/api/v1alpha1/rollout_types.go
type HttpRouteMatch struct {
    Headers []gatewayv1alpha2.HTTPHeaderMatch `json:"headers,omitempty"`
}

type HTTPHeaderMatch struct {
    Type  *HeaderMatchType `json:"type,omitempty"`
    Name  HTTPHeaderName   `json:"name"`
    Value string           `json:"value"`
}
Should `key` be changed to `name` now?
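Judging from the struct above, yes — the field in the v1alpha1 types is `name`, not `key`. The same example with only that field renamed:

```yaml
matches:
- headers:
  - type: Exact # or "RegularExpression"
    name: <matched-header-name>
    value: <matched-header-value, or regular expression>
```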
last paused
rollback
reason:
https://github.com/openkruise/rollouts/blob/v0.3.0/pkg/util/controller_finder.go#L372
The ReplicaSet with zero replicas is filtered out here.
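The linked filter can be pictured with a minimal Go sketch (illustrative only — the real code in pkg/util/controller_finder.go operates on appsv1.ReplicaSet objects from the API server, not on this stand-in type):

```go
package main

import "fmt"

// replicaSet is a minimal stand-in for appsv1.ReplicaSet,
// used here only to illustrate the filtering behavior.
type replicaSet struct {
	Name     string
	Replicas int32
}

// filterActive drops ReplicaSets scaled to zero, mirroring the
// filter referenced at pkg/util/controller_finder.go; a rolled-back
// revision whose RS was scaled to 0 would therefore not be found.
func filterActive(rss []replicaSet) []replicaSet {
	var active []replicaSet
	for _, rs := range rss {
		if rs.Replicas > 0 {
			active = append(active, rs)
		}
	}
	return active
}

func main() {
	rss := []replicaSet{{"stable", 5}, {"old", 0}, {"canary", 2}}
	fmt.Println(filterActive(rss)) // [{stable 5} {canary 2}]
}
```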
What would you like to be added:
Kruise Rollout Features:
CRD is defined as follows:
apiVersion: rollouts.kruise.io/v1alpha1
kind: Rollout
metadata:
  name: rollouts-demo
spec:
  strategy:
    canary:
      steps:
      ## the first 20% gray replicas, and 20% of the traffic
      - weight: 20
        ## Manual confirmation of continuation
        pause: {}
      trafficRouting:
        service: echoserver # required
        ## nginx ingress
        type: nginx
        nginx:
          ingress: echoserver # required
  objectRef:
    workloadRef:
      apiVersion: apps/v1
      kind: Deployment
      name: echoserver
What would you like to be added:
Rollout should support traffic graying for Istio/Envoy, as follows:
apiVersion: rollouts.kruise.io/v1alpha1
kind: Rollout
metadata:
  name: rollouts-demo
spec:
  strategy:
    canary:
      trafficRoutings:
      - service: echoserver # required
        ## istio, nginx, alb
        type: istio
        CustomTraffic:
          apiVersion: xxxx
          kind: istio
          name: xxxx
Sometimes, due to business requirements, we want rolling updates to run within a specific time window and avoid peak hours, rather than relying solely on `.pause` to gate them. Using only `.pause` is a significant burden for operators: every change would have to be timed and published at a calculated moment, instead of being publishable at any time.
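One possible shape for such a feature — note the `rolloutWindows` field below is purely hypothetical and does not exist in the current API; it only illustrates the request:

```yaml
apiVersion: rollouts.kruise.io/v1alpha1
kind: Rollout
spec:
  strategy:
    canary:
      # hypothetical: only progress steps inside these time windows
      rolloutWindows:
      - start: "02:00"
        end: "06:00"
        timeZone: "Asia/Shanghai"
      steps:
      - weight: 20
        pause: {}
```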
httproute
apiVersion: gateway.networking.k8s.io/v1beta1
kind: HTTPRoute
metadata:
creationTimestamp: "2023-04-27T07:30:47Z"
generation: 57
labels:
app_id: d3251dc4bd6748a4ae9ec4e1da73c7a8
managedFields:
- apiVersion: gateway.networking.k8s.io/v1beta1
fieldsType: FieldsV1
fieldsV1:
f:status:
.: {}
f:parents: {}
manager: pilot-discovery
operation: Update
subresource: status
time: "2023-04-27T07:30:47Z"
- apiVersion: gateway.networking.k8s.io/v1beta1
fieldsType: FieldsV1
fieldsV1:
f:metadata:
f:labels:
.: {}
f:app_id: {}
f:spec:
.: {}
f:hostnames: {}
f:parentRefs: {}
manager: rainbond-api
operation: Update
time: "2023-04-27T07:30:47Z"
name: default-038a99
namespace: zhangqh
resourceVersion: "7457252"
uid: cd174b49-a50c-4151-a7aa-a7cede3d0c87
spec:
  hostnames:
  - rainbond.example.com
  parentRefs:
  - group: gateway.networking.k8s.io
    kind: Gateway
    name: bookinfo-gateway
    namespace: zhangqh
  rules:
  - backendRefs:
    - group: ""
      kind: Service
      name: nginx
      namespace: zhangqh
      port: 80
      weight: 1
    matches:
    - path:
        type: PathPrefix
        value: /
status:
  parents:
  - conditions:
    - lastTransitionTime: "2023-05-06T03:57:02Z"
      message: Route was valid
      observedGeneration: 57
      reason: Accepted
      status: "True"
      type: Accepted
    - lastTransitionTime: "2023-05-06T03:57:02Z"
      message: All references resolved
      observedGeneration: 57
      reason: ResolvedRefs
      status: "True"
      type: ResolvedRefs
    controllerName: istio.io/gateway-controller
    parentRef:
      group: gateway.networking.k8s.io
      kind: Gateway
      name: bookinfo-gateway
      namespace: zhangqh
rollout
apiVersion: rollouts.kruise.io/v1alpha1
kind: Rollout
metadata:
annotations:
rollouts.kruise.io/hash: 2f5d4z94dvv5z4zb9fz4vd4z95w7297fd4825wb7cdw8464fdfcbvz49d5vbd2w2
rollouts.kruise.io/rolling-style: partition
creationTimestamp: "2023-05-06T05:41:24Z"
finalizers:
- rollouts.kruise.io/rollout
generation: 1
labels:
app_id: d3251dc4bd6748a4ae9ec4e1da73c7a8
component_id: 2dd5cc625b09f75b9a01ef632045d183
hostname: rainbond.example.com
managedFields:
- apiVersion: rollouts.kruise.io/v1alpha1
fieldsType: FieldsV1
fieldsV1:
f:metadata:
f:annotations:
f:rollouts.kruise.io/hash: {}
f:finalizers:
.: {}
v:"rollouts.kruise.io/rollout": {}
manager: kruise-rollout
operation: Update
time: "2023-05-06T05:41:24Z"
- apiVersion: rollouts.kruise.io/v1alpha1
fieldsType: FieldsV1
fieldsV1:
f:metadata:
f:annotations:
.: {}
f:rollouts.kruise.io/rolling-style: {}
f:labels:
.: {}
f:app_id: {}
f:component_id: {}
f:hostname: {}
f:spec:
.: {}
f:objectRef:
.: {}
f:workloadRef:
.: {}
f:apiVersion: {}
f:kind: {}
f:name: {}
f:strategy:
.: {}
f:canary:
.: {}
f:steps: {}
f:trafficRoutings: {}
manager: rainbond-worker
operation: Update
time: "2023-05-06T05:41:24Z"
- apiVersion: rollouts.kruise.io/v1alpha1
fieldsType: FieldsV1
fieldsV1:
f:status:
.: {}
f:canaryStatus:
.: {}
f:canaryReadyReplicas: {}
f:canaryReplicas: {}
f:canaryRevision: {}
f:currentStepIndex: {}
f:currentStepState: {}
f:lastUpdateTime: {}
f:message: {}
f:observedWorkloadGeneration: {}
f:podTemplateHash: {}
f:rolloutHash: {}
f:stableRevision: {}
f:conditions: {}
f:message: {}
f:observedGeneration: {}
f:phase: {}
manager: kruise-rollout
operation: Update
subresource: status
time: "2023-05-06T05:41:27Z"
name: default-nginx
namespace: zhangqh
resourceVersion: "7489716"
uid: e60ce2c7-2753-4355-a9aa-ed796fda264c
spec:
  objectRef:
    workloadRef:
      apiVersion: apps/v1
      kind: Deployment
      name: default-nginx
  strategy:
    canary:
      steps:
      - matches:
        - headers:
          - name: qqq
            type: Exact
            value: qqq
        pause: {}
        weight: 50
      - matches:
        - headers:
          - name: qqq
            type: Exact
            value: qqq
        pause: {}
        weight: 70
      - matches:
        - headers:
          - name: qqq
            type: Exact
            value: qqq
        pause: {}
        weight: 100
      trafficRoutings:
      - gateway:
          httpRouteName: default-038a99
        service: nginx
status:
  canaryStatus:
    canaryReadyReplicas: 2
    canaryReplicas: 2
    canaryRevision: 66c4494759
    currentStepIndex: 1
    currentStepState: StepPaused
    lastUpdateTime: "2023-05-06T05:41:39Z"
    message: BatchRelease is at state Ready, rollout-id , step 1
    observedWorkloadGeneration: 265
    podTemplateHash: 66c4494759
    rolloutHash: 2f5d4z94dvv5z4zb9fz4vd4z95w7297fd4825wb7cdw8464fdfcbvz49d5vbd2w2
    stableRevision: 8487bf6555
  conditions:
  - lastTransitionTime: "2023-05-06T05:41:24Z"
    lastUpdateTime: "2023-05-06T05:41:24Z"
    message: Rollout is in Progressing
    reason: InRolling
    status: "True"
    type: Progressing
  message: Rollout is in step(1/3), and you need manually confirm to enter the next
    step
  observedGeneration: 1
  phase: Progressing
kubectl explain Rollout.spec.strategy.canary.trafficRoutings
KIND: Rollout
VERSION: rollouts.kruise.io/v1alpha1
RESOURCE: trafficRoutings <[]Object>
DESCRIPTION:
TrafficRoutings hosts all the supported service meshes supported to enable
more fine-grained traffic routing todo current only support one
TrafficRouting
For applications already running on OpenKruise workload types (Advanced StatefulSet, CloneSet), we want to implement canary releases to strengthen release-quality guarantees. Since the OpenKruise community already has rollouts, we hope it can support OpenKruise workload types so that applications can onboard easily.
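A sketch of what such a Rollout could look like for a CloneSet, reusing the workloadRef shape shown elsewhere on this page (assuming the standard apps.kruise.io/v1alpha1 group/version for CloneSet):

```yaml
apiVersion: rollouts.kruise.io/v1alpha1
kind: Rollout
metadata:
  name: cloneset-demo
spec:
  objectRef:
    workloadRef:
      apiVersion: apps.kruise.io/v1alpha1
      kind: CloneSet
      name: echoserver
  strategy:
    canary:
      steps:
      - weight: 20
        pause: {}
```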
In some complex scenarios, rolling back an application may involve not only the workload but also other resources, such as ConfigMaps and Secrets. If just the workload is rolled back directly, users may face the risk that the application becomes unavailable. So we allow users to roll back in a batch-by-batch way to reduce such risks.
What would you like to be added:
With the current batch-release implementation for Deployment, in the extreme case (shown below) resource usage can temporarily double; a follow-up optimization will address this:
apiVersion: rollouts.kruise.io/v1alpha1
kind: Rollout
metadata:
  name: rollouts-demo
spec:
  strategy:
    canary:
      steps:
      ## the first 20% gray replicas, and 20% of the traffic
      - weight: 20
        ## Manual confirmation of continuation
        pause: {}
      - weight: 40
        pause: {duration: 60}
      - weight: 60
        pause: {duration: 60}
      - weight: 80
        pause: {duration: 60}
      - weight: 100
        pause: {duration: 60}
  objectRef:
    workloadRef:
      apiVersion: apps/v1
      kind: Deployment
      name: echoserver
So currently for Deployment Rollout, it is recommended to configure it in the following way:
apiVersion: rollouts.kruise.io/v1alpha1
kind: Rollout
metadata:
  name: rollouts-demo
spec:
  strategy:
    canary:
      steps:
      ## the first 20% gray replicas, and 20% of the traffic
      - weight: 20
        ## Manual confirmation of continuation
        pause: {}
      trafficRouting:
        service: echoserver # required
        ## nginx ingress
        type: nginx
        nginx:
          ingress: echoserver # required
  objectRef:
    workloadRef:
      apiVersion: apps/v1
      kind: Deployment
      name: echoserver
So far, users cannot know the start time and end time of each canary batch/step.
My rollout has 3 steps; the 3rd one raises this error:
I0107 15:40:53.810662 1 rollout_controller.go:107] Begin to reconcile Rollout ap1/zjn-test-test-zjn-01
I0107 15:40:53.810976 1 rollout_progressing.go:37] reconcile rollout(ap1/zjn-test-test-zjn-01) progressing action...
I0107 15:40:53.811124 1 rollout_progressing.go:78] rollout(ap1/zjn-test-test-zjn-01) is Progressing, and in reason(InRolling)
I0107 15:40:53.811151 1 rollout_canary.go:84] rollout(ap1/zjn-test-test-zjn-01) run canary strategy, and state(StepTrafficRouting)
W0107 15:40:53.811156 1 manager.go:85] rollout(ap1/zjn-test-test-zjn-01) stableRevision or podTemplateHash can not be empty, and wait a moment
What would you like to be added:
Kruise Rollout is maintained and improved by both the OpenKruise and KubeVela communities. So we need a solution for smoothly migrating KubeVela rollouts to Kruise Rollout.
When the flow reaches "canary verified successfully, finish the release", it cannot proceed further. The detailed status is as follows:
Test YAML:
https://kubevela.io/docs/tutorials/k8s-object-rollout
Related logs:
kubectl describe application canary-demo
Name: canary-demo
Namespace: default
Labels: <none>
Annotations: app.oam.dev/publishVersion: v2
oam.dev/kubevela-version: v1.7.4
API Version: core.oam.dev/v1beta1
Kind: Application
Metadata:
Creation Timestamp: 2023-04-27T04:00:28Z
Finalizers:
app.oam.dev/resource-tracker-finalizer
Generation: 2
Managed Fields:
API Version: core.oam.dev/v1beta1
Fields Type: FieldsV1
fieldsV1:
f:metadata:
f:annotations:
f:oam.dev/kubevela-version:
f:finalizers:
.:
v:"app.oam.dev/resource-tracker-finalizer":
Manager: kubevela
Operation: Update
Time: 2023-04-27T04:00:36Z
API Version: core.oam.dev/v1beta1
Fields Type: FieldsV1
fieldsV1:
f:metadata:
f:annotations:
.:
f:app.oam.dev/publishVersion:
f:kubectl.kubernetes.io/last-applied-configuration:
f:spec:
.:
f:components:
Manager: kubectl-client-side-apply
Operation: Update
Time: 2023-04-27T04:00:56Z
API Version: core.oam.dev/v1beta1
Fields Type: FieldsV1
fieldsV1:
f:status:
.:
f:appliedResources:
f:conditions:
f:latestRevision:
.:
f:name:
f:revision:
f:revisionHash:
f:observedGeneration:
f:services:
f:status:
f:workflow:
.:
f:appRevision:
f:contextBackend:
f:finished:
f:mode:
f:startTime:
f:status:
f:steps:
f:suspend:
f:terminated:
Manager: kubevela
Operation: Update
Subresource: status
Time: 2023-04-27T04:01:56Z
Resource Version: 3019240
UID: 5161a2d1-b736-475b-a995-84c80287dfe3
Spec:
Components:
Name: canary-demo
Properties:
Objects:
API Version: apps/v1
Kind: Deployment
Metadata:
Name: canary-demo
Spec:
Replicas: 5
Selector:
Match Labels:
App: demo
Template:
Metadata:
Labels:
App: demo
Spec:
Containers:
Image: boot.powerk8s.cn/test-project/rust_web_test:2.0.0
Name: demo
Ports:
Container Port: 8080
API Version: v1
Kind: Service
Metadata:
Labels:
App: demo
Name: canary-demo
Namespace: default
Spec:
Ports:
Name: http
Port: 8080
Protocol: TCP
Target Port: 8080
Selector:
App: demo
API Version: networking.k8s.io/v1
Kind: Ingress
Metadata:
Labels:
App: demo
Name: canary-demo
Namespace: default
Spec:
Ingress Class Name: nginx
Rules:
Host: canary-demo.com
Http:
Paths:
Backend:
Service:
Name: canary-demo
Port:
Number: 8080
Path: /version
Path Type: ImplementationSpecific
Traits:
Properties:
Canary:
Steps:
Weight: 20
Weight: 90
Traffic Routings:
Type: ingress
Type: kruise-rollout
Type: k8s-objects
Status:
Applied Resources:
API Version: apps/v1
Creator: workflow
Kind: Deployment
Name: canary-demo
Namespace: default
API Version: v1
Creator: workflow
Kind: Service
Name: canary-demo
Namespace: default
API Version: networking.k8s.io/v1
Creator: workflow
Kind: Ingress
Name: canary-demo
Namespace: default
API Version: rollouts.kruise.io/v1alpha1
Creator: workflow
Kind: Rollout
Name: canary-demo
Namespace: default
Conditions:
Last Transition Time: 2023-04-27T04:00:27Z
Reason: Available
Status: True
Type: Parsed
Last Transition Time: 2023-04-27T04:00:27Z
Reason: Available
Status: True
Type: Revision
Last Transition Time: 2023-04-27T04:00:28Z
Reason: Available
Status: True
Type: Policy
Last Transition Time: 2023-04-27T04:00:28Z
Reason: Available
Status: True
Type: Render
Latest Revision:
Name: canary-demo-v2
Revision: 2
Revision Hash: a08d1df877cb296f
Observed Generation: 2
Services:
Healthy: true
Message: Rollout is in step(2/2), and upgrade workload to new version
Name: canary-demo
Namespace: default
Traits:
Healthy: false
Message: Rollout is in step(2/2), and upgrade workload to new version
Type: kruise-rollout
Workload Definition:
API Version:
Kind:
Status: runningWorkflow
Workflow:
App Revision: v2
Context Backend:
API Version: v1
Kind: ConfigMap
Name: workflow-canary-demo-context
Namespace: default
UID: 0d762543-4114-4b6f-87cf-c2aebc57316d
Finished: false
Mode: DAG-DAG
Start Time: 2023-04-27T04:00:56Z
Status: executing
Steps:
First Execute Time: 2023-04-27T04:00:56Z
Id: nz6bq8ypkd
Last Execute Time: 2023-04-27T04:01:48Z
Message: wait healthy
Name: canary-demo
Phase: running
Reason: Wait
Type: apply-component
Suspend: false
Terminated: false
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Applied 2m2s Application Workflow finished
Normal Deployed 2m2s (x2 over 2m2s) Application Deployed successfully
Normal PolicyGenerated 92s (x5 over 2m2s) Application Policy generated successfully
Normal Rendered 92s (x5 over 2m2s) Application Rendered successfully
Normal Parsed 91s (x6 over 2m3s) Application Parsed successfully
Normal Revisioned 91s (x6 over 2m3s) Application Revisioned successfully
strategy:
  canary:
    # canary published, e.g. 20%, 40%, 60% ...
    steps:
    # routing 5% of the traffic to the new version
    - weight: 5
      # Manual confirmation of the release of the remaining pods
      pause: {}
      # optional, the released replicas of the first step. If not set, the default is to use 'weight', i.e. 5% above.
      replicas: 20%
Does this configuration support `weight`-style configuration? Is there another way to do it?
So far, users use "workloadRef" to specify the target workload. We need a new feature that lets users specify a customized workload selector, such as a label selector.
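A hypothetical shape for this request — the `workloadSelector` field below does not exist today and is shown only to illustrate the idea:

```yaml
spec:
  objectRef:
    # hypothetical alternative to workloadRef
    workloadSelector:
      kind: Deployment
      matchLabels:
        app: echoserver
```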
Is there a plan to support DaemonSet? Is there a specific roadmap?
In rollouts/api/v1alpha1/rollout_types.go (line 89 in ecd0974), add `Paused` in struct RolloutPause, like:
type RolloutPause struct {
    // Duration is the amount of time to wait before moving to the next step.
    // +optional
    Duration *int32 `json:"duration,omitempty"`
    // Paused indicates the rollout should be paused (true) or resumed (false).
    Paused *bool `json:"paused,omitempty"`
}
Paused indicates that the deploy should be paused or resumed: set true to pause the deploy, change to false to resume it. Duration alone cannot control the deploy.
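With the proposed field, a step could be paused and resumed declaratively, e.g. (proposal only, not part of the current API):

```yaml
steps:
- weight: 20
  pause:
    paused: true  # proposed: set true to pause; flip to false to resume
```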
What would you like to be added:
The Rollout release process should support post-release checking and metrics analysis, as follows:
apiVersion: rollouts.kruise.io/v1alpha1
kind: Rollout
spec:
  objectRef:
    ...
  strategy:
    canary:
      steps:
      - weight: 5
        ...
        # metrics analysis
        analysis:
          templates:
          - templateName: success-rate
          startingStep: 2 # delay starting analysis run until setWeight: 40%
          args:
          - name: service-name
            value: guestbook-svc.default.svc.cluster.local
---
# metrics analysis
apiVersion: rollouts.kruise.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: success-rate
spec:
  args:
  - name: service-name
  metrics:
  - name: success-rate
    interval: 5m
    # NOTE: prometheus queries return results in the form of a vector.
    # So it is common to access the index 0 of the returned array to obtain the value
    successCondition: result[0] >= 0.95
    failureLimit: 3
    provider:
      prometheus:
        address: http://prometheus.example.com:9090
        query: |
          sum(irate(
            istio_requests_total{reporter="source",destination_service=~"{{args.service-name}}",response_code!~"5.*"}[5m]
          )) /
          sum(irate(
            istio_requests_total{reporter="source",destination_service=~"{{args.service-name}}"}[5m]
          ))
What would you like to be added:
Rollout itself focuses on gray releases within a single cluster; multi-cluster capabilities need to be built together with KubeVela, as follows:
apiVersion: core.oam.dev/v1beta1
kind: Application
metadata:
  name: podinfo
spec:
  components:
  - traits:
    - type: kruise-rollout
      properties:
        canary:
          steps:
          - weight: 20
            pause: {}
          trafficRouting:
            type: nginx
I0826 18:54:53.687162 11941 request.go:1181] Response Body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"admission webhook "vrollout.kb.io" denied the request: Spec.Strategy.Canary.TrafficRouting.Type: Invalid value: "alb": TrafficRouting only support 'nginx' type","code":422}
From the design doc of rollouts, I know that the rollouts resource is bound to a Deployment or CloneSet, which is a one-to-one mode. We should record information such as pod names, pod IPs, deploy strategy, etc. when the user performs a deploy action.
Is there a plan to support gray (canary) releases with AWS ALB Ingress?
Version: kruise-rollout v0.3.0
Kubernetes version: 1.25.4
ingress controller: aliyun-alb
rollout manifest:
apiVersion: rollouts.kruise.io/v1alpha1
kind: Rollout
metadata:
  name: rollout-with-traffic
  annotations:
    rollouts.kruise.io/rolling-style: partition
  namespace: foo
spec:
  objectRef:
    workloadRef:
      apiVersion: apps/v1
      kind: Deployment
      name: echoserver
  strategy:
    canary:
      steps:
      - replicas: 1
        weight: 5
        pause: {}
      - replicas: 30%
        pause: {}
        matches:
        - headers:
          - name: user-agent
            type: Exact
            value: pc
      - replicas: 60%
        pause: {}
        matches:
        - headers:
          - name: user-agent
            type: Exact
            value: mobile
      - replicas: 100%
        weight: 100
      trafficRoutings:
      - service: echoserver
        ingress:
          name: echoserver
          classType: aliyun-alb
During the rollout we take over ingress traffic management and need HTTP-header-based gray routing to the backend: the second step grays traffic with user-agent: pc, and the third step grays traffic with user-agent: mobile. In this scenario, aliyun-alb generates the following ALB ingress annotations:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    alb.ingress.kubernetes.io/canary: "true"
    alb.ingress.kubernetes.io/canary-by-header: user-agent
    alb.ingress.kubernetes.io/canary-by-header-value: pc
    alb.ingress.kubernetes.io/order: "1"
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"networking.k8s.io/v1","kind":"Ingress","metadata":{"annotations":{},"name":"echoserver","namespace":"foo"},"spec":{"ingressClassName":"alb","rules":[{"host":"echoserver.caocaokeji.cn","http":{"paths":[{"backend":{"service":{"name":"echoserver","port":{"number":80}}},"path":"/apis/echo","pathType":"Exact"}]}}]}}
  creationTimestamp: "2023-03-14T07:53:54Z"
  generation: 1
  name: echoserver-canary
  namespace: foo
...
When execution reaches the second step, this creates a brand-new rule on the ALB, placed at the very front of the ALB's rule list with order ID 1. At that point gray traffic works correctly; but if several applications that need HTTP-header gray routing are rolled out at the same time, aliyun-alb fails to create the rule. The reason: the ALB order ID must be unique among the declared Ingresses.
# kube-ali-stable get ingress -n foo
NAME CLASS HOSTS ADDRESS PORTS AGE
echoserver alb echoserver.example.com alb-g2jxqzz0nxknlsapxl.cn-hangzhou.alb.aliyuncs.com 80 25h
echoserver-canary alb echoserver.example.com alb-g2jxqzz0nxknlsapxl.cn-hangzhou.alb.aliyuncs.com 80 87m
echoserver-clone alb echoserver.example.com alb-g2jxqzz0nxknlsapxl.cn-hangzhou.alb.aliyuncs.com 80 102m
echoserver-clone-canary alb echoserver.example.com 80 117s
# kube-ali-stable get ingress -n foo -o yaml echoserver-clone-canary
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    alb.ingress.kubernetes.io/canary: "true"
    alb.ingress.kubernetes.io/canary-weight: "5"
    alb.ingress.kubernetes.io/order: "1"
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"networking.k8s.io/v1","kind":"Ingress","metadata":{"annotations":{},"name":"echoserver-clone","namespace":"foo"},"spec":{"ingressClassName":"alb","rules":[{"host":"echoserver-clone.caocaokeji.cn","http":{"paths":[{"backend":{"service":{"name":"echoserver-clone","port":{"number":80}}},"path":"/apis/echo","pathType":"Exact"}]}}]}}
  creationTimestamp: "2023-03-14T09:19:46Z"
  generation: 1
  name: echoserver-clone-canary
  namespace: foo
...
If the line annotations["alb.ingress.kubernetes.io/order"] = "1" is deleted from the /lua_configuration/trafficrouting_ingress/aliyun-alb.lua file inside kruise-rollout-controller-manager, the canary ingress is created correctly and a brand-new rule ID is generated on aliyun-alb, but when the client runs
curl --silent -H "user-agent: pc " -H "Host: echoserver.example.com" alb-g2jxqzz0nxknlsapxl.cn-hangzhou.alb.aliyuncs.com/apis/echo
the request is not routed to the canary. The cause found so far: the rule ID generated on aliyun-alb for this canary ingress comes after the existing ingress's rule ID, and since aliyun-alb matches rules top-down, the header is never matched against the canary rule and the request is forwarded elsewhere.
**Request: when aliyun-alb is used as the ingress controller, kruise-rollout should support canary-by-header gray releases when trafficRoutings is enabled on the Rollout object.**
Rollout should support A/B-testing releases:
apiVersion: rollouts.kruise.io/v1alpha1
kind: Rollout
metadata:
  name: rollouts-demo
spec:
  objectRef:
    ...
  strategy:
    canary:
      steps:
      - matches:
        - headers:
          - name: user
            value: xiaoming
        pause: {}
        replicas: 20%
      trafficRoutings:
      # echoserver service name
      - service: echoserver
        # echoserver ingress name, currently only nginx ingress
        ingress:
          name: echoserver
Environment: macOS, minikube
The YAML file is https://github.com/openkruise/rollouts/blob/master/docs/tutorials/basic_usage.md#1-deploy-business-application-contains-deployment-service-and-ingress
Problem: when I run kubectl apply -f echoserver.yaml
and then kubectl get pods,
the pods are in an error state, such as:
NAME READY STATUS RESTARTS AGE
echoserver-56b6c7cc94-5qslt 0/1 Error 2 (17s ago) 22s
echoserver-56b6c7cc94-dwqqz 0/1 Error 2 (18s ago) 22s
echoserver-56b6c7cc94-pt6th 0/1 Error 2 (17s ago) 22s
echoserver-56b6c7cc94-tkw26 0/1 Error 2 (17s ago) 22s
echoserver-56b6c7cc94-xggxs 0/1 Error 2 (17s ago) 22s
➜ kubenetes kubectl describe pod echoserver-56b6c7cc94-5qslt
Name: echoserver-56b6c7cc94-5qslt
Namespace: default
Priority: 0
Node: minikube/192.168.58.2
Start Time: Sun, 08 May 2022 18:57:03 +0800
Labels: app=echoserver
pod-template-hash=56b6c7cc94
Annotations: <none>
Status: Running
IP: 172.17.0.22
IPs:
IP: 172.17.0.22
Controlled By: ReplicaSet/echoserver-56b6c7cc94
Containers:
echoserver:
Container ID: docker://5773ea0bc0ee021c19ae812c9c30d2f6c8cd719921dfb6230863c7ec97b09ec2
Image: cilium/echoserver:1.10.2
Image ID: docker-pullable://cilium/echoserver@sha256:f8c125b8ad412c65be38c721463737e4e80721208c744cbacc2a8977b61441c6
Port: 8080/TCP
Host Port: 0/TCP
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Sun, 08 May 2022 20:04:22 +0800
Finished: Sun, 08 May 2022 20:04:23 +0800
Ready: False
Restart Count: 18
Environment:
PORT: 8080
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-nzdg9 (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
kube-api-access-nzdg9:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning BackOff 57s (x328 over 70m) kubelet Back-off restarting failed container
➜ kubenetes kubectl logs echoserver-56b6c7cc94-5qslt
Generating self-signed cert
Generating a 2048 bit RSA private key
...........+++
................................................................+++
writing new private key to '/certs/privateKey.key'
-----
Starting nginx
2022/05/08 12:04:23 [error] 24#24: failed to initialize Lua VM in /etc/nginx/nginx.conf:88
nginx: [error] failed to initialize Lua VM in /etc/nginx/nginx.conf:88
I don't know where to look at /etc/nginx/nginx.conf:88, or why this leads to the error; please tell me how to solve it.
URL: https://github.com/openkruise/rollouts/blob/master/docs/images/rollout_canary.png
Canary releases should be allocated less traffic, but in the image 95% of the traffic is allocated to the new-version service. Don't be offended if my opinion is incorrect.
@zmberg
Could CanaryStatus add the total number of replicas? Especially when the workload is scaled automatically by HPA, I could then get the replica count directly from the Rollout without checking the specific workload.
Currently the data type of spec.replicas is IntOrString, but both forms are validated with the same rule, so with an Int value spec.replicas cannot exceed the upper limit of 100. I understand the original design intent may have been that replica counts above 100 should be expressed as percentages, but in real usage there is inevitably a need to specify an exact number. So I suggest validating separately based on the data type of spec.replicas: drop the 100 upper limit for the Int type and leave that decision to the user.
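The proposal can be pictured with a small Go sketch (a hypothetical helper, not the actual kruise-rollout webhook code): percentage strings keep the 100% cap, while plain integers are only required to be non-negative.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// validateReplicas sketches the proposed type-dependent check:
// percentage strings stay capped at 100%, plain integers are
// only required to be non-negative (no 100 upper bound).
func validateReplicas(v string) error {
	if strings.HasSuffix(v, "%") {
		p, err := strconv.Atoi(strings.TrimSuffix(v, "%"))
		if err != nil || p < 0 || p > 100 {
			return fmt.Errorf("percentage must be in [0%%, 100%%]: %s", v)
		}
		return nil
	}
	n, err := strconv.Atoi(v)
	if err != nil || n < 0 {
		return fmt.Errorf("replicas must be a non-negative integer: %s", v)
	}
	return nil
}

func main() {
	fmt.Println(validateReplicas("30%"))  // <nil>
	fmt.Println(validateReplicas("150"))  // <nil> under the proposal
	fmt.Println(validateReplicas("150%")) // error: over 100%
}
```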
Will Rollout support the OnDelete updateStrategy for StatefulSet / Advanced StatefulSet?
So far, in the spec of Rollout, we can only specify the steps of the rolling process but cannot set which step we want to reach. When the Rollout process needs to be managed by an upper-layer system, like App Delivery Tools / GitOps Tools / a business platform, we need to set all the steps before the desired step to use zero duration to allow rolling straight through.
For example, if we have a rolling process that contains 5 steps
steps:
- weight: 5
- weight: 10
- weight: 20
- weight: 50
- weight: 100
If we want to let the rolling to reach the 4th step and stop there, we need to write
steps:
- weight: 5
  pause:
    duration: 0
- weight: 10
  pause:
    duration: 0
- weight: 20
  pause:
    duration: 0
- weight: 50
  pause: {}
- weight: 100
This is not very convenient, especially when the user needs to interact with the whole spec of the Rollout object directly. Supporting a more convenient way, like directly specifying the desired step id or index, would make the interaction easier:
stepIndex: 3
steps:
- weight: 5
- weight: 10
- weight: 20
- weight: 50
- weight: 100
BTW, the currentStepIndex in the status field could also be an alternative place for changing the step index. As of kubectl v1.26, we can use kubectl edit --subresource=status to edit the status of Kubernetes resources.
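Assuming kubectl >= 1.26 against such a cluster, the status-subresource route could look like this (a sketch only; the field path follows the canaryStatus shown earlier, and whether the controller honors a hand-edited index is not verified here):

```shell
# edit the status subresource interactively
kubectl edit rollout rollouts-demo --subresource=status

# or patch currentStepIndex directly
kubectl patch rollout rollouts-demo --subresource=status \
  --type=merge -p '{"status":{"canaryStatus":{"currentStepIndex":4}}}'
```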
Install Kubernetes Cluster, requires Kubernetes version >= 1.19.
Does using openkruise rollouts strictly require Kubernetes >= 1.19? Is 1.18 completely unusable? Our cluster is currently 1.18 — is there any chance it can work?
We would also like a development document covering the overall architecture, dependencies, and how to run the project locally.
In pull/61, I changed three files: api/v1alpha1/rollouthistory_types.go, config/crd/bases/rollouts.kruise.io_rollouthistories.yaml, and api/v1alpha1/zz_generated.deepcopy.go (adding some deep-copy methods generated by kubebuilder).

Situation: in [CI/unit-tests]
https://github.com/openkruise/rollouts/actions/runs/3288369126/jobs/5418624092
I found the warning
Warning: The save-state command is deprecated and will be disabled soon. Please upgrade to using Environment Files. For more information see: https://github.blog/changelog/2022-10-11-github-actions-deprecating-save-state-and-set-output-commands/
in the step Cache Go Dependencies. In Run Unit Tests, after the command git status, there is this information:
Changes not staged for commit: (use "git add <file>..." to update what will be committed) (use "git restore <file>..." to discard changes in working directory) modified: api/v1alpha1/zz_generated.deepcopy.go
In Check diff, the expression -z $(git status -s) evaluates to false, which makes [CI/unit-tests] fail.

Question: how can I pass this -z $(git status -s) check, or what should I do to get the changes to api/v1alpha1/zz_generated.deepcopy.go staged? Thanks!
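The usual remedy in kubebuilder-style projects (assuming this repo follows the standard Makefile targets, which is an assumption here) is to regenerate locally with the same tooling version the CI uses and commit the result, so the CI's git status stays clean:

```shell
# regenerate deepcopy methods and CRD manifests locally (assumed targets)
make generate manifests
# commit the regenerated files so CI's `git status -s` is empty
git add api/v1alpha1/zz_generated.deepcopy.go config/crd/bases/
git commit -m "regenerate deepcopy and CRD manifests"
```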
For example, the request with header stage: pre will be routed to the service canary version, otherwise the request will be routed to the service online version.
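In Rollout terms, that example maps onto a header-match step like the following (a sketch using the v1alpha1 matches fields shown earlier on this page):

```yaml
steps:
- matches:
  - headers:
    - type: Exact
      name: stage
      value: pre
  pause: {}
```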
Currently, when the rollout completes, the production environment first performs a rolling update while production traffic is on it, then all traffic is switched to the production environment, and finally the gray environments are deleted. This has a problem: if two applications A and B have a call dependency, rolling their pods concurrently can lead to traffic confusion: traffic from A's production environment may flow to B's gray environment.
Because the gray environment will become the new production environment, the gray workload name should be configurable by the user when the rollout starts.
rollouts/api/v1alpha1/rollout_types.go, line 188 in ecd0974:
ProgressingReasonCanceled = "Canceled"
— does this indicate the rollout was canceled by the user manually?

When using kruise-rollout in KubeVela, how can I manually continue a paused rollout? Must I use kubectl-kruise rollout approve rollout?
Is there a way to resume a paused rollout through the k8s Java client API?