Knative net-istio defines a KIngress controller for Istio.
To learn more about Knative, please visit our Knative docs repository.
If you are interested in contributing, see CONTRIBUTING.md and DEVELOPMENT.md.
A Knative ingress controller for Istio.
License: Apache License 2.0
According to https://istio.io/docs/reference/config/config-status/, Istio CRD Status is now available.
We should investigate and adopt this new feature.
Istio recently (2020-09-22 21:32:01 +0000 UTC) released "Istio 1.6.10". Let's test it.
Hi,
We are seeing unwanted retries when the service returns a 500 status code. This is a problem in a serverless environment, where the error should be propagated back immediately.
We need to add the code below at https://github.com/knative-sandbox/net-istio/blob/524afbe8aa70b5589360d0e96b92ef2cd8c6a317/pkg/reconciler/ingress/resources/virtual_service.go#L221:
Retries: &istiov1alpha3.HTTPRetry{
	Attempts:      0,   // disable Istio's default retries
	PerTryTimeout: nil,
},
This is a very bad situation to be in: the service is returning an error, but we retry it three times before returning the response.
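A rough sketch of the VirtualService route this change would produce, with retries disabled (resource and host names are hypothetical; the field names follow the Istio networking/v1alpha3 API):

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: hello-example-ingress        # hypothetical name
spec:
  hosts:
  - hello-example.default.example.com   # hypothetical host
  http:
  - retries:
      attempts: 0        # disables Istio's default retry attempts
    route:
    - destination:
        host: hello-example.default.svc.cluster.local
        port:
          number: 80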
There are Istio mesh-mode-specific tests in knative/serving, such as:
Although these tests should move to knative/net-istio instead of being hosted in knative/serving, knative/net-istio currently does not seem to have an option to run tests in mesh mode.
/area API
/area networking
Istio 1.3 supports "Automatic Protocol Selection", meaning we might not have to specify container port names to support http and http2.
This issue is to track the investigation of that feature.
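As a rough illustration (names are hypothetical), with automatic protocol selection a plain Kubernetes Service port would no longer need an http/http2 name prefix for Istio to route it correctly:

apiVersion: v1
kind: Service
metadata:
  name: user-container        # hypothetical
spec:
  selector:
    app: user-container
  ports:
  # Before automatic protocol selection, the port name needed a protocol
  # prefix (e.g. "http2-queue") for Istio to treat the traffic as HTTP/2.
  # With protocol sniffing, an unprefixed name should work as well.
  - name: queue
    port: 80
    targetPort: 8080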
Notes:
AFAICS, only the LoadBalancer-type ingress gateway Service is currently supported. When Istio is installed with an ingress gateway of type NodePort or ClusterIP, the gateway's status.loadBalancer field remains empty ("{}", k8s 1.16.4), so the Knative routes never reconcile and are stuck with the Ready status set to "Unknown" and the reason "Uninitialized". A static IP can only be set if the cloud controller manager supports it (Azure). Other setups may use virtual IP routing, in which case the ingress gateway Service would have externalIPs configured, or NodePort, in which case it would be nice if the ingress endpoint could be configured statically in config-istio so that the istio-webhook does not depend on a LoadBalancer-type ingress gateway Service.
IIRC older versions may have supported the NodePort type.
Wdyt? Best, Manuel
Istio recently (2020-09-18 20:06:39 +0000 UTC) released "Istio 1.7.2". Let's test it.
0.15
I am able to create a unidirectional gRPC streaming server as a ksvc successfully, but the stream gets terminated by the timeout set on the Istio VirtualService. This timeout defaults to the max revision timeout value.
Create a unidirectional gRPC streaming server (where the client receives the stream) and deploy it as a ksvc.
Set the ksvc timeout to less than the max revision timeout and have the gRPC server send the stream continuously; you will notice that after the max revision timeout the client receives a RST_STREAM error.
Since the queue-proxy times out after the time specified in the ksvc, is it necessary for the Istio VirtualService to have a timeout set? Is it OK to remove it? @nak3 @tcnghia @ZhiminXiang
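For context, the field in question is the per-route timeout on the generated VirtualService; a trimmed sketch (host and destination are hypothetical) of what removing it would affect:

http:
- timeout: 600s      # currently set to the max revision timeout; dropping this
                     # would leave timeout enforcement to the queue-proxy
  route:
  - destination:
      host: grpc-stream.default.svc.cluster.local   # hypothetical
      port:
        number: 80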
#175 is not doing what it is supposed to do because Istio 1.6.x no longer ships Helm charts, so download-istio.sh doesn't work as expected.
/assign
We should take a look at the new Istio v1beta1 API: https://github.com/istio/api/commits/master/networking/v1beta1. If it is ready for use, we should start a feature track to migrate to Istio v1beta1.
Match Ingress.Spec.Rules.Hosts with Gateway.Spec.Servers[].Hosts when selecting servers across Gateways.
As knative/serving#8585 reported, the "mesh only" setting has some simple bugs that the e2e tests missed.
We should run the e2e tests in mesh mode with local-gateway.mesh: "mesh" and without a cluster-local-gateway.
The details are in this doc.
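A minimal sketch of what that config-istio might look like, assuming the local-gateway.mesh key mentioned above:

apiVersion: v1
kind: ConfigMap
metadata:
  name: config-istio
  namespace: knative-serving
data:
  # Route cluster-local traffic through the mesh instead of a cluster-local-gateway.
  local-gateway.mesh: "mesh"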
This is to track the rollout. The current plan to have 0 downtime is:
1. Add a server entry to the cluster-local-gateway on port 8081 (a no-op, it just makes Envoy listen on this port). Wait for Istio propagation.
2. Change the cluster-local Service targetPort to 8081 (k8s will route internal traffic to the new port).
3. Remove the server entry in the cluster-local-gateway (port 80 is not used anymore).
4. Change the Gateway to select both Deployments (Ingress and Cluster Local). Wait for Istio propagation.
5. Change the cluster-local Service to select the Ingress Deployment.
6. Change the Gateway to select only the Ingress Deployment. Wait for Istio propagation.
7. Delete the Cluster Local Deployment.
This could be done in a single release by waiting for propagation before proceeding to the next steps where required, or we can spread it over multiple releases. @tcnghia any thoughts?
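A rough sketch of step 1, based on the cluster-local-gateway Gateway shipped today (the second server entry is the addition; its name is hypothetical):

apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: cluster-local-gateway
  namespace: knative-serving
spec:
  selector:
    istio: cluster-local-gateway
  servers:
  - port:
      number: 80
      name: http
      protocol: HTTP
    hosts:
    - "*"
  # Step 1: additional server entry so Envoy also listens on 8081
  # (a no-op until the cluster-local Service targetPort is switched in step 2).
  - port:
      number: 8081
      name: http-internal      # hypothetical name
      protocol: HTTP
    hosts:
    - "*"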
/assign
/area networking
While working through https://cloud.google.com/solutions/authorizing-access-to-cloud-run-on-gke-services-using-istio#deploying_a_sample_service, a few issues came up:
JWT authorization needs to be disabled on some paths for the system to work (controller being able to ping the service, for example).
With the updated policy:
apiVersion: authentication.istio.io/v1alpha1
kind: Policy
metadata:
  name: default
  namespace: tutorial
spec:
  origins:
  - jwt:
      issuer: https://accounts.google.com
      audiences:
      - http://example.com
      jwksUri: https://www.googleapis.com/oauth2/v3/certs
      triggerRules:
      - includedPaths:
        - prefix: /
      - excludedPaths:
        - exact: /_internal/knative/activator/probe
        - exact: /healthz
        - exact: /probe
        - exact: /metrics
  principalBinding: USE_ORIGIN
Things mostly work, with one large gap.
It was discovered that this example only works when the Activator is not proxying requests. This is pretty clearly non-ideal, and appears to be related to the Activator being unable to query the health of the Revision pod and/or ClusterIP.
We should determine how to make Istio JWT authz work properly with Activator proxying in order for users to be able to have scale to zero and burst protection at the same time as locking down Service access via JWT.
Please refer to https://testgrid.knative.dev/serving#istio-stable-mesh.
Almost all tests became red after knative/serving@0998674.
There is a noticeable regression in 1.5 that was fixed in 1.7 (and possibly 1.6).
This should be documented (with links to bugs) on knative.dev so users are aware.
related to #132 and knative/serving#7507
Istio 1.6 does not support the v1alpha1 authentication Policy. Although it could be replaced with the v1beta1 RequestAuthentication, the new RequestAuthentication does not support regex for now; please refer to istio/istio#16585 (comment).
The current TestProbeWhitelist depends heavily on regex, so we need to rewrite it.
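For reference, a minimal sketch of the v1beta1 replacement (the namespace, issuer, and jwksUri values are hypothetical examples; note it has no equivalent of the regex-based trigger rules the test relies on):

apiVersion: security.istio.io/v1beta1
kind: RequestAuthentication
metadata:
  name: default
  namespace: tutorial        # hypothetical namespace
spec:
  jwtRules:
  - issuer: https://accounts.google.com
    jwksUri: https://www.googleapis.com/oauth2/v3/certs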
Now that we have separated the repo, we should start generating the client library in our repo instead of relying on vendoring it from knative.dev/serving.
Hello,
I am trying to set a CORS policy on an Istio 1.6.3 VirtualService, specifically the 'allowOrigins' field:
...
    http:
    - corsPolicy:
        allowCredentials: true
        allowHeaders:
        - content-type
        - request-id
        - authorization
        allowOrigins:
        - exact: http://localhost
        maxAge: 24h
      match:
      - uri:
          prefix: /mobile/
...
This leads the networking-istio service to log the following error:
E0630 16:04:21.662064 1 reflector.go:123] runtime/asm_amd64.s:1357: Failed to list *v1alpha3.VirtualService: v1alpha3.VirtualServiceList.Items: []v1alpha3.VirtualService: v1alpha3.VirtualService.v1alpha3.VirtualService.Spec: unmarshalerDecoder: unknown field "allowOrigins" in v1alpha3.CorsPolicy, error found in #10 byte of ...|":100}]}]}},{"apiVer|..., bigger context ...|ter.local","port":{"number":80}},"weight":100}]}]}},{"apiVersion":"networking.istio.io/v1alpha3","ki|...
Then, my Knative service is not ready:
conditions:
- lastTransitionTime: "2020-06-30T17:48:18Z"
  status: "True"
  type: ConfigurationsReady
- lastTransitionTime: "2020-06-30T17:48:18Z"
  message: Ingress reconciliation failed
  reason: ReconcileIngressFailed
  status: "False"
  type: Ready
- lastTransitionTime: "2020-06-30T17:48:18Z"
  message: Ingress reconciliation failed
  reason: ReconcileIngressFailed
  status: "False"
  type: RoutesReady
Knative version used:
kubectl -n knative-serving get pods --show-labels
NAME READY STATUS RESTARTS AGE LABELS
activator-688c498dcd-dhvxr 2/2 Running 0 20h app=activator,istio.io/rev=default,pod-template-hash=688c498dcd,role=activator,security.istio.io/tlsMode=istio,service.istio.io/canonical-name=activator,service.istio.io/canonical-revision=latest,serving.knative.dev/release=v0.14.2
autoscaler-577b8f6b6-k7m8c 2/2 Running 0 20h app=autoscaler,istio.io/rev=default,pod-template-hash=577b8f6b6,security.istio.io/tlsMode=istio,service.istio.io/canonical-name=autoscaler,service.istio.io/canonical-revision=latest,serving.knative.dev/release=v0.14.2
autoscaler-hpa-cf757b76b-ckvgh 2/2 Running 0 20h app=autoscaler-hpa,istio.io/rev=default,pod-template-hash=cf757b76b,security.istio.io/tlsMode=istio,service.istio.io/canonical-name=autoscaler-hpa,service.istio.io/canonical-revision=latest,serving.knative.dev/release=v0.14.2
controller-75cccc4cd6-tmtdg 2/2 Running 1 20h app=controller,istio.io/rev=default,pod-template-hash=75cccc4cd6,security.istio.io/tlsMode=istio,service.istio.io/canonical-name=controller,service.istio.io/canonical-revision=latest,serving.knative.dev/release=v0.14.2
istio-webhook-b65488fbc-fjkrx 2/2 Running 0 107m app=istio-webhook,istio.io/rev=default,pod-template-hash=b65488fbc,role=istio-webhook,security.istio.io/tlsMode=istio,service.istio.io/canonical-name=istio-webhook,service.istio.io/canonical-revision=latest,serving.knative.dev/release=v0.14.1
networking-istio-7d9d688b86-d75dr 1/1 Running 0 107m app=networking-istio,pod-template-hash=7d9d688b86,serving.knative.dev/release=v0.14.1
webhook-7b476996c8-7mlr4 2/2 Running 1 20h app=webhook,istio.io/rev=default,pod-template-hash=7b476996c8,role=webhook,security.istio.io/tlsMode=istio,service.istio.io/canonical-name=webhook,service.istio.io/canonical-revision=latest,serving.knative.dev/release=v0.14.2
Istio recently (2020-08-18 18:58:35 +0000 UTC) released "Istio 1.7.0-rc.2". Let's test it.
This is a ticket copied from knative/serving#8243
v0.14.0
We deployed knative-serving v0.14.0 on a Kubernetes cluster with Istio as the service mesh.
The observed memory usage for the networking-istio pod shows a sudden increase and does not return to its original level.
$ kubectl get pod networking-istio-cb8649f6d-tqqkd -nknative-serving -o jsonpath='{.spec.containers[0].resources}{"\n"}'
map[limits:map[cpu:300m memory:400Mi] requests:map[cpu:30m memory:40Mi]]
We captured heap and alloc profiles before and after the memory usage increase on the networking-istio pod; the memory profile information is attached below:
before-memory-usage-increase.zip
after-memory-usage-increase.zip
No major events happened during this change in the networking-istio pod's memory usage.
However, we do have approximately 300+ namespaces, and we were not able to retrieve all events from the different namespaces.
Support the Istio EnvoyFilter grpc_json_transcoder to allow gRPC or JSON REST requests on the same port. Currently this does not seem to work with some of the networking setup done by Knative.
Currently, if Istio sidecar injection is enabled on the knative-serving namespace, the webhook will also get a sidecar.
Given that the MutatingWebhookConfiguration specifies an expected CA for the certificate exposed by the webhook endpoint, a component outside of the mesh (e.g. kube-api) fails to contact the webhook when the sidecar tries to terminate the TLS connection, which happens in mTLS strict mode (in permissive mode, Envoy acts as a TCP proxy, see https://github.com/istio/istio/blob/master/pilot/pkg/networking/core/v1alpha3/listener.go#L211).
The webhook is designed to only be contacted by kube-api (which is outside of the mesh) and to not contact anything else; therefore, it should always be excluded from the mesh:
kubectl patch deployments.apps -n knative-serving webhook -p '{"spec":{"template":{"metadata":{"annotations":{"sidecar.istio.io/inject":"false"}}}}}'
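Equivalently, a sketch of the same change expressed in the webhook Deployment's pod template:

spec:
  template:
    metadata:
      annotations:
        # Exclude the webhook pod from the Istio mesh.
        sidecar.istio.io/inject: "false"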
/cc @Cynocracy
See https://testgrid.knative.dev/serving#istio-stable-mesh&include-filter-by-regex=TestAllowedProbes
I think PR knative/serving#9679 is the fix for this.
/assign @JRBANCEL
This is the net-istio side of knative/serving#8765.
Currently the network probing logic implements an optimization where a pod is assumed to have been fully programmed once any part of the network programming we are probing has been successfully verified. In practice, this means that for each pod we are enqueuing N pieces of work and cancelling N-1 pieces of work for each gateway pod every time we change a kingress.
Given that this builds on a faulty assumption (in the linked issue), I'd propose instead optimizing this on the net-istio side here:
Instead of enqueuing N pieces of work per pod and then cancelling N-1 of them, simply choose one piece of work to verify (per IP) and prune the rest. This should achieve what the prober-side optimization yields, and more, because it should also improve the throughput of the prober workqueue.
I'd like to avoid having Istio take a hit to correct the issue above, and this should be safe to start in isolation, so if anyone wants to try and knock this off, I'd appreciate it.
Istio recently (2020-09-29 19:39:28 +0000 UTC) released "Istio 1.7.3". Let's test it.
Istio recently (2020-08-13 01:07:17 +0000 UTC) released "Istio 1.7.0-rc.1". Let's test it.
Istio recently (2020-08-22 01:00:15 +0000 UTC) released "Istio 1.7.0". Let's test it.
Upon the next release of istio/api, update the istio dependency and replace the canonical labels with the ones added in istio/api#1339
#147 introduces the use of the v1beta1 API. This requires at least Istio 1.5; otherwise, the controllers will fail.
This is a tracking issue for the 503 errors in Istio 1.5 (installed by istioctl).
Currently the Istio upgrade is stuck due to 503 errors; istio/istio#23029 is being discussed with the Istio team and has some details.
Obsolete tickets: knative/serving#8210, knative/serving#8193
$ kubectl edit vs hello-example-ingress
(quit the editor without changes, e.g. :q). Then, you get the following error:
# virtualservices.networking.istio.io "hello-example-ingress" was not valid:
# * : Invalid value: "The edited file failed validation": [ValidationError(VirtualService.spec.http[0]): unknown field "websocketUpgrade" in io.istio.networking.v1beta1.VirtualService.spec.http, ValidationError(VirtualService.spec.http[1]): unknown field "websocketUpgrade" in io.istio.networking.v1beta1.VirtualService.spec.http]
The websocketUpgrade field is already deprecated, so we should remove it.
I have tried to apply third_party/istio-1.5.7-helm/istio-minimal.yaml, but both the istio-ingressgateway and cluster-local-gateway pods cannot start because the istio-proxy container is not ready.
cluster-local-gateway-96488b8bf-qknql 0/1 Running 0 98s
istio-ingressgateway-6c866f94c6-vhpxv 1/2 Running 0 98s
istio-pilot-6bdfc6f49c-k2hfd 1/1 Running 0 98s
I can reproduce this issue consistently.
I checked the log, and got a lot of the following logs:
2020-09-04T03:45:21.061808Z info waiting for file
2020-09-04T03:45:21.162125Z info waiting for file
2020-09-04T03:45:21.262440Z info waiting for file
2020-09-04T03:45:21.362749Z info waiting for file
2020-09-04T03:45:21.462978Z info waiting for file
2020-09-04T03:45:21.563242Z info waiting for file
2020-09-04T03:45:21.663541Z info waiting for file
And later I got the following logs:
2020-09-04T04:10:25.459577Z info Envoy proxy is NOT ready: config not received from Pilot (is Pilot running?): cds updates: 0 successful, 1 rejected; lds updates: 1 successful, 0 rejected
This issue caused knative/serving#9275
/cc @nak3
The gateways field of the VirtualService CRD indicates which Istio Gateways this VirtualService applies to.
At the moment, all configured gateways are added to the gateways field.
However, if the user wants to only specify a local gateway, they also get the public gateway in that field.
This confuses the reconciler, which can't find the gateway and errors out.
The VirtualService generation logic should instead only add gateways which are actually used by the VirtualService routes.
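As an illustration (names are hypothetical), a cluster-local service would then end up with only the local gateway listed in its generated VirtualService, e.g.:

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: hello-internal-ingress      # hypothetical
  namespace: default
spec:
  hosts:
  - hello-internal.default.svc.cluster.local
  gateways:
  # Only the gateway actually referenced by the routes; the public
  # knative-ingress-gateway should not appear here.
  - knative-serving/cluster-local-gateway
  http:
  - route:
    - destination:
        host: hello-internal.default.svc.cluster.local
        port:
          number: 80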
Related code: https://github.com/knative/net-istio/blob/f08de598438b0e85c27b401416575405cc451eba/pkg/reconciler/ingress/resources/virtual_service.go#L135-L140
Related issue: #43
/cc @tcnghia
The Istio webhooks are still using v1beta1; they should switch to v1 at some point.
related: knative/serving#9494
Istio recently (2020-09-10 21:26:38 +0000 UTC) released "Istio 1.7.1". Let's test it.
Right now the prober is unconditionally called during reconciliation to determine readiness: https://github.com/knative-sandbox/net-istio/blob/81a9b95df6ba5516bb3240b53caaa1686fc2dd75/pkg/reconciler/ingress/ingress.go#L203
This is OK under normal circumstances because internally the prober caches the probe result, so on global resyncs we hit the cache and things finish quickly. However, when we resync due to failing over (perhaps due to a rollout), this cache is empty.
If I have 1000 ksvcs that are exposed on both gateways, and those gateways have 5 pods each, then the prober must perform 5 * 2 * 1,000 = 10,000 probes before normal work may resume.
In net-contour, I changed things to have the main Reconcile loop rely on the recorded readiness of the kingress to elide IsReady checks, effectively recording the prober's internal cache in the CRD's durable status.
Steps to reproduce:
The cluster-local svc never gets ready.
A TLS redirect gets added to the cluster-local gateway:
tls:
  httpsRedirect: true
This doesn't happen the second time (i.e. if you remove the TLS redirect from the gateway, delete the services, and recreate them); it only happens during the first-time certificate reconciliation.
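For illustration, a sketch of the problematic state on the cluster-local gateway (the tls block is the redirect that should not be added there):

apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: cluster-local-gateway
  namespace: knative-serving
spec:
  selector:
    istio: cluster-local-gateway
  servers:
  - port:
      number: 80
      name: http
      protocol: HTTP
    hosts:
    - "*"
    tls:
      # Redirecting cluster-local HTTP traffic to HTTPS leaves the
      # cluster-local service unreachable, so it never becomes ready.
      httpsRedirect: true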
We should use this for config-istio validation, and some of the new stuff Thomas has planned around canonical service labels.
Istio recently (2020-09-29 19:33:58 +0000 UTC) released "Istio 1.6.11". Let's test it.
This might be an operational issue rather than a net-istio issue, but I am reporting it here because a Knative user was confused.
Re-installing Istio with istioctl deletes the CRDs.
$ istioctl manifest apply -f istio-minimal-operator.yaml
$ kubectl apply --filename https://storage.googleapis.com/knative-nightly/serving/latest/serving-crds.yaml
$ kubectl apply --filename https://storage.googleapis.com/knative-nightly/serving/latest/serving-core.yaml
$ kubectl apply --filename https://storage.googleapis.com/knative-nightly/net-istio/latest/release.yaml
$ kn service create hello-example --image=gcr.io/knative-samples/helloworld-go
$ kubectl port-forward -n istio-system service/istio-ingressgateway 8000:80
Forwarding from 127.0.0.1:8000 -> 8080
Forwarding from [::1]:8000 -> 8080
Handling connection for 8000
$ curl -H "Host: hello-example.default.example.com" localhost:8000
Hello World!
Do not touch Knative, but just re-install Istio.
$ istioctl manifest generate -f istio-minimal-operator.yaml | kubectl delete -f -
$ istioctl manifest apply -f istio-minimal-operator.yaml
Now, the Knative app becomes unreachable.
$ kubectl port-forward -n istio-system service/istio-ingressgateway 8000:80
Forwarding from 127.0.0.1:8000 -> 8080
Forwarding from [::1]:8000 -> 8080
Handling connection for 8000
E0813 15:22:23.690783 4348 portforward.go:400] an error occurred forwarding 8000 -> 8080: error forwarding port 8080 to pod fb1991564571888471fc564b2a2fe08f528fea1ceeafe99db8fef57f4564a310, uid : exit status 1: 2020/08/13 06:22:23 socat[199389] E connect(5, AF=2 127.0.0.1:8080, 16): Connection refused
Handling connection for 8000
Empty reply from server.
$ curl -H "Host: hello-example.default.example.com" localhost:8000
curl: (52) Empty reply from server
This is because the Gateways in the knative-serving namespace were deleted in step 4. We need to re-install the Gateways from net-istio.yaml:
cat <<EOF | kubectl apply -f -
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: knative-ingress-gateway
  namespace: knative-serving
  labels:
    serving.knative.dev/release: "v20200811-4a3c487"
    networking.knative.dev/ingress-provider: istio
spec:
  selector:
    istio: ingressgateway
  servers:
  - port:
      number: 80
      name: http
      protocol: HTTP
    hosts:
    - "*"
---
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: cluster-local-gateway
  namespace: knative-serving
  labels:
    serving.knative.dev/release: "v20200811-4a3c487"
    networking.knative.dev/ingress-provider: istio
spec:
  selector:
    istio: cluster-local-gateway
  servers:
  - port:
      number: 80
      name: http
      protocol: HTTP
    hosts:
    - "*"
EOF
I wonder whether the Gateways (cluster-local-gateway, knative-ingress-gateway) should not be included in net-istio.yaml, and instead the net-istio controller should create them if they do not exist.
Given the performance regression in Istio 1.5, we should move the stable version to Istio 1.7.
When I restart my local Kubernetes cluster and run kubectl get king, it always reports the error "Waiting for load balancer to be ready".
Steps:
1. k3d stop cluster xxx and k3d start cluster xxx
2. kubectl get king -oyaml and get the "Waiting for load balancer to be ready" error
I think there might be a race condition in the statusManager IsReady function:
1. IsReady first caches ingressStates with the gateway Service endpoint IPs (after restarting the k3d cluster, it just returns the last cached endpoint IPs).
2. statusManager probes the old IPs and gets connection errors.
This is not a problem when deploying a Kubernetes cluster in production, but I think that when the gateway Service pods are reconciled, statusManager will not probe the new gateway Service endpoint IPs. This bug can be checked by manually re-deploying the gateway Deployment and observing that no new probe logs appear in the net-istio log.
What do you think?
In knative/serving#9301 we propose using install-istio.sh for development purposes.
Supporting NodePort will help those of us who are using Minikube or KinD. This will also make it easier to run integration tests using net-istio on GitHub Actions.
Istio recently (2020-09-09 20:57:35 +0000 UTC) released "Istio 1.6.9". Let's test it.