Giter Site home page Giter Site logo

Comments (78)

ggallen avatar ggallen commented on July 17, 2024

I believe I have found the problem here.

I started of trying to debug by sending a single message to my broker. Surprisingly, this let to 101 events being sent to my EventListener. After turning on trigger debugging, I reran my test and again check the logs. After analyzing the messages, it broke down like this:

      1 Ce-Type:[com.redhat.agogos.event.component.build.v1alpha1]
    254 Ce-Type:[dev.tekton.event.triggers.accepted.v1]

Further investigation reveals that this message is sent in sink.go (https://github.com/tektoncd/triggers/blob/main/pkg/sink/sink.go#L236-L255). If the message encoding is unknown, a simple HTTP response is sent with an "Accepted" status. However, if the encoding is known, the CloudEvent message "dev.tekton.event.triggers.accepted.v1" is sent. In my case it appears the encoding is known.

However, this appears to cause a furious loop. As the accepted event is sent, the broker accepts it and responds by forwarding it back to the EventListener. The EventListener receives the event, processes it, and since it has a known encoding it again responds by sending an accepted evetnt. There appears to be an infinite loop, and I'm not sure why it ever ends, but at some point it does. It could be because the EL gets so overwhelmed the eventually it can't respond in time and breaks the cycle.

To me this appears to be a flaw in the logic of sending an event in response to another event. There's probably a good reason to do so, but it's nothing that I am aware of. But my setup does not seem unusual so it seems like this could be a common problem for anyone who is sending events with a known encoding.

To test my theory, I commented out the sending of the accepted event and ran my curl test. The test ran for an hour, and it send 1.5 million events to the EL without a single restart. A check of the EventListener log reveals no timeout errors and no errors (other than the expected FailedPrecondition errors).

I'm not sure what the fix should be here. To me it would make sense for --el-events disable to also disable the sending of this event. Or maybe my configuration needs to change somehow, but it's unclear to me how to do this. It seems to be a flaw in the logic of responding to an event with another event, unless the EventListener can/should be configured to ignored responding to events that it actually sent. That may be a better solution, and maybe it should do that by default.

from triggers.

khrm avatar khrm commented on July 17, 2024

@afrittoli What do you think about this request? Is there a flaw in the logic of sending cloud-event responses to cloud-event requests?

from triggers.

ggallen avatar ggallen commented on July 17, 2024

I actually think the EL should contain logic to not respond with another event to an event it generated. I think that's what's missing.

from triggers.

afrittoli avatar afrittoli commented on July 17, 2024

I actually think the EL should contain logic to not respond with another event to an event it generated. I think that's what's missing.

Thanks @ggallen for your analysis of this.
To give some more context, Tekton Triggers can both consume and produce CloudEvents.

Producing CloudEvents

Tekton can produce CloudEvents to notify consumers about things that happen in Tekton, like a trigger was received, a pipeline was executed, etc. This functionality has existed in Tekton Pipelines for some time, and has recently been introduced into Tekton triggers as well.

Consuming CloudEvents

An event producer creates a CloudEvent, which can be sent directly to an event listener sink or sent to a CloudEvent broker and delivered to the event listener sink via a subscription.

In Tekton's own infra cluster, we use a Knative Eventing channel to route CloudEvents. The Knative subscription object sends the events to the event listener sink and expects a response. If the response is missing, or if the response is not a valid CloudEvent, it considers the message delivery to be failed and it attempts again. Tekton triggers added a CloudEvent response to CloudEvents triggers to address this issue.

Tekton must respond to an incoming CloudEvents, and that response must be in valid CloudEvents format according to the CloudEvents spec. Something is not working either in the cloudevent response or in the broker.


With that context, I would like to make sure I understand where the loop happens:

sequenceDiagram
    participant broker
    participant event-listener
    participant el-sink
    broker->>event-listener: sends a CloudEvent `com.redhat.agogos`
    event-listener ->>el-sink: produces a new CloudEvent `dev.tekton.event.triggers.started.v1`
    event-listener->>broker: sends a CloudEvent response `dev.tekton.event.triggers.accepted.v1`
    broker ->>event-listener: sends a CloudEvent response to  `dev.tekton.event.triggers.accepted.v1`
    event-listener->>broker: sends a CloudEvent response `dev.tekton.event.triggers.accepted.v1`
    broker ->>event-listener: sends a CloudEvent response to  `dev.tekton.event.triggers.accepted.v1`
    event-listener->>broker: sends a CloudEvent response `dev.tekton.event.triggers.accepted.v1`

If this is correct, it looks like the event broker does not recognise the response produced by the event listener as a response, instead, it interprets it as a new event to which it replies with a response to the event listener, which didn't expect a response and treats the incoming message as a new event and so on. I wonder how that is possible :S
I guess that the trigger configured filters by event type, or else the trigger would fire many times as well!

Do you have a trace of the content of the event listener response to the initial event?

from triggers.

ggallen avatar ggallen commented on July 17, 2024

@afrittoli, I think the loop happens between the broker and the event listener as you show in your diagram. The UIDs are changing, so I also think the broker must be interpreting the response as a new event.

But I don't know how the broker is supposed to determine that the new event is in fact a response and not a new event. Is there something missing in the response that is supposed to indicate that to the broker?

If you tell me how to get what you want for debugging information, I can get it for you. But this should be easy to reproduce.

from triggers.

ggallen avatar ggallen commented on July 17, 2024

@afrittoli, if it makes any difference, we are using the Knative MT broker.

from triggers.

ggallen avatar ggallen commented on July 17, 2024

@afrittoli ?

from triggers.

khrm avatar khrm commented on July 17, 2024

@afrittoli I think what @ggallen want is a receiving without a cloud event.

Sending a cloud event in response to a cloud event is a valid response as we can see from this receiver:
https://github.com/cloudevents/sdk-go/blob/7f5ef3992769f96b40a54c4d59291be62acd36da/v2/client/client.go#L48

What @gallen want is just a Result response (which consists of a status code and message string) to CloudEvent.
https://github.com/cloudevents/sdk-go/blob/7f5ef3992769f96b40a54c4d59291be62acd36da/v2/client/client.go#L43

Can't we just filter events by the source on the broker side?

from triggers.

ggallen avatar ggallen commented on July 17, 2024

Well, I don't really care what I get for a response, I just don't want all the messages. But I think there is a problem here. The question is where, and I see a couple of possibilities:

  1. The EL is sending a bad response back to the broker, causing the broker to relay the response back as a new message.
  2. The broker is misinterpreting a valid EL response as a new event, and responding accordingly.

I think the problem, wherever it is, should be fixed. It's possible that it's #2, and the options are to either get a fix to the MT broker or change to a different broker that handles things correctly.

I think the question is where is the actual problem? I don't have the knowledge to answer that question.

from triggers.

khrm avatar khrm commented on July 17, 2024

If we can't filter events by the source on the broker side, then we can probably add an annotation to EventListener to determine what to do for CloudEvents request - default - send Events or just status-code and msg.

from triggers.

khrm avatar khrm commented on July 17, 2024

@afrittoli If you are free, please clarify on how to proceed with this.

from triggers.

afrittoli avatar afrittoli commented on July 17, 2024

Hello @khrm and @ggallen - sorry about the late reply.
Unfortunately, I've had no time to look further into this - and I'm now on PTO until September 😓

To summarise the issues as I understand them today:

  1. The original way the event listeners responded to a CloudEvent from the Knative Broker was not recognised by the Knative Broker as a valid CloudEvent response, which triggered retries and thus a lot of duplicated messages. The workaround for that was to set retries to 0 in the subscription. The fix was to change triggers responses to CloudEvents
  2. The current way the event listener responds to a CloudEvent is recognised by the Knative Broker as a new CloudEvent, so the Knative broker sends a reply (type dev.tekton.event.triggers.accepted.v1)
  3. The reply by the Knative broker is interpreted by Tekton triggers as a new event, etc.

In both issues (2) and (3), the response sent is not identified as a response, but rather as a new event, which triggers a response.

In both those cases, the issue can be:

  • either the message by the producer is not formatted according to spec, so the consumer does not read it as a response
  • or the consumer is parsing the response incorrectly, and taking it as a new message

Note that in (2) and (3) producer and consumers are inverted, so we could have two separate issues:

  • in (2) the producer of the response is tekton, consumer is knative
  • in (3) the producer of the response is knative, the consumer is tekton

The next steps would be to have examples of each of the responses produced by both systems, compare them with the specification and figure out whether the issue is the producer or consumer of the response.

I hope this helps move forward towards a solution.

from triggers.

ggallen avatar ggallen commented on July 17, 2024

I have gotten the Broker and EL messages. I chose three message/response pair as I think that's enough to get a picture of what's going on.

I also looked at the CDEvents documentation at https://github.com/cdevents/spec, but it wasn't helpful to someone with my level of knowledge about this topic. But I am wondering if it makes any difference that the initial event being sent to the EL from the Broker is not one of the predefined CD events.

Is that possibly the problem?

Here are the messages. Hopefully, this information can shed some light on what might be the problem.

Broker:
{ "level": "debug", "ts": "2023-08-15T17:53:38.743Z", "logger": "eventlistener", "caller": "sink/sink.go:177", "msg": "handling event with path /, payload: { \"name\": \"dummy-component-dep-2\", \"resource\": \"Component\", \"instance\": \"eea42e7e-4543-4309-bc6b-11cf35549575\" } and header: map[Accept-Encoding:[gzip] Ce-Id:[e3b1494d-05f1-45aa-8792-fcc3608fbf24] Ce-Knativearrivaltime:[2023-08-15T17:53:38.719745294Z] Ce-Source:[http://localhost] Ce-Specversion:[1.0] Ce-Type:[com.redhat.agogos.event.component.build.v1alpha1] Content-Length:[123] Content-Type:[application/json] Prefer:[reply] Traceparent:[00-d74ea1c9a53a46643d3eaeaa95e3e442-9f3e62b83c92cacb-00] User-Agent:[Go-http-client/1.1]]", "commit": "fed390f-dirty", "eventlistener": "agogos", "namespace": "gallen", "/triggers-eventid": "62b6e2bb-a68f-4dc8-b78b-9fb91b6bcb8b", "eventlistenerUID": "e6bc7aeb-716d-4a3a-b094-068cad722b41", "/triggers-eventid": "62b6e2bb-a68f-4dc8-b78b-9fb91b6bcb8b" }
Event Listener:
{ "level": "debug", "ts": "2023-08-15T17:53:38.744Z", "logger": "eventlistener", "caller": "sink/sink.go:252", "msg": "Responding with cloud event, payload: &{Context:Context Attributes,\n specversion: 1.0\n type: dev.tekton.event.triggers.accepted.v1\n source: agogos\n subject: gallen.agogos accepted 62b6e2bb-a68f-4dc8-b78b-9fb91b6bcb8b\n id: 62b6e2bb-a68f-4dc8-b78b-9fb91b6bcb8b\n datacontenttype: application/json\n DataEncoded:[123 34 101 118 101 110 116 76 105 115 116 101 110 101 114 34 58 34 97 103 111 103 111 115 34 44 34 110 97 109 101 115 112 97 99 101 34 58 34 103 97 108 108 101 110 34 44 34 101 118 101 110 116 76 105 115 116 101 110 101 114 85 73 68 34 58 34 101 54 98 99 55 97 101 98 45 55 49 54 100 45 52 97 51 97 45 98 48 57 52 45 48 54 56 99 97 100 55 50 50 98 52 49 34 44 34 101 118 101 110 116 73 68 34 58 34 54 50 98 54 101 50 98 98 45 97 54 56 102 45 52 100 99 56 45 98 55 56 98 45 57 102 98 57 49 98 54 98 99 98 56 98 34 125] DataBase64:false FieldErrors:map[]}", "commit": "fed390f-dirty", "eventlistener": "agogos", "namespace": "gallen", "/triggers-eventid": "62b6e2bb-a68f-4dc8-b78b-9fb91b6bcb8b", "eventlistenerUID": "e6bc7aeb-716d-4a3a-b094-068cad722b41", "/triggers-eventid": "62b6e2bb-a68f-4dc8-b78b-9fb91b6bcb8b" }
Broker:
{ "level": "debug", "ts": "2023-08-15T17:53:38.751Z", "logger": "eventlistener", "caller": "sink/sink.go:177", "msg": "handling event with path /, payload: {\"eventListener\":\"agogos\",\"namespace\":\"gallen\",\"eventListenerUID\":\"e6bc7aeb-716d-4a3a-b094-068cad722b41\",\"eventID\":\"62b6e2bb-a68f-4dc8-b78b-9fb91b6bcb8b\"} and header: map[Accept-Encoding:[gzip] Ce-Id:[62b6e2bb-a68f-4dc8-b78b-9fb91b6bcb8b] Ce-Knativearrivaltime:[2023-08-15T17:53:38.749146357Z] Ce-Source:[agogos] Ce-Specversion:[1.0] Ce-Subject:[gallen.agogos accepted 62b6e2bb-a68f-4dc8-b78b-9fb91b6bcb8b] Ce-Type:[dev.tekton.event.triggers.accepted.v1] Content-Length:[154] Content-Type:[application/json] Prefer:[reply] Traceparent:[00-d74ea1c9a53a46643d3eaeaa95e3e442-88f37a21ff5b0be1-00] User-Agent:[Go-http-client/1.1]]", "commit": "fed390f-dirty", "eventlistener": "agogos", "namespace": "gallen", "/triggers-eventid": "ab7642c2-0e65-463b-be45-6ff498034170", "eventlistenerUID": "e6bc7aeb-716d-4a3a-b094-068cad722b41", "/triggers-eventid": "ab7642c2-0e65-463b-be45-6ff498034170" }
Event Listener:
{ "level": "debug", "ts": "2023-08-15T17:53:38.751Z", "logger": "eventlistener", "caller": "sink/sink.go:252", "msg": "Responding with cloud event, payload: &{Context:Context Attributes,\n specversion: 1.0\n type: dev.tekton.event.triggers.accepted.v1\n source: agogos\n subject: gallen.agogos accepted ab7642c2-0e65-463b-be45-6ff498034170\n id: ab7642c2-0e65-463b-be45-6ff498034170\n datacontenttype: application/json\n DataEncoded:[123 34 101 118 101 110 116 76 105 115 116 101 110 101 114 34 58 34 97 103 111 103 111 115 34 44 34 110 97 109 101 115 112 97 99 101 34 58 34 103 97 108 108 101 110 34 44 34 101 118 101 110 116 76 105 115 116 101 110 101 114 85 73 68 34 58 34 101 54 98 99 55 97 101 98 45 55 49 54 100 45 52 97 51 97 45 98 48 57 52 45 48 54 56 99 97 100 55 50 50 98 52 49 34 44 34 101 118 101 110 116 73 68 34 58 34 97 98 55 54 52 50 99 50 45 48 101 54 53 45 52 54 51 98 45 98 101 52 53 45 54 102 102 52 57 56 48 51 52 49 55 48 34 125] DataBase64:false FieldErrors:map[]}", "commit": "fed390f-dirty", "eventlistener": "agogos", "namespace": "gallen", "/triggers-eventid": "ab7642c2-0e65-463b-be45-6ff498034170", "eventlistenerUID": "e6bc7aeb-716d-4a3a-b094-068cad722b41", "/triggers-eventid": "ab7642c2-0e65-463b-be45-6ff498034170" }
Broker:
{ "level": "debug", "ts": "2023-08-15T17:53:38.753Z", "logger": "eventlistener", "caller": "sink/sink.go:177", "msg": "handling event with path /, payload: {\"eventListener\":\"agogos\",\"namespace\":\"gallen\",\"eventListenerUID\":\"e6bc7aeb-716d-4a3a-b094-068cad722b41\",\"eventID\":\"ab7642c2-0e65-463b-be45-6ff498034170\"} and header: map[Accept-Encoding:[gzip] Ce-Id:[ab7642c2-0e65-463b-be45-6ff498034170] Ce-Knativearrivaltime:[2023-08-15T17:53:38.752346245Z] Ce-Source:[agogos] Ce-Specversion:[1.0] Ce-Subject:[gallen.agogos accepted ab7642c2-0e65-463b-be45-6ff498034170] Ce-Type:[dev.tekton.event.triggers.accepted.v1] Content-Length:[154] Content-Type:[application/json] Prefer:[reply] Traceparent:[00-d74ea1c9a53a46643d3eaeaa95e3e442-71a8938ac1254cf6-00] User-Agent:[Go-http-client/1.1]]", "commit": "fed390f-dirty", "eventlistener": "agogos", "namespace": "gallen", "/triggers-eventid": "919bcc93-8586-4d09-89e8-b6919a2bbc21", "eventlistenerUID": "e6bc7aeb-716d-4a3a-b094-068cad722b41", "/triggers-eventid": "919bcc93-8586-4d09-89e8-b6919a2bbc21" }
Event Listener:
{ "level": "debug", "ts": "2023-08-15T17:53:38.753Z", "logger": "eventlistener", "caller": "sink/sink.go:252", "msg": "Responding with cloud event, payload: &{Context:Context Attributes,\n specversion: 1.0\n type: dev.tekton.event.triggers.accepted.v1\n source: agogos\n subject: gallen.agogos accepted 919bcc93-8586-4d09-89e8-b6919a2bbc21\n id: 919bcc93-8586-4d09-89e8-b6919a2bbc21\n datacontenttype: application/json\n DataEncoded:[123 34 101 118 101 110 116 76 105 115 116 101 110 101 114 34 58 34 97 103 111 103 111 115 34 44 34 110 97 109 101 115 112 97 99 101 34 58 34 103 97 108 108 101 110 34 44 34 101 118 101 110 116 76 105 115 116 101 110 101 114 85 73 68 34 58 34 101 54 98 99 55 97 101 98 45 55 49 54 100 45 52 97 51 97 45 98 48 57 52 45 48 54 56 99 97 100 55 50 50 98 52 49 34 44 34 101 118 101 110 116 73 68 34 58 34 57 49 57 98 99 99 57 51 45 56 53 56 54 45 52 100 48 57 45 56 57 101 56 45 98 54 57 49 57 97 50 98 98 99 50 49 34 125] DataBase64:false FieldErrors:map[]}", "commit": "fed390f-dirty", "eventlistener": "agogos", "namespace": "gallen", "/triggers-eventid": "919bcc93-8586-4d09-89e8-b6919a2bbc21", "eventlistenerUID": "e6bc7aeb-716d-4a3a-b094-068cad722b41", "/triggers-eventid": "919bcc93-8586-4d09-89e8-b6919a2bbc21" }

from triggers.

khrm avatar khrm commented on July 17, 2024

That's fine. EL can receive any CE-Events. It will respond with Accepted Event which shouldn't come back.

from triggers.

khrm avatar khrm commented on July 17, 2024

@ggallen Can you test the broker with this sample? https://github.com/cloudevents/sdk-go/blob/7f5ef3992769f96b40a54c4d59291be62acd36da/samples/http/responder/main.go

If your broker while receiving the CE event sends back the event again, then it's broken. Otherwise, EL has a problem.

from triggers.

ggallen avatar ggallen commented on July 17, 2024

I don't understand what you want me to do. I can't run that code. Do you want me to send that message?

You can see from my example that the Broker ends up sending back the accepted message to the EL.

from triggers.

khrm avatar khrm commented on July 17, 2024

@ggallen , can you try setting filters like given here:
https://github.com/knative/eventing/blob/main/docs/broker/filtering.md ?

I wanted you to run something similar and send a message via a broker. Let me create a program.

from triggers.

khrm avatar khrm commented on July 17, 2024

@ggallen Please use this deployment file: https://raw.githubusercontent.com/khrm/ce-test/master/deployment.yaml

First, send a direct request after port-forwarding the service:

kubectl port-forward svc/ce-test 8080

Afterward, you can use the below commands to send a request:

cd cmd/client
go run main.go

Afterward, you can port-forward the MT broker endpoint (expose as port 8080) and send the request using the above command.

from triggers.

ggallen avatar ggallen commented on July 17, 2024

@khrm, something is not quite right, but here are the results. Sending to ce-test:

warning: GOPATH set to GOROOT (/usr/local/go) has no effect
Response:
Context Attributes,
  specversion: 1.0
  type: samples.http.test
  source: /test
  subject: https://github.com/cloudevents/sdk-go/v2/samples/requester#0
  id: 5f2b0c22-10ad-4f1e-98ec-0c69a482a913
  time: 2023-08-18T11:49:02.700185606Z
  datacontenttype: application/json
Data,
  {
    "id": 0,
    "message": "test done!"
  }

Got Event Response Context: Context Attributes,
  specversion: 1.0
  type: samples.http.test
  source: /test
  subject: https://github.com/cloudevents/sdk-go/v2/samples/requester#0
  id: 5f2b0c22-10ad-4f1e-98ec-0c69a482a913
  time: 2023-08-18T11:49:02.700185606Z
  datacontenttype: application/json

Got Response Data: &{Sequence:0 Message:test done!}
----------------------------

Sending to the MT broker:

warning: GOPATH set to GOROOT (/usr/local/go) has no effect
2023/08/18 08:05:25 Event sent at 2023-08-18 08:05:25.166677156 -0400 EDT m=+0.005350907
2023/08/18 08:05:25 Response status code 404: 

from triggers.

khrm avatar khrm commented on July 17, 2024

@ggallen Was message received at CE-Test service port? I don't think so.

from triggers.

ggallen avatar ggallen commented on July 17, 2024

Well, I can't get anything to run now. I see this in the events:

22s         Normal    Created             pod/ce-test-54c5f5f7-2n6k4        Created container ce-test
21s         Normal    Started             pod/ce-test-54c5f5f7-2n6k4        Started container ce-test
23s         Warning   Unhealthy           pod/ce-test-54c5f5f7-2n6k4        Liveness probe failed: HTTP probe failed with statuscode: 405
23s         Normal    Killing             pod/ce-test-54c5f5f7-2n6k4        Container ce-test failed liveness probe, will be restarted

from triggers.

ggallen avatar ggallen commented on July 17, 2024

@khrm, the ce-test pod keeps restarting.

[gallen@localhost ce-test] (master)$ kc get pods
NAME                         READY   STATUS              RESTARTS   AGE
ce-test-54c5f5f7-rl757       0/1     ContainerCreating   0          13s
el-agogos-65596c479b-lmhhc   1/1     Running             0          3m51s
[gallen@localhost ce-test] (master)$ kc get pods
NAME                         READY   STATUS    RESTARTS   AGE
ce-test-54c5f5f7-rl757       0/1     Running   0          19s
el-agogos-65596c479b-lmhhc   1/1     Running   0          3m57s
[gallen@localhost ce-test] (master)$ kc logs ce-test-54c5f5f7-rl757
[gallen@localhost ce-test] (master)$ kc logs ce-test-54c5f5f7-rl757
[gallen@localhost ce-test] (master)$ kc get pods
NAME                         READY   STATUS             RESTARTS       AGE
ce-test-54c5f5f7-rl757       0/1     CrashLoopBackOff   11 (27s ago)   28m
el-agogos-65596c479b-lmhhc   1/1     Running            0              32m
[gallen@localhost ce-test] (master)$ 

In the output above, I haven't done anything but let time pass.

I am seeing the liveness probe fail as in the previous comment.

from triggers.

ggallen avatar ggallen commented on July 17, 2024

@khrm, I was able to finally send a message directly to the broker:

warning: GOPATH set to GOROOT (/usr/local/go) has no effect
2023/08/18 10:54:51 Event sent at 2023-08-18 10:54:51.843636122 -0400 EDT m=+0.005916899
2023/08/18 10:54:51 Response status code 202: 

And what the EL shows in the log:

{"level":"debug","ts":"2023-08-18T14:57:25.966Z","caller":"logging/config.go:116","msg":"Successfully created the logger."}
{"level":"debug","ts":"2023-08-18T14:57:25.967Z","caller":"logging/config.go:117","msg":"Logging level set to: debug"}
{"level":"debug","ts":"2023-08-18T14:57:25.968Z","logger":"eventlistener","caller":"metrics/metrics_worker.go:76","msg":"Flushing the existing exporter before setting up the new exporter.","commit":"2ec8bc6-dirty"}
{"level":"debug","ts":"2023-08-18T14:57:25.969Z","logger":"eventlistener","caller":"metrics/prometheus_exporter.go:51","msg":"Created Prometheus exporter with config: &{tekton.dev/triggers eventlistener prometheus 5000000000 <nil>  false 9000 0.0.0.0}. Start the server for Prometheus exporter.","commit":"2ec8bc6-dirty"}
{"level":"debug","ts":"2023-08-18T14:57:25.969Z","logger":"eventlistener","caller":"metrics/metrics_worker.go:91","msg":"Successfully updated the metrics exporter; old config: <nil>; new config &{tekton.dev/triggers eventlistener prometheus 5000000000 <nil>  false 9000 0.0.0.0}","commit":"2ec8bc6-dirty"}
{"level":"debug","ts":"2023-08-18T14:57:25.971Z","logger":"eventlistener","caller":"adapter/adapter.go:88","msg":"Creating event broadcaster","commit":"2ec8bc6-dirty"}
{"level":"debug","ts":"2023-08-18T15:01:18.944Z","logger":"eventlistener","caller":"sink/sink.go:177","msg":"handling event with path /, payload: {\"id\":0,\"message\":\"Hello world!\"} and header: map[Accept-Encoding:[gzip] Ce-Id:[8a01378a-4f68-4baf-8d10-93240543774d] Ce-Knativearrivaltime:[2023-08-18T15:01:18.939591943Z] Ce-Source:[https://github.com/cloudevents/sdk-go/v2/samples/requester] Ce-Specversion:[1.0] Ce-Time:[2023-08-18T15:01:18.933199105Z] Ce-Type:[com.cloudevents.sample.sent] Content-Length:[33] Content-Type:[application/json] Prefer:[reply] Traceparent:[00-e8655703484c759904c8dd719d58244c-f9e172fdf6b5ca8c-00] User-Agent:[Go-http-client/1.1]]","commit":"2ec8bc6-dirty","eventlistener":"agogos","namespace":"gallen","/triggers-eventid":"09c62c6b-fb93-4865-a1c0-4644100b7689","eventlistenerUID":"bd026c66-0e12-48ee-8304-e8277a1c65fe","/triggers-eventid":"09c62c6b-fb93-4865-a1c0-4644100b7689"}
{"level":"debug","ts":"2023-08-18T15:01:18.944Z","logger":"eventlistener","caller":"sink/metrics.go:115","msg":"event listener request","commit":"2ec8bc6-dirty","status":"succeeded"}
{"level":"debug","ts":"2023-08-18T15:01:18.944Z","logger":"eventlistener","caller":"sink/metrics.go:100","msg":"event listener request completed","commit":"2ec8bc6-dirty","status":202,"duration":0.000231657}

Sending to ce-test:

warning: GOPATH set to GOROOT (/usr/local/go) has no effect
Response:
Context Attributes,
  specversion: 1.0
  type: samples.http.test
  source: /test
  subject: https://github.com/cloudevents/sdk-go/v2/samples/requester#0
  id: a5447e02-4e7e-42d7-87c7-a459c1fe34cd
  time: 2023-08-18T15:03:35.92904009Z
  datacontenttype: application/json
Data,
  {
    "id": 0,
    "message": "test done!"
  }

Got Event Response Context: Context Attributes,
  specversion: 1.0
  type: samples.http.test
  source: /test
  subject: https://github.com/cloudevents/sdk-go/v2/samples/requester#0
  id: a5447e02-4e7e-42d7-87c7-a459c1fe34cd
  time: 2023-08-18T15:03:35.92904009Z
  datacontenttype: application/json

Got Response Data: &{Sequence:0 Message:test done!}
----------------------------

And the EL log after that:

{"level":"debug","ts":"2023-08-18T15:02:21.767Z","caller":"logging/config.go:116","msg":"Successfully created the logger."}
{"level":"debug","ts":"2023-08-18T15:02:21.768Z","caller":"logging/config.go:117","msg":"Logging level set to: debug"}
{"level":"debug","ts":"2023-08-18T15:02:21.769Z","logger":"eventlistener","caller":"metrics/metrics_worker.go:76","msg":"Flushing the existing exporter before setting up the new exporter.","commit":"2ec8bc6-dirty"}
{"level":"debug","ts":"2023-08-18T15:02:21.769Z","logger":"eventlistener","caller":"metrics/prometheus_exporter.go:51","msg":"Created Prometheus exporter with config: &{tekton.dev/triggers eventlistener prometheus 5000000000 <nil>  false 9000 0.0.0.0}. Start the server for Prometheus exporter.","commit":"2ec8bc6-dirty"}
{"level":"debug","ts":"2023-08-18T15:02:21.769Z","logger":"eventlistener","caller":"metrics/metrics_worker.go:91","msg":"Successfully updated the metrics exporter; old config: <nil>; new config &{tekton.dev/triggers eventlistener prometheus 5000000000 <nil>  false 9000 0.0.0.0}","commit":"2ec8bc6-dirty"}
{"level":"debug","ts":"2023-08-18T15:02:21.770Z","logger":"eventlistener","caller":"adapter/adapter.go:88","msg":"Creating event broadcaster","commit":"2ec8bc6-dirty"}

Nothing there, although maybe that is expected.

I think this is what you asked for. If you need anything else please let me know.

from triggers.

khrm avatar khrm commented on July 17, 2024

@ggallen Actually, I wanted you to use CE-Test in place of EventLisener. Put CE-Test in broker in place of EventLisener and send event via broker.

I assume at the moment when we send event to Broker, it goes to EL. Now we want to send event to Broker, it should go to CE-Test.

from triggers.

ggallen avatar ggallen commented on July 17, 2024

@khrm, I will try that. I'm afraid the problem is going to be that I can't get the log for CE-test, unless I get lucky, since it keeps restarting.

from triggers.

ggallen avatar ggallen commented on July 17, 2024

@khrm, I believe this is what you are asking for.

CE-Test OUTPUT:

warning: GOPATH set to GOROOT (/usr/local/go) has no effect
Response:
Context Attributes,
  specversion: 1.0
  type: dev.tekton.event.triggers.accepted.v1
  source: agogos
  subject: gallen.agogos accepted b2e5d2cf-dc2b-45b0-b474-c51e08dc3c6e
  id: b2e5d2cf-dc2b-45b0-b474-c51e08dc3c6e
  datacontenttype: application/json
Data,
  {
    "eventListener": "agogos",
    "namespace": "gallen",
    "eventListenerUID": "bd026c66-0e12-48ee-8304-e8277a1c65fe",
    "eventID": "b2e5d2cf-dc2b-45b0-b474-c51e08dc3c6e"
  }

Got Event Response Context: Context Attributes,
  specversion: 1.0
  type: dev.tekton.event.triggers.accepted.v1
  source: agogos
  subject: gallen.agogos accepted b2e5d2cf-dc2b-45b0-b474-c51e08dc3c6e
  id: b2e5d2cf-dc2b-45b0-b474-c51e08dc3c6e
  datacontenttype: application/json

Got Response Data: &{Sequence:0 Message:}
----------------------------

CE-Test LOG:

{"level":"debug","ts":"2023-08-18T20:04:53.227Z","caller":"logging/config.go:116","msg":"Successfully created the logger."}
{"level":"debug","ts":"2023-08-18T20:04:53.227Z","caller":"logging/config.go:117","msg":"Logging level set to: debug"}
{"level":"debug","ts":"2023-08-18T20:04:53.229Z","logger":"eventlistener","caller":"metrics/metrics_worker.go:76","msg":"Flushing the existing exporter before setting up the new exporter.","commit":"2ec8bc6-dirty"}
{"level":"debug","ts":"2023-08-18T20:04:53.230Z","logger":"eventlistener","caller":"metrics/prometheus_exporter.go:51","msg":"Created Prometheus exporter with config: &{tekton.dev/triggers eventlistener prometheus 5000000000 <nil>  false 9000 0.0.0.0}. Start the server for Prometheus exporter.","commit":"2ec8bc6-dirty"}
{"level":"debug","ts":"2023-08-18T20:04:53.230Z","logger":"eventlistener","caller":"metrics/metrics_worker.go:91","msg":"Successfully updated the metrics exporter; old config: <nil>; new config &{tekton.dev/triggers eventlistener prometheus 5000000000 <nil>  false 9000 0.0.0.0}","commit":"2ec8bc6-dirty"}
{"level":"debug","ts":"2023-08-18T20:04:53.231Z","logger":"eventlistener","caller":"adapter/adapter.go:88","msg":"Creating event broadcaster","commit":"2ec8bc6-dirty"}
{"level":"debug","ts":"2023-08-18T20:16:28.882Z","logger":"eventlistener","caller":"sink/sink.go:177","msg":"handling event with path /, payload: {\"id\":0,\"message\":\"Hello world!\"} and header: map[Accept-Encoding:[gzip] Ce-Id:[233b6c63-0ddf-4fc9-b721-806f2911d4e4] Ce-Source:[https://github.com/cloudevents/sdk-go/v2/samples/requester] Ce-Specversion:[1.0] Ce-Time:[2023-08-18T20:16:28.87933523Z] Ce-Type:[com.cloudevents.sample.sent] Content-Length:[33] Content-Type:[application/json] User-Agent:[Go-http-client/1.1]]","commit":"2ec8bc6-dirty","eventlistener":"agogos","namespace":"gallen","/triggers-eventid":"b2e5d2cf-dc2b-45b0-b474-c51e08dc3c6e","eventlistenerUID":"bd026c66-0e12-48ee-8304-e8277a1c65fe","/triggers-eventid":"b2e5d2cf-dc2b-45b0-b474-c51e08dc3c6e"}
{"level":"debug","ts":"2023-08-18T20:16:28.882Z","logger":"eventlistener","caller":"sink/metrics.go:115","msg":"event listener request","commit":"2ec8bc6-dirty","status":"succeeded"}
{"level":"debug","ts":"2023-08-18T20:16:28.882Z","logger":"eventlistener","caller":"sink/metrics.go:100","msg":"event listener request completed","commit":"2ec8bc6-dirty","status":202,"duration":0.000258366}

BROKER OUTPUT:

warning: GOPATH set to GOROOT (/usr/local/go) has no effect
Response:
Context Attributes,
  specversion: 1.0
  type: dev.tekton.event.triggers.accepted.v1
  source: agogos
  subject: gallen.agogos accepted 017b097c-4f3c-4b26-b97a-4ee12d31bfc2
  id: 017b097c-4f3c-4b26-b97a-4ee12d31bfc2
  datacontenttype: application/json
Data,
  {
    "eventListener": "agogos",
    "namespace": "gallen",
    "eventListenerUID": "bd026c66-0e12-48ee-8304-e8277a1c65fe",
    "eventID": "017b097c-4f3c-4b26-b97a-4ee12d31bfc2"
  }

Got Event Response Context: Context Attributes,
  specversion: 1.0
  type: dev.tekton.event.triggers.accepted.v1
  source: agogos
  subject: gallen.agogos accepted 017b097c-4f3c-4b26-b97a-4ee12d31bfc2
  id: 017b097c-4f3c-4b26-b97a-4ee12d31bfc2
  datacontenttype: application/json

Got Response Data: &{Sequence:0 Message:}
----------------------------

BROKER LOG:

{"level":"debug","ts":"2023-08-18T20:18:50.629Z","caller":"logging/config.go:116","msg":"Successfully created the logger."}
{"level":"debug","ts":"2023-08-18T20:18:50.629Z","caller":"logging/config.go:117","msg":"Logging level set to: debug"}
{"level":"debug","ts":"2023-08-18T20:18:50.630Z","logger":"eventlistener","caller":"metrics/metrics_worker.go:76","msg":"Flushing the existing exporter before setting up the new exporter.","commit":"2ec8bc6-dirty"}
{"level":"debug","ts":"2023-08-18T20:18:50.632Z","logger":"eventlistener","caller":"metrics/prometheus_exporter.go:51","msg":"Created Prometheus exporter with config: &{tekton.dev/triggers eventlistener prometheus 5000000000 <nil>  false 9000 0.0.0.0}. Start the server for Prometheus exporter.","commit":"2ec8bc6-dirty"}
{"level":"debug","ts":"2023-08-18T20:18:50.632Z","logger":"eventlistener","caller":"metrics/metrics_worker.go:91","msg":"Successfully updated the metrics exporter; old config: <nil>; new config &{tekton.dev/triggers eventlistener prometheus 5000000000 <nil>  false 9000 0.0.0.0}","commit":"2ec8bc6-dirty"}
{"level":"debug","ts":"2023-08-18T20:18:50.633Z","logger":"eventlistener","caller":"adapter/adapter.go:88","msg":"Creating event broadcaster","commit":"2ec8bc6-dirty"}
{"level":"debug","ts":"2023-08-18T20:21:05.845Z","logger":"eventlistener","caller":"sink/sink.go:177","msg":"handling event with path /gallen/agogos, payload: {\"id\":0,\"message\":\"Hello world!\"} and header: map[Accept-Encoding:[gzip] Ce-Id:[9a3ea66f-d5f3-485d-8e9c-eed7e99a446b] Ce-Source:[https://github.com/cloudevents/sdk-go/v2/samples/requester] Ce-Specversion:[1.0] Ce-Time:[2023-08-18T20:21:05.842938993Z] Ce-Type:[com.cloudevents.sample.sent] Content-Length:[33] Content-Type:[application/json] User-Agent:[Go-http-client/1.1]]","commit":"2ec8bc6-dirty","eventlistener":"agogos","namespace":"gallen","/triggers-eventid":"017b097c-4f3c-4b26-b97a-4ee12d31bfc2","eventlistenerUID":"bd026c66-0e12-48ee-8304-e8277a1c65fe","/triggers-eventid":"017b097c-4f3c-4b26-b97a-4ee12d31bfc2"}
{"level":"debug","ts":"2023-08-18T20:21:05.845Z","logger":"eventlistener","caller":"sink/metrics.go:115","msg":"event listener request","commit":"2ec8bc6-dirty","status":"succeeded"}
{"level":"debug","ts":"2023-08-18T20:21:05.845Z","logger":"eventlistener","caller":"sink/metrics.go:100","msg":"event listener request completed","commit":"2ec8bc6-dirty","status":202,"duration":0.000326446}

from triggers.

khrm avatar khrm commented on July 17, 2024

No. I meant instead of broker sending request to EL, it should send request to CE-Test service. In the above, I can see log from EL.

from triggers.

ggallen avatar ggallen commented on July 17, 2024

@khrm, yes. I replaced the regular EL with the CE-test image.

from triggers.

khrm avatar khrm commented on July 17, 2024

@ggallen But the logs that you shared are from EL.

Can you use the updated deployment file from here:
https://raw.githubusercontent.com/khrm/ce-test/master/deployment.yaml

I removed probes and updated the image with one with logs. Now, you it shouldn't crash.

from triggers.

ggallen avatar ggallen commented on July 17, 2024

@khrm, but how is the broker supposed to be configured to send events to ce-test? With the EL, it subscribes to the broker. I don't see anything similar for ce-test.

from triggers.

khrm avatar khrm commented on July 17, 2024

@ggallen How did you subscribe EL to Broker?

from triggers.

ggallen avatar ggallen commented on July 17, 2024

@khrm, it's subscribed via a knative trigger:

- apiVersion: eventing.knative.dev/v1
  kind: Trigger
  metadata:
    annotations:
      eventing.knative.dev/creator: minikube-user
      eventing.knative.dev/lastModifier: minikube-user
    creationTimestamp: "2023-08-21T19:41:51Z"
    generation: 1
    labels:
      eventing.knative.dev/broker: agogos
    name: agogos
    namespace: gallen
    resourceVersion: "1597"
    uid: bd5b59d5-0470-4013-9269-74ebef5d5cd8
  spec:
    broker: agogos
    filter: {}
    subscriber:
      uri: http://el-agogos.gallen.svc.cluster.local:8080

I suppose I can adjust the trigger to point to ce-test.

from triggers.

khrm avatar khrm commented on July 17, 2024

Yes. Point it to CE-Test service.

from triggers.

ggallen avatar ggallen commented on July 17, 2024

@khrm, here's the result of sending the event to the broker:

[gallen@localhost client] (master)$ kc logs ce-test-5b7b87bfdb-gvwbr
2023/08/22 13:00:41 Starting CE-Test
2023/08/22 13:00:41 listening on :8080/
Got Event Context: Context Attributes,
  specversion: 1.0
  type: com.cloudevents.sample.sent
  source: https://github.com/cloudevents/sdk-go/v2/samples/requester
  id: 34998d02-2ffe-4ad8-81b4-b7b53575e6f3
  time: 2023-08-22T13:06:36.961598534Z
  datacontenttype: application/json
Extensions,
  knativearrivaltime: 2023-08-22T13:06:36.96521175Z

2023/08/22 13:06:36 Handling EventID: 34998d02-2ffe-4ad8-81b4-b7b53575e6f3
Got Data: &{Sequence:0 Message:Hello world!}
----------------------------
2023/08/22 13:06:36 Sending event with ID f00c4892-b72c-43b5-ab00-1e8bbf6bd96e
[gallen@localhost client] (master)$ 

from triggers.

khrm avatar khrm commented on July 17, 2024

So CE-Test doesn't go into the loop?
@ggallen

from triggers.

ggallen avatar ggallen commented on July 17, 2024

@khrm, no, it does not.

from triggers.

khrm avatar khrm commented on July 17, 2024

OK. So issue seems to be with EL. Will come back to you later.

from triggers.

ggallen avatar ggallen commented on July 17, 2024

@khrm, any update?

from triggers.

khrm avatar khrm commented on July 17, 2024

@ggallen I was busy with OpenShift Pipelines 1.12 work. I will update you on Wednesday.

from triggers.

ggallen avatar ggallen commented on July 17, 2024

@khrm?

from triggers.

khrm avatar khrm commented on July 17, 2024

Sorry. I was busy OSP 1.12 release. Now I am looking into this.

from triggers.

khrm avatar khrm commented on July 17, 2024

@ggallen Can you try using mitmproxy to capture response and request for both scenarios?

from triggers.

ggallen avatar ggallen commented on July 17, 2024

@khrm, why? Are you unable to reproduce this issue on your own with all the information I've already provided?

from triggers.

afrittoli avatar afrittoli commented on July 17, 2024

@ggallen @khrm I'm back from PTO and read through the thread.
@khrm would it be possible to use unit tests to recreate the response of the EL and of ce-test and compare them to understand what's going on?

from triggers.

khrm avatar khrm commented on July 17, 2024

@afrittoli I would try to create that.

@ggallen I was busy with OSP 1.12 work. That's why I said. I would look into it this when free.

from triggers.

khrm avatar khrm commented on July 17, 2024

Proxy doesn't reports any significant difference between these two responses. First is by ce-test and then EL.

    Host: 127.0.0.1:8081
    User-Agent: Go-http-client/1.1
    Content-Length: 33
    Ce-Id: 3497e715-50cf-47e6-b672-0ea660bc485a
    Ce-Source: https://github.com/cloudevents/sdk-go/v2/samples/requester
    Ce-Specversion: 1.0
    Ce-Time: 2023-09-05T20:09:08.924977346Z
    Ce-Type: com.cloudevents.sample.sent
    Content-Type: application/json
    Accept-Encoding: gzip

    {
        "id": 0,
        "message": "Hello world!"
    }

 << 200 OK 31b
    Ce-Id: b9c47abc-38f6-4d35-8d1d-f6a7e3e3ce22
    Ce-Source: /test
    Ce-Specversion: 1.0
    Ce-Subject: https://github.com/cloudevents/sdk-go/v2/samples/requester#0
    Ce-Time: 2023-09-05T20:09:08.93055233Z
    Ce-Type: samples.http.test
    Content-Length: 31
    Content-Type: application/json
    Date: Tue, 05 Sep 2023 20:09:08 GMT

    {
        "id": 0,
        "message": "test done!"
    }

[01:39:09.436][[::1]:36502] client disconnect
[01:39:09.439][[::1]:36502] server disconnect 127.0.0.1:8081

[01:39:53.028][[::1]:33736] client connect
[01:39:53.030][[::1]:33736] server connect 127.0.0.1:8081
[::1]:33736: POST http://127.0.0.1:8081/
    Host: 127.0.0.1:8081
    User-Agent: Go-http-client/1.1
    Content-Length: 33
    Ce-Id: 48185ed2-3591-4432-805a-aae4da8198fb
    Ce-Source: https://github.com/cloudevents/sdk-go/v2/samples/requester
    Ce-Specversion: 1.0
    Ce-Time: 2023-09-05T20:09:53.027663988Z
    Ce-Type: com.cloudevents.sample.sent
    Content-Type: application/json
    Accept-Encoding: gzip

    {
        "id": 0,
        "message": "Hello world!"
    }

 << 202 Accepted 157b
    Ce-Id: a9e3ee86-5d60-4940-aab1-c98fd0435e3c
    Ce-Source: listener
    Ce-Specversion: 1.0
    Ce-Subject: default.listener accepted a9e3ee86-5d60-4940-aab1-c98fd0435e3c
    Ce-Type: dev.tekton.event.triggers.accepted.v1
    Content-Length: 157
    Content-Type: application/json
    Date: Tue, 05 Sep 2023 20:09:53 GMT

    {
        "eventID": "a9e3ee86-5d60-4940-aab1-c98fd0435e3c",
        "eventListener": "listener",
        "eventListenerUID": "99f270e1-8c7d-4224-b90d-4dba28617399",
        "namespace": "default"
    }
```

from triggers.

khrm avatar khrm commented on July 17, 2024

@ggallen Can you share your setup with me where you added ce-test? And then EL. I can debug then.

from triggers.

afrittoli avatar afrittoli commented on July 17, 2024

I wonder if the 202 response code is what confuses the broker. Perhaps we could try to have the test app reply with 202 and check if that triggers the loop?

from triggers.

ggallen avatar ggallen commented on July 17, 2024

@ggallen Can you share your setup with me where you added ce-test? And then EL. I can debug then.

@khrm, that was over two weeks ago! I no longer have that setup.

from triggers.

khrm avatar khrm commented on July 17, 2024

@ggallen Can you try installing from this branch? https://github.com/tektoncd/triggers/tree/f-ce-200

Source base install method that I shared with you earlier.

Here I have only changed the HTTP response code which is the only change I could find between ce-test and EL.

from triggers.

khrm avatar khrm commented on July 17, 2024

After this change,
ce-test and EventListener respectively:

[::1]:40490: POST http://127.0.0.1:8081/
    Host: 127.0.0.1:8081
    User-Agent: Go-http-client/1.1
    Content-Length: 33
    Ce-Id: be265890-fa18-482b-a79b-bbe2ef5107ed
    Ce-Source: https://github.com/cloudevents/sdk-go/v2/samples/requester
    Ce-Specversion: 1.0
    Ce-Time: 2023-09-06T15:19:01.089905807Z
    Ce-Type: com.cloudevents.sample.sent
    Content-Type: application/json
    Accept-Encoding: gzip

    {
        "id": 0,
        "message": "Hello world!"
    }

 << 200 OK 31b
    Ce-Id: d3a75085-ec79-4c2c-b602-f5dbf99ff956
    Ce-Source: /test
    Ce-Specversion: 1.0
    Ce-Subject: https://github.com/cloudevents/sdk-go/v2/samples/requester#0
    Ce-Time: 2023-09-06T15:19:01.094062725Z
    Ce-Type: samples.http.test
    Content-Length: 31
    Content-Type: application/json
    Date: Wed, 06 Sep 2023 15:19:01 GMT

    {
        "id": 0,
        "message": "test done!"
    }

[20:49:01.597][[::1]:40490] client disconnect
[20:49:01.598][[::1]:40490] server disconnect 127.0.0.1:8081
[20:49:14.859][[::1]:38932] client connect
[20:49:14.860][[::1]:38932] server connect 127.0.0.1:8081
[::1]:38932: POST http://127.0.0.1:8081/
    Host: 127.0.0.1:8081
    User-Agent: Go-http-client/1.1
    Content-Length: 33
    Ce-Id: d6b2eb54-7335-4f0d-8a4f-a05d70093d47
    Ce-Source: https://github.com/cloudevents/sdk-go/v2/samples/requester
    Ce-Specversion: 1.0
    Ce-Time: 2023-09-06T15:19:14.858389027Z
    Ce-Type: com.cloudevents.sample.sent
    Content-Type: application/json
    Accept-Encoding: gzip

    {
        "id": 0,
        "message": "Hello world!"
    }

 << 200 OK 157b
    Ce-Id: c386e8b7-b485-414b-b559-8fb8b66b86ff
    Ce-Source: listener
    Ce-Specversion: 1.0
    Ce-Subject: default.listener accepted c386e8b7-b485-414b-b559-8fb8b66b86ff
    Ce-Type: dev.tekton.event.triggers.accepted.v1
    Content-Length: 157
    Content-Type: application/json
    Date: Wed, 06 Sep 2023 15:19:14 GMT

    {
        "eventID": "c386e8b7-b485-414b-b559-8fb8b66b86ff",
        "eventListener": "listener",
        "eventListenerUID": "99f270e1-8c7d-4224-b90d-4dba28617399",
        "namespace": "default"
    }

from triggers.

ggallen avatar ggallen commented on July 17, 2024

@khrm, what exactly are you asking me to do? Does this branch have a fix?

from triggers.

afrittoli avatar afrittoli commented on July 17, 2024

@khrm, what exactly are you asking me to do? Does this branch have a fix?

This branch changes the response code from 202 to 200, which is the only difference in behaviour that we could observe between the test app and the event listener. At this stage, it's still a bit unclear to me whether the issue lies in the event listener or in the broker implementation, so it's difficult to build a reproducer test.

from triggers.

ggallen avatar ggallen commented on July 17, 2024

But I thought I already provided enough information so that the problem could be reproduced easily. Is that not the case?

from triggers.

khrm avatar khrm commented on July 17, 2024

@ggallen Unless we set up the broker, we can't debug this easily. Can you try using the above branch?

from triggers.

khrm avatar khrm commented on July 17, 2024

Or you can use the below release yaml:

kubectl apply -f 
https://gist.githubusercontent.com/khrm/5c38684e501e223ed252d0e56389324a/raw/2f3aa9f8a0fe5ca6c9b95452890a6e2e9ab1c44a/gistfile0.txt

kubectl apply -f https://storage.googleapis.com/tekton-releases/triggers/previous/v0.25.0/interceptors.yaml

It's based on the above branch.

from triggers.

khrm avatar khrm commented on July 17, 2024

@ggallen If the issue is the status code, that branch release yaml might fix it. That's the only difference between ce-test and EventListener. Can you please test it?

from triggers.

ggallen avatar ggallen commented on July 17, 2024

@khrm, I applied your patches to my development environment and ran an Agogos build as I did when I initially found this problem. I see thousands of messages in the log still, notably ones like this:

{
  "level": "debug",
  "ts": "2023-09-11T20:08:22.402Z",
  "logger": "eventlistener",
  "caller": "sink/sink.go:177",
  "msg": "handling event with path /, payload: {\"eventListener\":\"agogos\",\"namespace\":\"gallen\",\"eventListenerUID\":\"a12a5403-ec16-4e45-b75a-dfa2b0f21e29\",\"eventID\":\"5ec30608-68e0-4561-9ef2-99b777bc5810\"} and header: map[Accept-Encoding:[gzip] Ce-Id:[5ec30608-68e0-4561-9ef2-99b777bc5810] Ce-Knativearrivaltime:[2023-09-11T20:08:22.314804512Z] Ce-Source:[agogos] Ce-Specversion:[1.0] Ce-Subject:[gallen.agogos accepted 5ec30608-68e0-4561-9ef2-99b777bc5810] Ce-Type:[dev.tekton.event.triggers.accepted.v1] Content-Length:[154] Content-Type:[application/json] Prefer:[reply] Traceparent:[00-2062e4977b673a86a10eb89c2df2b2ae-35bc5eb289dfdbce-00] User-Agent:[Go-http-client/1.1]]",
  "commit": "668863c",
  "eventlistener": "agogos",
  "namespace": "gallen",
  "/triggers-eventid": "b131f417-2fb1-4cf0-94e1-56dd21ba8160",
  "eventlistenerUID": "a12a5403-ec16-4e45-b75a-dfa2b0f21e29",
  "/triggers-eventid": "b131f417-2fb1-4cf0-94e1-56dd21ba8160"
}

I think the fact that I am still seeing messages of type dev.tekton.event.triggers.accepted.v1. I would not expect to see any of those if this problem was fixed.

from triggers.

khrm avatar khrm commented on July 17, 2024

I would publish a branch cloud events sdk. Wait.

from triggers.

khrm avatar khrm commented on July 17, 2024

@ggallen This doesn't look like the issue with EL.

I tested with ce-test and it has looping too. The only thing is that the CE-test gives 400 responses after a while and this breaks the loop.

from triggers.

khrm avatar khrm commented on July 17, 2024

A workaround for this:

kc edit configmap config-features -n knative-eventing

Enable the new filtering in the configmap.

  new-trigger-filters: enabled

In the Trigger, I put the following:

apiVersion: eventing.knative.dev/v1
kind: Trigger
metadata:
  annotations:
    eventing.knative.dev/creator: minikube-user
    eventing.knative.dev/lastModifier: minikube-user
  labels:
    eventing.knative.dev/broker: agogos
  name: agogos
  namespace: tests
  resourceVersion: "26811"
  uid: 408f1b62-876f-42c4-bd5b-19e1afa0e015
spec:
  broker: agogos
  filter: {}
  filters:
  - not:
      exact:
        type: dev.tekton.event.triggers.accepted.v1
  subscriber:
    uri: http://el-agogos.tests.svc.cluster.local:8080

from triggers.

ggallen avatar ggallen commented on July 17, 2024

@khrm, are you saying there's not a problem here somewhere?

While I have no doubt your workaround will work, it's just masking the problem. Somewhere, something still has to process all those messages to filter out what isn't wanted. And whatever it is is still going to get thousands of messages it shouldn't get.

Actually, you're cutting alot of that off with the filter. But it's still masking what I think is a real problem, and will just happen with the next EL.

from triggers.

khrm avatar khrm commented on July 17, 2024

@ggallen There won't be thousand of messages if it's filter first time.

Even without EventLisener, if I don't use filter, it start looping.

from triggers.

ggallen avatar ggallen commented on July 17, 2024

@khrm,

There won't be thousand of messages if it's filter first time.

Agreed, but it will still mean I'm going to get double the number of messages. Every time an accept message is sent, I get a message back that has to be filtered.

And since I'm using CloudEvents, I have to put this workaround on every EL. Any users of our software also have to also put it on any ELs that they may create.

It may not be an EL problem, but it still seems like a problem. It seems like it is actually a serious problem, regardless of where the problem lies. Don't you agree?

from triggers.

khrm avatar khrm commented on July 17, 2024

@khrm,

Agreed, but it will still mean I'm going to get double the number of messages. Every time an accept message is sent, I get a message back that has to be filtered.

Yes. That's true but that's how cloudevents work.

You send an event and you get an event back as response. Now you filter that response.

@afrittoli What could be the solution for this? Should we not send back the event for Cloudevents request? Or have a flag or annotation somewhere to disable sending back cloudevent as response?

from triggers.

ggallen avatar ggallen commented on July 17, 2024

That's true but that's how cloudevents work.

I can't believe this is how it's supposed to work. At least it's not how any other messaging system I have worked with works. Sending an ack should not result in an ack of the ack.

You send an event and you get an event back as a response. Now you filter that response.

If this is, in fact, true, even for an ack, then the EL should automatically filter all of those messages.

from triggers.

khrm avatar khrm commented on July 17, 2024

@ggallen It's not an ack only. It also contains information regarding what was event id generated during the processing.

	body := Response{
		EventListener:    r.EventListenerName,
		EventListenerUID: elUID,
		Namespace:        r.EventListenerNamespace,
		EventID:          eventID,
	}

from triggers.

khrm avatar khrm commented on July 17, 2024

@ggallen Can you use cloud event extension? A header with something like NO_ACK? Then, EventListener won't send back a response. Just a basic accepted response.
https://github.com/cloudevents/spec/blob/main/cloudevents/primer.md#cloudevent-extension-attributes

@afrittoli Is this the correct way to go about this?

from triggers.

khrm avatar khrm commented on July 17, 2024

It would be like a cloudevent extension.

from triggers.

khrm avatar khrm commented on July 17, 2024

So after looking into cloud event spec and SDK.
You can SetExtension.

And EventListener can GetExtension

Based on the extension value, EventListener can determine whether to send an event or not. We can name the extension as ack. If it's false, we won't send a cloudevent as a response. If it's true or not present, we will send an event.

from triggers.

gvarsamis avatar gvarsamis commented on July 17, 2024

@afrittoli Can you please confirm whether the extension route is the way to go here as the correct pattern to use and that the EL is intentionally sending 2 ACKS. One for the message and another ACK for the ACK.
We are also a bit concerned with using an extension as according to the primer, they could get pulled at any time and this will create technical debt for us and we'd have to make changes to our codebase if this happens, so we'd rather adopt the correct usage pattern than use an extension if they are not the correct pattern.

from triggers.

afrittoli avatar afrittoli commented on July 17, 2024

@khrm @gvarsamis Reading through the thread, my impression is that the event listener is behaving according to specification (perhaps with the exception of the status code 202 instead of 200).

The broker and even listener communicate through the CloudEvents HTTP Binding - which means communication follows a request/response pattern. The CloudEvent binding requires the HTTP client to produce a response that is valid according to the CloudEvents protocol - that means, if a payload is provided in the response, it must be a valid CloudEvent.

In the past, the event listener use to respond with a JSON payload - not in CloudEvents format - which caused issues with the broker - see #1439 for more context. The fix for that has been to format the previous JSON payload as a CloudEvent.

At this stage my perception is that the issue is no the broker side - it should not reply to an HTTP response with a new request. Once that HTTP request is sent to the event listener, the event listener replies to it as it should with an HTTP response. From an event listener point of view there is no connection between the first message from the broker and the second one.

What we could do is to reach out to the knative/cloudevents communities to clarify this behaviour with them.
In the meanwhile I see two options:

  • apply the filtering logic that @khrm suggested on the broker side
  • add an option in the event listener to respond to CloudEvents with an empty payload - but we would need to verify if that actually helps

from triggers.

ggallen avatar ggallen commented on July 17, 2024

@afrittoli, thanks for the information. But do you agree that this is not expected behavior? There is a bug here somewhere? I agree it must the the broker if the EL is responding as it should. So it seems the issue has been narrowed to the broker.

I would like to get the bug fixed as it seems like this could be an issue for anyone using CloudEvents.

We are currently using the in-memory channel. Would changing to Apache Kafka potentially solve this problem? Or is it likely more of a problem with the broker and not the channel?

I'm OK with the workaround, but would really like to get a solution. The problem with the workaround is that it will work for our one Trigger, but any user of our system who creates their own Trigger would not know to create this filter.

@khrm , could you package up the small test case, create an issue for the broker, and see what they say? Please include me on it so I can track its progress.

from triggers.

khrm avatar khrm commented on July 17, 2024

Sure. I would create a test case to reproduce this without EL.

from triggers.

ggallen avatar ggallen commented on July 17, 2024

@khrm, did you create an issue for this yet?

from triggers.

khrm avatar khrm commented on July 17, 2024

@ggallen No. I didn't get time. We released Operator this week and are having f2f.

from triggers.

khrm avatar khrm commented on July 17, 2024

@ggallen I have created the issue here: knative/eventing#7356

from triggers.

khrm avatar khrm commented on July 17, 2024

@ggallen I think you can contact people from the Red Hat OpenShift Serverless team regarding Knative Eventing.

Or some other upstream maintainers.

from triggers.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.