open-telemetry / opentelemetry-go-instrumentation
OpenTelemetry Auto Instrumentation using eBPF
Home Page: https://opentelemetry.io
License: Apache License 2.0
otel-go-instrumentation does not work as expected in a local Ubuntu 22.04 virtual machine running on VMware.
Steps to reproduce the behavior:
Follow the getting-started docs (https://github.com/open-telemetry/opentelemetry-go-instrumentation/tree/main/docs/getting-started).
kubectl apply -f emojivoto-instrumented.yaml -n emojivoto
The otel-go-instrumentation containers do not work as expected.
Exec into the kind node's Docker container and run:
crictl ps -a
CONTAINER IMAGE CREATED STATE NAME ATTEMPT POD ID POD
c6ad06edfb1fc 0fc3761f3f87d 3 minutes ago Exited emojivoto-emoji-instrumentation 25 b41b8e14bc802 emoji-5c59d4f5f9-8g6nq
431bdddf06d93 0fc3761f3f87d 3 minutes ago Exited emojivoto-web-instrumentation 25 2bbe9a5bffdbb web-7d9f79f7d7-gwbtt
39b22e4c6f6cf 0fc3761f3f87d 3 minutes ago Exited emojivoto-voting-instrumentation 25 603e0f2e4ac92 voting-574dff6f47-wkd5l
Then check the first instrumentation container's logs:
crictl logs c6ad06edfb1fc
{"level":"info","ts":1683201490.9597812,"caller":"cli/main.go:37","msg":"starting Go OpenTelemetry Agent ..."}
{"level":"info","ts":1683201490.959896,"caller":"opentelemetry/controller.go:119","msg":"Establishing connection to OTLP receiver ..."}
{"level":"info","ts":1683201492.962331,"caller":"process/discover.go:55","msg":"found process","pid":19}
{"level":"info","ts":1683201492.963499,"caller":"ptrace/ptrace_linux.go:121","msg":"attach successfully","tid":19}
{"level":"info","ts":1683201492.9638128,"caller":"ptrace/ptrace_linux.go:121","msg":"attach successfully","tid":20}
{"level":"info","ts":1683201492.964077,"caller":"ptrace/ptrace_linux.go:121","msg":"attach successfully","tid":21}
{"level":"info","ts":1683201492.9644198,"caller":"ptrace/ptrace_linux.go:121","msg":"attach successfully","tid":22}
{"level":"info","ts":1683201492.9648137,"caller":"ptrace/ptrace_linux.go:121","msg":"attach successfully","tid":23}
{"level":"info","ts":1683201492.9651833,"caller":"ptrace/ptrace_linux.go:121","msg":"attach successfully","tid":24}
{"level":"info","ts":1683201492.9655643,"caller":"ptrace/ptrace_linux.go:121","msg":"attach successfully","tid":25}
{"level":"info","ts":1683201492.9658918,"caller":"ptrace/ptrace_linux.go:121","msg":"attach successfully","tid":44}
{"level":"info","ts":1683201492.9662254,"caller":"ptrace/ptrace_linux.go:121","msg":"attach successfully","tid":448}
{"level":"info","ts":1683201492.982999,"caller":"process/analyze.go:94","msg":"Detaching from process","pid":19}
{"level":"info","ts":1683201492.9832432,"caller":"process/analyze.go:139","msg":"mmaped remote memory","start_addr":"7F7175000000","end_addr":"7F7176800000"}
{"level":"info","ts":1683201492.9944782,"caller":"process/analyze.go:168","msg":"found relevant function for instrumentation","function":"net/http.HandlerFunc.ServeHTTP","start":2870464,"returns":[2870541]}
{"level":"info","ts":1683201492.9962378,"caller":"process/analyze.go:168","msg":"found relevant function for instrumentation","function":"google.golang.org/grpc/internal/transport.(*http2Client).createHeaderFields","start":4254816,"returns":[4260087,4266761,4266816]}
{"level":"info","ts":1683201492.9970098,"caller":"process/analyze.go:168","msg":"found relevant function for instrumentation","function":"google.golang.org/grpc/internal/transport.(*http2Server).operateHeaders","start":4296672,"returns":[4297106,4298064,4298299,4298532,4299573,4301293]}
{"level":"info","ts":1683201492.9972656,"caller":"process/analyze.go:168","msg":"found relevant function for instrumentation","function":"google.golang.org/grpc.(*ClientConn).Invoke","start":4758080,"returns":[4758386,4758556]}
{"level":"info","ts":1683201492.9977455,"caller":"process/analyze.go:168","msg":"found relevant function for instrumentation","function":"google.golang.org/grpc.(*Server).handleStream","start":4857024,"returns":[4857689,4859097,4859906,4860203,4860308]}
{"level":"info","ts":1683201492.9978814,"caller":"cli/main.go:79","msg":"target process analysis completed","pid":19,"go_version":"1.15.0","dependencies":{"contrib.go.opencensus.io/exporter/ocagent":"v0.6.0","github.com/beorn7/perks":"v1.0.1","github.com/census-instrumentation/opencensus-proto":"v0.2.1","github.com/cespare/xxhash/v2":"v2.1.1","github.com/golang/groupcache":"v0.0.0-20200121045136-8c9f03a8e57e","github.com/golang/protobuf":"v1.4.0","github.com/grpc-ecosystem/go-grpc-prometheus":"v1.2.0","github.com/grpc-ecosystem/grpc-gateway":"v1.14.4","github.com/matttproud/golang_protobuf_extensions":"v1.0.1","github.com/prometheus/client_golang":"v1.6.0","github.com/prometheus/client_model":"v0.2.0","github.com/prometheus/common":"v0.9.1","github.com/prometheus/procfs":"v0.0.11","go.opencensus.io":"v0.22.3","golang.org/x/net":"v0.0.0-20200425230154-ff2c4b7c35a0","golang.org/x/sync":"v0.0.0-20200317015054-43a5402ce75a","golang.org/x/sys":"v0.0.0-20200430082407-1f5687305801","golang.org/x/text":"v0.3.2","google.golang.org/api":"v0.22.0","google.golang.org/genproto":"v0.0.0-20200430143042-b979b6f78d84","google.golang.org/grpc":"v1.29.1","google.golang.org/protobuf":"v1.21.0"},"total_functions_found":5}
{"level":"info","ts":1683201492.9980018,"caller":"cli/main.go:85","msg":"invoking instrumentors"}
{"level":"info","ts":1683201493.012342,"logger":"allocator","caller":"allocator/allocator_linux.go:43","msg":"Loading allocator","start_addr":140125270966272,"end_addr":140125296132096}
{"level":"info","ts":1683201493.0141346,"caller":"instrumentors/runner.go:85","msg":"loading instrumentor","name":"net/http"}
{"level":"info","ts":1683201493.0148523,"caller":"inject/injector.go:91","msg":"Injecting variables","vars":{"ctx_ptr_pos":232,"is_registers_abi":false,"method_ptr_pos":0,"path_ptr_pos":56,"url_ptr_pos":16}}
{"level":"info","ts":1683201493.037925,"caller":"instrumentors/runner.go:85","msg":"loading instrumentor","name":"google.golang.org/grpc"}
{"level":"info","ts":1683201493.0421581,"caller":"inject/injector.go:91","msg":"Injecting variables","vars":{"clientconn_target_ptr_pos":24,"end_addr":140125296132096,"is_registers_abi":false,"start_addr":140125270966272,"total_cpus":2}}
{"level":"error","ts":1683201493.0505607,"caller":"instrumentors/runner.go:88","msg":"error while loading instrumentors, cleaning up","name":"google.golang.org/grpc","error":"field UprobeHttp2ClientCreateHeaderFields: program uprobe_Http2Client_CreateHeaderFields: load program: permission denied: 983: (bf) r2 = r0 ; R0=scalar(id=53) R2_w=scalar(id=53): 984: (67) r2 <<= 32 ; R2_w=scalar(smax=9223372032559808512,umax=18446744069414584320,var_off=(0x0; 0xffffffff00000000),s32_min=0,s3 (truncated, 1045 line(s) omitted)","stacktrace":"go.opentelemetry.io/auto/pkg/instrumentors.(*Manager).load\n\t/app/pkg/instrumentors/runner.go:88\ngo.opentelemetry.io/auto/pkg/instrumentors.(*Manager).Run\n\t/app/pkg/instrumentors/runner.go:36\nmain.main\n\t/app/cli/main.go:86\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250"}
{"level":"info","ts":1683201493.050718,"caller":"server/probe.go:242","msg":"closing gRPC server instrumentor"}
{"level":"info","ts":1683201493.0507877,"caller":"server/probe.go:217","msg":"closing net/http instrumentor"}
{"level":"info","ts":1683201493.0802684,"caller":"grpc/probe.go:241","msg":"closing gRPC instrumentor"}
{"level":"error","ts":1683201493.0803745,"caller":"cli/main.go:88","msg":"error while running instrumentors","error":"field UprobeHttp2ClientCreateHeaderFields: program uprobe_Http2Client_CreateHeaderFields: load program: permission denied: 983: (bf) r2 = r0 ; R0=scalar(id=53) R2_w=scalar(id=53): 984: (67) r2 <<= 32 ; R2_w=scalar(smax=9223372032559808512,umax=18446744069414584320,var_off=(0x0; 0xffffffff00000000),s32_min=0,s3 (truncated, 1045 line(s) omitted)","stacktrace":"main.main\n\t/app/cli/main.go:88\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250"}
Expected behavior: everything works as expected.
emojivoto-emoji-instrumentation:
Container ID: containerd://d90fdecaaaf9d70b0dc8b5e798a2894325d99094394eea0766e82cb519de746d
Image: otel-go-instrumentation
Image ID: docker.io/library/import-2023-05-04@sha256:e1e0a2ede7b8d3edfc5eb3735ba942d48ad60a5366f484a9ef42b38ac3fb36da
Port: <none>
Host Port: <none>
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Completed
Exit Code: 0
Started: Thu, 04 May 2023 13:13:27 +0100
Finished: Thu, 04 May 2023 13:13:30 +0100
Ready: False
Restart Count: 28
Environment:
OTEL_GO_AUTO_TARGET_EXE: /usr/local/bin/emojivoto-emoji-svc
OTEL_EXPORTER_OTLP_ENDPOINT: http://jaeger:4317
OTEL_SERVICE_NAME: emojivoto-emoji
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-d4rfb (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
kube-api-access-d4rfb:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning BackOff 1s (x548 over 119m) kubelet Back-off restarting failed container
Review hardcoded logsize and whether we should allow users to adjust via configuration.
Originally posted by @MikeGoldsmith in #128 (review)
Is your feature request related to a problem? Please describe.
At the moment we have a make target for generating and checking the LICENSES folder, but no automated process, so things could get out of sync.
Describe the solution you'd like
Add a CI step that runs make verify-licenses. This could be its own workflow or could be added to the existing build workflow.
For some reason I can't access:
https://github.com/orgs/open-telemetry/teams/go-instrumentation-approvers
https://github.com/orgs/open-telemetry/teams/go-instrumentaiton-maintainers
Both return a 404.
_OTHER
http request method
error.type
network.protocol.name (if not http)
url.full
We should document the targeted minimum kernel version for GA.
It needs to support bpf_ktime_get_boot_ns calls.
Originally posted by @dineshg13 in #22 (review)
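bpf_ktime_get_boot_ns first appeared in Linux 5.8, so any documented minimum would need to be at least that. The sketch below shows how a user-space pre-flight check might compare the running kernel's release string (as printed by uname -r) against that floor. The 5.8 floor, the function names, and the idea of checking at agent startup are assumptions for illustration, not current agent behaviour.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// minKernel is an assumed floor: bpf_ktime_get_boot_ns first shipped in Linux 5.8.
var minKernel = [2]int{5, 8}

// parseKernelRelease extracts the major and minor numbers from a release
// string such as "5.19.0-1023-aws" (the format printed by `uname -r`).
func parseKernelRelease(release string) (major, minor int, err error) {
	parts := strings.SplitN(release, ".", 3)
	if len(parts) < 2 {
		return 0, 0, fmt.Errorf("unexpected kernel release %q", release)
	}
	major, err = strconv.Atoi(parts[0])
	if err != nil {
		return 0, 0, fmt.Errorf("bad major version in %q: %w", release, err)
	}
	// The minor component may carry a suffix (e.g. "8-rc1"), so keep only leading digits.
	minorStr := parts[1]
	end := 0
	for end < len(minorStr) && minorStr[end] >= '0' && minorStr[end] <= '9' {
		end++
	}
	minor, err = strconv.Atoi(minorStr[:end])
	if err != nil {
		return 0, 0, fmt.Errorf("bad minor version in %q: %w", release, err)
	}
	return major, minor, nil
}

// kernelSupported reports whether release is at least minKernel.
func kernelSupported(release string) (bool, error) {
	major, minor, err := parseKernelRelease(release)
	if err != nil {
		return false, err
	}
	if major != minKernel[0] {
		return major > minKernel[0], nil
	}
	return minor >= minKernel[1], nil
}
```

Failing early with a clear message would likely be friendlier than a verifier rejection at program load time.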
Currently we have two HTTP-based probes (net/http and gorilla) and a third to be added in #100. The probes are almost identical apart from the registration name and could be unified; the instrumentors would then deploy the unified probe instead of a custom version. Each probe is based on the net/http ServeHTTP(w ResponseWriter, r *Request) interface.
This would make probe maintenance easier by reducing code, and would allow improvements, e.g. HTTP response parsing, to be applied generically.
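A rough sketch of the unification, assuming each library needs only a registration name and the ServeHTTP symbol to attach to. The serveHTTPProbe type and unifiedServeHTTPProbes function are hypothetical, and the gorilla/mux symbol name is an assumption about how that method appears in a binary's symbol table.

```go
package main

import (
	"fmt"
	"sort"
)

// serveHTTPProbe is a hypothetical, library-agnostic description of a uprobe
// on a ServeHTTP(w http.ResponseWriter, r *http.Request) method. Because every
// target shares that signature, one eBPF program is assumed to be able to read
// the request arguments the same way for all of them; only these fields vary.
type serveHTTPProbe struct {
	LibraryName string // registration name reported on spans
	Symbol      string // Go symbol the uprobe attaches to
}

// unifiedServeHTTPProbes expands a library->symbol table into probe
// descriptions, sorted by library name for a deterministic loading order.
func unifiedServeHTTPProbes(targets map[string]string) []serveHTTPProbe {
	probes := make([]serveHTTPProbe, 0, len(targets))
	for lib, sym := range targets {
		probes = append(probes, serveHTTPProbe{LibraryName: lib, Symbol: sym})
	}
	sort.Slice(probes, func(i, j int) bool {
		return probes[i].LibraryName < probes[j].LibraryName
	})
	return probes
}

// String renders a probe description for logging.
func (p serveHTTPProbe) String() string {
	return fmt.Sprintf("%s -> uprobe on %s", p.LibraryName, p.Symbol)
}
```

Adding a third router would then mean adding one table entry rather than another copy of the probe.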
TODOs
Is your feature request related to a problem? Please describe.
As mentioned in #46, I added sample apps for gorillamux and net/http. If someone else wants to add a gRPC app, I think the first two serve as good examples. New libraries should be able to drop in similarly.
Describe the solution you'd like
A sample app for gRPC instrumentation with similar behavior to the gorillamux and nethttp example apps, with a test to verify output.
Change OTEL_TARGET_EXE to OTEL_GO_AUTO_TARGET_EXE to follow https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/configuration/sdk-environment-variables.md#language-specific-environment-variables:
To ensure consistent naming across projects, this specification recommends that language specific environment variables are formed using the following convention:
OTEL_{LANGUAGE}_{FEATURE}
We use the same pattern in OTel .NET Auto.
Running otel-go-instrumentation on a simple binary produces the following failure:
{"level":"info","ts":1668622016.422307,"caller":"cli/main.go:37","msg":"starting Go OpenTelemetry Agent ..."}
{"level":"info","ts":1668622016.422393,"caller":"opentelemetry/controller.go:107","msg":"Establishing connection to OpenTelemetry collector ..."}
{"level":"info","ts":1668622018.4382257,"caller":"process/discover.go:57","msg":"found process","pid":157218}
panic: cant find keyval map
goroutine 1 [running]:
github.com/open-telemetry/opentelemetry-go-instrumentation/pkg/process.(*processAnalyzer).findKeyvalMmap(0xb48c60?, 0xc0001328c0?)
/usr/local/google/home/mikedame/go/src/github.com/open-telemetry/opentelemetry-go-instrumentation/pkg/process/analyze.go:92 +0x14f
github.com/open-telemetry/opentelemetry-go-instrumentation/pkg/process.(*processAnalyzer).Analyze(0xc000482120?, 0x26622, 0x2?)
/usr/local/google/home/mikedame/go/src/github.com/open-telemetry/opentelemetry-go-instrumentation/pkg/process/analyze.go:118 +0x1a9
main.main()
/usr/local/google/home/mikedame/go/src/github.com/open-telemetry/opentelemetry-go-instrumentation/cli/main.go:74 +0x37f
Environment: golang:1.18 Docker container.
I am building the following code (from here):
package main

import (
	"fmt"
	"log"
	"net/http"

	"github.com/gorilla/mux"
)

func main() {
	r := mux.NewRouter()
	r.HandleFunc("/books/{title}/page/{page}", func(w http.ResponseWriter, r *http.Request) {
		vars := mux.Vars(r)
		title := vars["title"]
		page := vars["page"]
		fmt.Fprintf(w, "You've requested the book: %s on page %s\n", title, page)
	})
	log.Fatal(http.ListenAndServe(":8080", r))
}
Save it, run go mod init and go mod tidy, then build with go build -o gm main.go.
Run with ./gm &
Start a local OTEL collector at localhost:4317
Build the auto-instrumentation tool with make build
Run auto-instrumentation with the following command:
OTEL_SERVICE_NAME=otel-collector \
OTEL_EXPORTER_OTLP_ENDPOINT=localhost:4317 \
OTEL_TARGET_EXE=/path/to/gm ./otel-go-instrumentation
Expected: auto-instrumentation works, or it provides clearer information about what went wrong and how I can fix it.
I'm still figuring out eBPF, but from the code, the error comes from the process analyzer (see the stack trace above). I got enough from there to check /proc/<id>/maps, and it looks like none of the maps have the full read/write/execute permissions that auto-instrumentation needs:
$ cat /proc/157218/maps
00400000-00623000 r-xp 00000000 fe:00 8129336 /path/to/gm
00623000-00823000 r--p 00223000 fe:00 8129336 /path/to/gm
00823000-00861000 rw-p 00423000 fe:00 8129336 /path/to/gm
00861000-0089a000 rw-p 00000000 00:00 0
01b66000-01b87000 rw-p 00000000 00:00 0 [heap]
c000000000-c000400000 rw-p 00000000 00:00 0
c000400000-c004000000 ---p 00000000 00:00 0
7fe4ac000000-7fe4ac021000 rw-p 00000000 00:00 0
7fe4ac021000-7fe4b0000000 ---p 00000000 00:00 0
7fe4b0000000-7fe4b0021000 rw-p 00000000 00:00 0
7fe4b0021000-7fe4b4000000 ---p 00000000 00:00 0
7fe4b4000000-7fe4b4021000 rw-p 00000000 00:00 0
7fe4b4021000-7fe4b8000000 ---p 00000000 00:00 0
7fe4b8000000-7fe4b8021000 rw-p 00000000 00:00 0
7fe4b8021000-7fe4bc000000 ---p 00000000 00:00 0
7fe4bc000000-7fe4bc021000 rw-p 00000000 00:00 0
7fe4bc021000-7fe4c0000000 ---p 00000000 00:00 0
7fe4c02ea000-7fe4c02eb000 ---p 00000000 00:00 0
7fe4c02eb000-7fe4c0aeb000 rw-p 00000000 00:00 0
7fe4c0aeb000-7fe4c0aec000 ---p 00000000 00:00 0
7fe4c0aec000-7fe4c12ec000 rw-p 00000000 00:00 0
7fe4c12ec000-7fe4c12ed000 ---p 00000000 00:00 0
7fe4c12ed000-7fe4c1aed000 rw-p 00000000 00:00 0
7fe4c1aed000-7fe4c1aee000 ---p 00000000 00:00 0
7fe4c1aee000-7fe4c22ee000 rw-p 00000000 00:00 0
7fe4c22ee000-7fe4c22ef000 ---p 00000000 00:00 0
7fe4c22ef000-7fe4c4e00000 rw-p 00000000 00:00 0
7fe4c4e00000-7fe4d4f80000 ---p 00000000 00:00 0
7fe4d4f80000-7fe4d4f81000 rw-p 00000000 00:00 0
7fe4d4f81000-7fe4e6e30000 ---p 00000000 00:00 0
7fe4e6e30000-7fe4e6e31000 rw-p 00000000 00:00 0
7fe4e6e31000-7fe4e9206000 ---p 00000000 00:00 0
7fe4e9206000-7fe4e9207000 rw-p 00000000 00:00 0
7fe4e9207000-7fe4e9600000 ---p 00000000 00:00 0
7fe4e9600000-7fe4e9628000 r--p 00000000 fe:00 2359372 /usr/lib/x86_64-linux-gnu/libc.so.6
7fe4e9628000-7fe4e9798000 r-xp 00028000 fe:00 2359372 /usr/lib/x86_64-linux-gnu/libc.so.6
7fe4e9798000-7fe4e97f0000 r--p 00198000 fe:00 2359372 /usr/lib/x86_64-linux-gnu/libc.so.6
7fe4e97f0000-7fe4e97f4000 r--p 001f0000 fe:00 2359372 /usr/lib/x86_64-linux-gnu/libc.so.6
7fe4e97f4000-7fe4e97f6000 rw-p 001f4000 fe:00 2359372 /usr/lib/x86_64-linux-gnu/libc.so.6
7fe4e97f6000-7fe4e9803000 rw-p 00000000 00:00 0
7fe4e9810000-7fe4e9860000 rw-p 00000000 00:00 0
7fe4e9860000-7fe4e98e0000 ---p 00000000 00:00 0
7fe4e98e0000-7fe4e98e1000 rw-p 00000000 00:00 0
7fe4e98e1000-7fe4e9960000 ---p 00000000 00:00 0
7fe4e9960000-7fe4e99c3000 rw-p 00000000 00:00 0
7fe4e99c3000-7fe4e99c4000 r--p 00000000 fe:00 2366791 /usr/lib/x86_64-linux-gnu/libpthread.so.0
7fe4e99c4000-7fe4e99c5000 r-xp 00001000 fe:00 2366791 /usr/lib/x86_64-linux-gnu/libpthread.so.0
7fe4e99c5000-7fe4e99c6000 r--p 00002000 fe:00 2366791 /usr/lib/x86_64-linux-gnu/libpthread.so.0
7fe4e99c6000-7fe4e99c7000 r--p 00002000 fe:00 2366791 /usr/lib/x86_64-linux-gnu/libpthread.so.0
7fe4e99c7000-7fe4e99c8000 rw-p 00003000 fe:00 2366791 /usr/lib/x86_64-linux-gnu/libpthread.so.0
7fe4e99cf000-7fe4e99e1000 rw-p 00000000 00:00 0
7fe4e99e1000-7fe4e99e3000 r--p 00000000 fe:00 2359341 /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
7fe4e99e3000-7fe4e9a09000 r-xp 00002000 fe:00 2359341 /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
7fe4e9a09000-7fe4e9a14000 r--p 00028000 fe:00 2359341 /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
7fe4e9a14000-7fe4e9a16000 r--p 00033000 fe:00 2359341 /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
7fe4e9a16000-7fe4e9a18000 rw-p 00035000 fe:00 2359341 /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
7fff80864000-7fff80886000 rw-p 00000000 00:00 0 [stack]
7fff809be000-7fff809c2000 r--p 00000000 00:00 0 [vvar]
7fff809c2000-7fff809c4000 r-xp 00000000 00:00 0 [vdso]
I'm not sure what to do with this or how to fix it.
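To check a target quickly, the permission column of /proc/<pid>/maps can be scanned for a region whose permissions start with rwx; in the listing above no such region exists. This is a hypothetical helper written for diagnosis, not the project's analyze.go, and it assumes "read/write/execute" means an rwx prefix in the second column.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// memRegion is one line of /proc/<pid>/maps.
type memRegion struct {
	Start, End uint64
	Perms      string // e.g. "r-xp", "rw-p"
}

// parseMaps parses the textual contents of /proc/<pid>/maps.
func parseMaps(contents string) ([]memRegion, error) {
	var regions []memRegion
	sc := bufio.NewScanner(strings.NewReader(contents))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) < 2 {
			continue
		}
		bounds := strings.SplitN(fields[0], "-", 2)
		if len(bounds) != 2 {
			return nil, fmt.Errorf("malformed address range %q", fields[0])
		}
		start, err := strconv.ParseUint(bounds[0], 16, 64)
		if err != nil {
			return nil, err
		}
		end, err := strconv.ParseUint(bounds[1], 16, 64)
		if err != nil {
			return nil, err
		}
		regions = append(regions, memRegion{Start: start, End: end, Perms: fields[1]})
	}
	return regions, sc.Err()
}

// findRWXRegion returns the first region mapped read+write+execute.
func findRWXRegion(regions []memRegion) (memRegion, bool) {
	for _, r := range regions {
		if strings.HasPrefix(r.Perms, "rwx") {
			return r, true
		}
	}
	return memRegion{}, false
}
```

In practice the input would come from os.ReadFile(fmt.Sprintf("/proc/%d/maps", pid)).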
I receive a context deadline exceeded error when the agent is configured to send to a TLS OTLP endpoint.
Linux 5.19.0-1023-aws #24~22.04.1-Ubuntu SMP Wed Mar 29 15:23:31 UTC 2023 x86_64 GNU/Linux
go version go1.18.1 linux/amd64
Steps to reproduce the behavior:
make build
# sudo isn't needed to repro, but to rule out a permissions issue
sudo \
OTEL_TARGET_EXE=/usr/bin/ls \
OTEL_SERVICE_NAME=context-deadline-sadness \
OTEL_EXPORTER_OTLP_ENDPOINT=api.honeycomb.io:443 \
./otel-go-instrumentation
{
"level": "info",
"ts": 1681932098.3001456,
"caller": "cli/main.go:37",
"msg": "starting Go OpenTelemetry Agent ..."
}
{
"level": "info",
"ts": 1681932098.3007176,
"caller": "opentelemetry/controller.go:107",
"msg": "Establishing connection to OpenTelemetry collector ..."
}
{
"level": "error",
"ts": 1681932108.3095894,
"caller": "opentelemetry/controller.go:113",
"msg": "unable to connect to OpenTelemetry collector",
"addr": "api.honeycomb.io:443",
"error": "context deadline exceeded",
"stacktrace": "github.com/open-telemetry/opentelemetry-go-instrumentation/pkg/opentelemetry.NewController\n\t/home/ubuntu/opentelemetry-go-instrumentation/pkg/opentelemetry/controller.go:113\nmain.main\n\t/home/ubuntu/opentelemetry-go-instrumentation/cli/main.go:45\nruntime.main\n\t/usr/lib/go-1.18/src/runtime/proc.go:250"
}
{
"level": "error",
"ts": 1681932108.3097224,
"caller": "cli/main.go:47",
"msg": "unable to create OpenTelemetry controller",
"error": "context deadline exceeded",
"stacktrace": "main.main\n\t/home/ubuntu/opentelemetry-go-instrumentation/cli/main.go:47\nruntime.main\n\t/usr/lib/go-1.18/src/runtime/proc.go:250"
}
I expect a secure connection to be established. I can make it happen by changing the current hard-coded insecure gRPC config to a hard-coded TLS config:
- conn, err := grpc.DialContext(timeoutContext, endpoint, grpc.WithInsecure(), grpc.WithBlock())
+ conn, err := grpc.DialContext(timeoutContext, endpoint, grpc.WithTransportCredentials(credentials.NewTLS(&tls.Config{})), grpc.WithBlock())
I imagine the proper fix will be some shenanigans for supporting more of the OTEL_EXPORTER_OTLP_* environment variable config surface area.
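One piece of that surface is the OTLP exporter rule for gRPC: an https:// endpoint uses TLS, http:// uses plaintext, and a scheme-less endpoint such as api.honeycomb.io:443 defers to OTEL_EXPORTER_OTLP_INSECURE, which defaults to false (TLS). The useInsecure helper below is a hypothetical sketch of that decision only; its result would then select between grpc.WithTransportCredentials(insecure.NewCredentials()) and grpc.WithTransportCredentials(credentials.NewTLS(&tls.Config{})).

```go
package main

import "strings"

// useInsecure decides whether an OTLP/gRPC exporter should dial without TLS.
// An explicit "https://" scheme means TLS, "http://" means plaintext, and a
// scheme-less endpoint defers to OTEL_EXPORTER_OTLP_INSECURE (default: TLS).
func useInsecure(endpoint, insecureEnv string) bool {
	switch {
	case strings.HasPrefix(endpoint, "https://"):
		return false
	case strings.HasPrefix(endpoint, "http://"):
		return true
	default:
		return strings.EqualFold(strings.TrimSpace(insecureEnv), "true")
	}
}
```

Under this rule, scheme-less local endpoints such as localhost:4317 would need an http:// prefix or OTEL_EXPORTER_OTLP_INSECURE=true to keep working.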
Manually declaring spans is error-prone and tedious.
I created a tool that automatically adds spans to all exported functions that take a context parameter. It supports templates, and it will not add spans that were already declared.
My proposal is to mention it in the documentation, or to integrate it in some way if it would be useful for end users.
I didn't find any alternatives.
Generating layers: https://mattermost.com/blog/opentracing-for-go-projects/
Given a large enough number of goroutines, write_target_data in alloc.h can wrap around and overwrite the data of an ongoing request.
Namely the following code: https://github.com/open-telemetry/opentelemetry-go-instrumentation/blob/main/include/alloc.h#L77
When we reach the end of the allocated buffer, it simply begins at the start of the buffer for the current CPU. However, the data that's at the beginning of the per CPU segment can technically be in use. One simple example is a goroutine thread being preempted after the CreateHeaderFields
uprobe returns. After the preempted thread resumes, the data for its headers can be replaced with something else that's using the buffer.
No reproduction case. Making a testcase that fails should be possible, but it would be unreliable reproduction case.
Allocating new space from the shared memory segment should be safe.
I think a solution here could be rudimentary reference counting.
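The reference-counting idea can be modelled in user space: each per-CPU segment is treated as a ring of fixed-size slots, and a slot is only handed out again once its previous owner has released it. This is a hypothetical Go simulation, not alloc.h; a real eBPF version would need atomic updates and would have to release on the CPU segment it allocated from, since a goroutine can migrate between CPUs.

```go
package main

import "errors"

// errBufferFull is returned instead of silently overwriting a live slot.
var errBufferFull = errors.New("per-CPU buffer full: next slot still in use")

// cpuRing models one CPU's segment of the shared buffer as fixed-size slots.
type cpuRing struct {
	refs []int // outstanding references per slot
	next int   // next slot to hand out (wraps around)
}

func newCPURing(slots int) *cpuRing {
	return &cpuRing{refs: make([]int, slots)}
}

// alloc hands out the next slot; on wrap-around it refuses a slot whose
// previous owner (e.g. a preempted goroutine's headers) has not released it.
func (r *cpuRing) alloc() (int, error) {
	slot := r.next
	if r.refs[slot] > 0 {
		return -1, errBufferFull
	}
	r.refs[slot]++
	r.next = (r.next + 1) % len(r.refs)
	return slot, nil
}

// release drops a reference, e.g. from the probe that finishes the request.
func (r *cpuRing) release(slot int) {
	if r.refs[slot] > 0 {
		r.refs[slot]--
	}
}
```

Since an eBPF program cannot wait, a failed allocation would most likely mean skipping context propagation for that one request rather than corrupting another request's headers.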
Trace and span IDS are always set to 0.
Use https://github.com/keyval-dev/launcher to execute the latest version (v0.71.0) of the OpenTelemetry Collector Contrib
Launch the instrumentation tool with the following command:
OTEL_TARGET_EXE=/usr/bin/otelcol-contrib OTEL_EXPORTER_OTLP_ENDPOINT=0.0.0.0:4317 OTEL_SERVICE_NAME=collector ./otel-go-instrumentation
See error:
{"level":"info","ts":1676306465.655687,"caller":"grpc/probe.go:214","msg":"got spancontext","trace_id":"00000000000000000000000000000000","span_id":"0000000000000000"}
{"level":"info","ts":1676306465.6558187,"caller":"opentelemetry/controller.go:59","msg":"got event","attrs":[{"Key":"rpc.system","Value":{"Type":"STRING","Value":"grpc"}},{"Key":"rpc.service","Value":{"Type":"STRING","Value":""}},{"Key":"net.peer.ip","Value":{"Type":"STRING","Value":""}},{"Key":"net.peer.name","Value":{"Type":"STRING","Value":""}}]}
{"level":"info","ts":1676306467.4758658,"caller":"grpc/probe.go:214","msg":"got spancontext","trace_id":"00000000000000000000000000000000","span_id":"0000000000000000"}
{"level":"info","ts":1676306467.4759538,"caller":"opentelemetry/controller.go:59","msg":"got event","attrs":[{"Key":"rpc.system","Value":{"Type":"STRING","Value":"grpc"}},{"Key":"rpc.service","Value":{"Type":"STRING","Value":""}},{"Key":"net.peer.ip","Value":{"Type":"STRING","Value":""}},{"Key":"net.peer.name","Value":{"Type":"STRING","Value":""}}]}
{"level":"info","ts":1676306470.6999974,"caller":"grpc/probe.go:214","msg":"got spancontext","trace_id":"00000000000000000000000000000000","span_id":"0000000000000000"}
{"level":"info","ts":1676306470.7000897,"caller":"opentelemetry/controller.go:59","msg":"got event","attrs":[{"Key":"rpc.system","Value":{"Type":"STRING","Value":"grpc"}},{"Key":"rpc.service","Value":{"Type":"STRING","Value":""}},{"Key":"net.peer.ip","Value":{"Type":"STRING","Value":""}},{"Key":"net.peer.name","Value":{"Type":"STRING","Value":""}}]}
{"level":"info","ts":1676306472.459095,"caller":"grpc/probe.go:214","msg":"got spancontext","trace_id":"00000000000000000000000000000000","span_id":"0000000000000000"}
trace_id and span_id are always all zeros.
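All-zero trace and span IDs are the invalid values in W3C Trace Context, so a backend has nothing to correlate. As a reference point, this sketch generates valid IDs in user space with crypto/rand. Whether the agent should do this in Go or inside the eBPF program (for example with bpf_get_prandom_u32) is an open design choice, and newTraceID and newSpanID are hypothetical names.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
)

// newID fills an n-byte ID from crypto/rand, retrying on the (astronomically
// unlikely) all-zero result, since all-zero IDs are invalid.
func newID(n int) (string, error) {
	b := make([]byte, n)
	for {
		if _, err := rand.Read(b); err != nil {
			return "", err
		}
		for _, v := range b {
			if v != 0 {
				return hex.EncodeToString(b), nil
			}
		}
	}
}

// newTraceID returns a 16-byte (32 hex character) trace ID.
func newTraceID() (string, error) { return newID(16) }

// newSpanID returns an 8-byte (16 hex character) span ID.
func newSpanID() (string, error) { return newID(8) }
```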
Originally posted by @MrAlias in #121 (comment)
The docs say:
opentelemetry-go-instrumentation/README.md
Lines 22 to 27 in b29eef6
However, I think we are currently testing only on Go 1.20.
Side note: of course, I may be wrong.
My proposal is to update the docs to reflect reality and create an issue to test against more Go versions.
This project adds OpenTelemetry instrumentation to Go applications without having to modify their source code.
Instrumentation is done by using eBPF uprobes.
Automatic instrumentation is available for a wide range of Go applications: Go version 1.12 and above, in addition to supporting stripped binaries (compiled with go build -ldflags "-s -w").
Instrumented libraries follow the OpenTelemetry specification and semantic conventions to produce standard OpenTelemetry data.
GitHub repos:
https://github.com/keyval-dev/opentelemetry-go-instrumentation
https://github.com/keyval-dev/offsets-tracker (used by this project, can also be donated if needed)
A detailed technical explanation is documented here.
This project is a core part of keyval's product and production systems.
We are using this instrumentation successfully on many Go applications including multiple popular open-source Go projects.
Go is the most popular programming language in the CNCF landscape (see DevStats)
Providing automatic instrumentation for Go applications would allow projects without the resources needed for implementing manual instrumentation to easily adopt OpenTelemetry.
Currently, automatic instrumentation exists for Python, .NET, JavaScript, and Java. Our goal is to provide the same level of automatic instrumentation for Go applications. We hope this project will lead the way for automatic instrumentation for other compiled languages such as Rust and C++ which may also be implemented using eBPF.
Running eBPF programs requires elevated privileges. We believe that being part of the OpenTelemetry community would make users more comfortable using this project.
Keyval will continue to maintain this project and we welcome the opportunity to work with more contributors.
Multiple developers in the OpenTelemetry community (from Go SDK SIG, Operator SIG & eBPF SIG) and in the eBPF community have expressed interest in contributing to this project.
Our current roadmap contains the following tasks:
This project is licensed under the terms of the Apache 2.0 open source license.
None
Blocking
We have two naming formats for our Docker images:
otel-instrumentation-go
autoinstrumentation-go
We should consolidate on one consistent format. It's useful to have the otel prefix for locally built images, but it's not required for published images because they get the repository prefix.
I didn’t see this before, but we use a different default name in the makefile than here. https://github.com/open-up/opentelemetry-go-instrumentation/blob/main/Makefile#L33
It would be nice to use the same name. Locally it’s good to have the otel prefix so it’s easy to find but not as important for published images as it gets prefix from the image store.
I don’t want to block this PR to bike shed on names so will create a follow-up issue to track our image naming.
Originally posted by @MikeGoldsmith in #152 (comment)
Currently set to 24 MB.
Review whether 24 MB is a good value and whether we should have configuration options to adjust it.
Originally posted by @MrAlias in #82 (comment)
We should have consistent, reproducible ways to build and test both in a development environment and in github workflows. Ideally we should have extensive Makefile targets that the workflows can call.
Our HTTP span names contain high-cardinality paths, which is not specification compliant:
HTTP spans MUST follow the overall guidelines for span names. HTTP server span names SHOULD be {http.request.method} {http.route} if there is a (low-cardinality) http.route available. HTTP server span names SHOULD be {http.method} if there is no (low-cardinality) http.route available. HTTP client spans have no http.route attribute since client-side instrumentation is not generally aware of the "route", and therefore HTTP client spans SHOULD use {http.method}. Instrumentation MUST NOT default to using URI path as span name, but MAY provide hooks to allow custom logic to override the default span name.
Originally posted by @MrAlias in #143 (review)
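The quoted rule reduces to a small function: server spans use "{method} {route}" when a low-cardinality route is known and "{method}" otherwise, while client spans always use "{method}". httpSpanName is a hypothetical helper; how a probe would obtain the route (for example the gorilla/mux template) is left open.

```go
package main

// httpSpanName follows the quoted naming rule. The raw URL path is never used,
// so span names stay low-cardinality.
func httpSpanName(method, route string, isServer bool) string {
	if isServer && route != "" {
		return method + " " + route
	}
	return method
}
```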
We should add CI with a sample app to check that changes (such as the generated offsets) are valid. I don't see an issue for that yet, so I'm making one.
Originally posted by @damemi in #38 (review)
Based on my experience in https://github.com/open-telemetry/opentelemetry-dotnet-instrumentation (where I am a maintainer), I find that testing against a "test fake collector" makes the tests more maintainable:
A fake collector implemented with an httptest server, or by setting up a gRPC server in tests, is faster than running a real collector.
Here you can find an example implementation and usage of a "test fake collector" in Go: https://github.com/signalfx/splunk-otel-go/blob/main/distro/otel_test.go
Originally posted by @pellared in #46 (review)
CC @damemi
I think this empty span is a bug in the auto-instrumentation, but it should be tracked in a separate issue rather than fixed in this PR.
Originally posted by @damemi in #46 (comment)
This was discussed during SIG meeting. The traces.json output being used for the e2e test shows the currently incorrect behavior of empty span IDs being generated with auto-instrumentation. This test should convert span IDs to "xxxxx" as seen in the top span, and the extra span appears to be a duplicate with no information.
I think I've traced this back to this PR #34 - specifically, keeping both function names for net/http instrumentation. If I adjust the code to only use "net/http.HandlerFunc.ServeHTTP"
I don't have duplicated empty span IDs. Note I'm also working on some other changes that could also impact this so I might be off in the specific area causing this, but would be curious to try a test with this (the e2e tests were added after this change was made).
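The "xxxxx" conversion can be a regular-expression pass over traces.json before comparing against the expected file. This sketch assumes OTLP JSON field names (traceId, spanId, parentSpanId). It deliberately redacts only non-empty IDs, so an empty span ID still shows up in the diff and the bug described above is not masked. redactIDs is a hypothetical helper.

```go
package main

import "regexp"

// idField matches non-empty ID fields such as "traceId": "5b8a...".
var idField = regexp.MustCompile(`"(traceId|spanId|parentSpanId)"\s*:\s*"[0-9a-fA-F]+"`)

// redactIDs replaces every non-empty trace/span ID value with "xxxxx" while
// keeping the key, so random IDs do not make the e2e comparison flaky.
func redactIDs(traces string) string {
	return idField.ReplaceAllString(traces, `"$1": "xxxxx"`)
}
```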
Currently this is done manually (i.e. #23), it should be automated.
Add support for otelgin instrumentation to be automatically configured.
Any idea what could be causing this error when starting the auto-instrumentation agent?
error: "field UprobeServerMuxServeHTTP: program uprobe_ServerMux_ServeHTTP: load program: invalid argument"
name: "net/http"
msg: "error while loading instrumentors, cleaning up"
ts: 1680187371.9764755
level: "error"
stacktrace: "github.com/open-telemetry/opentelemetry-go-instrumentation/pkg/instrumentors.(*instrumentorsManager).load
/app/pkg/instrumentors/runner.go:86
github.com/open-telemetry/opentelemetry-go-instrumentation/pkg/instrumentors.(*instrumentorsManager).Run
/app/pkg/instrumentors/runner.go:34
main.main
/app/cli/main.go:86
runtime.main
/usr/local/go/src/runtime/proc.go:250"
caller: "instrumentors/runner.go:86"
For reference, I'm hitting this when running in a container alongside my sample app using the fedora:37 base image. I don't get this error when running locally, so I'm wondering if this is a container permission error or something else.
This error comes after the agent has connected to the collector and found the target_exe. Based on printf debugging, it seems to be happening in this call to LoadAndAssign
I'm using an agent built off of #40 without the launcher
This is more of a question/hypothetical bug report, as I don't have an actual reproduction case and it would be very hard to make one.
If I understand correctly, the code at https://github.com/open-telemetry/opentelemetry-go-instrumentation/blob/main/include/go_types.h#L74 would allocate a new backing array for the Headers slice, so that the span context can be added for the purpose of distributed tracing. This new memory that's 'allocated' comes from the injected shared memory space, which is managed inside the pinned bpf map.
Since this memory comes from a shared memory segment, it's not part of the Go GC memory arena, or at least I think this is the case. Initially I was surprised this works, I've done a lot of work on various JVM garbage collectors, but it seems that the Go GC is a lot more lenient.
I found this comment in mbarrier.go:
// Another place where we intentionally omit memory barriers is when
// accessing mheap_.arena_used to check if a pointer points into the
// heap. On relaxed memory machines, it's possible for a mutator to
// extend the size of the heap by updating arena_used, allocate an
// object from this new region, and publish a pointer to that object,
// but for tracing running on another processor to observe the pointer
// but use the old value of arena_used. In this case, tracing will not
// mark the object, even though it's reachable. However, the mutator
// is guaranteed to execute a write barrier when it publishes the
// pointer, so it will take care of marking the object. A general
// consequence of this is that the garbage collector may cache the
// value of mheap_.arena_used. (See issue #9984.)
Based on this, it seems that the GC will, by design, ignore this new array pointer, since the new region may not yet be published. However, the problem is the following statement in the same comment:
In this case, tracing will not mark the object, even though it's reachable
...
However, the mutator is guaranteed to execute a write barrier when it publishes the pointer, so it will take care of marking the object.
In this case, the mutator is the BPF program, and it will not execute the write barrier, so the slice pointer will not be correctly marked as grey; and even if it were marked, the GC would refuse to trace the unknown pointer.
Essentially, we have this pointer map:
[Headers slice] -> [Headers array] ->
[Header String 1]
[Header String 2]
...
[Header String N]
So if we replace the Headers array with another one and do a 'shallow' copy, the GC will not be able to mark the header strings as live. If a GC cycle (both mark and sweep) ran after the array is replaced, the GC could reuse the memory of the header strings.
In this situation, if the memory was reused by the GC, the headers sent with a downstream request could contain arbitrary memory from the source program.
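To make "shallow copy" concrete, here is a minimal sketch in ordinary Go. It is only an analogy: here the new array is heap-allocated, so the GC does see it. What it shows is that copying a `[]string` into a new backing array duplicates only the string headers (data pointer plus length), so the new array's elements are the only thing keeping the header bytes reachable once the old array is gone. If that array instead lived in memory the GC does not scan, as the issue suggests for the BPF-managed region, those references would be invisible to the collector. `shallowGrow` and `sharesData` are hypothetical names, not code from the repository.

```go
package main

import (
	"fmt"
	"unsafe"
)

// shallowGrow is a hypothetical helper mirroring, in ordinary Go, the step the
// issue describes: allocate a larger backing array, copy the existing elements
// into it, and append one more header. copy() duplicates only the string
// headers (data pointer + length), not the bytes they point to.
func shallowGrow(headers []string, extra string) []string {
	grown := make([]string, len(headers), len(headers)+1)
	copy(grown, headers)
	return append(grown, extra)
}

// sharesData reports whether two strings point at the same underlying bytes.
// unsafe.StringData requires Go 1.20 or later.
func sharesData(a, b string) bool {
	return unsafe.StringData(a) == unsafe.StringData(b)
}

func main() {
	orig := []string{"Content-Type: application/json", "Accept: */*"}
	grown := shallowGrow(orig, "traceparent: <span context>")

	// The slice now lives in a different backing array...
	fmt.Println("new backing array:", &orig[0] != &grown[0])
	// ...but its elements still reference the original string bytes.
	fmt.Println("shared string data:", sharesData(orig[0], grown[0]))
}
```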
This is very difficult to reproduce.
The Headers should correctly be marked by the GC and the sent request should have proper headers.
I'm not 100% sure I fully understand how this code interacts with the Go GC, so perhaps this is not an issue. If it is an issue, then I thought of few possible solutions, although none of them are great:
This project adds OpenTelemetry instrumentation (https://github.com/open-telemetry/opentelemetry-go) to Go applications by automatically modifying their source code, in a similar way to a compiler. It can instrument any Go project. It depends only on standard libraries and is platform agnostic.
Current github repo:
https://github.com/SumoLogic-Labs/autotel
The project is developed and maintained by Sumo Logic
We are actively testing it with different Go applications and open source projects, extending test coverage and eliminating gaps. We are planning to start beta tests with customers soon.
The project offers automatic instrumentation by modifying source code. It has similar goals to https://github.com/keyval-dev/opentelemetry-go-instrumentation (#2), but tackles the problem from a different angle and has different tradeoffs:
We will continue to develop and maintain the project. Our main goal is to raise awareness of this project and get feedback from the community about needed features and usage scenarios. Another aspect is to get more people contributing to this project. We are open to suggestions and ideas.
Our current roadmap is as follows:
This project is licensed under the terms of the Apache 2.0 open source license.
Not sure if this is the right place, or whether https://github.com/open-telemetry/community would be better. cc @tigrannajaryan
As a vendor receiving telemetry from thousands of independent customer sources, Honeycomb has found that telemetry clients that include nuanced version information in their transmissions will dramatically shorten the time involved in troubleshooting data issues. In addition to a standard release version number, builds can include in their reported version whether the build occurred directly on the release commit or on an identifiable commit some number of changes away from a recent release commit and/or whether the build occurred with modifications to version-controlled files (repo "dirty" state).
A disadvantage of version.go's hard-coded number being solely managed with multimod is that the version number set within a built executable does not communicate whether the build was on the commit the release tag is on (an actual release build, or a dev build between releases) or whether the repository working copy was "dirty". This is one benefit of the common-but-fiddly technique of determining a version at build time from the output of git describe and setting it for the build through LDFLAGS.
Originally posted by @robbkidd in #94 (comment) and then revised in #94 (comment)
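A minimal sketch of the git describe + LDFLAGS technique described above, assuming a hypothetical package-level `version` variable in `main` (not the repository's actual version.go) and a hypothetical `ParseDescribe` helper. It splits a `git describe --tags --dirty` string into the tag, the number of commits since that tag, the abbreviated hash, and the dirty flag. The `main` function also prints the `vcs.revision` and `vcs.modified` settings that Go 1.18+ embeds via `runtime/debug.ReadBuildInfo` when building inside a repository, which is an alternative source for the same information.

```go
package main

import (
	"fmt"
	"regexp"
	"runtime/debug"
	"strconv"
	"strings"
)

// DescribeInfo holds the pieces of a `git describe --tags --dirty` string.
type DescribeInfo struct {
	Tag          string
	CommitsAhead int    // 0 when built exactly on the tagged commit
	Hash         string // abbreviated commit hash; empty when on the tag
	Dirty        bool   // working copy had uncommitted changes
}

// Matches "<tag>-<n>-g<hash>"; the greedy (.+) lets tags contain dashes.
var describeRE = regexp.MustCompile(`^(.+)-(\d+)-g([0-9a-f]+)$`)

// ParseDescribe is a hypothetical helper that splits git describe output.
func ParseDescribe(s string) DescribeInfo {
	var info DescribeInfo
	if strings.HasSuffix(s, "-dirty") {
		info.Dirty = true
		s = strings.TrimSuffix(s, "-dirty")
	}
	if m := describeRE.FindStringSubmatch(s); m != nil {
		info.Tag = m[1]
		info.CommitsAhead, _ = strconv.Atoi(m[2]) // m[2] is all digits
		info.Hash = m[3]
		return info
	}
	info.Tag = s // built exactly on the release commit
	return info
}

// version is a hypothetical variable overridden at build time, e.g.
//   go build -ldflags "-X main.version=$(git describe --tags --dirty)"
var version = "dev"

func main() {
	fmt.Printf("%+v\n", ParseDescribe(version))
	// Go 1.18+ also embeds VCS metadata when building inside a repository.
	if bi, ok := debug.ReadBuildInfo(); ok {
		for _, s := range bi.Settings {
			if s.Key == "vcs.revision" || s.Key == "vcs.modified" {
				fmt.Println(s.Key, "=", s.Value)
			}
		}
	}
}
```

A build on the release commit yields only a tag, while a dev build reports a non-zero `CommitsAhead` and a hash, which is exactly the release-vs-dev distinction the hard-coded number cannot express.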
There have been two donation proposals for different forms of auto-instrumentation:
There is also an ongoing effort to build community support for the "Launcher" concept, a simplified way to start OTel SDKs via configuration and environment variables. @JamieDanielson @robbkidd @kentquirk @MikeGoldsmith
I have seeded the maintainers of this repository with @MadVikingGod, @MrAlias, @Aneurysm9, and myself. The goal is eventually to grow new maintainers and owners for this group. Existing @open-telemetry/go-approvers and @open-telemetry/go-triagers are welcome but have not been added to this repo.
My goal in bringing all of these groups together is that we begin to standardize on the ways that users will configure SDKs and auto-instrumentation via packages in this repository. As I mentioned in #2, I think it would be great to see collaboration with the Go team in this area, too.
Please discuss, and thanks for the interest.
The go instrumentation agent currently only runs on amd64 architectures. We should also support arm64 architectures.
Initial support for memory allocations on arm was added in #40.
Running a sample gRPC app with Go auto-instrumentation causes the app to crashloop; instrumentation fails with a permission error.
Steps to reproduce the behavior:
2023-04-14 19:40:06 {"level":"error","ts":1681515606.0069542,"caller":"cli/main.go:88","msg":"error while running instrumentors","error":"field UprobeHttp2ClientCreateHeaderFields: program uprobe_Http2Client_CreateHeaderFields: load program: permission denied: 979: (71) r7 = *(u8 *)(r7 +0): R0=inv(id=51) R1_w=inv(id=0,umax_value=65535,var_off=(0x0; 0xffff)) R2_w=inv(id=0,umax_value=255,var_off=(0x0; 0xff)) R3_w=inv(id=0,umax_value=16777215,var_off=(0x0; 0xffffff)) R4_w=invP3 R5_w=fp-204 R6_w=fp-381 R7_w=map_value(id=0,off=0,ks=4,vs=16,umax_value=15,var_off=(0x0; 0xf)) R8_w=map_value(id=0,off=0,ks=4,vs=16,umax_value=15,var_off=(0x0; 0xf)) R9=inv(id=0,umax_value=1099511627775,var_off=(0x0; 0xffffffffff)) R10=fp0 fp-48=???mmmmm fp-56=mmmmmmmm fp-64=mmmmmmmm fp-72=mmmmmmmm fp-80=mmmmmmmm fp-96=mmmmmmmm fp-104=mmmmmmmm fp- (truncated, 1249 line(s) omitted)","stacktrace":"main.main\n\t/app/cli/main.go:88\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250"}
The app runs and is instrumented.
Context propagation does not work with HTTP; a new trace ID is created for each request instead of a distributed trace.
Steps to reproduce the behavior:
See the example app using this, where events show up but are not part of a distributed trace. I am using OTEL_PROPAGATORS with a value of "tracecontext,baggage".
The spans are connected in a distributed trace with the same trace ID.
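With the `tracecontext` propagator, the trace ID travels in the W3C `traceparent` HTTP header, formatted as `version-traceid-parentid-flags`. For spans to land in one distributed trace, the outgoing request has to carry the incoming trace ID unchanged and replace only the parent ID with the caller's own span ID; a fresh trace ID per request suggests the header is not being read or injected. The sketch below is a hand-written parser for version 00 of the format, for illustration only; it is not the agent's code or the OpenTelemetry Go propagator API, and `ParseTraceParent` and `ChildHeader` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// TraceParent holds the four fields of a version-00 W3C traceparent header.
type TraceParent struct {
	Version, TraceID, ParentID, Flags string
}

// isLowerHex reports whether s is exactly n lowercase hex characters.
func isLowerHex(s string, n int) bool {
	if len(s) != n {
		return false
	}
	for _, c := range s {
		if !(c >= '0' && c <= '9' || c >= 'a' && c <= 'f') {
			return false
		}
	}
	return true
}

// ParseTraceParent validates a value such as
// "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01".
// Only version 00 is handled in this sketch.
func ParseTraceParent(h string) (TraceParent, error) {
	parts := strings.Split(strings.TrimSpace(h), "-")
	if len(parts) != 4 {
		return TraceParent{}, errors.New("traceparent: expected 4 dash-separated fields")
	}
	tp := TraceParent{Version: parts[0], TraceID: parts[1], ParentID: parts[2], Flags: parts[3]}
	switch {
	case tp.Version != "00":
		return TraceParent{}, errors.New("traceparent: unsupported version")
	case !isLowerHex(tp.TraceID, 32) || tp.TraceID == strings.Repeat("0", 32):
		return TraceParent{}, errors.New("traceparent: invalid trace-id")
	case !isLowerHex(tp.ParentID, 16) || tp.ParentID == strings.Repeat("0", 16):
		return TraceParent{}, errors.New("traceparent: invalid parent-id")
	case !isLowerHex(tp.Flags, 2):
		return TraceParent{}, errors.New("traceparent: invalid trace-flags")
	}
	return tp, nil
}

// ChildHeader builds the traceparent an outgoing request should carry:
// the trace-id is kept, and parent-id becomes the caller's own span ID.
func ChildHeader(tp TraceParent, spanID string) string {
	return fmt.Sprintf("%s-%s-%s-%s", tp.Version, tp.TraceID, spanID, tp.Flags)
}

func main() {
	in := "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
	tp, err := ParseTraceParent(in)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(ChildHeader(tp, "b7ad6b7169203331"))
	// prints "00-4bf92f3577b34da6a3ce929d0e0e4736-b7ad6b7169203331-01"
}
```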
Currently the tests are executed only on the latest Go version (1.20).
We should run the tests in CI against all supported versions of Go.
After publishing the Docker images, we have 3 images available: linux/arm64, linux/amd64, and unknown/unknown.
We shouldn't have the unknown/unknown image, and the workflow should be updated so that it is not published again.
Is your feature request related to a problem? Please describe.
Automatically generate spans for calls to SQL databases
Describe the solution you'd like
Add a new instrumentor for database/sql under https://github.com/open-telemetry/opentelemetry-go-instrumentation/tree/main/pkg/instrumentors/bpf
Describe alternatives you've considered
Manual Instrumentation
Is your feature request related to a problem? Please describe.
I am looking for an official OpenTelemetry image to use instead of keyval/otel-go-agent:latest, but none exists yet.
Describe the solution you'd like
This repository publishes an image via GitHub/Docker Hub, like other OpenTelemetry-owned images.
Describe alternatives you've considered
Additional context
I feel a community-owned image is required for open-telemetry/opentelemetry-operator#908
Is your feature request related to a problem? Please describe.
Ensure that every PR is able to build both the Docker image and the binary on Ubuntu.
Describe the solution you'd like
"You must retain, in the Source form of any Derivative Works that You distribute, all copyright, patent, trademark, and attribution notices from the Source form of the Work, excluding those notices that do not pertain to any part of the Derivative Works".
Providing the copyright attribution and license notice is NOT optional if you want to distribute the code.
Add the license and copyright notices together to the "distribution".
The "distribution" means the OTel Go AutoInstrumentation release package, such as (but not limited to):
Personally, I suggest adding the notices to the LICENSE file. A NOTICE file may be needed if a dependency's license requires it.
The LICENSE file must be distributed together with the release artifacts. This is how OTel .NET AutoInstrumentation handled it. Notable hyperlinks:
zip archive)
We need to double-check whether the container images have the following labels:
org.opencontainers.image.source
org.opencontainers.image.revision
org.opencontainers.image.licenses
These labels are the standard (see here), and with them users are able to find all the notices if needed.
AFAIK docker/metadata-action@v4 should add the labels mentioned above.
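One way to do the double-check is to feed an image's labels into a small checker. The sketch below assumes the labels come from `docker inspect --format '{{json .Config.Labels}}' IMAGE`, where IMAGE is a placeholder for whichever published tag is being checked; `MissingLabelsJSON` is a hypothetical helper, not part of any existing workflow. It reports which of the three OCI keys listed above are absent or empty.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"os"
)

// requiredLabels lists the OCI annotation keys the issue asks to verify.
var requiredLabels = []string{
	"org.opencontainers.image.source",
	"org.opencontainers.image.revision",
	"org.opencontainers.image.licenses",
}

// MissingLabelsJSON decodes a JSON object of image labels and returns the
// required keys that are absent or empty.
func MissingLabelsJSON(data []byte) ([]string, error) {
	var labels map[string]string
	if err := json.Unmarshal(data, &labels); err != nil {
		return nil, err
	}
	var missing []string
	for _, k := range requiredLabels {
		if labels[k] == "" { // also true when labels is nil ("null" input)
			missing = append(missing, k)
		}
	}
	return missing, nil
}

func main() {
	data, err := io.ReadAll(os.Stdin)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	missing, err := MissingLabelsJSON(data)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	if len(missing) > 0 {
		fmt.Println("missing labels:", missing)
		os.Exit(1)
	}
	fmt.Println("all required OCI labels present")
}
```

Usage would be along the lines of `docker inspect --format '{{json .Config.Labels}}' IMAGE | go run checklabels.go`; the non-zero exit status on missing labels would let it gate a CI step.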
Even if we decide to publish only the container image, we should still add the notices to the LICENSE file.