Giter Site home page Giter Site logo

Comments (36)

tigrannajaryan avatar tigrannajaryan commented on July 30, 2024 6

Overall I think this approach is less hacky but may be more error-prune (from the user perspective). If the TC believes this is a better way, I'm also good with this implementation.

It may also be less error prone in certain situations. context.Context is the standard propagation mechanism. The goroutine id-based proposal in a sense is an alternate way to propagate execution context which only works if the execution stays in the same goroutine. However, in many cases in Go code the receiving goroutine is different from the processing or sending one. context.Context correctly works in such cases but goroutine-id doesn't.

My personal opinion is that context.Context-based approach is preferable. Perhaps, in addition to that the goroutine id tracking can be optionally enabled as an experimental feature.

from opentelemetry-go-instrumentation.

MrAlias avatar MrAlias commented on July 30, 2024 2

Maintenance plan
Keyval will continue to maintain this project and we welcome the opportunity to work with more contributors.
Multiple developers in the OpenTelemetry community (from Go SDK SIG, Operator SIG & eBPF SIG) and in the eBPF community have expressed interest in contributing to this project.

I guess we should follow the same process that other donations went through.

@open-telemetry/technical-committee correct me if I'm wrong, but I'm not sure there is a precedence. It looked like other donations were handed off to existing SIGs.

I imagined this living in its own repository, like the Java auto-instrumentation. Is that the intention?

If so, we might also need a distinct maintainer group from the opentelemetry-go maintainers. I wouldn't want to assume we all have the bandwidth to be a part of this. I would be happy to be a maintainer for this, but would want to ensure others would as well.

Similarly for Approvers, we would want to build an initial group that might differ from the opentelemetry-go group.

from opentelemetry-go-instrumentation.

edeNFed avatar edeNFed commented on July 30, 2024 2

@jmacd Yes. I agree to take ownership over the proposed code and become a maintainer in the new repository.

Thank you for taking the time to review this donation proposal.

from opentelemetry-go-instrumentation.

edeNFed avatar edeNFed commented on July 30, 2024 2

Hi @dineshg13 I totally agree (I mentioned the offset tracker repo in the original donation proposal). Not sure if this is needed as a first step but ultimately I also think this should be owned by the community.

from opentelemetry-go-instrumentation.

edeNFed avatar edeNFed commented on July 30, 2024 1

Hi @tigrannajaryan, thank you for the quick response.

  1. We have been using this instrumentation for about 6 months. Overall, I estimate we tested it on about 30 different projects.
  2. This project requires an operating system that supports eBPF - which basically means Linux (maybe windows soon). We currently test it only on x86 but there should be no technical limitation to running it on other platforms.
  3. Each instrumentation declares the instrumented functions at the FuncNames() function. We dynamically detect the available functions in the target binary and invoke only the relevant instrumentations. Currently, the following functions are instrumented:
  • google.golang.org/grpc.(*Server).handleStream
  • github.com/gorilla/mux.(*Router).ServeHTTP
  • google.golang.org/grpc.(*ClientConn).Invoke
  • net/http.(*ServeMux).ServeHTTP
  1. As this instrumentation runs as a separate process from the target application, there is no startup overhead. We are currently working on performing performance testing so I don't have exact numbers yet. Theoretically, the performance overhead would be doing another context switch every time uprobe is hit. This should be neglectable compared to operations such as HTTP requests or DB queries. However, as I said, we are still validating this.
  2. In case of failure, we detach all the attached uprobes and exit. The target application is not affected even if the instrumentation process itself crashes.
  3. Currently, offsets are generated periodically. However, this is something that is definitely on our roadmap and we want to do very soon.
  4. Currently unsupported version would not be instrumented. We are debating about trying to guess the offset (if $current_version > $max_tracked_version use $max_tracked version, if $current_version < $min_tracked_version use $min_tracked_version or something like that).
  5. panics are not handled currently. I guess we can detect unfinished spans and terminate them after some timeout.
  6. The registers are selected at runtime (see https://github.com/keyval-dev/opentelemetry-go-instrumentation/blob/master/pkg/instrumentors/bpf/headers/arguments.h) If I am not mistaken, supporting other platforms like ARM should be a matter of implementing a version of get_argument_by_reg with the order of the registers matching to ARM.

Let me know if I can help with anything else 😄

from opentelemetry-go-instrumentation.

RichiH avatar RichiH commented on July 30, 2024 1

Context: I stumbled over this issue following up on some notes from OpenTelemetry Day in Austin; we had a well-attended eBPF breakout session there. See https://photos.app.goo.gl/EboFNpkGz41EE6ve6 for some notes from that session.

@tigrannajaryan the FAQ discourages the use of goroutine IDs by developers while designing and writing code. It lists good arguments there. It does not seem to argue against using them for automated extraction of information at runtime.

Carrying a small lookup table for well-known offsets as linked to by @edeNFed seems to be a reasonable and lightweight trade-off. The one constraint it introduces is that the autoinstrumentation needs to be at least as current as the Go version used for compilation, which seems to be another reasonable trade-off.

from opentelemetry-go-instrumentation.

edeNFed avatar edeNFed commented on July 30, 2024 1

Every implementation of the Instrumentor interface declares which target functions are instrumented. You can find all the implementations here. The automatic instrumentation looks for the specified function in the target binary file if the function does not exists (for example a Go app that does not use gRPC) than the instrumentor is not loaded.

from opentelemetry-go-instrumentation.

pellared avatar pellared commented on July 30, 2024 1

.NET auto-instrumentation has it in a similar way: separate SIG + teams from the .NET SDK.

from opentelemetry-go-instrumentation.

jmacd avatar jmacd commented on July 30, 2024 1

@edeNFed OpenTelemetry Community guidelines for donations require a timely response, I'm very sorry we have not been able to assemble one until now.

The community reviewed this proposal of yours and @pdelewski's #3. We see both of these as technically viable solutions to the auto-instrumentation problem. OpenTelemetry endorses work on auto-instrumentation as a sub-group of each language SIG, and it appears that we have the making of a new OpenTelemetry-Go auto-instrumentation group in these two donation proposals.

We like both of these proposals, and we recognize that methods for auto-instrumentation are only part of the larger-problem that includes all aspects of automatic SDK setup and program initialization. From what we can tell, there is not a hard boundary between automatically configuring the Golang SDK and automatically configuring Golang instrumentation. We recommend the creation of an opentelemetry-go-instrumentation GH repository, that will be initially populated with current opentelemetry-go approvers and maintainers. The goal should be to establish a connected but separate SIG and to plan and develop common frameworks for Golang auto-instrumentation. We expect new members to join this group with expertise in source-level auto-instrumentation and eBPF-level auto-instrumentation; eventually we expect these repositories to have separate ownership, the way Java and .NET languages have done already in OpenTelemetry.

We also expect members of the OpenTelemetry-Go vendor community looking to streamline and automate SDK setup for vendor-specific configuration (or, e.g., those looking for open-telemetry/opentelemetry-specification#1773 for Go), to work together in this new repository.

Finally, we look forward to members of the auto-instrumentation group to work with the OpenTelemetry-Go maintainers and Go developers to bridge gaps between built-in diagnostic features and telemetry SDK support. See, for example,

@edeNFed are you willing to take ownership of the code that you propose donating and eventually become a maintainer of the (proposed) new opentelemetry-go-instrumentation repository?

from opentelemetry-go-instrumentation.

MikeGoldsmith avatar MikeGoldsmith commented on July 30, 2024 1

@jmacd @edeNFed We're definitely open to helping on this project alongside the work to simplify SDK config / setup for the main SDK 👍🏻

cc @JamieDanielson @robbkidd @kentquirk

from opentelemetry-go-instrumentation.

jmacd avatar jmacd commented on July 30, 2024 1

@edeNFed I created https://github.com/open-telemetry/opentelemetry-go-instrumentation. One of the first tasks for you and @pdelewski will be to become members of the OpenTelemetry org and begin sending and reviewing PRs in this repository. I copied the current Go maintainers in as maintainers of this project temporarily; please consider me your sponsor, I will be glad to help review the first steps in this repository as we work to establish you as new a new maintainer.

from opentelemetry-go-instrumentation.

tigrannajaryan avatar tigrannajaryan commented on July 30, 2024

@edeNFed thank you for the proposal. I believe this can be a very valuable addition to OpenTelemetry.

I had a quick glance at the project and it looks very promising. I am going to add this to today's TC meeting agenda (good timing!). A few quick questions to help us get started:

  • What's the history of the project? I see the first commit is a few months ago. How much real-world production usage has the instrumentation seen?

  • Does the solution work on all platforms targeted by Go compiler or only on some subset? (e.g. x86-32/64, arm, etc).

  • Which functions are instrumented? Do you locate and instrument specific functions from specific libraries? If yes, is there a list of functions instrumented somewhere?

  • What is the typical runtime overhead? I assume there is a startup cost to inject the probes and there is an ongoing cost when the probes are hit. Do you have any perf measurements done one some typical binary?

  • What are the failure modes? If the probes do something wrong can they crash the instrumented application?

A few more questions (quotes from the "How it works" doc):

Notice that one of our design goals is to support stripped Go binaries - meaning binaries that do not contain debug information. In order to support stripped binaries and to create a stable instrumentation, we created a library called offsets-tracker. This library tracks the offset of different fields across versions.

Do you regenerate the offsets every time a new version of a supported library is released?

What happens if the version of library used is unsupported? Does instrumentation detect it and bail out?

We overcome this issue by analyzing the target binary and detecting all the return statements in the instrumented functions. We then place a uprobe at the end of each return statement. This uprobe invokes the eBPF code that collects the end timestamp.

How reliable is this detection? Is it possible to miss a return statement? What if for example the function exits by panic and not by return? If missed does the span remain forever unfinished?

Are these analysis dependent on the platform the machine code is in? Does it work on x86-64 only or other platforms too?

Since version 1.17 and above, Go changed the way it passes arguments to functions. Prior to version 1.17, Go placed arguments in the stack in the order they were defined in the function signature. Version 1.17 and above uses the machine registers to pass arguments.
We overcome this by analyzing the target binary and detecting the compiled Go version.

I assume this is dependent not only on the compiler version but also the target platform, so each specific combination may require a different analyzing code. That sounds like quite a lot of targets to support and keep maintaining. What has been your experience with this? How cumbersome it is?

from opentelemetry-go-instrumentation.

tigrannajaryan avatar tigrannajaryan commented on July 30, 2024

@edeNFed thanks a lot for the answers, they are very helpful.

The TC has discussed the donation. We believe that it indeed can be a very valuable addition to OpenTelemetry. We have 2 primary concerns:

  1. How is context propagation going to work?
  2. How will auto instrumentation work with manual instrumentation?

Both of these are on your roadmap already, which is great. It would be very useful if you could provide more details about what you think the solutions will look like. Are there any design documents on these topics?

We do not necessarily require that donations are complete and done before we accept them. However we do need certain level of confidence that important OpenTelemetry functionality is implementable, even if some of the implementation work happens after the donation.

The TC believes the above 2 topics are important for an OpenTelemetry instrumentation. We would like to learn a bit more about how you intend to solve these before we are ready to move forward.

from opentelemetry-go-instrumentation.

edeNFed avatar edeNFed commented on July 30, 2024

Makes sense.
I hope to finish those documents by next week.

from opentelemetry-go-instrumentation.

edeNFed avatar edeNFed commented on July 30, 2024

Hi @tigrannajaryan,
Sorry for the delay, here are the two requested design documents:

I believe those documents should give a good idea regarding how we are going to implement both features.
Let me know if you have any other questions

from opentelemetry-go-instrumentation.

tigrannajaryan avatar tigrannajaryan commented on July 30, 2024

Thanks @edeNFed

I will add this to the next week's TC meeting agenda to review the proposal with the new design documents again.

from opentelemetry-go-instrumentation.

tigrannajaryan avatar tigrannajaryan commented on July 30, 2024

I had a quick look at the Context Propagation proposal. The proposal uses goroutine ids to associate incoming and outgoing requests. AFAIK there is no official way to obtain goroutine ids, in fact this is actively discouraged by Go language documentation. I know there are some hacky ways to obtain the goroutine id but I am not aware of any production quality implementations to get the goroutine id.

IMO this may be a deal breaker. I wonder what @open-telemetry/go-approvers, @open-telemetry/collector-approvers, @open-telemetry/collector-contrib-approvers think.

from opentelemetry-go-instrumentation.

edeNFed avatar edeNFed commented on July 30, 2024

Yuri had a similar opinion, reposting what I answered on the other issue:

Regarding depending on goroutine ids: The main objection that I saw is that Go's internal structures like runtime.G can greatly change between Go versions and therefore it is hard to assume which offset the goid field will have in the G struct. This is solved by automatically tracking the goroutine id field in every version of Go. In the very unlikely case that the goid field will be removed (Go's scheduler also need some way to track goroutines) the instrumentation will fail at compile time. We then will probably write a different logic based on some other unique identifier for goroutines.

from opentelemetry-go-instrumentation.

edeNFed avatar edeNFed commented on July 30, 2024

There is another approach to avoid tracking goroutine ids (suggested by @tedsuo):
Storing the spancontext in the context.Context argument of the instrumented function instead of in an eBPF map.
The downsides to this approach are:

  • We lose the ability to create spans for functions without context argument. For instance, currently, we are able to create spans for database/sql driver invocations such as result := db.Query("SELECT * FROM table").
  • If the user won't pass the context argument correctly between functions the trace will break.

Overall I think this approach is less hacky but may be more error-prune (from the user perspective). If the TC believes this is a better way, I'm also good with this implementation.

from opentelemetry-go-instrumentation.

edeNFed avatar edeNFed commented on July 30, 2024

👍🏻 Agree. Please let me know if there are more pending questions.

from opentelemetry-go-instrumentation.

jsuereth avatar jsuereth commented on July 30, 2024

Regrading manual + automatic instrumentation coordination, can you update with what you'd do for Metrics (and soon, Logs) APIs?

I assume it'd be similar but would be good to write down.

from opentelemetry-go-instrumentation.

edeNFed avatar edeNFed commented on July 30, 2024

@jsuereth sure here are my initial thoughts about coordination with metrics/logs manual instrumentation:

Metrics:
I think that when talking about combining manual and automatic instrumentation for metrics, the main benefit is to automatically add exemplars for manually created metrics. It looks like exemplars support for Go SDK is not implemented yet.
I see that all the Metrics APIs functions get context.Context as an argument so I think the same approach of modifying it to contain the current span context should work also for metrics. I don't see a use case of the same metric being incremented by both manual and automatic instrumentations but if you feel like this is also something that needs to be supported, I think we can also achieve this by attaching uprobes to the Metrics SDK (very similar to the approach used for traces).

Logs:
logs are pretty early, but I think that we could add traceId and spanId to any logging library that gets context.Context as an argument.

from opentelemetry-go-instrumentation.

jsuereth avatar jsuereth commented on July 30, 2024

Understand the point. My main concerns are as follows:

  1. Autoinstrumentation + Manual instrumentation can co-exist
  2. Users know where to configure exporters/SDK features. For Autoinstrumentation, can we do so ONLY in the autoinstrumentation and have it hijack any manual?
  3. Will all features work in a way users can't tell the difference between Auto + Manual instrumentation.

It sounds like yes, but for (2) wantedto make sure, e.g. Auto + manual instrumetnation use the same exporters/buffering/etc.

from opentelemetry-go-instrumentation.

edeNFed avatar edeNFed commented on July 30, 2024

Yes this is the plan. I will add it to the roadmap and will extend the design documents to also describe metrics and logs.

from opentelemetry-go-instrumentation.

MrAlias avatar MrAlias commented on July 30, 2024

+1 on not using the goroutine ID as the source of operation identity.

The important part of tracing instrumentation is the ability to connect spans into a unified trace that represents the operation workflow.

I'm guessing there is a plan to introspect a context to determine if there is an active span present that can act as a parent, but what happens when there is not? There is a good chance that many orphaned spans will be created if every function that accepts a context is encapsulated in a span, but there is an origination issue if you never start a span because of the absence of a parent. How do you plan to originate traces?

@dashpole I know has put some thought into this. I would value his take if he has time to provide it 😄.

from opentelemetry-go-instrumentation.

edeNFed avatar edeNFed commented on July 30, 2024

if every function that accepts a context is encapsulated in a span

This is not how it works.
For automatic instrumentation, spans are created only for a specific set of instrumented functions (gRPC calls, HTTP, DB, etc).
For manual instrumentation, everything is the same, the user has to explicitly create spans by invoking OTel API
and the manually created spans will be placed correctly in a trace with the automatically created spans.

Orphans are also not a problem, If there is no active span in the context a new span will be created. Basically, it means that the root span of the trace will be the first instrumented function that got invoked in the request.

from opentelemetry-go-instrumentation.

MrAlias avatar MrAlias commented on July 30, 2024

spans are created only for a specific set of instrumented functions (gRPC calls, HTTP, DB, etc).

Gotcha 👍

Do we have a list of functions this will support? Also, how will library support be expanded?

I would be hesitant to add gRPC support, at least initially, as it is not included in the standard library. But it brings up the question about how 3rd party libraries will be supported and how will we separate them (or not?).

from opentelemetry-go-instrumentation.

MrAlias avatar MrAlias commented on July 30, 2024

Every implementation of the Instrumentor interface declares which target functions are instrumented. You can find all the implementations here. The automatic instrumentation looks for the specified function in the target binary file if the function does not exists (for example a Go app that does not use gRPC) than the instrumentor is not loaded.

Gotcha, thanks.

From a community support perspective, how are we planing to maintain this? Who will comprise the maintainer roles and what groups plan to support the development?

from opentelemetry-go-instrumentation.

edeNFed avatar edeNFed commented on July 30, 2024

Maintenance plan

Keyval will continue to maintain this project and we welcome the opportunity to work with more contributors.
Multiple developers in the OpenTelemetry community (from Go SDK SIG, Operator SIG & eBPF SIG) and in the eBPF community have expressed interest in contributing to this project.

I guess we should follow the same process that other donations went through.

from opentelemetry-go-instrumentation.

MrAlias avatar MrAlias commented on July 30, 2024

There is another approach to avoid tracking goroutine ids (suggested by @tedsuo): Storing the spancontext in the context.Context argument of the instrumented function instead of in an eBPF map. The downsides to this approach are:

Is it possible for for an eBPF to modify the argument to a function? I'm not the most familiar with the technology, but I thought the eBPF virtual machine has read-only access to the syscall parameters and return value.

from opentelemetry-go-instrumentation.

edeNFed avatar edeNFed commented on July 30, 2024

Yes. Please look at the Context propagation design doc I added a proof of concept for that in the document.

from opentelemetry-go-instrumentation.

jmacd avatar jmacd commented on July 30, 2024

I've tranfered this issue. We can close this when the donation has been merged. Thanks, @edeNFed

from opentelemetry-go-instrumentation.

edeNFed avatar edeNFed commented on July 30, 2024

Opened #4

from opentelemetry-go-instrumentation.

dineshg13 avatar dineshg13 commented on July 30, 2024

@edeNFed The go-instrumentation uses https://github.com/keyval-dev/offsets-tracker to generate offset_results.json, I believe that is the critical input to the code. What are your thoughts , on how this work once the code is merged ?
Ideally , i believe offsets-tracker should be part of open-telemetry , but would be like to know your thoughts .

from opentelemetry-go-instrumentation.

dineshg13 avatar dineshg13 commented on July 30, 2024

Thanks @edeNFed , i don't think it is necessary as a first step. We can add it to our roadmap

from opentelemetry-go-instrumentation.

dineshg13 avatar dineshg13 commented on July 30, 2024

Closing issue as PR is merged.

from opentelemetry-go-instrumentation.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.