
Comments (5)

wu-sheng avatar wu-sheng commented on June 29, 2024

I think this is hard to tell. The span size depends on the tags and logs you collect (tagged by users, not necessarily generated by the tracing system itself). In my tracing system (commercial edition), the span size changes across scenarios, even when tracing the same service. The same goes for trace size.

I think the real question is how much overhead you are willing to pay for tracing. In my case, 5%-15% CPU cost is the hard limit, so we did everything we could to stay within it.

from specification.

lookfwd avatar lookfwd commented on June 29, 2024

@wu-sheng - interesting metric. Does 15% mean 15% overhead on the application it traces? I.e., if the app uses 60% of 8 CPUs, app + tracing uses roughly 70%?


wu-sheng avatar wu-sheng commented on June 29, 2024

@lookfwd Yes, that is my story. :) You should choose yours based on your own OSS demands.


bhs avatar bhs commented on June 29, 2024

@lookfwd sorry for the delay, I missed this the first time around :-/

There is huge variation on this front. I have seen plenty of "real" production environments with relatively low data volumes: e.g., public companies that generate on the order of 5-10MB/sec of tracing data globally, without sampling. I have also seen plenty of equally "real" production environments that generate vast amounts of tracing data. E.g., Google recently cited (publicly) the fact that, globally, they handle 10s of billions of requests per second. At hundreds of bytes per span, that's well over 1TB/sec of trace data. Yikes.

I usually assume that a Span takes up 100-500 bytes when all is said and done, but that makes lots of assumptions.
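For a rough sense of scale, here is a back-of-envelope sketch of those figures. The 100-500 bytes/span range and the "10s of billions of requests per second" number come from this thread; the small-shop request rate and the spans-per-request count are made-up illustrative assumptions:

```python
# Back-of-envelope trace data volume, before any sampling.

def trace_data_rate(requests_per_sec, spans_per_request, bytes_per_span):
    """Return the raw tracing data rate in bytes/sec."""
    return requests_per_sec * spans_per_request * bytes_per_span

# Hypothetical small shop: 50k req/sec, ~10 spans per request,
# midpoint of the 100-500 byte span-size range.
small = trace_data_rate(50_000, 10, 300)
print(f"small shop: {small / 1e6:.0f} MB/sec")      # small shop: 150 MB/sec

# Google-scale: 10 billion req/sec, one span each, at the low
# end of the span-size range.
big = trace_data_rate(10_000_000_000, 1, 100)
print(f"google-scale: {big / 1e12:.0f} TB/sec")     # google-scale: 1 TB/sec
```

Even at the low end of the span-size range, the volume is dominated by request rate, which is why sampling decisions matter so much more at scale.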


lookfwd avatar lookfwd commented on June 29, 2024

Interesting. Makes perfect sense! To me this suggests something like a taxonomy related to a) the capabilities of tracing systems, b) the cost of implementation, c) latency, and d) the technologies one can use for implementation.

[taxonomies diagram]

The bottom layer is the basic stuff, i.e. a few bytes with ids and timestamps going from one service to another. This has to be stored reliably and in real time. You don't want to lose this data in case of a crash, since it will certainly help you understand what happened. This means you likely have to put it on some efficient IPC channel (socket, pipe, shared memory) ASAP. You can use this data to reconstruct a basic diagram of the trace.
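A sketch of what such a bottom-layer record could look like. The field choices (64-bit ids, microsecond timestamps, a fixed 40-byte layout) are my assumptions for illustration, not anything from a real tracer:

```python
import struct

# trace_id, span_id, parent_span_id, start_us, end_us: five u64s -> 40 bytes.
# Fixed-size records make it cheap to write to a pipe or shared-memory ring.
SPAN_FORMAT = struct.Struct("<QQQQQ")

def encode_span(trace_id, span_id, parent_id, start_us, end_us):
    """Pack a minimal span record for immediate IPC hand-off."""
    return SPAN_FORMAT.pack(trace_id, span_id, parent_id, start_us, end_us)

def decode_span(buf):
    """Unpack a record on the collector side."""
    return SPAN_FORMAT.unpack(buf)

record = encode_span(0xABC, 1, 0, 1_000_000, 1_000_450)
assert len(record) == SPAN_FORMAT.size == 40
assert decode_span(record) == (0xABC, 1, 0, 1_000_000, 1_000_450)
```

The point of the fixed layout is that a crash-safe writer can emit it with a single small write, with no serialization library in the hot path.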

The middle layer has significantly more data. It includes operation names, GET/POST, arguments, etc.: stuff that can be used to aid basic debugging and interesting aggregations in terms of latencies. This data doesn't really need to be real-time. It might need to be sampled, since there might be a lot of it. It can be flushed, e.g. every few seconds, to a central system via sockets or files. If it's lost in a crash, that's bad, but not that bad, since you can likely recover the data that led up to the crash from other nodes.
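A minimal sketch of such a middle-layer reporter, assuming head-based sampling. The 1% sample rate and 5-second flush interval are illustrative defaults, and the transport is left abstract (it could be a socket send or a file append):

```python
import random
import time

class SampledReporter:
    """Buffers sampled spans and flushes them every few seconds."""

    def __init__(self, sample_rate=0.01, flush_interval=5.0, transport=print):
        self.sample_rate = sample_rate
        self.flush_interval = flush_interval
        self.transport = transport          # e.g. socket send or file write
        self.buffer = []
        self.last_flush = time.monotonic()

    def report(self, span):
        # Head-based sampling: drop most spans up front, before buffering.
        if random.random() >= self.sample_rate:
            return
        self.buffer.append(span)
        if time.monotonic() - self.last_flush >= self.flush_interval:
            self.flush()

    def flush(self):
        # Hand the whole batch to the transport in one call.
        if self.buffer:
            self.transport(self.buffer)
            self.buffer = []
        self.last_flush = time.monotonic()
```

Batching amortizes the transport cost, which is what keeps this layer's overhead away from the request path.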

The top layer might have tons of application-specific data that can be used, along with information from all the other layers, to do complex predictive analytics, enable automation, and support very detailed debugging. This is similar to logging, just in a way that can be put back together hierarchically to form a full trace. For this layer one very likely needs a "big data" batch-processing system to index and analyse the data. Data might be left on individual servers and recalled on demand. This layer could be implemented on top of something like Kibana or Splunk, and will almost certainly use files. Again, data loss isn't a huge problem for this layer.

Any thoughts?

