
Comments (4)

Rperry2174 commented on June 8, 2024

Hello everyone,

First of all, awesome project -- really impressive how much you've built in such a short time!

I'm one of the maintainers at Pyroscope, and we're excited about the possibility of integrating with Memray to provide continuous profiling capabilities. We've done some preliminary work on this front, but we would appreciate community input and expertise to ensure a seamless and valuable integration.

For context, here is the Pyroscope Server API Reference. Our API is designed to ingest multiple profiling formats, such as collapsed, pprof, and even JFR for JVM-based applications. In particular, we are interested in understanding how we could leverage Memray's capabilities to export memory profiles in a format that our API can ingest.

A couple of starting questions:

  1. Export Format: Does Memray have the capability to export memory profiles in any of the formats we support, like collapsed or pprof?
  2. Metric Association: Could Memray associate relevant metrics with the memory profiles before export? This would enable more insightful data visualization and analysis on our end.
  3. Automation: Is it possible to automate the process so that Memray profiles can be periodically sent to Pyroscope for long-term monitoring?

from memray.

Rperry2174 commented on June 8, 2024

I was definitely thinking of this as a potential "always-on" type of thing, but I didn't realize that the overhead was so high as to prevent that from being realistic.

Given that, I'm curious how it's used in practice at Bloomberg (or by other community members): how do people cope with the overhead so that using it is still a net positive?

If you think it would be useful for people using Memray to have a database of memory profiling snapshots (i.e. for calculating diffs between labels or time periods), we could still explore a way to send those to Pyroscope to be labeled, queried, etc. That would be a different workflow from the typical continuous profiling we usually support, though.


godlygeek commented on June 8, 2024

Sorry, I somehow missed your reply here!

I was definitely thinking of this as a potential "always-on" type of thing, but I didn't realize that the overhead was so high as to prevent that being realistic

It's not "unrealistic" to do always-on profiling per se, but it can be expensive. With the least aggressive (and least informative) set of options, macrobenchmarks tend to show us slowing the process so that it takes ~45% longer to complete. Now, if a user is willing to pay that cost constantly in order to gather insights into their memory usage patterns, I suppose that's an option available to them, but it isn't necessarily one that I'd recommend.

given that, I'm curious how it's used in practice at Bloomberg (or from other community members): how does one cope with an overhead where the use of this is net positive?

The more typical usage pattern that we anticipate is a problem-focused one. We anticipate users discovering that their memory usage is higher than they want it to be, and using Memray to dive in to perform root cause analysis, in a similar way to how they might use another debugger like gdb or pdb when searching for a different sort of bug.

If you think it would be useful for people using memray to have a database of memory profiling snapshots

I don't think there's a reasonable way to gather that - at least, not one that's meaningfully different from "always-on" tracking. With CPU profiling, you can meaningfully probe the process occasionally and gather a snapshot of what it's currently doing based on what frames are on the stack when you interrupt it, and you can perform statistical analyses across many snapshots to figure out what frames tend to be on the stack, and therefore what the program tends to spend its time doing.

You can't really do the same sort of thing with memory profiling, though. You can interrupt a process and ask "what functions are running right now", but you can't ask "what call stacks allocated the memory that's currently on the heap" - not unless you've been tracking that information continuously, so that it's already available at the moment the process is interrupted.

There are ways to do some sorts of statistical analyses of allocations - imagine that instead of recording every allocation that occurs, we only record information about the allocation that led to every 1000th byte being allocated, or something like that. But those approaches trade accuracy for performance (some memory growth patterns might lead to some allocations consistently being missed), and it's not clear that they would be much faster: they'd reduce the amount of IO that we need to do, but we'd still need to pay the cost of tracking the Python stack continuously, and that's one of our highest costs (at least with the default, non-aggressive set of options).
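To make the "every 1000th byte" idea concrete, here is a minimal sketch of a byte-countdown sampler. This is purely illustrative - the class name and interface are invented for the example, and this is not how Memray actually tracks allocations:

```python
class SampledAllocationRecorder:
    """Record only the allocations that cross a byte-sampling threshold.

    Instead of recording every allocation, we count down bytes and record
    the allocation that crosses each `sample_interval`-byte boundary. This
    is the statistical trade-off described above: less IO, but allocation
    patterns smaller than the interval can be consistently missed.
    """

    def __init__(self, sample_interval: int = 1000):
        self.sample_interval = sample_interval
        self.bytes_until_sample = sample_interval
        self.samples = []  # (size, stack) pairs we chose to record

    def on_allocation(self, size: int, stack: list) -> None:
        # Note: the stack still has to be captured for every allocation
        # in case this one is sampled, so the stack-tracking cost remains.
        self.bytes_until_sample -= size
        if self.bytes_until_sample <= 0:
            self.samples.append((size, stack))
            self.bytes_until_sample = self.sample_interval
```

With a 100-byte interval, two 60-byte allocations produce one sample: the first only decrements the countdown, and the second crosses the threshold.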


godlygeek commented on June 8, 2024

1. Export Format: Does Memray have the capability to export memory profiles in any of the formats we support, like collapsed or pprof?

As of right now, no, but this might not be too tough to add. We do support exporting the set of allocation records comprising the process's heap memory high water mark in the JSON format used by gprof2dot, as well as in a simple(ish) CSV format - though unfortunately the latter requires packing a full stack trace into a single CSV column, which might be surprising. Those exports are both done here. If the goal is to export the allocation records representing a particular point in time to Pyroscope - specifically, either the heap high water mark, or the point immediately before tracking stopped, when all outstanding allocations are known to be leaks - I would expect that to be easy enough to do.
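For illustration, the core of a collapsed-format exporter is just folding duplicate call stacks and summing their allocation sizes. The sketch below operates on simplified `(frames, size)` pairs standing in for Memray's allocation records - the function name and record shape are assumptions for the example, not Memray API:

```python
from collections import Counter


def records_to_collapsed(records) -> str:
    """Fold allocation records into the collapsed flame graph format.

    `records` is an iterable of (frames, size) pairs, where `frames` is a
    root-first list of function names -- a simplified stand-in for the
    stack information attached to each allocation record.
    """
    stacks = Counter()
    for frames, size in records:
        stacks[";".join(frames)] += size
    # One line per unique stack: "root;child;leaf <total_bytes>"
    return "\n".join(f"{stack} {total}" for stack, total in sorted(stacks.items()))
```

The same folding logic would apply whether the input is the high-water-mark records or the leaked allocations at shutdown; only the record selection differs.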

2. Metric Association: Could Memray associate relevant metrics with the memory profiles before export? This would enable more insightful data visualization and analysis on our end.

I'm not sure - can you give some examples of metrics that would make sense to attach to a memory profile? We have some data beyond just the allocations, but not a ton - we know how the heap usage changed over time, how the RSS changed over time, the pid, command line arguments, Memray options... A handful of other things. Are those the sorts of things that you're imagining, or something else?

3. Automation: Is it possible to automate the process so that Memray profiles can be periodically sent to Pyroscope for long-term monitoring?

If you mean continuously sending data while a tracked process is running, possibly - we do something like that for our live monitoring TUI. Though the data being sent over the socket there is quite raw, and would require a fair amount of preprocessing in order for it to be forwarded on to another process. It's possible in principle, at least.

But memory profiling of the sort that Memray does has a fair amount of overhead - it makes memory-intensive programs take up to 18x as long to run with our most intensive tracking options, and even the less intensive defaults can cause runtime to jump to 1.75x. Memory profiling is expensive enough that I wouldn't necessarily want to encourage people to leave this always-on.

If you're not imagining an always-on integration, though, can you explain what you're picturing in more detail?

