Comments (12)
I'm not gonna pretend to have a good handle on the thread state APIs, but - could we iterate through the interpreter's thread states and, for each of them, use PyThreadState_Swap to make it our current thread state, then call PyEval_SetProfile, then restore our old thread state?
If that's valid, it would only use public APIs, but there's an assertion I don't understand in _PyThreadState_Swap which makes me worry that it wouldn't be valid to do so.
I'm not at my computer so I cannot check, but if I recall correctly _PyThreadState_Swap will crash if you try to pass it a thread state that is not the one currently registered with the GIL, if that is set.
The other method we discussed is much more attainable than messing directly with the thread APIs.
from memray.
Right, any usage of memray run should work fine, because in those cases our hooks get installed before the application creates any threads.
It's only using with memray.Tracker(...) that isn't working, because doing that in some particular thread leaves any other already-running threads without our hooks installed.
from memray.
Wow, I can reproduce that, and that is very, very wrong. The "stack trace unavailable" seems less wrong than the fact that the histogram thought the smallest bucket should be 225 petabytes and the largest one should be 434 million vigintillion yottabytes - even though all of those buckets except one are empty.
How very strange...
from memray.
Well, I've figured out what's happening for each of those bugs. The histogram one is easy: we go off the rails when all of the allocation locations have allocated the same amount of memory (we're careful to avoid a divide by zero, but what we do instead isn't terribly helpful).
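For illustration, here's a minimal sketch of how a log-scaled bucketing routine can go wrong when every location allocated the same total (so low == high), and the guard that keeps the edges sensible. The function name and bucketing scheme here are hypothetical, not memray's actual code:

```python
import math

def bucket_edges(low, high, nbuckets):
    """Illustrative log-scaled histogram bucketing.

    If every allocation location allocated the same total, low == high
    and the per-bucket step degenerates to 0: merely avoiding the
    division by zero still produces meaningless edges. Special-casing
    the degenerate input up front is what actually keeps the output
    sensible.
    """
    if low == high:
        # Degenerate case: all values are identical, so a single
        # bucket covering that one value is the only sensible answer.
        return [float(low), float(high)]
    step = (math.log(high) - math.log(low)) / nbuckets
    return [math.exp(math.log(low) + i * step) for i in range(nbuckets + 1)]
```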
The other problem, the one that you raised the issue for, is trickier. What's happening is that, when the tracker is started in one thread, we're failing to install our profile function on the other threads, which means that we're failing to collect the Python stack of the main thread leading up to the allocations. I'm not sure what to do about that just yet.
from memray.
The other problem, the one that you raised the issue for, is trickier. What's happening is that, when the tracker is started in one thread, we're failing to install our profile function on the other threads, which means that we're failing to collect the Python stack of the main thread leading up to the allocations. I'm not sure what to do about that just yet.
This is expected: thread profile functions only affect newly created threads, and the sys.setprofile call is per-thread. This means that when you activate the Tracker it will only track the stack of threads created after it was initialized. We should either document this limitation or try to overcome it, which is tricky, because there isn't a supported way to "install a profile function in every running thread".
Also, even if we install a profile function, the main thread will continue to be untracked, because it never entered new functions while we were tracking in another thread, so we won't see a frame push or a pop.
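The per-thread behavior is easy to demonstrate in pure Python (a standalone sketch, not memray code): threading.setprofile only applies to threads started after the call, and a thread that is already running never picks the hook up.

```python
import threading

events = []

def profiler(frame, event, arg):
    # Record which thread each profile event came from.
    events.append(threading.current_thread().name)

ready = threading.Event()
stop = threading.Event()

def worker():
    ready.set()  # run() has started, so the bootstrap's hook check is behind us
    stop.wait()

# Start a thread *before* installing the profile hook.
existing = threading.Thread(target=worker, name="existing")
existing.start()
ready.wait()

# Install the hook, then start another thread.
threading.setprofile(profiler)
new = threading.Thread(target=lambda: None, name="new")
new.start()
new.join()

stop.set()
existing.join()
threading.setprofile(None)  # clean up the hook for future threads

# Only the thread created after threading.setprofile() appears in
# `events`; the pre-existing thread and the main thread are never profiled.
```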
For the main thread we can call Py_AddPendingCall with the trace function setter if we are invoked from a different thread, but it has limited usefulness.
I think we should:
- Document the limitation with Tracker and existing running threads.
- Fix the stats histogram.
from memray.
because there isn't a supported way to "install a profile function in every running thread"
How about if we make install_trace_function do

    PyInterpreterState* interp = PyThreadState_GetInterpreter(PyThreadState_Get());
    PyThreadState* ts = PyInterpreterState_ThreadHead(interp);
    while (ts) {
        if (_PyEval_SetProfile(ts, PyTraceFunction, PyLong_FromLong(123)) < 0) {
            PyErr_Clear();
        }
        ts = PyThreadState_Next(ts);
    }

Everything this uses is public except for _PyEval_SetProfile.
even if we install a profile function, the main thread will continue to be untracked because it never entered new functions while we are tracking in another thread, so we won't see a frame push or a pop.
When we install the profile function on a thread, perhaps we could also capture its initial stack, since at the point where we would be installing the profile function, we're holding the GIL and could walk the stack backwards from the thread's current frame. We could then apply any later pushes/pops on top of our captured initial stack.
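The same idea can be sketched from pure Python with sys._current_frames(), which returns each thread's topmost frame (an illustration only - in memray this walk would happen in C while holding the GIL):

```python
import sys
import threading

def capture_initial_stacks():
    """Snapshot the current Python stack of every running thread.

    sys._current_frames() returns each thread's topmost frame; walking
    f_back recovers the stack that was already in place before any
    profile function fires. Later push/pop events could then be applied
    on top of this captured initial stack.
    """
    stacks = {}
    for ident, frame in sys._current_frames().items():
        stack = []
        while frame is not None:
            stack.append(frame.f_code.co_name)
            frame = frame.f_back
        stack.reverse()  # outermost frame first
        stacks[ident] = stack
    return stacks

# Usage: a thread blocked inside nested calls exposes its full stack.
ready = threading.Event()
stop = threading.Event()

def inner():
    ready.set()
    stop.wait()

def outer():
    inner()

t = threading.Thread(target=outer)
t.start()
ready.wait()
snapshot = capture_initial_stacks()[t.ident]
stop.set()
t.join()
# `snapshot` contains 'outer' followed (further in) by 'inner'.
```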
from memray.
Everything this uses is public except for _PyEval_SetProfile.
This means it is not supported and there is no assurance we can keep doing this in the future; that's why I said "supported way" 😉
OTOH we can go with this for now while I raise this issue with the rest of the core team, and we can consider making _PyEval_SetProfile public or at least semi-private.
from memray.
there is no assurance we can do this in the future
Sure, fair enough. It is available in 3.7 through 3.11, though, at least. And I can't see much reason why it shouldn't be public - it's just PyEval_SetProfile applied to an arbitrary PyThreadState, and the ability to iterate through PyThreadStates is already public, and PyEval_SetProfile is public - I don't see any obvious reason why _PyEval_SetProfile shouldn't be.
meanwhile I raise this issue with the rest of the core team and we can consider making _PyEval_SetProfile public or at least semi-private.
I think that's a good idea. If there's no resistance to that, then that seems like a reasonable way for us to proceed.
If there is resistance to it, then we could always interpose the pthreads functions used to acquire the GIL, giving us a hook into when the GIL is next picked up in whatever other thread is already running 😈
from memray.
Another option would be for us to use the public and documented _PyInterpreterState_SetEvalFrameFunc to inject a frame evaluation function instead of a profile function - those are shared across all threads of an interpreter, and would let us detect that tracing has begun on the next Python function call within a thread. Though that seems much more likely to go away in the future than _PyEval_SetProfile is.
from memray.
I'm not gonna pretend to have a good handle on the thread state APIs, but - could we iterate through the interpreter's thread states and, for each of them, use PyThreadState_Swap to make it our current thread state, then call PyEval_SetProfile, then restore our old thread state?
If that's valid, it would only use public APIs, but there's an assertion I don't understand in _PyThreadState_Swap which makes me worry that it wouldn't be valid to do so.
from memray.
(I'm guessing that the answer is "no", and that it's not valid to use a thread state on a different OS thread than the one that it was created for, but that doesn't seem to be documented anywhere...)
from memray.
If I use 'live' mode, I can get more information about every thread. Like this:

    import time
    from threading import Thread

    def hello():
        a = 'h' * 10240
        count = 0
        while count < 15:
            a += a
            count += 1
        time.sleep(10)

    t = Thread(target=hello, args=())
    t.start()
    t.join()

memray run --live main.py
from memray.