celtoys / remotery
Single C file, Realtime CPU/GPU Profiler with Remote Web Viewer
License: Apache License 2.0
As per http://stackoverflow.com/questions/16548059/how-to-trap-unaligned-memory-access, enable signals on misaligned accesses:
echo 4 > /proc/cpu/alignment
This gives a SIGBUS in MessageQueue_AllocMessage on the line
msg->thread_sampler = thread_sampler;
MessageQueue_AllocMessage needs to allocate lengths that are 4-byte aligned. I fixed it by adding the lines:
// needs to be 4 byte aligned on ARM
payload_size = ((payload_size + 3) & ~3u);
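The rounding trick in the fix above can be checked in isolation. This is a minimal sketch; the helper name demo_align_up_4 is hypothetical and not part of Remotery.

```c
#include <stddef.h>

/* Round size up to the next multiple of 4 (hypothetical helper).
   Adding 3 and then clearing the two low bits leaves multiples of 4
   unchanged and bumps every other value to the next multiple. */
static size_t demo_align_up_4(size_t size)
{
    return (size + 3u) & ~(size_t)3u;
}
```

Sizes that are already aligned pass through unchanged, so applying the rounding unconditionally costs at most 3 bytes per message.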
Please take a look at function WebSocket_Create, lines 2059-2061:
*web_socket = (WebSocket*)malloc(sizeof(WebSocket));
if (web_socket == NULL)
return RMT_ERROR_MALLOC_FAIL;
It's obvious that the original intent was to check *web_socket, but as written the check tests the wrong pointer, so a failed allocation is never detected. An optimizing compiler may also remove the check completely, since web_socket has already been dereferenced.
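For illustration, the intended check would look roughly like the sketch below. DemoSocket and the DEMO_ codes are hypothetical stand-ins for WebSocket and the RMT_ERROR_ values, not Remotery's actual definitions.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for WebSocket and the RMT error codes. */
typedef struct { int placeholder; } DemoSocket;
enum { DEMO_SOCKET_OK = 0, DEMO_SOCKET_MALLOC_FAIL = 1 };

static int DemoSocket_Create(DemoSocket** out_socket)
{
    *out_socket = (DemoSocket*)malloc(sizeof(DemoSocket));
    /* Test the allocation result, not the address of the output parameter. */
    if (*out_socket == NULL)
        return DEMO_SOCKET_MALLOC_FAIL;
    (*out_socket)->placeholder = 0;
    return DEMO_SOCKET_OK;
}
```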
When pausing the web page you have control over panning back in time, zooming and so on. If I may, I'd like to offer some ideas that I think would improve and speed up navigation between samples. It can be time-consuming to manually scroll the timeline around when you are trying to analyze the timings of particular samples over time.
Add arrows or some other control (mouse wheel when hovering over a sample, etc.) for each sample in its respective window (when paused), in order to jump forward and backward in the timeline to each instance of that particular sample.
I wanted to be able to just drop in a single macro in a bunch of functions, and have the API auto fill-in the function names for me, i.e.
void func1( ) {
rmt_ScopedCPUSampleAutoName( );
}
void func2( ) {
rmt_ScopedCPUSampleAutoName( );
}
int main( int argc, char ** argv ) {
rmt_ScopedCPUSampleAutoName( );
func1( );
func2( );
return 0;
}
I didn't want to have to manually pass in a name to each call. My solution:
#define rmt_BeginCPUSampleAutoName( ) \
RMT_OPTIONAL(RMT_ENABLED, { \
static rmtU32 rmt_sample_hash_##__LINE__ = 0; \
_rmt_BeginCPUSample(__FUNCTION__, &rmt_sample_hash_##__LINE__); \
})
#define rmt_ScopedCPUSampleAutoName( ) \
RMT_OPTIONAL(RMT_ENABLED, rmt_BeginCPUSampleAutoName( )); \
RMT_OPTIONAL(RMT_ENABLED, rmt_EndCPUSampleOnScopeExit rmt_ScopedCPUSample##__LINE__);
This generates a call tree like:
Sorry this isn't a proper full request, but I hoped this might be something you'd consider implementing (either as is, or in a better way if you know of one).
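One note on the macros above, offered as a sketch rather than a change to the proposal: the operands of ## are not macro-expanded, so name##__LINE__ pastes the literal token __LINE__ instead of the line number. That is harmless for the static declared inside its own braces, but two scoped samples in the same scope would declare the same identifier. The usual workaround is a two-level concatenation helper. The DEMO_ names below are hypothetical and not part of Remotery.

```c
/* Two-level paste: the outer macro expands its arguments first (so
   __LINE__ becomes a number) and only then does the inner macro apply ##. */
#define DEMO_JOIN2(a, b) a##b
#define DEMO_JOIN(a, b) DEMO_JOIN2(a, b)

/* Declares a local whose name includes the source line, and bumps a counter. */
#define DEMO_MARKER(counter)                             \
    int DEMO_JOIN(demo_marker_, __LINE__) = ++(counter); \
    (void)DEMO_JOIN(demo_marker_, __LINE__)

static int demo_two_markers_in_one_scope(void)
{
    int counter = 0;
    DEMO_MARKER(counter); /* each use sits on its own line ...          */
    DEMO_MARKER(counter); /* ... so the two locals get different names  */
    return counter;
}
```

With a single-level a##__LINE__ both markers would expand to the same identifier and the function would fail to compile.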
It doesn't build out of the box, I had to do a few changes in Remotery.c:
- WINGDIAPI is unknown, so it should really be an #elif defined(_WIN32) case, and there needs to be a new #else case with #define GLAPI extern or something like that.
- glXGetProcAddressARB couldn't be found. #include <gl/glx.h> creates conflicts with your own GLuint typedefs etc.; just adding extern void* glXGetProcAddressARB(const GLubyte*); made it work. Not pretty, though.
And sometimes I get segfaults on startup:
Program received signal SIGSEGV, Segmentation fault.
Remotery_Destructor (rmt=0xe78750) at ../Libs/Remotery/Remotery.c:4182
4182 Delete(OpenGL, rmt->opengl);
can't reproduce that reliably though and last time I forgot to get a backtrace :-/
Anyway, thanks for this awesome tool, I really enjoy using it and it makes profiling performance problems so much easier :-)
Samples are built and timed even if the viewer isn't connected.
I don't want to check for a connection on each sample submit, and I also don't want to start accepting samples halfway through a tree.
Most runtime functions return an error code because they can create thread samplers on the fly. Should we use RegisterThread instead? That would increase API init burden but would prevent the need to return error codes for all API functions.
Alternatively, fold it into the rmt_SendThreadSamples function.
Found with clang's scan-build. Please consider the following sequence of actions in Remotery_Create:
1. Set (*rmt)->thread to NULL.
2. Allocate the thread structure and save the pointer.
3. CreateThread/pthread_create fails for some reason.
4. rmt->thread still points to now-freed memory and is hence non-NULL.
To re-run scan-build use the following commands:
scan-build clang lib/Remotery.c sample/sample.c -I lib -pthread -lm
scan-view /tmp/<displayed-guid>
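A common way to avoid this kind of dangling pointer is to reset the saved pointer on the failure path. The sketch below uses hypothetical Demo types and a simulated thread-start function; it is not Remotery's actual Remotery_Create.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for the thread object and the profiler instance. */
typedef struct { int handle; } DemoThread;
typedef struct { DemoThread* thread; } DemoProfiler;

/* Stand-in for CreateThread/pthread_create: returns 0 on success. */
static int demo_start_thread(DemoThread* thread, int simulate_failure)
{
    if (simulate_failure)
        return -1;
    thread->handle = 1;
    return 0;
}

static int DemoProfiler_StartThread(DemoProfiler* profiler, int simulate_failure)
{
    profiler->thread = (DemoThread*)malloc(sizeof(DemoThread));
    if (profiler->thread == NULL)
        return -1;

    if (demo_start_thread(profiler->thread, simulate_failure) != 0)
    {
        free(profiler->thread);
        profiler->thread = NULL; /* never leave a pointer to freed memory behind */
        return -1;
    }
    return 0;
}
```

With the pointer reset, a later destructor can safely test it against NULL before touching the thread.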
The consumer cleans up messages it has just processed by filling them with zeroes. This is a thread-safe way of allowing multiple producers to allocate message queue memory and keep their own lock on that range of memory until the message content is complete.
This burns write bandwidth.
One potential solution would be to clear just the message ID to zero, making the consumer check for "anything other than the messages I know about." If these message IDs are kept below a small integer value then each message can ensure it never stores those values itself. Tricksy but well-defined and much more efficient.
Clean up the CUDA stuff a bit and add GPU profiling for D3D.
Should this work on OSX?
Separate asserts from errors, allowing asserts to be used for internal consistency only and turned off in release.
The message queue is polled every 10ms, in-between which the thread is put to sleep. Latency increases and messages have the potential to be discarded.
Add wakeup calls with blocking waits in the main thread (semaphores? events?) to process immediately.
I would find it very useful if the profiler breakdown was closer to other profilers I have used in the past, specifically:
It would be nice if the browser view could (either by default, or as an option) display like:
So:
This is of course on top of:
I appreciate there is quite a lot to this enhancement request, and perhaps parts of it would be better suited to subtasks, but I'll put it all here initially, maybe you'll be able to comment on whether the above fits with your vision for this project or not. From my perspective it is a neat little profiler, I'd like to be able to make more use of it (I would also like my dev team to use it) and if it worked somewhat similarly to existing projects it would help a lot.
Clean up the CUDA stuff a bit and add GPU profiling for OpenGL.
The SpSc queue for samples is quite fast and elegant but it suffers from latency issues:
These would be fixable and the code simplified by moving it over to the new message queue.
The zero-base for sample starts is set when the first sample tree on a thread is created.
This can happen arbitrarily late (imagine spinning up a worker thread). It also introduces error even when you start threads at about the same time.
All threads should have useconds reported on the same timebase with zero having the same meaning on every thread.
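One way to picture the shared timebase: capture a single epoch once at initialisation and report every thread's timestamps relative to it. The names below are hypothetical, and the raw timestamps are passed in as plain numbers so the sketch stays independent of any particular timer API.

```c
#include <stdint.h>

/* Hypothetical shared timebase, captured once when the profiler starts. */
typedef struct { uint64_t epoch_us; } DemoTimebase;

/* Convert a raw microsecond timestamp from any thread to the shared
   timebase, so that zero means the same instant on every thread. */
static uint64_t demo_global_us(const DemoTimebase* timebase, uint64_t raw_us)
{
    return raw_us - timebase->epoch_us;
}
```

A worker thread that produces its first sample late then reports a large start time instead of silently restarting from zero.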
See #68
Assertion failed: web_socket != NULL, file ..\lib\Remotery.c, line 2192
any idea? :)
also, this happens to be hammering the console log as well:
[13:42:26] Disconnected
[13:42:28] Connecting to ws://127.0.0.1:17815/rmt
[13:42:28] Connected
[13:42:28] Connection Error
[13:42:28] Disconnected
[13:42:30] Connecting to ws://127.0.0.1:17815/rmt
[13:42:30] Connected
[13:42:30] Connection Error
[13:42:30] Disconnected
[13:42:32] Connecting to ws://127.0.0.1:17815/rmt
[13:42:32] Connected
[13:42:32] Connection Error
[13:42:32] Disconnected
[13:42:34] Connecting to ws://127.0.0.1:17815/rmt
[13:42:34] Connected
The current overflow behaviour is to just drop the message. This is a nice, non-fatal way of dealing with the problem but not ideal.
Let users choose to have their threads block on overflow until more space becomes available.
The declarations of the AreCUDASamplesReady/GetCUDASampleTimes functions and their usage have to be wrapped in #ifdef RMT_USE_CUDA to avoid compilation issues on Linux/OSX.
Hi,
When I downloaded the whole code repository and tried to compile it on my Debian Linux server, I encountered the following error.
The working directory is the root folder of Remotery-master.
linux1:/uac/msc/yfxue/www> cd Remotery-master/
linux1:/uac/msc/yfxue/www/Remotery-master> cc lib/Remotery.c sample/sample.c -l lib -p thread -lm
sample/sample.c:3:22: fatal error: Remotery.h: No such file or directory
compilation terminated.
linux1:/uac/msc/yfxue/www/Remotery-master> pwd
/uac/msc/yfxue/www/Remotery-master
linux1:/uac/msc/yfxue/www/Remotery-master> ls
./ ../ lib/ LICENSE* readme.md* sample/ screenshot.png* vis/
linux1:/uac/msc/yfxue/www/Remotery-master> ls ./lib/
./ ../ Remotery.c* Remotery.h*
Then I tried copying ./lib/Remotery.h to ./sample/, but it still failed as follows:
yfxue@linux1:~/www/Remotery-master$ cp lib/Remotery.h ./sample/
yfxue@linux1:~/www/Remotery-master$ cc ./lib/Remotery.c ./sample/sample.c -l ./lib -pthread -lm
/usr/bin/ld: cannot find -l./lib
collect2: error: ld returned 1 exit status
Then I modified my command as follows, and it also failed:
yfxue@linux1:~/www/Remotery-master$ cc lib/Remotery.c sample/sample.c -l lib -pthread -lm
/usr/bin/ld: cannot find -llib
collect2: error: ld returned 1 exit status
yfxue@linux1:~/www/Remotery-master$ cc lib/Remotery.c sample/sample.c -pthread -lm
/tmp/ccy20F2Q.o: In function `rmtLoadLibrary':
Remotery.c:(.text+0x103): undefined reference to `dlopen'
/tmp/ccy20F2Q.o: In function `rmtFreeLibrary':
Remotery.c:(.text+0x11d): undefined reference to `dlclose'
/tmp/ccy20F2Q.o: In function `rmtGetProcAddress':
Remotery.c:(.text+0x142): undefined reference to `dlsym'
/tmp/ccy20F2Q.o: In function `usTimer_Init':
Remotery.c:(.text+0x1a2): undefined reference to `clock_gettime'
/tmp/ccy20F2Q.o: In function `usTimer_Get':
Remotery.c:(.text+0x21d): undefined reference to `clock_gettime'
collect2: error: ld returned 1 exit status
yfxue@linux1:~/www/Remotery-master$
My Linux version is as follows:
yfxue@linux1:~/www/Remotery-master$ cat /proc/version
Linux version 3.2.0-4-amd64 ([email protected]) (gcc version 4.6.3 (Debian 4.6.3-14) )
#1 SMP Debian 3.2.63-2+deb7u2
I am not sure whether the repository can be compiled on this Linux version as is, or whether it requires some additional setup to run.
Thanks,
There seems to be a memory leak created when WebSocket_AcceptConnection is called. The client_socket passed in already contains a tcp_socket which is then overwritten inside the function.
I'm not sure about the desired behavior at this point. Is the previously allocated tcp_socket usable or should the old one be cleaned up before assigning the new one?
About 25% of the time when Remotery is launched, it starts fine, but after a few seconds the Remotery connection log starts showing connection errors every 2 seconds.
[11:08:11] Connecting to ws://127.0.0.1:17815/rmt
[11:08:12] Connection Error
[11:08:12] Disconnected
[11:08:13] Connecting to ws://127.0.0.1:17815/rmt
[11:08:14] Connection Error
[11:08:14] Disconnected
[11:08:15] Connecting to ws://127.0.0.1:17815/rmt
[11:08:16] Connection Error
[11:08:16] Disconnected
[11:08:17] Connecting to ws://127.0.0.1:17815/rmt
[11:08:18] Connection Error
[11:08:18] Disconnected
[11:08:19] Connecting to ws://127.0.0.1:17815/rmt
[11:08:20] Connection Error
[11:08:20] Disconnected
Looking at the Chrome console, I can see this:
WebSocket connection to 'ws://127.0.0.1:17815/rmt' failed: Could not decode a text frame as UTF-8.
WebSocketConnection.js:54 WebSocket connection to 'ws://127.0.0.1:17815/rmt' failed: Invalid frame header
WebSocketConnection.js:54 WebSocket connection to 'ws://127.0.0.1:17815/rmt' failed: Could not decode a text frame as UTF-8.
WebSocketConnection.js:54 WebSocket connection to 'ws://127.0.0.1:17815/rmt' failed: Invalid frame header
WebSocketConnection.js:54 WebSocket connection to 'ws://127.0.0.1:17815/rmt' failed: Could not decode a text frame as UTF-8.
WebSocketConnection.js:54 WebSocket connection to 'ws://127.0.0.1:17815/rmt' failed: Invalid frame header
WebSocketConnection.js:54 WebSocket connection to 'ws://127.0.0.1:17815/rmt' failed: Could not decode a text frame as UTF-8.
WebSocketConnection.js:89 WebSocket connection to 'ws://127.0.0.1:17815/rmt' failed: Invalid frame header
WebSocketConnection.js:89 OnOpen
WebSocketConnection.js:89 WebSocket connection to 'ws://127.0.0.1:17815/rmt' failed: Could not decode a text frame as UTF-8.
9WebSocketConnection.js:54 WebSocket connection to 'ws://127.0.0.1:17815/rmt' failed: Could not decode a text frame as UTF-8.
Line 54 of the WebSocketConnection.js file is:
this.Socket = new WebSocket(address);
These errors are reported by Chrome:
WebSocket connection to 'ws://127.0.0.1:17815/rmt' failed: Could not decode a text frame as UTF-8.
WebSocket connection to 'ws://127.0.0.1:17815/rmt' failed: Invalid frame header
Is there anything I could test to track down this issue?
I wonder if I have too many marks, as I have changed my current profiler marks to call Remotery...
It appears that if you leave an application running for hours with Remotery enabled, memory will steadily increase and ultimately crash your program.
Does it retain sample data indefinitely?
I had to work around the missing NSGLGetProcAddress function in newer OSX.
bkaradzic/bgfx@c3dd887
These distract from the interface and serve no real purpose other than to show what an error means in the debugger. In order to make use of these at runtime the programmer has to add them all to a big switch statement manually, which will easily get out of sync between versions.
Given that the programmer can't really do anything different based on what the returned error codes are (beyond letting the user know what's going on at runtime), there only really needs to be two catchable error codes at runtime: ERROR and OK.
So, remove the error codes; maybe just push them into the C file so that error codes are still debuggable. Return an OK/ERROR state from the public API and provide a function that maps an error to a string for printing at runtime.
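The shape of that proposal could look something like the sketch below. All names are hypothetical and none of this is Remotery's API; the single global is also not thread-safe and is only there to keep the example short.

```c
/* The public API only ever reports one of two states. */
typedef enum { DEMO_OK, DEMO_ERROR } DemoStatus;

/* Detailed codes stay private to the implementation file for debugging. */
typedef enum {
    DEMO_DETAIL_NONE,
    DEMO_DETAIL_MALLOC_FAIL,
    DEMO_DETAIL_SOCKET_SEND_FAIL,
    DEMO_DETAIL_COUNT
} DemoDetail;

/* Most recent detailed code (one global for brevity; not thread-safe). */
static DemoDetail g_demo_last_detail = DEMO_DETAIL_NONE;

static DemoStatus demo_fail(DemoDetail detail)
{
    g_demo_last_detail = detail;
    return DEMO_ERROR;
}

/* Example API entry point that can fail. */
static DemoStatus Demo_DoWork(int simulate_malloc_failure)
{
    if (simulate_malloc_failure)
        return demo_fail(DEMO_DETAIL_MALLOC_FAIL);
    g_demo_last_detail = DEMO_DETAIL_NONE;
    return DEMO_OK;
}

/* Maps the last detailed code to a printable string. */
static const char* Demo_LastErrorString(void)
{
    static const char* const names[DEMO_DETAIL_COUNT] = {
        "no error", "malloc failed", "socket send failed"
    };
    return names[g_demo_last_detail];
}
```

Callers branch only on DEMO_OK versus DEMO_ERROR and print the string when they need to tell the user what went wrong.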
Hi!
I would like to add Remotery support to this project: https://github.com/01org/IntelSEAPI/wiki
Please contact me to discuss details? alexander.a.raud at intel dot com
With respect, Alex.
There are a couple calls to Server_Send with a 20ms timeout that frequently appear to fail inside TCPSocket_Send in the timeout busy loop, forcing a disconnect. This happens every few seconds. I'm not sure why it would be timing out on a loopback connection.
If it matters, I'm running the web page in the latest chrome.
By default Remotery is always enabled because there is a #define RMT_ENABLED. There is no way to change it without modifying code.
I propose changing it to:
#ifndef RMT_ENABLED
#define RMT_ENABLED 1
#endif
With this, code should no longer use #ifdef but rather #if, because the symbol will always be defined. This simplifies integration with other projects.
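As a sketch of how client code would then test the symbol, the DEMO_ macro and function below are hypothetical; only RMT_ENABLED comes from the proposal. A build that wants profiling off would pass -DRMT_ENABLED=0 on the compiler command line.

```c
/* Default to enabled unless the build system says otherwise. */
#ifndef RMT_ENABLED
#define RMT_ENABLED 1
#endif

/* Because RMT_ENABLED is always defined, test its value with #if, not #ifdef. */
#if RMT_ENABLED
#define DEMO_COUNT_SAMPLE(counter) ((counter)++)
#else
#define DEMO_COUNT_SAMPLE(counter) ((void)0)
#endif

/* Returns how many samples were recorded: 1 when enabled, 0 when disabled. */
static int demo_profiled_work(void)
{
    int samples = 0;
    DEMO_COUNT_SAMPLE(samples);
    return samples;
}
```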
Currently each thread gets its own block and line in the html page that moves forward in time as it runs and completes samples. Often times it's useful to profile occasional areas of code that are hit for a frame but then lost in the time line.
It would be super useful if one could create or label a one off sample scope so that it can be found easily in the html page. It could be diverted to its own line and block, or perhaps a search feature could be used to find things easier.
I tried to create a temporary sample scope by changing the thread name around the scope, hoping it would end up in its own block in the output, but that didn't work.
I'd like to be able to use Remotery in a shared library and am prepared to do the work to add exported functions, but would like to know whether you'd be prepared to accept a pull request for this.
Additionally if you have any guidelines let me know, otherwise I'll just try to follow the current code style.
Various viewer backends would be useful, so I would propose that separating out the web server from the sampler would be a good way to make a simple expandable system.
Examples of potentially useful backends:
sorry to bother, the sample runs for 4-8 seconds and crashes all the time, Win7-64bit. the sample crashes here:
assertion failed: web_socket != NULL in remotery.c line 2139 (WebSocket_Send first assertion).
in the viewer I see that the connection is all the time up and immediately down again. from the chrome-debugger:
[19:17:00] Connecting to ws://127.0.0.1:17815/rmt
[19:17:00] Connected
[19:17:00] Connection Error
[19:17:00] Disconnected
[19:17:02] Connecting to ws://127.0.0.1:17815/rmt
[19:17:02] Connected
[19:17:02] Connection Error
[19:17:02] Disconnected
[19:17:04] Connecting to ws://127.0.0.1:17815/rmt
[19:17:04] Connected
[19:17:04] Connection Error
[19:17:04] Disconnected
[19:17:06] Connecting to ws://127.0.0.1:17815/rmt
[19:17:02] start profiling
[19:17:02] end profiling
[19:17:02] start profiling
[19:17:02] end profiling
[19:17:02] start profiling
[19:17:02] end profiling
[19:17:02] start profiling
first frame of network traffic (Notice also the error in "MainThread"!):
{"id":"SAMPLES","thread_name":"MainTh�~©{"id":"SAMPLES","thread_name":"MainThread","nb_samples":1,"sample_digest":593689054,"samples":[{"name":"delay","id":558789103,"cp
170
19:17:00
{"id":"SAMPLES","thread_name":"MainThread","nb_samples":2,"sample_digest":2560211960,"samples":[{"name":"delay","id":558789103,"cpu_us_start":2959071,"cpu_us_length":1}]}
170
19:17:00
{"id":"SAMPLES","thread_name":"MainThread","nb_samples":2,"sample_digest":2560211960,"samples":[{"name":"delay","id":558789103,"cpu_us_start":2959069,"cpu_us_length":2}]}
170
19:17:00
{"id":"SAMPLES","thread_name":"MainThread","nb_samples":1,"sample_digest":593689054,"samples":[{"name":"delay","id":558789103,"cpu_us_start":2959064,"cpu_us_length":4}]}
169
19:17:00
{ "id": "LOG", "text": "start profiling"}
41
19:17:00
{ "id": "LOG", "text": "end profiling"}
39
19:17:00
{ "id": "PING" }
and after each PING the connection is closed....
here another strange thing:
{"id":"SAMPLES","thread_name":"MainThre�'{ "id": "LOG", "text": "end profiling"}�~
I've added the thread-name to the sample:
if( RMT_ERROR_NONE != rmt_CreateGlobalInstance(&rmt) ) {
return -1;
}
rmt_SetCurrentThreadName("MainThread");
for(;;) {
rmt_LogText("start profiling");
delay();
One thing that we've used in the past is a 'Stats' window that would show per-frame things like FPS, Active Entities, Draw Call Count, Memory Usage, etc. as just a simple list. It would be very cool if Remotery had a way to do this and show a small history graph (60 frames worth maybe) of selected stats. Something similar to the Windows 8 Task Manager "Performance" Tab (left side).
Is this something that sounds doable? We are just now starting to look over the source.
There are two issues in use of glGetError in Remotery.
This is my current workaround for this issue (it fixes both :):
https://github.com/bkaradzic/bgfx/blob/master/3rdparty/remotery/lib/Remotery.c#L5372
Add initialisation struct for stuff like custom memory allocators and control over how much memory gets allocated by Remotery.
Hi,
For a piece of benchmark software I'm making, could you add support for an OpenCL backend?
awesome work man !
Now that all points of error for sending/receiving data on a socket have been trapped, there are some thread-safety issues with logging text (again).
If there is a failure to log text, the TCP socket for the WebSocket will be correctly shut down. However, the server thread might be using it at the same time.
The only way around this seems to be sticking the log text on a queue or opening a different socket for it.
Just a suggestion ...
Instead of hashing ThreadSampler pointers for thread names, it would make more sense to name threads by their IDs. Something like:
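static rmtU32 Thread_GetId()
{
#if defined(RMT_PLATFORM_WINDOWS)
    return GetCurrentThreadId();
#elif defined(RMT_PLATFORM_POSIX)
    // pthread_t is opaque; this cast assumes it is an integer type (as on Linux)
    return (rmtU32)pthread_self();
#else
    // unknown platform: no thread ID available
    return 0;
#endif
}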
And inside ThreadSampler_Constructor, instead of the Base64_Encode line:
snprintf(thread_sampler->name, sizeof(thread_sampler->name), "Thread #%u", Thread_GetId());
I don't know about Windows, but there is also a thread naming feature in POSIX:
http://man7.org/linux/man-pages/man3/pthread_setname_np.3.html
You can use that to fetch default thread name (if the user has set it before). If the name is empty, use the Thread ID as the name, if not use the name set by user.
If I don't call rmt_SetCurrentThreadName for each thread then the viewer only sees 2 of my threads, although debugging shows that all 8 threads are tracked by Remotery_GetThreadSampler correctly.
I can likely get a repro case up if needed, but was hoping you might have an idea.
Having this macro definition
#define JSON_ERROR_CHECK(stmt) error = stmt; if (error != RMT_ERROR_NONE) return error;
and usage like this
if (sample->next_sibling != NULL)
JSON_ERROR_CHECK(json_Comma(buffer));
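leads to a very interesting execution path: only the assignment is guarded by the outer if, while the error check and return run unconditionally.
I recommend adding curly braces either to the if-statement or (preferably) to the macro definition itself.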
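A common idiom for this is to wrap the macro body in do { ... } while (0) so that it behaves as a single statement. The sketch below uses hypothetical DEMO_ names in place of the JSON_ERROR_CHECK macro and the RMT error codes.

```c
typedef int DemoError;
#define DEMO_ERROR_NONE 0

/* The do/while(0) wrapper makes the whole body one statement, so an
   unbraced `if (cond) DEMO_ERROR_CHECK(...);` guards all of it. */
#define DEMO_ERROR_CHECK(stmt)        \
    do {                              \
        error = (stmt);               \
        if (error != DEMO_ERROR_NONE) \
            return error;             \
    } while (0)

/* Hypothetical emitter: counts calls and returns the requested result. */
static DemoError demo_emit(int* calls, DemoError result)
{
    (*calls)++;
    return result;
}

static DemoError demo_write(int has_sibling, int* calls)
{
    DemoError error = DEMO_ERROR_NONE;
    if (has_sibling)
        DEMO_ERROR_CHECK(demo_emit(calls, 7)); /* fails only when a sibling exists */
    DEMO_ERROR_CHECK(demo_emit(calls, DEMO_ERROR_NONE));
    return DEMO_ERROR_NONE;
}
```

The trailing semicolon at the call site completes the do/while statement, so the macro also composes correctly with if/else chains.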
Control reaches end of non-void function, return value is undefined.
Currently the web viewer does not update threads if they do not have any data - for example an asynchronous task which completes with the thread then waiting on an event for a new task.
This means that as other threads which continue to send data are updated, the timeline presented is not consistent.
Hi,
There doesn't seem to be a way to resize or scroll the content of the Timeline window (or is there?). We have a relatively high number of (IO) threads, more than fit into the Timeline window, which makes it impossible to view or select some thread timelines. My current solution would be to locally hack the visualizer webpage, but I'm wondering whether there is a better way to do this, or how much effort it would be to make the timeline window resizable or vertically scrollable.
This is what it looks like:
Cheers,
-Floh.
hello again,
the button pauses both the graphics and windows, but console text is still displaying and logging.
is this intended?
also, it could be nice if the pause button would enhance status color to be like this:
red = status off, pause off,
green = status on, pause off,
gray = status off, pause on,
yellow = status on, pause on,
btw, the green/red light is crazy since a few ago and blinks all the time, no matter what the connection status is. any idea?
The current design accumulates samples via a hierarchy, and then transmits the data if the top level sample has ended.
This seems reasonable, but I've been using a begin/end sample marker around my thread entry function, which causes the Remotery sampler to never send any data for that thread, with the additional problem that memory usage builds up.
I've tried to run it on Linux, and with the latest version it doesn't even get a connection between the visualisation and the running app. I've tried to debug it, but only by adding printfs did I get the following assertion:
lib/Remotery.c:3418: SampleTree_Pop: Assertion `sample != tree->root' failed.
I've tried to debug it a bit, but because often in the case of an error you return from a subroutine without a message or an assertion it's hard to find the real source of failure. It seems the extra thread mainloop runs for exactly one time and hangs afterwards.