
ossrs / state-threads

Lightweight thread library for C/C++ coroutine (similar to goroutine), for high performance network servers.

Home Page: http://sourceforge.net/projects/state-threads

License: Other

C 9.56% Assembly 2.03% Shell 0.27% Makefile 0.95% C++ 84.84% CMake 2.35%
srs coroutines greenlet fiber networking server-side state-threads c concurrency async

state-threads's Introduction

state-threads

Fork from http://sourceforge.net/projects/state-threads, patched for SRS.

See: https://github.com/ossrs/state-threads/blob/srs/README

For the original ST without any changes, check out the ST master branch.

LICENSE

state-threads is licensed under MPL or GPLv2.

Linux: Usage

Get code:

git clone -b srs https://github.com/ossrs/state-threads.git

For Linux:

make linux-debug

For Linux aarch64, if the build fails with Unknown CPU architecture:

make linux-debug EXTRA_CFLAGS="-D__aarch64__"

Note: For more CPU architectures, please see #22

Linux with valgrind:

make linux-debug EXTRA_CFLAGS="-DMD_VALGRIND"

Remark: Valgrind must be installed first, for instance on CentOS 6: sudo yum install -y valgrind valgrind-devel.

Linux with valgrind and epoll:

make linux-debug EXTRA_CFLAGS="-DMD_HAVE_EPOLL -DMD_VALGRIND"

Mac: Usage

Get code:

git clone -b srs https://github.com/ossrs/state-threads.git

For OSX:

make darwin-debug

For OSX with valgrind, the user must specify the path to the valgrind header files:

make darwin-debug EXTRA_CFLAGS="-DMD_HAVE_KQUEUE -DMD_VALGRIND -I/usr/local/include"

Remark: M1 is unsupported by ST; please use Docker to run it, see SRS#2747.

Windows: Usage

Get code:

git clone -b srs https://github.com/ossrs/state-threads.git

For Cygwin(Windows):

make cygwin64-debug

Remark: Windows native build is unsupported right now.

Branch SRS

The branch srs was patched and refined:

  • ARM: Patch st.arm.patch, for ARM.
  • OSX: Patch st.osx.kqueue.patch, for osx.
  • Linux: Patch st.disable.examples.patch, for ubuntu.
  • System: Refine TAB of code.
  • ARM: Merge from michaeltalyansky and xzh3836598, support ARM.
  • Valgrind: Merge from toffaletti, support valgrind for ST.
  • OSX: Patch st.osx10.14.build.patch, for osx 10.14 build.
  • ARM: Support macro MD_ST_NO_ASM to disable ASM, #8.
  • AARCH64: Merge patch srs#1282 to support aarch64, #9.
  • OSX: Support OSX for Apple Darwin, macOS, #11.
  • System: Refine performance for sleep or epoll_wait(0), #17.
  • System: Support utest by gtest and coverage by gcov/gcovr.
  • System: Only support for Linux and Darwin. #19, srs#2188.
  • System: Improve the performance of timer. 9fe8cfe5b, 7879c2b, 387cddb
  • Windows: Support Windows 64bits. #20.
  • MIPS: Support Linux/MIPS for OpenWRT, #21.
  • LOONGARCH: Support loongarch for loongson CPU, #24.
  • System: Support Multiple Threads for Linux and Darwin. #19, srs#2188.
  • RISCV: Support RISCV for RISCV CPU, #24.
  • MIPS: Support Linux/MIPS64 for loongson 3A4000/3B3000, #21.
  • AppleM1: Support Apple Silicon M1(aarch64), #30.
  • IDE: Support CLion for debugging and learning.
  • Define and use a new jmpbuf, because the structure is different.
  • Check capability for backtrace.
  • Support set specifics for any thread.
  • Support st_destroy to free resources for asan.
  • System: Support sendmmsg for UDP, #12.

GDB Tools

Valgrind

To debug with gdb under valgrind, read the valgrind manual.

For the startup parameters, read the valgrind cli documentation.

Important cli options:

  1. --undef-value-errors=<yes|no> [default: yes], Controls whether Memcheck reports uses of undefined value errors. Set this to no if you don't want to see undefined value errors. It also has the side effect of speeding up Memcheck somewhat.
  2. --leak-check=<no|summary|yes|full> [default: summary], When enabled, search for memory leaks when the client program finishes. If set to summary, it says how many leaks occurred. If set to full or yes, each individual leak will be shown in detail and/or counted as an error, as specified by the options --show-leak-kinds and --errors-for-leak-kinds.
  3. --track-origins=<yes|no> [default: no], Controls whether Memcheck tracks the origin of uninitialised values. By default, it does not, which means that although it can tell you that an uninitialised value is being used in a dangerous way, it cannot tell you where the uninitialised value came from. This often makes it difficult to track down the root problem.
  4. --show-reachable=<yes|no>, --show-possibly-lost=<yes|no>, to show memory that is still in use.

Linux: UTest

Note: We use Google test in utest/gtest-fit.

To make ST with utest and run it:

make linux-debug-utest && ./obj/st_utest

Note that gcc 4.8 on CentOS is too old; please use Docker (ossrs/srs:dev-gcc7) to run:

docker run --rm -it -v $(pwd):/state-threads -w /state-threads \
    registry.cn-hangzhou.aliyuncs.com/ossrs/srs:dev-gcc7 \
    bash -c 'make linux-debug-utest && ./obj/st_utest'

Mac: UTest

Note: We use Google test in utest/gtest-fit.

To make ST with utest and run it:

make darwin-debug-utest && ./obj/st_utest

Linux: Coverage

Note: We use Google test in utest/gtest-fit.

To make ST with utest and run it:

make linux-debug-gcov && ./obj/st_utest

Note that gcc 4.8 on CentOS is too old; please use Docker (ossrs/srs:dev-gcc7) to run:

docker run --rm -it -v $(pwd):/state-threads -w /state-threads \
    registry.cn-hangzhou.aliyuncs.com/ossrs/srs:dev-gcc7 \
    bash -c 'make linux-debug-gcov && ./obj/st_utest'

Then, install gcovr for coverage:

yum install -y python2-pip &&
pip install lxml && pip install gcovr

Finally, run test and get the report:

bash auto/coverage.sh

Mac: Coverage

Note: We use Google test in utest/gtest-fit.

To make ST with utest and run it:

make darwin-debug-gcov && ./obj/st_utest

Then, install gcovr for coverage:

pip install gcovr

Finally, run test and get the report:

bash auto/coverage.sh

Docs & Analysis

CLion

Use CLion to open directory state-threads.

Then, open ide/st_clion/CMakeLists.txt and click Load CMake project.

Finally, select a configuration to run or debug.

Winlin 2016

state-threads's People

Contributors

chen-guanghua, t-bagwell, timgates42, winlinvip


state-threads's Issues

Support backtrace and backtrace_symbols

Support backtrace: backtrace() returns a backtrace for the calling program, in the array pointed to by buffer. A backtrace is the series of currently active function calls for the program. Each item in the array pointed to by buffer is of type void *, and is the return address from the corresponding stack frame. The size argument specifies the maximum number of addresses that can be stored in buffer. If the backtrace is larger than size, then the addresses corresponding to the size most recent function calls are returned; to obtain the complete backtrace, make sure that buffer and size are large enough.
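
As a minimal sketch of the call described above, the helper below captures the return addresses of the current call chain with glibc's backtrace() and resolves them with backtrace_symbols(). The name dump_backtrace and the 32-frame limit are assumptions for illustration; they are not part of ST or of tools/backtrace.

```c
#include <assert.h>
#include <execinfo.h>
#include <stdio.h>
#include <stdlib.h>

#define MAX_FRAMES 32 /* illustrative limit; deeper stacks are truncated */

/* Print the current call chain to `out` and return the number of frames. */
int dump_backtrace(FILE *out)
{
    void *addrs[MAX_FRAMES];
    char **symbols;
    int i, n;

    assert(out != NULL);

    /* Fill addrs[] with return addresses, most recent call first. */
    n = backtrace(addrs, MAX_FRAMES);

    /* One malloc'ed array of strings; only the array itself is freed. */
    symbols = backtrace_symbols(addrs, n);
    if (symbols == NULL)
        return n;

    for (i = 0; i < n; i++)
        fprintf(out, "%p %s\n", addrs[i], symbols[i]);

    free(symbols);
    return n;
}
```

Calling dump_backtrace(stderr) from inside a coroutine prints one line per frame. Function names only appear when the binary is linked with -rdynamic, as the addr2line notes further down explain.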

Backtrace

Dump the stack and symbols with backtrace.

Build and run the example:

cd tools/backtrace
make
./backtrace

For Linux, the output is below:

nn_return_addresses=8, symbols=0x55fc542c4d10, symbols[0]=0x55fc542c4d50

return_addresses:
0x55fc53ace582
0x55fc53ace86b
0x55fc53ace8a1
0x55fc53acf5d9
0x55fc53acffe4
0x55fc53ace8fd
0x7f8f218f2c87
0x55fc53acdeba

symbols:
./backtrace(bar+0x2e) [0x55fc53ace582]
./backtrace(foo+0xe) [0x55fc53ace86b]
./backtrace(start+0x16) [0x55fc53ace8a1]
./backtrace(_st_thread_main+0x2a) [0x55fc53acf5d9]
./backtrace(st_thread_create+0x14a) [0x55fc53acffe4]
./backtrace(main+0x49) [0x55fc53ace8fd]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7) [0x7f8f218f2c87]
./backtrace(_start+0x2a) [0x55fc53acdeba]
bar OK
---------------------------------------------------------------

Obtained 8 stack frames.
0x0000000000003040: PrintBacktrace at backtrace.c:40
0x000000000000387c: foo at backtrace.c:262
0x00000000000038a1: start at backtrace.c:270
0x00000000000045d9: _st_thread_main at sched.c:371
0x0000000000004fe4: st_thread_create at sched.c:657 (discriminator 3)
0x00000000000038fd: main at backtrace.c:285
0x0000000000021c87: ?? ??:0
0x0000000000002eba: _start at ??:?
foo OK
coroutine OK

For darwin, the output is:

Try to get return addresses by __builtin_return_address
nn_return_addresses=6, symbols=0x7f8b68704680, symbols[0]=0x7f8b687046b0

return_addresses:
0x10bdbf3e9
0x10bdbf411
0x10bdc07aa
0x10bdbfed7
0x10bdbf468
0x11b2ee52e

symbols:
0   backtrace                           0x000000010bdbf3e9 foo + 9
1   backtrace                           0x000000010bdbf411 start + 17
2   backtrace                           0x000000010bdc07aa _st_thread_main + 42
3   backtrace                           0x000000010bdbfed7 st_thread_create + 343
4   backtrace                           0x000000010bdbf468 main + 56
5   dyld                                0x000000011b2ee52e start + 462
bar OK
foo OK
coroutine OK

It works well.

addr2line

When we get a return address from backtrace, for example 0x55fc53ace582, and convert it to a symbol with backtrace_symbols, the data is as below:

./backtrace(bar+0x2e) [0x55fc53ace582]
./backtrace(foo+0xe) [0x55fc53ace86b]
./backtrace(start+0x16) [0x55fc53ace8a1]
./backtrace(_st_thread_main+0x2a) [0x55fc53acf5d9]
./backtrace(st_thread_create+0x14a) [0x55fc53acffe4]
./backtrace(main+0x49) [0x55fc53ace8fd]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7) [0x7f8f218f2c87]
./backtrace(_start+0x2a) [0x55fc53acdeba]

If not compiled with -rdynamic:

./backtrace(+0x1d62) [0x55629d7c3d62]
./backtrace(+0x204b) [0x55629d7c404b]
./backtrace(+0x2081) [0x55629d7c4081]
./backtrace(+0x2db9) [0x55629d7c4db9]
./backtrace(+0x37c4) [0x55629d7c57c4]
./backtrace(+0x20dd) [0x55629d7c40dd]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7) [0x7f39f2f70c87]
./backtrace(+0x169a) [0x55629d7c369a]

Note that the address 0x204b equals foo+0xe, which we can confirm by resolving the symbol foo to address 0x203d:

nm backtrace|grep ' foo'
#000000000000203d T foo

So we can parse the address by addr2line:

addr2line -C -p -s -f -a -e backtrace 0x204b
#0x000000000000204b: foo at backtrace.c:258

The reported line is 258, the line of the return address, not 255 where bar() is called:

void foo() {
/*line 255*/     bar();

#ifdef __linux__
/*line 258*/   printf("---------------------------------------------------------------\n");
    PrintBacktrace();
#endif

    printf("foo OK\n");
    return;
}

Note that the line belongs to the return address, so it is normally the line after the call shown in the stack.

Breakpoint 1, bar () at backtrace.c:250
250	    return;
(gdb) bt
#0  bar () at backtrace.c:250
#1  0x000055555555604b in foo () at backtrace.c:255

See link

objdump

We can check the address and symbol by objdump:

objdump -d backtrace > t.log

It outputs the asm of foo():

000000000000203d <foo>:
    203d:       55                      push   %rbp
    203e:       48 89 e5                mov    %rsp,%rbp
    2041:       b8 00 00 00 00          mov    $0x0,%eax
    2046:       e8 e9 fc ff ff          callq  1d34 <bar>
>   204b:       48 8d 3d 7e 6a 00 00    lea    0x6a7e(%rip),%rdi        # 8ad0 <_IO_stdin_used+0x130>
    2052:       e8 09 f3 ff ff          callq  1360 <puts@plt>
    2057:       e8 1e f7 ff ff          callq  177a <PrintBacktrace>
    205c:       48 8d 3d ad 6a 00 00    lea    0x6aad(%rip),%rdi        # 8b10 <_IO_stdin_used+0x170>
    2063:       e8 f8 f2 ff ff          callq  1360 <puts@plt>
    2068:       90                      nop
    2069:       5d                      pop    %rbp
    206a:       c3                      retq

Note that the address 0x204b equals foo+0xe, given that the symbol foo is at address 0x203d.

Now we know that 0xe is the byte offset of that instruction from the start of foo (0x203d).

Also note that 204b is the lea instruction, which is the return address from bar().
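
The arithmetic behind these addresses can be checked with a few helpers. This is only a sketch; the function names are made up for illustration. The values come from the objdump listing above: the call at 0x2046 is 5 bytes long (e8 plus a 32-bit displacement), and the displacement bytes e9 fc ff ff read as a little-endian signed integer are -791.

```c
#include <assert.h>
#include <stdint.h>

/* Offset of an address inside a function, as printed in "foo+0xe". */
uint64_t symbol_offset(uint64_t addr, uint64_t symbol_addr)
{
    assert(addr >= symbol_addr);
    return addr - symbol_addr;
}

/* The return address pushed by a call is the address of the next instruction. */
uint64_t return_address(uint64_t call_addr, uint64_t call_len)
{
    return call_addr + call_len;
}

/* Target of an x86-64 near call (opcode e8): next instruction + signed rel32. */
uint64_t call_target(uint64_t call_addr, int32_t rel32)
{
    return return_address(call_addr, 5) + (uint64_t)(int64_t)rel32;
}
```

With the listing above, symbol_offset(0x204b, 0x203d) gives 0xe, return_address(0x2046, 5) gives 0x204b, and call_target(0x2046, -791) gives 0x1d34, the address of bar.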

Support sendmmsg for UDP.

See ossrs/srs#307 (comment)

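Linux GSO can defer the segmentation of multiple UDP packets to improve performance, see UDP GSO principles and applications.

Note: GSO is only supported by Linux kernel 4.18.0 and above, and SRS detects the kernel version automatically. The capture below was taken with GSO enabled; on the wire it looks the same as without GSO.

rtc-plaintext-linux4-gso-ok.pcapng.zip

rtc-plaintext-multiple-slices-as-one-NALU.pcapng.zip

Note: If GSO is forced on with an older kernel, the behavior is undefined, as the capture below shows.

rtc-plaintext-linux3-gso-invalid.pcapng.zip

SRS adds an API for packet performance analysis: http://localhost:1985/api/v1/perf

A tool can be used to analyze it: ./scripts/perf_gso.py http://localhost:1985/api/v1/perf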

No GSO, Fragment at Source

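Without GSO, messages are split into RTP packets when they are received (at Source). The statistics are:

image

Note: The number of RTP packets is about 1.6 times the number of RTMP packets. If the kernel has to process every packet, performance suffers noticeably.

No GSO, Fragment at Connection

Without GSO, messages are split into RTP packets when they are sent (at Connection). The statistics are:

image

Note: The result is about the same as before; fragmenting at Source or at Connection makes little difference to the packet distribution.

GSO, Fragment at Connection

With GSO, messages are split into RTP packets when they are sent (at Connection). The statistics are:

image

Note: With GSO enabled, the number of packets passing through the kernel is even lower than for RTMP, so performance improves. GSO does not actually reduce the number of RTP packets, but it reduces the number of packets handed to the kernel, which is why the packet count appears lower.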

GSO, Larger FU-Payload

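The FU payload length was 1200 and has been changed to 1300, see bfc70d64b91e07f4

image

After the change, the largest IP packet is 1356 bytes, below the MTU of 1500. The results show RTP packets dropping from 1.56 times to 1.49 times, with no effect on GSO segmentation.

GSO, Padding Packets

Audio packets are numerous and sometimes differ only slightly in size, for example three packets of 257, 256 and 255 bytes. With a little padding they can be sent as one GSO packet, see c95a8517

image

image

image

The data shows that with padding (127) enabled, the GSO packet ratio drops from 0.74 to 0.67, and the efficiency rises from 0.67 to 0.74.

Note: The overhead of padding is low, on the order of N parts in ten million, because 127 bytes are not added to every packet; padding is added only when it allows packets to be sent with GSO.

Note: Padding is part of the standard RTP protocol, see RTP Fixed Header Fields: Padding may be needed by some encryption algorithms with fixed block sizes or for carrying several RTP packets in a lower-layer protocol data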
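
To make the padding idea concrete, here is a small sketch that computes how much padding a batch needs. It relies on one property of Linux UDP GSO: all segments of a batch have the same size, except the last one, which may be shorter. The function name and the policy of padding up to the largest packet are assumptions for illustration; this is not the SRS implementation.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Total padding needed to send `n` packets as one GSO batch, or -1 when
 * some packet would need more than `max_padding` bytes.
 */
int gso_padding_total(const int *sizes, size_t n, int max_padding)
{
    size_t i;
    int segment = 0, total = 0;

    assert(sizes != NULL && n > 0 && max_padding >= 0);

    /* Use the largest packet as the common segment size. */
    for (i = 0; i < n; i++)
        if (sizes[i] > segment)
            segment = sizes[i];

    /* Every segment except the last must be exactly `segment` bytes. */
    for (i = 0; i + 1 < n; i++) {
        int pad = segment - sizes[i];
        if (pad > max_padding)
            return -1;
        total += pad;
    }
    return total;
}
```

For the packets 257, 256, 255 with a limit of 127, only the middle packet needs one byte of padding. A batch such as 1200, 100, 1200 is rejected because the second packet would need 1100 bytes.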

a performance issue for epoll idle

I came across a performance issue in epoll mode when there were thousands of concurrent connections. Profiling shows that _st_epoll_dispatch() consumed a lot of CPU.

After reviewing the function, I think I've found the reason: there's a loop that enumerates ALL threads in the I/O queue.

for (q = _ST_IOQ.next; q != &_ST_IOQ; q = q->next) {

As I'm using a one-thread-per-connection model, I believe this loop effectively degrades epoll mode to select mode.
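
For comparison, the sketch below shows the usual way to visit only the descriptors that epoll reports as ready, by storing a pointer to the owner in epoll_event.data.ptr. It is Linux-only and self-contained (two pipes stand in for connections); struct conn and dispatch_ready_only are hypothetical names, and this is not necessarily how ST or the srs branch handles the issue.

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

struct conn { int rfd; int wfd; int woken; }; /* hypothetical per-connection record */

/* Returns the number of connections woken, or -1 on a setup error. */
int dispatch_ready_only(void)
{
    struct conn conns[2] = { { -1, -1, 0 }, { -1, -1, 0 } };
    struct epoll_event ev, ready[8];
    int i, n, woken = -1;
    int epfd = epoll_create1(0);

    if (epfd < 0)
        return -1;

    for (i = 0; i < 2; i++) {
        int p[2];
        if (pipe(p) < 0)
            goto done;
        conns[i].rfd = p[0];
        conns[i].wfd = p[1];
        ev.events = EPOLLIN;
        ev.data.ptr = &conns[i]; /* the event leads straight back to its owner */
        if (epoll_ctl(epfd, EPOLL_CTL_ADD, conns[i].rfd, &ev) < 0)
            goto done;
    }

    /* Only the second connection becomes readable. */
    if (write(conns[1].wfd, "x", 1) != 1)
        goto done;

    n = epoll_wait(epfd, ready, 8, 0);
    if (n < 0)
        goto done;

    /* Visit the n returned events, not every registered connection. */
    woken = 0;
    for (i = 0; i < n; i++) {
        struct conn *c = ready[i].data.ptr;
        c->woken = 1;
        woken++;
    }
    assert(!conns[0].woken); /* the idle connection was never touched */

done:
    for (i = 0; i < 2; i++) {
        if (conns[i].rfd >= 0) close(conns[i].rfd);
        if (conns[i].wfd >= 0) close(conns[i].wfd);
    }
    close(epfd);
    return woken;
}
```

The cost of the dispatch loop here depends on the number of ready events rather than on the number of waiting threads.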

Support daemon(fork twice) for Darwin/OSX

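After the changes in the SRS branch, the multi-process detection was removed. On OSX the fd is closed after fork, so kevent returns -1 in an endless loop.

During initialization, _st_kq_init creates the kqueue, and its fd is 3: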

(lldb) p _st_kq_data->kq
(int) $0 = 3

MB0:state-threads winlin$ lsof -p 31372|grep KQUEUE
srs     31372 winlin    3u  KQUEUE                                       count=0, state=0

After startup, the daemon forks twice; fd=3 has been closed and now refers to the pid file:

MB0:state-threads winlin$ lsof -p 31695
COMMAND   PID  USER   FD   TYPE             DEVICE   SIZE/OFF                NODE NAME
srs     31695 winlin    3w   REG                1,4          5            31611904 /Users/winlin/git/srs/trunk/objs/srs.pid

Originally there was a check for a changed pid ( event.c#L970 ). Because multiple processes are not used on Linux, this logic was removed in this branch.
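
A pid-change check of this kind can be sketched as follows. The struct and function names are hypothetical and only illustrate the idea: remember which process created the kqueue, and treat a different getpid() as a sign that the process forked and the descriptor must be recreated.

```c
#include <assert.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical poller state: the event fd and the process that created it. */
struct poller {
    int fd;
    pid_t owner;
};

/* Record the fd together with the pid of the creating process. */
void poller_bind(struct poller *p, int fd)
{
    assert(p != NULL);
    p->fd = fd;
    p->owner = getpid();
}

/* Nonzero when we are running in a different process than the creator. */
int poller_needs_reinit(const struct poller *p)
{
    assert(p != NULL);
    return p->owner != getpid();
}
```

A dispatcher would call poller_needs_reinit() before waiting for events and create a new kqueue when it returns nonzero, instead of looping on a failing kevent.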

Compile ASM failed.

When we use the setjmp/longjmp of glibc, md.S is still compiled, which fails on some OSes.

The macro MD_ST_NO_ASM disables md.S; enable it to avoid building md.S:

  1. Directly build ST: make EXTRA_CFLAGS=-DMD_ST_NO_ASM
  2. Build SRS with ST: ./configure --extra-flags='-DMD_ST_NO_ASM' && make

Unknown CPU architecture

I'm trying to compile on FreeBSD for powerpc64.

I have error in md.h:
md.h:177:2: error: #error Unknown CPU architecture

The problem is that MD_JB_SP is undefined on FreeBSD for POWER. What is the proper value?

If you don't know, what is this macro responsible for?

Support MIPS for OpenWRT

Calling conventions

MIPS has had several calling conventions, especially on the 32-bit platform.

image

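The figure above shows that the following registers must be saved:

  • $gp global pointer
  • $fp frame pointer
  • $sp stack pointer
  • $s0–$s7 saved temporaries

Debugging the setjmp function shows that it also saves the following two registers:

  • $ra return address, the address the function returns to, which is effectively the PC to jump to; see The jal Instruction
  • $fp, $s8: this register is saved; s8 is apparently used as fp.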

Jmpbuf

The layout of jmpbuf is:

    #define JB_SP  0    /* Stack pointer */
    #define JB_RA  11   /* Return address */
    #define JB_GP  1    /* Global pointer */
    #define JB_S0  3    /* S0-S7, Saved temporaries */
    #define JB_S1  4    /* S0-S7, Saved temporaries */
    #define JB_S2  5    /* S0-S7, Saved temporaries */
    #define JB_S3  6    /* S0-S7, Saved temporaries */
    #define JB_S4  7    /* S0-S7, Saved temporaries */
    #define JB_S5  8    /* S0-S7, Saved temporaries */
    #define JB_S6  9    /* S0-S7, Saved temporaries */
    #define JB_S7  10   /* S0-S7, Saved temporaries */
    #define JB_FP  2    /* FP/S8 Frame pointer */

SP is placed in the first slot so that it is easy to replace:

        #elif defined(__mips__)
            /* https://github.com/ossrs/state-threads/issues/21 */
            #define MD_USE_BUILTIN_SETJMP
            #define MD_GET_SP(_t) *((long *)&((_t)->context[0].__jb[0]))

Return value

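Write two functions that return 0 and 1, and you can see that the return value is in register $v0

Because $ra holds the function's return address, returning always jumps to $ra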

        li $v0, 1       /* Set return value to 1 */
        jr $ra          /* Return to the saved return address */

Support builtin setjmp/longjmp for ARM

ARM uses the glibc setjmp/longjmp, which is not usable on newer ARM platforms, for example Raspberry Pi 2. We must use the builtin setjmp/longjmp as on ia64/x86-64/amd64/i386, implementing setjmp/longjmp in asm.

Garbled Chinese characters when receiving over UDP

/*
 * Simple I/O functions for UDP.
 */
int st_recvfrom(_st_netfd_t *fd, void *buf, int len, struct sockaddr *from, int *fromlen, st_utime_t timeout)
{
    int n;

    while ((n = recvfrom(fd->osfd, buf, len, 0, from, (socklen_t *)fromlen)) < 0) {
        if (errno == EINTR)
            continue;
        if (!_IO_NOT_READY_ERROR)
            return -1;
        /* Wait until the socket becomes readable */
        if (st_netfd_poll(fd, POLLIN, timeout) < 0)
            return -1;
    }
    printf("st_recvfromserver receive %d bytes: %s\n",n,buf);
    return n;
}

The Chinese text received is garbled.

Support valgrind for ST

Valgrind is a very useful toolkit for detecting memory leaks and memory corruption, but it does not support ST because of setjmp/longjmp. Maybe we can patch ST to support valgrind.

Performance improvement for st_usleep.

The resolution of epoll_wait is milliseconds (ms), while that of st_usleep is microseconds (us).

ST_HIDDEN void _st_epoll_dispatch(void)
{
    ......

    if (_ST_SLEEPQ == NULL) {
        timeout = -1;
    } else {
        min_timeout = (_ST_SLEEPQ->due <= _ST_LAST_CLOCK) ? 0 : (_ST_SLEEPQ->due - _ST_LAST_CLOCK);
        timeout = (int) (min_timeout / 1000);
    }

    /* Check for I/O operations */
    nfd = epoll_wait(_st_epoll_data->epfd, _st_epoll_data->evtlist, _st_epoll_data->evtlist_size, timeout);

int st_usleep(st_utime_t usecs)
{
    ......

    if (usecs != ST_UTIME_NO_TIMEOUT) {
        me->state = _ST_ST_SLEEPING;
        _ST_ADD_SLEEPQ(me, usecs);

void _st_add_sleep_q(_st_thread_t *thread, st_utime_t timeout)
{
    thread->due = _ST_LAST_CLOCK + timeout;
    thread->flags |= _ST_FL_ON_SLEEPQ;
    thread->heap_index = ++_ST_SLEEPQ_SIZE;
    heap_insert(thread);
}

void _st_vp_check_clock(void)
{
    ......

    now = st_utime();
    elapsed = now - _ST_LAST_CLOCK;
    _ST_LAST_CLOCK = now;
    
    while (_ST_SLEEPQ != NULL) {
        thread = _ST_SLEEPQ;
        ST_ASSERT(thread->flags & _ST_FL_ON_SLEEPQ);
        if (thread->due > now)
            break;
        _ST_DEL_SLEEPQ(thread);
        
        ......

        /* Make thread runnable */
        ST_ASSERT(!(thread->flags & _ST_FL_IDLE_THREAD));
        thread->state = _ST_ST_RUNNABLE;
        _ST_ADD_RUNQ(thread);

What happens when there are a lot of timers, so that 0us<timeout<1ms? The timeout is truncated to 0ms, so epoll_wait(0ms) returns immediately, but the timer does not run because its exact wake-up time (>0us) has not been reached yet. As a result _st_epoll_dispatch spins and consumes a lot of CPU.
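
The truncation is easy to reproduce in isolation. The sketch below contrasts the division from the excerpt with a rounded-up variant; both function names are made up, and rounding up is only one possible mitigation, not necessarily the change made for this issue.

```c
#include <assert.h>
#include <stdint.h>

/* Behavior from the excerpt: microseconds truncated to milliseconds. */
int timeout_ms_truncated(uint64_t min_timeout_us)
{
    assert(min_timeout_us / 1000 <= INT32_MAX);
    return (int)(min_timeout_us / 1000);
}

/* Round up, so a pending timer below 1ms still blocks epoll_wait for 1ms. */
int timeout_ms_rounded_up(uint64_t min_timeout_us)
{
    assert(min_timeout_us / 1000 < INT32_MAX);
    return (int)((min_timeout_us + 999) / 1000);
}
```

With a timer due in 500us, the truncated version passes 0 to epoll_wait and spins, while the rounded-up version passes 1. The trade-off is that a timer may then fire up to 1ms late.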

How to port ST to other OS/CPU?

Porting ST is much simpler than you might expect. The key is implementing setjmp/longjmp, that is, saving and restoring registers.

The OSes and CPUs supported so far:

OS       CPU           Status  Command              Description
Linux    x86-64        Stable  make linux-debug     For CentOS, Ubuntu server, etc.
Linux    arm           Stable  make linux-debug     For ARM(v7) device, #1
Linux    aarch64       Stable  make linux-debug     For ARM(v8) server, #9
Linux    mips          Dev     make linux-debug     For OpenWRT device, #21
Linux    mips64        Dev     make linux-debug     For Loongson 3A4000/3B3000, #21
Linux    loongarch64   Dev     make linux-debug     For Loongson CPU, #24
Linux    riscv         Dev     make linux-debug     For RISCV CPU, #28
OSX      x86-64        Stable  make darwin-debug    For OSX(MacPro, etc.) #11
OSX      m1(aarch64)   Dev     make darwin-debug    For OSX(MacPro M1, etc.) #30
Windows  x86-64        Dev     make cygwin64-debug  For Windows(x64) desktop, #20

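Note: Early ST called setjmp directly and then modified the SP register stored in jmpbuf, which depends on knowing the jmpbuf layout used by glibc. Later glibc changed (mangled) the layout, so many platforms stopped working. Implementing everything in assembly is actually more portable, because the set of OSes and CPUs to support is limited, the register layouts are fixed, and documentation is easy to find.

OS

Building ST requires specifying the OS explicitly, for example:

  • Linux: make linux-debug
  • OSX: make darwin-debug
  • Windows: make cygwin64-debug

Different OSes may depend on different files; to support another OS, modify the Makefile.

Note: If your system follows the same conventions as an existing one, you can try an existing OS target; for example, a Unix system can generally be built as Linux or OSX.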

CPU

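Register layouts differ between CPUs. Linux, for example, supports many CPUs, which can generally be detected through predefined macros, so the usual build command is: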

make linux-debug

If the build fails with Unknown CPU architecture, specify your CPU architecture explicitly:

  • x86-64: make linux-debug EXTRA_CFLAGS="-D__x86_64__"
  • arm: make linux-debug EXTRA_CFLAGS="-D__arm__"
  • aarch64: make linux-debug EXTRA_CFLAGS="-D__aarch64__"
  • mips: make linux-debug EXTRA_CFLAGS="-D__mips__"
  • mips64: make linux-debug EXTRA_CFLAGS="-D__mips64"
  • loongarch64: make linux-debug EXTRA_CFLAGS="-D__loongarch64"
  • riscv: make linux-debug EXTRA_CFLAGS="-D__riscv"

Use the following command to detect your CPU, for example to check for armv8/aarch64/arm64:

g++ -dM -E - </dev/null |grep -i aarch64
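
The same macros can be checked from C. The sketch below is not ST code; detected_cpu and print_detected_cpu are hypothetical helpers that test the macros listed above and report which one the compiler defines. __mips64 is tested before __mips__ because MIPS64 compilers usually define both.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Name of the CPU family according to the compiler's predefined macros. */
const char *detected_cpu(void)
{
#if defined(__x86_64__)
    return "x86-64";
#elif defined(__aarch64__)
    return "aarch64";
#elif defined(__arm__)
    return "arm";
#elif defined(__mips64)
    return "mips64";
#elif defined(__mips__)
    return "mips";
#elif defined(__loongarch64)
    return "loongarch64";
#elif defined(__riscv)
    return "riscv";
#else
    return "unknown"; /* the case where md.h reports Unknown CPU architecture */
#endif
}

/* Print the result in the same spirit as the porting tool's "CPU specs". */
void print_detected_cpu(FILE *out)
{
    assert(out != NULL);
    fprintf(out, "CPU: %s\n", detected_cpu());
}
```

If this prints unknown on your platform, none of the listed macros is predefined and the CPU most likely needs porting.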

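If your CPU is not among those already supported, it needs to be ported, which is not hard. The tools below help with porting.

Tools

The tools for porting to a new CPU are:

  1. Analyze how your platform uses registers, that is, its calling convention. This is generally determined by the OS (Linux/OSX/Windows) and the CPU (x86/ARM/MIPS). A small tool prints this information, see porting.c
  2. A small tool verifies that ST works: it starts an ST coroutine that keeps printing messages and calls st_sleep to switch coroutines and wait, see helloworld.c
  3. A tool covers calls to the commonly used ST functions, such as the thread, cond, sleep and mutex APIs and data structures, see verify.c
  4. Because the definition of jmpbuf may differ between platforms, we define this data structure ourselves, see #29 . A small tool prints the struct definition; the jmpbuf definition in the headers can be seen with the gcc -E preprocessing option, see jmpbuf.c
  5. Sometimes the PCS (Procedure Call Standard) of function calls matters, see pcs.c
  6. Sometimes the state of the stack matters, see stack.c

With these tools, porting to a new CPU is straightforward; follow the steps below.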

Porting

Taking MIPS as an example, we look up the MIPS Calling Conventions and see that the callee mainly saves the following registers:

  • $gp global pointer
  • $fp frame pointer
  • $sp stack pointer
  • $s0–$s7 saved temporaries

We modify porting.c to add print_jmpbuf for MIPS and run it on OpenWRT. The output shows that setjmp still stores the values in plain form, without mangling:

root@OpenWrt:~# ./porting
OS specs:
__linux__: 1

CPU specs:
__mips__: 1, __mips:32, __mips_isa_rev:2, _MIPSEL:1

Compiler specs:
sizeof(long)=4
sizeof(long long int)=8
sizeof(void*)=4
sizeof(__ptr_t)=4

Calling conventions:
ra=0x400818, sp=0x7f968898, s0=0x7f968a8c, s1=0x1, s2=0x7f968a84, s3=0x4006b0, s4=0x77e029d0, 
s5=0x77e01660, s6=0x77e14c38, s7=0, fp=0x7f968898, gp=0x419000
sizeof(jmp_buf)=104 (unsigned long long [13])
    0x18 0x08 0x40 0x00 # ra, the return address
    0x98 0x88 0x96 0x7f # sp
    0x8c 0x8a 0x96 0x7f # s0
    0x01 0x00 0x00 0x00 # s1
    0x84 0x8a 0x96 0x7f # s2
    0xb0 0x06 0x40 0x00 # s3
    0xd0 0x29 0xe0 0x77 # s4
    0x60 0x16 0xe0 0x77 # s5
    0x38 0x4c 0xe1 0x77 # s6
    0x00 0x00 0x00 0x00 # s7
    0x98 0x88 0x96 0x7f # fp/s8
    0x00 0x90 0x41 0x00 # gp
    0x00 0x00 0x00 0x00 
    ............
    0x00 0x00 0x00 0x00 

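Note: The simplest approach would be to set jmpbuf[1] directly to _sp, the address of the stack that the coroutine allocates on the heap. But that depends on the glibc layout, so we chose to implement it in assembly and define how jmpbuf is used ourselves, to avoid trouble later.

You can debug setjmp and run disassemble in gdb to see the registers it saves: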

sw  ra,0(a0) 
sw  sp,4(a0) 
sw  s0,8(a0) 
sw  s1,12(a0)
sw  s2,16(a0)
sw  s3,20(a0)
sw  s4,24(a0)
sw  s5,28(a0)
sw  s6,32(a0)
sw  s7,36(a0)
sw  s8,40(a0)
sw  gp,44(a0)
jr  ra

Likewise, look at longjmp: after restoring the registers, it jumps directly to the address in ra:

lw  ra,0(a0)
lw  sp,4(a0)
......
jr  ra

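Note: This is only a way to confirm which registers are used. We do not need to follow the glibc jmpbuf layout strictly, because glibc implementations differ between versions; when we implement setjmp for all platforms in assembly, we can keep the layout as consistent as possible.

ASM

The key step comes next: saving the registers in assembly. The assembly is split into separate files by OS:

  • md_linux.S, the assembly for all Linux platforms; the functions for each platform are selected by CPU architecture macros.
  • md_darwin.S, the assembly for OSX/Mac; x86_64 is implemented, and for M1 (aarch64) support see the table at the beginning.
  • md_cygwin64.S, the assembly for Cygwin64/Windows; x86_64 is implemented, and 32-bit Windows is not supported yet.

OpenWRT/MIPS is clearly a Linux platform, so we first implement two empty functions: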

#elif defined(__mips__)
    #define JB_SP  0

    	.text

    	.globl _st_md_cxt_save
    _st_md_cxt_save:
    	.size _st_md_cxt_save, .-_st_md_cxt_save

    	.globl _st_md_cxt_restore
    _st_md_cxt_restore:
    	.size _st_md_cxt_restore, .-_st_md_cxt_restore

#endif

Note: In effect, _st_md_cxt_save is setjmp and _st_md_cxt_restore is longjmp
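
The contract these two functions have to honor is the one of the standard setjmp/longjmp pair: the save call returns 0 the first time, and the restore call makes it return again with a nonzero value. A minimal sketch with the C library versions (save_then_restore is a made-up name, not an ST function):

```c
#include <assert.h>
#include <setjmp.h>

static jmp_buf saved_ctx;

/* Returns how many times execution passed the setjmp point. */
int save_then_restore(void)
{
    /* volatile, so the value survives the longjmp */
    volatile int passes = 0;

    if (setjmp(saved_ctx) == 0) {
        /* First return: the context was just saved. */
        passes = passes + 1;
        longjmp(saved_ctx, 1); /* jump back; setjmp now returns 1 */
    }

    /* Second return: we arrived here through longjmp. */
    assert(passes == 1);
    passes = passes + 1;
    return passes;
}
```

save_then_restore() returns 2, since the code after setjmp is reached once directly and once through longjmp. An assembly implementation for a new CPU has to reproduce exactly this behavior.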

Then we build ST and use verify.c to check whether these two functions work.

cd tools/verify && make && ./verify

root@OpenWrt:~# ./verify
gp=0x419000, fp=0x7fe3af20, sp=0x7fe3af20, s0=0x7fe3b10c, s1=0x1, s2=0x7fe3b104, s3=0x400670, 
s4=0x77e759d0, s5=0x77e74660, s6=0x77e87c38, s7=0
    0x00 0x00 0x00 0x00 
    ............
    0x00 0x00 0x00 0x00 

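Note: Because the functions are not implemented yet, jmpbuf is all zeros.

Finally, implement the functions in assembly, which requires platform-specific documentation. You can also debug the implementations of setjmp and longjmp directly to learn how registers are saved to jmpbuf and restored from it: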

root@OpenWrt:~# gdb porting
(gdb) b main
(gdb) r
(gdb) layout next
(gdb) layout next

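Note: Press CTRL+X A to leave the GDB text UI mode and return to the normal GDB mode.

Note: To find out how to write the assembly, look at what assembly the C code compiles to. A little debugging gives a rough idea, and with a search engine you can quickly work out the implementation.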

Build

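After implementing the assembly, a few places need changes; for example, the jmpbuf definition on MIPS is slightly different.

The usual jmpbuf definition is shown below, with the field name __jmpbuf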

     typedef struct __jmp_buf_tag jmp_buf[1];
     struct __jmp_buf_tag {
         __jmp_buf __jmpbuf;
         int __mask_was_saved;
         __sigset_t __saved_mask;
     };

On MIPS the field is defined differently; its name is __jb

    typedef struct __jmp_buf_tag {
        __jmp_buf __jb;
        unsigned long __fl;
        unsigned long __ss[128/sizeof(long)];
    } jmp_buf[1];

Therefore md.h must define how jmpbuf is used; SP is at __jb[0]:

        #elif defined(__mips__)
            /* https://github.com/ossrs/state-threads/issues/21 */
            #define MD_USE_BUILTIN_SETJMP
            #define MD_GET_SP(_t) *((long *)&((_t)->context[0].__jb[0]))

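Note: On MIPS a pointer is 4 bytes while __jb is of type long long (8 bytes), so a type cast is needed.

The macro MD_GET_SP is how the SP in jmpbuf is updated to the coroutine's stack address. It is used in MD_INIT_CONTEXT, that is, when a coroutine is created.

Note: When a coroutine is created, the current SP may belong to another coroutine, so the new coroutine cannot use the current SP directly. A virtual stack is allocated from the heap instead, which is why the SP address in jmpbuf must be updated after setjmp.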
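
The idea of running a coroutine on a stack taken from the heap can be demonstrated without touching jmpbuf. The sketch below swaps in a different technique, the POSIX ucontext API (getcontext, makecontext, swapcontext), so it is an illustration of the concept rather than ST's setjmp-based mechanism. The names run_on_heap_stack and co_entry are made up.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <ucontext.h>

static ucontext_t main_ctx, co_ctx;
static uintptr_t seen_sp; /* address of a local variable inside the coroutine */

static void co_entry(void)
{
    char marker = 0;
    seen_sp = (uintptr_t)&marker;
    /* Returning resumes uc_link, which is main_ctx. */
}

/* Returns 1 if the coroutine's locals lived on the heap stack, -1 on error. */
int run_on_heap_stack(size_t stack_size)
{
    char *stack;
    int on_heap;

    assert(stack_size >= 16 * 1024);
    stack = malloc(stack_size);
    if (stack == NULL)
        return -1;

    if (getcontext(&co_ctx) < 0) {
        free(stack);
        return -1;
    }
    /* Point the new context at the heap block instead of the current stack. */
    co_ctx.uc_stack.ss_sp = stack;
    co_ctx.uc_stack.ss_size = stack_size;
    co_ctx.uc_link = &main_ctx;
    makecontext(&co_ctx, co_entry, 0);

    if (swapcontext(&main_ctx, &co_ctx) < 0) {
        free(stack);
        return -1;
    }

    on_heap = seen_sp >= (uintptr_t)stack &&
              seen_sp < (uintptr_t)stack + stack_size;
    free(stack);
    return on_heap;
}
```

run_on_heap_stack(64 * 1024) returns 1 when the local variable of co_entry lies inside the malloc'ed block, which is the effect that patching SP through MD_GET_SP is meant to achieve.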

HelloWorld

After a successful build, we verify with a small tool that initializes ST and then keeps printing logs, see helloworld.c

root@OpenWrt:~# ./helloworld 
#000, Hello, state-threads world!
#001, Hello, state-threads world!
#002, Hello, state-threads world!
#003, Hello, state-threads world!

Done.

Support Loongson CPU arch

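Loongson: https://www.loongson.cn/

Supported chips (CPU):

Supported systems (OS):

Instruction set and assembly:

Supported List

Loongson CPUs already supported:

Macros

First determine the macros defined for Loongson:

g++ -dM -E - </dev/null |grep -i loong

The result should be __loongarch__ or __loongarch64

Note: __loongarch64 and __loongarch32 both fall under __loongarch__. Servers are all 64 bits, so we only need to support __loongarch64.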

Porting

For the detailed use of the registers, see the figure below:

image

  • r0, zero, Constant zero.
  • r1, ra, Return address.
  • r2, tp, TLS (Thread Local Storage).
  • r3, sp, Stack pointer.
  • r4-r11, a0-a7, Argument registers.
  • r4-r5, v0-v1, Return value.
  • r12-r20, t0-t8, Temp registers.
  • r21, x, Reserved.
  • r22, fp, Frame pointer.
  • r23-r31, s0-s8, Subroutine register variable.

The registers that mainly need to be saved:

  • r3, sp, Stack pointer.
  • r22, fp, Frame pointer.
  • r23-r31, s0-s8, Subroutine register variable.

Modify and build porting.c, debug the program, and enable disassembly display:

(gdb) set  disassemble-next-line on

Observe the function call instructions by debugging foo_return_zero

47	    int r0 = foo_return_zero();
=> 0x00000001200008bc <main+196>:	00 78 00 54	bl	120(0x78) # 0x120000934 <foo_return_zero>
   0x00000001200008c0 <main+200>:	8c 00 15 00	move	$r12,$r4
   0x00000001200008c4 <main+204>:	cc b2 bf 29	st.w	$r12,$r22,-20(0xfec)

foo_return_zero () at porting.c:59
59	{
=> 0x0000000120000934 <foo_return_zero+0>:	63 c0 ff 02	addi.d	$r3,$r3,-16(0xff0)
   0x0000000120000938 <foo_return_zero+4>:	76 20 c0 29	st.d	$r22,$r3,8(0x8)
   0x000000012000093c <foo_return_zero+8>:	76 40 c0 02	addi.d	$r22,$r3,16(0x10)

60	    return 0;
=> 0x0000000120000940 <foo_return_zero+12>:	0c 00 15 00	move	$r12,$r0

61	}
=> 0x0000000120000944 <foo_return_zero+16>:	84 01 15 00	move	$r4,$r12
   0x0000000120000948 <foo_return_zero+20>:	76 20 c0 28	ld.d	$r22,$r3,8(0x8)
   0x000000012000094c <foo_return_zero+24>:	63 40 c0 02	addi.d	$r3,$r3,16(0x10)
   0x0000000120000950 <foo_return_zero+28>:	20 00 00 4c	jirl	$r0,$r1,0
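
  • bl: the instruction that calls a function.
  • r4: the register that holds the return value.
  • r3: used as the sp register; the operations on r3 at function entry and return are equivalent to push and pop.

Now look at a function with an argument, foo_return_one_arg1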

49	    int r2 = foo_return_one_arg1(r1);
=> 0x00000001200008d4 <main+220>:	cc ea ff 24	ldptr.w	$r12,$r22,-24(0xffe8)
   0x00000001200008d8 <main+224>:	84 01 15 00	move	$r4,$r12
   0x00000001200008dc <main+228>:	00 98 00 54	bl	152(0x98) # 0x120000974 <foo_return_one_arg1>
   0x00000001200008e0 <main+232>:	8c 00 15 00	move	$r12,$r4
   0x00000001200008e4 <main+236>:	cc 92 bf 29	st.w	$r12,$r22,-28(0xfe4)

foo_return_one_arg1 (r0=1) at porting.c:69
69	{
=> 0x0000000120000974 <foo_return_one_arg1+0>:	63 80 ff 02	addi.d	$r3,$r3,-32(0xfe0)
   0x0000000120000978 <foo_return_one_arg1+4>:	76 60 c0 29	st.d	$r22,$r3,24(0x18)
   0x000000012000097c <foo_return_one_arg1+8>:	76 80 c0 02	addi.d	$r22,$r3,32(0x20)
   0x0000000120000980 <foo_return_one_arg1+12>:	8c 00 15 00	move	$r12,$r4
   0x0000000120000984 <foo_return_one_arg1+16>:	8c 81 40 00	slli.w	$r12,$r12,0x0
   0x0000000120000988 <foo_return_one_arg1+20>:	cc b2 bf 29	st.w	$r12,$r22,-20(0xfec)

70	    return r0 + 2;
=> 0x000000012000098c <foo_return_one_arg1+24>:	cc b2 bf 28	ld.w	$r12,$r22,-20(0xfec)
   0x0000000120000990 <foo_return_one_arg1+28>:	8c 09 80 02	addi.w	$r12,$r12,2(0x2)

71	}
=> 0x0000000120000994 <foo_return_one_arg1+32>:	84 01 15 00	move	$r4,$r12
   0x0000000120000998 <foo_return_one_arg1+36>:	76 60 c0 28	ld.d	$r22,$r3,24(0x18)
   0x000000012000099c <foo_return_one_arg1+40>:	63 80 c0 02	addi.d	$r3,$r3,32(0x20)
   0x00000001200009a0 <foo_return_one_arg1+44>:	20 00 00 4c	jirl	$r0,$r1,0

  • r4: the first argument, and also the return value.

Debugging the setjmp function:

#0  print_jmpbuf () at porting.c:141
141	    int r0 = setjmp(ctx);
=> 0x00000001200009ec <print_jmpbuf+72>:	cc c2 f9 02	addi.d	$r12,$r22,-400(0xe70)
   0x00000001200009f0 <print_jmpbuf+76>:	84 01 15 00	move	$r4,$r12
   0x00000001200009f4 <print_jmpbuf+80>:	ff 7f fc 57	bl	-900(0xffffc7c) # 0x120000670 <_setjmp@plt>
   0x00000001200009f8 <print_jmpbuf+84>:	8c 00 15 00	move	$r12,$r4
   0x00000001200009fc <print_jmpbuf+88>:	cc b2 be 29	st.w	$r12,$r22,-84(0xfac)

(gdb) disassemble 
Dump of assembler code for function _setjmp@plt:
=> 0x0000000120000670 <+0>:	pcaddu12i	$r15,8(0x8)
   0x0000000120000674 <+4>:	ld.d	$r15,$r15,-1624(0x9a8)
   0x0000000120000678 <+8>:	pcaddu12i	$r13,0
   0x000000012000067c <+12>:	jirl	$r0,$r15,0

(gdb) disassemble 
Dump of assembler code for function __sigsetjmp:
=> 0x000000fff7e943b8 <+0>:	st.d	$r1,$r4,0
   0x000000fff7e943bc <+4>:	st.d	$r3,$r4,8(0x8)
   0x000000fff7e943c0 <+8>:	st.d	$r21,$r4,16(0x10)
   0x000000fff7e943c4 <+12>:	st.d	$r22,$r4,24(0x18)
   0x000000fff7e943c8 <+16>:	st.d	$r23,$r4,32(0x20)
   0x000000fff7e943cc <+20>:	st.d	$r24,$r4,40(0x28)
   0x000000fff7e943d0 <+24>:	st.d	$r25,$r4,48(0x30)
   0x000000fff7e943d4 <+28>:	st.d	$r26,$r4,56(0x38)
   0x000000fff7e943d8 <+32>:	st.d	$r27,$r4,64(0x40)
   0x000000fff7e943dc <+36>:	st.d	$r28,$r4,72(0x48)
   0x000000fff7e943e0 <+40>:	st.d	$r29,$r4,80(0x50)
   0x000000fff7e943e4 <+44>:	st.d	$r30,$r4,88(0x58)
   0x000000fff7e943e8 <+48>:	st.d	$r31,$r4,96(0x60)
   0x000000fff7e943ec <+52>:	fst.d	$f24,$r4,104(0x68)
   0x000000fff7e943f0 <+56>:	fst.d	$f25,$r4,112(0x70)
   0x000000fff7e943f4 <+60>:	fst.d	$f26,$r4,120(0x78)
   0x000000fff7e943f8 <+64>:	fst.d	$f27,$r4,128(0x80)
   0x000000fff7e943fc <+68>:	fst.d	$f28,$r4,136(0x88)
   0x000000fff7e94400 <+72>:	fst.d	$f29,$r4,144(0x90)
   0x000000fff7e94404 <+76>:	fst.d	$f30,$r4,152(0x98)
   0x000000fff7e94408 <+80>:	fst.d	$f31,$r4,160(0xa0)
   0x000000fff7e9440c <+84>:	b	4(0x4) # 0xfff7e94410 <__sigjmp_save>

(gdb) disassemble 
Dump of assembler code for function __sigjmp_save:
=> 0x000000fff7e94410 <+0>:	addi.d	$r3,$r3,-16(0xff0)
   0x000000fff7e94414 <+4>:	stptr.d	$r23,$r3,0
   0x000000fff7e94418 <+8>:	st.d	$r1,$r3,8(0x8)
   0x000000fff7e9441c <+12>:	move	$r23,$r4
   0x000000fff7e94420 <+16>:	bnez	$r5,32(0x20) # 0xfff7e94440 <__sigjmp_save+48>
   0x000000fff7e94424 <+20>:	ld.d	$r1,$r3,8(0x8)
   0x000000fff7e94428 <+24>:	st.w	$r5,$r23,168(0xa8)
   0x000000fff7e9442c <+28>:	move	$r4,$r0
   0x000000fff7e94430 <+32>:	ldptr.d	$r23,$r3,0
   0x000000fff7e94434 <+36>:	addi.d	$r3,$r3,16(0x10)
   0x000000fff7e94438 <+40>:	jirl	$r0,$r1,0
   0x000000fff7e9443c <+44>:	andi	$r0,$r0,0x0
   0x000000fff7e94440 <+48>:	addi.d	$r6,$r4,176(0xb0)
   0x000000fff7e94444 <+52>:	move	$r5,$r0
   0x000000fff7e94448 <+56>:	move	$r4,$r0
   0x000000fff7e9444c <+60>:	bl	1924(0x784) # 0xfff7e94bd0 <sigprocmask>
   0x000000fff7e94450 <+64>:	ld.d	$r1,$r3,8(0x8)
   0x000000fff7e94454 <+68>:	sltui	$r5,$r4,1(0x1)
   0x000000fff7e94458 <+72>:	st.w	$r5,$r23,168(0xa8)
   0x000000fff7e9445c <+76>:	move	$r4,$r0
   0x000000fff7e94460 <+80>:	ldptr.d	$r23,$r3,0
   0x000000fff7e94464 <+84>:	addi.d	$r3,$r3,16(0x10)
   0x000000fff7e94468 <+88>:	jirl	$r0,$r1,0

(gdb) p &ctx 
$25 = (jmp_buf *) 0xffffff3110
(gdb) p/x $r12
$27 = 0xffffff3110
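
The stores in __sigsetjmp and __sigjmp_save above imply a fixed byte offset for each saved register. A minimal sketch of that layout as a C struct, assuming only the offsets visible in this one disassembly. The struct name, field names, and the length of saved_mask are hypothetical and are not glibc's own definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the buffer written by __sigsetjmp above.
 * Names are illustrative; only the offsets come from the disassembly. */
typedef struct la_jmpbuf_view {
    uint64_t ra;             /* $r1,  st.d at offset 0 */
    uint64_t sp;             /* $r3,  st.d at offset 8 */
    uint64_t r21;            /* $r21, st.d at offset 16 */
    uint64_t fp;             /* $r22, st.d at offset 24 */
    uint64_t s[9];           /* $r23-$r31, offsets 32..96 */
    double   f[8];           /* $f24-$f31, offsets 104..160 */
    int32_t  mask_was_saved; /* st.w at offset 168 in __sigjmp_save */
    int32_t  pad;            /* keeps the next field at offset 176 */
    uint64_t saved_mask[16]; /* starts at 176; the length is an assumption */
} la_jmpbuf_view;

/* Byte offset of callee-saved register s<i> (i = 0..8) in the buffer. */
size_t la_s_offset(unsigned i) {
    assert(i < 9);
    return offsetof(la_jmpbuf_view, s) + i * sizeof(uint64_t);
}
```

For example, la_s_offset(0) is 32 and la_s_offset(8) is 96, matching the st.d of $r23 and $r31.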

Debugging the longjmp function:

(gdb) disassemble 
Dump of assembler code for function __longjmp:
=> 0x000000fff7e944c0 <+0>:	ld.d	$r1,$r4,0
   0x000000fff7e944c4 <+4>:	ld.d	$r3,$r4,8(0x8)
   0x000000fff7e944c8 <+8>:	ld.d	$r21,$r4,16(0x10)
   0x000000fff7e944cc <+12>:	ld.d	$r22,$r4,24(0x18)
   0x000000fff7e944d0 <+16>:	ld.d	$r23,$r4,32(0x20)
   0x000000fff7e944d4 <+20>:	ld.d	$r24,$r4,40(0x28)
   0x000000fff7e944d8 <+24>:	ld.d	$r25,$r4,48(0x30)
   0x000000fff7e944dc <+28>:	ld.d	$r26,$r4,56(0x38)
   0x000000fff7e944e0 <+32>:	ld.d	$r27,$r4,64(0x40)
   0x000000fff7e944e4 <+36>:	ld.d	$r28,$r4,72(0x48)
   0x000000fff7e944e8 <+40>:	ld.d	$r29,$r4,80(0x50)
   0x000000fff7e944ec <+44>:	ld.d	$r30,$r4,88(0x58)
   0x000000fff7e944f0 <+48>:	ld.d	$r31,$r4,96(0x60)
   0x000000fff7e944f4 <+52>:	fld.d	$f24,$r4,104(0x68)
   0x000000fff7e944f8 <+56>:	fld.d	$f25,$r4,112(0x70)
   0x000000fff7e944fc <+60>:	fld.d	$f26,$r4,120(0x78)
   0x000000fff7e94500 <+64>:	fld.d	$f27,$r4,128(0x80)
   0x000000fff7e94504 <+68>:	fld.d	$f28,$r4,136(0x88)
   0x000000fff7e94508 <+72>:	fld.d	$f29,$r4,144(0x90)
   0x000000fff7e9450c <+76>:	fld.d	$f30,$r4,152(0x98)
   0x000000fff7e94510 <+80>:	fld.d	$f31,$r4,160(0xa0)
   0x000000fff7e94514 <+84>:	sltui	$r4,$r5,1(0x1)
   0x000000fff7e94518 <+88>:	add.d	$r4,$r4,$r5
   0x000000fff7e9451c <+92>:	jirl	$r0,$r1,0
  • jirl and b are both jump instructions.
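
The last three instructions of __longjmp compute the value that setjmp appears to return. A small sketch of that arithmetic in C; the function name is hypothetical.

```c
#include <assert.h>

/* Mirrors "sltui $r4,$r5,1" followed by "add.d $r4,$r4,$r5":
 * a val of 0 becomes 1, any other val is passed through unchanged. */
long longjmp_result(long val) {
    long is_zero = (val == 0); /* sltui: 1 when val is below 1 as unsigned */
    return is_zero + val;      /* add.d */
}
```

This is why longjmp(env, 0) makes setjmp return 1 rather than 0.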

ASM

The register layout we chose is:

    #define JB_SP  0    /* R3, SP, Stack pointer */
    #define JB_RA  1    /* R1, RA, Return address */
    #define JB_FP  2    /* FP/R22 Frame pointer */
    #define JB_S0  3    /* R23-R31, S0-S8, Subroutine register variable */
    #define JB_S1  4    /* R23-R31, S0-S8, Subroutine register variable */
    #define JB_S2  5    /* R23-R31, S0-S8, Subroutine register variable */
    #define JB_S3  6    /* R23-R31, S0-S8, Subroutine register variable */
    #define JB_S4  7    /* R23-R31, S0-S8, Subroutine register variable */
    #define JB_S5  8    /* R23-R31, S0-S8, Subroutine register variable */
    #define JB_S6  9    /* R23-R31, S0-S8, Subroutine register variable */
    #define JB_S7  10   /* R23-R31, S0-S8, Subroutine register variable */
    #define JB_S8  11   /* R23-R31, S0-S8, Subroutine register variable */

We put SP in the first 8 bytes, at a fixed position, mainly so that SP is easy to replace.

After porting, use the verify tool to inspect the jmpbuf layout:
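
With SP at a fixed index, pointing a saved context at a fresh stack is a single store. A minimal sketch, assuming the buffer is viewed as 64-bit words and that the stack top should be 16-byte aligned; ctx_set_stack and JB_WORDS are hypothetical names, not part of ST.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define JB_SP    0   /* index taken from the layout above */
#define JB_RA    1
#define JB_WORDS 12  /* JB_SP through JB_S8 */

/* Hypothetical helper: make a saved context use a new stack.
 * The stack grows down, so SP starts at the top of the region. */
void ctx_set_stack(uint64_t ctx[JB_WORDS], void *stack, size_t size) {
    uintptr_t top = (uintptr_t)stack + size;
    assert(size >= 16);
    ctx[JB_SP] = (uint64_t)(top & ~(uintptr_t)15); /* round down to 16 bytes */
}
```

Only the JB_SP slot is written; the other saved registers keep whatever the context save put there.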

[root@host-192-168-100-6 verify]# pwd
/root/git/state-threads/tools/verify
[root@host-192-168-100-6 verify]# ./verify 
sp=0xfffbed2b20, ra=0x1200007ac, fp=0xfffbed2cf0, s0=(nil), s1=0x1200009e8, s2=(nil), s3=(nil), s4=0xaab5cb9fc0, s5=0xaab5baad30, s6=0xaab5bae8b0, s7=(nil), s7=(nil)
    0x20 0x2b 0xed 0xfb 0xff 0x00 0x00 0x00 
    0xac 0x07 0x00 0x20 0x01 0x00 0x00 0x00 
    0xf0 0x2c 0xed 0xfb 0xff 0x00 0x00 0x00 
    0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 
    0xe8 0x09 0x00 0x20 0x01 0x00 0x00 0x00 
    0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 
    0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 
    0xc0 0x9f 0xcb 0xb5 0xaa 0x00 0x00 0x00 
    0x30 0xad 0xba 0xb5 0xaa 0x00 0x00 0x00 
    0xb0 0xe8 0xba 0xb5 0xaa 0x00 0x00 0x00 
    0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 
    0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 
    0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 
    0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 
    0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 
    0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 
    0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 
    0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 
    0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 
    0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 
    0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
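
Each row of the dump is one 64-bit slot printed least significant byte first, so the first row reassembles to the sp value on the summary line. A small sketch of that decoding, plus a dump routine in a similar row format; both function names are hypothetical and this is not the verify tool's source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Reassemble one 8-byte row of the dump (least significant byte first). */
uint64_t row_to_word(const unsigned char row[8]) {
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | row[i];
    return v;
}

/* Print a buffer as rows of 8 hex bytes. */
void dump_rows(const void *buf, size_t len) {
    const unsigned char *p = buf;
    for (size_t i = 0; i < len; i++)
        printf("0x%02x%s", p[i], (i % 8 == 7 || i + 1 == len) ? "\n" : " ");
}
```

Feeding the first two rows to row_to_word yields 0xfffbed2b20 and 0x1200007ac, the sp and ra values printed by verify.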

Running helloworld creates an ST coroutine that prints continuously:

[root@host-192-168-100-6 helloworld]# pwd
/root/git/state-threads/tools/helloworld
[root@host-192-168-100-6 helloworld]# ./helloworld 
#000, Hello, state-threads world!
#001, Hello, state-threads world!
#002, Hello, state-threads world!

This means ST has been ported successfully.

Support aarch64 for armv8.

ossrs/srs#1547
Support cross-compiling: ARM, MIPS, armv8, aarch64, etc.

ossrs/srs#1282
Support native builds on ARM, Loongson, and similar platforms (ARM, armv8, aarch64, etc.).

Unknown CPU

If the CPU is not detected, specify it explicitly, for example:

make linux-debug EXTRA_CFLAGS="-D__aarch64__"

Support MSG_ZEROCOPY for streaming server.

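See: ossrs/srs#307 (comment)

The hottest kernel function at the moment is copy_user_enhanced_fast_string, which mainly copies data from user space into the kernel. This is presumably because the UDP payload being sent has to be copied into the kernel before transmission.

The same copy is the bottleneck for TCP. The Linux kernel actually supports several zero-copy mechanisms, such as sendfile, splice, tee, and MSG_ZEROCOPY.

The MSG_ZEROCOPY documentation notes that it has a cost, and that it pays off when sending large amounts of data: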

Copy avoidance is not a free lunch. As implemented, with page pinning, it replaces 
per byte copy cost with page accounting and completion notification overhead. As a 
result, MSG_ZEROCOPY is generally only effective at writes over around 10 KB.

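With sendmmsg and a 600 Kbps stream, one send carries 50 KB of data with 1 viewer connection, 8.5 MB with 1000 connections, 14.4 MB with 2000 connections, and 20 MB with 3000 connections.

This may require changes to ST to support it; see #13.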
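
A minimal sketch of how a sender might apply the roughly 10 KB guideline quoted above, assuming Linux. SO_ZEROCOPY and MSG_ZEROCOPY are real Linux names; the helper functions and ZC_MIN_BYTES are hypothetical, and the fallback values are those used on most architectures. This is not ST or SRS code.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Fallbacks for older libc headers (values on most Linux architectures). */
#ifndef SO_ZEROCOPY
#define SO_ZEROCOPY 60
#endif
#ifndef MSG_ZEROCOPY
#define MSG_ZEROCOPY 0x4000000
#endif

#define ZC_MIN_BYTES (10 * 1024) /* assumed cutoff, from the quoted guideline */

/* Opt the socket in once; returns 0 on success, -1 if unsupported. */
int enable_zerocopy(int fd) {
    int one = 1;
    return setsockopt(fd, SOL_SOCKET, SO_ZEROCOPY, &one, sizeof(one));
}

/* Use zero-copy only for writes large enough to amortize its overhead. */
int zc_send_flags(size_t len) {
    return len >= ZC_MIN_BYTES ? MSG_ZEROCOPY : 0;
}

/* With MSG_ZEROCOPY the caller must leave buf untouched until the kernel
 * reports completion on the socket error queue (recvmsg with MSG_ERRQUEUE). */
ssize_t send_maybe_zerocopy(int fd, const void *buf, size_t len) {
    return send(fd, buf, len, zc_send_flags(len));
}
```

The completion notifications are the part that would interact with a coroutine scheduler, since something has to reap the error queue before buffers are reused.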

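Add the ability to free coroutine memory

Currently, once a coroutine has been started its memory is never released, so our memory usage can only go up, which confuses our operations team.
I changed coroutine release to free the memory as well, but it crashes as soon as it is used. Is memory being corrupted somewhere?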

/*
 * Free the stack for the current thread
 */
void _st_stack_free(_st_stack_t *ts)
{
    if (!ts)
        return;
    _st_delete_stk_segment(ts->vaddr, ts->vaddr_size);
    /* Put the stack on the free list */
    // ST_APPEND_LINK(&ts->links, _st_free_stacks.prev);
    // _st_num_free_stacks++;
}

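Errors compiling srs-server and libst in AOSP

I am placing the srs and libst sources in the Android source tree to cross-compile them and then integrate them into an Android phone (CPU architecture: aarch64).
The Android source tree is now built with clang.
When compiling libst, the MD_GET_SP macro fails to compile: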

#elif defined(__aarch64__)
            /* https://github.com/ossrs/state-threads/issues/9 */
            #define MD_STACK_GROWS_DOWN
            #define MD_USE_BUILTIN_SETJMP
            #define MD_GET_SP(_t) ((_t)->context[0].__jmpbuf[13])

Error output:

system/core/libst/sched.c:616:9: error: member reference base type 'long' is not a structure or union
        _ST_INIT_CONTEXT(thread, stack->sp, _st_thread_main);
        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
system/core/libst/common.h:442:30: note: expanded from macro '_ST_INIT_CONTEXT'
    #define _ST_INIT_CONTEXT MD_INIT_CONTEXT
                             ^
system/core/libst/md.h:472:13: note: expanded from macro 'MD_INIT_CONTEXT'
            MD_GET_SP(_thread) = (long) (_sp);         \
            ^~~~~~~~~~~~~~~~~~
system/core/libst/md.h:429:52: note: expanded from macro 'MD_GET_SP'
            #define MD_GET_SP(_t) ((_t)->context[0].__jmpbuf[13])
                                   ~~~~~~~~~~~~~~~~^~~~~~~~~

What is causing this? Is it a clang issue?
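
The diagnostic says the base of the member access has type long, which suggests that in this toolchain the jmp_buf is a plain array of long rather than an array of structs with a __jmpbuf member. The sketch below models the two shapes with hypothetical types; the sizes, the member name jb_words (standing in for __jmpbuf), and index 13 are placeholders, and the slot that actually holds SP in the Android libc is not established here.

```c
#include <assert.h>

/* Hypothetical model: one-element array of structs that contain the words. */
typedef struct { long jb_words[22]; int mask_was_saved; } tag_style_buf[1];
/* Hypothetical model: the buffer itself is an array of long. */
typedef long array_style_buf[32];

/* Compiles only when element 0 is a struct with a jb_words member. */
#define GET_SP_TAG_STYLE(ctx)   ((ctx)[0].jb_words[13])
/* For the array shape, (ctx)[0] is already a long, so index directly. */
#define GET_SP_ARRAY_STYLE(ctx) ((ctx)[13])

long demo_tag_style(long sp) {
    tag_style_buf c;
    GET_SP_TAG_STYLE(c) = sp;
    return c[0].jb_words[13];
}

long demo_array_style(long sp) {
    array_style_buf c;
    GET_SP_ARRAY_STYLE(c) = sp;
    return c[13];
}
```

Applying GET_SP_TAG_STYLE to an array_style_buf reproduces the same class of error as in the log above, which points at the libc headers rather than at clang itself.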

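Why does it crash if the memory is also freed when a coroutine is released?

In my use case, a large number of coroutines are created at the same time and a large number are later released. Because the coroutines are always kept in the free list, memory stays at a high level even when there is almost no load, which confuses our operations team. So I want to free the memory when a coroutine is released.
My change is shown below. I do not know what is wrong with it; the result is the same in both mmap and malloc mode.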
/*
 * Free the stack for the current thread
 */
void _st_stack_free(_st_stack_t *ts)
{
    if (!ts)
        return;
    _st_delete_stk_segment(ts->vaddr, ts->vaddr_size);
    // /* Put the stack on the free list */
    // ST_APPEND_LINK(&ts->links, _st_free_stacks.prev);
    // _st_num_free_stacks++;
}

Program received signal SIGSEGV, Segmentation fault.
_int_free (av=0x7ffff7498760 <main_arena>, p=0xe5aff0, have_lock=0) at malloc.c:4010
4010 p->fd = fwd;
(gdb) bt
#0 _int_free (av=0x7ffff7498760 <main_arena>, p=0xe5aff0, have_lock=0) at malloc.c:4010
#1 0x0000000000687dfb in _st_delete_stk_segment (vaddr=0xe5b000 "8\217I\367\377\177", size=73728) at stk.c:157
#2 0x0000000000687db9 in _st_stack_free (ts=0xe50830) at stk.c:114
#3 0x0000000000686dfb in st_thread_exit (retval=0x0) at sched.c:303
#4 0x0000000000686ff0 in _st_thread_main () at sched.c:366
#5 0x000000000068784c in st_thread_create (start=0x5d584e SrsFastCoroutine::pfn(void*), arg=0x1e08400, joinable=1, stk_size=65536) at sched.c:694

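Multi-threading optimization for ST, with reference to Dart's isolate mechanism

I read the article "A network server architecture with high performance, high concurrency, high scalability, and readability: StateThreads".
ST only simulates the behavior of threads and performs well on a single core.
For pure computation, it is not as good as real threads.
What about multiple cores? The related code only mentions the concept of a vp (virtual processor).
But how exactly is it used?

VP initialization first creates the idle thread, and I/O events then drive the other threads. This is ST's multi-core architecture.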

Support Multiple Threads for Linux and Darwin only.

For WebRTC or any UDP transport system, using multiple threads or CPUs is essential; please read ossrs/srs#2188.

The first step is to simplify state-threads. We should remove the code that a UDP server does not need:

  • Remove multiple OS support, only for Linux(CentOS,Ubuntu,etc) and Darwin(macOS).
  • Remove the examples and extensions.
  • Remove the poll support, only Linux epoll and Darwin kqueue.
  • Remove the support for multiple processes, for single process only.
  • The stack always grows downward.
  • Remove the deprecated serialized accept.

Then, we should use gcc __thread for multiple threads:
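
A minimal sketch of what __thread provides, assuming GCC or Clang with pthreads: each OS thread gets its own copy of the variable, so per-scheduler state needs no locking. The demo_vp_t type and all demo_ names are hypothetical stand-ins, not ST's real structures.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for per-scheduler state. */
typedef struct demo_vp { int switches; } demo_vp_t;

/* One independent copy per OS thread (GCC/Clang extension). */
__thread demo_vp_t demo_this_vp;

/* Bump this thread's private counter *arg times, then report it back. */
static void *demo_worker(void *arg) {
    int *n = arg;
    for (int i = 0; i < *n; i++)
        demo_this_vp.switches++;
    *n = demo_this_vp.switches;
    return NULL;
}

/* Run two workers concurrently; returns 0 on success, -1 on failure. */
int demo_run(int *a, int *b) {
    pthread_t ta, tb;
    if (pthread_create(&ta, NULL, demo_worker, a) != 0)
        return -1;
    if (pthread_create(&tb, NULL, demo_worker, b) != 0) {
        pthread_join(ta, NULL);
        return -1;
    }
    pthread_join(ta, NULL);
    pthread_join(tb, NULL);
    return 0;
}
```

Calling demo_run with counts 3 and 5 leaves each worker with exactly its own count, and the calling thread's demo_this_vp stays untouched. On older toolchains this needs to be built with -pthread.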

AppleM1: Support Apple Silicon M1(aarch64).

Support Apple M1(aarch64) CPU for MacPro, while the OS is OSX/Darwin.

Instructions

  • BL: Branch with Link branches to a PC-relative offset, setting the register X30 to PC+4. It provides a hint that this is a subroutine call.
  • RET: Return from subroutine, branches unconditionally to an address in a register, with a hint that this is a subroutine return.
  • BR: Branch to register, branches unconditionally to an address in a register, with a hint that this is not a subroutine return.

Note: BL sets LR (X30) to PC+4, which is the function's return address. BR and RET do not change it; they are the instructions used when returning.

Links

About How to Support EPOLLET Issue in ST

In our usage scenario, epoll must be switched to edge-triggered mode (EPOLLET). Calling the native system interface directly works as expected, but making the change inside ST has no effect. The modification is as follows.

        if (events != old_events) {
            op = old_events ? EPOLL_CTL_MOD : EPOLL_CTL_ADD;
            ev.events = events | EPOLLET;
            ev.data.fd = fd;
            if (epoll_ctl(_st_epoll_data->epfd, op, fd, &ev) < 0 && (op != EPOLL_CTL_ADD || errno != EEXIST))
                break;
            if (op == EPOLL_CTL_ADD) {
                _st_epoll_data->evtlist_cnt++;
                if (_st_epoll_data->evtlist_cnt > _st_epoll_data->evtlist_size)
                    _st_epoll_evtlist_expand();
            }
        }
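
One property of edge-triggered mode that may matter here: readiness is reported only on a change of state, so a consumer has to read until EAGAIN before waiting again. A minimal sketch of such a drain loop using plain POSIX calls; the function names are hypothetical, this is not ST's code, and whether it explains the behavior reported above is not established.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Put fd in non-blocking mode; returns 0 on success, -1 on error. */
int set_nonblocking(int fd) {
    int fl = fcntl(fd, F_GETFL, 0);
    if (fl < 0)
        return -1;
    return fcntl(fd, F_SETFL, fl | O_NONBLOCK) < 0 ? -1 : 0;
}

/* Read until EAGAIN/EWOULDBLOCK or EOF.
 * Returns the number of bytes consumed, or -1 on a real error. */
long drain_fd(int fd) {
    char buf[4096];
    long total = 0;
    for (;;) {
        ssize_t n = read(fd, buf, sizeof(buf));
        if (n > 0) { total += n; continue; }
        if (n == 0)
            return total;              /* EOF */
        if (errno == EINTR)
            continue;                  /* interrupted, retry */
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return total;              /* nothing left for now */
        return -1;
    }
}
```

A reader that stops after a single read() can leave data in the socket buffer and never be woken for it under EPOLLET.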


Plan: Migrate to C++98/MIT.

We plan to replace it progressively by C++ 98 code and switch to MIT license.

Summary

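I plan to move ST to C++98/MIT bit by bit. It currently has several pain points:

  1. It is no longer maintained. The original authors have probably all retired, and many new CPUs and operating systems are not supported, which is why the srs branch carries so many changes.
  2. The license is MPL, Mozilla's license, which Firefox also uses. It is similar to the LGPL and is an OSI-approved license, but it is rarely used, so I do not know whether it has pitfalls.
  3. The macros are odd. They all manipulate structs, as shown earlier, and should be replaced with C++98, which differs little from C here and adds only basic encapsulation. The C++ code can still expose a C API.

In fact, ST's LICENSE already covers different parts under different terms:

  1. The ST library is licensed under the MPL.
  2. The ST header files, following common practice, carry an exemption and can be used freely. The same is true of Linux, which is GPL.
  3. The examples are BSD-licensed.

The original LICENSE says: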

The State Threads library is ...... Mozilla Public License (MPL) version 1.1 
or the GNU General Public License (GPL) version 2 or later.

All source code in the "examples" directory is distributed under the BSD style license.

See: State-Threads LICENSE

Why C++98?

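Why C++98? We do not actually want to change languages. C has the best consistency, and next comes C++98, which is in effect ANSI standard C++. C++98 is probably the most widely supported C++.

In addition, we will use only the encapsulation features of C++98: classes with member variables and member functions, with no virtual functions, no inheritance, no templates, and no advanced features. C is in essence the most suitable language for ST, but ST has a large number of macros that in effect implement C++-style encapsulation and make the code unmaintainable. For example: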

/* Insert element "_e" into the list, after "_l" */
#define ST_INSERT_AFTER(_e,_l)     \
    ST_BEGIN_MACRO         \
    (_e)->next = (_l)->next; \
    (_e)->prev = (_l);     \
    (_l)->next->prev = (_e); \
    (_l)->next = (_e);     \
    ST_END_MACRO

/* Insert an element "_e" at the head of the list "_l" */
#define ST_INSERT_LINK(_e,_l) ST_INSERT_AFTER(_e,_l)

#define _ST_RUNQ                        (_st_this_vp.run_q)
#define _ST_IOQ                         (_st_this_vp.io_q)
#define _ST_ZOMBIEQ                     (_st_this_vp.zombie_q)

#define _ST_ADD_SLEEPQ(_thr, _timeout)  _st_add_sleep_q(_thr, _timeout)
#define _ST_DEL_SLEEPQ(_thr)        _st_del_sleep_q(_thr)

#define _ST_EPOLL_READ_CNT(fd)   (_st_epoll_data->fd_data[fd].rd_ref_cnt)
#define _ST_EPOLL_WRITE_CNT(fd)  (_st_epoll_data->fd_data[fd].wr_ref_cnt)
#define _ST_EPOLL_EXCEP_CNT(fd)  (_st_epoll_data->fd_data[fd].ex_ref_cnt)
#define _ST_EPOLL_REVENTS(fd)    (_st_epoll_data->fd_data[fd].revents)

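Unless you read this code every day, these macro definitions are very hard to understand when you only look at it occasionally to fix a bug.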
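
As a point of comparison, the four pointer updates in ST_INSERT_AFTER can be written as an ordinary typed function. A minimal sketch in plain C, assuming a node type with the same next/prev fields the macro touches; demo_clist_t and both function names are hypothetical, and the planned C++98 version would presumably wrap the same updates in a member function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical circular doubly linked list node. */
typedef struct demo_clist {
    struct demo_clist *next;
    struct demo_clist *prev;
} demo_clist_t;

/* An empty list is a head node that points at itself. */
void demo_list_init(demo_clist_t *l) {
    l->next = l;
    l->prev = l;
}

/* Same four updates as ST_INSERT_AFTER(_e,_l), with checked arguments. */
void demo_insert_after(demo_clist_t *e, demo_clist_t *l) {
    assert(e != NULL && l != NULL);
    e->next = l->next;
    e->prev = l;
    l->next->prev = e;
    l->next = e;
}
```

Inserting a and then b after the same head yields the order head, b, a, and the compiler now rejects arguments of the wrong type instead of expanding them silently.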

LICENSE

Original files of State-Threads:

  • examples: BSD, Eliminated.
  • common.h: MPL-1.1 OR GPL-2.0-or-later
  • md.h: MPL-1.1 OR GPL-2.0-or-later
  • public.h: MPL-1.1 OR GPL-2.0-or-later
  • event.c: MPL-1.1 OR GPL-2.0-or-later
  • io.c: MPL-1.1 OR GPL-2.0-or-later
  • key.c: MPL-1.1 OR GPL-2.0-or-later
  • sched.c: MPL-1.1 OR GPL-2.0-or-later
  • stk.c: MPL-1.1 OR GPL-2.0-or-later
  • sync.c: MPL-1.1 OR GPL-2.0-or-later
  • md_linux.S or md.S: MPL-1.1 OR GPL-2.0-or-later

New files and directories:
