
cult's Introduction

AsmJit

AsmJit is a lightweight library for machine code generation written in C++.

See asmjit.com page for more details, examples, and documentation.

Documentation

Contributing

Breaking Changes

Breaking the API is sometimes inevitable, what to do?

Project Organization

  • / - Project root
    • src - Source code
      • asmjit - Source code and headers (always point the include path here)
        • core - Core API, backend-independent except for relocations
        • arm - ARM specific API, used only by ARM and AArch64 backends
        • x86 - X86 specific API, used only by X86 and X64 backends
    • test - Unit and integration tests (don't embed in your project)
    • tools - Tools used for configuring, documenting, and generating files

Ports

  • 32-bit ARM/Thumb port (work in progress)
  • RISC-V port (not in progress, help welcome)

Support

Notable Donors List:

Authors & Maintainers


cult's Issues

Segfault on Older CPUs

When I run cult on older CPUs (specifically a U7300, family 6 model 23 stepping 10), a segfault occurs. GDB points to the "mov %cr0, %rax" line in the following dump:

push rbx                                ; 53
push rbp                                ; 55
push r12                                ; 4154
push r13                                ; 4155
push r14                                ; 4156
push r15                                ; 4157
sub rsp, 1032                           ; 4881EC08040000
mov rbx, rsi                            ; 488BDE
mov ebp, edi                            ; 8BEF
mov [rsp], rbx                          ; 48891C24
cpuid                                   ; 0FA2
rdtsc                                   ; 0F31
mov [rsp+8], eax                        ; 89442408
mov [rsp+12], edx                       ; 8954240C
test ebp, ebp                           ; 85ED
jz L1                                   ; 0F84........
.align 16
L0:
adc cl, al                              ; 12C8
adc dl, cl                              ; 12D1
adc bl, dl                              ; 12DA
rex adc sil, bl                         ; 4012F3
rex adc dil, sil                        ; 4012FE
rex adc r8b, dil                        ; 4412C7
rex adc r9b, r8b                        ; 4512C8
rex adc r10b, r9b                       ; 4512D1
rex adc r11b, r10b                      ; 4512DA
rex adc r12b, r11b                      ; 4512E3
rex adc r13b, r12b                      ; 4512EC
rex adc r14b, r13b                      ; 4512F5
rex adc r15b, r14b                      ; 4512FE
rex adc al, r15b                        ; 4112C7
adc cl, al                              ; 12C8
adc dl, cl                              ; 12D1
adc bl, dl                              ; 12DA
rex adc sil, bl                         ; 4012F3
rex adc dil, sil                        ; 4012FE
rex adc r8b, dil                        ; 4412C7
rex adc r9b, r8b                        ; 4512C8
rex adc r10b, r9b                       ; 4512D1
rex adc r11b, r10b                      ; 4512DA
rex adc r12b, r11b                      ; 4512E3
rex adc r13b, r12b                      ; 4512EC
rex adc r14b, r13b                      ; 4512F5
rex adc r15b, r14b                      ; 4512FE
rex adc al, r15b                        ; 4112C7
adc cl, al                              ; 12C8
adc dl, cl                              ; 12D1
adc bl, dl                              ; 12DA
rex adc sil, bl                         ; 4012F3
rex adc dil, sil                        ; 4012FE
rex adc r8b, dil                        ; 4412C7
rex adc r9b, r8b                        ; 4512C8
rex adc r10b, r9b                       ; 4512D1
rex adc r11b, r10b                      ; 4512DA
rex adc r12b, r11b                      ; 4512E3
rex adc r13b, r12b                      ; 4512EC
rex adc r14b, r13b                      ; 4512F5
rex adc r15b, r14b                      ; 4512FE
rex adc al, r15b                        ; 4112C7
adc cl, al                              ; 12C8
adc dl, cl                              ; 12D1
adc bl, dl                              ; 12DA
rex adc sil, bl                         ; 4012F3
rex adc dil, sil                        ; 4012FE
rex adc r8b, dil                        ; 4412C7
rex adc r9b, r8b                        ; 4512C8
rex adc r10b, r9b                       ; 4512D1
rex adc r11b, r10b                      ; 4512DA
rex adc r12b, r11b                      ; 4512E3
rex adc r13b, r12b                      ; 4512EC
rex adc r14b, r13b                      ; 4512F5
rex adc r15b, r14b                      ; 4512FE
rex adc al, r15b                        ; 4112C7
adc cl, al                              ; 12C8
adc dl, cl                              ; 12D1
adc bl, dl                              ; 12DA
rex adc sil, bl                         ; 4012F3
rex adc dil, sil                        ; 4012FE
rex adc r8b, dil                        ; 4412C7
rex adc r9b, r8b                        ; 4512C8
rex adc r10b, r9b                       ; 4512D1
sub ebp, 1                              ; 83ED01
jnz L0                                  ; 0F8546FFFFFF
L1:
mov rax, cr0                            ; 0F20C0
mov cr0, rax                            ; 0F22C0
rdtsc                                   ; 0F31
mov esi, eax                            ; 8BF0
mov edi, edx                            ; 8BFA
mov rbx, [rsp]                          ; 488B1C24
sub esi, [rsp+8]                        ; 2B742408
sbb edi, [rsp+12]                       ; 1B7C240C
mov [rbx], esi                          ; 8933
mov [rbx+4], edi                        ; 897B04
add rsp, 1032                           ; 4881C408040000
pop r15                                 ; 415F
pop r14                                 ; 415E
pop r13                                 ; 415D
pop r12                                 ; 415C
pop rbp                                 ; 5D
pop rbx                                 ; 5B
ret                                     ; C3
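For context, "mov rax, cr0" (0F 20 C0) and "mov cr0, rax" (0F 22 C0) move from and to a control register. Both are privileged: executed outside ring 0 they raise a general-protection fault (#GP), which Linux delivers to a user-space process as SIGSEGV, consistent with the crash GDB reports. The helper below is a hypothetical sketch, not part of cult or AsmJit, that flags such encodings in a code buffer; it assumes at most one REX prefix and does not handle any other prefixes.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: returns true if the instruction starting at `code`
// is MOV from/to a control register (0F 20 /r or 0F 22 /r), optionally
// preceded by a single REX prefix (40..4F). Other prefixes are not handled.
bool isControlRegMove(const uint8_t* code, size_t size) {
  size_t i = 0;
  if (i < size && (code[i] & 0xF0) == 0x40)  // skip one REX prefix
    i++;
  // Need the two opcode bytes plus a ModRM byte.
  if (i + 2 >= size || code[i] != 0x0F)
    return false;
  return code[i + 1] == 0x20 || code[i + 1] == 0x22;
}
```

A JIT-based benchmark could run a check like this over generated code before executing it, although the simpler fix is not to emit these instructions in user-mode code at all.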

(The CPUID dump is also attached below.)

CPUID:
  In:00000000 Sub:00000000 -> EAX:0000000D EBX:756E6547 ECX:6C65746E EDX:49656E69
  In:00000001 Sub:00000000 -> EAX:0001067A EBX:00020800 ECX:0C08E3BD EDX:BFEBFBFF
  In:00000002 Sub:00000000 -> EAX:05B0B101 EBX:005657F0 ECX:00000000 EDX:2CB43048
  In:00000003 Sub:00000000 -> EAX:00000000 EBX:00000000 ECX:00000000 EDX:00000000
  In:00000004 Sub:00000000 -> EAX:04000121 EBX:01C0003F ECX:0000003F EDX:00000001
  In:00000004 Sub:00000001 -> EAX:04000122 EBX:01C0003F ECX:0000003F EDX:00000001
  In:00000004 Sub:00000002 -> EAX:04004143 EBX:02C0003F ECX:00000FFF EDX:00000001
  In:00000005 Sub:00000000 -> EAX:00000040 EBX:00000040 ECX:00000003 EDX:03122220
  In:00000006 Sub:00000000 -> EAX:00000001 EBX:00000002 ECX:00000003 EDX:00000000
  In:00000007 Sub:00000000 -> EAX:00000000 EBX:00000000 ECX:00000000 EDX:00000000
  In:00000008 Sub:00000000 -> EAX:00000400 EBX:00000000 ECX:00000000 EDX:00000000
  In:00000009 Sub:00000000 -> EAX:00000000 EBX:00000000 ECX:00000000 EDX:00000000
  In:0000000A Sub:00000000 -> EAX:07280202 EBX:00000000 ECX:00000000 EDX:00000503
  In:0000000C Sub:00000000 -> EAX:00000000 EBX:00000000 ECX:00000000 EDX:00000000
  In:0000000D Sub:00000000 -> EAX:00000003 EBX:00000240 ECX:00000240 EDX:00000000
  In:80000000 Sub:00000000 -> EAX:80000008 EBX:00000000 ECX:00000000 EDX:00000000
  In:80000001 Sub:00000000 -> EAX:00000000 EBX:00000000 ECX:00000001 EDX:20100800
  In:80000002 Sub:00000000 -> EAX:756E6547 EBX:20656E69 ECX:65746E49 EDX:2952286C
  In:80000003 Sub:00000000 -> EAX:55504320 EBX:20202020 ECX:20202020 EDX:55202020
  In:80000004 Sub:00000000 -> EAX:30303337 EBX:20402020 ECX:30332E31 EDX:007A4847
  In:80000005 Sub:00000000 -> EAX:00000000 EBX:00000000 ECX:00000000 EDX:00000000
  In:80000006 Sub:00000000 -> EAX:00000000 EBX:00000000 ECX:0C006040 EDX:00000000
  In:80000007 Sub:00000000 -> EAX:00000000 EBX:00000000 ECX:00000000 EDX:00000000
  In:80000008 Sub:00000000 -> EAX:00003024 EBX:00000000 ECX:00000000 EDX:00000000
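The signature and feature bits in this dump can be decoded by hand. The sketch below uses hypothetical helper names and the field layout documented in the Intel SDM for CPUID leaf 1. It decodes leaf 1 EAX = 0001067A into family 6, model 23, stepping 10, matching the CPU described above, and checks selected leaf 1 ECX bits: ECX = 0C08E3BD reports SSSE3 and SSE4.1 but neither SSE4.2 nor AVX, so AVX-dependent code paths should not run on this CPU.

```cpp
#include <cstdint>

// Processor signature fields from CPUID leaf 1 EAX (Intel SDM layout).
struct CpuSignature {
  uint32_t family;
  uint32_t model;
  uint32_t stepping;
};

CpuSignature decodeSignature(uint32_t eax) {
  uint32_t stepping  = eax & 0xF;
  uint32_t model     = (eax >> 4) & 0xF;
  uint32_t family    = (eax >> 8) & 0xF;
  uint32_t extModel  = (eax >> 16) & 0xF;
  uint32_t extFamily = (eax >> 20) & 0xFF;

  // Extended family applies only to family 0Fh; extended model to 06h and 0Fh.
  uint32_t displayFamily = (family == 0xF) ? family + extFamily : family;
  uint32_t displayModel  = (family == 0x6 || family == 0xF) ? (extModel << 4) + model : model;
  return CpuSignature{displayFamily, displayModel, stepping};
}

// Selected CPUID leaf 1 ECX feature bits.
constexpr uint32_t kEcxSSE3    = 1u << 0;
constexpr uint32_t kEcxSSSE3   = 1u << 9;
constexpr uint32_t kEcxFMA     = 1u << 12;
constexpr uint32_t kEcxSSE4_1  = 1u << 19;
constexpr uint32_t kEcxSSE4_2  = 1u << 20;
constexpr uint32_t kEcxXSAVE   = 1u << 26;
constexpr uint32_t kEcxOSXSAVE = 1u << 27;
constexpr uint32_t kEcxAVX     = 1u << 28;

bool hasFeature(uint32_t ecx, uint32_t bit) {
  return (ecx & bit) != 0;
}
```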

Compiler error

My build environment:

  • Ubuntu 16.04 LTS, GCC 5.4
  • cult commit: 86be01582e965e27504eecbf5fc21ef7e7cae738
  • asmjit commit: b49d685cd9e2e4488f55ce6004306a79bdea056b

Error log:
/home/seanxcwang/Work/04_research/inference_framework/asmjit/cult/src/cult/instbench.cpp: In function ‘void cult::fillRegArray(asmjit::Operand*, uint32_t, uint32_t, uint32_t, uint32_t, uint32_t)’:
/home/seanxcwang/Work/04_research/inference_framework/asmjit/cult/src/cult/instbench.cpp:148:42: error: no matching function for call to ‘asmjit::BaseReg::BaseReg(uint32_t&, uint8_t&)’
dst[i] = BaseReg(rSign, rIdArray[rId]);

vzeroupper used without checking CPU support

Hi all,
I use cult on an old CPU that does not support vzeroupper, such as the Q8200 (a Core 2 CPU). vzeroupper can be emitted from either of these places:
in master/src/asmjit/x86/x86emithelper.cpp, line 538:
if (frame.hasAvxCleanup()) ASMJIT_PROPAGATE(emitter->vzeroupper());
or
in master/src/cult/instbench.cpp, line 1777:
if (isVec(_instId, _instSpec))
  a.vzeroupper();
When cult tests all instructions (./cult), vzeroupper apparently still gets executed around the add r64, r64 test, so an illegal instruction (#UD) occurs and the program exits.
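VZEROUPPER is an AVX (VEX-encoded) instruction, so a Core 2 such as the Q8200 raises #UD when it executes it. One possible fix is to emit it only when AVX is actually usable. A minimal sketch, assuming the caller already has the raw CPUID leaf 1 ECX value and XCR0 (which may only be read with XGETBV when OSXSAVE is set); isAvxUsable and shouldEmitVzeroupper are hypothetical names, not cult or AsmJit APIs.

```cpp
#include <cstdint>

// AVX is usable only if the CPU reports AVX (CPUID.1:ECX bit 28), the OS has
// enabled XSAVE (CPUID.1:ECX bit 27, OSXSAVE), and XCR0 enables both the SSE
// (bit 1) and AVX (bit 2) state components. Pass xcr0 = 0 if OSXSAVE is clear.
bool isAvxUsable(uint32_t cpuid1Ecx, uint64_t xcr0) {
  const uint32_t kOSXSAVE = 1u << 27;
  const uint32_t kAVX = 1u << 28;
  if ((cpuid1Ecx & (kOSXSAVE | kAVX)) != (kOSXSAVE | kAVX))
    return false;
  return (xcr0 & 0x6) == 0x6;
}

// Hypothetical policy for a benchmark generator: emit VZEROUPPER only after
// vector code, and only when the host can execute it.
bool shouldEmitVzeroupper(bool emittedVectorCode, bool avxUsable) {
  return emittedVectorCode && avxUsable;
}
```

The same condition would have to guard both emission sites, since either one alone is enough to trigger the #UD.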

Results for other systems

As I mentioned on HN, I can run this on SKL, SKX and CNL (Cannon Lake) for you.

Are there any specific arguments or an output format you want for the results, or should I just capture the output of cult and include it in this issue?

#UD case 2

Hi all,
When I built and ran cult on an AMD Ryzen 9 3900X 12-Core Processor, I found that src/cult/instbench.cpp uses
a.kxnorq(x86::k7, x86::k7, x86::k7);
and
a.kmovw(pred, x86::k7);
but the 3900X does not support these instructions.
A CPU feature check may be needed to avoid the illegal instruction (#UD).
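KMOVW requires AVX512F and KXNORQ requires AVX512BW (CPUID leaf 7, subleaf 0, EBX bits 16 and 30), and the OS must also enable the opmask and ZMM state components in XCR0 (bits 5 to 7, in addition to bits 1 and 2). Zen 2 parts such as the Ryzen 9 3900X do not implement AVX-512, so these instructions raise #UD there. A minimal sketch of such a check; canUseOpmaskInsts is a hypothetical name, and obtaining the raw EBX and XCR0 values is left to the caller.

```cpp
#include <cstdint>

// CPUID.(EAX=7, ECX=0):EBX feature bits.
constexpr uint32_t kLeaf7EbxAVX512F  = 1u << 16;
constexpr uint32_t kLeaf7EbxAVX512BW = 1u << 30;

// XCR0 state components needed for AVX-512: SSE (bit 1), AVX (bit 2),
// opmask (bit 5), ZMM_Hi256 (bit 6), Hi16_ZMM (bit 7).
constexpr uint64_t kXcr0Avx512Mask = 0xE6;

// Hypothetical check: can the host execute opmask instructions such as
// KMOVW (AVX512F) and, when needBW is true, KXNORQ (AVX512BW)?
bool canUseOpmaskInsts(uint32_t leaf7Ebx, uint64_t xcr0, bool needBW) {
  if ((xcr0 & kXcr0Avx512Mask) != kXcr0Avx512Mask)
    return false;
  if ((leaf7Ebx & kLeaf7EbxAVX512F) == 0)
    return false;
  return !needBW || (leaf7Ebx & kLeaf7EbxAVX512BW) != 0;
}
```

A benchmark would then skip (or replace) the k-register setup code whenever this returns false.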

Readme / CMake should enforce required C++11

When building on Ubuntu with the recent GCC 5.4 and following the basic README instructions, the build fails because -std=c++11 is not selected, although it is required.

~/devel/cult/build$ make
[  2%] Building CXX object CMakeFiles/cult.dir/src/cult/app.cpp.o
In file included from /home/mmm/devel/cult/src/cult/./app.h:5:0,
                 from /home/mmm/devel/cult/src/cult/app.cpp:1:
/home/mmm/devel/cult/src/cult/././jsonbuilder.h:15:3: warning: identifier ‘noexcept’ is a keyword in C++11 [-Wc++0x-compat]
   JSONBuilder(StringBuilder* dst) noexcept;
   ^
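GCC 5 compiles C++ as gnu++98 by default (GCC 6 switched the default to gnu++14), so C++11 has to be requested explicitly. In CMake this is typically done with set(CMAKE_CXX_STANDARD 11) and set(CMAKE_CXX_STANDARD_REQUIRED ON) (CMake 3.1 or newer). Independently of the build files, a common header could fail fast with one clear message instead of a cascade of noexcept errors. A minimal sketch; the placement and the ExampleBuilder type are hypothetical.

```cpp
// Hypothetical guard for a common header: stop immediately with a clear
// message when the compiler is not in C++11 (or later) mode.
#if __cplusplus < 201103L
#error "C++11 or newer is required (for example, compile with -std=c++11)"
#endif

// Past the guard, C++11 keywords such as noexcept can be used safely.
struct ExampleBuilder {
  explicit ExampleBuilder(int* dst) noexcept : _dst(dst) {}
  int* _dst;
};
```

Note that MSVC reports __cplusplus as 199711L unless /Zc:__cplusplus is passed, so a portable guard would need to account for that compiler separately.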

Some instructions not tested on Q8200

Hi all,
When I checked the cult results, I found that some instructions are not tested on the Q8200.
Not tested:
lea r16,m
movnti m32, r32
movnti m64, r64

Only the MMX forms are not tested (the SSE2 forms are tested):
maskmovq mm1,mm2
pavgb mm, mm
pavgb mm, m64 {a}
pavgb mm, m64 {u}
pavgw mm, mm
pavgw mm, m64 {a}
pavgw mm, m64 {u}
pextrw r32, mm, i8
pinsrw mm, r32, i8
pinsrw mm, m16 {a}, i8
pinsrw mm, m16 {u}, i8
pmaxsw mm, mm
pmaxsw mm, m64 {a}
pmaxsw mm, m64 {u}
pmaxub mm, mm
pmaxub mm, m64 {a}
pmaxub mm, m64 {u}
pminsw mm, mm
pminsw mm, m64 {a}
pminsw mm, m64 {u}
pminub mm, mm
pminub mm, m64 {a}
pminub mm, m64 {u}
pmovmskb r32, mm
pmulhuw mm, mm
pmulhuw mm, m64 {a}
pmulhuw mm, m64 {u}
psadbw mm, mm
psadbw mm, m64 {a}
psadbw mm, m64 {u}
pshufw mm, mm, i8
pshufw mm, m64 {a}, i8
pshufw mm, m64 {u}, i8

Different opcodes:
sar qword ptr [rsp+1], 0 ; 48C17C240100 // opcode C1, longer encoding
sar qword ptr [rsp+9], 1 ; 48D17C2409 // opcode D1, shorter encoding
sar qword ptr [rsp+17], 2 ; 48C17C241102

sar r8, 1
sar r16, 1
sar r32, 1
sar r64, 1
sar m8, 1
sar m16, 1
sar m32, 1
sar m64, 1
shl r8, 1
shl r16, 1
shl r32, 1
shl r64, 1
shl m8, 1
shl m16, 1
shl m32, 1
shl m64, 1
shr r8, 1
shr r16, 1
shr r32, 1
shr r64, 1
shr m8, 1
shr m16, 1
shr m32, 1
shr m64, 1
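The differing opcodes follow from the x86 encoding rules: SAR, SHL and SHR have a dedicated shift-by-one form (D1 /r for 16/32/64-bit operands) that carries no immediate byte, while other counts use C1 /r ib. An encoder may therefore pick the shorter D1 form when the count is 1, which may mean the ", 1" variants listed above are measured with a different encoding than the imm8 ones. The sketch below reproduces the three encodings from the dump for this single addressing form only; it is an illustration, not AsmJit's encoder, and encodeSarQwordRspDisp8 is a hypothetical name.

```cpp
#include <cstdint>
#include <vector>

// Hand-encodes `sar qword ptr [rsp + disp8], count` for illustration only.
std::vector<uint8_t> encodeSarQwordRspDisp8(int8_t disp, uint8_t count) {
  std::vector<uint8_t> out;
  out.push_back(0x48);                        // REX.W: 64-bit operand size
  out.push_back(count == 1 ? 0xD1 : 0xC1);    // shift-by-one vs shift-by-imm8
  out.push_back(0x7C);                        // ModRM: mod=01 (disp8), reg=/7 (SAR), rm=100 (SIB)
  out.push_back(0x24);                        // SIB: no index, base=RSP
  out.push_back(static_cast<uint8_t>(disp));  // 8-bit displacement
  if (count != 1)
    out.push_back(count);                     // immediate shift count
  return out;
}
```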

[Feature Request] Support for ARM

Since AsmJit itself supports ARM, I tried to build cult on my Raspberry Pi, but the build failed (the code does not support ARM yet).

Could the code be extended so that we can test on Raspberry Pi and even ARM servers?

Problem with CPU feature identification

First, I love this project :) Having a way to find the performance characteristics of a system by running a tool, rather than looking them up in a document and having to trust what's there, is great.

I think there may be a problem in the CPU detection code. I'm testing on an i7 3820, and when I run cult it crashes on the vfmadd132pd instruction.

According to the Intel Intrinsics Guide, that instruction requires the FMA flag, and according to CPUBoss this processor does not have FMA3.

Yet the detection code reports that FMA (presumably FMA3) is available.

Compare this CPUID code: https://github.com/klauspost/cpuid/blob/master/cpuid.go

Its identification code is:

if c&(1<<26) != 0 && c&(1<<27) != 0 && c&(1<<28) != 0 {
  // Check for OS support
  eax, _ := xgetbv(0)
  if (eax & 0x6) == 0x6 {
    rval |= AVX
    if (c & 0x00001000) != 0 {
      rval |= FMA3
    }
  }
}

The code in cult is...

// Detect AVX+.
if (regs.ecx & 0x10000000U) {
  // - XCR0[2:1] == 11b
  //   XMM & YMM states need to be enabled by OS.
  if ((xcr0.eax & 0x00000006U) == 0x00000006U) {
    cpuInfo->addFeature(CpuInfo::kX86FeatureAVX);

    if (regs.ecx & 0x00004000U) cpuInfo->addFeature(CpuInfo::kX86FeatureFMA);
    if (regs.ecx & 0x20000000U) cpuInfo->addFeature(CpuInfo::kX86FeatureF16C);
  }
}

So perhaps the line

if (regs.ecx & 0x00004000U)

should be

if (regs.ecx & 0x00001000U)
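Per the Intel SDM, CPUID leaf 1 ECX bit 12 (0x00001000) is FMA, while bit 14 (0x00004000) is xTPR Update Control, which says nothing about FMA. The U7300 CPUID dump earlier on this page illustrates the difference: its leaf 1 ECX value (0C08E3BD) has bit 14 set and bit 12 clear. A minimal sketch of the corrected check, written as a hypothetical free function rather than cult's actual code:

```cpp
#include <cstdint>

constexpr uint32_t kEcxFMA  = 0x00001000u;  // bit 12: FMA
constexpr uint32_t kEcxXTPR = 0x00004000u;  // bit 14: xTPR Update Control (unrelated to FMA)
constexpr uint32_t kEcxAVX  = 0x10000000u;  // bit 28: AVX

// Hypothetical corrected version of the quoted detection logic.
bool detectFma(uint32_t ecx, uint64_t xcr0) {
  if ((ecx & kEcxAVX) == 0)
    return false;                 // FMA3 is VEX-encoded, so it requires AVX
  if ((xcr0 & 0x6) != 0x6)
    return false;                 // OS must enable XMM and YMM state
  return (ecx & kEcxFMA) != 0;    // test bit 12, not bit 14
}
```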

Segfault on Illegal instruction

  vblendvpd ymm, ymm, ymm, ymm    : Lat:  1.00 Rcp:  1.00
  vblendvps xmm, xmm, xmm, xmm    : Lat:  1.00 Rcp:  1.00
  vblendvps ymm, ymm, ymm, ymm    : Lat:  1.00 Rcp:  1.00
Illegal instruction (core dumped)

I get this crash after cult has been running for a while. Is there a way I could provide better debug info? (The core dump is not saved, even after adjusting ulimit.)

GCC compile failure

I wanted to test the code and check whether #4 had been fixed, but found that commit "c9a7efcf0ce6caa1ddad8f11abfeb431b6a932f6" broke compilation on GCC.

[Feature Request] Extension to the instruction flag

Would it be possible to extend the --instruction flag so that it also lets us specify the instruction signature?
For example, I want to test --instruction="vsomevector zmm,zmm" directly without going through the xmm and ymm variants.
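cult's actual option parsing is not shown here, so the following is only a sketch of how such a filter could work: normalize whitespace and commas in both the user's filter and the signature strings cult prints (such as "vblendvps xmm, xmm, xmm, xmm"), match on the mnemonic alone when the filter has no operands, and require an exact signature match otherwise. normalizeSignature and matchesFilter are hypothetical names.

```cpp
#include <cctype>
#include <string>

// Normalizes "vaddps zmm,zmm , zmm" to "vaddps zmm, zmm, zmm": lowercase,
// single spaces between tokens, and exactly ", " after each comma.
std::string normalizeSignature(const std::string& s) {
  std::string out;
  bool pendingSpace = false;
  for (char c : s) {
    if (std::isspace(static_cast<unsigned char>(c))) {
      pendingSpace = !out.empty();  // collapse runs, drop leading whitespace
      continue;
    }
    if (c == ',') {
      out += ", ";
      pendingSpace = false;
      continue;
    }
    if (pendingSpace && out.back() != ' ')
      out += ' ';
    pendingSpace = false;
    out += static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
  }
  return out;
}

// A filter without operands matches every form of the mnemonic;
// a filter with operands must match the full signature.
bool matchesFilter(const std::string& filter, const std::string& signature) {
  std::string f = normalizeSignature(filter);
  std::string sig = normalizeSignature(signature);
  if (f.find(' ') == std::string::npos)
    return sig.substr(0, sig.find(' ')) == f;
  return sig == f;
}
```

Keeping the mnemonic-only behavior as the default would leave existing --instruction usage unchanged.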
