Comments (8)

klauspost commented on June 5, 2024

Yes, there are pure SSE2 (non-VEX)/AVX2 (VEX) versions of these. There are "duplicates":

66 0F 3A 44 /r ib PCLMULQDQ xmm1, xmm2/m128, imm8
VEX.128.66.0F3A.WIG 44 /r ib VPCLMULQDQ xmm1, xmm2, xmm3/m128, imm8
EVEX.128.66.0F3A.WIG 44 /r /ib VPCLMULQDQ xmm1, xmm2, xmm3/m128, imm8

VEX.256.66.0F3A.WIG 44 /r /ib VPCLMULQDQ ymm1, ymm2, ymm3/m256, imm8
EVEX.256.66.0F3A.WIG 44 /r /ib VPCLMULQDQ ymm1, ymm2, ymm3/m256, imm8

I assume the VEX/EVEX versions are only selected by the assembler when using the extended registers.

In #146 I was looking for a way to restrict extended register usage.

Example of a CPU with GFNI and no AVX:
    mockcpu_test.go:177: Opening GenuineIntel0090661_ElkhartLake_02_CPUID.txt
    mockcpu_test.go:180: Name: Intel Atom(R) x6425RE Processor @ 1.90GHz
    mockcpu_test.go:182: Max Function:0x1b
    mockcpu_test.go:184: Max Extended Function:0x80000008
    mockcpu_test.go:185: VendorString: GenuineIntel
    mockcpu_test.go:186: VendorID: Intel
    mockcpu_test.go:187: PhysicalCores: 4
    mockcpu_test.go:188: ThreadsPerCore: 1
    mockcpu_test.go:189: LogicalCores: 4
    mockcpu_test.go:190: Family 6 Model: 150 Stepping: 1
    mockcpu_test.go:191: Features: AESNI,CLMUL,CMOV,CMPXCHG8,CX16,ERMS,FLUSH_L1D,FXSR,FXSROPT,GFNI,IA32_ARCH_CAP,IA32_CORE_CAP,IBPB,LAHF,MD_CLEAR,MMX,MOVBE,MOVDIR64B,MOVDIRI,NX,OSXSAVE,POPCNT,RDRAND,RDSEED,RDTSCP,SHA,SPEC_CTRL_SSBD,SSE,SSE2,SSE3,SSE4,SSE42,SSSE3,STIBP,SYSCALL,SYSEE,VMX,WAITPKG,X87,XGETBV1,XSAVE,XSAVEC,XSAVEOPT,XSAVES
    mockcpu_test.go:192: Microarchitecture level: 2
    mockcpu_test.go:193: Cacheline bytes: 64
    mockcpu_test.go:194: L1 Instruction Cache: 32768 bytes
    mockcpu_test.go:195: L1 Data Cache: 32768 bytes
    mockcpu_test.go:196: L2 Cache: 1572864 bytes
    mockcpu_test.go:197: L3 Cache: 4194304 bytes
    mockcpu_test.go:198: Hz: 1900000000 Hz
    mockcpu_test.go:199: Boost: 1900000000 Hz

from avo.

vsivsi commented on June 5, 2024
  • Actually, I'm unsure how (or if) the Go assembler decides how to select among the VEX and EVEX encodings for a 256-bit wide instruction. For SSE encoding the omitted "V" prefix (e.g. VPADDQ vs PADDQ) is the signal. Does it always use VEX unless an AVX512 option necessitating EVEX is used (e.g. mask, memory broadcast, ymm > ymm15, etc.)?

vsivsi commented on June 5, 2024

And everything above also applies to VPCLMULQDQ, including the misnaming/handling in x/sys/cpu.

mmcloughlin commented on June 5, 2024

Thanks for pointing this out. I could consider a change in avo to match. It would be a very minor break, but since GFNI isn't even in a tagged release I suspect it wouldn't affect anyone.

However, I tend to agree that we've got this right in avo and x/sys/cpu is wrong? I agree the Go project is probably not going to think this is worth a breaking change, though.

In the event of implementing feature checks (#168), I don't think having a special-case fixup for those specific ISAs is going to be that bad?

Note that @klauspost's cpuid agrees with avo here:

https://pkg.go.dev/github.com/klauspost/cpuid/v2#FeatureID

mmcloughlin commented on June 5, 2024

Ah! Sorry, I'm just now grasping what you're saying. It's not simply an issue of applying a naming transform. x/sys/cpu does not have a flag that indicates the presence of GFNI; it only has one that indicates AVX512F && GFNI.
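
To make the difference concrete, here is a minimal sketch in Go of why a combined flag loses information. The `Features` type, `GFNIForms` and `CombinedFlag` are hypothetical names invented for this illustration; they are not part of x/sys/cpu, cpuid or avo. The requirement table is a simplification: the real requirements are finer grained (for example by vector width).

```go
package main

import "fmt"

// Features is a hypothetical set of CPUID feature names.
type Features map[string]bool

// Has reports whether every named feature is present.
func (f Features) Has(names ...string) bool {
	for _, n := range names {
		if !f[n] {
			return false
		}
	}
	return true
}

// GFNIForms lists which encodings of a GFNI instruction the CPU could run,
// under the simplified assumption that the SSE form needs only GFNI, the VEX
// form also needs AVX, and the EVEX form also needs AVX512F.
func GFNIForms(f Features) []string {
	var forms []string
	if f.Has("GFNI") {
		forms = append(forms, "SSE")
	}
	if f.Has("GFNI", "AVX") {
		forms = append(forms, "VEX")
	}
	if f.Has("GFNI", "AVX512F") {
		forms = append(forms, "EVEX")
	}
	return forms
}

// CombinedFlag models a single flag that means AVX512F && GFNI.
func CombinedFlag(f Features) bool {
	return f.Has("AVX512F", "GFNI")
}

func main() {
	// A subset of the Elkhart Lake feature list above: GFNI, but no AVX.
	elkhart := Features{"GFNI": true, "SSE2": true, "CLMUL": true}
	fmt.Println(GFNIForms(elkhart))    // [SSE]
	fmt.Println(CombinedFlag(elkhart)) // false
}
```

On the Elkhart Lake feature set, the SSE-encoded form is usable, yet a check against the combined flag alone would conclude that GFNI is unavailable.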

mmcloughlin commented on June 5, 2024
  • Actually, I'm unsure how (or if) the Go assembler decides how to select among the VEX and EVEX encodings for a 256-bit wide instruction. For SSE encoding the omitted "V" prefix (e.g. VPADDQ vs PADDQ) is the signal. Does it always use VEX unless an AVX512 option necessitating EVEX is used (e.g. mask, memory broadcast, ymm > ymm15, etc.)?

If I recall correctly that's exactly what it does. It will use VEX unless it has to use EVEX.
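
The rule can be sketched in a few lines of Go. This is an illustration of the selection rule as described in this thread, not the Go assembler's actual code; the `Use` type and both functions are hypothetical, and the list of EVEX-only triggers is not exhaustive.

```go
package main

import "fmt"

// Use describes one use of a vector instruction. Hypothetical type for
// illustration only.
type Use struct {
	MaxRegIndex int  // highest vector register index referenced (0..31)
	Width       int  // vector width in bits: 128, 256 or 512
	Mask        bool // uses an opmask register
	Broadcast   bool // uses embedded memory broadcast
}

// NeedsEVEX reports whether the use relies on something only EVEX can encode.
func NeedsEVEX(u Use) bool {
	return u.MaxRegIndex >= 16 || u.Width == 512 || u.Mask || u.Broadcast
}

// Encoding picks VEX by default and falls back to EVEX only when required.
func Encoding(u Use) string {
	if NeedsEVEX(u) {
		return "EVEX"
	}
	return "VEX"
}

func main() {
	fmt.Println(Encoding(Use{MaxRegIndex: 3, Width: 256}))             // VEX
	fmt.Println(Encoding(Use{MaxRegIndex: 17, Width: 256}))            // EVEX
	fmt.Println(Encoding(Use{MaxRegIndex: 3, Width: 256, Mask: true})) // EVEX
}
```

Under this rule, code that stays within the first sixteen registers and avoids masks and broadcast always gets the VEX form.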

avo's handling of this is not great.

avo/internal/load/load.go

Lines 786 to 799 in 05ed388

// vexevex fixes instructions that have both VEX and EVEX encoded forms with the
// same operand types. Go uses the VEX encoded form unless EVEX-only features
// are used. This function will only keep the VEX encoded version in the case
// where both exist.
//
// Note this is somewhat of a hack. There are real reasons to use the EVEX
// encoded version even when both exist. The main reason to use the EVEX version
// rather than VEX is to use the registers Z16, Z17, ... and up. However, avo
// does not implement the logic to distinguish between the two halfs of the
// vector registers. So in its current state the only reason to need the EVEX
// version is to encode suffixes, and these are represented by other instruction
// forms.
//
// TODO(mbm): restrict use of vector registers https://github.com/mmcloughlin/avo/issues/146
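
For readers who want the idea in code, the filtering this comment describes might look roughly like the sketch below. It is not avo's actual implementation: the `Form` type, its fields and `DropShadowedEVEX` are hypothetical simplifications made up for this illustration.

```go
package main

import "fmt"

// Form is a hypothetical, simplified instruction form: an encoding type plus
// a string describing its operand types.
type Form struct {
	Encoding string // "VEX", "EVEX", or something else
	Operands string // e.g. "imm8, ymm/m256, ymm, ymm"
}

// DropShadowedEVEX removes every EVEX form whose operand types also appear on
// a VEX form, so only the VEX version survives where both exist.
func DropShadowedEVEX(forms []Form) []Form {
	// First pass: record the operand signatures that have a VEX form.
	vex := map[string]bool{}
	for _, f := range forms {
		if f.Encoding == "VEX" {
			vex[f.Operands] = true
		}
	}
	// Second pass: keep everything except EVEX duplicates of VEX forms.
	var out []Form
	for _, f := range forms {
		if f.Encoding == "EVEX" && vex[f.Operands] {
			continue
		}
		out = append(out, f)
	}
	return out
}

func main() {
	forms := []Form{
		{Encoding: "VEX", Operands: "imm8, ymm/m256, ymm, ymm"},
		{Encoding: "EVEX", Operands: "imm8, ymm/m256, ymm, ymm"},
		{Encoding: "EVEX", Operands: "imm8, zmm/m512, zmm, zmm"},
	}
	for _, f := range DropShadowedEVEX(forms) {
		fmt.Println(f.Encoding, f.Operands)
	}
}
```

The 512-bit EVEX form is kept because no VEX form shares its operand types. The 256-bit EVEX duplicate is dropped.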

vsivsi commented on June 5, 2024

Yes, on reflection this is how it would have to work, otherwise it would be impossible to write valid AVX/AVX2 code and ensure that it would not fault on hardware without AVX512.

vsivsi commented on June 5, 2024

I just checked and there doesn't appear to be an existing issue on the golang project for this problem. I've already prototyped it and the fix is straightforward, so I'm considering filing a new issue over there this week.
