Giter Site home page Giter Site logo

instructions: GFNI about avo HOT 5 CLOSED

klauspost avatar klauspost commented on May 29, 2024
instructions: GFNI

from avo.

Comments (5)

mmcloughlin avatar mmcloughlin commented on May 29, 2024

Ah, yes this would be awesome, and should be supported by avo.

I wish the fix were simpler. As you can tell, I've been procrastinating on these kinds of things because fixing it is either nasty or a significant undertaking. Options are basically:

(1) Maintain a fork of Opcodes database or a collection of patch files. Potentially contribute these back to upstream.

(2) Add the instructions during the loading phase. That would be done here:

avo/internal/load/tables.go

Lines 146 to 178 in fa3cfb0

// extras is simply a list of extra instructions to add to the database.
var extras = []*inst.Instruction{
// MOVLQZX does not appear in either x86 CSV or Opcodes, but does appear in stdlib assembly.
//
// Reference: https://github.com/golang/go/blob/048c9164a0c5572df18325e377473e7893dbfb07/src/runtime/asm_amd64.s#L451-L453
//
// TEXT ·reflectcall(SB), NOSPLIT, $0-32
// MOVLQZX argsize+24(FP), CX
// DISPATCH(runtime·call32, 32)
//
// Reference: https://github.com/golang/go/blob/048c9164a0c5572df18325e377473e7893dbfb07/src/cmd/internal/obj/x86/asm6.go#L1217
//
// {AMOVLQZX, yml_rl, Px, opBytes{0x8b}},
//
// Reference: https://github.com/golang/go/blob/048c9164a0c5572df18325e377473e7893dbfb07/src/cmd/internal/obj/x86/asm6.go#L515-L517
//
// var yml_rl = []ytab{
// {Zm_r, 1, argList{Yml, Yrl}},
// }
//
{
Opcode: "MOVLQZX",
Summary: "Move with Zero-Extend",
Forms: []inst.Form{
{
Operands: []inst.Operand{
{Type: "m32", Action: inst.R},
{Type: "r64", Action: inst.W},
},
},
},
},
}

(3) Generate the instructions database from a better source. Intel XED would be the best starting point #23. I've been meaning to do this since the beginning, as you can see from the issue number. Intel XED is unwieldy so it's not a trivial project. There are other third-party databases that are more up-to-date and easier to use, such as asmjit https://github.com/asmjit/asmdb.

Since GFNI is only three instructions, maybe we can support it with option (1) or (2), and replace it with (3) when I finally get around to it.

from avo.

klauspost avatar klauspost commented on May 29, 2024

I am mostly leaning towards 2) I don't feel like forking the XML is making things that much easier. I missed the 'extras' when looking though the code, and it seems like a nice, clean way to extend functionality safely - at least for a limited set of instructions.

I will try it out and report back either here or with a PR.

from avo.

mmcloughlin avatar mmcloughlin commented on May 29, 2024

They're defined in the Go optabs:

	{as: AVGF2P8AFFINEINVQB, ytab: _yvgf2p8affineinvqb, prefix: Pavx, op: opBytes{
		avxEscape | vex128 | vex66 | vex0F3A | vexW1, 0xCF,
		avxEscape | vex256 | vex66 | vex0F3A | vexW1, 0xCF,
		avxEscape | evex128 | evex66 | evex0F3A | evexW1, evexN16 | evexBcstN8 | evexZeroingEnabled, 0xCF,
		avxEscape | evex256 | evex66 | evex0F3A | evexW1, evexN32 | evexBcstN8 | evexZeroingEnabled, 0xCF,
		avxEscape | evex512 | evex66 | evex0F3A | evexW1, evexN64 | evexBcstN8 | evexZeroingEnabled, 0xCF,
	}},
	{as: AVGF2P8AFFINEQB, ytab: _yvgf2p8affineinvqb, prefix: Pavx, op: opBytes{
		avxEscape | vex128 | vex66 | vex0F3A | vexW1, 0xCE,
		avxEscape | vex256 | vex66 | vex0F3A | vexW1, 0xCE,
		avxEscape | evex128 | evex66 | evex0F3A | evexW1, evexN16 | evexBcstN8 | evexZeroingEnabled, 0xCE,
		avxEscape | evex256 | evex66 | evex0F3A | evexW1, evexN32 | evexBcstN8 | evexZeroingEnabled, 0xCE,
		avxEscape | evex512 | evex66 | evex0F3A | evexW1, evexN64 | evexBcstN8 | evexZeroingEnabled, 0xCE,
	}},
	{as: AVGF2P8MULB, ytab: _yvandnpd, prefix: Pavx, op: opBytes{
		avxEscape | vex128 | vex66 | vex0F38 | vexW0, 0xCF,
		avxEscape | vex256 | vex66 | vex0F38 | vexW0, 0xCF,
		avxEscape | evex128 | evex66 | evex0F38 | evexW0, evexN16 | evexZeroingEnabled, 0xCF,
		avxEscape | evex256 | evex66 | evex0F38 | evexW0, evexN32 | evexZeroingEnabled, 0xCF,
		avxEscape | evex512 | evex66 | evex0F38 | evexW0, evexN64 | evexZeroingEnabled, 0xCF,

https://github.com/golang/go/blob/go1.19.3/src/cmd/internal/obj/x86/avx_optabs.go#L2250-L2269

The operand tables are:

var _yvgf2p8affineinvqb = []ytab{
	{zcase: Zvex_i_rm_v_r, zoffset: 2, args: argList{Yu8, Yxm, Yxr, Yxr}},
	{zcase: Zvex_i_rm_v_r, zoffset: 2, args: argList{Yu8, Yym, Yyr, Yyr}},
	{zcase: Zevex_i_rm_v_r, zoffset: 0, args: argList{Yu8, YxmEvex, YxrEvex, YxrEvex}},
	{zcase: Zevex_i_rm_v_k_r, zoffset: 3, args: argList{Yu8, YxmEvex, YxrEvex, Yknot0, YxrEvex}},
	{zcase: Zevex_i_rm_v_r, zoffset: 0, args: argList{Yu8, YymEvex, YyrEvex, YyrEvex}},
	{zcase: Zevex_i_rm_v_k_r, zoffset: 3, args: argList{Yu8, YymEvex, YyrEvex, Yknot0, YyrEvex}},
	{zcase: Zevex_i_rm_v_r, zoffset: 0, args: argList{Yu8, Yzm, Yzr, Yzr}},
	{zcase: Zevex_i_rm_v_k_r, zoffset: 3, args: argList{Yu8, Yzm, Yzr, Yknot0, Yzr}},
}

https://github.com/golang/go/blob/go1.19.3/src/cmd/internal/obj/x86/avx_optabs.go#L483-L492

var _yvandnpd = []ytab{
	{zcase: Zvex_rm_v_r, zoffset: 2, args: argList{Yxm, Yxr, Yxr}},
	{zcase: Zvex_rm_v_r, zoffset: 2, args: argList{Yym, Yyr, Yyr}},
	{zcase: Zevex_rm_v_r, zoffset: 0, args: argList{YxmEvex, YxrEvex, YxrEvex}},
	{zcase: Zevex_rm_v_k_r, zoffset: 3, args: argList{YxmEvex, YxrEvex, Yknot0, YxrEvex}},
	{zcase: Zevex_rm_v_r, zoffset: 0, args: argList{YymEvex, YyrEvex, YyrEvex}},
	{zcase: Zevex_rm_v_k_r, zoffset: 3, args: argList{YymEvex, YyrEvex, Yknot0, YyrEvex}},
	{zcase: Zevex_rm_v_r, zoffset: 0, args: argList{Yzm, Yzr, Yzr}},
	{zcase: Zevex_rm_v_k_r, zoffset: 3, args: argList{Yzm, Yzr, Yknot0, Yzr}},
}

https://github.com/golang/go/blob/go1.19.3/src/cmd/internal/obj/x86/avx_optabs.go#L137-L146

from avo.

mmcloughlin avatar mmcloughlin commented on May 29, 2024

See also the auto-generated test cases:
https://github.com/golang/go/blob/go1.19.3/src/cmd/asm/internal/asm/testdata/avx512enc/gfni_avx512f.s

from avo.

mmcloughlin avatar mmcloughlin commented on May 29, 2024

The other factor here is the way avo handles AVX-512 instructions. In particular, merging, zeroing, broadcast, etc result in the duplication of instruction forms with slight modifications.

avo/internal/load/load.go

Lines 377 to 393 in fa3cfb0

// Apply modification stages to produce final list of forms.
stages := []func(string, inst.Form) []inst.Form{
avx512rounding,
avx512sae,
avx512bcst,
avx512masking,
avx512zeroing,
}
forms := []inst.Form{form}
for _, stage := range stages {
var next []inst.Form
for _, f := range forms {
next = append(next, stage(opcode, f)...)
}
forms = next
}

I see you did this manually for broadcast forms, but I think zeroing forms might need to be handled too.

I think it will be more robust if I make a change so the AVX-512 post-processing happens after extras are added to the list. Then we can ensure they're handled the same way.

from avo.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.