llvm/lib/Target/AMDGPU/VOP2Instructions.td | ||
---|---|---|
778 | Update comment? |
Dropped all 16 bit instructions which have VOP3 forms from the change.
Due to the encoding change in GFX10, 16-bit instructions that moved from VOP2 to VOP3 changed behavior from zeroing the high 16 bits of dst to preserving them. The same instruction behaves differently on GFX9 and GFX10 depending on whether it is a VOP2 promoted to VOP3 or a native VOP3. We cannot rematerialize instructions which merge into dst, since their result depends on the previous contents of the destination register.
That said, the behavior of these instructions is not properly modeled to begin with and is only guarded by selection patterns that forcibly clear the high 16 bits. That needs to be addressed first.
Excluded V_LDEXP_F16.
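To illustrate the zero-versus-preserve distinction and why it matters for rematerialization, here is a minimal sketch. It is a hypothetical standalone model, not LLVM code: the names `HighBits`, `write16`, `rematMatches`, and `demo` are made up for this example, and the register values are arbitrary.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model, not LLVM code: how a 16-bit result is written into a
// 32-bit destination register under the two behaviors discussed above.
enum class HighBits { Zero, Preserve };

constexpr uint32_t write16(uint32_t oldDst, uint16_t result, HighBits policy) {
  return policy == HighBits::Zero
             ? uint32_t{result}                  // high half cleared
             : (oldDst & 0xFFFF0000u) | result;  // high half taken from old dst
}

// A rematerialized copy runs at a different program point, where the
// destination register may hold a different value. It is equivalent to the
// original only if the written value does not depend on the old dst.
constexpr bool rematMatches(uint32_t dstAtDef, uint32_t dstAtUse,
                            uint16_t result, HighBits policy) {
  return write16(dstAtDef, result, policy) == write16(dstAtUse, result, policy);
}

// Small self-check of the model with arbitrary register contents.
inline void demo() {
  assert(write16(0xAAAA0000u, 0x1234, HighBits::Zero) == 0x00001234u);
  assert(write16(0xAAAA0000u, 0x1234, HighBits::Preserve) == 0xAAAA1234u);
  // Zeroing: the result is the same wherever the instruction is re-executed.
  assert(rematMatches(0xAAAA0000u, 0xBBBB0000u, 0x1234, HighBits::Zero));
  // Preserving: the result changes with the old dst, so remat is unsafe.
  assert(!rematMatches(0xAAAA0000u, 0xBBBB0000u, 0x1234, HighBits::Preserve));
}
```

In this model, a preserving write behaves like an instruction with an implicit read of its destination, which is why such instructions would have to be excluded from rematerialization.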
llvm/lib/Target/AMDGPU/VOP2Instructions.td | ||
---|---|---|
651 | GFX9 manual: VOP1/VOP2 will write zero to unused bits unless SDWA specifies otherwise, and VOP1/VOP2 ops encoded as VOP3 will write zero. So I assume it does not. | |
652 | Thanks for catching! | |
765 | This is f32. Both f16 fma instructions should also zero the high bits because they are VOP2 only. |
llvm/lib/Target/AMDGPU/VOP2Instructions.td | ||
---|---|---|
765 | I wrote inline asm tests a few weeks ago for all of these. The gfx9 manual says it didn't change the existing instruction behavior, but I think this was wrong. mad/mac/fma all seem to preserve (see GCNSubtarget::zeroesHigh16BitsOfDest) |
llvm/lib/Target/AMDGPU/VOP2Instructions.td | ||
---|---|---|
765 | Sigh. There are some VOP1 in that list too. I probably need to avoid any 16 bit dst completely. |
This preserves high bits on gfx9