This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Mark relevant rematerializable VOP2 instructions
ClosedPublic

Authored by rampitec on Jul 14 2021, 4:26 PM.

Event Timeline

rampitec created this revision. Jul 14 2021, 4:26 PM
rampitec requested review of this revision. Jul 14 2021, 4:26 PM
Herald added a project: Restricted Project. Jul 14 2021, 4:26 PM
Herald added a subscriber: wdng.
foad added inline comments. Jul 15 2021, 7:02 AM
llvm/lib/Target/AMDGPU/VOP2Instructions.td
778

Update comment?

rampitec updated this revision to Diff 359119. Jul 15 2021, 1:55 PM
rampitec marked an inline comment as done.

Dropped all 16 bit instructions which have VOP3 forms from the change.

Due to the encoding change in GFX10, 16-bit instructions that moved from VOP2 to VOP3 changed behavior from zeroing the high 16 bits of dst to preserving them. The same instruction behaves differently between GFX9 and GFX10 depending on whether it is a VOP2 promoted to VOP3 or a native VOP3. We cannot rematerialize instructions that merge into dst.

That said, the behavior of these instructions is not properly modeled to begin with; it is only guarded by selection patterns that forcibly clear the high 16 bits. That needs to be addressed first.
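
To illustrate why a dst-merging instruction is a problem for rematerialization, here is a minimal C++ sketch. It is a toy model, not LLVM code: `HighBits`, `write16`, and `rematMatches` are hypothetical names introduced only for this example, and the register is modeled as a plain 32-bit integer.

```cpp
#include <cstdint>

// Hypothetical model: how a 16-bit result is written into a 32-bit register.
enum class HighBits { Zeroed, Preserved };

// Write a 16-bit result into a 32-bit destination under the given policy.
// Zeroed:    the high 16 bits of the destination become zero.
// Preserved: the high 16 bits keep whatever the destination held before.
constexpr uint32_t write16(uint32_t oldDst, uint16_t result, HighBits policy) {
  return policy == HighBits::Zeroed
             ? static_cast<uint32_t>(result)
             : (oldDst & 0xFFFF0000u) | result;
}

// Rematerialization re-executes the instruction later, typically into a
// register whose previous contents differ from the original destination.
// The recomputed value is only guaranteed to match when it does not depend
// on those previous contents.
constexpr bool rematMatches(uint32_t origOldDst, uint32_t rematOldDst,
                            uint16_t result, HighBits policy) {
  return write16(origOldDst, result, policy) ==
         write16(rematOldDst, result, policy);
}
```

In this model the zeroing form yields the same 32-bit value regardless of the prior register contents, while the preserving form effectively has the old dst as a hidden input, so recomputing it elsewhere can give a different value.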

arsenm added inline comments. Jul 15 2021, 3:09 PM
llvm/lib/Target/AMDGPU/VOP2Instructions.td
651

This preserves high bits on gfx9

652

This one does not (but does on gfx10)

765

This and the other fma flavors preserve the high bits on gfx9

rampitec updated this revision to Diff 359146. Jul 15 2021, 3:20 PM
rampitec marked an inline comment as done.

Excluded V_LDEXP_F16.

llvm/lib/Target/AMDGPU/VOP2Instructions.td
651

GFX9 manual: VOP1/VOP2 will write zero to unused bits unless SDWA specifies otherwise, and VOP1/VOP2 ops encoded as VOP3 will write zero.

So I assume it does not.

652

Thanks for catching!

765

This one is f32. Both f16 fma variants should also zero the high bits because they are VOP2 only.

arsenm added inline comments. Jul 15 2021, 3:23 PM
llvm/lib/Target/AMDGPU/VOP2Instructions.td
765

I wrote inline asm tests a few weeks ago for all of these. The gfx9 manual says it didn't change the existing instruction behavior, but I think this was wrong. mad/mac/fma all seem to preserve the high bits (see GCNSubtarget::zeroesHigh16BitsOfDest).

rampitec added inline comments. Jul 15 2021, 3:32 PM
llvm/lib/Target/AMDGPU/VOP2Instructions.td
765

Sigh. There are some VOP1 in that list too. I probably need to avoid any 16 bit dst completely.

rampitec updated this revision to Diff 359151. Jul 15 2021, 3:39 PM
rampitec marked 3 inline comments as done.

Dropped all 16 bit dst VOP2.

arsenm accepted this revision. Jul 19 2021, 7:18 PM
This revision is now accepted and ready to land. Jul 19 2021, 7:18 PM
This revision was landed with ongoing or failed builds. Jul 21 2021, 2:25 PM
This revision was automatically updated to reflect the committed changes.