This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Fix the SDWA Peephole phase to handle src for dst:UNUSED_PRESERVE.
ClosedPublic

Authored by mjbedy on Mar 10 2018, 8:13 PM.

Details

Summary

The phase attempts to transform operations that extract a portion of a value
into an SDWA src operand in cases where that value is used only once. It
was not prepared for this use to be the preserved portion of a value for
dst:UNUSED_PRESERVE, resulting in a crash or assert.

This change either rejects the illegal SDWA attempt, or in the case where
dst:WORD_1 and the src_sel would be WORD_0, removes the unneeded
extract instruction.

Diff Detail

Repository
rL LLVM

Event Timeline

mjbedy created this revision.Mar 10 2018, 8:13 PM
mjbedy added a project: Restricted Project.Mar 10 2018, 8:27 PM
mjbedy added a reviewer: Restricted Project.
mjbedy removed a project: Restricted Project.
mjbedy updated this revision to Diff 137943.Mar 10 2018, 8:39 PM
  • Fix typo.
arsenm added inline comments.Mar 11 2018, 8:40 AM
lib/Target/AMDGPU/SIPeepholeSDWA.cpp
920–921 ↗(On Diff #137943)

You should use either getNamedOperand or getNamedOperandIdx, not both

930 ↗(On Diff #137943)

Why is it interesting to use check sdst for this? You can't partially write the SGPR output?

I'm also not sure why the existing code treats sdst as an alternative to vdst. What about add with carry out?

1033–1034 ↗(On Diff #137943)

Combine getNamedOperandIdx/getNamedOperand usage

1036 ↗(On Diff #137943)

Th isImm assert is redundant since getImm() will do it for you

mjbedy added inline comments.Mar 11 2018, 12:03 PM
lib/Target/AMDGPU/SIPeepholeSDWA.cpp
920–921 ↗(On Diff #137943)

The Dst operand is for the original instruction (which in my current understanding could be for example V_MOV_B32_e32 or V_MOV_B32_sdwa) and the DstIdx is for the replacement instruction (V_MOV_B32_sdwa). Is it true that vdst will always be the same index for both _e32 and _sdwa versions?

Will TII->getNamedOperand(SDWAInst, AMDGPU::OpName::vdst) return NULL if the operand has not yet been assigned? My assumption was that it would return NULL, which is why I used an index here. I could change the existing code to use indexes as well.

Or perhaps I should have just called it SDWADstIdx to make it clearer that it's potentially not the same index as the operand pointed to by Dst.

930 ↗(On Diff #137943)

The original version of this change was copying any tied operand no matter what, instead of limiting it to the UNUSED_PRESERVE case, which is why this line is here. I will remove this.

I believe add with carry out (and similar) has to use vcc if SDWA is used. VOPC can use SDWA with an sdst.

One thing I was curious about in this function: it seems to assume a certain order of source operands. (it calls SDWAInst.add() for everything, which seems to just append.) Is this typical?

mjbedy updated this revision to Diff 137969.Mar 11 2018, 7:28 PM
  • Improve the changes to SIPeepholeSDWA::convertToSDWA to address review comments.
mjbedy marked 4 inline comments as done.Mar 11 2018, 7:34 PM
mjbedy added inline comments.Mar 11 2018, 7:37 PM
lib/Target/AMDGPU/SIPeepholeSDWA.cpp
1035 ↗(On Diff #137969)

I couldn't find a way to get at the tied operand without using an index, but I changed all the other cases to use the non-index form.

arsenm added inline comments.Mar 23 2018, 7:18 AM
lib/Target/AMDGPU/SIPeepholeSDWA.cpp
1035 ↗(On Diff #137969)

I think the dst_preserve tied SDWA operands are manually added on (although it would be better if we had static instruction definitions for these variants, which could have a named operand)

920–921 ↗(On Diff #137943)

I'm pretty sure vdst will always be 0. If that's not true we should probably fix that.

getNamedOperand doesn't do any bounds checking, it's assumed to be operating on a complete instruction

arsenm accepted this revision.Mar 23 2018, 3:38 PM

LGTM

This revision is now accepted and ready to land.Mar 23 2018, 3:38 PM
This revision was automatically updated to reflect the committed changes.