Page MenuHomePhabricator

[AMDGPU] Divergence driven instruction selection. Shift operations.
ClosedPublic

Authored by alex-t on Sep 26 2018, 9:31 AM.

Details

Summary

This change enables VOP3 shifts to be explicitly selected dependent on the divergence.

Tests: CodeGen/AMDGPU passed

Diff Detail

Repository
rL LLVM

Event Timeline

alex-t created this revision.Sep 26 2018, 9:31 AM
rampitec added inline comments.Sep 26 2018, 11:46 AM
lib/Target/AMDGPU/VOP3Instructions.td
399 ↗(On Diff #167153)

Why not GCNPat?
Why divergence is not checked?
Why do you need it at all if you have patgen enabled for these instructions above?

587 ↗(On Diff #167153)

Does it really belong to this patch?

alex-t updated this revision to Diff 167300.Sep 27 2018, 5:42 AM

Pattern changed to GCNPat, divergence check added.

lib/Target/AMDGPU/VOP3Instructions.td
399 ↗(On Diff #167153)

On VI+ we have no V_LSHL_B64 only V_LSHLREV_B64, same for sra, srl.
So, we cannot select in case we have, let's say, shl with src0 i64 and src1 i32.
The aim of these patterns to swap operands. Current implementation does this in
SIInstrInfo::moveToVALU

case AMDGPU::S_LSHL_B64:
  if (ST.getGeneration() >= AMDGPUSubtarget::VOLCANIC_ISLANDS) {
    NewOpcode = AMDGPU::V_LSHLREV_B64;
    **swapOperands(Inst);**
  }
587 ↗(On Diff #167153)

Sure it does. As soon as I start selecting this _b64 the LIT tests failed because of the odd "_e64" suffix printed by AMDGPUInstrPrinter. It prints it for all with VOP3 flag.
The flag was not set before this change because the instructions were created in moveToVALU.
I filed the bug https://bugs.llvm.org/show_bug.cgi?id=39086 for adding the flag to indicate that the instruction does not have 32bit encoding.

rampitec added inline comments.Sep 27 2018, 9:35 AM
lib/Target/AMDGPU/VOP3Instructions.td
399 ↗(On Diff #167153)

Then what the code "def V_LSHLREV_B64 : VOP3Inst <"v_lshlrev_b64", VOP_PAT_GEN<VOP3_Profile<VOP_I64_I32_I64>>, shl>" 8 lines above does? Isn't it the same?

587 ↗(On Diff #167153)

How can it be possible for moveToVALU to create a VOP3 instruction without a VOP3 flag?
VOP3_Pseudo sets VOP3 flag. InstSI copies it into TSFlags. VOP3_Real copies TSFlags from a pseudo.

alex-t added inline comments.Sep 27 2018, 9:46 AM
lib/Target/AMDGPU/VOP3Instructions.td
399 ↗(On Diff #167153)

No. The code you've mentioned defines the pattern for:

Divergent node that has

  1. shl operator
  2. i64 result, i32 src0 and i64 src1

it is okay for v_lshlREV - it shifts src1 64 bits to i32 src0 positions

what if we have the opposite order of operands? On SICI we'd select v_lshl since
that order fits. On VI we have only v_lshlREV so we need to swap operands.

alex-t added inline comments.Sep 27 2018, 9:49 AM
lib/Target/AMDGPU/VOP3Instructions.td
587 ↗(On Diff #167153)

InstSI copies VOP3 to TSFlags and your code checks exactly TSFlags & SIInstrFlags::VOP3

rampitec requested changes to this revision.Sep 27 2018, 11:25 AM
rampitec added inline comments.
lib/Target/AMDGPU/VOP3Instructions.td
392 ↗(On Diff #167300)

It sounds like from your explanation below and the logic of the getVOP3Pat this will create a bogus pattern with wrong operand order. Only one pattern shall exist for (shl i64:x, i32:y) and it seems to be the pattern below. At best this one will never match.

587 ↗(On Diff #167153)

So the explanation is wrong. It has VOP3 in TSFlags with or without your changes. It is worth nothing where the instruction is created. The suffix was not printed because the control did not even come to the AMDGPUInstPrinter::printVOPDst, and it did not trigger because these instructions did not have VOPDstOperand in the td. Now what you have changed is the operand definition. What you need to fix the problem instead of the hack is to return DstRC back to the RegisterOperand<VReg_64> in the profile.

This revision now requires changes to proceed.Sep 27 2018, 11:25 AM
alex-t added inline comments.Sep 27 2018, 12:24 PM
lib/Target/AMDGPU/VOP3Instructions.td
587 ↗(On Diff #167153)

I did not change the VOP3_Profile.
The reason was in the order of the profiles instantiation: VOP_PAT_GEN<VOP3_Profile<...>> VOP_PAT_GEN does not inherit VOP3_Profile let statements. It just inherits types.

The fix is: VOP3_Profile<VOP_PAT_GEN<...> and to add to VOP3_Profile

"let NeedPatGen = P.NeedPatGen"

alex-t added inline comments.Sep 27 2018, 12:30 PM
lib/Target/AMDGPU/VOP3Instructions.td
392 ↗(On Diff #167300)

If we have VI and only one this:

def V_LSHLREV_B64 : VOP3Inst <"v_lshlrev_b64", VOP_PAT_GEN<VOP3_Profile<VOP_I64_I32_I64>>, shl>;

then "shl i64 i32" will never match.

rampitec added inline comments.Sep 27 2018, 12:34 PM
lib/Target/AMDGPU/VOP3Instructions.td
392 ↗(On Diff #167300)

You have the match for it right below:

(getDivergentFrag<shl>.ret i64:$x, i32:$y)

This is exactly shl i64, i32.

alex-t updated this revision to Diff 167461.Sep 28 2018, 6:16 AM

Fixes according the discussion results.

alex-t marked 13 inline comments as done.Sep 28 2018, 6:16 AM
alex-t added inline comments.Sep 28 2018, 6:22 AM
lib/Target/AMDGPU/VOP3Instructions.td
392 ↗(On Diff #167300)

The reason of misunderstanding was that I assumed that it could be

shl i32 i64 input on VI

If it could be we'd really need 2 different patterns for the same instruction V_LSHLREV_B32

one for

shl i32 i64
``` and another for

shl i64 i32

As soon as I understood that nobody can swap shl dag node operands, everything become clear.

This revision is now accepted and ready to land.Sep 28 2018, 8:31 AM
This revision was automatically updated to reflect the committed changes.