This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU/GlobalISel: cmp/select method for insert element
ClosedPublic

Authored by rampitec on May 28 2020, 12:55 PM.

Diff Detail

Event Timeline

rampitec created this revision.May 28 2020, 12:55 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 28 2020, 12:55 PM
rampitec marked an inline comment as done.May 28 2020, 1:06 PM
rampitec added inline comments.
llvm/test/CodeGen/AMDGPU/GlobalISel/insertelement.ll
331

I assume this will get better when it is moved after RegBankSelect. The issue is copies to VCC inserted by RegBankSelect. The code does not need to be like this, the same code in SelectionDAG creates here:

; %bb.0:                                ; %entry
        v_mov_b32_e32 v1, s0
        v_cmp_eq_u32_e64 vcc, s8, 0
        v_cndmask_b32_e32 v8, v1, v0, vcc
        v_mov_b32_e32 v1, s1
        v_cmp_eq_u32_e64 vcc, s8, 1
        v_cndmask_b32_e32 v1, v1, v0, vcc
        v_mov_b32_e32 v2, s2
        v_cmp_eq_u32_e64 vcc, s8, 2
        v_cndmask_b32_e32 v2, v2, v0, vcc
        v_mov_b32_e32 v3, s3
        v_cmp_eq_u32_e64 vcc, s8, 3
        v_cndmask_b32_e32 v3, v3, v0, vcc
        v_mov_b32_e32 v4, s4
        v_cmp_eq_u32_e64 vcc, s8, 4
        v_cndmask_b32_e32 v4, v4, v0, vcc
        v_mov_b32_e32 v5, s5
        v_cmp_eq_u32_e64 vcc, s8, 5
        v_cndmask_b32_e32 v5, v5, v0, vcc
        v_mov_b32_e32 v6, s6
        v_cmp_eq_u32_e64 vcc, s8, 6
        v_cndmask_b32_e32 v6, v6, v0, vcc
        v_mov_b32_e32 v7, s7
        v_cmp_eq_u32_e64 vcc, s8, 7
        v_cndmask_b32_e32 v7, v7, v0, vcc
        v_mov_b32_e32 v0, v8
        ; return to shader part epilog
arsenm added inline comments.May 28 2020, 1:34 PM
llvm/test/CodeGen/AMDGPU/GlobalISel/insertelement.ll
331

Boolean handling is a mess that needs cleanups, and now we get none. I recently saw a case using 5 instructions to get a constant 0.

rampitec updated this revision to Diff 267059.May 28 2020, 3:21 PM

Changes similar to extractelement.

This revision is now accepted and ready to land.May 29 2020, 7:00 AM
arsenm added inline comments.May 29 2020, 12:27 PM
llvm/lib/Target/AMDGPU/AMDGPUPostLegalizerCombiner.cpp
288

Can initialize to the size here and avoid push_back

300

Don't need getReg here

rampitec updated this revision to Diff 267356.May 29 2020, 1:34 PM
rampitec marked 2 inline comments as done.

Addressed review comments.

rampitec updated this revision to Diff 268285.Jun 3 2020, 1:01 PM

Moved into RegBankSelect.

rampitec requested review of this revision.Jun 3 2020, 1:04 PM
rampitec marked an inline comment as done.
rampitec added inline comments.
llvm/test/CodeGen/AMDGPU/GlobalISel/insertelement.ll
56

That is an incorrect selection of G_ICMP with wave32. AMDGPUInstructionSelector::isVCC() does this:

if (RC) {
  const LLT Ty = MRI.getType(Reg);
  return RC->hasSuperClassEq(TRI.getBoolRC()) &&
         Ty.isValid() && Ty.getSizeInBits() == 1;
}

Since SGPR_32 is used for condition hasSuperClassEq() returns true, and even though we have Ty == s1 it still considers it a VCC.

rampitec marked an inline comment as done.Jun 3 2020, 2:16 PM
rampitec added inline comments.
llvm/test/CodeGen/AMDGPU/GlobalISel/insertelement.ll
56

This can be fixed by using LLT::scalar(32) instead of LLT::scalar(1).

rampitec updated this revision to Diff 268300.Jun 3 2020, 2:17 PM

Changed scalar condition type to S32.

rampitec updated this revision to Diff 268316.Jun 3 2020, 3:10 PM

Updated test.

arsenm added inline comments.Jun 3 2020, 5:45 PM
llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
1903–1904 ↗(On Diff #268316)

This can ignore this? If it had a constant index the legalizer would have dealt with it already. It's also not wrong to do the rest for a constant

1937–1938 ↗(On Diff #268316)

RegBankSelect should never need to consider the constant bus restriction, see the long comment at the top of the file. Any vector operation should only use VGPR operands

llvm/test/CodeGen/AMDGPU/GlobalISel/insertelement.ll
56

Yes, see the long comment at the top of the file. Scalar compares need to always produce s32, s1 is assumed VCC.

106–145

I think the register indexing looks better if the index is uniform

rampitec updated this revision to Diff 268530.Jun 4 2020, 10:53 AM
rampitec marked 4 inline comments as done.
rampitec added inline comments.
llvm/test/CodeGen/AMDGPU/GlobalISel/insertelement.ll
106–145

It may look better, but it is not faster.

arsenm accepted this revision.Jun 10 2020, 12:10 PM
This revision is now accepted and ready to land.Jun 10 2020, 12:10 PM
This revision was automatically updated to reflect the committed changes.