This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU][GlobalISel] Select llvm.amdgcn.ballot
ClosedPublic

Authored by mbrkusanin on Jul 6 2020, 4:52 AM.

Details

Summary

Select ballot intrinsic for GlobalISel.

Diff Detail

Event Timeline

mbrkusanin created this revision.Jul 6 2020, 4:52 AM
arsenm requested changes to this revision.Jul 6 2020, 6:34 AM
arsenm added inline comments.
llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
1054–1059

You want getConstantVRegVal instead of looking through a copy and checking for G_CONSTANT

1062

This doesn't make any sense; there's no reason to ever use the VOP3 encoded form of v_mov_b32. It's nota 64-bit move.

This also returns a scalar value

1065

Register

llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
4167

This returns an SGPR value

This revision now requires changes to proceed.Jul 6 2020, 6:34 AM
arsenm added inline comments.Jul 6 2020, 6:34 AM
llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.ballot.i64.ll
9

Can you give the test names more desrciptive names, like constant_false, constant_true? Also the function returns should use SGPRs, so switch to shader calling conventions?

mbrkusanin updated this revision to Diff 276025.Jul 7 2020, 5:55 AM
  • Addressed comments
  • Also renamed and updated tests for SDag. Let me know if you would rather have this as a separate patch.
arsenm added inline comments.Jul 7 2020, 6:04 AM
llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
1053–1054

I think you want just regular getConstantVRegVal. I don't think you're getting much from the look through

1059

This would need to be an S_MOV_B64 for wave 64?

1064

This should be unreachable code (however, the verifier doesn't check intrinsic operand types so I guess you have to leave this)

llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.ballot.i64.ll
12–13

This can be one s_mov_b64

24–25

One s_mov_b64

mbrkusanin marked 3 inline comments as done.
mbrkusanin set the repository for this revision to rG LLVM Github Monorepo.
  • Also renamed and updated SDag tests.
llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
1053–1054

Unfortunately regular version fails to produce the value.

llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.ballot.i64.ll
12–13

It can, but SIFoldOperands will not let that happen.

From:

%10:sreg_64 = S_MOV_B64 0
%3:sreg_32 = COPY %10.sub0:sreg_64
%4:sreg_32 = COPY %10.sub1:sreg_64
plus some instructions that use %3, %4 but will eventually be removed.

SIFoldOperands will produce:

%10:sreg_64 = S_MOV_B64 0
%3:sreg_32 = S_MOV_B32 0
%4:sreg_32 = S_MOV_B32 0
...

which makes the first instruction dead and in the end we're left with two S_MOV_B32.

For example bellow with exec, AMDGPU::sub0_sub1 seems to do the trick but I don't see anything similar for immediate opreands.
Alternatively we can produce

v_cmp_ne_u32_e64 s[0:1], 0, 0

if for whatever reason that is more preferable then

s_mov_b32 s0, 0
s_mov_b32 s1, 0

Anyway, this is not an issue with selecting ballot. Following example has the same issue:

define amdgpu_cs i64 @si_fold_constants_i64() {
  %x = add i64 0, 0
  ret i64 %x
}
arsenm accepted this revision.Jul 10 2020, 10:34 AM
arsenm added inline comments.
llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
1053–1054

I think something probably went wrong here due to us not trying to do anything resembling optimization during/after RegBankSelect. When we do that, we can probably remove a lot of these

llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.ballot.i64.ll
12–13

I guess this is another bug to solve. Can you file that somewhere? We shouldn't be trying to workaround it in the selector

This revision is now accepted and ready to land.Jul 10 2020, 10:34 AM
This revision was automatically updated to reflect the committed changes.
mbrkusanin marked an inline comment as done.