Select ballot intrinsic for GlobalISel.
Details
Diff Detail
Event Timeline
llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp | ||
---|---|---|
1054–1059 | You want getConstantVRegVal instead of looking through a copy and checking for G_CONSTANT | |
1062 | This doesn't make any sense; there's no reason to ever use the VOP3 encoded form of v_mov_b32. It's nota 64-bit move. This also returns a scalar value | |
1065 | Register | |
llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp | ||
4167 | This returns an SGPR value |
llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.ballot.i64.ll | ||
---|---|---|
9 | Can you give the test names more desrciptive names, like constant_false, constant_true? Also the function returns should use SGPRs, so switch to shader calling conventions? |
- Addressed comments
- Also renamed and updated tests for SDag. Let me know if you would rather have this as a separate patch.
llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp | ||
---|---|---|
1053–1054 | I think you want just regular getConstantVRegVal. I don't think you're getting much from the look through | |
1059 | This would need to be an S_MOV_B64 for wave 64? | |
1064 | This should be unreachable code (however, the verifier doesn't check intrinsic operand types so I guess you have to leave this) | |
llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.ballot.i64.ll | ||
12–13 | This can be one s_mov_b64 | |
24–25 | One s_mov_b64 |
- Also renamed and updated SDag tests.
llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp | ||
---|---|---|
1053–1054 | Unfortunately regular version fails to produce the value. | |
llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.ballot.i64.ll | ||
12–13 | It can, but SIFoldOperands will not let that happen. From: %10:sreg_64 = S_MOV_B64 0 %3:sreg_32 = COPY %10.sub0:sreg_64 %4:sreg_32 = COPY %10.sub1:sreg_64 plus some instructions that use %3, %4 but will eventually be removed. SIFoldOperands will produce: %10:sreg_64 = S_MOV_B64 0 %3:sreg_32 = S_MOV_B32 0 %4:sreg_32 = S_MOV_B32 0 ... which makes the first instruction dead and in the end we're left with two S_MOV_B32. For example bellow with exec, AMDGPU::sub0_sub1 seems to do the trick but I don't see anything similar for immediate opreands. v_cmp_ne_u32_e64 s[0:1], 0, 0 if for whatever reason that is more preferable then s_mov_b32 s0, 0 s_mov_b32 s1, 0 Anyway, this is not an issue with selecting ballot. Following example has the same issue: define amdgpu_cs i64 @si_fold_constants_i64() { %x = add i64 0, 0 ret i64 %x } |
llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp | ||
---|---|---|
1053–1054 | I think something probably went wrong here due to us not trying to do anything resembling optimization during/after RegBankSelect. When we do that, we can probably remove a lot of these | |
llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.ballot.i64.ll | ||
12–13 | I guess this is another bug to solve. Can you file that somewhere? We shouldn't be trying to workaround it in the selector |
I think you want just regular getConstantVRegVal. I don't think you're getting much from the look through