llvm/test/CodeGen/AMDGPU/GlobalISel/insertelement.ll | ||
---|---|---|
299 | I assume this will get better when it is moved after RegBankSelect. The issue is the copies to VCC inserted by RegBankSelect. The code does not need to look like this; SelectionDAG generates the following for the same input: ; %bb.0: ; %entry v_mov_b32_e32 v1, s0 v_cmp_eq_u32_e64 vcc, s8, 0 v_cndmask_b32_e32 v8, v1, v0, vcc v_mov_b32_e32 v1, s1 v_cmp_eq_u32_e64 vcc, s8, 1 v_cndmask_b32_e32 v1, v1, v0, vcc v_mov_b32_e32 v2, s2 v_cmp_eq_u32_e64 vcc, s8, 2 v_cndmask_b32_e32 v2, v2, v0, vcc v_mov_b32_e32 v3, s3 v_cmp_eq_u32_e64 vcc, s8, 3 v_cndmask_b32_e32 v3, v3, v0, vcc v_mov_b32_e32 v4, s4 v_cmp_eq_u32_e64 vcc, s8, 4 v_cndmask_b32_e32 v4, v4, v0, vcc v_mov_b32_e32 v5, s5 v_cmp_eq_u32_e64 vcc, s8, 5 v_cndmask_b32_e32 v5, v5, v0, vcc v_mov_b32_e32 v6, s6 v_cmp_eq_u32_e64 vcc, s8, 6 v_cndmask_b32_e32 v6, v6, v0, vcc v_mov_b32_e32 v7, s7 v_cmp_eq_u32_e64 vcc, s8, 7 v_cndmask_b32_e32 v7, v7, v0, vcc v_mov_b32_e32 v0, v8 ; return to shader part epilog |
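
For readers less familiar with the ISA, the SelectionDAG output quoted above is a per-lane compare/select chain: each element is compared against the dynamic index and either kept or replaced. A minimal C++ sketch of that semantics follows. The function name `insertElementBySelect` and the fixed 8 x 32-bit shape are assumptions for illustration only; this is not LLVM code.

```cpp
#include <array>
#include <cstdint>

// Hypothetical helper, not part of LLVM: models a dynamic insertelement
// into an 8 x 32-bit vector as a chain of compare + select steps.
std::array<std::uint32_t, 8> insertElementBySelect(
    const std::array<std::uint32_t, 8> &vec, std::uint32_t val,
    std::uint32_t idx) {
  std::array<std::uint32_t, 8> out{};
  for (std::uint32_t i = 0; i < out.size(); ++i) {
    // Corresponds to: v_cmp_eq_u32 vcc, idx, i
    const bool match = (idx == i);
    // Corresponds to: v_cndmask_b32 out[i], vec[i], val, vcc
    out[i] = match ? val : vec[i];
  }
  return out;
}
```

In this model an index outside 0..7 matches no compare, so the vector is returned unchanged.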
llvm/test/CodeGen/AMDGPU/GlobalISel/insertelement.ll | ||
---|---|---|
299 | Boolean handling is a mess that needs cleanups, and now we get none. I recently saw a case using 5 instructions to get a constant 0. |
llvm/test/CodeGen/AMDGPU/GlobalISel/insertelement.ll | ||
---|---|---|
32 | That is an incorrect selection of G_ICMP with wave32. AMDGPUInstructionSelector::isVCC() does this: if (RC) { const LLT Ty = MRI.getType(Reg); return RC->hasSuperClassEq(TRI.getBoolRC()) && Ty.isValid() && Ty.getSizeInBits() == 1; } Since SGPR_32 is used for the condition, hasSuperClassEq() returns true, and because Ty == s1 as well, the register is treated as VCC even though it holds a scalar compare result. |
llvm/test/CodeGen/AMDGPU/GlobalISel/insertelement.ll | ||
---|---|---|
32 | This can be fixed by using LLT::scalar(32) instead of LLT::scalar(1). |
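
The misclassification discussed above, and why switching the type to 32 bits avoids it, can be sketched with a toy model. Everything here is a hypothetical stand-in: `RegClass`, `VReg`, `passesBoolClassCheckWave32` and `looksLikeVCC` are not LLVM types or functions, and the class check is reduced to the single wave32 fact stated in the comment (SGPR_32 passes `hasSuperClassEq(getBoolRC())`).

```cpp
// Toy model only; these types are hypothetical and not LLVM classes.
enum class RegClass { SGPR_32, VGPR_32 };

struct VReg {
  RegClass rc;         // register class assigned to the virtual register
  unsigned sizeInBits; // size of its low-level type, e.g. 1 for s1
};

// Assumption taken from the review comment: under wave32, SGPR_32
// satisfies the boolean register class check.
bool passesBoolClassCheckWave32(RegClass rc) {
  return rc == RegClass::SGPR_32;
}

// Same shape as the quoted check: boolean class AND a 1-bit type.
bool looksLikeVCC(const VReg &r) {
  return passesBoolClassCheckWave32(r.rc) && r.sizeInBits == 1;
}
```

With this model, a scalar compare result typed s1 in SGPR_32 is classified as VCC, while the same register typed s32 is not, which matches the suggested fix.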
llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp | ||
---|---|---|
1902–1903 | Can this case be ignored here? If it had a constant index, the legalizer would have dealt with it already. It's also not wrong to do the rest for a constant. | |
1936–1937 | RegBankSelect should never need to consider the constant bus restriction; see the long comment at the top of the file. Any vector operation should only use VGPR operands. | |
llvm/test/CodeGen/AMDGPU/GlobalISel/insertelement.ll | ||
---|---|---|
32 | Yes, see the long comment at the top of the file. Scalar compares always need to produce s32; s1 is assumed to be VCC. | |
66–81 | I think the register indexing looks better if the index is uniform. |
llvm/test/CodeGen/AMDGPU/GlobalISel/insertelement.ll | ||
---|---|---|
66–81 | It may look better, but it is not faster. |