This function can change regbank for registers which already have a selected
bank. Depending on the instruction where these registers were used it can
cause instruction selection to fail.
Details
Diff Detail
Event Timeline
Needs a new testcase that failed to select
llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp | ||
---|---|---|
2049 | I don't understand how this came to change the bank of an already assigned register. The bank should already be correct here? Why isn't the problem earlier in the initial bank selection? |
The issue is in G_AMDGPU_S_BUFFER_LOAD where first we select sgpr for %4. However when selecting a bank for G_INSERT_VECTOR_ELT then foldInsertEltToCmpSelect will change it to vgpr and instruction-select will fail later on.
This can also be seen in other tests like insert_vector_elt_v4i32_v_s_s
where before the patch we had:
%1:vgpr(s32) = COPY $sgpr0
and now it is:
%1:sgpr(s32) = COPY $sgpr0
but this does not bother instruction-select.
Specifically in the new test:
%4:sgpr(s32) = G_AMDGPU_S_BUFFER_LOAD %0(<4 x s32>), %3(s32), 0 :: (dereferenceable invariant load 4)
was changed into
%4:vgpr(s32) = G_AMDGPU_S_BUFFER_LOAD %0(<4 x s32>), %3(s32), 0 :: (dereferenceable invariant load 4)
after foldInsertEltToCmpSelect
I wonder if we should ban this in the verifier. It's not wrong, but it sure feels like bad form to allow cross bank copies involving physical registers
I guess that should be a separate patch and not related to this?
llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp | ||
---|---|---|
2049 | Can you clarify what you mean by this? %1:sgpr(<2 x s32>) = COPY $sgpr4_sgpr5 %2:vgpr(s32) = COPY $vgpr0 %3:sgpr(s32) = G_CONSTANT i32 0 - %4:sgpr(s32) = G_AMDGPU_S_BUFFER_LOAD %0:sgpr(<4 x s32>), %3:sgpr(s32), 0 :: (dereferenceable invariant load 4) + %4:vgpr(s32) = G_AMDGPU_S_BUFFER_LOAD %0:sgpr(<4 x s32>), %3:sgpr(s32), 0 :: (dereferenceable invariant load 4) %6:vgpr(<2 x s32>) = COPY %1:sgpr(<2 x s32>) - %5:vgpr(<2 x s32>) = G_INSERT_VECTOR_ELT %6:vgpr, %4:sgpr(s32), %2:vgpr(s32) + %7:vgpr(s32), %8:vgpr(s32) = G_UNMERGE_VALUES %6:vgpr(<2 x s32>) + %9:sgpr(s32) = G_CONSTANT i32 0 + %10:vcc(s1) = G_ICMP intpred(eq), %2:vgpr(s32), %9:sgpr + %11:vgpr(s32) = G_SELECT %10:vcc(s1), %4:vgpr, %7:vgpr + %12:sgpr(s32) = G_CONSTANT i32 1 + %13:vcc(s1) = G_ICMP intpred(eq), %2:vgpr(s32), %12:sgpr + %14:vgpr(s32) = G_SELECT %13:vcc(s1), %4:vgpr, %8:vgpr + %5:vgpr(<2 x s32>) = G_BUILD_VECTOR %11:vgpr(s32), %14:vgpr(s32) S_ENDPGM 0, implicit %5:vgpr(<2 x s32>) | |
2066 | *this overwrites the bank for operand 2 which is the destination of G_AMDGPU_S_BUFFER_LOAD. |
llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp | ||
---|---|---|
2048 | OK I see what's happening now. I think this should introduce a new utility function to try to constrain a register to a register bank, or insert a copy if not (similar to the existing ones for concrete register classes). This should also be called below in place of the setRegBank where this is overwritten |
The function would look something like this.
Where do you think we should put this new function? MachineRegisterInfo? AMDGPURegisterBankInfo?
llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp | ||
---|---|---|
1989 | If we set reg banks before creating select we can avoid iterator changes. |
llvm/test/CodeGen/AMDGPU/GlobalISel/insertelement.ll | ||
---|---|---|
2–4 | Did you mean to commit this? Can you fix the tests instead of disabling the RUN lines? |
llvm/test/CodeGen/AMDGPU/GlobalISel/insertelement.ll | ||
---|---|---|
2–4 | Sorry, missed this. It's updated now. |
How about constrainRegToBank?