This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU][GlobalISel] Stop foldInsertEltToCmpSelect from changing reg banks
ClosedPublic

Authored by mbrkusanin on Mar 12 2021, 8:39 AM.

Details

Summary

This function can change regbank for registers which already have a selected
bank. Depending on the instruction where these registers were used it can
cause instruction selection to fail.

Diff Detail

Event Timeline

mbrkusanin created this revision.Mar 12 2021, 8:39 AM
mbrkusanin requested review of this revision.Mar 12 2021, 8:39 AM

Needs a new testcase that failed to select

llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
2046

I don't understand how this came to change the bank of an already assigned register. The bank should already be correct here? Why isn't the problem earlier in the initial bank selection?

arsenm requested changes to this revision.Mar 22 2021, 2:41 PM
This revision now requires changes to proceed.Mar 22 2021, 2:41 PM

The issue is in G_AMDGPU_S_BUFFER_LOAD where first we select sgpr for %4. However when selecting a bank for G_INSERT_VECTOR_ELT then foldInsertEltToCmpSelect will change it to vgpr and instruction-select will fail later on.

This can also be seen in other tests like insert_vector_elt_v4i32_v_s_s
where before the patch we had:
%1:vgpr(s32) = COPY $sgpr0
and now it is:
%1:sgpr(s32) = COPY $sgpr0
but this does not bother instruction-select.

Specifically in the new test:
%4:sgpr(s32) = G_AMDGPU_S_BUFFER_LOAD %0(<4 x s32>), %3(s32), 0 :: (dereferenceable invariant load 4)
was changed into
%4:vgpr(s32) = G_AMDGPU_S_BUFFER_LOAD %0(<4 x s32>), %3(s32), 0 :: (dereferenceable invariant load 4)
after foldInsertEltToCmpSelect

%1:vgpr(s32) = COPY $sgpr0

I wonder if we should ban this in the verifier. It's not wrong, but it sure feels like bad form to allow cross bank copies involving physical registers

%1:vgpr(s32) = COPY $sgpr0

I wonder if we should ban this in the verifier. It's not wrong, but it sure feels like bad form to allow cross bank copies involving physical registers

I guess that should be a separate patch and not related to this?

llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
2046

Can you clarify what you mean by this?
The register in issue is %4. Below is the diff for state before and after foldInsertEltToCmpSelect for the new test. G_INSERT_VECTOR_ELT is expanded but the function in question also affects destination of G_AMDGPU_S_BUFFER_LOAD and that is what I'm trying to avoid changing. I don't believe that the issue was somehow in selecting sgpr for %4 but rather in line 2023* when we overwrite the bank for it.

   %1:sgpr(<2 x s32>) = COPY $sgpr4_sgpr5
   %2:vgpr(s32) = COPY $vgpr0
   %3:sgpr(s32) = G_CONSTANT i32 0
-  %4:sgpr(s32) = G_AMDGPU_S_BUFFER_LOAD %0:sgpr(<4 x s32>), %3:sgpr(s32), 0 :: (dereferenceable invariant load 4)
+  %4:vgpr(s32) = G_AMDGPU_S_BUFFER_LOAD %0:sgpr(<4 x s32>), %3:sgpr(s32), 0 :: (dereferenceable invariant load 4)
   %6:vgpr(<2 x s32>) = COPY %1:sgpr(<2 x s32>)
-  %5:vgpr(<2 x s32>) = G_INSERT_VECTOR_ELT %6:vgpr, %4:sgpr(s32), %2:vgpr(s32)
+  %7:vgpr(s32), %8:vgpr(s32) = G_UNMERGE_VALUES %6:vgpr(<2 x s32>)
+  %9:sgpr(s32) = G_CONSTANT i32 0
+  %10:vcc(s1) = G_ICMP intpred(eq), %2:vgpr(s32), %9:sgpr
+  %11:vgpr(s32) = G_SELECT %10:vcc(s1), %4:vgpr, %7:vgpr
+  %12:sgpr(s32) = G_CONSTANT i32 1
+  %13:vcc(s1) = G_ICMP intpred(eq), %2:vgpr(s32), %12:sgpr
+  %14:vgpr(s32) = G_SELECT %13:vcc(s1), %4:vgpr, %8:vgpr
+  %5:vgpr(<2 x s32>) = G_BUILD_VECTOR %11:vgpr(s32), %14:vgpr(s32)
   S_ENDPGM 0, implicit %5:vgpr(<2 x s32>)
2063–2064

*this overwrites the bank for operand 2 which is the destination of G_AMDGPU_S_BUFFER_LOAD.

arsenm requested changes to this revision.May 6 2021, 6:05 PM
arsenm added inline comments.
llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
2045

OK I see what's happening now. I think this should introduce a new utility function to try to constrain a register to a register bank, or insert a copy if not (similar to the existing ones for concrete register classes). This should also be called below in place of the setRegBank where this is overwritten

This revision now requires changes to proceed.May 6 2021, 6:05 PM

The function would look something like this.

Where do you think we should put this new function? MachineRegisterInfo? AMDGPURegisterBankInfo?

arsenm added inline comments.May 14 2021, 10:51 AM
llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
1982

How about constrainRegToBank?

1989

setInstrAndDebugLoc

Isn't this set where you want it to be already? Can you just avoid all the iterator changes?

mbrkusanin marked an inline comment as not done.
mbrkusanin added inline comments.
llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
1989

If we set reg banks before creating select we can avoid iterator changes.

foad added a comment.May 19 2021, 6:23 AM

Patch looks much nicer now, thanks.

arsenm accepted this revision.May 24 2021, 5:19 PM
This revision is now accepted and ready to land.May 24 2021, 5:19 PM
foad added inline comments.May 26 2021, 12:28 AM
llvm/test/CodeGen/AMDGPU/GlobalISel/insertelement.ll
2–4

Did you mean to commit this? Can you fix the tests instead of disabling the RUN lines?

mbrkusanin added inline comments.May 26 2021, 3:01 AM
llvm/test/CodeGen/AMDGPU/GlobalISel/insertelement.ll
2–4

Sorry, missed this. It's updated now.