Folding an sgpr-to-sgpr copy should potentially also be possible.
That is trickier, however, because we may end up with the
wrong register class at the use due to xm0/xexec permutations.
Details
- Reviewers
  - arsenm
  - vpykhtin
- Commits
  - rG61e7a61bdccf: [AMDGPU] Allow folding of sgpr to vgpr copy
Event Timeline
Inline comments on llvm/test/CodeGen/AMDGPU/fmul-2-combine-multi-use.ll:

| Lines | Diff | Comment |
|---|---|---|
| 79–80 | Diff #225952 | This looks like it got worse? |
| 79–80 | Diff #225952 | Yes, this is a regression specific to fma/mac: after folding, the register class no longer matches the xm0/xexec operand definition of the fma src. |
| 79–80 | Diff #225952 | I.e., we should refine how we use sgpr register classes rather than inhibit folding. |
| 79–80 | Diff #225952 | The fma src doesn't use xm0_xexec, though? Can you add a test case for this specific situation? I think this should be easily avoidable. |
| 79–80 | Diff #225952 | It is explicitly disabled in SIFoldOperands::foldOperand() (quoted below). |

```cpp
// Don't fold subregister extracts into tied operands, only if it is a full
// copy since a subregister use tied to a full register def doesn't really
// make sense. e.g. don't fold:
//
// %1 = COPY %0:sub1
// %2<tied3> = V_MAC_{F16, F32} %3, %4, %1<tied0>
//
//  into
// %2<tied3> = V_MAC_{F16, F32} %3, %4, %0:sub1<tied0>
if (UseOp.isTied() && OpToFold.getSubReg() != AMDGPU::NoSubRegister)
  return;
```
Changed the RUN line to gfx1010; otherwise the sgpr is not folded in this test, because on earlier targets the fold would violate the constant bus restriction (GFX10 allows two scalar operands for most VALU instructions).