This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Remove unnecessary movs for v_fmac operands
AbandonedPublic

Authored by rtaylor on Aug 23 2019, 10:17 AM.

Details

Summary

This propagates a sgpr->vgpr copy into an operand of a
v_fma/v_mad that is generated from v_fmac, given it
doesn't break constant bus restriction

Change-Id: I39cacf43205e15982f7405c805d67b7a8b11d2a9

Diff Detail

Event Timeline

rtaylor created this revision.Aug 23 2019, 10:17 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 23 2019, 10:17 AM
rtaylor updated this revision to Diff 216890.Aug 23 2019, 10:18 AM

Added test case

Harbormaster completed remote builds in B37202: Diff 216890.

I somewhat expected this to be handled in SIFoldOperands, as constants are already folded there and this is essentially the same problem. It will always save an instruction. This is the version RA uses and I'm not sure I expect it to do any folding

lib/Target/AMDGPU/SIInstrInfo.cpp
2630–2633 ↗(On Diff #216890)

These are the exact conditions as checked above

2636–2637 ↗(On Diff #216890)

This can fail

2638 ↗(On Diff #216890)

Extra parentheses

test/CodeGen/AMDGPU/fmac-fma-sgpr-copy.ll
5–13

You should be able to reduce this. You shouldn't need any vector operations

rtaylor updated this revision to Diff 216914.Aug 23 2019, 11:44 AM
rtaylor marked an inline comment as done.

Made some changes to code.

rtaylor marked 2 inline comments as done.Aug 23 2019, 11:49 AM
rtaylor added inline comments.
lib/Target/AMDGPU/SIInstrInfo.cpp
2630–2633 ↗(On Diff #216890)

Yes, it is. I could create a local function that does this and replace both with that, it would be just as ugly since there are so many conditions (params) to pass, or I could pass MI and re-get all those operands, which would be the exact code that is already in the function.

test/CodeGen/AMDGPU/fmac-fma-sgpr-copy.ll
5–13

I tried a few different things but I probably just couldn't find the simple form to produce a convertable V_FMAC. Suggestions?

arsenm added inline comments.Aug 23 2019, 12:11 PM
lib/Target/AMDGPU/SIInstrInfo.cpp
2630–2633 ↗(On Diff #216890)

Why can't you just set a bool flag inside the first condition?

Anyway, I would prefer if the wasn't done here at all, and SIFoldOperands took care of this

rtaylor marked 2 inline comments as done.Aug 23 2019, 1:24 PM
rtaylor added inline comments.
lib/Target/AMDGPU/SIInstrInfo.cpp
2630–2633 ↗(On Diff #216890)

Sure, no problem, if you think that looks better.

This test case doesn't seem to be handled by SIFoldOperands (V_FMAC before and V_FMAC after) SIFoldOperands, so doing this change there wouldn't do anything.

test/CodeGen/AMDGPU/fmac-fma-sgpr-copy.ll
5–13

Just to be more explicit, this test case was stripped directly from a shader. If I do something like:

define amdgpu_cs float @test1(<4 x i32> inreg %a, float %b, float %y, float %c) {
entry:

%.i098 = fsub nnan arcp float %b, %c
%fma1 = call float @llvm.fma.f32(float %y, float %.i098, float %b) #3
ret float %fma1

}

results in no v_mov, either inreg for b or not. If it's not inreg than it's a vgpr directly and if it's inreg than it's a sgpr already. Is there a simpler way to take in a vector and convert to scalar?

rtaylor updated this revision to Diff 216936.Aug 23 2019, 1:26 PM

Set bool in condition.

rtaylor updated this revision to Diff 216938.Aug 23 2019, 1:30 PM

Forgot to capitalize var name.

Harbormaster completed remote builds in B37216: Diff 216938.
nhaehnle added inline comments.Aug 27 2019, 8:41 AM
lib/Target/AMDGPU/SIInstrInfo.cpp
2630–2633 ↗(On Diff #216890)

ConstantBus really shouldn't be assigned inside the if condition itself. I assume @arsenm meant to set that variable in the body.

That said, I also agree that this should happen in SIFoldOperands, and I don't follow your argument here. SIFoldOperands can already do this, e.g. consider

define amdgpu_cs float @test2(float inreg %a, float %b, float %y, float %z) {
entry:
  %fma1 = call float @llvm.fma.f32(float %y, float %z, float %a) #3
  ret float %fma1
}

The FMAC becomes an FMA in SIFoldOperands for this example. It's possible that the reason why it doesn't work in your case is that that code doesn't handle sub-registers.

rtaylor marked an inline comment as done.Aug 28 2019, 10:50 AM
rtaylor added inline comments.
lib/Target/AMDGPU/SIInstrInfo.cpp
2630–2633 ↗(On Diff #216890)

I took the "inside the first condition" literally.

Yes, you are correct, it seems it's because of subregs use in this test case.

rtaylor marked an inline comment as done.Aug 28 2019, 12:01 PM
rtaylor added inline comments.
lib/Target/AMDGPU/SIInstrInfo.cpp
2630–2633 ↗(On Diff #216890)

Looks like this was done on purpose, not folding a sub register to a fully defined tied register.

rtaylor updated this revision to Diff 219176.Sep 6 2019, 2:20 PM

Added condition in foldOperands to allow tied subreg folding
for V_FMAC and V_MAC operations

Can you also add a MIR test for this? Also testing the 16-bit case would be nice

lib/Target/AMDGPU/SIFoldOperands.cpp
512–514

Can you fold this opcode list into a canFoldTiedSrcOp (or something similar) predicate function