Download Raw Diff

Details

Reviewers

arsenm
tpr
dstuttard
nhaehnle

Summary

This propagates a sgpr->vgpr copy into an operand of a
v_fma/v_mad that is generated from v_fmac, given it
doesn't break constant bus restriction

Change-Id: I39cacf43205e15982f7405c805d67b7a8b11d2a9

Diff Detail

Repository

rL LLVM

Build Status

Buildable 37871
Build 37870: arc lint + arc unit

Event Timeline

rtaylor created this revision.Aug 23 2019, 10:17 AM

Herald added a project: Restricted Project. · View Herald TranscriptAug 23 2019, 10:17 AM

Herald added subscribers: llvm-commits, t-tye, tpr and 7 others. · View Herald Transcript

Added test case

rtaylor added reviewers: arsenm, tpr, dstuttard, nhaehnle.Aug 23 2019, 10:19 AM

Harbormaster completed remote builds in B37201: Diff 216889.Aug 23 2019, 10:19 AM

Harbormaster completed remote builds in B37202: Diff 216890.

I somewhat expected this to be handled in SIFoldOperands, as constants are already folded there and this is essentially the same problem. It will always save an instruction. This is the version RA uses and I'm not sure I expect it to do any folding

lib/Target/AMDGPU/SIInstrInfo.cpp
2630–2633 ↗	(On Diff #216890)	These are the exact conditions as checked above
2636–2637 ↗	(On Diff #216890)	This can fail
2638 ↗	(On Diff #216890)	Extra parentheses
test/CodeGen/AMDGPU/fmac-fma-sgpr-copy.ll
5–13	You should be able to reduce this. You shouldn't need any vector operations

Made some changes to code.

Harbormaster completed remote builds in B37207: Diff 216914.Aug 23 2019, 11:47 AM

rtaylor marked 2 inline comments as done.Aug 23 2019, 11:49 AM

rtaylor added inline comments.

lib/Target/AMDGPU/SIInstrInfo.cpp
2630–2633 ↗	(On Diff #216890)	Yes, it is. I could create a local function that does this and replace both with that, it would be just as ugly since there are so many conditions (params) to pass, or I could pass MI and re-get all those operands, which would be the exact code that is already in the function.
test/CodeGen/AMDGPU/fmac-fma-sgpr-copy.ll
5–13	I tried a few different things but I probably just couldn't find the simple form to produce a convertable V_FMAC. Suggestions?

arsenm added inline comments.Aug 23 2019, 12:11 PM

lib/Target/AMDGPU/SIInstrInfo.cpp
2630–2633 ↗	(On Diff #216890)	Why can't you just set a bool flag inside the first condition? Anyway, I would prefer if the wasn't done here at all, and SIFoldOperands took care of this

rtaylor marked 2 inline comments as done.Aug 23 2019, 1:24 PM

rtaylor added inline comments.

lib/Target/AMDGPU/SIInstrInfo.cpp
2630–2633 ↗	(On Diff #216890)	Sure, no problem, if you think that looks better. This test case doesn't seem to be handled by SIFoldOperands (V_FMAC before and V_FMAC after) SIFoldOperands, so doing this change there wouldn't do anything.
test/CodeGen/AMDGPU/fmac-fma-sgpr-copy.ll
5–13	Just to be more explicit, this test case was stripped directly from a shader. If I do something like: define amdgpu_cs float @test1(<4 x i32> inreg %a, float %b, float %y, float %c) { entry: %.i098 = fsub nnan arcp float %b, %c %fma1 = call float @llvm.fma.f32(float %y, float %.i098, float %b) #3 ret float %fma1 } results in no v_mov, either inreg for b or not. If it's not inreg than it's a vgpr directly and if it's inreg than it's a sgpr already. Is there a simpler way to take in a vector and convert to scalar?

Set bool in condition.

Forgot to capitalize var name.

Harbormaster completed remote builds in B37215: Diff 216936.Aug 23 2019, 1:31 PM

Harbormaster completed remote builds in B37216: Diff 216938.

nhaehnle added inline comments.Aug 27 2019, 8:41 AM

lib/Target/AMDGPU/SIInstrInfo.cpp
2630–2633 ↗	(On Diff #216890)	`ConstantBus` really shouldn't be assigned inside the if condition itself. I assume @arsenm meant to set that variable in the body. That said, I also agree that this should happen in SIFoldOperands, and I don't follow your argument here. SIFoldOperands can already do this, e.g. consider define amdgpu_cs float @test2(float inreg %a, float %b, float %y, float %z) { entry: %fma1 = call float @llvm.fma.f32(float %y, float %z, float %a) #3 ret float %fma1 } The FMAC becomes an FMA in SIFoldOperands for this example. It's possible that the reason why it doesn't work in your case is that that code doesn't handle sub-registers.

rtaylor marked an inline comment as done.Aug 28 2019, 10:50 AM

rtaylor added inline comments.

lib/Target/AMDGPU/SIInstrInfo.cpp
2630–2633 ↗	(On Diff #216890)	I took the "inside the first condition" literally. Yes, you are correct, it seems it's because of subregs use in this test case.

rtaylor marked an inline comment as done.Aug 28 2019, 12:01 PM

rtaylor added inline comments.

lib/Target/AMDGPU/SIInstrInfo.cpp
2630–2633 ↗	(On Diff #216890)	Looks like this was done on purpose, not folding a sub register to a fully defined tied register.

Added condition in foldOperands to allow tied subreg folding
for V_FMAC and V_MAC operations

Harbormaster completed remote builds in B37871: Diff 219176.Sep 6 2019, 2:22 PM

Ping.

Can you also add a MIR test for this? Also testing the 16-bit case would be nice

lib/Target/AMDGPU/SIFoldOperands.cpp
512–514	Can you fold this opcode list into a canFoldTiedSrcOp (or something similar) predicate function

https://reviews.llvm.org/D69287 covers this.

This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Remove unnecessary movs for v_fmac operands
AbandonedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 219176

lib/Target/AMDGPU/SIFoldOperands.cpp

test/CodeGen/AMDGPU/fmac-fma-sgpr-copy.ll

test/CodeGen/AMDGPU/fmul-2-combine-multi-use.ll

This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Remove unnecessary movs for v_fmac operandsAbandonedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 219176

lib/Target/AMDGPU/SIFoldOperands.cpp

test/CodeGen/AMDGPU/fmac-fma-sgpr-copy.ll

test/CodeGen/AMDGPU/fmul-2-combine-multi-use.ll

[AMDGPU] Remove unnecessary movs for v_fmac operands
AbandonedPublic