This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Produce madak and madmk from the two-address pass
ClosedPublic

Authored by rampitec on Sep 1 2017, 12:10 PM.

Details

Summary

These two instructions are normally selected, but when the
two address pass converts mac into mad we end up with the
mad where we could have one of these.

Diff Detail

Repository
rL LLVM

Event Timeline

rampitec created this revision.Sep 1 2017, 12:10 PM
arsenm edited edge metadata.Sep 1 2017, 3:45 PM

I don't think this is the right place for this. This is converting a two address instruction into another two address instruction. Why wasn't this matched to a v_madak originally?

arsenm added a comment.Sep 1 2017, 3:52 PM

This looks like it is undoing the specific handling for 2 uses done in FoldImmediate. This also probably should be handled there, and specifically for 2 uses. Also needs a test with 3+ uses, where it probably still shouldn't be folded.

I don't think this is the right place for this. This is converting a two address instruction into another two address instruction. Why wasn't this matched to a v_madak originally?

It was selected as mac, and mac is preferable over mad or madmk.

This looks like it is undoing the specific handling for 2 uses done in FoldImmediate. This also probably should be handled there, and specifically for 2 uses. Also needs a test with 3+ uses, where it probably still shouldn't be folded.

There is no point in the folding. Both mad and madak use 64 bit, both use constant bus. With folding we also using one extra register and plus madak/madmk are VOP2, so preferable.

I don't think this is the right place for this. This is converting a two address instruction into another two address instruction. Why wasn't this matched to a v_madak originally?

It was selected as mac, and mac is preferable over mad or madmk.

In fact I do not see how madak is different from mad in the respect of two/three address. It really worth nothing that one of the register operands can be also folded into the instruction as immediate. If we would produce mad here and fold immediate later the result would be same, just more work. The pass needs to simplify operands and we do it. For the selection part I also do not see how could it efficiently deal with register pressure issues, it is just too early.

arsenm added a comment.Sep 6 2017, 3:08 PM

I don't think this is the right place for this. This is converting a two address instruction into another two address instruction. Why wasn't this matched to a v_madak originally?

It was selected as mac, and mac is preferable over mad or madmk.

In fact I do not see how madak is different from mad in the respect of two/three address. It really worth nothing that one of the register operands can be also folded into the instruction as immediate. If we would produce mad here and fold immediate later the result would be same, just more work. The pass needs to simplify operands and we do it. For the selection part I also do not see how could it efficiently deal with register pressure issues, it is just too early.

OK, I was thinking of s_mulk_i32/s_addk_i32 which is an inplace update of the register.

test/CodeGen/AMDGPU/madak.ll
40 ↗(On Diff #113569)

Should add a test with 3 uses. We should consider not doing it for > 2 uses if optsize

rampitec added inline comments.Sep 6 2017, 3:25 PM
test/CodeGen/AMDGPU/madak.ll
40 ↗(On Diff #113569)

We shall always use madak/madmk instead of v_mad_f32. Size of the instruction is the same: mad is VOP3, madak/madmk are VOP2 + literal. I.e. 64 bit VOP3 vs 32 bit VOP2 + 32 bit literal.

At the same time we can save a register for the literal and move into that register. So even for the optsize we shall prefer these.

arsenm added inline comments.Sep 6 2017, 6:27 PM
lib/Target/AMDGPU/SIInstrInfo.cpp
2088 ↗(On Diff #113569)

You don't need the isFullCopy check. COPY can't be used to materialize an immediate. Def->isMoveImm() instead of the separate opcode checks

2133 ↗(On Diff #113569)

Braces

2134 ↗(On Diff #113569)

What if this is an f16 mac?

rampitec updated this revision to Diff 114118.Sep 6 2017, 7:30 PM
rampitec marked 2 inline comments as done.
rampitec added inline comments.
lib/Target/AMDGPU/SIInstrInfo.cpp
2088 ↗(On Diff #113569)

isMoveImm() will catch cmov, so will not work. isFullCopy removed.

2134 ↗(On Diff #113569)

"if (!IsF16 ..." on line 2132.

arsenm added inline comments.Sep 7 2017, 3:25 PM
lib/Target/AMDGPU/SIInstrInfo.cpp
2088 ↗(On Diff #113569)

It won't catch cmov (I assume you mean s_cmov_b32, which we also don't emit).

2134 ↗(On Diff #113569)

But we can handle f16 here as well, so might as well

rampitec added inline comments.Sep 7 2017, 3:30 PM
lib/Target/AMDGPU/SIInstrInfo.cpp
2088 ↗(On Diff #113569)

It doesn't matter we do not produce it right now. Until it is marked as move imm I cannot use it, that is a time bomb. In fact I could remove flag from S_CMOV_B*. Objections?

rampitec updated this revision to Diff 114270.Sep 7 2017, 3:52 PM
rampitec marked 3 inline comments as done.

Added f16 case.

arsenm accepted this revision.Sep 11 2017, 9:18 AM

LGTM with the minor cleanup

lib/Target/AMDGPU/SIInstrInfo.cpp
2139 ↗(On Diff #114270)

Assign the opcode to a variable instead of repeating this 3 places

This revision is now accepted and ready to land.Sep 11 2017, 9:18 AM

LGTM with the minor cleanup

We only know what to assign inside each if statement.

This revision was automatically updated to reflect the committed changes.