These two instructions are normally selected, but when the
two address pass converts mac into mad we end up with the
mad where we could have one of these.
Details
Diff Detail
Event Timeline
I don't think this is the right place for this. This is converting a two address instruction into another two address instruction. Why wasn't this matched to a v_madak originally?
This looks like it is undoing the specific handling for 2 uses done in FoldImmediate. This also probably should be handled there, and specifically for 2 uses. Also needs a test with 3+ uses, where it probably still shouldn't be folded.
There is no point in the folding. Both mad and madak use 64 bit, both use constant bus. With folding we also using one extra register and plus madak/madmk are VOP2, so preferable.
In fact I do not see how madak is different from mad in the respect of two/three address. It really worth nothing that one of the register operands can be also folded into the instruction as immediate. If we would produce mad here and fold immediate later the result would be same, just more work. The pass needs to simplify operands and we do it. For the selection part I also do not see how could it efficiently deal with register pressure issues, it is just too early.
OK, I was thinking of s_mulk_i32/s_addk_i32 which is an inplace update of the register.
test/CodeGen/AMDGPU/madak.ll | ||
---|---|---|
40 | Should add a test with 3 uses. We should consider not doing it for > 2 uses if optsize |
test/CodeGen/AMDGPU/madak.ll | ||
---|---|---|
40 | We shall always use madak/madmk instead of v_mad_f32. Size of the instruction is the same: mad is VOP3, madak/madmk are VOP2 + literal. I.e. 64 bit VOP3 vs 32 bit VOP2 + 32 bit literal. At the same time we can save a register for the literal and move into that register. So even for the optsize we shall prefer these. |
lib/Target/AMDGPU/SIInstrInfo.cpp | ||
---|---|---|
2092 | It doesn't matter we do not produce it right now. Until it is marked as move imm I cannot use it, that is a time bomb. In fact I could remove flag from S_CMOV_B*. Objections? |
LGTM with the minor cleanup
lib/Target/AMDGPU/SIInstrInfo.cpp | ||
---|---|---|
2139 | Assign the opcode to a variable instead of repeating this 3 places |
You don't need the isFullCopy check. COPY can't be used to materialize an immediate. Def->isMoveImm() instead of the separate opcode checks