D.u32 = S0.u8[0] * S1.u8[0] + S0.u8[1] * S1.u8[1] + S0.u8[2] * S1.u8[2] + S0.u8[3] * S1.u8[3] + S2.u32
Differential D50921
[AMDGPU] Match udot4 pattern.
Closed · Public
Authored by FarhanaAleen on Aug 17 2018, 1:47 PM.
Summary
D.u32 = S0.u8[0] * S1.u8[0] + S0.u8[1] * S1.u8[1] + S0.u8[2] * S1.u8[2] + S0.u8[3] * S1.u8[3] + S2.u32
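The summary formula can be read as a scalar reference model. The following is a minimal sketch, not code from the patch: the helper names `lane` and `udot4_ref` are hypothetical, and it assumes that `u8[0]` is the least significant byte of the packed 32-bit source and that the accumulation wraps modulo 2^32.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: extract byte lane i of a packed 32-bit value.
// Assumption: lane 0 is the least significant byte.
constexpr uint32_t lane(uint32_t v, unsigned i) {
  return (v >> (8 * i)) & 0xFFu;
}

// Scalar reference model of the summary formula:
//   D.u32 = sum over i in 0..3 of S0.u8[i] * S1.u8[i], plus S2.u32
// Assumption: the 32-bit accumulation wraps on overflow.
constexpr uint32_t udot4_ref(uint32_t s0, uint32_t s1, uint32_t s2) {
  uint32_t acc = s2;
  for (unsigned i = 0; i < 4; ++i)
    acc += lane(s0, i) * lane(s1, i);  // each product is at most 255 * 255
  return acc;
}

// Lanes of 0x04030201 are 1, 2, 3, 4; each is multiplied by 1, then 10 is added.
static_assert(udot4_ref(0x04030201u, 0x01010101u, 10u) == 20u,
              "1 + 2 + 3 + 4 + 10");
```

Written this way, the loop body is the chain of four multiply-adds that the matched pattern would replace with a single instruction.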
Event Timeline
Herald added subscribers: t-tye, tpr, dstuttard and 4 others. · Aug 17 2018, 1:47 PM
Can you also add tests/support for the negated form? i.e. -S0.u8[1] * S1.u8[1] - S0.u8[2] * S1.u8[2] - S0.u8[3] * S1.u8[3] - S2.u32. I'm not sure how this will canonicalize, but I don't think we do as much as we do with FP negates, since we don't have int source modifiers.
I would like to support it in a separate patch. First, once we negate one of the operands, we will need to generate a signed dot instruction, and this patch only defines and matches unsigned dot4. Second, I am not sure about the profitability. It is not a trivial negate operation, since the negation has to be applied to a certain number of bits at arbitrary locations inside a temp rather than to the whole temp. Roughly, we will need 12 instructions. Maybe there is a better way of doing it. Even then, it seems to me that it would defeat the whole purpose of having these new dot instructions, as opposed to running them as a series of MADs. Maybe it is better to support it in hardware by defining a separate dot4 instruction. I will perform some analysis. In any case, I would like to handle this in a separate patch. Thanks.
I mean you can factor out the negate to match this, the opposite of what is done for fneg. A separate patch is fine.
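The suggestion to factor out the negate rests on a simple identity in wrapping 32-bit arithmetic: subtracting every product and the accumulator gives the same result as negating the whole unsigned dot product once. The sketch below only illustrates that identity. It does not show how LLVM canonicalizes the expression or what the backend would emit. The function names are hypothetical, byte lane 0 is assumed to be the least significant byte, and the negated form is assumed to cover all four lanes (the example in the comment above lists lanes 1 to 3).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: byte lane i of a packed 32-bit value (lane 0 = LSB, assumed).
constexpr uint32_t byteLane(uint32_t v, unsigned i) {
  return (v >> (8 * i)) & 0xFFu;
}

// Unsigned dot4 with accumulate, following the summary formula.
constexpr uint32_t udot4(uint32_t s0, uint32_t s1, uint32_t s2) {
  uint32_t acc = s2;
  for (unsigned i = 0; i < 4; ++i)
    acc += byteLane(s0, i) * byteLane(s1, i);
  return acc;
}

// Negated form written term by term: -p0 - p1 - p2 - p3 - S2 (mod 2^32).
constexpr uint32_t negatedTermwise(uint32_t s0, uint32_t s1, uint32_t s2) {
  uint32_t acc = 0u - s2;
  for (unsigned i = 0; i < 4; ++i)
    acc -= byteLane(s0, i) * byteLane(s1, i);
  return acc;
}

// Same value with the negate factored out: one unsigned dot4, then one negate.
constexpr uint32_t negatedFactored(uint32_t s0, uint32_t s1, uint32_t s2) {
  return 0u - udot4(s0, s1, s2);
}

static_assert(negatedTermwise(0x04030201u, 0x01010101u, 10u) ==
                  negatedFactored(0x04030201u, 0x01010101u, 10u),
              "both forms agree modulo 2^32");
```

Under these assumptions the factored form needs only the unsigned dot4 plus a single integer negate of the result, which is the shape the reviewer appears to have in mind.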
This revision is now accepted and ready to land. · Aug 28 2018, 10:42 AM
Closed by commit rL340936: [AMDGPU] Match udot4 pattern. (authored by faaleen). · Aug 29 2018, 9:34 AM
This revision was automatically updated to reflect the committed changes.
Revision Contents
Diff 163123
llvm/trunk/lib/Target/AMDGPU/VOP3PInstructions.td
llvm/trunk/test/CodeGen/AMDGPU/idot4.ll