D.u32 = S0.u4[0] * S1.u4[0] + S0.u4[1] * S1.u4[1] +
        S0.u4[2] * S1.u4[2] + S0.u4[3] * S1.u4[3] +
        S0.u4[4] * S1.u4[4] + S0.u4[5] * S1.u4[5] +
        S0.u4[6] * S1.u4[6] + S0.u4[7] * S1.u4[7] + S2.u32
Negated form will be supported with idot8.
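
For readers who want to check a test case by hand, the formula above can be sketched as a scalar reference function. This is only an illustration, not code from the patch: the name `udot8_ref` is hypothetical, and the sketch assumes lane 0 occupies the lowest four bits of each 32-bit source and that the accumulation wraps modulo 2^32.

```cpp
#include <cstdint>

// Hypothetical reference helper (not part of the patch).
// Treats s0 and s1 as eight unsigned 4-bit lanes each, multiplies them
// lane by lane, and adds the 32-bit accumulator s2.
// Assumption: lane 0 is stored in bits [3:0], lane 7 in bits [31:28].
std::uint32_t udot8_ref(std::uint32_t s0, std::uint32_t s1, std::uint32_t s2) {
    std::uint32_t acc = s2;
    for (unsigned lane = 0; lane < 8; ++lane) {
        const std::uint32_t a = (s0 >> (4 * lane)) & 0xFu;  // S0.u4[lane]
        const std::uint32_t b = (s1 >> (4 * lane)) & 0xFu;  // S1.u4[lane]
        acc += a * b;  // each product is at most 15 * 15 = 225
    }
    return acc;  // unsigned arithmetic wraps modulo 2^32
}
```

Under those assumptions, `udot8_ref(0x00000021, 0x00000043, 5)` computes 1 * 3 + 2 * 4 + 5.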
Differential D51947
[AMDGPU] Match udot8 pattern
Closed, Public
Authored by FarhanaAleen on Sep 11 2018, 1:39 PM.
Event Timeline

Herald added subscribers: t-tye, tpr, dstuttard and 5 others. (Sep 11 2018, 1:39 PM)

As for the testcases, what about vectorized multiplication, i.e.:

  %vec1 = load <8 x i4>, ...
  %vec2 = load <8 x i4>, ...
  %ext1 = zext <8 x i4> %vec1 to <8 x i32>
  %ext2 = zext <8 x i4> %vec2 to <8 x i32>
  %mul = mul nuw nsw <8 x i32> %ext1, %ext2
  ... then extractelement and add up the result ...

or possibly the same thing without the zext?

The TableGen itself looks good to me, except for one nitpick (inline).

Thanks, Nicolai.

Thanks, this mostly looks good to me. It looks like this may be running into a serious limitation of the ISel infrastructure with commutativity/associativity, but it makes sense to land this patch without addressing it. I do have one last question.

I have been thinking about different solutions to handle it. The easiest solution would be to apply a threshold during permutation. Thanks, and yes, I would like to go ahead with this patch.

This revision is now accepted and ready to land. (Sep 18 2018, 3:55 AM)
Closed by commit rL342497: [AMDGPU] Match udot8 pattern (authored by faaleen). (Sep 18 2018, 10:02 AM)
This revision was automatically updated to reflect the committed changes.

Revision Contents
Diff 165175
lib/Target/AMDGPU/VOP3PInstructions.td
test/CodeGen/AMDGPU/idot8.ll

I realize this was already done this way for the 8-bit case, but it would be cleaner to use Index 0-7 instead of 1-8. 0-based indexing is more natural here, and would avoid the !add(.., -1).
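
The indexing point can be illustrated outside TableGen. The sketch below is a C++ analogy, not the patch's actual definitions: the helper names are hypothetical, and it assumes each 4-bit lane sits at a bit offset of four times its 0-based position. With a 1-based index, every offset computation needs the extra subtraction that the comment suggests avoiding.

```cpp
#include <cstdint>

// Hypothetical helpers illustrating the two indexing schemes.

// 0-based: index in 0..7 maps directly to the lane's bit offset.
constexpr unsigned shift_zero_based(unsigned index) { return index * 4; }

// 1-based: index in 1..8 needs a "- 1" first, which is the analogue of
// the !add(.., -1) mentioned in the review comment.
constexpr unsigned shift_one_based(unsigned index) { return (index - 1) * 4; }

// Extract the unsigned 4-bit lane that starts at the given bit offset.
constexpr std::uint32_t extract_u4(std::uint32_t src, unsigned shift) {
    return (src >> shift) & 0xFu;
}
```

Both schemes address the same lanes; the 0-based form simply drops one arithmetic step per use.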