D.u32 = S0.u8 * S1.u8 + S0.u8 * S1.u8 + S0.u8 * S1.u8 + S0.u8 * S1.u8 + S2.u32
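For reference, here is a rough sketch of what the formula above computes, written in Python. It assumes the four u8 elements are packed into one 32-bit word per source operand, that the accumulation wraps modulo 2^32, and that no clamping is applied; none of that is stated in the quoted formula, and the helper names are made up for illustration.

```python
MASK32 = 0xFFFFFFFF  # assumption: the result wraps to 32 bits, no clamp


def unpack_u8x4(word):
    """Split a 32-bit word into four unsigned 8-bit lanes (lane 0 = low byte)."""
    return [(word >> (8 * i)) & 0xFF for i in range(4)]


def dot4_u32_u8(s0, s1, s2):
    """Hypothetical reference model: sum of four u8*u8 products plus a u32 addend."""
    acc = s2 & MASK32
    for a, b in zip(unpack_u8x4(s0), unpack_u8x4(s1)):
        acc += a * b
    return acc & MASK32
```

The lane order does not change the result, since the same lane index is used for both sources and the sum is commutative.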
Can you also add tests/support for the negated form, i.e. -S0.u8 * S1.u8 - S0.u8 * S1.u8 - S0.u8 * S1.u8 - S2.u32? I'm not sure how this will canonicalize, but I don't think we do as much as we do with FP negates, since we don't have int source modifiers.
I would like to support it in a separate patch.
First of all, once we negate one of the operands, we will need to generate a signed dot instruction. This patch only defines/matches unsigned dot4.
Secondly, I am not sure about the profitability. It is not going to be a trivial negate operation, since the negation has to be applied to a specific group of bits at some location inside a temp rather than to the temp as a whole. Roughly, we will need 12 instructions. Maybe there is a better way of doing it. Even then, it seems to me that it would defeat the whole purpose of having these new dot instructions, as opposed to running them as a series of MADs. Maybe it is better to support it in the hardware by defining a separate dot4 instruction. I will perform some analysis. In any case, I would like to handle this in a separate patch.
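To illustrate why negating packed elements is not a single negate of the temp, here is a small Python sketch. It assumes four 8-bit lanes packed into a 32-bit word and two's-complement wraparound within each lane; the function names are hypothetical, and this says nothing about the actual instruction count on the target.

```python
MASK32 = 0xFFFFFFFF


def negate_word(word):
    """Negate the whole 32-bit temp (two's complement, wraps to 32 bits)."""
    return (-word) & MASK32


def negate_lanes_u8(word):
    """Negate each 8-bit lane independently, keeping every lane in place."""
    out = 0
    for i in range(4):
        lane = (word >> (8 * i)) & 0xFF
        out |= ((-lane) & 0xFF) << (8 * i)  # wrap within the lane only
    return out


x = 0x01020304
print(hex(negate_word(x)))      # 0xfefdfcfc
print(hex(negate_lanes_u8(x)))  # 0xfffefdfc
```

The two results differ because a whole-word negate lets borrows cross lane boundaries, whereas the per-lane version has to isolate, negate, and reinsert each byte.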
Lines 204–209 (On Diff #161317)
It's not going to be tremendously less if we were to implement it using the fold operator.
(!cast<dag>(!foldl(node:$src2, [1, 2, 3, 4], lhs, y, (add_oneuse lhs, (!cast<PatFrag>("MulU_Elt"#y) node:$src0, node:$src1))))), (V_DOT4_U32_U8 (i32 8), $src0, (i32 8), $src1, (i32 8), $src2, (i1 0))
Currently, the fold operator does not support a dag pattern. It looks like there is a bug in the fold input parser, and I am debugging the root cause. In the meantime, I would like to go ahead with this patch. I will revisit this once the fold operator supports a dag pattern.
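As a rough model of the nesting the fold above is meant to produce, here is a Python sketch using a left fold over the element indices. It only mirrors the shape of the expression (start value `$src2`, list `[1, 2, 3, 4]`, accumulator `lhs`, element `y`); the tuple representation and the function name are assumptions for illustration and have nothing to do with how TableGen represents dags internally.

```python
from functools import reduce


def build_dot4_pattern(elts=(1, 2, 3, 4)):
    """Left-fold the element indices into a nested add_oneuse chain."""
    def step(lhs, y):
        # Each step wraps the accumulated pattern in one more add_oneuse node.
        return ("add_oneuse", lhs, (f"MulU_Elt{y}", "$src0", "$src1"))

    return reduce(step, elts, "$src2")


pattern = build_dot4_pattern()
print(pattern[2])  # ('MulU_Elt4', '$src0', '$src1')
```

The innermost node adds `MulU_Elt1` to `$src2`, and the outermost node adds `MulU_Elt4`, which matches the order a left fold visits the list.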