PMADDWD can help improve the performance of 8/16-bit integer multiply-add operations in cases like:
for (int i = 0; i < count; i++)
  a += x[i] * y[i];
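As a rough illustration of why the instruction fits this loop, the following is a portable scalar model of what one 128-bit PMADDWD computes: eight signed 16-bit multiplies whose adjacent products are summed into four 32-bit lanes. The names `pmaddwd_model` and `dot_i16` are hypothetical and are not part of the patch. The sketch assumes the 32-bit accumulator does not overflow.

```cpp
#include <cstddef>
#include <cstdint>

// Scalar model of one 128-bit PMADDWD (hypothetical helper, not LLVM code):
// multiply eight signed 16-bit pairs, then add adjacent products into
// four signed 32-bit lanes.
void pmaddwd_model(const int16_t a[8], const int16_t b[8], int32_t out[4]) {
    for (int lane = 0; lane < 4; ++lane) {
        // Widen before multiplying so each product is exact.
        int64_t lo = int64_t{a[2 * lane]} * b[2 * lane];
        int64_t hi = int64_t{a[2 * lane + 1]} * b[2 * lane + 1];
        // The sum exceeds 32 bits only when all four inputs are -32768;
        // the hardware wraps in that case, which this cast mirrors on
        // two's-complement targets.
        out[lane] = static_cast<int32_t>(static_cast<uint32_t>(lo + hi));
    }
}

// The reduction loop from the summary, processed eight elements at a time.
// Assumes the running sum fits in 32 bits.
int32_t dot_i16(const int16_t* x, const int16_t* y, std::size_t count) {
    int32_t acc[4] = {0, 0, 0, 0};
    std::size_t i = 0;
    for (; i + 8 <= count; i += 8) {
        int32_t partial[4];
        pmaddwd_model(x + i, y + i, partial);
        for (int lane = 0; lane < 4; ++lane)
            acc[lane] += partial[lane];
    }
    // Horizontal add of the four lanes, then a scalar tail for leftovers.
    int32_t a = acc[0] + acc[1] + acc[2] + acc[3];
    for (; i < count; ++i)
        a += x[i] * y[i];
    return a;
}
```

The four-lane accumulator stands in for the vector register that carries the reduction across loop iterations. The final horizontal add happens once, after the loop.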
Differential D31679
Use PMADDWD to expand reduction in a loop
Authored by danielcdh on Apr 4 2017, 2:05 PM.
Event Timeline
Removed support for PMADDUBSW, as it cannot handle the overflow case.

Thanks for working on this patch. Regarding support for PMADDUBSW, can we match something like the following?
for (int i = 0; i < count; i++) {
  a = saturate(a + x[i] * y[i]);
}

I suggest we leave the PMADDUBSW discussion for a separate patch. Some minor comments inline.

I'm not aware of such a builtin; my snippet above was more of a pseudo-code sketch.
int sat_sint16(int x) {
  return std::min(32767, std::max(-32768, x));
}
AFAIK, the loop vectorizer will not vectorize the reduction for PMADDUBSW, so I agree with @mkuper that this should be done in a different patch.
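
To make the overflow concern concrete, here is a small scalar model of a single PMADDUBSW output lane. The instruction multiplies unsigned bytes by signed bytes and adds adjacent products with signed 16-bit saturation, so its result can differ from the exact sum that a plain reduction loop expects. `pmaddubsw_lane` and `exact_pair` are hypothetical names used only for illustration. `sat_sint16` follows the helper quoted above.

```cpp
#include <algorithm>
#include <cstdint>

// Clamp to the signed 16-bit range, as in the helper from the review.
int sat_sint16(int x) {
    return std::min(32767, std::max(-32768, x));
}

// Scalar model of one PMADDUBSW output lane (hypothetical helper):
// unsigned byte * signed byte, adjacent products added, then saturated
// to a signed 16-bit value.
int16_t pmaddubsw_lane(uint8_t u0, int8_t s0, uint8_t u1, int8_t s1) {
    int sum = int{u0} * s0 + int{u1} * s1;
    return static_cast<int16_t>(sat_sint16(sum));
}

// What a non-saturating reduction expects for the same pair of products.
int exact_pair(uint8_t u0, int8_t s0, uint8_t u1, int8_t s1) {
    return int{u0} * s0 + int{u1} * s1;
}
```

For inputs such as `255 * 127 + 255 * 127` the exact sum is 64770, but the saturating lane yields 32767. The two results agree only when the pair sum stays within the signed 16-bit range.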
Maybe use std::swap, so that Op0 and Op1 are unnecessary.
MulOp = N->getOperand(0);
Phi = N->getOperand(1);
if (MulOp.getOpcode() != ISD::MUL) {
  std::swap(MulOp, Phi);
  if (MulOp.getOpcode() != ISD::MUL)
    return SDValue();
}
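
The swap-based matching suggested above can be tried outside of LLVM with stand-in types. In this minimal sketch, `Opcode`, `Value`, and `matchMulAddOperands` are hypothetical and are not the SelectionDAG API. It only shows how a single `std::swap` handles both operand orders of a commutative add without separate `Op0`/`Op1` temporaries.

```cpp
#include <utility>

// Hypothetical stand-ins for the DAG node types; not the LLVM API.
enum class Opcode { Add, Mul, Phi };

struct Value {
    Opcode opcode;
    int id;
};

// Returns true if exactly the pattern add(mul, other) or add(other, mul)
// is matched. On success, MulOp holds the multiply and Phi the other
// operand, regardless of their original order.
bool matchMulAddOperands(Value lhs, Value rhs, Value& MulOp, Value& Phi) {
    MulOp = lhs;
    Phi = rhs;
    if (MulOp.opcode != Opcode::Mul) {
        // Try the commuted form before giving up.
        std::swap(MulOp, Phi);
        if (MulOp.opcode != Opcode::Mul)
            return false;
    }
    return true;
}
```

Calling `matchMulAddOperands` with the multiply in either position leaves it in `MulOp`. If neither operand is a multiply, the function reports failure, mirroring the `return SDValue();` in the snippet above.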