For the C code below, we can use VNNI to combine the multiply and add operations into a single dot-product instruction.
int usdot_prod_qi(unsigned char *restrict a, signed char *restrict b, int c, int n) {
  int i;
  for (i = 0; i < 32; i++) {
    c += ((int)a[i] * (int)b[i]);
  }
  return c;
}
This patch does not support the combine across basic blocks.
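As a rough illustration of the operation being targeted, here is a scalar C model of a 256-bit, non-saturating VPDPBUSD-style dot product: each of the 8 32-bit accumulator lanes adds four unsigned-byte × signed-byte products. The names `usdot_ref` and `dpbusd_model` are hypothetical, and this is only a semantic sketch, not how the backend emits the instruction. Summing the model's lanes should reproduce the original loop's result.

```c
#include <stdint.h>

/* Reference loop from the patch description (hypothetical name). */
static int usdot_ref(const unsigned char *restrict a,
                     const signed char *restrict b, int c) {
  for (int i = 0; i < 32; i++)
    c += (int)a[i] * (int)b[i];
  return c;
}

/* Hypothetical scalar model of a 256-bit VPDPBUSD (non-saturating form):
   lane j accumulates a[4j..4j+3] * b[4j..4j+3], with a zero-extended and
   b sign-extended. The hardware wraps on overflow; this C model assumes
   the accumulator does not overflow (signed overflow is UB in C). */
static void dpbusd_model(int32_t acc[8], const uint8_t a[32],
                         const int8_t b[32]) {
  for (int j = 0; j < 8; j++) {
    int32_t sum = 0;
    for (int k = 0; k < 4; k++)
      sum += (int32_t)a[4 * j + k] * (int32_t)b[4 * j + k];
    acc[j] += sum;
  }
}

/* Horizontal reduction of the 8 lanes plus the scalar start value c. */
static int dot_via_model(const uint8_t a[32], const int8_t b[32], int c) {
  int32_t acc[8] = {0};
  dpbusd_model(acc, a, b);
  for (int j = 0; j < 8; j++)
    c += acc[j];
  return c;
}
```

The point of the model is that the 32 multiply-adds collapse into one 8-lane instruction plus a final horizontal add, which is the pattern the combine is meant to recognize.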
Any explicit checks for extension/truncation and their bitwidth delta instantly make me suspicious nowadays.
Does this deal with commutativity?
I think what you want to check is the number of known leading zero bits (for the unsigned operand) and known sign bits (for the signed operand).
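A minimal sketch of that suggestion, assuming each multiply operand comes with a summary from a known-bits analysis (in the spirit of LLVM's computeKnownBits / ComputeNumSignBits). The `operand_info` struct and the helper names are hypothetical. A 32-bit value fits in an unsigned byte when its top 24 bits are known zero, and in a signed byte when it has at least 25 sign bits (bits 31..7 all equal). Because multiplication commutes, both operand orders are tried.

```c
#include <stdbool.h>

/* Hypothetical per-operand summary of known-bits analysis results. */
typedef struct {
  unsigned leading_zeros; /* bits known zero, counting down from bit 31 */
  unsigned sign_bits;     /* copies of the sign bit, including bit 31 (>= 1) */
} operand_info;

/* Value is in [0, 255]: the top 24 of 32 bits are known zero. */
static bool fits_u8(operand_info op) { return op.leading_zeros >= 24; }

/* Value is in [-128, 127]: bits 31..7 all equal the sign bit. */
static bool fits_s8(operand_info op) { return op.sign_bits >= 25; }

/* Returns 0 if (x, y) can be the (unsigned, signed) pair, 1 if the
   swapped order (y, x) can, and -1 if neither order is legal. */
static int match_usdot_operands(operand_info x, operand_info y) {
  if (fits_u8(x) && fits_s8(y))
    return 0;
  if (fits_u8(y) && fits_s8(x))
    return 1;
  return -1;
}
```

With this formulation a zero-extended byte (24 known leading zeros) and a sign-extended byte (25 sign bits) match as expected, but so does an operand such as `x & 0x7F` that involves no explicit extension at all, which an ext/trunc pattern match would miss.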