We can legalize the operation MUL for v8i16 with instruction (vmladduhm A, B, 0) if altivec enabled. Now, it is set as custom and expand it later, which is not the right way. And then, we can add the pattern to match the mul + add with (vmladduhm A, B, C)
Details
Details
- Reviewers
nemanjai jsji - Group Reviewers
Restricted Project - Commits
- rG1ef7bf412141: [PowerPC] Improve the way legalize mul for v8i16 and add pattern to match mul +…
Diff Detail
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Comment Actions
Thanks for putting this up. This is actually something I meant to do very soon as we need it for a specific benchmark. Please fix the pattern and then this patch is fine.
Demonstration that the pattern is backwards:
$ cat def.c vector unsigned short test(vector unsigned short A, vector unsigned short B, vector unsigned short C) { return A + B * C; } $ cat use.c #include <stdio.h> #include <stdlib.h> vector unsigned short test(vector unsigned short, vector unsigned short, vector unsigned short); int main(int argc, const char **argv) { unsigned short A = atoi(argv[1]); unsigned short B = atoi(argv[2]); unsigned short C = atoi(argv[3]); vector unsigned short Res = test((vector unsigned short)A, (vector unsigned short)B, (vector unsigned short)C); printf("Res: { %hu, %hu, %hu, %hu, %hu, %hu, %hu, %hu }\n", Res[0], Res[1], Res[2], Res[3], Res[4], Res[5], Res[6], Res[7]); return 0; } $ ./a.out 4 3 2 Res: { 11, 11, 11, 11, 11, 11, 11, 11 } # correct value: $ ./correct 4 3 2 Res: { 10, 10, 10, 10, 10, 10, 10, 10 }
llvm/lib/Target/PowerPC/PPCInstrAltivec.td | ||
---|---|---|
876 | Huh? Is this right? Doesn't the instruction do $vA * $vB + $vC? So wouldn't that mean that we need the input pattern to be (add v8i16:$vC, (mul v8i16:$vA, v8i16:$vB))? Or the output to be (VMLADDUHM $vB, $vC, $vA) if we are leaving the input the same. |
llvm/lib/Target/PowerPC/PPCInstrAltivec.td | ||
---|---|---|
876 | Oops! You are right. |
Huh? Is this right? Doesn't the instruction do $vA * $vB + $vC? So wouldn't that mean that we need the input pattern to be (add v8i16:$vC, (mul v8i16:$vA, v8i16:$vB))? Or the output to be (VMLADDUHM $vB, $vC, $vA) if we are leaving the input the same.