Following on from D91255, this patch is responsible for sinking relevant mul operands to the same block so that umull/smull instructions can be correctly generated by the mul combine implemented in the aforementioned patch.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
10967 | I would expect this to check more about the Shuffle I think. Like the fact that it is a duplicate - a pair that looks like: %0 = insertelement <4 x i16> undef, i16 %src, i32 0 %x = shufflevector <4 x i16> %0, <4 x i16> undef, <4 x i32> zeroinitializer Otherwise you might be sinking any shuffle needlessly, or it may not have an insertelement (which might be what is going wrong below). |
Addressed comments, and rebased on top of the changes in the parent revision.
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
10959 | From my understanding, the Add/Sub above will only sink the operands if they are both extending/shuffling, while this case checks each operand individually. |
llvm/test/CodeGen/AArch64/aarch64-matrix-umull.ll | ||
---|---|---|
4 ↗ | (On Diff #315077) | These two tests could probably live in the same file. Also it would be good to have a test that has a mul with both operands being sinkable shuffles. |
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
10967 | Should this be checking for isZeroEltSplat too? | |
10972 | Check for insert into element 0 too? | |
llvm/test/CodeGen/AArch64/aarch64-matrix-umull-smull.ll | ||
170 | Try loading the value in the loop, to keep the mul loop-variant: vector.body: ; preds = %vector.header, %vector.body %index = phi i64 [ %index.next, %vector.body ], [ 0, %vector.header ] %g = getelementptr inbounds i16, i16* %A, i64 %index %val1 = load i16, i16* %g %splat.input.ext = zext i16 %val1 to i32 %broadcast.splatinsert31 = insertelement <4 x i32> undef, i32 %splat.input.ext, i32 0 %broadcast.splat32 = shufflevector <4 x i32> %broadcast.splatinsert31, <4 x i32> %broadcast.splat, <4 x i32> <i32 0, i32 0, i32 0, i32 0> |
This broke compilation of some sources, reproducible like this:
$ cat repro.c a; b(long long c) { if (c & 5) return c; } d(c, e) { int f; int *g = a, *h = d; f = 0; for (; f < c; f++) g[f] = b((long long)h[f] * e); } $ clang -target aarch64-linux-gnu -c repro.c -O2 fatal error: error in backend: Cannot select: t85: v2i32 = AArch64ISD::DUP t15 t15: i64,ch = CopyFromReg t0, Register:i64 %1 t14: i64 = Register %1
Thanks for reporting this, and thanks for handling the revert on my behalf. I had hoped to be able to fix it today (avoiding said revert), but it looks to be bigger than I had initially thought.
Why isn't this logic the same as for Add/Sub above?