Page MenuHomePhabricator

[AArch64] Attempt to sink mul operands

Authored by NickGuy on Nov 11 2020, 8:47 AM.



Following on from D91255, this patch is responsible for sinking relevant mul operands to the same block so that umull/smull instructions can be correctly generated by the mul combine implemented in the aforementioned patch.

Diff Detail

Event Timeline

NickGuy created this revision.Nov 11 2020, 8:47 AM
NickGuy requested review of this revision.Nov 11 2020, 8:47 AM
resistor added inline comments.

Why isn't this logic the same as for Add/Sub above?


Dynamic extraction from a constant vector.

dmgreen added inline comments.Nov 11 2020, 1:54 PM

I would expect this to check more about the Shuffle I think. Like the fact that it is a duplicate - a pair that looks like:

%0 = insertelement <4 x i16> undef, i16 %src, i32 0
%x = shufflevector <4 x i16> %0, <4 x i16> undef, <4 x i32> zeroinitializer

Otherwise you might be sinking any shuffle needlessly, or it may not have an insertelement (which might be what is going wrong below).

NickGuy updated this revision to Diff 310119.Dec 8 2020, 3:01 AM
NickGuy marked 3 inline comments as done.

Addressed comments, and rebased on top of the changes in the parent revision.


From my understanding, the Add/Sub above will only sink the operands if they are both extending/shuffling, while this case checks each operand individually.
It might be beneficial to make Add/Sub aware of this case, unifying the instruction cases, but that's not what this patch is trying to achieve (the parent patch that this aids only checks for mul patterns)

NickGuy updated this revision to Diff 315077.Jan 7 2021, 2:52 AM
NickGuy edited the summary of this revision. (Show Details)

Made the return value respect whether any operands are to be sunk.

dmgreen added inline comments.Jan 7 2021, 3:06 AM
4 ↗(On Diff #315077)

These two tests could probably live in the same file.

Also it would be good to have a test that has a mul with both operands being sinkable shuffles.

NickGuy updated this revision to Diff 315767.Jan 11 2021, 5:25 AM
NickGuy marked an inline comment as done.
dmgreen added inline comments.Jan 11 2021, 6:11 AM

Should this be checking for isZeroEltSplat too?


Check for insert into element 0 too?


Try loading the value in the loop, to keep the mul loop-variant:

vector.body:                                      ; preds = %vector.header, %vector.body
  %index = phi i64 [, %vector.body ], [ 0, %vector.header ]
  %g = getelementptr inbounds i16, i16* %A, i64 %index
  %val1 = load i16, i16* %g
  %splat.input.ext = zext i16 %val1 to i32
  %broadcast.splatinsert31 = insertelement <4 x i32> undef, i32 %splat.input.ext, i32 0
  %broadcast.splat32 = shufflevector <4 x i32> %broadcast.splatinsert31, <4 x i32> %broadcast.splat, <4 x i32> <i32 0, i32 0, i32 0, i32 0>
NickGuy updated this revision to Diff 316376.Jan 13 2021, 6:00 AM
NickGuy marked 3 inline comments as done.

Addressed comments; modifying checks and updating tests.

dmgreen accepted this revision.Jan 13 2021, 6:05 AM

Thanks for the changes. LGTM

This revision is now accepted and ready to land.Jan 13 2021, 6:05 AM
This revision was automatically updated to reflect the committed changes.

This broke compilation of some sources, reproducible like this:

$ cat repro.c 
b(long long c) {
  if (c & 5)
    return c;
d(c, e) {
  int f;
  int *g = a, *h = d;
  f = 0;
  for (; f < c; f++)
    g[f] = b((long long)h[f] * e);
$ clang -target aarch64-linux-gnu -c repro.c -O2
fatal error: error in backend: Cannot select: t85: v2i32 = AArch64ISD::DUP t15
  t15: i64,ch = CopyFromReg t0, Register:i64 %1
    t14: i64 = Register %1

This broke compilation of some sources...

Thanks for reporting this, and thanks for handling the revert on my behalf. I had hoped to be able to fix it today (avoiding said revert), but it looks to be bigger than I had initially thought.