This is an archive of the discontinued LLVM Phabricator instance.

[AArch64][SVE]Make better use of gather/scatter when inside a loop body
Needs ReviewPublic

Authored by CarolineConcatto on Mar 3 2022, 5:18 AM.

Details

Summary

In the loopvectorizer it hoists out from the loop body some variables
to optimize the loop body. When the loop has a gather/scatter
it hoists out parts of the index, as the step_vector.

However, with gather/scatter when parts of the index (step_vector and index)
are hoisted out from the loop, the compiler cannot check if the index could be
folded in 32 bits (in findMoreOptimalIndexType).

This patch pulls stepvector and the index to be inside the loop body, so we
can check if the compiler can fold the gather/scatter.

Diff Detail

Event Timeline

CarolineConcatto requested review of this revision.Mar 3 2022, 5:18 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 3 2022, 5:18 AM
sdesmalen added inline comments.Mar 3 2022, 6:35 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
12240

You may want to limit this recursion a bit further to some threshold, to avoid it becoming too expensive.

Maybe you can rewrite this to a loop, like this:

unsigned Threshold = 3;  // I'd expect 3 or 4 to be sufficient for most cases.
for (const User *U : I->users()) {
  for (unsigned J=0; J<Threshold; ++J) {
    if (isa<GetElementPtrInst>(U))
      return true;
    if (isa<PHINode>(U) || !U->hasOneUse())
      break;
    U = cast<Instruction>(U)->getUser();
  }
}
return false;
12301

This doesn't need to be limited to the second operand. You can just combine it with the case for splat.

12308–12310

I'm a bit confused about what this is doing, it seems like this is adding:

I->getOperand(i)->getOperandUse(0);
I->getOperandUse(i);

is that correct?

I notice it also hasn't checked the other operand yet before it returns true.

What I think you want to do is:

  • First check if I is used by a GEP in some way.
  • Then iterate through each of the operands, adding it to Ops if it's either a splat/stepvector.
  • If any splats/stepvectors have been found, return.
  • use a threshold for isInstUsedByGEP to reduce future costs.
  • add test for the threshold feature.
  • pull step and splat only when being used by a ge.p
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
12301

Step vector will always be the second operand, because the work you did in
https://reviews.llvm.org/D118459

12308–12310

If it is a splat suffle it needs to get the shutter and the insert.
I->getOperand(i)->getOperandUse(0); -> insert element in position 0
I->getOperandUse(i); -> is the suffle of the insert
If we only push the shuffle, the optimisation does not see it as a splat further in the line.

I believe now it checks all operands that need to be checked.

david-arm added inline comments.Mar 7 2022, 6:48 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
12305

I think you might be able to simplify this a little with this:

auto OpI = I->getOperand(i);
if (isSplatShuffle(OpI))
  Ops.push_back(&cast<Instruction>(OpI)->getOperandUse(0));
Ops.push_back(&I->getOperandUse(i));

Hi @CarolineConcatto, I just have a few more minor comments!

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
12240

nit: Just a minor comment, but perhaps it's simpler to explain this as something like:

// Use a threshold to limit the amount of recursion.
12245

nit: Perhaps these can be combined into a single statement, i.e.

if (isa<PHINode>(UI)) || !UI->hasOneUse())
  return false;

to reduce lines of code?

12249

I think it's more common to use

if (isa<GetElementPtrInst>(UI))

here.

sdesmalen added inline comments.Mar 9 2022, 2:31 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
12243

Because you iterate all uses here, it will look at all uses for the first entry to this function, and later on only check a single use (because of the if (!UI->hasOneUse()) return false;). I don't think that's your intention.

12301

In D118459 it canonicalizes binop(splat(x), step_vector) -> binop(step_vector, splat(x)), so that we can recognize binop(step_vector, splat(x)) directly.

At this point you'll want to push down the stepvector intrinsic regardless of which operand it is used in, because if it happens to be used in operand 0, we want to recognize that case as well. If it's in operand 1 and you push it down, SelectionDAG will swap the operands.

12301–12305

This seems to be doing things the wrong way around. It first checks the operands 0 and 1 individually, and then goes into a loop to go over all the operands to check them again.

I'm thinking more of something like this:

for (unsigned J = 0; J < I->getNumOperands(); ++J) {
  Use &U = I->getOperandUse(J);
  if (isSplatShuffle(U.get())) {
    Ops.push_back(&cast<Instruction>(U.get())->getOperandUse(0));
    Ops.push_back(&U);
  } else if (auto *II = dyn_cast<IntrinsicInst>(U.get()))
    if (II->getIntrinsicID() == Intrinsic::experimental_stepvector)
      Ops.push_back(&U);
}
12312

If it did not find anything to sink based on the above criteria, the code below may still be applicable. So maybe you can query the Ops.size() before/after this loop, and only return true if it is different (because you've added new operands).

  • Earlier return in isInstUsedByGEP when I has more than 1 use
  • Update for loop in shouldSinkOperands to check all operands
CarolineConcatto marked 4 inline comments as done.Mar 10 2022, 7:22 AM
CarolineConcatto added inline comments.
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
12305

I changed a bit the code, as asked by Sander.
So now I cannot do as you suggested David

Matt added a subscriber: Matt.Mar 10 2022, 9:33 AM

Removed unnecessary for loop from isInstUsedByGEP

tschuett added a comment.EditedMar 11 2022, 6:02 AM

Removed unnecessary for loop from isInstUsedByGEP

It sounds more like Optional<bool>, if the threshold was exceeded. i.e. is false sound?