This is an archive of the discontinued LLVM Phabricator instance.

[ARM][MVE] Optimise offset addresses of gathers/scatters
ClosedPublic

Authored by anwel on Mar 24 2020, 2:49 AM.

Details

Summary

This patch adds an analysis of the offset addresses used by gathers and scatters to the MVEGatherScatterLowering pass to find multiplications and additions that are loop invariant and thus can be moved into the loop preheader, avoiding to execute them each time.

An add where a constant or a loop invariant value is added to the phi value can be added to the start value of the phi and thereby removed from the loop completely.
A mul of the phi value and a constant/loop invariant value can be transformed into an add inside the loop, each iteration adding the product of the loop increment and the loop invariant operand of the mul to the phi value.
If there are other users of the phi, it will be duplicated.

Diff Detail

Event Timeline

anwel created this revision.Mar 24 2020, 2:49 AM
dmgreen added inline comments.Mar 24 2020, 8:09 AM
llvm/lib/Target/ARM/MVEGatherScatterLowering.cpp
93

Is Increment used yet? Or is that needed in a later patch.

476

order -> Order

Do you know why the ordering matters? Is it reliably better or just happens to turn out that way?

518–531

Can this be getIntN(Phi->getType()->getScalarSizeInBits(), Product) ?

609

Needn't have brackets around single statements.

It might be worth simplifying this initially to only deal with a single loop; the loop that the gather is in. We are only then dealing with:

Loop:
  phi
  adds/muls
  gep
  gather
  increment phi, c

(With some other instructions in there too). And we are trying to remove the adds/muls by adjusting the value of all the users. Phi's are in the header, like you have here, but we needn't look into any other levels of loops.

If we don't have a loop we can just return early, I think? I don't think this is needed if we are not in a loop anyway, we won't be moving anything out of a loop.

653–654

I think Offs->getParent() wouldn't necessarily be the incoming block? Maybe use L->getLatch() instead? (and if it doesn't have one, bail out). We are not necessarily in loop simple form, so be careful about the loop looking weird, but we don't necessarily need to handle those loops.

655

Should we be checking that IncInstruction is an Add and that the other operand is the phi? That will be true for most loops, but needn't be in general.

llvm/test/CodeGen/Thumb2/mve-gather-optimisation.ll
315–316 ↗(On Diff #252251)

What's going on with these two instructions?

And why does something called mat_mult have two gathers in the inner loop? (It's good to have a test like that, I was just expecting one of the gathers to be a normal load)

413 ↗(On Diff #252251)

FYI, q31 means "int", so maybe change this to "q15" for a short?

anwel updated this revision to Diff 252589.Mar 25 2020, 8:46 AM
anwel marked 9 inline comments as done.
anwel added inline comments.
llvm/lib/Target/ARM/MVEGatherScatterLowering.cpp
93

It is not used for anything other than 0 yet, but will be used in the follow-up patch to this. Slipped through when I created this diff.

476

This is based on the observation that in my tests, this order the phi will not cause additional movs to be generated. I am assuming that this is a standard order of incoming blocks that is expected by llvm.

518–531

I assume so, but wasn't aware this function exists - thanks for the hint!

655

Yes, we really should. I'll make sure to add that check.

llvm/test/CodeGen/Thumb2/mve-gather-optimisation.ll
413 ↗(On Diff #252251)

Oops, yes, thanks for pointing that out

dmgreen added inline comments.Mar 26 2020, 6:59 AM
llvm/lib/Target/ARM/MVEGatherScatterLowering.cpp
547

Perhaps call this hasAllGatScatUsers ?

594

std::vector -> SmallPtrSet ? Then (so long as you are not relying on the order of the PhiNodes), you can use PhiNodes.count(...) instead of the find's below.

It could also perhaps be switched around so that instead of getting this list of PHI's, we test the phi below to check if it is in the header of the current loop, which I think should end up as the same thing.

596

I think it's best for this not to look at recursive loops. At least, it makes things more complex to make sure we are getting it right, and I'm not sure it's worth it most of the time. Maybe we add it in again later, but exclude that from this patch?

Can you add a check for (L != null) -> return false; early up here too. Then replace uses of LI->getLoopFor(BB) below with L.

607

Is this insertion point ever used?

608

I feel this should be Offs->getDebugLoc() maybe?

612–613

Is is possible to remove the duplication in finding the phi's? Maybe check

if (!phi Offs->getOperand(0) && !phi Offs->getOperand(1))
  recurse
if (phi Offs->getOperand(0))
  etc...
else if (phi Offs->getOperand(1))
  etc...
647

You can drop some brackets around these. Maybe make a small lambda to common out the similar code too?

674

What is this checking for? Should it be checking that it is a constant?

757

Can we make sure there is a scatter test too? They work the same as gathers, but we should make sure we have at least one.

And maybe a gather + scatter test with the same offsets? That sounds like it would be worth testing.

anwel updated this revision to Diff 253120.Mar 27 2020, 7:53 AM
anwel marked 15 inline comments as done.

Added scatter tests, and some code changes (see comments).

llvm/lib/Target/ARM/MVEGatherScatterLowering.cpp
547

Can do. I also added a comment to make it a bit more clear what this function is supposed to do.

594

I went with your suggestion of getting rid of the list of phis, this cleans up the code a lot.

607

It is; but only in the pushOutMul function it is handed over to. So I will at least move it a bit further down to make that more obvious, and avoid building it if we don't see a mul.

608

Yes, that's more accurate for sure.

612–613

Has been dealt with by getting rid of the phi node list entirely.

674

Writing that I was thinking that there might be stuff other than constants that is not an instruction and thus not bound to a specific block, so we could push out in those cases, too. As I'm still not sure what this 'stuff' would be and whether it would actually be safe, I'll indeed change it to a check for Constant.

757

That's true, tests will be added.

dmgreen added inline comments.Mar 30 2020, 1:13 AM
llvm/lib/Target/ARM/MVEGatherScatterLowering.cpp
507

Is this code necessary, or can it be left to be optimised by something else? Does the call to SimplifyInstructions already do this constant folding?

Maybe if we called SimplifyInstructions on the preheader too?

511

uint64_t would probably be better than int

535–536

Is the .getPrevNode() needed?

557–558

I think the User here will always be an instruction.

559–573

Should this code be returning gatscat when the first gather is found? Or should it be making sure there are no other uses, that are not gathers/scatter?

Would this code be better as

for (User *U : I->users()) {
  if (OpCode == Instruction::Add || OpCode == Instruction::Mul) {
    if (!hasAllGatScatUsers(cast<Instruction>(U))
      return false;
  else if (!isa<GetElementPtrInst>(U) && !isagather...)
    return false;
561

This assumes an ordering to the intrinsic instructions? Alphabetical it would seem?

642

(Op->getOperand(0) == Phi) => Op->getOperand(0) == Phi

657

Can you rename SecondOperand to something more descriptive. Maybe OffsSecondOperand? To make it clear where it is coming from. SecondOp too.

666

Should we check that IncInstruction->getNumUses == 1 too?

anwel updated this revision to Diff 253820.Mar 31 2020, 2:47 AM
anwel marked 20 inline comments as done.

Implemented some changes suggested in comments (see below for details)

llvm/lib/Target/ARM/MVEGatherScatterLowering.cpp
507

SimplifyInstructions seems to do the trick, however it would have to be called after each pushing out of a mul to ensure that further adds/muls in the loop can be pushed out, too (as we only allow pushing out if the value that's incremented by each iteration is a constant).
For now I will leave this part in but add a FIXME comment so this matter isn't forgotten and the decision can be revisited later.

535–536

In this version of the patch it apparently isn't.

557–558

Most probably. I'd still prefer to make sure. However, if it should ever not be an instruction, we are on 'foreign' terrain, so we should just return false in that case.

559–573

Yes, this function should return false as soon as it hits the first user that is neither a gep, nor a gather/scatter, nor being used solely by those kinds of instructions.

561

Yes it does assume the order as currently given in IntrinsicsArm.h. Is this order subject to frequent change and thus unreliable?
The alternative to this would be to check the 12 kinds of gathers and scatters individually - or can you suggest another way around that?

657

Can do.

666

Yes, we should probably cover that case. If the phi has more than 2 users, we're already making a copy of the incrementing instruction, but I added a check and respective action to the other case, too.

dmgreen added inline comments.Apr 1 2020, 3:01 AM
llvm/lib/Target/ARM/MVEGatherScatterLowering.cpp
507

Ah yeah. I see. It might be worth changing that then, to allow instructions so long as they are outside the loop. It looks like it's something that might come up anyway from some mul's when they are being pulled out of the loop.

561

Hmm. It makes an assumption that I'm not sure will be necessarily kept in the future, even if it is true now. I see this same pattern used in one other case but that is to check all aarch64 intrinsics, not a narrow range like this.

Being explicit is probably best. Perhaps make a new "IsMVEGatherScatterIntrinsic" function?

666

Copying it might be dangerous, because we end up with more code, more code than we end up saving. It may be better to just bail in those situations. I'm not sure. We'll have to see if either way becomes a problem in practice.

669

It might be better that instead we check that IncrementPerRound is outside the loop, even if it is an instruction, then as far as I can see it should be OK. I was looking at one of the test cases, arm_mat_mult_q31, and it still has more add's in the loop than I expected to see.

I think when we sink Mul's this operand might have become an instruction, so is probably something we should be dealing with.

Also worth adding a testcase where this is not true, if it doesn't exist already.

anwel updated this revision to Diff 255333.Apr 6 2020, 7:36 AM
anwel marked 4 inline comments as done.
anwel marked 2 inline comments as done.
anwel added inline comments.
llvm/lib/Target/ARM/MVEGatherScatterLowering.cpp
507

Okay. I have adapted that strategy and this indeed gets rid of more instructions (we are able to push out one more add in the mat_mult test below).

561

Hm, alright I have made it explicit, just to be sure.

669

Is now being dealt with, and indeed we can this way get rid of one more add in the arm_mat_mult_q31 test case (and some others).
As for test cases that make sure this only works for instructions not originating from the loop, arm_mat_mult_q31 already is a good example - the add that adds the result of the multiplication of the gather results to the overall sum could be pushed out, too, if the mul instruction wasn't located in the loop.

dmgreen accepted this revision.Apr 7 2020, 12:43 AM

LGTM. Thanks.

This revision is now accepted and ready to land.Apr 7 2020, 12:43 AM
This revision was automatically updated to reflect the committed changes.
anwel marked an inline comment as done.