This is an archive of the discontinued LLVM Phabricator instance.

[LoopPredication] Allow predication of loop invariant computations
ClosedPublic

Authored by reames on Apr 1 2019, 1:15 PM.

Details

Summary

The purpose of this patch is to eliminate a pass ordering dependence between LoopPredication and LICM. To understand the purpose, consider the following snippet of code inside some loop 'L' with IV 'i'
A = _a.length;
guard (i < A)
a = _a[i]
B = _b.length;
guard (i < B);
b = _b[i];
...
Z = _z.length;
guard (i < Z)
z = _z[i]
accum += a + b + ... + z;

Today, we need LICM to hoist the length loads, LoopPredication to make the guards loop invariant, and TrivialUnswitch to eliminate the loop invariant guard to establish must execute for the next length load. Today, if we can't prove speculation safety, we'd have to iterate these three passes 26 times to reduce this example down to the minimal form.

Using the fact that the array lengths are known to be invariant, we can short circuit this iteration. By forming the loop invariant form of all the guards at once, we remove the need for LoopPredication from the iterative cycle. At the moment, we'd still have to iterate LICM and TrivialUnswitch; we'll leave that part for later.

There are two high level design questions on this patch I'd like input on:

  1. I realized I'm essentially duplicating most of the loop disposition logic from SCEV, but expressed over IR. The critical missing piece is that SCEV does not consider a SCEVUknown for an invariant load w/invariant operands to be invariant. Would it make sense to implement this in SCEV?
  2. Anyone see a better way to handle the adjustment of the insertion point? I want to be inserting in the preheader if possible, and only falling back to in the loop if required. I thought about inserting in the loop and then hoisting, but we may have expressions we can't prove the safety of in IR which SCEV has already established. Wasting that opportunity is undesirable, but mucking around with the insertion point is also quite ugly. Ideas?

Diff Detail

Event Timeline

reames created this revision.Apr 1 2019, 1:15 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 1 2019, 1:15 PM
  1. Since we need to fiddle with the insertion point I think we should stop using single IRBuilder which we pass around. It looks like creating a local IRBuilder is a cheap operation, so we can have local IRBuilders when we need to generate instructions. This way the insertion point will be clear from the local context.
  1. Invariant expression analysis over IR is a bit non-trivial. It looks like extending SCEV would be a simpler option. Alternatively, we can build up the complexity of the IR analysis incrementally as I suggest in the comment below.
lib/Transforms/Scalar/LoopPredication.cpp
401–409

Nit. Extract SE->isLoopInvariant(LHS, L) && SE->isLoopInvariant(RHS, L) into a local variable? You check it below as well.

413

You can use Preheader cached in the field.

473

Debug printing.

966–977

The complexity to handle cases when the expression is a non-side effecting operation with invariant inputs deserves a test.

Alternatively, you can introduce this complexity as a follow up. In the initial patch you can handle only loads, which should cover most of the cases.

reames planned changes to this revision.Apr 5 2019, 4:19 PM
  1. Invariant expression analysis over IR is a bit non-trivial. It looks like extending SCEV would be a simpler option. Alternatively, we can build up the complexity of the IR analysis incrementally as I suggest in the comment below.

I started prototyping the SCEV change, and found one downside. Because SCEV doesn't have access to AA, I can't use the pointsToConstantMemory as I do with the current scheme. I think this is a fairly minor impact since loads from constant memory which can't be proven trivial dereferenceable should be rare, but I'd like a second opinion on that.

(Also, marking as plan changes so that this shows up in my queue of things to do)

hfinkel added a subscriber: hfinkel.Apr 7 2019, 6:57 AM

Because SCEV doesn't have access to AA,

We should change that if useful.

reames updated this revision to Diff 195189.Apr 15 2019, 8:25 AM

Remove the invariant expression complexity, and the SCEV loop disposition case. I'm still hoping to sink that into SCEV, but handling the simple case is straight forward and I'd like to get at least that in. The SCEV patch turns out to be a bit more complex than I'd originally expected, so I need to take a step back and revaluate that approach.

Still working on the builder usage. Likely to update again, but I wanted to keep the snapshot of the progress.

  1. Since we need to fiddle with the insertion point I think we should stop using single IRBuilder which we pass around. It looks like creating a local IRBuilder is a cheap operation, so we can have local IRBuilders when we need to generate instructions. This way the insertion point will be clear from the local context.

This is handled in a separate review ( D60718 ). Once that's landed, I'll rebase.

reames planned changes to this revision.Apr 15 2019, 9:34 AM
reames updated this revision to Diff 195243.Apr 15 2019, 1:43 PM

Update after rebasing over previous changes. This is a fairly major rework though, so please review from scratch.

Key thing is that I stumbled across a bug in the previous implementation. I've committed the test to highlight it as neg_invariant_latch_limit. The problem is that SCEV turns out not to be correctly handling dominance within a single block within isSafeToSpeculateAt. The fix for that is included, along with just enough special casing to keep all of the existing tests passing.

Secondly, I realized that we were mixing two different concepts and that's what made the bug possible. Given that, I restructured the change to check invariance separately from insertion safety. This relies on the previous patch which made it clear that insertion at the guard is always valid, and that hoisting out of the loop is purely an optimization. Interestingly, splitting in this manner actually makes the transform *more* powerful, not less. See the (correct) diffs around udiv handling.

apilipenko accepted this revision.Apr 17 2019, 4:32 AM

Looks good overall, a couple of comments inline.

lib/Transforms/Scalar/LoopPredication.cpp
452

Am I right that this is not specific to the change you make and just a fix for an existing bug? If so, can be integrated separately with a test demonstrating the problem.

498–500

Can we assert isSafeToExpandAt for the values we don't check?

498–517

These changes look correct to me, but there are actually a couple of NFC refactorings (replacing CanExpand(RHS) with CanExpand(LatchStart), inlining CanExpand, removing isSafeToExpandAt checks for GuardStart, GuardLimit) mixed with a functional change to use isLoopInvariantValue instead of SE->isLoopInvariant. You might want to split these when integrating to make bisecting easier.

965

Debug printing.

This revision is now accepted and ready to land.Apr 17 2019, 4:32 AM
reames updated this revision to Diff 195756.Apr 18 2019, 8:47 AM
reames marked 4 inline comments as done.
reames added inline comments.
lib/Transforms/Scalar/LoopPredication.cpp
452

You are correct, and I'd completely missed that when writing. Will do!

498–500

No we can't. That's the entire point of not checking.

The basic structure here is that isSafeToExpand guarantees expansion is safe, but a false return value can still be safe to expand. In this case, we use information about dominance of the original conditions in the IR to ensure safety of these two.

498–517

I thought about it, but explaining the NFC part without context seemed challenging. So I left them together.

965

Oops, good catch, thanks.

reames marked an inline comment as done.Apr 18 2019, 8:51 AM
reames added inline comments.
lib/Transforms/Scalar/LoopPredication.cpp
452

Actually, no. On further reflection, this is a silent bug without the rest of this change. All of the operands to the expressions we call this on have previously been checked via isSafeToSpeculate. It's only with this change that we weaken that precondition and expose the latent issue. Given that, I'm not going to submit this separately.

This revision was automatically updated to reflect the committed changes.

Immediately after submitting this, I noticed an important piece had gotten lost somewhere in the rebasing. rL358688 fixes an obvious bug with the invariant_load casing.