Page MenuHomePhabricator

New pass: inductive range check elimination
ClosedPublic

Authored by sanjoy on Dec 16 2014, 2:15 PM.

Details

Summary

The inductive range check elimination pass eliminates range checks from within loops by splitting the iteration space into three segments in a way that the range check is fully redundant in one of the segments. As an example, it will convert

len = < known positive >
for (i = 0; i < n; i++) {
  if (0 <= i && i < len) {
    do_something();
  } else {
    throw_out_of_bounds();
  }
}

to

len = < known positive >
limit = smin(n, len)
// no first segment
for (i = 0; i < limit; i++) {
  if (0 <= i && i < len) { // this check is fully redundant
    do_something();
  } else {
    throw_out_of_bounds();
  }
}
for (i = limit; i < n; i++) {
  if (0 <= i && i < len) {
    do_something();
  } else {
    throw_out_of_bounds();
  }
}

This is very close to the now removed LoopIndexSplit pass in spirit.

I believe the pass is correct (i.e. running it should not change the meaning of the program) but there still remains a considerable amount of work left to make it actually effective. I'd like to do as much of that work as possible in-tree.

Diff Detail

Repository
rL LLVM

Event Timeline

sanjoy updated this revision to Diff 17362.Dec 16 2014, 2:15 PM
sanjoy retitled this revision from to New pass: inductive range check elimination.
sanjoy updated this object.
sanjoy edited the test plan for this revision. (Show Details)
sanjoy added reviewers: reames, atrick, hfinkel.
sanjoy added a subscriber: Unknown Object (MLST).
hfinkel edited edge metadata.Dec 23 2014, 5:41 PM

Generally speaking, I'm fine with you working on this in-tree once we've reviewed this code.

We'll need to decide, at some later point, what kind of heuristics and/or target costs we'd like to use, and whether we want this kind of transformation early or late in the optimization pipeline.

Please run this through clang-format (or at least fix the 80-cols issues) too.

lib/Transforms/Scalar/InductiveRangeCheckElimination.cpp
12 ↗(On Diff #17362)

Please add an example (such as the one from the patch summary) here.

487 ↗(On Diff #17362)

Please break out this function into several smaller ones -- there are a lot of 'scoping blocks' here, and it is difficult to read.

reames edited edge metadata.Dec 29 2014, 5:22 PM

As with Hal, I'm happy to see this evolve in tree, provided that checking it in doesn't break any existing use cases and that you're going to continue to evolve this in the near term. Given I happen to know the later is true, LGTM. Feel free to address comments in separate submissions. (If you choose that option, document that's what you're planning on doing in the submission comment.)

lib/Transforms/Scalar/InductiveRangeCheckElimination.cpp
85 ↗(On Diff #17362)

It feels like range is a first class concept here. Maybe extract this out? Probably only file local at the moment, but separate from IRC.

158 ↗(On Diff #17362)

This structure of checks seems very fragile. In particular, I would expect things like flatten cfg to create patterns this misses. Also, that "sink load along path with uses" optimization we were talking to enable vectorization would also break this.

Putting a TODO here for the moment is fine, but I suspect we'll want something more robust quickly.

164 ↗(On Diff #17362)

These lambdas looks very similar to code inside InstCombine. Could they be extracted to some common place?

200 ↗(On Diff #17362)

This doesn't look right. There's nothing necessarily involving the induction variable here is there?

Actually, there's a broader problem in this entire function. There's nothing that requires IndexV to actually be the induction variable of the loop.

p.s. The upper range check with non-negative index should apply to the first case first. You should factor this code.

222 ↗(On Diff #17362)

The function says 'get', but the comment says 'make'?

237 ↗(On Diff #17362)

This seems restrictive (i.e. only loops in simplified form). At minimum, add a TODO.

250 ↗(On Diff #17362)

This would be clearer as if (!x || !y || !z)

253 ↗(On Diff #17362)

This looks like a constructor begging to be born.

262 ↗(On Diff #17362)

This feels like it should be on IRBuilder.

At minimum, you should use IRBuilder so that these eagerly collapse.

445 ↗(On Diff #17362)

This seems very very limiting. In particular, it would seem to exclude allmost all interesting cases. Is this intentional?

Just to make sure I have my terminology right: this is checking for a conditional branch which either goes to the header or exits the loop right?

451 ↗(On Diff #17362)

I didn't bother to read past here in this function. This code needs broken up before it's reviewable.

774 ↗(On Diff #17362)

You should use IRBuilder here so that these are eagerly simplified in trivial cases.

786 ↗(On Diff #17362)

Is there any guarantee that R2 has a value?

788 ↗(On Diff #17362)

Given you're computing a recursive tree of mins, it might be worth talking about whether we can either a) balance the tree, or b) easy to optimize representation. I worry about the depth of these select trees.

796 ↗(On Diff #17362)

The fact these are non-owning pointers isn't particularly clear. It might be better to structure this vector as containing IRCs not IRC*s.

799 ↗(On Diff #17362)

Add a comment along the lines of: //identify any range checks we can handle.

This might be a convenient helper function too.

816 ↗(On Diff #17362)

Your legality checks should be before you do any work.

826 ↗(On Diff #17362)

I think this is dead code?

830 ↗(On Diff #17362)

It feels odd to me that your doing this for all range checks at once. Is there an argument for why this is sufficient/desirable? If so, comment.

840 ↗(On Diff #17362)

When would this trigger?

test/Transforms/InductiveRangeCheckElimination/single.ll
3 ↗(On Diff #17362)

You need far more tests to claim any kind of reasonable coverage here.

In particular, trivial loop tests which check all of the corner-cases in your matching logic, both positive and negative.

sanjoy added a comment.Jan 7 2015, 2:59 PM

I've replied to some of the comments inline. I will upload a new version of the patch by the end of this week.

lib/Transforms/Scalar/InductiveRangeCheckElimination.cpp
200 ↗(On Diff #17362)

SplitRangeCheckCondition does not interpret Index as anything related to the loop's induction variable, that happens later. I'll add a comment making that invariant explicit.

222 ↗(On Diff #17362)

Fixed, thanks!

237 ↗(On Diff #17362)

This pass has AU.addRequiredID(LoopSimplifyID)

250 ↗(On Diff #17362)

Changed this to

bool IsAffineIndex =
  IndexAddRec && (IndexAddRec->getLoop() == L) && IndexAddRec->isAffine();
  
if (!IsAffineIndex)
  return nullptr;
253 ↗(On Diff #17362)

I like the current pattern better -- with a constructor it is easy to pass Length instead of Offset for example, since they're all llvm::Values. This pattern makes it clearer what is initialized with what.

262 ↗(On Diff #17362)

Fixed, thanks!

445 ↗(On Diff #17362)

I think loop rotation tries to canonicalize loops into this form (i.e. the loop's latch is a conditional exit). Without this, it is difficult to split the loop's iteration space with low overhead -- there will need to be an extra check in the previously unconditional jump back to the loop's header from the latch which may or may not combine with checks in previous loop exits.

451 ↗(On Diff #17362)

Agreed, this will be fixed in the next patch I upload for review.

774 ↗(On Diff #17362)

Agreed, will do.

786 ↗(On Diff #17362)

R2 is not an Optional<>

788 ↗(On Diff #17362)

Agreed, but that is not the lowest hanging fruit at the moment. :)

799 ↗(On Diff #17362)

Will do.

816 ↗(On Diff #17362)

Right, this is an obvious candidate for an early check.

826 ↗(On Diff #17362)

No, RangeChecksToEliminate is used later, also under #ifndef NDEBUG

830 ↗(On Diff #17362)

I think if I do it for one array index at a time, I will end up with N pre / post loops. Do you have something else in mind?

840 ↗(On Diff #17362)

When we could not solve *any* of the range checks. e.g

for (i = 0 to N) {
  a[i * i] = ..
}
sanjoy updated this revision to Diff 17971.Jan 9 2015, 11:34 PM
sanjoy edited edge metadata.

Major changes in this version:

  1. I broke up ConstrainLoopRange into a bunch of functions, grouped together as the LoopConstrainer class.
  2. this new version now no longer tries to preserve any analyses. That will be a separate improvement later on.
  3. the range check detection logic is more general.

The test coverage is still pretty low, but I plan on improving this once this is in tree.

As Hal and I said previously, LGTM so that you can continue to work on this incrementally in tree. Each change should be reviewed incrementally, and when you ready to insert this in the standard pass order, we'll need to do another holistic review.

lib/Transforms/Scalar/InductiveRangeCheckElimination.cpp
16 ↗(On Diff #17971)

I think your example would be clearer with the full generality of three loops. No strong preference though.

697 ↗(On Diff #17971)

Is there a utility function somewhere you could use for this? Or can you extract one?

749 ↗(On Diff #17971)

FYI, as a name rewriteIterationRange doesn't tell me that much.

852 ↗(On Diff #17971)

The fact you have these functions at all seems suspicious. I suspect you could get quite far by using the utilities in BasicBlockUtils.

e.g.
NewPreheader = SplitEdge(OriginalPreheader, OriginalHeader)

895 ↗(On Diff #17971)

It really feels like you're mixing the flow here. I don't have a concrete suggestion, but you might try inserting a legal loop in its entirety, then rewriting it to be restricted to the proper range. Alternatively, rewrite the free standing loop in it's entirety, then insert. Mixing things is confusing.

920 ↗(On Diff #17971)

Helper function?

934 ↗(On Diff #17971)

Is this valid if the loop is not created?

p.s. Your LoopClone *really* looks like LoopInfo.

1055 ↗(On Diff #17971)

I'm really not a fan of MethodObject pattern. It's better than what you had, but please don't keep this long term. :)

1069 ↗(On Diff #17971)

As we talked about offline, I really think that unconditionally rewriting the branches based on inferred knowledge is the right approach. You can emit a runtime check for diagnostics/debugging if desired, but the core behaviour of the pass should not change between debug and non-debug builds. You also don't want to tightly couple this pass with the effectiveness of other parts of the optimizer.

This is the one *must fix* I have. Even here, a follow up change is fine.

test/Transforms/InductiveRangeCheckElimination/multiple-access-no-preloop.ll
3 ↗(On Diff #17971)

Mild personal preference: I'd name the directory IRCE to avoid something so long.

12 ↗(On Diff #17971)

Intermixing checks with the source IR is somewhat confusing with this radical a transform.

test/Transforms/InductiveRangeCheckElimination/single-access-no-preloop.ll
3 ↗(On Diff #17971)

It looks like all of these tests could be in a single file?

p.s. You *need* more tests. While I'm okay with this landing without them so that you can work incrementally, every follow up change will need a motivating test. You current coverage is minimal at best. Ideas: loops which don't match, conditions you can't handle, a starter loop which isn't in loop-simplify form...

Actually, I may have accidentally misrepresented Hal's earlier comment.
When I reread, it wasn't a clear LGTM. Please do not submit until Hal
has okayed the current state as well.

Philip

Actually, I may have accidentally misrepresented Hal's earlier comment.
When I reread, it wasn't a clear LGTM. Please do not submit until Hal
has okayed the current state as well.

I'm okay with this going in at this point. As has been discussed, it needs more test cases, better comments, etc., but those can be added as they're produced.

Philip

lib/Transforms/Scalar/InductiveRangeCheckElimination.cpp
652 ↗(On Diff #17971)

As a general comment, you really need some ASCII-art diagrams here illustrating what all these blocks are and how they tie together. The loop vectorizer has some of these, and while imperfect, they do help to explain what is going on.

Comments replied to inline. I'll address some of the easier ones before checking in.

lib/Transforms/Scalar/InductiveRangeCheckElimination.cpp
697 ↗(On Diff #17971)

Is there a utility function somewhere you could use for this?

I could not find any.

Or can you extract one?

I think that is good idea as long as I can find another user. I plan to loop at both LoopUnroll and LoopRotate for this in the future.

749 ↗(On Diff #17971)

I'll change it to something more descriptive before checking in.

852 ↗(On Diff #17971)

NewPreheader = SplitEdge(OriginalPreheader, OriginalHeader)

That's a great idea, I'll do that before checking in.

920 ↗(On Diff #17971)

Will do before checkin.

934 ↗(On Diff #17971)

Is this valid if the loop is not created?

Yes -- PreLoop.Blocks is empty in that case.

p.s. Your LoopClone *really* looks like LoopInfo.

The crucial difference is that there is a bunch of stuff LoopInfo computes that is statically cached in a LoopClone. This lets me use LoopClone while I'm transforming the IR. For instance, I don't think it is okay to call LoopInfo::getLoopLatch if the loop is not well-formed. I'll document this difference in a comment.

1069 ↗(On Diff #17971)

I agree -- the code as of now is basically to help debugging.

test/Transforms/InductiveRangeCheckElimination/multiple-access-no-preloop.ll
3 ↗(On Diff #17971)

Good idea, will do that before checkin.

test/Transforms/InductiveRangeCheckElimination/single-access-no-preloop.ll
3 ↗(On Diff #17971)

It looks like all of these tests could be in a single file?

These files are really categories of tests that I will fill in.

p.s. You *need* more tests.

Completely agree. In fact, I plan to add at least a few more tests before checking in.

This revision was automatically updated to reflect the committed changes.