This is an archive of the discontinued LLVM Phabricator instance.

[LoopReroll] Rewrite induction variable rewriting.
ClosedPublic

Authored by efriedma on Apr 2 2018, 4:33 PM.

Details

Summary

This gets rid of a bunch of weird special cases; instead of trying to understand the structure of the induction variable, just use SCEV expansion for everything. In addition to being simpler, this fixes a bug where we would use the wrong stride in certain edge cases; see the new test pointer_bitcast_baseinst. (The bug might be a regression from D26529 ? Not sure.)

The one bit I'm not quite sure about is the trip count handling, specifically the FIXME about overflow. In general, I think we need to widen the exit condition, but that's probably not profitable if the new type isn't legal, so we probably need a check somewhere. That said, I don't think this patch makes the existing problem any worse.

As a followup to this, a bunch of other IV-related could be cleaned up and generalized, since the rewriting can handle more cases.

Diff Detail

Repository
rL LLVM

Event Timeline

efriedma created this revision.Apr 2 2018, 4:33 PM
fhahn added a subscriber: fhahn.Apr 2 2018, 10:07 PM
fhahn added a comment.Apr 24 2018, 7:38 AM

I had a high-level look and left some minor comments inline. With respect to the overflow, the original code seems to have the same problem, so this patch does not make things worse.

IIUC the induction variables must be wide enough to represent the scaled number of iterations in the original code, but an overflow can happen because we use EQ with the scaled trip count and the new induction variables start at 0. (e.g. if we have something like for (int64_t i = -100; i < INT64_MAX; i += 10)). I am probably missing something, but maybe we could avoid the overflow by doing the re-writing of the induction variables/exit conditions closer to the original range for the induction variables?

lib/Transforms/Scalar/LoopRerollPass.cpp
397 ↗(On Diff #140701)

Stale doxygen comment. Could you document LIBETC?

1409 ↗(On Diff #140701)

Typo, Copute -> Compute

test/Transforms/LoopReroll/basic.ll
86 ↗(On Diff #140701)

I think we should also check the instruction producing %0, which should truncate %indvar in this and other changed tests.

javed.absar added inline comments.Apr 24 2018, 9:30 AM
lib/Transforms/Scalar/LoopRerollPass.cpp
1415 ↗(On Diff #140701)

Should probably use dyn_cast if we cant be 100% it will be SCEVAddRecExpr.

1618 ↗(On Diff #140701)

Can LIBETC be renamed to something more meaningful.
Is LIBETC = Loop ??? back edge taken count?

I am probably missing something, but maybe we could avoid the overflow by doing the re-writing of the induction variables/exit conditions closer to the original range for the induction variables?

The fundamental problem is that the rerolled loop could have a trip count that doesn't fit into a single register, even if the original loop's trip count does fit (e.g. on a 32-bit target, the original loop has a trip count 2^31, we reroll by a factor of four, and the new loop has a trip count of 2^33). So handling overflow correctly in general requires some extra code in the rerolled loop.

In some cases, we could probably prove the trip count can't actually overflow. And in the cases where it could overflow, there are a few ways to handle it: we could emit an induction variable with an illegal type, or we could emit a nested loop. I'm not sure what the best approach is.

lib/Transforms/Scalar/LoopRerollPass.cpp
1415 ↗(On Diff #140701)

It will always be a SCEVAddRecExpr; we check earlier. (This is a fundamental part of proving that the rerolled loop is equivalent to the original loop.)

fhahn added a comment.Apr 24 2018, 2:04 PM

I am probably missing something, but maybe we could avoid the overflow by doing the re-writing of the induction variables/exit conditions closer to the original range for the induction variables?

The fundamental problem is that the rerolled loop could have a trip count that doesn't fit into a single register, even if the original loop's trip count does fit (e.g. on a 32-bit target, the original loop has a trip count 2^31, we reroll by a factor of four, and the new loop has a trip count of 2^33). So handling overflow correctly in general requires some extra code in the rerolled loop.

Ah okay. After a very brief look I thought we only support loops like for ( i = start; i < end; i += scale) for which trip count should be something like (end - start) / scale and the original i should fit into (start, end]. In that case, the induction variable in the re-rolled loop could fit into (start, end] too.

efriedma updated this revision to Diff 144259.Apr 26 2018, 6:44 PM

Address review comments

fhahn accepted this revision.Apr 30 2018, 1:36 AM

Thanks Eli, LGTM. As you said, I think the patch does not make things worse with respect to the FIXME you added and nicely simplifies the induction variable rewriting. It is probably best to wait a few days with committing though, in case James or @hfinkel have any additional thoughts.

This revision is now accepted and ready to land.Apr 30 2018, 1:36 AM
This revision was automatically updated to reflect the committed changes.