This is an archive of the discontinued LLVM Phabricator instance.

[SCEV] Move mustprogress based no-self-wrap logic so it applies to all exit conditions
ClosedPublic

Authored by reames on Jun 9 2021, 2:48 PM.

Details

Summary

This change moves logic which we'd added specifically for less than tests so that it applies to equalities and greater than tests as well. The basic idea is that if we can show an IV cycles infinitely through the same series on self-wrap, and that the exit condition must be taken to prevent UB, we can conclude that it must be taken before self-wrap and thus infer said flag.

The motivation here is simple loops with unsigned induction variables w/non-one steps and inequality tests. A toy example would be:
for (unsigned i = 0; i != N; i += 2) { body; }

If body contains no side effects, and this is a mustprogress function, we can assume that this must be a finite loop and thus that the exit count is N/2.

Note that we canonicalize to NE tests in LFTR, so after the previous change we'd compute an exit count, perform LFTR, and then "forget" what the exit count was again.

A couple notes on things left out of this patch:

  • Looking back through multiple invertible functions, and anything other than ZExt. This is easy to add, but deserves it's own test coverage and review. I add Zext only so that I could delete the less than version of the same code.
  • Forming the IV as an addrec where possible. Doing this will (further) help our ability to handle extended ne tests, but will be done separately. (See the existing less than code for what I mean here - note that code will still be triggered if we prove no-self-wrap in this patch.)

Diff Detail

Event Timeline

reames created this revision.Jun 9 2021, 2:48 PM
reames requested review of this revision.Jun 9 2021, 2:48 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 9 2021, 2:48 PM
mkazantsev requested changes to this revision.Jun 9 2021, 9:32 PM

Please add tests for this this exit doesn't dominate the latch.

llvm/lib/Analysis/ScalarEvolution.cpp
8287

It is only true if this condition dominates the latch. If not, it might exit at some point during the iteration space, but it was guarded by a volatile condition that prevented it. If my understanding is correct, please specify this requirement explicitly in function's comment.

Maybe (I'm not sure) this still holds for non-dominating exits if it is the only exit from the loop.

8300

Please add early bail if IV is not affine. This will save compile time for complex addrecs.

8302

How about negative powers of 2? I think all your reasoning validly applies for them too.

8311

This should go before anything else to save compile time.

llvm/test/Analysis/ScalarEvolution/ne-overflow.ll
10

Please use ./utils/update_analyze_test_checks.py. It works well with SCEV.

This revision now requires changes to proceed.Jun 9 2021, 9:32 PM
reames added inline comments.Jun 11 2021, 10:12 AM
llvm/lib/Analysis/ScalarEvolution.cpp
8287

All exits being analyzed for exit counts must dominate the latch. See computeExitLimit.

I might be remembering this wrong, but I think you were even the person who added that code. :)

8300

I don't think it actually saves anything in this case. Both callers filter out non-affine ARs before calling this. I'll make it an assert though.

8302

I believe it does too. Incrementalism. And not terrible relevant for LT/EQ cases, it'll be more important when applied to GT.

8311

Er, I think you misread the code? It's at the earliest point it can legally be. It's proving no-self-wrap.

llvm/test/Analysis/ScalarEvolution/ne-overflow.ll
10

No actually, it doesn't. It just claims to. I've tried it with other test changes recently, and been very unpleasantly surprised. It mostly just results in missing check lines.

reames planned changes to this revision.Nov 17 2021, 2:49 PM

Next action item here is mine. (Rebase needed, tests can be landed, etc..)

reames updated this revision to Diff 388214.Nov 18 2021, 8:39 AM
reames retitled this revision from [SCEV] Extend mustprogress reasoning to ne exit tests to [SCEV] Move mustprogress based no-self-wrap logic so it applies to all exit conditions.
reames edited the summary of this revision. (Show Details)
reames added reviewers: fhahn, lebedev.ri.

This update reworks the patch, and hopefully makes it a lot more obviously correct. The patch is restructured to purely pull code up from howManyLessThans into the caller so that it handles all condition codes. This does reduce the scope slightly, but I've included planned extensions in the review description which should cover all cases.

nikic added inline comments.Nov 18 2021, 9:02 AM
llvm/lib/Analysis/ScalarEvolution.cpp
8335

Also check that the NW flag isn't already set?

11844

Maybe you want to extract canAsusmeNoSelfWrap as a static function and reuse? You repeat the same conditions in the new code, just without the comments.

reames added inline comments.Nov 18 2021, 9:09 AM
llvm/lib/Analysis/ScalarEvolution.cpp
8335

Oh, good catch.

11844

I'm hoping to delete it in a following commit. I replicated the explanation that seemed relevant, and the remaining use looks highly suspect. (I wrote the remaining use, and I'm not convinced by my own comments. At a minimum, I want to rework the comments.)

reames updated this revision to Diff 388235.Nov 18 2021, 9:25 AM

Address review comments

This revision was not accepted when it landed; it landed in state Needs Review.Nov 18 2021, 10:10 AM
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.

Your right, that is unexpected. I don't really see anything in this code likely to be slow, do you see anything obvious? If not, I may need help testing a few variants to see what we see.

Any evidence on impact? Are we transforming those benchmarks more?

nikic added a comment.Nov 18 2021, 1:10 PM

Your right, that is unexpected. I don't really see anything in this code likely to be slow, do you see anything obvious? If not, I may need help testing a few variants to see what we see.

Any evidence on impact? Are we transforming those benchmarks more?

There is some impact, but it's very minor: https://llvm-compile-time-tracker.com/compare.php?from=7a14244cc645bbbcbf5056e7a00fadbb339e92ed&to=ea12c2cb9c4221095abfb2af7148140783040734&stat=size-text

I don't immediately see what could cause this.

I don't immediately see what could cause this.

Does 1a5666acb help? That's the only thing I could vaguely see causing this.

nikic added a comment.Nov 18 2021, 2:33 PM

I don't immediately see what could cause this.

Does 1a5666acb help? That's the only thing I could vaguely see causing this.

Nope, this didn't make any measurable difference.

Nope, this didn't make any measurable difference.

How about 734abbad7?

nikic added a comment.Nov 19 2021, 7:10 AM

Nope, this didn't make any measurable difference.

How about 734abbad7?

Also no difference. I suspect that we're seeing some kind of second order effect here, not an issue in the code itself.

Also no difference. I suspect that we're seeing some kind of second order effect here, not an issue in the code itself.

I agree and am going to revert the two speculative changes as they just complicate the code.

I don't really know what else to do here on the original regression. It sure seems like inferring flags is simply making some other piece of code slower. As things stand, I plan to leave the original code in tree as the regression is small, but I find that result unsatisfying.

Looking at callgrind profiles for one test case, the main additional cost is when simplifying IVs after unrolling, during the willOverflow check that zext/sext both operands. In getZeroExtendExpr we seem to spend more times in various proveNoWrapByXYZ() methods. It's not obvious to me how this change would cause that, I'd more expect the reverse effect (less need to infer additional flags because we added more here).

Looking at callgrind profiles for one test case, the main additional cost is when simplifying IVs after unrolling, during the willOverflow check that zext/sext both operands. In getZeroExtendExpr we seem to spend more times in various proveNoWrapByXYZ() methods. It's not obvious to me how this change would cause that, I'd more expect the reverse effect (less need to infer additional flags because we added more here).

All I can think of is that maybe we figure out the trip count for some loop, unroll it, and then spend time analyzing the newly introduced IVs? That's really the only interaction I can find.

Out of curiosity, what kind of unrolling do you see? Full, partial, max count full, or runtime?

nikic added a comment.Nov 19 2021, 2:19 PM

Looking at callgrind profiles for one test case, the main additional cost is when simplifying IVs after unrolling, during the willOverflow check that zext/sext both operands. In getZeroExtendExpr we seem to spend more times in various proveNoWrapByXYZ() methods. It's not obvious to me how this change would cause that, I'd more expect the reverse effect (less need to infer additional flags because we added more here).

All I can think of is that maybe we figure out the trip count for some loop, unroll it, and then spend time analyzing the newly introduced IVs? That's really the only interaction I can find.

This file (tddis.c from mafft) doesn't see any codegen change, so it's not new unrolling. It could be that we're now better able to analyze an unrolled loop though.

Out of curiosity, what kind of unrolling do you see? Full, partial, max count full, or runtime?

From the debug log, there's both full and runtime unrolling going on in this file -- but looking at the unrolling implementation, we only simplify IVs for non-complete unrolling, so this should be related to runtime unrolling.

@nikic I don't see anything obvious here, and don't consider the reported issue worthy of revert. Given that, I'm going to stop look into this. If you want to file a standalone bug with a reproducer, I can take a further look, but at the moment, the amount of information shared so far is not really easily actionable.