This is an archive of the discontinued LLVM Phabricator instance.

[SCEV] When computing trip count, only zext if necessary
ClosedPublic

Authored by caojoshua on Mar 29 2023, 12:17 AM.

Details

Summary

This patch improves on https://reviews.llvm.org/D110587. To summarize
the patch, given backedge-taken count BC, trip count TC is BC + 1.
However, we don't know if BC we might overflow. So the patch modifies TC
computation to 1 + zext(BC).

This patch only adds the zext if necessary by looking at the constant
range. If we can determine that BC cannot be the max value for its
bitwidth, then we know adding 1 will not overflow, and the zext is not
needed. We apply loop guards before computing TC to get more data.

The primary motivation is to support my work on more precise trip
multiples in https://reviews.llvm.org/D141823. For example:

void test(unsigned n)
  __builtin_assume(n % 6 == 0);
  for (unsigned i = 0; i < n; ++i)
    foo();

Prior to this patch, we had `TC = 1 + zext(-1 + 6 * ((6 umax %n) /u
6))<nuw>`. SCEV range computation is able to determine that the BC
cannot be the max value, so the zext is not needed. The result is `TC
-> (6 * ((6 umax %n) /u 6))<nuw>`. From here, we would be able to
determine that %n is a multiple of 6.

There was one change in LoopCacheAnalysis/LoopInterchange required.
Before this patch, if a loop has BC = false, it would compute `TC -> 1 +
zext(false) -> 1`, which was fine. After this patch, it computes `TC -> 1
+ false = true`. CacheAnalysis would then sign extend the true, which
was not the intended the behavior. I modified CacheAnalysis such that
it would only zero extend trip counts.

This patch is not NFC, but also does not change any SCEV outputs. I
would like to get this patch out first to make work with trip multiples
easier.

Diff Detail

Event Timeline

caojoshua created this revision.Mar 29 2023, 12:17 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 29 2023, 12:17 AM
caojoshua edited the summary of this revision. (Show Details)
This comment was removed by caojoshua.
caojoshua published this revision for review.Mar 29 2023, 12:25 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 29 2023, 12:25 AM

Two levels of comments here.

First, you can probably use willNotOverflow(Opcode, LHS, RHS) here instead.

Second, I think you're solving the wrong problem with this patch. If "1 + zext((6 * %N) - 1)" does not simplify to "zext((6 * %N))" when truncating the constant and folding the constant produces the same answer, that seems like a SCEV folding bug, and is probably the more general fix. This would require the same basic proof, but inside getAddExpr. Oddly, we don't seem to have either the zext or sext case there already.

First, you can probably use willNotOverflow(Opcode, LHS, RHS) here instead.

Thanks. I'll try this.

Second, I think you're solving the wrong problem with this patch. If "1 + zext((6 * %N) - 1)" does not simplify to "zext((6 * %N))" when truncating the constant and folding the constant produces the same answer, that seems like a SCEV folding bug, and is probably the more general fix. This would require the same basic proof, but inside getAddExpr. Oddly, we don't seem to have either the zext or sext case there already.

Yes, I agree this is a general fix that would be beneficial for SCEV. However, I don't think its the same problem. You recommendation would give us zext(6 * %N), while my current solution gives us 6 * %N. I would prefer my result, but in practice, it probably does not matter. If you feel this extra code is not worth it, I'll add the changes to getAddExpr()

Use willNotOverflow().

nikic added inline comments.Mar 30 2023, 6:56 AM
llvm/lib/Analysis/LoopCacheAnalysis.cpp
337

The changes to LoopCacheAnalysis should be split out into a separate patch. I'm also wondering why this changes some but not all of the AnyExtends...

llvm/lib/Analysis/ScalarEvolution.cpp
8047

I'd prefer this to go back to the previous version. willNotOverflow() is very expensive and not needed here.

caojoshua added inline comments.Mar 30 2023, 10:07 PM
llvm/lib/Analysis/LoopCacheAnalysis.cpp
337

The places where I changed sext->zext are SCEVs that are computed from TripCount. In this case, RefCost is set to TripCount in the else block above.

I thought it made sense to have a single patch for these two changes. It would be harder to motivate the changes if I only submitted a patch for LoopCacheAnalysis. And the whole patch is small overall, so its not hard to read.

Revert back to checking overflow manually

caojoshua marked an inline comment as done.Mar 30 2023, 10:51 PM
caojoshua added inline comments.
llvm/lib/Analysis/ScalarEvolution.cpp
8047

I reverted back to previous version. I can see that willNotOverflow() is a bit overkill here. @reames what do you think?

nikic added inline comments.Mar 31 2023, 1:30 AM
llvm/lib/Analysis/LoopCacheAnalysis.cpp
337

But aren't the two AnyExtends in the else block above also on trip counts?

caojoshua added inline comments.Apr 1 2023, 1:07 AM
llvm/lib/Analysis/LoopCacheAnalysis.cpp
337

You're right. I think its just lack of test coverage in LoopCacheAnalysis / LoopInterchange, and that we should be using zext in those cases.

These cases don't come up very often. It's specifically when BackedgeTakenCount = false, or when there is a dead loop. The code we are talking about only applies when there are >=3 subscripts, and there probably just is not a test for it yet.

When I get the chance, I'll see if I can create a breaking test.

caojoshua added inline comments.Apr 2 2023, 12:25 AM
llvm/lib/Analysis/LoopCacheAnalysis.cpp
337

Recall the issue only occurs when BackedgeTakenCount = False, resulting in TripCount = True. For

Type *WiderType = SE.getWiderType(RefCost->getType(), TripCount->getType());

The WiderType is always going to just be 1. And the loop just multiplies True x True = True. We end up returning zext(True) = 1. If I didn't make the change here, we would have return sext(True) = -1.

I'm gonna push a change to make those zero extends anyway. The tests still pass. I think zero extends make sense here because the returned RefCost is always positive, or -1 if the cost is invalid.

For reference, I pushed a test that helped me debug this issue.

caojoshua updated this revision to Diff 510313.Apr 2 2023, 12:42 AM

Convert more sign extend to zero extend

nikic accepted this revision.Apr 10 2023, 12:41 AM

LGTM

This revision is now accepted and ready to land.Apr 10 2023, 12:41 AM