This is an archive of the discontinued LLVM Phabricator instance.

[SCEV] Simplify trunc to zero based on known bits
ClosedPublic

Authored by gilr on Jan 1 2021, 11:59 PM.

Details

Summary

Let getTruncateExpr() short-circuit to zero when the value being truncated is known to have at least as many trailing zeros as the target type.

Diff Detail

Event Timeline

gilr created this revision.Jan 1 2021, 11:59 PM
gilr requested review of this revision.Jan 1 2021, 11:59 PM
Herald added a project: Restricted Project. · View Herald TranscriptJan 1 2021, 11:59 PM
nikic added a subscriber: nikic.Jan 2 2021, 2:10 AM
nikic added inline comments.
llvm/lib/Analysis/ScalarEvolution.cpp
1157

This check should be lower in the function. Certainly doesn't make sense to do this before the folding set lookup or the constant folding. I think the right position is directly before a new SCEVTruncateExpr is allocated, i.e. after trunc has already been pushed through operations where possible.

lebedev.ri added inline comments.
llvm/test/Transforms/LoopVectorize/dont-fold-tail-for-assumed-divisible-TC.ll
1

Please precommit the test

gilr marked 2 inline comments as done.Jan 2 2021, 7:42 AM
gilr updated this revision to Diff 314244.Jan 2 2021, 7:45 AM

Addressed comments.

fhahn added inline comments.Jan 2 2021, 7:58 AM
llvm/lib/Analysis/ScalarEvolution.cpp
1181

Are you worried about the cost of calling GetMintrailingZeros here? Could it happen that we have a arbitrary SCEV expression here for which we can get more accurate results thanks to GetMinTrailingZeros?

nikic added inline comments.Jan 2 2021, 8:10 AM
llvm/lib/Analysis/ScalarEvolution.cpp
1181

I believe this optimization can only be useful in the first place for SCEVUnknown (and SCEVPtrToInt, which is basically the same thing), as well as min/max expression. The latter only because we don't push truncates through min/max like we do for everything else. Possibly we should be doing that, but that's a different issue...

That said, from a code structure perspective, I'm not sure why this check is present in the "Depth > MaxCastDepth" branch at all. This is a recursion cut-off that is supposed to produce sub-optimal SCEV expressions, and there does not seem be any strong cause why this particular optimization needs to be applied unconditionally.

gilr added inline comments.Jan 2 2021, 11:27 PM
llvm/lib/Analysis/ScalarEvolution.cpp
1181

I believe this optimization can only be useful in the first place for SCEVUnknown

This patch was indeed motivated by assumptions, but the analysis first needs to get to the SCEVUnknowns. Calling GetMinTrailingZeros on any SCEV that got through GetTruncateExpr complements what ever the latter didn't simplify, but admittedly GetMinTrailingZeros is unboundedly recursive itself (even if cached and unrelated to getTruncateExpr's recursion).

This is a recursion cut-off that is supposed to produce sub-optimal SCEV expressions

Cutting off potentially exponential recursion at some point makes a lot of sense. Not sure it implies not doing any work at the leaves though.

To be on the safe side let's start by calling GetMinTrailingZeros only for SCEVUnknowns at the end of the function and extend as needed.

gilr updated this revision to Diff 314269.Jan 2 2021, 11:31 PM

Limit trailing-zeros check to SCEVUnknowns and only while depth limit is not reached.

nikic accepted this revision.Jan 3 2021, 1:49 AM

LGTM

This revision is now accepted and ready to land.Jan 3 2021, 1:49 AM
fhahn added inline comments.Jan 3 2021, 3:16 AM
llvm/lib/Analysis/ScalarEvolution.cpp
1181

I believe this optimization can only be useful in the first place for SCEVUnknown

This patch was indeed motivated by assumptions, but the analysis first needs to get to the SCEVUnknowns. Calling GetMinTrailingZeros on any SCEV that got through GetTruncateExpr complements what ever the latter didn't simplify, but admittedly GetMinTrailingZeros is unboundedly recursive itself (even if cached and unrelated to getTruncateExpr's recursion).

I am not sure, is there's anything conceptually making this only useful for SCEVUnknown?

GetMinTrailingZeroes can provide useful bounds for a range of expressions which may be helpful for this optimization. One example involving a UMax expression below. This example probably highlights a missing fold for truncates, but the main point is that the reasoning in this patch here is complimentary and may catch additional cases. There is some overlap with the folds in the function, but I am not sure this means we should restrict this only to SCEVUnknown. If GetMinTrailingZeros gets improved, it would be good to not miss out of the benefits in this function.

define i8 @trunc_to_assumed_zeros0(i32* %p, i32* %p.2, i1 %c) {
  %a = load i32, i32* %p
  %b = load i32, i32* %p.2
  %and.1 = and i32 %a, 255
  %cmp.1 = icmp eq i32 %and.1, 0

  tail call void @llvm.assume(i1 %cmp.1)
  %and.2 = and i32 %b, 255
  %cmp.2 = icmp eq i32 %and.2, 0
  tail call void @llvm.assume(i1 %cmp.2)
 
  %lt = icmp ugt i32 %a, %b
  %sel = select i1 %lt, i32 %a, i32 %b

  %t1 = trunc i32 %sel to i8
  %t2 = trunc i32 %a to i8
  %t3 = trunc i32 %b to i8
  ret i8 %t1
}
nikic added inline comments.Jan 3 2021, 3:33 AM
llvm/lib/Analysis/ScalarEvolution.cpp
1181

Just to be clear, I have no problem the fold being applied to all SCEV expressions, not just SCEVUnknown. My only concern was with where in this function the fold happens, not what it is applied to.

gilr added inline comments.Jan 3 2021, 3:54 AM
llvm/lib/Analysis/ScalarEvolution.cpp
1181

Right @fhahn. And the use of GetMinTrailingZeros should automatically be reduced if getTruncateExpr is added further simplifications. Only reason to restrict this to (the non-recursive) SCEVUnknown was being extra-careful regarding compile-time.
Since @nikic is also Ok with folding any SCEV at the end of the function I'll remove this restriction.
Thanks guys!

This revision was automatically updated to reflect the committed changes.
fhahn added inline comments.Jan 3 2021, 4:10 AM
llvm/lib/Analysis/ScalarEvolution.cpp
1181

SGTM, thanks!