Let getTruncateExpr() short-circuit to zero when the value being truncated is known to have at least as many trailing zeros as the target type.
Details
Diff Detail
Event Timeline
llvm/lib/Analysis/ScalarEvolution.cpp | ||
---|---|---|
1157 | This check should be lower in the function. Certainly doesn't make sense to do this before the folding set lookup or the constant folding. I think the right position is directly before a new SCEVTruncateExpr is allocated, i.e. after trunc has already been pushed through operations where possible. |
llvm/test/Transforms/LoopVectorize/dont-fold-tail-for-assumed-divisible-TC.ll | ||
---|---|---|
1 | Please precommit the test |
llvm/lib/Analysis/ScalarEvolution.cpp | ||
---|---|---|
1181 | Are you worried about the cost of calling GetMintrailingZeros here? Could it happen that we have a arbitrary SCEV expression here for which we can get more accurate results thanks to GetMinTrailingZeros? |
llvm/lib/Analysis/ScalarEvolution.cpp | ||
---|---|---|
1181 | I believe this optimization can only be useful in the first place for SCEVUnknown (and SCEVPtrToInt, which is basically the same thing), as well as min/max expression. The latter only because we don't push truncates through min/max like we do for everything else. Possibly we should be doing that, but that's a different issue... That said, from a code structure perspective, I'm not sure why this check is present in the "Depth > MaxCastDepth" branch at all. This is a recursion cut-off that is supposed to produce sub-optimal SCEV expressions, and there does not seem be any strong cause why this particular optimization needs to be applied unconditionally. |
llvm/lib/Analysis/ScalarEvolution.cpp | ||
---|---|---|
1181 |
This patch was indeed motivated by assumptions, but the analysis first needs to get to the SCEVUnknowns. Calling GetMinTrailingZeros on any SCEV that got through GetTruncateExpr complements what ever the latter didn't simplify, but admittedly GetMinTrailingZeros is unboundedly recursive itself (even if cached and unrelated to getTruncateExpr's recursion).
Cutting off potentially exponential recursion at some point makes a lot of sense. Not sure it implies not doing any work at the leaves though. To be on the safe side let's start by calling GetMinTrailingZeros only for SCEVUnknowns at the end of the function and extend as needed. |
Limit trailing-zeros check to SCEVUnknowns and only while depth limit is not reached.
llvm/lib/Analysis/ScalarEvolution.cpp | ||
---|---|---|
1181 |
I am not sure, is there's anything conceptually making this only useful for SCEVUnknown? GetMinTrailingZeroes can provide useful bounds for a range of expressions which may be helpful for this optimization. One example involving a UMax expression below. This example probably highlights a missing fold for truncates, but the main point is that the reasoning in this patch here is complimentary and may catch additional cases. There is some overlap with the folds in the function, but I am not sure this means we should restrict this only to SCEVUnknown. If GetMinTrailingZeros gets improved, it would be good to not miss out of the benefits in this function. define i8 @trunc_to_assumed_zeros0(i32* %p, i32* %p.2, i1 %c) { %a = load i32, i32* %p %b = load i32, i32* %p.2 %and.1 = and i32 %a, 255 %cmp.1 = icmp eq i32 %and.1, 0 tail call void @llvm.assume(i1 %cmp.1) %and.2 = and i32 %b, 255 %cmp.2 = icmp eq i32 %and.2, 0 tail call void @llvm.assume(i1 %cmp.2) %lt = icmp ugt i32 %a, %b %sel = select i1 %lt, i32 %a, i32 %b %t1 = trunc i32 %sel to i8 %t2 = trunc i32 %a to i8 %t3 = trunc i32 %b to i8 ret i8 %t1 } |
llvm/lib/Analysis/ScalarEvolution.cpp | ||
---|---|---|
1181 | Just to be clear, I have no problem the fold being applied to all SCEV expressions, not just SCEVUnknown. My only concern was with where in this function the fold happens, not what it is applied to. |
llvm/lib/Analysis/ScalarEvolution.cpp | ||
---|---|---|
1181 | Right @fhahn. And the use of GetMinTrailingZeros should automatically be reduced if getTruncateExpr is added further simplifications. Only reason to restrict this to (the non-recursive) SCEVUnknown was being extra-careful regarding compile-time. |
llvm/lib/Analysis/ScalarEvolution.cpp | ||
---|---|---|
1181 | SGTM, thanks! |
This check should be lower in the function. Certainly doesn't make sense to do this before the folding set lookup or the constant folding. I think the right position is directly before a new SCEVTruncateExpr is allocated, i.e. after trunc has already been pushed through operations where possible.