Page MenuHomePhabricator

[LVI][CVP] Use block value in getPredicateAt()
Needs ReviewPublic

Authored by nikic on Thu, Oct 31, 2:38 PM.

Details

Summary

This resolves a long-standing deficiency in LVI: The getPredicateAt() method currently does not take into account the block value, i.e. it does not use any information from the current basic block, only information that is threaded over edges. This makes evaluation of icmps in CVP much weaker than it could be. I've included two real-life examples of icmps we fail to simplify under O3, but can handle after this change.

Diff Detail

Event Timeline

nikic created this revision.Thu, Oct 31, 2:38 PM
Herald added a project: Restricted Project. · View Herald TranscriptThu, Oct 31, 2:38 PM
nikic marked 2 inline comments as done.Thu, Oct 31, 2:45 PM
nikic added inline comments.
llvm/test/Transforms/JumpThreading/bb-unreachable-from-entry.ll
13

This change is caused because we now successfully evaluate the icmp to a constant, and jump threading does not clean up after that (simplifycfg will). Previously this used some of the threading logic, which follows the chain to the end.

llvm/test/Transforms/JumpThreading/header-succ.ll
113

It took me a while to understand what is going here... what happens is that we have a branch on undef and previously this branch was folded in one direction (to an exit), while now it goes into the other (into an infinite loop). Both are valid. The reason is again that we evaluate the icmp to a constant.

Ooh, finally, nice!
What's going on with JumpThreading tests?

nikic added a comment.Fri, Nov 1, 9:58 AM

@lebedev.ri I added a couple of comments on the JT tests, probably after you opened the tab ^^ The reason for the diffs is that we now evaluate the condition to a constant, while previously these were phi threaded. It has a bit of an odd effect here, but I would expect that in practice folding a branch completely is generally preferable.

@lebedev.ri I added a couple of comments on the JT tests, probably after you opened the tab ^^

Yes.

The diff seems okay to me but i'm not deeply familiar with LVI.
I wonder who is the current code owner of LVI/CVP.

What correctness testing was done, test-suite?

alex added a subscriber: alex.Fri, Nov 1, 2:37 PM
nikic added a comment.Sat, Nov 2, 8:00 AM

What correctness testing was done, test-suite?

I've done a test-suite run now, it passes.

lebedev.ri accepted this revision.Sat, Nov 2, 8:55 AM

Okay, LG, thank you for looking into it.
@spatel / @reames ?

This revision is now accepted and ready to land.Sat, Nov 2, 8:55 AM

Interestingly, we seriously regress in NSW deductions:

| statistic                                 |     old |     new | delta |  % change |
| correlated-value-propagation.NumAShrs     |     209 |     438 |   229 | 109.5694% |
| correlated-value-propagation.NumAddNSW    |    4972 |    4237 |  -735 | -14.7828% |
| correlated-value-propagation.NumAddNUW    |    7141 |    7157 |    16 |   0.2241% |
| correlated-value-propagation.NumAddNW     |   12113 |   11394 |  -719 |  -5.9358% |
| correlated-value-propagation.NumAnd       |     442 |     441 |    -1 |  -0.2262% |
| correlated-value-propagation.NumCmps      |    1194 |    1183 |   -11 |  -0.9213% |
| correlated-value-propagation.NumDeadCases |     111 |     115 |     4 |   3.6036% |
| correlated-value-propagation.NumMulNSW    |     278 |     276 |    -2 |  -0.7194% |
| correlated-value-propagation.NumMulNUW    |    1326 |    1324 |    -2 |  -0.1508% |
| correlated-value-propagation.NumMulNW     |    1604 |    1600 |    -4 |  -0.2494% |
| correlated-value-propagation.NumNSW       |    7160 |    6415 |  -745 | -10.4050% |
| correlated-value-propagation.NumNUW       |   13306 |   13315 |     9 |   0.0676% |
| correlated-value-propagation.NumNW        |   20466 |   19730 |  -736 |  -3.5962% |
| correlated-value-propagation.NumOverflows |       7 |       4 |    -3 | -42.8571% |
| correlated-value-propagation.NumPhiCommon |     397 |     401 |     4 |   1.0076% |
| correlated-value-propagation.NumPhis      |   15360 |   15435 |    75 |   0.4883% |
| correlated-value-propagation.NumSDivs     |     207 |     304 |    97 |  46.8599% |
| correlated-value-propagation.NumSExt      |    6278 |    7226 |   948 |  15.1004% |
| correlated-value-propagation.NumSRems     |      28 |      49 |    21 |  75.0000% |
| correlated-value-propagation.NumShlNSW    |    1171 |    1160 |   -11 |  -0.9394% |
| correlated-value-propagation.NumShlNUW    |    2793 |    2783 |   -10 |  -0.3580% |
| correlated-value-propagation.NumShlNW     |    3964 |    3943 |   -21 |  -0.5298% |
| correlated-value-propagation.NumSubNSW    |     739 |     742 |     3 |   0.4060% |
| correlated-value-propagation.NumSubNUW    |    2046 |    2051 |     5 |   0.2444% |
| correlated-value-propagation.NumSubNW     |    2785 |    2793 |     8 |   0.2873% |
| correlated-value-propagation.NumUDivs     |     353 |     375 |    22 |   6.2323% |
| instcount.NumAShrInst                     |   13761 |   13588 |  -173 |  -1.2572% |
| instcount.NumAddInst                      |  277538 |  277585 |    47 |   0.0169% |
| instcount.NumAllocaInst                   |   28040 |   28043 |     3 |   0.0107% |
| instcount.NumAndInst                      |   66082 |   66163 |    81 |   0.1226% |
| instcount.NumBitCastInst                  |  675473 |  675587 |   114 |   0.0169% |
| instcount.NumBrInst                       |  712987 |  712797 |  -190 |  -0.0266% |
| instcount.NumCallInst                     |  528453 |  528461 |     8 |   0.0015% |
| instcount.NumExtractElementInst           |   19418 |   19420 |     2 |   0.0103% |
| instcount.NumFCmpInst                     |   11039 |   11037 |    -2 |  -0.0181% |
| instcount.NumFNegInst                     |    1270 |    1272 |     2 |   0.1575% |
| instcount.NumFSubInst                     |   90911 |   90915 |     4 |   0.0044% |
| instcount.NumGetElementPtrInst            | 1229297 | 1229518 |   221 |   0.0180% |
| instcount.NumICmpInst                     |  487467 |  487413 |   -54 |  -0.0111% |
| instcount.NumInvokeInst                   |   21780 |   21779 |    -1 |  -0.0046% |
| instcount.NumLShrInst                     |   27409 |   27587 |   178 |   0.6494% |
| instcount.NumLoadInst                     |  890831 |  890834 |     3 |   0.0003% |
| instcount.NumMulInst                      |   43841 |   43832 |    -9 |  -0.0205% |
| instcount.NumOrInst                       |  102648 |  102652 |     4 |   0.0039% |
| instcount.NumPHIInst                      |  317805 |  317726 |   -79 |  -0.0249% |
| instcount.NumPtrToIntInst                 |   16114 |   16116 |     2 |   0.0124% |
| instcount.NumRetInst                      |   88782 |   88777 |    -5 |  -0.0056% |
| instcount.NumSDivInst                     |    8732 |    8670 |   -62 |  -0.7100% |
| instcount.NumSExtInst                     |   79306 |   78745 |  -561 |  -0.7074% |
| instcount.NumSRemInst                     |    1679 |    1678 |    -1 |  -0.0596% |
| instcount.NumSelectInst                   |   46181 |   46251 |    70 |   0.1516% |
| instcount.NumShlInst                      |   40640 |   40575 |   -65 |  -0.1599% |
| instcount.NumShuffleVectorInst            |  100326 |  100317 |    -9 |  -0.0090% |
| instcount.NumStoreInst                    |  814080 |  814211 |   131 |   0.0161% |
| instcount.NumSubInst                      |   61973 |   61976 |     3 |   0.0048% |
| instcount.NumTruncInst                    |   62132 |   62166 |    34 |   0.0547% |
| instcount.NumUDivInst                     |    2526 |    2535 |     9 |   0.3563% |
| instcount.NumURemInst                     |    1589 |    1598 |     9 |   0.5664% |
| instcount.NumUnreachableInst              |   13486 |   13467 |   -19 |  -0.1409% |
| instcount.NumXorInst                      |   10632 |   10643 |    11 |   0.1035% |
| instcount.NumZExtInst                     |   68227 |   69001 |   774 |   1.1344% |
| instcount.TotalBlocks                     |  847787 |  847572 |  -215 |  -0.0254% |
| instcount.TotalFuncs                      |   88847 |   88843 |    -4 |  -0.0045% |
| instcount.TotalInsts                      | 7411666 | 7412146 |   480 |   0.0065% |
fhahn added a subscriber: fhahn.Mon, Nov 4, 1:55 AM
lebedev.ri added inline comments.Mon, Nov 4, 6:31 AM
llvm/lib/Analysis/LazyValueInfo.cpp
1856–1858

Should we be getting both, and combining their knowledge somehow?

reames added a comment.Mon, Nov 4, 9:36 AM

Can you give some context on the problem you're trying to solve? This doesn't look quite right, but maybe with some context I can make a suggestion as to how to approach cleanly?

llvm/lib/Analysis/LazyValueInfo.cpp
1852

Wait, no, please don't do this. Please don't bake in assumption about the semantics of the function based on the type of the argument. If this difference exists, we should fix/remove it.

I've long thought we needed to have a getValueAtBegin(BB) variant. Would that solve your use case?

lebedev.ri requested changes to this revision.Tue, Nov 5, 12:56 AM

(there is no "revoke review" so marking as "changes needed" instead)

This revision now requires changes to proceed.Tue, Nov 5, 12:56 AM
nikic marked an inline comment as done.Tue, Nov 5, 2:37 AM

@lebedev.ri Weirdly that does not match the results I get:

correlated-value-propagation.NumAShrs | 199 | 425
correlated-value-propagation.NumAddNSW | 1975 | 2056
correlated-value-propagation.NumAddNUW | 4156 | 4173
correlated-value-propagation.NumAddNW | 6131 | 6229
correlated-value-propagation.NumAnd | 194 | 204
correlated-value-propagation.NumCmps | 809 | 933
correlated-value-propagation.NumDeadCases | 110 | 114
correlated-value-propagation.NumMulNSW | 131 | 129
correlated-value-propagation.NumMulNUW | 832 | 830
correlated-value-propagation.NumMulNW | 963 | 959
correlated-value-propagation.NumNSW | 3781 | 3850
correlated-value-propagation.NumNUW | 7737 | 7745
correlated-value-propagation.NumNW | 11518 | 11595
correlated-value-propagation.NumOverflows | 7 | 4
correlated-value-propagation.NumPhiCommon | 393 | 393
correlated-value-propagation.NumPhis | 11375 | 11442
correlated-value-propagation.NumSDivs | 201 | 295
correlated-value-propagation.NumSExt | 3667 | 4602
correlated-value-propagation.NumSRems | 28 | 47
correlated-value-propagation.NumSelects | 25 | 25
correlated-value-propagation.NumShlNSW | 1082 | 1069
correlated-value-propagation.NumShlNUW | 2262 | 2251
correlated-value-propagation.NumShlNW | 3344 | 3320
correlated-value-propagation.NumSubNSW | 593 | 596
correlated-value-propagation.NumSubNUW | 487 | 491
correlated-value-propagation.NumSubNW | 1080 | 1087
correlated-value-propagation.NumUDivs | 189 | 211

I ran this together with a second patch to CVP which actually computes the icmp result for non-local values, which I was planning to submit as a followup. I don't think that part should impact the NSW deductions though.

llvm/lib/Analysis/LazyValueInfo.cpp
1856–1858

getValueInBlock() is strictly more powerful, so that should not be necessary.

nikic marked an inline comment as done.Tue, Nov 5, 2:49 AM

Can you give some context on the problem you're trying to solve? This doesn't look quite right, but maybe with some context I can make a suggestion as to how to approach cleanly?

I want to determine the result of icmps based on range information within the same basic block (see the two tests for examples). LVI has all the necessary machinery to do that, but right now that getPredicateAt() method which is used to evaluate icmps in JT and CVP does not actually make use of it. getPrediateAt() is (mostly) weaker than manually evaluating the predicate based on the getConstantRange() result, and CVP actually does that for some of the "non-negative" optimizations.

See also D44252, where you suggested (I think, if I understood correctly) to do what I'm trying to do here.

llvm/lib/Analysis/LazyValueInfo.cpp
1852

I agree that this is pretty weird. I can take a look at fixing the behavior of pointers, which I believe are in the wrong here (the integer behavior is already heavily embedded in how CVP works).

nikic updated this revision to Diff 228504.Fri, Nov 8, 11:50 AM

Rebase over changes to pointer handling.

There are some concerning changes to lvi-after-jumpthreading.ll which I probably missed before.

nikic marked an inline comment as done.Fri, Nov 8, 12:54 PM
nikic added inline comments.
llvm/test/Analysis/LazyValueAnalysis/lvi-after-jumpthreading.ll
25

There are two issues here. The first is that JumpThreading marks LVI as preserved, but only seems to preserve it in the sense that the result isn't incorrect -- but not necessarily the same as rerunning from scratch.

The second is that we now end up computing block values in a different order, and this has an impact on the results. Here is the debug output of running JumpThreading on this function:

Before this patch:

LVI Getting value   %iv = phi i32 [ 0, %entry ], [ %iv.next, %backedge ] at ''
  Result = overdefined
LVI Getting edge value i32 0 from 'entry' to 'loop'
  Result = constantrange<0, 1>
LVI Getting edge value   %iv.next = add nsw i32 %iv, 1 from 'backedge' to 'loop'
PUSH:   %iv.next = add nsw i32 %iv, 1 in backedge
PUSH:   %iv = phi i32 [ 0, %entry ], [ %iv.next, %backedge ] in backedge
PUSH:   %iv = phi i32 [ 0, %entry ], [ %iv.next, %backedge ] in loop
POP   %iv = phi i32 [ 0, %entry ], [ %iv.next, %backedge ] in loop = constantrange<-2147483648, 400>
POP   %iv = phi i32 [ 0, %entry ], [ %iv.next, %backedge ] in backedge = constantrange<0, 400>
POP   %iv.next = add nsw i32 %iv, 1 in backedge = constantrange<1, 401>
  Result = constantrange<1, 400>
LVI Getting value   %iv.next = add nsw i32 %iv, 1 at ''
  Result = overdefined
LVI Getting value   %iv.next = add nsw i32 %iv, 1 at ''
  Result = overdefined

After this patch:

LVI Getting block end value   %iv = phi i32 [ 0, %entry ], [ %iv.next, %backedge ] at 'loop'
PUSH:   %iv = phi i32 [ 0, %entry ], [ %iv.next, %backedge ] in loop
PUSH:   %iv.next = add nsw i32 %iv, 1 in backedge
PUSH:   %iv = phi i32 [ 0, %entry ], [ %iv.next, %backedge ] in backedge
POP   %iv = phi i32 [ 0, %entry ], [ %iv.next, %backedge ] in backedge = constantrange<0, -2147483648>
POP   %iv.next = add nsw i32 %iv, 1 in backedge = constantrange<1, -2147483648>
POP   %iv = phi i32 [ 0, %entry ], [ %iv.next, %backedge ] in loop = constantrange<0, 400>
  Result = constantrange<0, 400>
LVI Getting block end value   %iv.next = add nsw i32 %iv, 1 at 'backedge'
  Result = constantrange<1, -2147483648>
LVI Getting block end value   %iv.next = add nsw i32 %iv, 1 at 'backedge'
  Result = constantrange<1, -2147483648>

Previously we ended up computing %iv in loop first, then %iv in backedge, then %iv.next in backedge. Now we compute %iv in backedge first, then %iv.next in backedge`, then %iv in loop. This eventually results in different results being calculated.

This seems pretty concerning regarding the general design of LVI, and I'm not really sure what I should do about this.