This resolves a long-standing deficiency in LVI: The getPredicateAt() method currently does not take into account the block value, i.e. it does not use any information from the current basic block, only information that is threaded over edges. This makes evaluation of icmps in CVP much weaker than it could be. I've included two real-life examples of icmps we fail to simplify under O3, but can handle after this change.

# Details

# Diff Detail

### Event Timeline

llvm/test/Transforms/JumpThreading/bb-unreachable-from-entry.ll | ||
---|---|---|

13 | This change is caused because we now successfully evaluate the icmp to a constant, and jump threading does not clean up after that (simplifycfg will). Previously this used some of the threading logic, which follows the chain to the end. | |

llvm/test/Transforms/JumpThreading/header-succ.ll | ||

113 | It took me a while to understand what is going here... what happens is that we have a branch on undef and previously this branch was folded in one direction (to an exit), while now it goes into the other (into an infinite loop). Both are valid. The reason is again that we evaluate the icmp to a constant. |

@lebedev.ri I added a couple of comments on the JT tests, probably after you opened the tab ^^ The reason for the diffs is that we now evaluate the condition to a constant, while previously these were phi threaded. It has a bit of an odd effect here, but I would expect that in practice folding a branch completely is generally preferable.

Yes.

The diff seems okay to me but i'm not deeply familiar with LVI.

I wonder who is the current code owner of LVI/CVP.

What correctness testing was done, test-suite?

Interestingly, we seriously regress in NSW deductions:

| statistic | old | new | delta | % change | | correlated-value-propagation.NumAShrs | 209 | 438 | 229 | 109.5694% | | correlated-value-propagation.NumAddNSW | 4972 | 4237 | -735 | -14.7828% | | correlated-value-propagation.NumAddNUW | 7141 | 7157 | 16 | 0.2241% | | correlated-value-propagation.NumAddNW | 12113 | 11394 | -719 | -5.9358% | | correlated-value-propagation.NumAnd | 442 | 441 | -1 | -0.2262% | | correlated-value-propagation.NumCmps | 1194 | 1183 | -11 | -0.9213% | | correlated-value-propagation.NumDeadCases | 111 | 115 | 4 | 3.6036% | | correlated-value-propagation.NumMulNSW | 278 | 276 | -2 | -0.7194% | | correlated-value-propagation.NumMulNUW | 1326 | 1324 | -2 | -0.1508% | | correlated-value-propagation.NumMulNW | 1604 | 1600 | -4 | -0.2494% | | correlated-value-propagation.NumNSW | 7160 | 6415 | -745 | -10.4050% | | correlated-value-propagation.NumNUW | 13306 | 13315 | 9 | 0.0676% | | correlated-value-propagation.NumNW | 20466 | 19730 | -736 | -3.5962% | | correlated-value-propagation.NumOverflows | 7 | 4 | -3 | -42.8571% | | correlated-value-propagation.NumPhiCommon | 397 | 401 | 4 | 1.0076% | | correlated-value-propagation.NumPhis | 15360 | 15435 | 75 | 0.4883% | | correlated-value-propagation.NumSDivs | 207 | 304 | 97 | 46.8599% | | correlated-value-propagation.NumSExt | 6278 | 7226 | 948 | 15.1004% | | correlated-value-propagation.NumSRems | 28 | 49 | 21 | 75.0000% | | correlated-value-propagation.NumShlNSW | 1171 | 1160 | -11 | -0.9394% | | correlated-value-propagation.NumShlNUW | 2793 | 2783 | -10 | -0.3580% | | correlated-value-propagation.NumShlNW | 3964 | 3943 | -21 | -0.5298% | | correlated-value-propagation.NumSubNSW | 739 | 742 | 3 | 0.4060% | | correlated-value-propagation.NumSubNUW | 2046 | 2051 | 5 | 0.2444% | | correlated-value-propagation.NumSubNW | 2785 | 2793 | 8 | 0.2873% | | correlated-value-propagation.NumUDivs | 353 | 375 | 22 | 6.2323% | | instcount.NumAShrInst | 13761 | 13588 | -173 | -1.2572% | | instcount.NumAddInst | 277538 | 277585 | 47 | 0.0169% | | instcount.NumAllocaInst | 28040 | 28043 | 3 | 0.0107% | | instcount.NumAndInst | 66082 | 66163 | 81 | 0.1226% | | instcount.NumBitCastInst | 675473 | 675587 | 114 | 0.0169% | | instcount.NumBrInst | 712987 | 712797 | -190 | -0.0266% | | instcount.NumCallInst | 528453 | 528461 | 8 | 0.0015% | | instcount.NumExtractElementInst | 19418 | 19420 | 2 | 0.0103% | | instcount.NumFCmpInst | 11039 | 11037 | -2 | -0.0181% | | instcount.NumFNegInst | 1270 | 1272 | 2 | 0.1575% | | instcount.NumFSubInst | 90911 | 90915 | 4 | 0.0044% | | instcount.NumGetElementPtrInst | 1229297 | 1229518 | 221 | 0.0180% | | instcount.NumICmpInst | 487467 | 487413 | -54 | -0.0111% | | instcount.NumInvokeInst | 21780 | 21779 | -1 | -0.0046% | | instcount.NumLShrInst | 27409 | 27587 | 178 | 0.6494% | | instcount.NumLoadInst | 890831 | 890834 | 3 | 0.0003% | | instcount.NumMulInst | 43841 | 43832 | -9 | -0.0205% | | instcount.NumOrInst | 102648 | 102652 | 4 | 0.0039% | | instcount.NumPHIInst | 317805 | 317726 | -79 | -0.0249% | | instcount.NumPtrToIntInst | 16114 | 16116 | 2 | 0.0124% | | instcount.NumRetInst | 88782 | 88777 | -5 | -0.0056% | | instcount.NumSDivInst | 8732 | 8670 | -62 | -0.7100% | | instcount.NumSExtInst | 79306 | 78745 | -561 | -0.7074% | | instcount.NumSRemInst | 1679 | 1678 | -1 | -0.0596% | | instcount.NumSelectInst | 46181 | 46251 | 70 | 0.1516% | | instcount.NumShlInst | 40640 | 40575 | -65 | -0.1599% | | instcount.NumShuffleVectorInst | 100326 | 100317 | -9 | -0.0090% | | instcount.NumStoreInst | 814080 | 814211 | 131 | 0.0161% | | instcount.NumSubInst | 61973 | 61976 | 3 | 0.0048% | | instcount.NumTruncInst | 62132 | 62166 | 34 | 0.0547% | | instcount.NumUDivInst | 2526 | 2535 | 9 | 0.3563% | | instcount.NumURemInst | 1589 | 1598 | 9 | 0.5664% | | instcount.NumUnreachableInst | 13486 | 13467 | -19 | -0.1409% | | instcount.NumXorInst | 10632 | 10643 | 11 | 0.1035% | | instcount.NumZExtInst | 68227 | 69001 | 774 | 1.1344% | | instcount.TotalBlocks | 847787 | 847572 | -215 | -0.0254% | | instcount.TotalFuncs | 88847 | 88843 | -4 | -0.0045% | | instcount.TotalInsts | 7411666 | 7412146 | 480 | 0.0065% |

llvm/lib/Analysis/LazyValueInfo.cpp | ||
---|---|---|

1856–1858 | Should we be getting both, and combining their knowledge somehow? |

Can you give some context on the problem you're trying to solve? This doesn't look quite right, but maybe with some context I can make a suggestion as to how to approach cleanly?

llvm/lib/Analysis/LazyValueInfo.cpp | ||
---|---|---|

1852 | Wait, no, please don't do this. Please don't bake in assumption about the semantics of the function based on the type of the argument. If this difference exists, we should fix/remove it. I've long thought we needed to have a getValueAtBegin(BB) variant. Would that solve your use case? |

@lebedev.ri Weirdly that does not match the results I get:

correlated-value-propagation.NumAShrs | 199 | 425 correlated-value-propagation.NumAddNSW | 1975 | 2056 correlated-value-propagation.NumAddNUW | 4156 | 4173 correlated-value-propagation.NumAddNW | 6131 | 6229 correlated-value-propagation.NumAnd | 194 | 204 correlated-value-propagation.NumCmps | 809 | 933 correlated-value-propagation.NumDeadCases | 110 | 114 correlated-value-propagation.NumMulNSW | 131 | 129 correlated-value-propagation.NumMulNUW | 832 | 830 correlated-value-propagation.NumMulNW | 963 | 959 correlated-value-propagation.NumNSW | 3781 | 3850 correlated-value-propagation.NumNUW | 7737 | 7745 correlated-value-propagation.NumNW | 11518 | 11595 correlated-value-propagation.NumOverflows | 7 | 4 correlated-value-propagation.NumPhiCommon | 393 | 393 correlated-value-propagation.NumPhis | 11375 | 11442 correlated-value-propagation.NumSDivs | 201 | 295 correlated-value-propagation.NumSExt | 3667 | 4602 correlated-value-propagation.NumSRems | 28 | 47 correlated-value-propagation.NumSelects | 25 | 25 correlated-value-propagation.NumShlNSW | 1082 | 1069 correlated-value-propagation.NumShlNUW | 2262 | 2251 correlated-value-propagation.NumShlNW | 3344 | 3320 correlated-value-propagation.NumSubNSW | 593 | 596 correlated-value-propagation.NumSubNUW | 487 | 491 correlated-value-propagation.NumSubNW | 1080 | 1087 correlated-value-propagation.NumUDivs | 189 | 211

I ran this together with a second patch to CVP which actually computes the icmp result for non-local values, which I was planning to submit as a followup. I don't think that part should impact the NSW deductions though.

llvm/lib/Analysis/LazyValueInfo.cpp | ||
---|---|---|

1856–1858 | getValueInBlock() is strictly more powerful, so that should not be necessary. |

I want to determine the result of icmps based on range information within the same basic block (see the two tests for examples). LVI has all the necessary machinery to do that, but right now that getPredicateAt() method which is used to evaluate icmps in JT and CVP does not actually make use of it. getPrediateAt() is (mostly) weaker than manually evaluating the predicate based on the getConstantRange() result, and CVP actually does that for some of the "non-negative" optimizations.

See also D44252, where you suggested (I think, if I understood correctly) to do what I'm trying to do here.

llvm/lib/Analysis/LazyValueInfo.cpp | ||
---|---|---|

1852 | I agree that this is pretty weird. I can take a look at fixing the behavior of pointers, which I believe are in the wrong here (the integer behavior is already heavily embedded in how CVP works). |

Rebase over changes to pointer handling.

There are some concerning changes to lvi-after-jumpthreading.ll which I probably missed before.

llvm/test/Analysis/LazyValueAnalysis/lvi-after-jumpthreading.ll | ||
---|---|---|

25 | There are two issues here. The first is that JumpThreading marks LVI as preserved, but only seems to preserve it in the sense that the result isn't incorrect -- but not necessarily the same as rerunning from scratch. The second is that we now end up computing block values in a different order, and this has an impact on the results. Here is the debug output of running JumpThreading on this function: Before this patch: LVI Getting value %iv = phi i32 [ 0, %entry ], [ %iv.next, %backedge ] at '' Result = overdefined LVI Getting edge value i32 0 from 'entry' to 'loop' Result = constantrange<0, 1> LVI Getting edge value %iv.next = add nsw i32 %iv, 1 from 'backedge' to 'loop' PUSH: %iv.next = add nsw i32 %iv, 1 in backedge PUSH: %iv = phi i32 [ 0, %entry ], [ %iv.next, %backedge ] in backedge PUSH: %iv = phi i32 [ 0, %entry ], [ %iv.next, %backedge ] in loop POP %iv = phi i32 [ 0, %entry ], [ %iv.next, %backedge ] in loop = constantrange<-2147483648, 400> POP %iv = phi i32 [ 0, %entry ], [ %iv.next, %backedge ] in backedge = constantrange<0, 400> POP %iv.next = add nsw i32 %iv, 1 in backedge = constantrange<1, 401> Result = constantrange<1, 400> LVI Getting value %iv.next = add nsw i32 %iv, 1 at '' Result = overdefined LVI Getting value %iv.next = add nsw i32 %iv, 1 at '' Result = overdefined After this patch: LVI Getting block end value %iv = phi i32 [ 0, %entry ], [ %iv.next, %backedge ] at 'loop' PUSH: %iv = phi i32 [ 0, %entry ], [ %iv.next, %backedge ] in loop PUSH: %iv.next = add nsw i32 %iv, 1 in backedge PUSH: %iv = phi i32 [ 0, %entry ], [ %iv.next, %backedge ] in backedge POP %iv = phi i32 [ 0, %entry ], [ %iv.next, %backedge ] in backedge = constantrange<0, -2147483648> POP %iv.next = add nsw i32 %iv, 1 in backedge = constantrange<1, -2147483648> POP %iv = phi i32 [ 0, %entry ], [ %iv.next, %backedge ] in loop = constantrange<0, 400> Result = constantrange<0, 400> LVI Getting block end value %iv.next = add nsw i32 %iv, 1 at 'backedge' Result = constantrange<1, -2147483648> LVI Getting block end value %iv.next = add nsw i32 %iv, 1 at 'backedge' Result = constantrange<1, -2147483648> Previously we ended up computing This seems pretty concerning regarding the general design of LVI, and I'm not really sure what I should do about this. |

Please can you specify the order of these LVI patches, which one should go after which one, where one would start reviewing?