This is an archive of the discontinued LLVM Phabricator instance.

[ValueTracking] computeKnownBits - attempt to use a branch condition feeding a phi to improve known bits range (PR38280)
ClosedPublic

Authored by RKSimon on Aug 13 2022, 6:57 AM.

Details

Summary

If computeKnownBits encounters a phi node, and we fail to determine any known bits through direct analysis, see if the incoming value is part of a branch condition feeding the phi.

Handle cases where icmp(IncomingValue PRED Constant) is driving a branch instruction feeding that phi node - at the moment this only handles EQ/ULT/ULE predicate cases as they are the most straightforward to handle and most likely for branch-loop 'max upper bound' cases - we can extend this if/when necessary.

I investigated a more general icmp(LHS PRED RHS) KnownBits system, but the hard limits we put on value tracking depth through phi nodes meant that we were mainly catching constants anyhow.

Fixes the pointless vectorization in PR38280 / Issue #37628 (excessive unrolling still needs handling though)

Diff Detail

Event Timeline

RKSimon created this revision.Aug 13 2022, 6:57 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 13 2022, 6:57 AM
RKSimon requested review of this revision.Aug 13 2022, 6:57 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 13 2022, 6:57 AM
spatel added inline comments.Aug 15 2022, 7:29 AM
llvm/lib/Analysis/ValueTracking.cpp
1601

Make this a switch with a TODO? Presumably, there's some inverted pred / Known.One sibling to these. We might also get more knowledge of signbits with signed preds.

llvm/test/Transforms/InstCombine/known-phi-br.ll
35

This isn't testing the 'ne' pred (phi in the false block) that we wanted. Add another use to thwart that canonicalization? Or knownbits unit test for direct coverage?

57

This isn't testing the 'ule' pred that we wanted. Add another use to thwart that canonicalization? Or knownbits unit test for direct coverage?

83

This isn't testing the 'uge' pred that we wanted. Add another use to thwart that canonicalization? Or knownbits unit test for direct coverage?

125

Add a negative test just like this but with "x.mask & 3" to show that we didn't go overboard.

RKSimon retitled this revision from [ValueTracking] computeKnownBits - attempt to use a branch condition feeding a phi to improve known bits range to [ValueTracking] computeKnownBits - attempt to use a branch condition feeding a phi to improve known bits range (PR38280).Aug 15 2022, 9:34 AM
RKSimon edited the summary of this revision. (Show Details)
RKSimon updated this revision to Diff 452723.Aug 15 2022, 10:00 AM

address comments

RKSimon updated this revision to Diff 453005.Aug 16 2022, 7:45 AM

rebase - new change in the recently added remove-loop-phi-fastmul.ll

spatel accepted this revision.Aug 16 2022, 8:14 AM

LGTM - see inline comment to restore one test bit for 'ne'.
The 'ule' case looks fine via inspection, so I'm not concerned if we don't have a direct test for it here.
The new test changes to the loop latch code are potential trouble if it harms indvars or LSR, so you might want to pipe that through to codegen to see if there's any obvious regression before pushing.

llvm/test/Transforms/InstCombine/known-phi-br.ll
32

The extra use on just this one test actually did accomplish the goal - it kept the predicate as 'ne' because the invert of the branch is predicated on one-use.

This revision is now accepted and ready to land.Aug 16 2022, 8:14 AM

The new test changes to the loop latch code are potential trouble if it harms indvars or LSR, so you might want to pipe that through to codegen to see if there's any obvious regression before pushing.

Thanks Sanjay - a quick review of the codegen changes from remove-loop-phi-fastmul.ll suggested it was mainly neutral (x86/riscv64/powerpc64), although arm/aarch64/powerpc32 could struggle on some to schedule to a code order that didn't require an extra reg move. riscv32 was the only case that saw an improvement (both with i32 and i64 indvars).