This is an archive of the discontinued LLVM Phabricator instance.

Handle non-constant shifts in computeKnownBits, and use computeKnownBits for constant folding in InstCombine/Simplify
ClosedPublic

Authored by hfinkel on Sep 8 2015, 2:12 PM.

Details

Summary

First, the motivation: LLVM currently does not realize that:

(((2072 >> (L == 0)) >> 7) & 1) == 0

where L is some arbitrary value. Whether you right-shift 2072 by 7 or by 8, the lowest-order bit is always zero. There are obviously several ways to go about fixing this, but the generic solution pursued in this patch is to teach computeKnownBits something about shifts by a non-constant amount. Currently, we give up completely on these. Instead, in cases where we know something about the low-order bits of the shift-amount operand, we can combine (AND together) the associated restrictions for all shift amounts consistent with that knowledge. As a further generalization, I refactored all of the logic for all three kinds of shifts to have this capability. This works well in the above case, for example, because the dynamic shift amount can only be 0 or 1, and thus we can say a lot about the known bits of the result.
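
As a rough illustration of the approach (a standalone sketch, not the patch itself; the fixed 64-bit width, the struct, and the function names are invented for this example), the loop below ANDs together the known bits of the result over every shift amount consistent with what is known about the shift-amount operand:

#include <cstdint>
#include <cstdio>

// A bit of a value may be known zero, known one, or unknown (neither set).
struct Known {
  uint64_t Zero; // bits known to be 0
  uint64_t One;  // bits known to be 1
};

// Known bits of a 64-bit lshr whose shift amount is only partially known.
Known knownLshr(Known Val, Known Amt) {
  Known Result = {~0ULL, ~0ULL}; // start "fully known"; ANDing only weakens this
  for (unsigned S = 0; S < 64; ++S) {
    // Skip shift amounts that contradict the known bits of Amt.
    if ((S & Amt.Zero) != 0 || (S & Amt.One) != Amt.One)
      continue;
    // Shifting right by S makes the top S bits of the result known zero.
    uint64_t TopS = S ? ~(~0ULL >> S) : 0;
    Result.Zero &= (Val.Zero >> S) | TopS;
    Result.One &= Val.One >> S;
  }
  // If no shift amount below the bit width was compatible, the shift is
  // undefined; report nothing known so the two sets stay disjoint.
  if (Result.Zero & Result.One)
    Result = {0, 0};
  return Result;
}

int main() {
  Known Amt = {~1ULL, 0};           // (L == 0) is 0 or 1, so bits 1..63 are known zero
  Known Val = {~2072ULL, 2072ULL};  // the constant 2072 is fully known
  Known R = knownLshr(Val, Amt);
  // Bit 7 of (2072 >> (L == 0)) is known zero, so ((... >> 7) & 1) folds to 0.
  printf("bit 7 is known zero: %d\n", (int)((R.Zero >> 7) & 1));
  return 0;
}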

This brings us to the second part of this patch: Even when we know all of the bits of a value via computeKnownBits, nothing currently constant-folds the result. The patch introduces the necessary code into InstCombine and InstSimplify (a minimal sketch of the fold follows the list below). I've added it to both because:

  1. InstCombine won't automatically pick up the associated logic in InstSimplify (InstCombine uses InstSimplify, but not via the API that passes in the original instruction).
  2. Putting the logic in InstCombine allows the resulting simplifications to become part of the iterative worklist.
  3. Putting the logic in InstSimplify allows the resulting simplifications to be used everywhere else that calls SimplifyInstruction (inlining, unrolling, and many others).
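
Here is a minimal sketch of the fold itself (not the patch's exact code; it uses the 2015-era computeKnownBits signature from llvm/Analysis/ValueTracking.h, and the helper name is invented):

#include "llvm/ADT/APInt.h"
#include "llvm/Analysis/ValueTracking.h"
#include "llvm/IR/Constants.h"
#include "llvm/IR/DataLayout.h"
#include "llvm/IR/Type.h"
#include "llvm/IR/Value.h"

using namespace llvm;

// If computeKnownBits proves every bit of V, return the equivalent constant;
// otherwise return nullptr and fall through to the existing simplifications.
static Constant *foldIfAllBitsKnown(Value *V, const DataLayout &DL) {
  Type *Ty = V->getType();
  if (!Ty->isIntegerTy()) // the fold only applies to integer-typed values
    return nullptr;
  unsigned BitWidth = Ty->getIntegerBitWidth();
  APInt KnownZero(BitWidth, 0), KnownOne(BitWidth, 0);
  computeKnownBits(V, KnownZero, KnownOne, DL);
  // Every bit is either known zero or known one, so the value must be KnownOne.
  if ((KnownZero | KnownOne).isAllOnesValue())
    return ConstantInt::get(Ty, KnownOne);
  return nullptr;
}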

And this requires a small change to our definition of an ephemeral value so that we don't break the test case from r246696 (where the icmp feeding the @llvm.assume is also feeding a br). Under the old definition, the icmp would not be considered ephemeral (because it is used by the br), but this causes the assume to remove itself (in addition to simplifying the branch structure), and it seems more useful to prevent that from happening.

Please review, and thanks again!

Diff Detail

Repository
rL LLVM

Event Timeline

hfinkel updated this revision to Diff 34250.Sep 8 2015, 2:12 PM
hfinkel retitled this revision from to Handle non-constant shifts in computeKnownBits, and use computeKnownBits for constant folding in InstCombine/Simplify.
hfinkel updated this object.
hfinkel added reviewers: sanjoy, majnemer, reames.
hfinkel set the repository for this revision to rL LLVM.
hfinkel added a subscriber: llvm-commits.
hfinkel updated this revision to Diff 34270.Sep 8 2015, 3:06 PM

Include the update for a regression test I missed in the last patch (load-combine-metadata.ll). Once we simplify based on known bits, if we have range metadata that restricts a value to a single constant, then we'll use it to constant-fold the result. This is not what the test was supposed to be testing, so I've just changed the ranges on the metadata in the test.

Roman Divacky asked on IRC about statistics from self-hosting. Here are some:

During a self-host (PPC64/Linux - LLVM + Clang + compiler-rt), computeKnownBits returns non-trivial results for shifts with dynamic shift amounts:

shl - About 350k times
lshr - About 9k times
ashr - About 125 times

And the logic that constant-folds values when we know all of the bits fires:

InstSimplify - About 3.5k times
InstCombine - About 2k times
sanjoy edited edge metadata.Sep 9 2015, 8:33 PM

Some comments inline:

lib/Analysis/ValueTracking.cpp
965

Should we cap BitWidth at some reasonable value (like 128) so that the loop below does not run for too long in the worst case?

987

Why not start KnownZero and KnownOne as allOnesValue and then unconditionally

KnownZero &= KZF(KnownZero2, ShiftAmt);
KnownOne  &= KOF(KnownOne2, ShiftAmt);

in the loop?

If no shift amount is consistent with ShiftAmtKZ and ShiftAmtKO then KnownZero and KnownOne will remain allOnesValue; but that's fine since we proved that the shift's result is undef, and it's allowed to be 0 and -1 at the same time.

1180

Why not just return APIntOps::ashr(KnownZero, ShiftAmt)?

1188

Why not just return APIntOps::ashr(KnownOne, ShiftAmt)?

jmolloy added inline comments.
lib/Analysis/ValueTracking.cpp
983

It seems we could do slightly better here: if the shift-amount operand is known to be nonzero, we know that we're shifting by at least 1:

if (isKnownNonZero(I->getOperand(1), DL, Depth + 1, Q)) { // operand 1 is the shift amount
  KnownZero = KZF(KnownZero, 1);
  KnownOne = KOF(KnownOne, 1);
}

I have a similar patch that I was just about to send upstream, but obviously it conflicts with yours and yours handles many more cases.

Would you be able to put this test in too or should I wait until you've committed this and add it myself?

regehr added a subscriber: regehr.Sep 23 2015, 2:30 AM

Hi Hal, I just discovered this code. I've been working on a related patch, based on the imprecisions that Souper found, which I posted about on llvm-dev.

My strategy was simply to find the imprecisions with the highest profile count and fix those, plus some random other cases that were easy.

I'd like to pick the best parts of my patch and get them integrated with yours, if that is somehow possible. I'm totally new to Phabricator, so for now I'll just link to my patch. Any comments about how best to proceed would be appreciated.

http://www.cs.utah.edu/~regehr/knownbits1.txt

I should add that my patch has no measurable effect on the time taken to build LLVM/clang/compiler-rt. Also, a Debug build of the patched compiler passes all tests.

Also, the clang that gets compiled by a patched compiler is about 4KB smaller than one built with a clang lacking my patch, so it is certainly enabling a few optimizations to fire.

Hi Duncan,

Of course I agree in principle -- but keep in mind that Souper's computation of known bits is effectively optimal, and that my patch eliminated the vast majority of observed imprecisions.

Also I have implemented precise abstract transfer functions for things like this in the past, and always found them hard to get right! No fun at all.

But really, I was hoping to simply take Hal's patch and add in the parts of mine that fail to overlap with his! For example his patch does not address bswap or ctpop.

John

hfinkel updated this revision to Diff 35792.Sep 25 2015, 9:21 PM
hfinkel edited edge metadata.
hfinkel removed rL LLVM as the repository for this revision.

Updated to reflect review suggestions.

lib/Analysis/ValueTracking.cpp
965

I don't think so. Our canonicalization sometimes creates large integer types, and there's nothing really expensive in the loop (and it is linear in the bit width). If this shows up on a profile somewhere because of large integer sizes, I'll certainly change my mind.

983

Either way is fine, but in the name of keeping changes small, it is probably better if you just add it yourself.

987

We actually can't return non-disjoint bit sets, even for undef, without breaking other code (there are at least two asserts in ValueTracking.cpp that check this explicitly, and I'm afraid of how much else).

That aside, I agree. I'll update the loop to use that form with a check at the end.
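
Concretely, the end-of-loop check presumably amounts to something like this (a sketch; the wrapper function is invented, and the variable names follow the discussion rather than the final code):

#include "llvm/ADT/APInt.h"

using namespace llvm;

// If no shift amount was compatible with the known bits of the shift-amount
// operand, the accumulated masks are still all-ones and conflict, so clear
// them to keep the known-zero and known-one sets disjoint.
static void clearConflictingKnownBits(APInt &KnownZero, APInt &KnownOne) {
  if ((KnownZero & KnownOne) != 0) {
    KnownZero.clearAllBits();
    KnownOne.clearAllBits();
  }
}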

1180

Good idea (the logic was like this before, but using ashr certainly seems better).

1188

Good idea.
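
For ashr, the agreed-upon form presumably ends up looking something like this (a sketch assuming the patch's KZF/KOF callback shape; the exact captures and surrounding code are guesses):

#include "llvm/ADT/APInt.h"

using namespace llvm;

// Inside the AShr case, the two callbacks can simply arithmetic-shift the
// corresponding known-bit mask, which propagates the sign-bit knowledge into
// the vacated high bits.
auto KZF = [](const APInt &KnownZero, unsigned ShiftAmt) {
  return APIntOps::ashr(KnownZero, ShiftAmt);
};
auto KOF = [](const APInt &KnownOne, unsigned ShiftAmt) {
  return APIntOps::ashr(KnownOne, ShiftAmt);
};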

reames edited edge metadata.Sep 27 2015, 8:52 AM

Drive by review.

lib/Analysis/InstructionSimplify.cpp
4032

As a future enhancement, extending this to vectors and floating point types would make sense. A future patch is fine.

lib/Analysis/ValueTracking.cpp
959

Reading the code, it's really not clear what KnownX vs. KnownX2 mean. Are there better names that could be used here? Same with KZF and KOF. At a minimum, there's some documentation needed here.

1114

It's not immediately clear to me that the new code handles all of the cases the old code did. It would have been much easier to review if you'd first refactored out the helper function with the existing code and then added the new functionality. I'm not asking you to do that now, but doing it that way would have made the review easier.

lib/Transforms/InstCombine/InstructionCombining.cpp
2728

Doesn't instcombine call SimplifyInstruction internally? If so, why do we need to duplicate this block of code here?

hfinkel added inline comments.Sep 28 2015, 9:05 AM
lib/Analysis/InstructionSimplify.cpp
4032

Agreed.

lib/Analysis/ValueTracking.cpp
959

I'll improve this.

1114

Noted and agreed.

lib/Transforms/InstCombine/InstructionCombining.cpp
2728

It does, in a sense, but it does not call SimplifyInstruction directly. It calls the various helper functions that only get the opcode and operands, and so there's no instruction on which to call computeKnownBits.
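
To illustrate the distinction (signatures as in the 2015-era llvm/Analysis/InstructionSimplify.h; the wrapper function names here are invented):

#include "llvm/Analysis/InstructionSimplify.h"
#include "llvm/IR/InstrTypes.h"
#include "llvm/IR/Instruction.h"

using namespace llvm;

// The opcode-and-operands path InstCombine's visitors use: no Instruction*
// reaches the helper, so there is nothing to hand to computeKnownBits.
Value *viaOpcodeHelpers(BinaryOperator *I, const DataLayout &DL) {
  return SimplifyBinOp(I->getOpcode(), I->getOperand(0), I->getOperand(1), DL);
}

// The whole-instruction path used by inlining, unrolling, and other callers:
// the instruction itself is available as a context for computeKnownBits.
Value *viaSimplifyInstruction(Instruction *I, const DataLayout &DL) {
  return SimplifyInstruction(I, DL);
}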

> I should add that my patch has no measurable effect on the time taken to build LLVM/clang/compiler-rt. Also, a Debug build of the patched compiler passes all tests.
>
> Also, the clang that gets compiled by a patched compiler is about 4KB smaller than one built with a clang lacking my patch, so it is certainly enabling a few optimizations to fire.

This sounds great. There is a lot of follow-up work to be done here. I'm somewhat fearful of adding more functionality to this patch; it is already pretty far-reaching in terms of potential complications. Once I get this settled, I think it would be best to post follow-up patches for review.

Hal, sounds great, I'll follow up. Philip Reames has offered to help, which is great since I'm not too familiar with the process or tools here!

hfinkel updated this revision to Diff 37199.Oct 12 2015, 5:02 PM
hfinkel edited edge metadata.

Rebased (and added a better comment explaining the function parameters of computeKnownBitsFromShiftOperator).

reames accepted this revision.Oct 14 2015, 10:48 AM
reames edited edge metadata.

LGTM w/minor comment addressed

Are you planning on implementing the follow-on suggestions? If not, we should make sure they get tracked either as TODOs or bugs.

lib/Analysis/ValueTracking.cpp
983

It would be more clearly correct to use the two temporaries for this calculation. The current code is correct, but slightly confusing.

This revision is now accepted and ready to land.Oct 14 2015, 10:49 AM
majnemer accepted this revision.Oct 23 2015, 9:53 AM
majnemer edited edge metadata.

LGTM

lib/Analysis/ValueTracking.cpp
974

Could you make this auto *SA?

1008–1014

InstSimplify already has logic to handle shift amounts >= bitwidth. Should we care whether or not computeKnownBits gives the same result?

hfinkel closed this revision.Oct 23 2015, 1:44 PM

I apologize for the delay, and thanks for the reviews! r251146.

lib/Analysis/ValueTracking.cpp
983

I added an additional comment about this before I committed.

1008–1014

I don't think that it can give the same result, because I can't return 'undef' here. We might be able to do that by returning conflicting KnownZero/KnownOne masks, but we'd need to fix some downstream code first.