This is an archive of the discontinued LLVM Phabricator instance.

Infer lowest bits of an integer Multiply when the low bits of the operands are known
ClosedPublic

Authored by PFerreira on Jun 8 2017, 4:50 AM.

Details

Summary

When the lowest bits of the operands to an integer multiply are known, the low bits of the result are deducible.
Code to deduce known-zero bottom bits already existed, but this change improves on that by deducing known-ones.

Diff Detail

Repository
rL LLVM

Event Timeline

PFerreira created this revision.Jun 8 2017, 4:50 AM
craig.topper added inline comments.Jun 9 2017, 12:28 PM
lib/Analysis/ValueTracking.cpp
382 ↗(On Diff #101893)

I think this line is equivalent to bottomKnown.getActiveBits().

Also please capitalize all variable names per coding standards.

384 ↗(On Diff #101893)

Shouldn't we be able to do something like this instead of a loop

BottomKnownOne = bottomKnown.getLoBits(trailKnown);
BottomKnownZero = (~bottomKnown).getLoBits(trailKnown);

Known.Zero |= BottomKnownZero;
Known.One |= BottomKnownOne;

385 ↗(On Diff #101893)

Use bottomKnown[bit]

PFerreira updated this revision to Diff 102158.Jun 12 2017, 1:28 AM

Updated the diff with the suggestions. I feel a bit silly for having made the first patch with that for-loop, when I used equivalent bit-ops further up the function.

Also changed an "add" to an "or" on the unit test to test known-bits of the "or" operation as well, there I already had an add there anyway.

PFerreira marked 3 inline comments as done.Jun 12 2017, 1:29 AM

New new patch addresses all of Craig's comments.

craig.topper added inline comments.Jun 19 2017, 10:57 AM
lib/Analysis/ValueTracking.cpp
356 ↗(On Diff #102158)

Is this And necessary? The Zero bits should be mutex with the One bits here.

PFerreira updated this revision to Diff 103176.Jun 20 2017, 2:07 AM

Update diff following Craig's comment.

PFerreira marked an inline comment as done.Jun 20 2017, 2:08 AM
PFerreira added inline comments.
lib/Analysis/ValueTracking.cpp
356 ↗(On Diff #102158)

I replaced with an assertion, but I can remove it if you think it's not necessary.

PFerreira marked an inline comment as done.Jun 28 2017, 6:31 AM

Prodding the review, in case the emails regarding this slipped through.

Ping - maybe before the 5.0 branch?

craig.topper accepted this revision.Jul 12 2017, 12:29 AM

Sorry I accidentally lost track of this.

LGTM

This revision is now accepted and ready to land.Jul 12 2017, 12:29 AM

Hm, I'm not sure what I'm supposed to do next - do I wait for someone with commit rights to put this in?

@sanjoy , can you please take a look at this? I recall this coming up as something we may not be able to do in the face of undef, etc.

efriedma requested changes to this revision.Jul 14 2017, 3:01 PM
efriedma added a subscriber: efriedma.

I recall this coming up as something we may not be able to do in the face of undef, etc.

I can't see anything special here compared to, for example, computeKnownBitsAddSub.

lib/Analysis/ValueTracking.cpp
383 ↗(On Diff #103176)

TrailKnown is wrong. http://rise4fun.com/Alive/s2b

Conservatively, you could just use "TrailKnown = TrailBitsKnown". Or maybe you could get a little more aggressive with some trickery involving trailing zeros; I haven't worked out the exact math.

This revision now requires changes to proceed.Jul 14 2017, 3:01 PM
sanjoy edited edge metadata.Jul 16 2017, 2:38 PM

@sanjoy , can you please take a look at this? I recall this coming up as something we may not be able to do in the face of undef, etc.

This seems fine to me given the current definition of undef.

PFerreira updated this revision to Diff 106838.Jul 17 2017, 1:10 AM
PFerreira edited edge metadata.

Use TrailBitsKnown instead.

Use TrailBitsKnown instead.

Please also add testcases to cover these cases (the case where we miscompiled, and maybe a case where we could improve this implementation).

Added the test you asked me about, slightly modified. I wanted to make sure that the ComputeKnownBits picked the correct bitwidth. The sample test you mentioned had the two inputs with the same bit width, so I just added two more. I can put it back to "3" if you prefer.

I tried to figure out how this could be improved, and considered your idea of leading zeros. I think I understood what you were thinking, but while trying to figure out the math I couldn't see how it would be possible. Having an extra leading known-zero on one of the operands over another operand would not help to figure out more digits, as far as I could see.

I tried to figure out how this could be improved, and considered your idea of leading zeros. I think I understood what you were thinking, but while trying to figure out the math I couldn't see how it would be possible. Having an extra leading known-zero on one of the operands over another operand would not help to figure out more digits, as far as I could see.

Trailing zeros, not leading zeros.

Suppose you have multiply operands "a" and "b". The bottom three bits of "a" are 110, and the bottom two bits of "b" are 11. "a" is divisible by 2, so "a * b == 2 * ((a / 2) * b)". The bottom two bits of "a / 2" are 11, and the bottom two bits of "b" are 11, so the bottom two bits of "(a / 2) * b" are "01". Therefore the bottom three bits of "2 * ((a / 2) * b)" are "010".

PFerreira updated this revision to Diff 107267.Jul 19 2017, 2:27 AM

Thanks for the suggestion. This was fun!

I've updated the diff with your improvement suggestion, and changed the expected results of the unit tests to match.
I took the opportunity to refactor the trail-zero computation since this new method can compute the trailing zeros of the result.

efriedma added inline comments.Jul 28 2017, 5:28 PM
lib/Analysis/ValueTracking.cpp
377 ↗(On Diff #107267)

Reusing the name ResultBitsKnown is confusing.

380 ↗(On Diff #107267)

These little comments aren't that useful for understanding the logic; needs one big paragraph explaining the logic for computing ResultBitsKnown.

PFerreira updated this revision to Diff 108912.Jul 31 2017, 6:35 AM

Added an explanation of what's being done.
Because I was the one writing it, it makes sense to me. Is it clear enough for others?

Just prodding for an update, in case the emails fell through.

Sorry, got buried in my inbox. Won't really have time to look again until next week.

Ping. I understand people might be busy with 5.0

I like to see a more formal proof for this sort of thing... but I don't have enough time to mess with alive. Got as far as http://rise4fun.com/Alive/a7O .

lib/Analysis/ValueTracking.cpp
365 ↗(On Diff #108912)

Is this supposed "b/n", rather than "b/s"?

I have been fiddling with Alive, but it has been crashing on me (even on simple proofs). I guessed this was a temporary issue, but after 2 weeks it is still not working with "Oops, it seems that this tool encountered an issue."
The last version of the expression I got was

Name: mul_to_const
Pre: C1 >= 0 && C1 < 32 && C2 >= 0 && C2 < 64 && CLZ1 == countTrailingZeros(C1) && CLZ2 == countTrailingZeros(C2) && C3==((1 << (6+CLZ1+CLZ2))-1)
%aa = shl i32 %a, 5
%bb = shl i32 %b, 6
%aaa = or i32 %aa, C1
%bbb = or i32 %bb, C2
%aaaa = lshr i32 %aaa, CLZ1
%bbbb = lshr i32 %bbb, CLZ2
%mul = mul i32 %aaaa, %bbbb
%adjust = add i32 C1, C2
%result = shl %mul, %adjust
%mask = and i32 %result, C3

=>

%mask = i32 ((C1*C2)&C3)

Alive is up again.

I tried a bit more; got an alive proof working. Does this look right? (The precondition for C7 is kind of complicated, but I think it matches the computation in this patch.)

https://rise4fun.com/Alive/zCv

Name: mul_to_const
Pre: (C1 >= 0) && (C1 < (1 << C5)) && (C2 >= 0) && (C2 < (1 << C6)) && (C7 == (1 << (umin(countTrailingZeros(C1), C5) + umin(countTrailingZeros(C2), C6) + umin(C5 - umin(countTrailingZeros(C1), C5), C6 - umin(countTrailingZeros(C2), C6)))) - 1)

%aa = shl i8 %a, C5
%bb = shl i8 %b, C6
%aaa = or i8 %aa, C1
%bbb = or i8 %bb, C2
%mul = mul i8 %aaa, %bbb
%mask = and i8 %mul, C7

  =>

%mask = i8 ((C1*C2)&C7)
PFerreira updated this revision to Diff 117654.Oct 4 2017, 3:45 AM

I've added Eli's proof to it, not sure exactly how to present it.

Took me a bit to digest it, but I agree that it matches what I'm trying to do in this patch. I can try to describe the definition of C7 in plain text (correlate it to the variables declared in the patch) if you prefer.

efriedma accepted this revision.Nov 6 2017, 3:03 PM

LGTM.

Sorry about the delay. I ran some tests, and it looks like this doesn't cause any regressions. Granted, it didn't lead to any performance improvement either (I guess we don't really use this information in many cases). But having the information around could be helpful in the future.

Do you have commit access?

This revision is now accepted and ready to land.Nov 6 2017, 3:03 PM

I do not have commit access.

The changes behind this patch were developed internally (in my Company) and provide improvements to specific Modules. In particular, it allows us to infer the bottom bits of some address calculations (where we can use faster memory operations when the pointer has some specific alignment). I can't really go into details unfortunately. I do hope that at some point in the future, someone else will benefit from this more openly :)

Ping. Nearly there!

sdardis added a subscriber: sdardis.Dec 7 2017, 7:57 AM

I can commit this on behalf of my former colleague if there are no objections. I'll wait a day or so.

This revision was automatically updated to reflect the committed changes.