When the lowest bits of the operands to an integer multiply are known, the low bits of the result are deducible.
Code to deduce known-zero bottom bits already existed, but this change improves on that by deducing known-ones.
Details
Diff Detail
Event Timeline
lib/Analysis/ValueTracking.cpp | ||
---|---|---|
395 | I think this line is equivalent to bottomKnown.getActiveBits(). Also please capitalize all variable names per coding standards. | |
397 | Shouldn't we be able to do something like this instead of a loop BottomKnownOne = bottomKnown.getLoBits(trailKnown); Known.Zero |= BottomKnownZero; | |
398 | Use bottomKnown[bit] |
Updated the diff with the suggestions. I feel a bit silly for having made the first patch with that for-loop, when I used equivalent bit-ops further up the function.
Also changed an "add" to an "or" on the unit test to test known-bits of the "or" operation as well, there I already had an add there anyway.
lib/Analysis/ValueTracking.cpp | ||
---|---|---|
356 | Is this And necessary? The Zero bits should be mutex with the One bits here. |
lib/Analysis/ValueTracking.cpp | ||
---|---|---|
356 | I replaced with an assertion, but I can remove it if you think it's not necessary. |
Hm, I'm not sure what I'm supposed to do next - do I wait for someone with commit rights to put this in?
@sanjoy , can you please take a look at this? I recall this coming up as something we may not be able to do in the face of undef, etc.
I recall this coming up as something we may not be able to do in the face of undef, etc.
I can't see anything special here compared to, for example, computeKnownBitsAddSub.
lib/Analysis/ValueTracking.cpp | ||
---|---|---|
395 | TrailKnown is wrong. http://rise4fun.com/Alive/s2b Conservatively, you could just use "TrailKnown = TrailBitsKnown". Or maybe you could get a little more aggressive with some trickery involving trailing zeros; I haven't worked out the exact math. |
Use TrailBitsKnown instead.
Please also add testcases to cover these cases (the case where we miscompiled, and maybe a case where we could improve this implementation).
Added the test you asked me about, slightly modified. I wanted to make sure that the ComputeKnownBits picked the correct bitwidth. The sample test you mentioned had the two inputs with the same bit width, so I just added two more. I can put it back to "3" if you prefer.
I tried to figure out how this could be improved, and considered your idea of leading zeros. I think I understood what you were thinking, but while trying to figure out the math I couldn't see how it would be possible. Having an extra leading known-zero on one of the operands over another operand would not help to figure out more digits, as far as I could see.
I tried to figure out how this could be improved, and considered your idea of leading zeros. I think I understood what you were thinking, but while trying to figure out the math I couldn't see how it would be possible. Having an extra leading known-zero on one of the operands over another operand would not help to figure out more digits, as far as I could see.
Trailing zeros, not leading zeros.
Suppose you have multiply operands "a" and "b". The bottom three bits of "a" are 110, and the bottom two bits of "b" are 11. "a" is divisible by 2, so "a * b == 2 * ((a / 2) * b)". The bottom two bits of "a / 2" are 11, and the bottom two bits of "b" are 11, so the bottom two bits of "(a / 2) * b" are "01". Therefore the bottom three bits of "2 * ((a / 2) * b)" are "010".
Thanks for the suggestion. This was fun!
I've updated the diff with your improvement suggestion, and changed the expected results of the unit tests to match.
I took the opportunity to refactor the trail-zero computation since this new method can compute the trailing zeros of the result.
Added an explanation of what's being done.
Because I was the one writing it, it makes sense to me. Is it clear enough for others?
I like to see a more formal proof for this sort of thing... but I don't have enough time to mess with alive. Got as far as http://rise4fun.com/Alive/a7O .
lib/Analysis/ValueTracking.cpp | ||
---|---|---|
371 | Is this supposed "b/n", rather than "b/s"? |
I have been fiddling with Alive, but it has been crashing on me (even on simple proofs). I guessed this was a temporary issue, but after 2 weeks it is still not working with "Oops, it seems that this tool encountered an issue."
The last version of the expression I got was
Name: mul_to_const
Pre: C1 >= 0 && C1 < 32 && C2 >= 0 && C2 < 64 && CLZ1 == countTrailingZeros(C1) && CLZ2 == countTrailingZeros(C2) && C3==((1 << (6+CLZ1+CLZ2))-1)
%aa = shl i32 %a, 5
%bb = shl i32 %b, 6
%aaa = or i32 %aa, C1
%bbb = or i32 %bb, C2
%aaaa = lshr i32 %aaa, CLZ1
%bbbb = lshr i32 %bbb, CLZ2
%mul = mul i32 %aaaa, %bbbb
%adjust = add i32 C1, C2
%result = shl %mul, %adjust
%mask = and i32 %result, C3
=>
%mask = i32 ((C1*C2)&C3)
Alive is up again.
I tried a bit more; got an alive proof working. Does this look right? (The precondition for C7 is kind of complicated, but I think it matches the computation in this patch.)
https://rise4fun.com/Alive/zCv
Name: mul_to_const Pre: (C1 >= 0) && (C1 < (1 << C5)) && (C2 >= 0) && (C2 < (1 << C6)) && (C7 == (1 << (umin(countTrailingZeros(C1), C5) + umin(countTrailingZeros(C2), C6) + umin(C5 - umin(countTrailingZeros(C1), C5), C6 - umin(countTrailingZeros(C2), C6)))) - 1) %aa = shl i8 %a, C5 %bb = shl i8 %b, C6 %aaa = or i8 %aa, C1 %bbb = or i8 %bb, C2 %mul = mul i8 %aaa, %bbb %mask = and i8 %mul, C7 => %mask = i8 ((C1*C2)&C7)
I've added Eli's proof to it, not sure exactly how to present it.
Took me a bit to digest it, but I agree that it matches what I'm trying to do in this patch. I can try to describe the definition of C7 in plain text (correlate it to the variables declared in the patch) if you prefer.
LGTM.
Sorry about the delay. I ran some tests, and it looks like this doesn't cause any regressions. Granted, it didn't lead to any performance improvement either (I guess we don't really use this information in many cases). But having the information around could be helpful in the future.
Do you have commit access?
I do not have commit access.
The changes behind this patch were developed internally (in my Company) and provide improvements to specific Modules. In particular, it allows us to infer the bottom bits of some address calculations (where we can use faster memory operations when the pointer has some specific alignment). I can't really go into details unfortunately. I do hope that at some point in the future, someone else will benefit from this more openly :)
I can commit this on behalf of my former colleague if there are no objections. I'll wait a day or so.
Is this And necessary? The Zero bits should be mutex with the One bits here.