This is an archive of the discontinued LLVM Phabricator instance.

[Constants, InstCombine] allow RHS (operand 1) identity constants for binops
AbandonedPublic

Authored by spatel on Jul 3 2018, 12:41 PM.

Details

Summary

This is a follow-up to D48830 to allow more shuffle folding using identity constants. Without this patch, we're only returning identity constants for commutative binops, so we miss opportunities to fold shifts and other binops.

LLVM binary opcode review: there are a total of 18 binops. There are 7 commutative binops (add, mul, and, or, xor, fadd, fmul) which we already fold. We're able to fold 4 more opcodes with this patch (shl, lshr, ashr, fdiv). Loosening the div/rem constraint in the shuffle fold will allow 2 more opcodes (udiv/sdiv). There are no folds for srem/urem/frem AFAIK. We don't bother with sub/fsub with constant operand 1 because those are canonicalized to add/fadd. 7 + 4 + 2 + 3 + 2 = 18.

Note that the more flexible definition for the identity constant shown here almost matches the logic of getDefaultConstantForOpcode() in D28907, so that patch could be reduced assuming this is ok and once we figure out how to handle -0.0 with fadd.

Diff Detail

Event Timeline

spatel created this revision.Jul 3 2018, 12:41 PM

I don't think this is wrong, but does this propagation of undef not make the new versions more poisonous?

I don't think this is wrong, but does this propagation of undef not make the new versions more poisonous?

That's a good question. My understanding is that no, there is no more poison here than before because in the original code the shuffle's undef mask element means that lane of the vector is undef, so nothing downstream that might be using that lane can be valid. After this transform, we're performing some binop with an undef operand, so again the result of that lane is undef, and nothing downstream can be any different.

This is ok for everything except integer div/rem because they're the only ops with immediate UB. cc'ing @nlopes @sanjoy @aqjune to see if that reasoning is bogus.

I think it depends on what shl i32 %x, undef is - undef can be concretized into an integer larger than 31, and according to https://llvm.org/docs/LangRef.html#id134 , it is poison, so conversion like the one in shuffle_select.ll may be invalid.
Defining shl with large second operand as poison explains another kind of optimizations, IMHO: For example, InstCombine optimizes (C2 << X) << C1 into (C2 << C1) << X. Defining shl with large shift as undef does not explain this optimimzation - If C1 = 4, C2 = 1 and X = 33, source is (1 << 33) << 4 = undef << 4 = (undef with low 4 bits zero), but after optimization it is (1 << 4) << 33 = undef. If large shift yields poison, this is explained, because poison << 4 = poison.

nlopes added a comment.Jul 4 2018, 1:51 AM

shl %x, undef is poison; we don't want more undefs :)
So this transformation for shifts is not correct. The way to make it correct would be to introduce a poison value and use it in the shuffle instructions instead of undef. I suggest we finally go ahead and do that.

spatel added a comment.Jul 4 2018, 5:16 AM

shl %x, undef is poison; we don't want more undefs :)
So this transformation for shifts is not correct. The way to make it correct would be to introduce a poison value and use it in the shuffle instructions instead of undef. I suggest we finally go ahead and do that.

  1. Does the 'undef' in the shuffle mask represent something different than the 'undef' in the shift amount?
  2. Are we distinguishing shifts from the other binary opcodes here? Or is this existing transform also incorrect?
%b = mul nsw nuw <4 x i32> %v, <i32 11, i32 12, i32 13, i32 14>
%s = shufflevector <4 x i32> %v, <4 x i32> %b, <4 x i32> <i32 undef, i32 5, i32 2, i32 7>
-->
%s = mul nuw nsw <4 x i32> [[V:%.*]], <i32 undef, i32 12, i32 1, i32 14>
nlopes added a comment.Jul 4 2018, 3:36 PM

shl %x, undef is poison; we don't want more undefs :)
So this transformation for shifts is not correct. The way to make it correct would be to introduce a poison value and use it in the shuffle instructions instead of undef. I suggest we finally go ahead and do that.

  1. Does the 'undef' in the shuffle mask represent something different than the 'undef' in the shift amount?

No: undef means any value in both cases.
The deal is that x << y is poison if y >= bitwidth. Since undef can be such a number, x << undef is poison.
Why is it poison? Because different CPU architectures (e.g. x86 and ARM) deal with this case differently, so we don't really want to define it. C/C++ allows to do that.

Shuffle is never poison if its inputs are not poison.

  1. Are we distinguishing shifts from the other binary opcodes here? Or is this existing transform also incorrect?
%b = mul nsw nuw <4 x i32> %v, <i32 11, i32 12, i32 13, i32 14>
%s = shufflevector <4 x i32> %v, <4 x i32> %b, <4 x i32> <i32 undef, i32 5, i32 2, i32 7>
-->
%s = mul nuw nsw <4 x i32> [[V:%.*]], <i32 undef, i32 12, i32 1, i32 14>

This transformation is also incorrect, unfortunately. If you drop nsw & nuw, it's correct.

spatel added a comment.Jul 5 2018, 9:25 AM

shl %x, undef is poison; we don't want more undefs :)
So this transformation for shifts is not correct. The way to make it correct would be to introduce a poison value and use it in the shuffle instructions instead of undef. I suggest we finally go ahead and do that.

  1. Does the 'undef' in the shuffle mask represent something different than the 'undef' in the shift amount?

No: undef means any value in both cases.
The deal is that x << y is poison if y >= bitwidth. Since undef can be such a number, x << undef is poison.
Why is it poison? Because different CPU architectures (e.g. x86 and ARM) deal with this case differently, so we don't really want to define it. C/C++ allows to do that.

Shuffle is never poison if its inputs are not poison.

  1. Are we distinguishing shifts from the other binary opcodes here? Or is this existing transform also incorrect?
%b = mul nsw nuw <4 x i32> %v, <i32 11, i32 12, i32 13, i32 14>
%s = shufflevector <4 x i32> %v, <4 x i32> %b, <4 x i32> <i32 undef, i32 5, i32 2, i32 7>
-->
%s = mul nuw nsw <4 x i32> [[V:%.*]], <i32 undef, i32 12, i32 1, i32 14>

This transformation is also incorrect, unfortunately. If you drop nsw & nuw, it's correct.

Ok, I think I get it now. Even though the shuffle mask's undef guarantees that any uses are also undef in those lanes, we're creating poison in the binops that didn't exist before. Therefore, other passes may use that to create unsound transforms. I don't think we have vector folds that are that sophisticated yet, but we can't open that door. It's similar to the problem we have with div/rem, but not as blatant. So the solution is to generalize the hack that I added for div/rem (getSafeVectorConstantForIntDivRem), so we can allow these transforms for any poison-producers. And the right way to replace undef with safe constants is almost what we have here: translate to the identity constant for a given binop when possible. Other transforms can turn that back into undef when they determine that it's safe.

spatel abandoned this revision.Jul 6 2018, 3:40 PM

I committed the enhancement to getBinOpIdentity() as an NFC preliminary because we can use that in D49047. So this patch will just be a one-liner to use the new call once we have all of the safety checks sorted out.