This is an archive of the discontinued LLVM Phabricator instance.

[Transform] Improve saddo with mixed signs
AbandonedPublic

Authored by dlrobertson on Mar 6 2019, 7:09 PM.

Details

Reviewers
nikic
spatel
Summary

Improve folding of the sadd.with.overflow intrinsic with add nsw
when given constants of mixed signs.

Diff Detail

Event Timeline

dlrobertson created this revision.Mar 6 2019, 7:09 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 6 2019, 7:09 PM
nikic added inline comments.Mar 7 2019, 12:17 AM
llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
2060

The use of abs here makes me uncomfortable... abs(INT_MIN) is INT_MIN again. So if we take saddo(X +nsw INT_MIN, 1), then the check here is abs(INT_MIN) s< abs(1) which is INT_MIN s< 1, which is true. While this will ultimately give a conservatively correct result, you'd prefer this case to take the other branch.

nikic added a comment.Mar 7 2019, 4:27 AM

Generally I feel that the optimization we want to go for here is a more general one. We already detect always/never overflow conditions based on known bits. This is a variation that detects never overflow based on the constant range of the input, for one specific case. We have generic code in InstructionSimplify to determine the ConstantRange of binary operators and intrinsics (setLimitsForBinOp and setLimitsForIntrinsic) for cases where this can be done cheaply (inspecting only one instruction).

I'm thinking that we should generalize that code to take any Instruction, move it as an exported API into ValueTracking and then do a generic ConstantRange based always/never overflow check here. Does that sound reasonable @spatel?

spatel added a comment.Mar 7 2019, 1:33 PM

Generally I feel that the optimization we want to go for here is a more general one. We already detect always/never overflow conditions based on known bits. This is a variation that detects never overflow based on the constant range of the input, for one specific case. We have generic code in InstructionSimplify to determine the ConstantRange of binary operators and intrinsics (setLimitsForBinOp and setLimitsForIntrinsic) for cases where this can be done cheaply (inspecting only one instruction).

I'm thinking that we should generalize that code to take any Instruction, move it as an exported API into ValueTracking and then do a generic ConstantRange based always/never overflow check here. Does that sound reasonable @spatel?

Yes, if we can use the same/similar code for more general transforms, it makes sense to move it to ValueTracking.

I'm thinking that we should generalize that code to take any Instruction, move it as an exported API into ValueTracking and then do a generic ConstantRange based always/never overflow check here.

So do you mean update the computeKnownBitsFromOperator for sadd_with_overflow to also take into account the overflow flag?

nikic added a comment.Mar 9 2019, 1:41 PM

Moved constant range calculation into ValueTracking with rL355781.

So do you mean update the computeKnownBitsFromOperator for sadd_with_overflow to also take into account the overflow flag?

Based on the previous commit, you can call computeConstantRange() on the LHS, and then check whether adding the RHS constant to that range can ever overflow. Assuming a range of Lo <= X <= Hi and constant C: If C >= 0 then an overflow cannot occur if Hi <= SignedMax - C and always occurs if Lo > SignedMax -C. If C < 0 then an overflow cannot occur if Lo >= SignedMin - C and always occurs if Hi < SignedMin - C.

To go one step further, rather than limiting this optimization to just the with.overflow intrinsics, this could be done in computeOverflowForSignedAdd (and friends), as this would also benefit saturated math intrinsics, as well as nsw/nuw inference on normal adds. (It might make sense to start with computeOverflowForUnsignedAdd, as the logic for unsigned is simpler.)

nikic added a comment.Mar 10 2019, 3:21 PM

I've created D59193 to implement just the raw overflow checking logic based on ConstantRanges (without using it in ValueTracking).

nikic added a comment.Mar 16 2019, 5:28 AM

D59386 implements the full optimization for the unsigned case, and D59450 is preparation for the signed add case, though there's a few more followups necessary there (for signed sub, for extension to use computeConstantRange, and for handling AlwaysOverflows).

dlrobertson abandoned this revision.Mar 16 2019, 9:06 PM

D59386 implements the full optimization for the unsigned case, and D59450 is preparation for the signed add case, though there's a few more followups necessary there (for signed sub, for extension to use computeConstantRange, and for handling AlwaysOverflows).

Awesome! I'll move on to uaddo