Page MenuHomePhabricator

[InstCombine] Simplify a umul overflow check to a != 0 && b != 0.
ClosedPublic

Authored by fhahn on Feb 6 2020, 9:55 AM.

Details

Summary

This patch adds a simplification if an OR weakens the overflow condition
for umul.with.overflow by treating any non-zero result as overflow. In that
case, we overflow if both umul.with.overflow operands are != 0, as in that
case the result can only be 0, iff the multiplication overflows.

Code like this is generated by code using __builtin_mul_overflow with
negative integer constants, e.g.

bool test(unsigned long long v, unsigned long long *res) {
  return __builtin_mul_overflow(v, -4775807LL, res);
}

This simplification is very specific and I am not sure if visitOr is the
best place for it. Any other suggestions?

----------------------------------------
Name: D74141
  %res = umul_overflow {i8, i1} %a, %b
  %mul = extractvalue {i8, i1} %res, 0
  %overflow = extractvalue {i8, i1} %res, 1
  %cmp = icmp ne %mul, 0
  %ret = or i1 %overflow, %cmp
  ret i1 %ret
=>
  %t0 = icmp ne i8 %a, 0
  %t1 = icmp ne i8 %b, 0
  %ret = and i1 %t0, %t1
  ret i1 %ret
  %res = umul_overflow {i8, i1} %a, %b
  %mul = extractvalue {i8, i1} %res, 0
  %cmp = icmp ne %mul, 0
  %overflow = extractvalue {i8, i1} %res, 1

Done: 1
Optimization is correct!

Diff Detail

Event Timeline

fhahn created this revision.Feb 6 2020, 9:55 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 6 2020, 9:55 AM
Herald added a subscriber: hiraditya. · View Herald Transcript
nikic added a comment.Feb 7 2020, 10:18 AM

Code like this is generated by code using __builtin_mul_overflow with negative integer constants

I'm missing something here, why does clang generate this kind of code? I would have thought that __builtin_mul_overflow maps pretty directly the intrinsic.

fhahn added a comment.Feb 7 2020, 11:42 AM

Code like this is generated by code using __builtin_mul_overflow with negative integer constants

I'm missing something here, why does clang generate this kind of code? I would have thought that __builtin_mul_overflow maps pretty directly the intrinsic.

I am not entirely sure, but I guess the result type being unsigned forces umul_with_overflow, which treats both operands as unsigned. But if the integer operand is negative, __builtin_mul_overflow has to return true, so Clang needs to add extra checks for that. Clang could try to be a bit better with generating code here, but we would miss out on cases where we can prove that the integer operand is always negative in the middle end.

Gerolf added a subscriber: Gerolf.

+ Duncan, Michae for clang.
+ Amara, David for another opinion. Seems straight forward & LGTM.

-Gerolf

fhahn added a comment.Feb 14 2020, 2:44 AM

@dexonsmith, @Bigcheese please let me know if you think that this would be better/easier to do at the Clang codegen level.

lebedev.ri edited the summary of this revision. (Show Details)Feb 14 2020, 3:29 AM
lebedev.ri added inline comments.
llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
2726–2743

Consider doing the following instead:

// Check if the OR weakens the overflow condition for umul.with.overflow by
// treating any non-zero result as overflow. In that case, we overflow if both
// umul.with.overflow operands are != 0, as in that case the result can only
// be 0, iff the multiplication overflows.
CmpInst::Predicate Pred;
Value *MulWithOv;
if (match(&I,
          m_c_Or(m_ICmp(Pred, m_OneUse(m_ExtractValue<0>(m_Value(MulWithOv))),
                        m_ZeroInt()),
                 m_OneUse(m_ExtractValue<1>(m_Deferred(MulWithOv))))) &&
    Pred == CmpInst::ICMP_NE) {
  Value *A, *B;
  if (match(MulWithOv, m_Intrinsic<Intrinsic::umul_with_overflow>(
                           m_Value(A), m_Value(B))))
    return BinaryOperator::CreateAnd(
        Builder.CreateICmpNE(A, ConstantInt::get(A->getType(), 0)),
        Builder.CreateICmpNE(A, ConstantInt::get(B->getType(), 0)));
}
llvm/test/Transforms/InstCombine/umul-signed.ll
22 ↗(On Diff #242932)

There shouldn't be any -4775807 here, please replace it with %b

67 ↗(On Diff #242932)

Please add the same test for %mul, and %res

lebedev.ri added inline comments.Feb 14 2020, 3:34 AM
llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
2726–2743

Also, use Builder.CreateIsNotNull()

fhahn updated this revision to Diff 244647.Feb 14 2020, 6:36 AM
fhahn marked 5 inline comments as done.

Address comments, thanks!

(re-adding dropped reviewers)

llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
2726–2743

Nice, m_c_Or and m_Deferred are very helpful!

lebedev.ri added inline comments.Feb 14 2020, 6:50 AM
llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
2726–2743

Please adjust one-use checks as they were in the snippet,
both operands of or must be single-use, and we don't care if the intrinsic goes away or not.

fhahn marked an inline comment as done.Feb 14 2020, 7:02 AM
fhahn added inline comments.
llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
2726–2743

Please adjust one-use checks as they were in the snippet, both operands of or must be single-use

Do you mean both extractvalues? I am not sure, why do we need to restrict the number of uses for the multiplication result?

and we don't care if the intrinsic goes away or not.

Will this still be profitable if the simplification does not result in the umul.with.overflow call to go away?

lebedev.ri added inline comments.Feb 14 2020, 7:03 AM
llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
2726–2743

Actually wait, i see where this is going.
So, we need to produce 3 instructions (&, !=0, !=0),
which means that we need to be sure that we eliminate two instructions.
The input pattern is

%res = tail call { i64, i1 } @llvm.umul.with.overflow.i64(i64 %a, i64 %b)
%overflow = extractvalue { i64, i1 } %res, 1
%mul = extractvalue { i64, i1 } %res, 0
%cmp  = icmp ne i64 %mul, 0
%overflow.1 = or i1 %overflow, %cmp

%overflow.1 is free to replace, and we need two extras.
They must be it's operands.

  • If the %overflow is one-use, then only the %mul result of %res is used, which guarantees that %mul&%res will be combined into a simple mul instruction, thus getting rid of the %mul instruction.
  • Even if %mul is one-use, %overflow might not be (if it is, see above), so we don't gain anything here.

TLDR: we only need a single use check, on %overflow = extractvalue { i64, i1 } %res, 1.
%mul = extractvalue { i64, i1 } %res, 0 is either also one-use and will go away,
or it will get folded into %res, either way we end up not increasing instruction count.

2741

TLDR: this check shouldn't be here.

lebedev.ri added inline comments.Feb 14 2020, 7:25 AM
llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
2726–2743

Ah, i forgot about %cmp.

  • if %cmp is one-use and %mul is also one-use we also can transform. (if %mul isn't one-use %overflow would need to be, but then see above)
fhahn updated this revision to Diff 245027.Feb 17 2020, 12:57 PM

Remove UMulWithOv->hasNUses(2) as suggested @lebedev.ri.

fhahn marked an inline comment as done.Feb 17 2020, 1:07 PM
fhahn added inline comments.
llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
2741

I've removed the check and there's a test case (@test3_multiple_res_users).

In that case we end up replacing extract + icmp + or with icmp + icmp + or. Not sure if that will be better in the end, but at least it shouldn't end up worse.

fhahn updated this revision to Diff 245037.Feb 17 2020, 2:29 PM

Match m_oneuse(m_extract<1>) || m_oneuse(m_ICmp(Pred, m_oneuse(m_ExtractValue<0>)) as suggested.

I'm not sure if there's a nicer way to do that, e.g. binding parts of a pattern to a new name, so we can do the use checks later?

fhahn updated this revision to Diff 245039.Feb 17 2020, 2:40 PM
fhahn marked 2 inline comments as done.

use m_CombineAnd as suggested by @lebedev.ri .

lebedev.ri accepted this revision.Feb 17 2020, 2:49 PM

Thanks, this LGTM.

This revision is now accepted and ready to land.Feb 17 2020, 2:49 PM
This revision was automatically updated to reflect the committed changes.