This is an archive of the discontinued LLVM Phabricator instance.

[InstSimplify] Drop leftover "division-by-zero guard" around `@llvm.umul.with.overflow` inverted overflow bit
ClosedPublic

Authored by lebedev.ri on Jul 23 2019, 7:32 AM.

Details

Summary

Now that with D65143/D65144 we've produce @llvm.umul.with.overflow,
and with D65147 we've flattened the CFG, we now can see that
the guard may have been there to prevent division by zero is redundant.
We can simply drop it:

----------------------------------------
Name: no overflow or zero
  %iszero = icmp eq i4 %y, 0
  %umul = smul_overflow i4 %x, %y
  %umul.ov = extractvalue {i4, i1} %umul, 1
  %umul.ov.not = xor %umul.ov, -1
  %retval.0 = or i1 %iszero, %umul.ov.not
  ret i1 %retval.0
=>
  %iszero = icmp eq i4 %y, 0
  %umul = smul_overflow i4 %x, %y
  %umul.ov = extractvalue {i4, i1} %umul, 1
  %umul.ov.not = xor %umul.ov, -1
  %retval.0 = or i1 %iszero, %umul.ov.not
  ret i1 %umul.ov.not

Done: 1
Optimization is correct!

Note that this is inverted from what we have in a previous patch,
here we are looking for the inverted overflow bit.
And that inversion is kinda problematic - given this particular
pattern we neither hoist that not closer to ret (then the pattern
would have been identical to the one without inversion,
and would have been handled by the previous patch), neither
do the opposite transform. But regardless, we should handle this too.
I've filled PR42720.

Diff Detail

Repository
rL LLVM

Event Timeline

lebedev.ri created this revision.Jul 23 2019, 7:32 AM
spatel accepted this revision.Jul 26 2019, 6:51 AM

LGTM

llvm/lib/Analysis/InstructionSimplify.cpp
1814–1827 ↗(On Diff #211299)

A couple of possibilities to reduce the code duplication, but you can decide if it is worth it:

  1. Make a tiny helper for this chunk of code that matches the extract/intrinsic.
  2. Add a parameter that tells this function whether we should match the EQ/NE and 'not' part of the pattern (distinguishes between if we were called by 'and' or 'or').
This revision is now accepted and ready to land.Jul 26 2019, 6:51 AM
lebedev.ri marked 2 inline comments as done.

Reduced duplication.

llvm/lib/Analysis/InstructionSimplify.cpp
1814–1827 ↗(On Diff #211299)

Yeah, this ended up having more duplication than i thought there will be.

lebedev.ri added a comment.EditedJul 26 2019, 8:45 AM

One more thing: given this IR:

define dso_local i64 @_Z17will_not_overflowmmPb(i64 %arg, i64 %arg1, i8* nocapture %arg2) {
  %tmp = tail call { i64, i1 } @llvm.umul.with.overflow.i64(i64 %arg, i64 %arg1)
  %tmp3 = extractvalue { i64, i1 } %tmp, 1
  %tmp4 = zext i1 %tmp3 to i8
  store i8 %tmp4, i8* %arg2, align 1
  %tmp5 = mul i64 %arg1, %arg
  ret i64 %tmp5
}

declare { i64, i1 } @llvm.umul.with.overflow.i64(i64, i64)

which pass should be responsible to merging those two multiplications into:

define dso_local i64 @_Z17will_not_overflowmmPb(i64 %arg, i64 %arg1, i8* nocapture %arg2) {
  %tmp = tail call { i64, i1 } @llvm.umul.with.overflow.i64(i64 %arg, i64 %arg1)
  %tmp3 = extractvalue { i64, i1 } %tmp, 1
  %tmp4 = zext i1 %tmp3 to i8
  store i8 %tmp4, i8* %arg2, align 1
  %tmp5 = extractvalue { i64, i1 } %tmp, 0
  ret i64 %tmp5
}

declare { i64, i1 } @llvm.umul.with.overflow.i64(i64, i64)

?

https://godbolt.org/z/CQW8it

----------------------------------------
Name: mul -> smul_overflow
  %r = mul i4 %x, %y
=>
  %tmp = smul_overflow i4 %x, %y
  %r = extractvalue {i4, i1} %tmp, 0

Done: 1
Optimization is correct!

----------------------------------------
Name: mul -> umul_overflow
  %r = mul i4 %x, %y
=>
  %tmp = umul_overflow i4 %x, %y
  %r = extractvalue {i4, i1} %tmp, 0

Done: 1
Optimization is correct!

which pass should be responsible to merging those two multiplications into:

This would be a shared problem for all of the overflow intrinsics. EarlyCSE is my first guess. Best to ask this on llvm-dev though, so we can get more feedback.

nikic added a comment.Jul 26 2019, 8:56 AM

There's some code related to this in GVN: https://github.com/llvm-mirror/llvm/blob/6c33c8991a6740ed8c87d5d3980a6469b415628d/lib/Transforms/Scalar/GVN.cpp#L333-L342 However it only works if an appropriate extractelement already exists.

Rebased, NFC.

lebedev.ri edited the summary of this revision. (Show Details)Aug 29 2019, 5:31 AM
This revision was automatically updated to reflect the committed changes.