This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Fold x & (-1 >> y) == x to x u<= (-1 >> y)
ClosedPublic

Authored by lebedev.ri on Jul 11 2018, 6:03 AM.

Details

Summary

https://bugs.llvm.org/show_bug.cgi?id=38123

This pattern will be produced by Implicit Integer Truncation sanitizer,
https://reviews.llvm.org/D48958
https://bugs.llvm.org/show_bug.cgi?id=21530
in unsigned case, therefore it is probably a good idea to improve it.

https://rise4fun.com/Alive/Rny
^ there are more opportunities for folds, i will follow up with them afterwards.

Caveat: this somehow exposes a missing opportunities
in test/Transforms/InstCombine/icmp-logical.ll
It seems, the problem is in foldLogOpOfMaskedICmps() in InstCombineAndOrXor.cpp.
But i'm not quite sure what is wrong, because it calls getMaskedTypeForICmpPair(),
which calls decomposeBitTestICmp() which should already work for these cases...
I would love to have some pointers on how to address it.

Diff Detail

Repository
rL LLVM

Event Timeline

lebedev.ri created this revision.Jul 11 2018, 6:03 AM
lebedev.ri added inline comments.Jul 11 2018, 6:08 AM
test/Transforms/InstCombine/canonicalize-low-bit-mask-and-icmp-eq-to-icmp-ule.ll
17–19 ↗(On Diff #154978)

To spare time, yes, this is correct, https://rise4fun.com/Alive/J71

lebedev.ri added inline comments.Jul 11 2018, 6:30 AM
test/Transforms/InstCombine/icmp-mul-zext.ll
14–15 ↗(On Diff #154978)

This too, https://rise4fun.com/Alive/twl2, note that br inverted, too.

I'm not too worried about the missing fold in foldLogOpOfMaskedICmps(). That's a complex mess, so not surprising that it has logic holes. I've wondered if we could rewrite that using ranges to simplify that code.

lib/Transforms/InstCombine/InstCombineCompares.cpp
4447–4449 ↗(On Diff #154978)

We're also missing signed folds like:

Pre: isPowerOf2(C1+1)
%masked = and i32 %arg, C1
%truncheck = icmp sge i32 %masked, %arg
  =>
%truncheck = icmp sle i32 %arg, C1

https://rise4fun.com/Alive/K8H

...so I'd give this function a more general name: foldICmpWithMaskedVal?

4452–4458 ↗(On Diff #154978)

Untested, but this could be less indent-crazy if we give some of it a local name like:

auto m_Mask = m_CombineOr(m_LShr(m_AllOnes(), m_Value()), m_LowBitMask());
if (!match(&I, m_c_ICmp(SrcPred,
                        m_c_And(m_CombineAnd(m_Mask, m_Value(M)), m_Value(X)),
                        m_Deferred(X))))
4462 ↗(On Diff #154978)

I think you're planning to extend this soon, but it's best to leave a TODO comment with an explanation just in case that gets delayed.

4659–4660 ↗(On Diff #154978)

There's no logic to fold ordering within visitICmpInst()...the new function could be called from within foldICmpBinOp() rather than the top-level?

lebedev.ri marked 4 inline comments as done.

Address @spatel's review notes.

I'm not too worried about the missing fold in foldLogOpOfMaskedICmps(). That's a complex mess, so not surprising that it has logic holes. I've wondered if we could rewrite that using ranges to simplify that code.

That's, relieving to know!
I have looked a bit more, and i'm not sure what goes wrong,
so one regression can slip, that'd be great.

lib/Transforms/InstCombine/InstCombineCompares.cpp
4447–4449 ↗(On Diff #154978)

Yep, i didn't like that long name anyway.

4452–4458 ↗(On Diff #154978)

Ha, nice one.

4659–4660 ↗(On Diff #154978)

Hm, good idea.

spatel accepted this revision.Jul 11 2018, 9:29 AM

LGTM.

lib/Transforms/InstCombine/InstCombineCompares.cpp
2873 ↗(On Diff #155015)

Nit: make the formula up here more general and then put the specific formulas within the 'switch'.

This revision is now accepted and ready to land.Jul 11 2018, 9:29 AM

LGTM.

Thank you for such speedy review!

I'm wondering, what about the signed case?
https://godbolt.org/g/WvWX13
https://godbolt.org/g/DM7XA4
https://rise4fun.com/Alive/Qslx (check that 25 high bits are either all-ones, or all-zeros)

Can we do this in instcombine?
Or not, given that we were disabling some bit-fiddling transforms lately?

LGTM.

Thank you for such speedy review!

I'm wondering, what about the signed case?
https://godbolt.org/g/WvWX13
https://godbolt.org/g/DM7XA4
https://rise4fun.com/Alive/Qslx (check that 25 high bits are either all-ones, or all-zeros)

Can we do this in instcombine?
Or not, given that we were disabling some bit-fiddling transforms lately?

Actually, not sure that is any better, will leave that for now.
Though the reverse transform might be a good thing for dagcombine.

LGTM.

Thank you for such speedy review!

I'm wondering, what about the signed case?
https://godbolt.org/g/WvWX13
https://godbolt.org/g/DM7XA4
https://rise4fun.com/Alive/Qslx (check that 25 high bits are either all-ones, or all-zeros)

Can we do this in instcombine?
Or not, given that we were disabling some bit-fiddling transforms lately?

Actually, not sure that is any better, will leave that for now.
Though the reverse transform might be a good thing for dagcombine.

It's a good question, but probably better asked on the dev list than here. I think we prefer to canonicalize to the form with less instructions even if that means we lose information from the eliminated ops.

This revision was automatically updated to reflect the committed changes.
hjyamauchi added inline comments.Jul 11 2018, 3:35 PM
llvm/trunk/test/Transforms/InstCombine/icmp-logical.ll
91

It seems like the simplification of this change (D49179) triggers before this original simplification triggers and the original simplification no longer triggers? My guess is that this test just means to test a plain or-case and it may make sense to use some other values like 14 and 78 (shifted left by 1 bit, instead of 7 and 39) and preserve the original intention of the test.

Note the next test @masked_or_A_slightly_optimized has the same code as the after-simplification code of this function. Not sure if it is a just a coincidence.

lebedev.ri added inline comments.Jul 11 2018, 3:40 PM
llvm/trunk/test/Transforms/InstCombine/icmp-logical.ll
91

and the original simplification no longer triggers?

In any case, the original simplification clearly does not handle this pattern.

My guess is that this test just means to test a plain or-case and it may make sense to use some other values like 14 and 78 (shifted left by 1 bit, instead of 7 and 39) and preserve the original intention of the test.

Hmm, thanks, might be a good idea..

Note the next test @masked_or_A_slightly_optimized has the same code as the after-simplification code of this function. Not sure if it is a just a coincidence.

I have added @masked_or_A_slightly_optimized in rL336784 after noticing this regression, to have a test regardless.

LGTM.

Thank you for such speedy review!

I'm wondering, what about the signed case?
https://godbolt.org/g/WvWX13
https://godbolt.org/g/DM7XA4
https://rise4fun.com/Alive/Qslx (check that 25 high bits are either all-ones, or all-zeros)

Can we do this in instcombine?
Or not, given that we were disabling some bit-fiddling transforms lately?

Actually, not sure that is any better, will leave that for now.
Though the reverse transform might be a good thing for dagcombine.

It's a good question, but probably better asked on the dev list than here. I think we prefer to canonicalize to the form with less instructions even if that means we lose information from the eliminated ops.

For future reference, here is a more straight-forward fold https://rise4fun.com/Alive/XuW, that will afterwards fold back into and+icmp https://godbolt.org/g/bm3yZu
But any such fold will clearly need dagcombine work, since the 'naive' version with shifts seems to produce optimal assembly already.
So after all i'm not sure i'm motivated to look into the 'signed truncation pattern' right now..

LGTM.

Thank you for such speedy review!

I'm wondering, what about the signed case?
https://godbolt.org/g/WvWX13
https://godbolt.org/g/DM7XA4
https://rise4fun.com/Alive/Qslx (check that 25 high bits are either all-ones, or all-zeros)

Can we do this in instcombine?
Or not, given that we were disabling some bit-fiddling transforms lately?

Actually, not sure that is any better, will leave that for now.
Though the reverse transform might be a good thing for dagcombine.

It's a good question, but probably better asked on the dev list than here. I think we prefer to canonicalize to the form with less instructions even if that means we lose information from the eliminated ops.

For future reference, here is a more straight-forward fold https://rise4fun.com/Alive/XuW, that will afterwards fold back into and+icmp https://godbolt.org/g/bm3yZu
But any such fold will clearly need dagcombine work, since the 'naive' version with shifts seems to produce optimal assembly already.
So after all i'm not sure i'm motivated to look into the 'signed truncation pattern' right now..

Name: signed truncation check
  %old0 = shl i16 %x, 8
  %old1 = ashr exact i16 %old0, 8
  %ret = icmp eq i16 %old1, %x
=>
  %new0 = icmp slt i16 %x, 128
  %new1 = icmp sgt i16 %x, -129
  %ret = and i1 %new1, %new0

Name: and-of-icmps
  %new0 = icmp slt i16 %x, 128
  %new1 = icmp sgt i16 %x, -129
  %ret = and i1 %new1, %new0
=>
  %x.off = add i16 %x, 128
  %ret = icmp ult i16 %x.off, 256

Let me make sure I'm seeing it - as a question of IR canonicalization, we're deciding which of these 3 forms is best? I don't see any reason to favor the 1st two options over the 3rd (shorter) one. We already convert the 2nd to the 3rd (although as noted here, that transform is not happening as expected in all cases).

If the backend can't produce optimal code from the 3rd form, that's a concern, but it doesn't necessarily have to hold up the IR improvement. We might live with that (hopefully minor and temporary) problem because the IR improvement can lead to optimizations in other passes that we don't see in any of the minimal examples.

LGTM.

Thank you for such speedy review!

I'm wondering, what about the signed case?
https://godbolt.org/g/WvWX13
https://godbolt.org/g/DM7XA4
https://rise4fun.com/Alive/Qslx (check that 25 high bits are either all-ones, or all-zeros)

Can we do this in instcombine?
Or not, given that we were disabling some bit-fiddling transforms lately?

Actually, not sure that is any better, will leave that for now.
Though the reverse transform might be a good thing for dagcombine.

It's a good question, but probably better asked on the dev list than here. I think we prefer to canonicalize to the form with less instructions even if that means we lose information from the eliminated ops.

For future reference, here is a more straight-forward fold https://rise4fun.com/Alive/XuW, that will afterwards fold back into and+icmp https://godbolt.org/g/bm3yZu
But any such fold will clearly need dagcombine work, since the 'naive' version with shifts seems to produce optimal assembly already.
So after all i'm not sure i'm motivated to look into the 'signed truncation pattern' right now..

Name: signed truncation check
  %old0 = shl i16 %x, 8
  %old1 = ashr exact i16 %old0, 8
  %ret = icmp eq i16 %old1, %x
=>
  %new0 = icmp slt i16 %x, 128
  %new1 = icmp sgt i16 %x, -129
  %ret = and i1 %new1, %new0

Name: and-of-icmps
  %new0 = icmp slt i16 %x, 128
  %new1 = icmp sgt i16 %x, -129
  %ret = and i1 %new1, %new0
=>
  %x.off = add i16 %x, 128
  %ret = icmp ult i16 %x.off, 256

Let me make sure I'm seeing it - as a question of IR canonicalization, we're deciding which of these 3 forms is best?

Not really. I agree that the last one is the best from IR clarity standpoint.
I was just looking for some other variants of this pattern.
(Sadly souper was of no help here, strangely.)

I don't see any reason to favor the 1st two options over the 3rd (shorter) one. We already convert the 2nd to the 3rd (although as noted here, that transform is not happening as expected in all cases).

I agree.

If the backend can't produce optimal code from the 3rd form, that's a concern, but it doesn't necessarily have to hold up the IR improvement.
We might live with that (hopefully minor and temporary) problem because the IR improvement can lead to optimizations in other passes that we don't see in any of the minimal examples.

Filed https://bugs.llvm.org/show_bug.cgi?id=38149 to track the signed part of the pattern.

lebedev.ri added inline comments.Jul 12 2018, 8:01 AM
llvm/trunk/test/Transforms/InstCombine/icmp-logical.ll
91

@yamauchi committed in rL336912, thanks!