This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] remove one-use restriction for icmp+add constant fold
AbandonedPublic

Authored by spatel on Feb 25 2019, 9:36 AM.

Details

Summary

Remove the use of an add with constant when compared to another constant:
(A + C2) == C --> A == (C - C2)
(A + C2) != C --> A != (C - C2)

I noticed an inconsistency in the canonicalization of this pattern as part of D57516 and rL354746, so that is shown in the uaddo.ll test file.

I checked asm for some of the tests where the induction variable test is changing, and I don't see any diffs in the final results, so I'm assuming that something later (LSR?) converts that to optimized form either way?

Diff Detail

Event Timeline

spatel created this revision.Feb 25 2019, 9:36 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 25 2019, 9:36 AM
lebedev.ri accepted this revision.Feb 27 2019, 12:23 PM

LG, that one-use check seems unmotivated,
we will only produce a single instruction (icmp),
so it does not seem to matter whether we can get rid of that add.

This revision is now accepted and ready to land.Feb 27 2019, 12:23 PM

It looks like this was added on purpose way back in https://reviews.llvm.org/rL16473. That was so long ago that everything will have changed since then, but have you run any benchmarks for this?

I agree that LSR should probably be turning this back into a cmp based on the inc in any loop backedges. But I'm not sure it will do that equally well for all backends. And changing the canonical form of a loop backedge is quite a big change :)

Oh, and I presume most loop transforms will use SCEV's, which won't have changed? So we don't need to update the tests for them all, to make sure they are still doing what they should?

It looks like this was added on purpose way back in https://reviews.llvm.org/rL16473. That was so long ago that everything will have changed since then, but have you run any benchmarks for this?

Thanks for digging up the original commit! So it was a purposeful hack. :)
The problem with that hack of course is that crippling a canonicalization is not the same as adding the inverse canonicalization. Also, we are already doing this same transform for related patterns - there was just this odd-shaped hole carved out by this use limitation for some subset of constant values.

I did some more experiments, and my guess about LSR (-loop-reduce) was correct. It cleans up the IR for the motivating example in either incoming form (before/after this patch) to exactly the same output form for a given target. I checked x86, PPC, AArch64, and armv7 triples. I only have x86 hardware to run benchmarks, but I don't see anything above noise with this patch. Note that the LSR behavior holds for both -O1 and -O2 (potentially drastically different IR due to loop vectorization).

I agree that LSR should probably be turning this back into a cmp based on the inc in any loop backedges. But I'm not sure it will do that equally well for all backends. And changing the canonical form of a loop backedge is quite a big change :)
Oh, and I presume most loop transforms will use SCEV's, which won't have changed? So we don't need to update the tests for them all, to make sure they are still doing what they should?

I'm not too familiar with SCEV. I see that LSR uses them, and that output matches expectations. Is there some other experiment I can try to confirm?

Yep. I think that makes sense to me. Would be good to remove the hack. And LSR should handle most of these cases. I think there's times when it doesn't do what it should, though.

This is one example, out of libjpeg I think, but I got it from csibe. It's was just the first I looked at that was larger than before:
https://reviews.llvm.org/P8129
Compiled with something like "opt -instcombine -o - -S jdmarker.ll | llc - -o -", vs without the instcombine. Codesize is a little bigger than it was, I think maybe something about the loops (not in simple form?) is making the SCEV's not useful, so LSR isn't transforming these back.

I don't know if this is representative of the other changes. Over all the benchmarks I ran, this patch seemed to be (on average, very slightly) worse overall than before, both for perf and codesize. Not by a long way, and there are certainly some improvements in there.

This is another one from cmsis dsp (https://github.com/ARM-software/CMSIS_5/blob/develop/CMSIS/DSP/Source/MatrixFunctions/arm_mat_trans_q31.c). Apparently the original was 20% slower on a thumb1 target, but something daft is likely going on there, as it can sometimes make bad decisions.
https://reviews.llvm.org/P8130
There does look like there's some extra uxth's in there, I've not looked into why. I was compiling with something like "-target arm-arm-none-eabi -mcpu=cortex-m33 -O3" (or maybe m23 for the thumb1 case).

spatel planned changes to this revision.Feb 28 2019, 2:12 PM

Yep. I think that makes sense to me. Would be good to remove the hack. And LSR should handle most of these cases. I think there's times when it doesn't do what it should, though.

This is one example, out of libjpeg I think, but I got it from csibe. It's was just the first I looked at that was larger than before:
https://reviews.llvm.org/P8129
Compiled with something like "opt -instcombine -o - -S jdmarker.ll | llc - -o -", vs without the instcombine. Codesize is a little bigger than it was, I think maybe something about the loops (not in simple form?) is making the SCEV's not useful, so LSR isn't transforming these back.

I don't know if this is representative of the other changes. Over all the benchmarks I ran, this patch seemed to be (on average, very slightly) worse overall than before, both for perf and codesize. Not by a long way, and there are certainly some improvements in there.

Thanks for checking that. I see x86 getting bigger on that example too.
Given that this is just an academic change for me currently (we ended up needing to match both patterns for unsigned add overflow in the patches that led me here), I think we better try to make LSR stronger before pushing this one. I'm guessing this won't be at the top of my queue for a while, but I'll mark it as 'Plan changes' for now.

nikic added a comment.Apr 13 2019, 6:45 AM

There's a similar one use check for the non-equality case in https://github.com/llvm-mirror/llvm/blob/fb1e04ff3c51a617c4944b2e05e8aa1b8c543e22/lib/Transforms/InstCombine/InstCombineCompares.cpp#L2389. This prevents the trivially true condition in https://rust.godbolt.org/z/qg0RjC from folding. That check was added (or rather moved) in rL341831.

Just wanted to add this as a datapoint for a case where dropping the restriction would be nice.

There's a similar one use check for the non-equality case in https://github.com/llvm-mirror/llvm/blob/fb1e04ff3c51a617c4944b2e05e8aa1b8c543e22/lib/Transforms/InstCombine/InstCombineCompares.cpp#L2389. This prevents the trivially true condition in https://rust.godbolt.org/z/qg0RjC from folding. That check was added (or rather moved) in rL341831.

Just wanted to add this as a datapoint for a case where dropping the restriction would be nice.

It's not explicitly stated in the commit message, but the test added with rL341831 suggests we are missing some kind of overflow check optimization (cc @t.p.northover)? Or maybe the motivation was similar to the cases we have here - covering up for failures in LSR?

Either way, we have a real motivating example for correcting instcombine now. If there's not enough will to fix the underlying problems, we can probably justify adding an over-specific match to instsimplify to zap the compare in the rust example.

mati865 added a subscriber: mati865.May 8 2019, 2:13 AM

There's a similar one use check for the non-equality case in https://github.com/llvm-mirror/llvm/blob/fb1e04ff3c51a617c4944b2e05e8aa1b8c543e22/lib/Transforms/InstCombine/InstCombineCompares.cpp#L2389. This prevents the trivially true condition in https://rust.godbolt.org/z/qg0RjC from folding. That check was added (or rather moved) in rL341831.

Just wanted to add this as a datapoint for a case where dropping the restriction would be nice.

That should be resolved now, reverted in rG0f22e783a038.

It's not explicitly stated in the commit message, but the test added with rL341831 suggests we are missing some kind of overflow check optimization (cc @t.p.northover)? Or maybe the motivation was similar to the cases we have here - covering up for failures in LSR?

Either way, we have a real motivating example for correcting instcombine now. If there's not enough will to fix the underlying problems, we can probably justify adding an over-specific match to instsimplify to zap the compare in the rust example.

I've talked to @t.p.northover, and the original motivation is unknown/unreconstructible now.
That test is misleading - it wouldn't be transformed before rL341831 anyway - there are no no-wrap flags on the add there.

spatel added a comment.Dec 2 2019, 9:02 AM

There's a similar one use check for the non-equality case in https://github.com/llvm-mirror/llvm/blob/fb1e04ff3c51a617c4944b2e05e8aa1b8c543e22/lib/Transforms/InstCombine/InstCombineCompares.cpp#L2389. This prevents the trivially true condition in https://rust.godbolt.org/z/qg0RjC from folding. That check was added (or rather moved) in rL341831.

Just wanted to add this as a datapoint for a case where dropping the restriction would be nice.

That should be resolved now, reverted in rG0f22e783a038.

Nice. Is that enough to resolve the motivating bug in the rust example?
I think this patch is still held up until we figure out how to make LSR and/or SCEV reconstruct the original patterns that they are expecting.

There's a similar one use check for the non-equality case in https://github.com/llvm-mirror/llvm/blob/fb1e04ff3c51a617c4944b2e05e8aa1b8c543e22/lib/Transforms/InstCombine/InstCombineCompares.cpp#L2389. This prevents the trivially true condition in https://rust.godbolt.org/z/qg0RjC from folding. That check was added (or rather moved) in rL341831.

Just wanted to add this as a datapoint for a case where dropping the restriction would be nice.

That should be resolved now, reverted in rG0f22e783a038.

Nice. Is that enough to resolve the motivating bug in the rust example?

I don't work on rust, but running that ir through -instcombine says yes.

I think this patch is still held up until we figure out how to make LSR and/or SCEV reconstruct the original patterns that they are expecting.

Yes, i think so.

lebedev.ri resigned from this revision.Jan 12 2023, 4:43 PM

This review seems to be stuck/dead, consider abandoning if no longer relevant.

Herald added a project: Restricted Project. · View Herald TranscriptJan 12 2023, 4:43 PM
spatel abandoned this revision.Jan 13 2023, 7:37 AM

Abandoning - I put a TODO comment on the fold that references this review in case someone wants to revive it.