This is an archive of the discontinued LLVM Phabricator instance.

[Codegenprepare][X86] Use usub with overflow opt for IV increment
ClosedPublic

Authored by mkazantsev on Feb 5 2021, 4:16 AM.

Details

Summary

Function replaceMathCmpWithIntrinsic artificially limits the scope
of the optimization, setting a requirement of two instructions be in
the same block, due to two reasons:

  • usage of DT for more general check is costly in terms of compile time;
  • risk of creating a new value that lives through multiple blocks.

Because of this, two semantically equivalent tests may be or not be the
subject of this opt depending on where the binary operation is located.
See test/CodeGen/X86/usub_inc_iv.ll for motivation

There is one important particular case where this limitation is too strict:
it is when the binary operation is the increment of the induction variable.
As result, the application of this opt becomes fragile and highly reliant on
where other passes decide to place IV increment. In most cases, they place
it in the end of the latch block, killing the opt opportunity (when in fact it
does not matter where to insert the actual instruction).

This patch handles this particular case separately.

  • The detector does not use dom tree and has constant cost;
  • The value of IV or IV.next lives through all loop in any case, so this should not create a new unexpected long-living value.

As result, the transform becomes more robust. It also seems to lead to
better code generation in some cases (see test/CodeGen/X86/lsr-loop-exit-cond.ll).

Diff Detail

Event Timeline

mkazantsev created this revision.Feb 5 2021, 4:16 AM
mkazantsev requested review of this revision.Feb 5 2021, 4:16 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 5 2021, 4:16 AM

This seems fine, but I was curious what kind of interaction we should expect between CGP and global-isel, so I tried to pushing the last test through llc:

$ llc usub.ll -o - -global-isel 
	.section	__TEXT,__text,regular,pure_instructions
	.build_version macos, 10, 15
LLVM ERROR: unable to legalize instruction: %4:_(s64), %5:_(s1) = G_USUBO %3:_, %6:_ (in function: test_02)

My understanding was that CGP is not needed with global-isel, but it's apparently running by default. Do we consider the current state of passes temporary and/or do we need to fix that crash before proceeding here?

mkazantsev added a comment.EditedFeb 7 2021, 8:11 PM

HI Sanjay,

I was able to reproduce this crash on this test even without my code changes. I don't know what is the current status of Global ISel, but apparently it (or the pipeline) already has a bug. I don't see why it should be a blocker for this one if we can see it in current trunc. Let's just file a separate bug.

mkazantsev added a comment.EditedFeb 7 2021, 10:19 PM

I've filed https://bugs.llvm.org/show_bug.cgi?id=49087 to address it and merged relevant test as 0fc1738eb75d613b9e16143b83e7cb80512e84eb. It fails even on the test not affected by the patch on current trunc, so I think it should not be an obstacle on this patch's way.

spatel accepted this revision.Feb 8 2021, 11:38 AM

I've filed https://bugs.llvm.org/show_bug.cgi?id=49087 to address it and merged relevant test as 0fc1738eb75d613b9e16143b83e7cb80512e84eb. It fails even on the test not affected by the patch on current trunc, so I think it should not be an obstacle on this patch's way.

Thanks for filing the bug!
LGTM

This revision is now accepted and ready to land.Feb 8 2021, 11:38 AM
reames added a comment.Feb 8 2021, 9:20 PM

If there is a use of the original phi later in the loop, then performing the transformation does cause the live interval of %iv and %iv.next to overlap when it originally didn't. I suspect this is still profitable, but it means the comments are slightly out of sink with reality. Example pseudo code:
{

%iv = ...
if %iv == 0) break;
use(%iv)
%iv.next = add i32 %iv, 1
continue;

}

I'd suggest tweaking this patch to add the hasOneUse check on the phi. With that change, I'd also LGTM this.

As a separate patch, we can consider relaxing the hasOneUse again, but we'll need slightly different reasoning than taken here.

mkazantsev planned changes to this revision.Feb 9 2021, 4:33 AM

I'd suggest tweaking this patch to add the hasOneUse check on the phi. With that change, I'd also LGTM this.

It won't pass a motivating test. Effectively, it doesn't really create a intersection of ranges of IV and IV.next, even if formally it does. In fact, comparison itself is the point where we compute something equivalent to IV.next. I'll write an elaborate comment explaining this.

Our internal testing has found a functional bug with this patch. I'm putting this on hold while investigating.

Fixed bugs, added tests demonstrating them.

This revision is now accepted and ready to land.Feb 10 2021, 4:30 AM
mkazantsev requested review of this revision.Feb 10 2021, 4:30 AM

I think this might need another round of the review, due to fixed bugs & comment change.

spatel added inline comments.Feb 10 2021, 9:32 AM
llvm/test/CodeGen/X86/usub_inc_iv.ll
130–132

I defer to @reames to continue the review (I don't have a good handle on the IV subtleties), but I think it would be easier to review if the test diffs are pre-committed, so we just see the minimal changes from the patch.

reames accepted this revision.Feb 10 2021, 9:47 AM

Still LGTM, thanks for adding the one use requested restriction and the inner loop case is a good catch too.

I'd agree that landing the new tests, then rebasing over, then committing would make post-commit review and any potential reverts easier to understand. Optional, but encouraged.

This revision is now accepted and ready to land.Feb 10 2021, 9:47 AM

Ok, I'll check in the negative tests separately. Thanks!

Patch reverted. I've missed one more important point: CmpInst should also dominate the latch to make it legal. I need to think how to make it without the actual DT check.