Page MenuHomePhabricator

[CGP] Make ICMP_EQ use CR result of ICMP_S(L|G)T dominators
ClosedPublic

Authored by Yi-Hong.Lyu on Apr 10 2019, 5:05 AM.

Details

Summary

For example:

long long test(long long a, long long b) {
  if (a << b > 0)
    return b;
  if (a << b < 0)
    return a;
  return a*b;
}

Produces:

# %bb.0:                                # %entry
        sld. 5, 3, 4
        ble 0, .LBB0_2
# %bb.1:                                # %return
        mr 3, 4
        blr
.LBB0_2:                                # %if.end
        cmpldi  5, 0
        li 5, 1
        isel 4, 4, 5, 2
        mulld 3, 4, 3
        blr

But the compare (cmpldi 5, 0) is redundant and can be removed (CR0 already contains the result of that comparison).

The root cause of this is that LLVM converts signed comparisons into equality comparison based on dominance. Equality comparisons are unsigned by default, so we get either a record-form or cmp (without the l for logical) feeding a cmpl. That is the situation we want to avoid here.

Diff Detail

Event Timeline

Yi-Hong.Lyu created this revision.Apr 10 2019, 5:05 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 10 2019, 5:05 AM
Yi-Hong.Lyu retitled this revision from [PowerPC] Make ICMP_EQ use CR result of ICMP_S(L|G)T dominators to [CGP] Make ICMP_EQ use CR result of ICMP_S(L|G)T dominators.Apr 11 2019, 8:43 AM
Yi-Hong.Lyu added a reviewer: spatel.
nemanjai requested changes to this revision.Apr 23 2019, 5:46 AM

Please reduce the test case only to cases that are changing behaviour. Also, it is probably useful to commit the test case initially as an NFC patch so we can see the actual difference in codegen.

llvm/lib/CodeGen/CodeGenPrepare.cpp
1415

I think it would be very helpful to comment the code in this function. Specifically, it's good to state the pattern(s) you're looking for. Something like:

Block A:
icmp slt i32 %1, 100
...
Block B (with A being the single predecessor):
icmp eq i32 %1, 100 ; Equality is exactly opposite to SGT.
...
1420

Why do we restrict this to EQ rather than also supporting NE? Is this just the canonical form and NE never shows up?

1423

More descriptive names would probably aid readability - perhaps CmpOp1/CmpOp2.

1425

Why does this need to be restricted to constant comparisons? Could we not equivalently handle comparisons between two virtual registers?

1433

We could probably handle logical operations just as easily, couldn't we?

1452

Nit: naming convention - if you have other violations of CamelCase variable naming, please fix those as well.

llvm/test/CodeGen/PowerPC/use-cr-result-of-dom-icmp-st.ll
36

This is already in the form you want it to be in, so it doesn't really test anything this pass does.

This revision now requires changes to proceed.Apr 23 2019, 5:46 AM
Yi-Hong.Lyu marked 3 inline comments as done.May 8 2019, 6:52 AM
Yi-Hong.Lyu added inline comments.
llvm/lib/CodeGen/CodeGenPrepare.cpp
1420

The pattern we observed that could be optimized is something like:

Block A:
icmp slt i32 %1, 100
...
Block B (with A being the single predecessor):
icmp eq i32 %1, 100 ; Equality is exactly opposite to SGT.
...

icmp ne i32 %1, 100 is not supposed to be there, otherwise it would be a redundant LLVM IR.

1425

Well. I think we might be able to handle comparisons between two virtual registers equivalently. However I cannot generate test cases for it. I have tried:

long long test1(long long a, long long b, long long c) {
  if (a << b > c)
    return b;
  if (a << b < c)
    return a;
  return a * b;
}

Here is the generated LLVM IR:

; Function Attrs: norecurse nounwind readnone
define dso_local i64 @_Z5test1xxx(i64 %a, i64 %b, i64 %c) local_unnamed_addr #0 {
entry:
  %shl = shl i64 %a, %b
  %cmp = icmp sgt i64 %shl, %c
  br i1 %cmp, label %return, label %if.end

if.end:                                           ; preds = %entry
  %cmp2 = icmp slt i64 %shl, %c
  %mul = select i1 %cmp2, i64 1, i64 %b
  %spec.select = mul nsw i64 %mul, %a
  ret i64 %spec.select

return:                                           ; preds = %entry
  ret i64 %b
}

Apparently, it is not the test case we want. Do you have any test cases in mind?

1433

For other logical operations, we need to introduce a not instruction. It would add more LLVM IR and that might produce more instructions, which is we would like to avoid.

Change according to comments

Yi-Hong.Lyu marked 7 inline comments as done.May 9 2019, 5:15 AM

Rebase the master (NFC testcases merged). Since it is a rebased one, please see the latest commit for review

nemanjai added inline comments.May 16 2019, 2:43 PM
llvm/lib/CodeGen/CodeGenPrepare.cpp
1420

I don't follow why this would be redundant IR.

BlockA:
%cmp1 = icmp slt i32 %1, 100
br i1 %cmp1, BlockC, BlockB

BlockB: ; %1 is signed-greater-than-or-equal-to 100
%cmp2 = icmp ne i32 %1, 100
br i1 %cmp2, BlockD, BlockE
...
BlockC: ; %1 is signed-less-than 100
...
BlockD: ; %1 is signed-greater-than 100
...
BlockE: ; %1 is equal to 100

Of course, if the EQ case is the canonical form, that's a different statement, but I don't think NE is automatically redundant.

Yi-Hong.Lyu marked an inline comment as done.May 17 2019, 10:36 AM
Yi-Hong.Lyu added inline comments.
llvm/lib/CodeGen/CodeGenPrepare.cpp
1420

After some experiments, I think EQ is canonical form and NE never shows up. I change your pseudo LLVM IR a little bit to make it work:

; ModuleID = '<stdin>'
source_filename = "<stdin>"
target datalayout = "e-m:e-i64:64-n32:64"
target triple = "powerpc64le-unknown-linux-gnu"

define i64 @icmp_ne(i64 %a, i64 %b) {
BlockA:
  %shl = shl i64 %a, %b
  %cmp1 = icmp slt i64 %shl, 1
  br i1 %cmp1, label %BlockC, label %BlockB

BlockB: ; %1 is signed-greater-than-or-equal-to 1
  %cmp2 = icmp ne i64 %shl, 1
  br i1 %cmp2, label %BlockD, label %BlockE

BlockC: ; %1 is signed-less-than 1
  ret i64 %a

BlockD: ; %1 is signed-greater-than 1
  ret i64 %b

BlockE: ; %1 is equal to 1
  %mul = mul nsw i64 %a, %b
  ret i64 %mul
}

However, the NE would be transformed into EQ as:

$ opt -instcombine -S < icmp_ne.ll -o -
; ModuleID = '<stdin>'
source_filename = "<stdin>"
target datalayout = "e-m:e-i64:64-n32:64"
target triple = "powerpc64le-unknown-linux-gnu"

define i64 @icmp_ne(i64 %a, i64 %b) {
BlockA:
  %shl = shl i64 %a, %b
  %cmp1 = icmp slt i64 %shl, 1
  br i1 %cmp1, label %BlockC, label %BlockB

BlockB:                                           ; preds = %BlockA
  %cmp2 = icmp eq i64 %shl, 1
  br i1 %cmp2, label %BlockE, label %BlockD

BlockC:                                           ; preds = %BlockA
  ret i64 %a

BlockD:                                           ; preds = %BlockB
  ret i64 %b

BlockE:                                           ; preds = %BlockB
  %mul = mul nsw i64 %a, %b
  ret i64 %mul
}
Yi-Hong.Lyu marked an inline comment as done.May 18 2019, 5:29 AM

Relax the restriction so we can handle comparisons between two virtual registers equivalently. I rebase the patch to TOT so history would not work. Please review this commit directly.

lei added inline comments.Sep 12 2019, 4:41 PM
llvm/include/llvm/CodeGen/TargetLowering.h
476

nit: missing . at the end of the sentence.

llvm/lib/CodeGen/CodeGenPrepare.cpp
1415

Please add a brief description for this function and some inline documentation so we can easily follow the logic.

llvm/lib/Target/PowerPC/PPCISelLowering.h
653

follow previous conventions and put return false; in the next line.

lebedev.ri added inline comments.
llvm/include/llvm/CodeGen/TargetLowering.h
477

Please add more tests for other arches - x86, aarch64 at least

Yi-Hong.Lyu updated this revision to Diff 221979.EditedSep 26 2019, 10:26 AM
Yi-Hong.Lyu marked an inline comment as done.
  1. Address Lei's comments
  2. Add tests for other arches - X86, AArch64 by following Roman 's comments
Yi-Hong.Lyu marked 2 inline comments as done.Sep 26 2019, 10:27 AM

Add the option -cgp-icmp-eq2icmp-st to enable transformation on AArch64/X86 tests (complete addressing Roman's comments)

Yi-Hong.Lyu marked an inline comment as done.Sep 26 2019, 10:33 AM

To see how my transformation affect AArch64 and X86 tests. Please use History to compare Diff 5 (221979) and Diff 6 (221983)

Please precommit all the tests and rebase the patch.

And yes, please pre-commit all the tests as NFC so we can see just the code-gen differences.

llvm/include/llvm/CodeGen/TargetLowering.h
477

I think a more descriptive name would be:
isEqualityCmpFoldedWithSignedCmp()

Test cases are posted here https://reviews.llvm.org/D68534. I would need approval to commit the NFC patch.

Yi-Hong.Lyu updated this revision to Diff 223451.EditedOct 6 2019, 11:42 PM
Yi-Hong.Lyu added a reviewer: lebedev.ri.

Pushed https://reviews.llvm.org/D68534. Rebase the patch to TOT and address Roman and Nemanja' comments

Yi-Hong.Lyu marked an inline comment as done.Oct 6 2019, 11:44 PM

Some thoughts

llvm/lib/CodeGen/CodeGenPrepare.cpp
226

What's the plan with defaulting to false?

1440

What about ICmpInst::ICMP_NE?

1443

s/not only used by/has users other than/ s/or/and/

1443–1452

You want to use canFreelyInvertAllUsersOf() i think.

1464–1465

Is this guaranteed, or are some commutative cases being missed?

1474–1475

ICmpInst::Predicate NewPred = CmpInst::getSwappedPredicate(DomPred);

1485–1489

SI->swapValues();

Yi-Hong.Lyu marked 9 inline comments as done.

Address Roman's comments

Yi-Hong.Lyu marked 2 inline comments as done.Oct 23 2019, 1:02 PM
Yi-Hong.Lyu added inline comments.
llvm/lib/CodeGen/CodeGenPrepare.cpp
226

We indeed know it remove comparison on PowerPC but we don't know whether it has negative impacts for other targets. We disable it by default and let other target to decide whether to turn it on.

1440

Please see discussion in https://reviews.llvm.org/D60506?id=194492#inline-551643. In short, EQ is canonical form and NE never shows up.

1443–1452

canFreelyInvertAllUsersOf is internal to llvm/lib/Transforms/InstCombine/InstCombineInternal.h that I cannot include and use. Should I submit a NFC patch to move it to some header so I can invoke it?

1464–1465

It is guaranteed and I am not aware of any commutative cases being missed. Without this check, some pattern like

///   DomCond = icmp sgt/slt CmpOp0, CmpOp1 (might not be in DomBB)
///   ...
/// DomBB:
///   ...
///   br DomCond, CmpBB, FalseBB
/// CmpBB: (with DomBB being the single predecessor)
///   ...
///   Cmp = icmp eq CmpOp0, CmpOp1

would continue execution. Apparently the Cmp = icmp eq CmpOp0, CmpOp1 would be always false (i.e., redundant) and it is not the pattern we are looking for.

Yi-Hong.Lyu marked an inline comment as done.Oct 23 2019, 1:05 PM
nemanjai accepted this revision.Nov 1 2019, 4:17 AM

LGTM with some minor nits related to comments/usage addressed. I personally don't have an issue with you addressing this on the commit, but give it until early next week in case someone else speaks up regarding wanting another review.

llvm/lib/CodeGen/CodeGenPrepare.cpp
1446

Nit: the sequence isa<>; cast<> is discouraged. The preferred way is to dyn_cast and the subsequent checks are done if the return value is non-null.
Also, is the check for isConditional() actually needed? Can an unconditional branch use the result of a compare?

1460

I think a comment before this section along the following lines would help readability:

// We want to ensure that the only way control gets to the comparison
// of interest is that a less/greater than comparison on the same operands
// is false.
1474

Also a comment here would be nice. Something like:

// Convert the equality comparison to the opposite of the dominating
// comparison and swap the direction for all branch/select users.
// We have conceptually converted:
// Res = (a < b) ? <LT_RES> : (a == b) ? <EQ_RES> : <GT_RES>;
// to
// Res = (a < b) ? <LT_RES> : (a > b)  ? <GT_RES> : <EQ_RES>;
// And similarly for branches.
This revision is now accepted and ready to land.Nov 1 2019, 4:17 AM
This revision was automatically updated to reflect the committed changes.