This is an archive of the discontinued LLVM Phabricator instance.

Differential D140647

Handle simple diamond CFG hoisting in DivRemPairs.
ClosedPublic

Authored by resistor on Dec 23 2022, 8:26 PM.

Details

Reviewers

nikic
lebedev.ri

Commits

rG88e85aa58006: Handle simple diamond CFG hoisting in DivRemPairs.

Summary

Previously we only handled triangle CFGs. This patch expands that
to support diamonds, where the div and rem appear in the then/else
sides of a condition. In that case, we can hoist the div into the
shared predecessor.

This could be generalized further to use nearest common ancestors,
but some of the conditions for hoisting would then require
post-dominator information.
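
For readers less familiar with the pass, the diamond case described in the summary can be pictured at source level. The following is a minimal C++ sketch, not code from the patch; the function names `diamondBefore` and `diamondAfter` are made up for illustration, and the hoisted form assumes a target with a combined div-rem operation.

```cpp
#include <cassert>

// Before: the div and the rem sit on opposite arms of a diamond, so only one
// of them executes on any given path.
int diamondBefore(bool cond, int a, int b) {
  assert(b != 0 && "division by zero is undefined");
  if (cond)
    return a / b;
  return a % b;
}

// After: both operations are hoisted into the shared predecessor. On a target
// with a combined div-rem instruction they can be computed together.
int diamondAfter(bool cond, int a, int b) {
  assert(b != 0 && "division by zero is undefined");
  int quot = a / b;
  int rem = a % b;
  return cond ? quot : rem;
}
```

Both functions return the same value for every input with a nonzero divisor; the difference is only where the division is issued.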

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

resistor created this revision. Dec 23 2022, 8:26 PM

Herald added a project: Restricted Project. Dec 23 2022, 8:26 PM

Herald added subscribers: atanasyan, jrtc27, hiraditya and 2 others.

resistor requested review of this revision. Dec 23 2022, 8:26 PM

Herald added a subscriber: llvm-commits.

Harbormaster completed remote builds in B204835: Diff 485184. Dec 23 2022, 10:01 PM

llvm/lib/Transforms/Scalar/DivRemPairs.cpp
278

This revision is now accepted and ready to land. Dec 24 2022, 8:53 AM

The changes in the tests actually show that this transformation is unprofitable.
Unlike the case above (where Div postdominates both PredBB and Rem), hoisting the Div ~~increases~~ may increase critical path length.

Hm, could you provide some more information on what the motivation for this transform is? If we don't have a domination relationship, then we won't be able to reuse a result we have to compute anyway, so is the intention here a code size improvement for the case where a divrem op exists, because we now only need to materialize the instruction once? In that case though, does this transform make sense if we are going to use expanded form, as in the PowerPC example? We are replacing a div/rem on disjoint code paths with a common div plus mul+sub on one code path, which seems strictly worse?

By hoisting div/rem that would be executed anyway,
we allow it to be issued earlier than it would have,
and assuming that the inputs were already ready,
the instructions that use its result may have to wait
less for said result to be available.

In D140647#4016068, @barannikov88 wrote:

The changes in the tests actually show that this transformation is unprofitable.
Unlike the case above (where Div postdominates both PredBB and Rem), hoisting the Div ~~increases~~ may increase critical path length.

Div/rem are usually *extremely* slow, so that code was in a bad spot already.

In D140647#4016069, @nikic wrote:

Hm, could you provide some more information on what the motivation for this transform is?

If we don't have a domination relationship, then we won't be able to reuse a result we have to compute anyway

I don't follow at all. We clearly dominate both of the source blocks?

so is the intention here a code size improvement for the case where a divrem op exists,
because we now only need to materialize the instruction once? In that case though,
does this transform make sense if we are going to use expanded form, as in the PowerPC example?
We are replacing a div/rem on disjoint code paths with a common div plus mul+sub on one code path, which seems strictly worse?

Div/rem are usually *extremely* slow, so that code was in a bad spot already.

And we're going to make it twice as bad by hoisting the Div that might not be executed at all in the original case.
I.e. in the original case we execute either div or rem, but not both. After the transformation we execute div unconditionally, and additionally mul+sub in one of the cases.

I could agree that the latency issue may prevail in the case when rem is guaranteed to be expanded into div+mul+sub (and not into a libcall, for example).
Otherwise we're only going to increase the latency on the rem path.
However, this case should already be handled by MachineCSE, which has more context information such as register pressure.
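
The cost argument above rests on how a rem is expanded when the target has no div-rem operation. A minimal sketch of that expansion, assuming plain C++ truncating division; `remExpanded` and `hoistedWithoutDivRem` are hypothetical names, not part of DivRemPairs:

```cpp
#include <cassert>

// Rem rewritten as one division followed by a multiply and a subtract.
int remExpanded(int a, int b) {
  assert(b != 0 && "division by zero is undefined");
  int quot = a / b;
  return a - quot * b;
}

// Hoisted diamond on a target without div-rem: the division now runs on both
// paths, and the rem path additionally pays for the mul + sub.
int hoistedWithoutDivRem(bool wantDiv, int a, int b) {
  assert(b != 0 && "division by zero is undefined");
  int quot = a / b;      // executed unconditionally after hoisting
  if (wantDiv)
    return quot;
  return a - quot * b;   // extra work that only the rem path performs
}
```

In the unhoisted form each path executes exactly one of div or rem, which is the comparison the comment above is drawing.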

May I suggest that reviewers first familiarize themselves
with the code they are reviewing, in particular with the
preconditions on the transformations,
in particular with the hasDivRemOp() TTI hook?
Hint: https://godbolt.org/z/K3dcxqEhE

craig.topper added a subscriber: craig.topper. Dec 24 2022, 12:17 PM

craig.topper added inline comments.

llvm/test/Transforms/DivRemPairs/X86/div-expanded-rem-pair.ll
181	Isn't this the same as T3? Doesn't entry dominate this block?

lebedev.ri added inline comments. Dec 24 2022, 12:26 PM

llvm/test/Transforms/DivRemPairs/X86/div-expanded-rem-pair.ll
181	We only deal with a single div-rem pair (in the sense, we don't deal with the case of multiple identical div's or rem's), and we don't CSE div's/rem's, and only do a single pass over the function, so this indeed falls through cracks.

In D140647#4016094, @lebedev.ri wrote:

May I suggest that reviewers first familiarize themselves
with the code they are reviewing, in particular with the
preconditions on the transformations,
in particular with the hasDivRemOp() TTI hook?
Hint: https://godbolt.org/z/K3dcxqEhE

The preconditions say:

// If the target supports div+rem and the instructions are in the same block
// already, there's nothing to do. The backend should handle this. If the
// target does not support div+rem, then we will decompose the rem.
if (HasDivRemOp && RemInst->getParent() == DivInst->getParent())
  continue;

So your example is irrelevant.

The relevant example is: https://godbolt.org/z/roGW5G54f
As you can see, in the case 'before' there are 3 instructions on each path.
In the case 'after' (which corresponds to the suggested transformation), the 'div' path contains 4 instructions and the 'rem' path contains 6 instructions.

I'm guessing what other reviewers are trying to say,
is that we might be missing something to the effect of:

if(DivBB != PredBB && RemBB != PredBB && !HasDivRemOp)
  continue; // Don't hoist both into predecessor if we don't have divrem instruction.

But it's not obvious to me that it's actually worse,
because instruction counting is not really the right way
to gauge code performance.

Div/Rem is usually *really* that bad. We move a really, really
costly instruction from both branches to their predecessor:

- if the branch is mispredicted, we won't have to recompute it
- if the inputs are already available, it can start executing earlier than it would have in the branches, thus we hide some of its latency
- in turn, instructions that depend on results of those instructions may get results earlier, and start executing earlier

In D140647#4016111, @lebedev.ri wrote:

I'm guessing what other reviewers are trying to say,
is that we might be missing something to the effect of:

if(DivBB != PredBB && RemBB != PredBB && !HasDivRemOp)
  continue; // Don't hoist both into predecessor if we don't have divrem instruction.

But it's not obvious to me that it's actually worse,
because instruction counting is not really the right way
to gauge code performance.

Div/Rem is usually *really* that bad. We move a really, really
costly instruction from both branches to their predecessor:

- if the branch is mispredicted, we won't have to recompute it
- if the inputs are already available, it can start executing earlier than it would have in the branches, thus we hide some of its latency
- in turn, instructions that depend on results of those instructions may get results earlier, and start executing earlier

Continuing with Sergei's example, the transformation *is* profitable from a code size and latency perspective on x86: https://godbolt.org/z/xo7P1MnGr
It seems like the simplest path forward here is to conditionalize the transformation as you suggest. I will look into this.

In D140647#4016069, @nikic wrote:

Hm, could you provide some more information on what the motivation for this transform is? If we don't have a domination relationship, then we won't be able to reuse a result we have to compute anyway, so is the intention here a code size improvement for the case where a divrem op exists, because we now only need to materialize the instruction once? In that case though, does this transform make sense if we are going to use expanded form, as in the PowerPC example? We are replacing a div/rem on disjoint code paths with a common div plus mul+sub on one code path, which seems strictly worse?

Note that there is actually a domination requirement, just not tested via dominator tree. See this example on x86 that shows where this is profitable for code size: https://godbolt.org/z/xo7P1MnGr

In D140647#4016111, @lebedev.ri wrote:

if(DivBB != PredBB && RemBB != PredBB && !HasDivRemOp)
  continue; // Don't hoist both into predecessor if we don't have divrem instruction.

That sounds fine to me, too.
@resistor
Could you please "port" PPC's @no_domination test to a couple of more targets to make sure there are no regressions? E.g. AArch64 (no rem), RISCV (rem, but no divrem) and MSP430 (libcall rem, no divrem)?

Conditionalize the transformation on HasDivRemOp and add testcases.

Herald added subscribers: pcwang-thead, frasercrmck, luismarques and 19 others. Dec 24 2022, 8:40 PM

Harbormaster completed remote builds in B204860: Diff 485215. Dec 24 2022, 9:20 PM

Thanks for the update. I have two more minor suggestions:

1. I believe we are missing negative tests for the case where either a) the div or rem block don't have a unique predecessor or b) the common predecessor has other successors. (Please let me know if I just missed them, there's a lot of test files...)

2. I would suggest a code comment on why we are doing this transform, because the motivation here is somewhat different than for all the other cases this pass handles (in other cases, we end up not executing a div/rem operation on some code path, while here we have a scheduling / code size optimization).

In addition to that, I believe this transform has multiple pre-existing correctness issues. These are kind of orthogonal to your patch, but I noticed them while reviewing the code:

- If the terminator of the predecessor is not guaranteed to transfer, then we might hoist a dynamically dead div/rem, causing undefined behavior. In particular, this can happen with an invoke of a non-willreturn function.

- If the predecessor is a catchswitch pad, it is not legal to hoist into it.

- With the pending changes to callbr semantics (https://reviews.llvm.org/D135997), if callbr defines one of the div/rem operands, it may not be available after hoisting. This is not a problem yet though, so this is mostly something for @nickdesaulniers to keep an eye on. This might be a problem in other transforms as well.
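
The shape conditions behind the two requested negative tests can be modelled on a toy CFG. This is only a sketch: `Block`, `addEdge`, `uniquePred` and `isHoistableDiamond` are hypothetical stand-ins rather than LLVM APIs, `uniquePred` is simpler than LLVM's `getUniquePredecessor()` (it does not collapse duplicate edges), and the transfer-of-execution and catchswitch checks discussed above are deliberately left out.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for a basic block; not LLVM's BasicBlock.
struct Block {
  std::vector<const Block *> preds;
  std::vector<const Block *> succs;
};

// Records a CFG edge From -> To on both endpoints.
void addEdge(Block &From, Block &To) {
  From.succs.push_back(&To);
  To.preds.push_back(&From);
}

// Returns the only predecessor of B, or nullptr if it has none or several.
const Block *uniquePred(const Block &B) {
  return B.preds.size() == 1 ? B.preds.front() : nullptr;
}

// True when DivBB and RemBB form the two arms of a simple diamond below one
// shared predecessor and the target has a combined div-rem operation.
bool isHoistableDiamond(const Block &DivBB, const Block &RemBB,
                        bool HasDivRemOp) {
  if (!HasDivRemOp)
    return false;
  const Block *Pred = uniquePred(RemBB);
  if (!Pred || Pred != uniquePred(DivBB))
    return false;
  // Every path out of the predecessor must reach one of the two arms.
  return std::all_of(Pred->succs.begin(), Pred->succs.end(),
                     [&](const Block *S) {
                       return S == &DivBB || S == &RemBB;
                     });
}
```

A predecessor with a third successor, or an arm with a second incoming edge, makes the check fail, which corresponds to cases a) and b) in the first suggestion.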

Add negative tests. Enhance comments to explain the profitability heuristic.
Attempt to fix pre-existing correctness concerns WRT non-transferring predecessors.

Harbormaster completed remote builds in B204898: Diff 485265. Dec 25 2022, 9:23 PM

Can you please add these additional tests for the invoke/catchswitch cases?

declare void @dummy()

define i32 @invoke_not_willreturn(i32 %a, i32 %b) personality ptr null {
; CHECK-LABEL: @invoke_not_willreturn(
; CHECK-NEXT:  entry:
; CHECK-NEXT:    invoke void @dummy()
; CHECK-NEXT:    to label [[CONT:%.*]] unwind label [[LPAD:%.*]]
; CHECK:       cont:
; CHECK-NEXT:    [[DIV:%.*]] = sdiv i32 [[A:%.*]], [[B:%.*]]
; CHECK-NEXT:    ret i32 [[DIV]]
; CHECK:       lpad:
; CHECK-NEXT:    [[TMP0:%.*]] = landingpad { ptr, i32 }
; CHECK-NEXT:    cleanup
; CHECK-NEXT:    [[REM:%.*]] = srem i32 [[A]], [[B]]
; CHECK-NEXT:    ret i32 [[REM]]
;
entry:
  invoke void @dummy()
  to label %cont unwind label %lpad

cont:
  %div = sdiv i32 %a, %b
  ret i32 %div

lpad:
  landingpad { ptr, i32 }
  cleanup
  %rem = srem i32 %a, %b
  ret i32 %rem
}

define i32 @invoke_willreturn(i32 %a, i32 %b) personality ptr null {
; CHECK-LABEL: @invoke_willreturn(
; CHECK-NEXT:  entry:
; CHECK-NEXT:    [[DIV:%.*]] = sdiv i32 [[A:%.*]], [[B:%.*]]
; CHECK-NEXT:    [[REM:%.*]] = srem i32 [[A]], [[B]]
; CHECK-NEXT:    invoke void @dummy() #[[ATTR0:[0-9]+]]
; CHECK-NEXT:    to label [[CONT:%.*]] unwind label [[LPAD:%.*]]
; CHECK:       cont:
; CHECK-NEXT:    ret i32 [[DIV]]
; CHECK:       lpad:
; CHECK-NEXT:    [[TMP0:%.*]] = landingpad { ptr, i32 }
; CHECK-NEXT:    cleanup
; CHECK-NEXT:    ret i32 [[REM]]
;
entry:
  invoke void @dummy() willreturn
  to label %cont unwind label %lpad

cont:
  %div = sdiv i32 %a, %b
  ret i32 %div

lpad:
  landingpad { ptr, i32 }
  cleanup
  %rem = srem i32 %a, %b
  ret i32 %rem
}

; Use this personality function so that catchpad is guaranteed to transfer.
declare void @ProcessCLRException()

define i32 @catchswitch(i32 %a, i32 %b) personality ptr @ProcessCLRException {
; CHECK-LABEL: @catchswitch(
; CHECK-NEXT:  entry:
; CHECK-NEXT:    invoke void @dummy() #[[ATTR0]]
; CHECK-NEXT:    to label [[CONT:%.*]] unwind label [[LPAD:%.*]]
; CHECK:       cont:
; CHECK-NEXT:    ret i32 0
; CHECK:       lpad:
; CHECK-NEXT:    [[CS:%.*]] = catchswitch within none [label %cp] unwind label [[LPAD_END:%.*]]
; CHECK:       cp:
; CHECK-NEXT:    [[TMP0:%.*]] = catchpad within [[CS]] []
; CHECK-NEXT:    [[DIV:%.*]] = sdiv i32 [[A:%.*]], [[B:%.*]]
; CHECK-NEXT:    ret i32 [[DIV]]
; CHECK:       lpad.end:
; CHECK-NEXT:    [[TMP1:%.*]] = cleanuppad within none []
; CHECK-NEXT:    [[REM:%.*]] = srem i32 [[A]], [[B]]
; CHECK-NEXT:    ret i32 [[REM]]
;
entry:
  invoke void @dummy() willreturn
  to label %cont unwind label %lpad

cont:
  ret i32 0

lpad:
  %cs = catchswitch within none [label %cp] unwind label %lpad.end

cp:
  catchpad within %cs []
  %div = sdiv i32 %a, %b
  ret i32 %div

lpad.end:
  cleanuppad within none []
  %rem = srem i32 %a, %b
  ret i32 %rem
}

The catchswitch exception was a bit tricky to come up with, because we need both the right personality function and no unwind to caller, otherwise we will fail transfer checks already.

llvm/lib/Transforms/Scalar/DivRemPairs.cpp
284	"extra Div" -> Mul + Sub? I don't think we'd perform an actual div.
285	nit: `BasicBlock *`
287	nit: No braces for single-line if.
294	Hm, this is stricter than what is needed, most exception pads are fine. It looks like we don't really have a dedicated method for this, so just checking `!isa<CatchSwitchInst>(PredBB->getTerminator())` should be fine.
llvm/test/Transforms/DivRemPairs/X86/div-rem-pairs.ll
411	I don't think this tests the right case, because the udiv/urem ops are not the same. We can't have the ops be derived from a phi.

Update based on feedback.

resistor marked 5 inline comments as done. Dec 27 2022, 9:10 PM

Harbormaster completed remote builds in B205046: Diff 485456. Dec 27 2022, 9:57 PM

LGTM, thanks!

llvm/lib/Transforms/Scalar/DivRemPairs.cpp
278	nit: and and

Fix typo.

This revision was landed with ongoing or failed builds. Dec 28 2022, 10:24 AM

Closed by commit rG88e85aa58006: Handle simple diamond CFG hoisting in DivRemPairs. (authored by resistor).

This revision was automatically updated to reflect the committed changes.

resistor added a commit: rG88e85aa58006: Handle simple diamond CFG hoisting in DivRemPairs.

Harbormaster completed remote builds in B205087: Diff 485519. Dec 28 2022, 11:37 AM

Revision Contents

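Path                                                            Size
llvm/lib/Transforms/Scalar/DivRemPairs.cpp                      28 lines
llvm/test/Transforms/DivRemPairs/MSP430/div-rem-pairs.ll        35 lines
llvm/test/Transforms/DivRemPairs/Mips/div-rem-pairs.ll          7 lines
llvm/test/Transforms/DivRemPairs/PowerPC/div-rem-pairs.ll       4 lines
llvm/test/Transforms/DivRemPairs/RISCV/div-rem-pairs.ll         35 lines
llvm/test/Transforms/DivRemPairs/X86/div-expanded-rem-pair.ll   6 lines
llvm/test/Transforms/DivRemPairs/X86/div-rem-pairs.ll           232 lines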

Diff 485521

llvm/lib/Transforms/Scalar/DivRemPairs.cpp

[...] context: if (!DivDominates && !DT.dominates(RemInst, DivInst)) {

       //   |  /
       //  Div
       //
       // If the Rem block has a single predecessor and successor, and all paths
       // from PredBB go to either RemBB or DivBB, and execution of RemBB and
       // DivBB will always reach the Div/Rem, we can hoist Div to PredBB. If
       // we have a DivRem operation we can also hoist Rem. Otherwise we'll leave
       // Rem where it is and rewrite it to mul/sub.
-      // FIXME: We could handle more hoisting cases.
-      if (RemBB->getSingleSuccessor() == DivBB)
+      if (RemBB->getSingleSuccessor() == DivBB) {
         PredBB = RemBB->getUniquePredecessor();

-      if (PredBB && IsSafeToHoist(RemInst, RemBB) &&
-          IsSafeToHoist(DivInst, DivBB) &&
+        // Look for something like this
+        //     PredBB
+        //     /    \
+        //   Div   Rem
+        // If the Rem and Din blocks share a unique predecessor, and all
             lebedev.ri (Not Done), suggested edit:
               - // If the Rem and Din blocks share a unique predecessor, and and all paths
               + // If the Rem and Div blocks share a unique predecessor, and and all paths
             nikic (Not Done): nit: and and
+        // paths from PredBB go to either RemBB or DivBB, and execution of RemBB
+        // and DivBB will always reach the Div/Rem, we can hoist Div to PredBB.
+        // If we have a DivRem operation we can also hoist Rem. By hoisting both
+        // ops to the same block, we reduce code size and allow the DivRem to
+        // issue sooner. Without a DivRem op, this transformation is
+        // unprofitable because we would end up performing an extra Mul+Sub on
             nikic (Done): "extra Div" -> Mul + Sub? I don't think we'd perform an actual div.
+        // the Rem path.
             nikic (Done): nit: `BasicBlock *`
+      } else if (BasicBlock *RemPredBB = RemBB->getUniquePredecessor()) {
+        // This hoist is only profitable when the target has a DivRem op.
             nikic (Done): nit: No braces for single-line if.
+        if (HasDivRemOp && RemPredBB == DivBB->getUniquePredecessor())
+          PredBB = RemPredBB;
+      }
+      // FIXME: We could handle more hoisting cases.
+
+      if (PredBB && !isa<CatchSwitchInst>(PredBB->getTerminator()) &&
+          isGuaranteedToTransferExecutionToSuccessor(PredBB->getTerminator()) &&
             nikic (Done): Hm, this is stricter than what is needed, most exception pads are fine. It looks like we don't really have a dedicated method for this, so just checking !isa<CatchSwitchInst>(PredBB->getTerminator()) should be fine.
+          IsSafeToHoist(RemInst, RemBB) && IsSafeToHoist(DivInst, DivBB) &&
           all_of(successors(PredBB),
                  [&](BasicBlock *BB) { return BB == DivBB || BB == RemBB; }) &&
           all_of(predecessors(DivBB),
                  [&](BasicBlock *BB) { return BB == RemBB || BB == PredBB; })) {
         DivDominates = true;
         DivInst->moveBefore(PredBB->getTerminator());
         Changed = true;
         if (HasDivRemOp) {

[...]

llvm/test/Transforms/DivRemPairs/MSP430/div-rem-pairs.ll

This file was added.

; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
; RUN: opt < %s -passes=div-rem-pairs -S -mtriple=msp430-unknown-unknown | FileCheck %s

; Do not hoist to the common predecessor block since we don't
; have a div-rem operation.

define i32 @no_domination(i1 %cmp, i32 %a, i32 %b) {
; CHECK-LABEL: @no_domination(
; CHECK-NEXT:  entry:
; CHECK-NEXT:    br i1 [[CMP:%.*]], label [[IF:%.*]], label [[ELSE:%.*]]
; CHECK:       if:
; CHECK-NEXT:    [[DIV:%.*]] = sdiv i32 [[A:%.*]], [[B:%.*]]
; CHECK-NEXT:    br label [[END:%.*]]
; CHECK:       else:
; CHECK-NEXT:    [[REM:%.*]] = srem i32 [[A]], [[B]]
; CHECK-NEXT:    br label [[END]]
; CHECK:       end:
; CHECK-NEXT:    [[RET:%.*]] = phi i32 [ [[DIV]], [[IF]] ], [ [[REM]], [[ELSE]] ]
; CHECK-NEXT:    ret i32 [[RET]]
;
entry:
  br i1 %cmp, label %if, label %else

if:
  %div = sdiv i32 %a, %b
  br label %end

else:
  %rem = srem i32 %a, %b
  br label %end

end:
  %ret = phi i32 [ %div, %if ], [ %rem, %else ]
  ret i32 %ret
}

llvm/test/Transforms/DivRemPairs/Mips/div-rem-pairs.ll

[...] context: if:
   %rem = urem i128 %a, %b
   br label %end

 end:
   %ret = phi i128 [ %rem, %if ], [ 3, %entry ]
   ret i128 %ret
 }

-; We don't hoist if one op does not dominate the other,
-; but we could hoist both ops to the common predecessor block?
+; Hoist both ops to the common predecessor block.

 define i32 @no_domination(i1 %cmp, i32 %a, i32 %b) {
 ; CHECK-LABEL: @no_domination(
 ; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[DIV:%.*]] = sdiv i32 [[A:%.*]], [[B:%.*]]
+; CHECK-NEXT:    [[REM:%.*]] = srem i32 [[A]], [[B]]
 ; CHECK-NEXT:    br i1 [[CMP:%.*]], label [[IF:%.*]], label [[ELSE:%.*]]
 ; CHECK:       if:
-; CHECK-NEXT:    [[DIV:%.*]] = sdiv i32 [[A:%.*]], [[B:%.*]]
 ; CHECK-NEXT:    br label [[END:%.*]]
 ; CHECK:       else:
-; CHECK-NEXT:    [[REM:%.*]] = srem i32 [[A]], [[B]]
 ; CHECK-NEXT:    br label [[END]]
 ; CHECK:       end:
 ; CHECK-NEXT:    [[RET:%.*]] = phi i32 [ [[DIV]], [[IF]] ], [ [[REM]], [[ELSE]] ]
 ; CHECK-NEXT:    ret i32 [[RET]]
 ;
 entry:
   br i1 %cmp, label %if, label %else

[...]

llvm/test/Transforms/DivRemPairs/PowerPC/div-rem-pairs.ll

[...] context: if:
   %rem = urem i128 %a, %b
   br label %end

 end:
   %ret = phi i128 [ %rem, %if ], [ 3, %entry ]
   ret i128 %ret
 }

-; We don't hoist if one op does not dominate the other,
-; but we could hoist both ops to the common predecessor block?
+; Do not hoist to the common predecessor block since we don't
+; have a div-rem operation.

 define i32 @no_domination(i1 %cmp, i32 %a, i32 %b) {
 ; CHECK-LABEL: @no_domination(
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    br i1 [[CMP:%.*]], label [[IF:%.*]], label [[ELSE:%.*]]
 ; CHECK:       if:
 ; CHECK-NEXT:    [[DIV:%.*]] = sdiv i32 [[A:%.*]], [[B:%.*]]
 ; CHECK-NEXT:    br label [[END:%.*]]

[...]

llvm/test/Transforms/DivRemPairs/RISCV/div-rem-pairs.ll

This file was added.

; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
; RUN: opt < %s -passes=div-rem-pairs -S -mtriple=riscv64-unknown-unknown | FileCheck %s

; Do not hoist to the common predecessor block since we don't
; have a div-rem operation.

define i32 @no_domination(i1 %cmp, i32 %a, i32 %b) {
; CHECK-LABEL: @no_domination(
; CHECK-NEXT:  entry:
; CHECK-NEXT:    br i1 [[CMP:%.*]], label [[IF:%.*]], label [[ELSE:%.*]]
; CHECK:       if:
; CHECK-NEXT:    [[DIV:%.*]] = sdiv i32 [[A:%.*]], [[B:%.*]]
; CHECK-NEXT:    br label [[END:%.*]]
; CHECK:       else:
; CHECK-NEXT:    [[REM:%.*]] = srem i32 [[A]], [[B]]
; CHECK-NEXT:    br label [[END]]
; CHECK:       end:
; CHECK-NEXT:    [[RET:%.*]] = phi i32 [ [[DIV]], [[IF]] ], [ [[REM]], [[ELSE]] ]
; CHECK-NEXT:    ret i32 [[RET]]
;
entry:
  br i1 %cmp, label %if, label %else

if:
  %div = sdiv i32 %a, %b
  br label %end

else:
  %rem = srem i32 %a, %b
  br label %end

end:
  %ret = phi i32 [ %div, %if ], [ %rem, %else ]
  ret i32 %ret
}

llvm/test/Transforms/DivRemPairs/X86/div-expanded-rem-pair.ll

[...] context: end:
   ret i128 %ret
 }

 ; Even in expanded form, we can end up with div and rem in different basic
 ; blocks neither of which dominates each another.
 define i32 @can_have_divrem_in_mutually_nondominating_bbs(i1 %cmp, i32 %a, i32 %b) {
 ; CHECK-LABEL: @can_have_divrem_in_mutually_nondominating_bbs(
 ; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[T3:%.*]] = udiv i32 [[A:%.*]], [[B:%.*]]
+; CHECK-NEXT:    [[T2_RECOMPOSED:%.*]] = urem i32 [[A]], [[B]]
 ; CHECK-NEXT:    br i1 [[CMP:%.*]], label [[IF_THEN:%.*]], label [[IF_ELSE:%.*]]
 ; CHECK:       if.then:
-; CHECK-NEXT:    [[T0:%.*]] = udiv i32 [[A:%.*]], [[B:%.*]]
+; CHECK-NEXT:    [[T0:%.*]] = udiv i32 [[A]], [[B]]
     craig.topper (Not Done): Isn't this the same as T3? Doesn't entry dominate this block?
     lebedev.ri (Not Done): We only deal with a single div-rem pair (in the sense, we don't deal with the case of multiple identical div's or rem's), and we don't CSE div's/rem's, and only do a single pass over the function, so this indeed falls through cracks.
 ; CHECK-NEXT:    [[T1:%.*]] = mul nuw i32 [[T0]], [[B]]
-; CHECK-NEXT:    [[T2_RECOMPOSED:%.*]] = urem i32 [[A]], [[B]]
 ; CHECK-NEXT:    br label [[END:%.*]]
 ; CHECK:       if.else:
-; CHECK-NEXT:    [[T3:%.*]] = udiv i32 [[A]], [[B]]
 ; CHECK-NEXT:    br label [[END]]
 ; CHECK:       end:
 ; CHECK-NEXT:    [[RET:%.*]] = phi i32 [ [[T2_RECOMPOSED]], [[IF_THEN]] ], [ [[T3]], [[IF_ELSE]] ]
 ; CHECK-NEXT:    ret i32 [[RET]]
 ;
 entry:
   br i1 %cmp, label %if.then, label %if.else

[...]

llvm/test/Transforms/DivRemPairs/X86/div-rem-pairs.ll

[...] context: if:
   %rem = urem i128 %a, %b
   br label %end

 end:
   %ret = phi i128 [ %rem, %if ], [ 3, %entry ]
   ret i128 %ret
 }

-; We don't hoist if one op does not dominate the other,
-; but we could hoist both ops to the common predecessor block?
+; Hoist both ops to the common predecessor block.

 define i32 @no_domination(i1 %cmp, i32 %a, i32 %b) {
 ; CHECK-LABEL: @no_domination(
 ; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[DIV:%.*]] = sdiv i32 [[A:%.*]], [[B:%.*]]
+; CHECK-NEXT:    [[REM:%.*]] = srem i32 [[A]], [[B]]
 ; CHECK-NEXT:    br i1 [[CMP:%.*]], label [[IF:%.*]], label [[ELSE:%.*]]
 ; CHECK:       if:
-; CHECK-NEXT:    [[DIV:%.*]] = sdiv i32 [[A:%.*]], [[B:%.*]]
 ; CHECK-NEXT:    br label [[END:%.*]]
 ; CHECK:       else:
-; CHECK-NEXT:    [[REM:%.*]] = srem i32 [[A]], [[B]]
 ; CHECK-NEXT:    br label [[END]]
 ; CHECK:       end:
 ; CHECK-NEXT:    [[RET:%.*]] = phi i32 [ [[DIV]], [[IF]] ], [ [[REM]], [[ELSE]] ]
 ; CHECK-NEXT:    ret i32 [[RET]]
 ;
 entry:
   br i1 %cmp, label %if, label %else

 if:
   %div = sdiv i32 %a, %b
   br label %end

 else:
   %rem = srem i32 %a, %b
   br label %end

 end:
   %ret = phi i32 [ %div, %if ], [ %rem, %else ]
   ret i32 %ret
 }

+define i64 @diamond_pred_has_other_sucessors(i64 %a, i64 %b, i64 %c) {
+; CHECK-LABEL: @diamond_pred_has_other_sucessors(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    switch i64 [[A:%.*]], label [[RETURN:%.*]] [
+; CHECK-NEXT:    i64 0, label [[SW_BB:%.*]]
+; CHECK-NEXT:    i64 1, label [[SW_BB1:%.*]]
+; CHECK-NEXT:    ]
+; CHECK:       sw.bb:
+; CHECK-NEXT:    [[DIV:%.*]] = udiv i64 [[B:%.*]], [[C:%.*]]
+; CHECK-NEXT:    br label [[RETURN]]
+; CHECK:       sw.bb1:
+; CHECK-NEXT:    [[REM:%.*]] = urem i64 [[B]], [[C]]
+; CHECK-NEXT:    br label [[RETURN]]
+; CHECK:       return:
+; CHECK-NEXT:    [[RETVAL_0:%.*]] = phi i64 [ [[REM]], [[SW_BB1]] ], [ [[DIV]], [[SW_BB]] ], [ 0, [[ENTRY:%.*]] ]
+; CHECK-NEXT:    ret i64 [[RETVAL_0]]
+;
+entry:
+  switch i64 %a, label %return [
+  i64 0, label %sw.bb
+  i64 1, label %sw.bb1
+  ]
+
+sw.bb:  ; preds = %entry
+  %div = udiv i64 %b, %c
+  br label %return
+
+sw.bb1:  ; preds = %entry
+  %rem = urem i64 %b, %c
+  br label %return
+
+return:  ; preds = %entry, %sw.bb1, %sw.bb
+  %retval.0 = phi i64 [ %rem, %sw.bb1 ], [ %div, %sw.bb ], [ 0, %entry ]
+  ret i64 %retval.0
+}

+define i64 @diamond_div_no_unique_predecessor(i64 %a, i64 %b, i64 %c) {
+; CHECK-LABEL: @diamond_div_no_unique_predecessor(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[CMP:%.*]] = icmp slt i64 [[A:%.*]], -1
+; CHECK-NEXT:    br i1 [[CMP]], label [[FOO:%.*]], label [[IF_END:%.*]]
+; CHECK:       if.end:
+; CHECK-NEXT:    [[CMP1:%.*]] = icmp sgt i64 [[A]], 1
+; CHECK-NEXT:    br i1 [[CMP1]], label [[BAR:%.*]], label [[BAZ:%.*]]
+; CHECK:       baz:
+; CHECK-NEXT:    [[DIV:%.*]] = udiv i64 [[B:%.*]], [[C:%.*]]
+; CHECK-NEXT:    br label [[RETURN:%.*]]
+; CHECK:       foo:
+; CHECK-NEXT:    br label [[BAZ]]
+; CHECK:       bar:
+; CHECK-NEXT:    [[REM:%.*]] = urem i64 [[B]], [[C]]
+; CHECK-NEXT:    br label [[RETURN]]
+; CHECK:       return:
+; CHECK-NEXT:    [[RETVAL_0:%.*]] = phi i64 [ [[DIV]], [[BAZ]] ], [ [[REM]], [[BAR]] ]
+; CHECK-NEXT:    ret i64 [[RETVAL_0]]
+;
+entry:
+  %cmp = icmp slt i64 %a, -1
+  br i1 %cmp, label %foo, label %if.end
+
+if.end:
+  %cmp1 = icmp sgt i64 %a, 1
+  br i1 %cmp1, label %bar, label %baz
+
+baz:
+  %div = udiv i64 %b, %c
+  br label %return
+
+foo:
     nikic (Done): I don't think this tests the right case, because the udiv/urem ops are not the same. We can't have the ops be derived from a phi.
+  br label %baz
+
+bar:
+  %rem = urem i64 %b, %c
+  br label %return
+
+return:
+  %retval.0 = phi i64 [ %div, %baz ], [ %rem, %bar ]
+  ret i64 %retval.0
+}

		define i64 @diamond_rem_no_unique_predecessor(i64 %a, i64 %b, i64 %c) {
		; CHECK-LABEL: @diamond_rem_no_unique_predecessor(
		; CHECK-NEXT: entry:
		; CHECK-NEXT: [[CMP:%.*]] = icmp slt i64 [[A:%.*]], -1
		; CHECK-NEXT: br i1 [[CMP]], label [[FOO:%.*]], label [[IF_END:%.*]]
		; CHECK: if.end:
		; CHECK-NEXT: [[CMP1:%.*]] = icmp sgt i64 [[A]], 1
		; CHECK-NEXT: br i1 [[CMP1]], label [[BAR:%.*]], label [[BAZ:%.*]]
		; CHECK: baz:
		; CHECK-NEXT: [[REM:%.*]] = urem i64 [[B:%.*]], [[C:%.*]]
		; CHECK-NEXT: br label [[RETURN:%.*]]
		; CHECK: foo:
		; CHECK-NEXT: br label [[BAZ]]
		; CHECK: bar:
		; CHECK-NEXT: [[DIV:%.*]] = udiv i64 [[B]], [[C]]
		; CHECK-NEXT: br label [[RETURN]]
		; CHECK: return:
		; CHECK-NEXT: [[RETVAL_0:%.*]] = phi i64 [ [[REM]], [[BAZ]] ], [ [[DIV]], [[BAR]] ]
		; CHECK-NEXT: ret i64 [[RETVAL_0]]
		;
		entry:
		%cmp = icmp slt i64 %a, -1
		br i1 %cmp, label %foo, label %if.end

		if.end:
		%cmp1 = icmp sgt i64 %a, 1
		br i1 %cmp1, label %bar, label %baz

		baz:
		%rem = urem i64 %b, %c
		br label %return

		foo:
		br label %baz

		bar:
		%div = udiv i64 %b, %c
		br label %return

		return:
		%retval.0 = phi i64 [ %rem, %baz ], [ %div, %bar ]
		ret i64 %retval.0
		}

		declare void @dummy()

		define i32 @invoke_not_willreturn(i32 %a, i32 %b) personality ptr null {
		; CHECK-LABEL: @invoke_not_willreturn(
		; CHECK-NEXT: entry:
		; CHECK-NEXT: invoke void @dummy()
		; CHECK-NEXT: to label [[CONT:%.*]] unwind label [[LPAD:%.*]]
		; CHECK: cont:
		; CHECK-NEXT: [[DIV:%.*]] = sdiv i32 [[A:%.*]], [[B:%.*]]
		; CHECK-NEXT: ret i32 [[DIV]]
		; CHECK: lpad:
		; CHECK-NEXT: [[TMP0:%.*]] = landingpad { ptr, i32 }
		; CHECK-NEXT: cleanup
		; CHECK-NEXT: [[REM:%.*]] = srem i32 [[A]], [[B]]
		; CHECK-NEXT: ret i32 [[REM]]
		;
		entry:
		invoke void @dummy()
		to label %cont unwind label %lpad

		cont:
		%div = sdiv i32 %a, %b
		ret i32 %div

		lpad:
		landingpad { ptr, i32 }
		cleanup
		%rem = srem i32 %a, %b
		ret i32 %rem
		}

		define i32 @invoke_willreturn(i32 %a, i32 %b) personality ptr null {
		; CHECK-LABEL: @invoke_willreturn(
		; CHECK-NEXT: entry:
		; CHECK-NEXT: [[DIV:%.*]] = sdiv i32 [[A:%.*]], [[B:%.*]]
		; CHECK-NEXT: [[REM:%.*]] = srem i32 [[A]], [[B]]
		; CHECK-NEXT: invoke void @dummy() #[[ATTR0:[0-9]+]]
		; CHECK-NEXT: to label [[CONT:%.*]] unwind label [[LPAD:%.*]]
		; CHECK: cont:
		; CHECK-NEXT: ret i32 [[DIV]]
		; CHECK: lpad:
		; CHECK-NEXT: [[TMP0:%.*]] = landingpad { ptr, i32 }
		; CHECK-NEXT: cleanup
		; CHECK-NEXT: ret i32 [[REM]]
		;
		entry:
		invoke void @dummy() willreturn
		to label %cont unwind label %lpad

		cont:
		%div = sdiv i32 %a, %b
		ret i32 %div

		lpad:
		landingpad { ptr, i32 }
		cleanup
		%rem = srem i32 %a, %b
		ret i32 %rem
		}

		; Use this personality function so that catchpad is guaranteed to transfer.
		declare void @ProcessCLRException()

		define i32 @catchswitch(i32 %a, i32 %b) personality ptr @ProcessCLRException {
		; CHECK-LABEL: @catchswitch(
		; CHECK-NEXT: entry:
		; CHECK-NEXT: invoke void @dummy() #[[ATTR0]]
		; CHECK-NEXT: to label [[CONT:%.*]] unwind label [[LPAD:%.*]]
		; CHECK: cont:
		; CHECK-NEXT: ret i32 0
		; CHECK: lpad:
		; CHECK-NEXT: [[CS:%.*]] = catchswitch within none [label %cp] unwind label [[LPAD_END:%.*]]
		; CHECK: cp:
		; CHECK-NEXT: [[TMP0:%.*]] = catchpad within [[CS]] []
		; CHECK-NEXT: [[DIV:%.*]] = sdiv i32 [[A:%.*]], [[B:%.*]]
		; CHECK-NEXT: ret i32 [[DIV]]
		; CHECK: lpad.end:
		; CHECK-NEXT: [[TMP1:%.*]] = cleanuppad within none []
		; CHECK-NEXT: [[REM:%.*]] = srem i32 [[A]], [[B]]
		; CHECK-NEXT: ret i32 [[REM]]
		;
		entry:
		invoke void @dummy() willreturn
		to label %cont unwind label %lpad

		cont:
		ret i32 0

		lpad:
		%cs = catchswitch within none [label %cp] unwind label %lpad.end

		cp:
		catchpad within %cs []
		%div = sdiv i32 %a, %b
		ret i32 %div

		lpad.end:
		cleanuppad within none []
		%rem = srem i32 %a, %b
		ret i32 %rem
		}

Revision Contents

Diff 485521

llvm/lib/Transforms/Scalar/DivRemPairs.cpp

llvm/test/Transforms/DivRemPairs/MSP430/div-rem-pairs.ll

llvm/test/Transforms/DivRemPairs/Mips/div-rem-pairs.ll

llvm/test/Transforms/DivRemPairs/PowerPC/div-rem-pairs.ll

llvm/test/Transforms/DivRemPairs/RISCV/div-rem-pairs.ll

llvm/test/Transforms/DivRemPairs/X86/div-expanded-rem-pair.ll

llvm/test/Transforms/DivRemPairs/X86/div-rem-pairs.ll
