This is an archive of the discontinued LLVM Phabricator instance.

[SimplifyCFG] allow speculation of div/rem when sibling op exists (PR31028)
Abandoned, Public

Authored by spatel on Mar 13 2017, 2:18 PM.

Details

Reviewers

filcab
jmolloy
efriedma

Summary

As noted in:
https://bugs.llvm.org/show_bug.cgi?id=31028
...I initially thought this would be a CGP patch limited to targets like x86. But doing this in SimplifyCFG improves code even for targets like AArch64 that don't have a divrem instruction, because we can replace the compare and branch with a csel.

I know that hoisting a rem may stretch the conservative limits of SimplifyCFG, but collapsing the blocks seems like a worthwhile transform. We could (as we do for other expensive ops) split this back up in CGP if it is a concern.

For the example in the PR, AArch64 had:

  mov     w8, w0
  sdiv    w0, w8, w1
  msub    w8, w0, w1, w8
  cmp     w8, #42         // =42
  b.eq    .LBB0_2         <--- nothing in the backend is flattening this
  orr     w0, wzr, #0x3
.LBB0_2:
  ret

After:

  sdiv    w8, w0, w1
  msub    w9, w8, w1, w0
  cmp     w9, #42         // =42
  orr     w9, wzr, #0x3
  csel    w0, w8, w9, eq
  ret

On x86, we had:

  movl    %edi, %ecx
  movl    %ecx, %eax
  cltd
  idivl   %esi
  movl    $3, %eax
  cmpl    $42, %edx
  jne     .LBB0_2
# BB#1:
  movl    %ecx, %eax
  cltd
  idivl   %esi            <--- very expensive and useless instruction
.LBB0_2:
  retq

After:

  movl    %edi, %eax
  cltd
  idivl   %esi
  cmpl    $42, %edx
  movl    $3, %ecx
  cmovnel %ecx, %eax
  retq
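
For reference, the pattern can be written at the source level roughly as follows. This is a sketch reconstructed from the `hoist_sdiv` test in this revision (remainder compared against 42, quotient returned on a match, 3 otherwise); the function names `branchy` and `flattened` are made up for illustration. The second form corresponds to the shape after the patch, where the division runs unconditionally and a select picks the result.

```cpp
#include <cassert>

// Shape before the transform: the division sits in a conditional block
// that is only reached when the remainder matches.
int branchy(int a, int b) {
  assert(b != 0 && "division by zero is undefined");
  if (a % b == 42)
    return a / b;
  return 3;
}

// Shape after the transform: the division is computed next to the
// remainder on the same operands, so it cannot fault where the remainder
// would not, and the branch becomes a select (csel/cmov in the backend).
int flattened(int a, int b) {
  assert(b != 0 && "division by zero is undefined");
  int rem = a % b;
  int div = a / b;
  return rem == 42 ? div : 3;
}
```

Both functions return the same value for every input on which the remainder itself is defined; only the control flow differs.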

Event Timeline

spatel created this revision. Mar 13 2017, 2:18 PM

Herald added subscribers: mcrosier, aemerson. Mar 13 2017, 2:18 PM

filcab added inline comments. Mar 13 2017, 3:05 PM

lib/Transforms/Utils/SimplifyCFG.cpp
Line 1921: Please make one test per pair, too.

Never mind, thanks for pointing out that you already have the tests I asked for. I somehow skipped over them when reviewing.

Every CPU can lower divrem to something cheaper than a separate div and rem; that seems fine.

My one concern here is that for a target without a hardware divrem for the width in question, if the rem is inside the conditional, you're speculating a multiply. This is fine if it gets lowered to a hardware instruction, but might be problematic if it gets lowered to a libcall.
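
The multiply in question comes from how a remainder is formed when there is no divrem instruction: the quotient is computed first, then multiplied back and subtracted, as in the AArch64 `sdiv` + `msub` sequence in the summary. A minimal sketch, using the hypothetical helper name `remFromQuotient`:

```cpp
#include <cassert>
#include <cstdint>

// Remainder derived from an already computed quotient:
//   rem = a - quotient * b, where quotient == a / b
// This mirrors sdiv + msub. The multiply and subtract are the extra work
// that gets speculated when a rem is hoisted next to an existing div.
int32_t remFromQuotient(int32_t a, int32_t b, int32_t quotient) {
  assert(b != 0 && "division by zero is undefined");
  return a - quotient * b;
}
```

If the multiply for the type in question is a libcall rather than a single instruction, this extra work is no longer cheap, which is the case raised above.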

In D30910#699943, @efriedma wrote:

Every CPU can lower divrem to something cheaper than a separate div and rem; that seems fine.

My one concern here is that for a target without a hardware divrem for the width in question, if the rem is inside the conditional, you're speculating a multiply. This is fine if it gets lowered to a hardware instruction, but might be problematic if it gets lowered to a libcall.

Right - I figured I was close to the edge for the hoisting rem case. I see 3 possible solutions:

1. Fix that up in CGP (the despeculation machinery already exists for things like llvm.cttz).
2. Limit the transform to legal ops/types here in SimplifyCFG (TTI should give us that?).
3. Don't hoist rem in this patch, just div. We would still catch the case in the bug report.

Any prefs or other options?

TTI::getUserCost, maybe? Not sure our costs are accurate for illegal types.

I wonder if it would make sense to perform this transform in EarlyCSE or something like that instead; it might be worthwhile even if we can't speculate the whole basic block.

Err, I meant TTI::getOperationCost.

spatel mentioned this in D31037: [EarlyCSE] hoist div/rem when sibling op exists (PR31028). Mar 16 2017, 9:58 AM

RKSimon added a subscriber: RKSimon. Mar 18 2017, 3:21 PM

spatel mentioned this in D37121: [DivRemHoist] add a pass to move div/rem pairs into the same block (PR31028). Aug 24 2017, 2:54 PM

spatel mentioned this in rL312862: [DivRempairs] add a pass to optimize div/rem pairs (PR31028). Sep 9 2017, 6:40 AM

Abandoning. We've solved this in a more complete way with a dedicated pass in D37121, so I don't think there's a need for a lesser hoisting transform. I'll update the comments in the test file.

Revision Contents

Path                                            Size
lib/Transforms/Utils/SimplifyCFG.cpp            49 lines
test/Transforms/SimplifyCFG/div-rem-pairs.ll    127 lines

Diff 91609

lib/Transforms/Utils/SimplifyCFG.cpp

[1,894 lines not shown; enclosing context: if (auto *SI = dyn_cast<StoreInst>(&CurI)) {]
         return SI->getValueOperand();
       return nullptr; // Unknown store.
     }
   }

   return nullptr;
 }

+/// Allow hoisting into the specified block for division/remainder for matching
+/// siblings such as sdiv X, Y and srem X, Y. This bypasses normal safety and
+/// cost checks because:
+/// (1) The hoisted op is guaranteed safe to execute because any potential
+///     faulting/UB caused by that op must already occur in the predecessor
+///     block from the sibling.
+/// (2) The true cost of the speculated div/rem will either be:
+///     (a) free for a target with unified div/rem instructions.
+///     (b) free when speculating a division because we are already calculating
+///         the remainder (remainder requires calculating the quotient).
+///     (c) low when speculating a remainder because we are already calculating
+///         the division (the cost of the division dominates multiply + sub).
+static bool allowSpeculationForDivRem(Instruction &I, BasicBlock &PredBB) {
+  unsigned SiblingOpcode;
+  switch (I.getOpcode()) {
+  case Instruction::SDiv: SiblingOpcode = Instruction::SRem; break;
+  case Instruction::UDiv: SiblingOpcode = Instruction::URem; break;
+  case Instruction::SRem: SiblingOpcode = Instruction::SDiv; break;
+  case Instruction::URem: SiblingOpcode = Instruction::UDiv; break;
+  default: return false;
+  }
+
+  return any_of(PredBB, [&](Instruction &PredI) {
+    return PredI.getOpcode() == SiblingOpcode &&
+           PredI.getOperand(0) == I.getOperand(0) &&
+           PredI.getOperand(1) == I.getOperand(1);
+  });
+}
+
 /// \brief Speculate a conditional basic block flattening the CFG.
 ///
 /// Note that this is a very risky transform currently. Speculating
 /// instructions like this is most often not desirable. Instead, there is an MI
 /// pass which can do it with full awareness of the resource constraints.
 /// However, some cases are "obvious" and we should do directly. An example of
 /// this is speculating a single, reasonably cheap instruction.
 ///
[64 lines not shown; enclosing context: if (isa<DbgInfoIntrinsic>(I))]
       continue;

     // Only speculatively execute a single instruction (not counting the
     // terminator) for now.
     ++SpeculationCost;
     if (SpeculationCost > 1)
       return false;

-    // Don't hoist the instruction if it's unsafe or expensive.
-    if (!isSafeToSpeculativelyExecute(I) &&
-        !(HoistCondStores && (SpeculatedStoreValue = isSafeToSpeculateStore(
-                                  I, BB, ThenBB, EndBB))))
-      return false;
-    if (!SpeculatedStoreValue &&
-        ComputeSpeculationCost(I, TTI) >
-            PHINodeFoldingThreshold * TargetTransformInfo::TCC_Basic)
-      return false;
+    if (!allowSpeculationForDivRem(I, BB)) {
+      // Don't hoist the instruction if it's unsafe or expensive.
+      if (!isSafeToSpeculativelyExecute(I) &&
+          !(HoistCondStores && (SpeculatedStoreValue = isSafeToSpeculateStore(
+                                    I, BB, ThenBB, EndBB))))
+        return false;
+      if (!SpeculatedStoreValue &&
+          ComputeSpeculationCost(I, TTI) >
+              PHINodeFoldingThreshold * TargetTransformInfo::TCC_Basic)
+        return false;
+    }

     // Store the store speculation candidate.
     if (SpeculatedStoreValue)
       SpeculatedStore = cast<StoreInst>(I);

     // Do not hoist the instruction if any of its operands are defined but not
     // used in BB. The transformation will prevent the operand from
     // being sunk into the use block.
[4,030 lines not shown]
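
The sibling check in `allowSpeculationForDivRem` can be exercised outside LLVM with a toy model. The sketch below rests on several assumptions: `Opcode`, `Inst`, and `hasDivRemSibling` are hypothetical names, operands are modeled as plain integer ids rather than LLVM values, and a block is just a `std::vector`. It mirrors only the matching logic (sibling opcode with identical operands in the same order), not the surrounding speculation machinery.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Toy stand-ins for LLVM's opcodes and instructions (not the real classes).
enum class Opcode { SDiv, UDiv, SRem, URem, Add };

struct Inst {
  Opcode Op;
  int LHS; // id naming the first operand
  int RHS; // id naming the second operand
};

// Returns true if Block contains the div/rem counterpart of I with the
// same signedness and the same two operands in the same order.
bool hasDivRemSibling(const Inst &I, const std::vector<Inst> &Block) {
  Opcode Sibling;
  switch (I.Op) {
  case Opcode::SDiv: Sibling = Opcode::SRem; break;
  case Opcode::UDiv: Sibling = Opcode::URem; break;
  case Opcode::SRem: Sibling = Opcode::SDiv; break;
  case Opcode::URem: Sibling = Opcode::UDiv; break;
  default: return false; // not a div/rem, nothing to match
  }
  return std::any_of(Block.begin(), Block.end(), [&](const Inst &P) {
    return P.Op == Sibling && P.LHS == I.LHS && P.RHS == I.RHS;
  });
}
```

With a predecessor block holding only `srem 0, 1`, the model accepts `sdiv 0, 1` and rejects `udiv 0, 1` (signedness), `srem 0, 1` (same operation, not its sibling), and `sdiv 0, 2` (operands), which lines up with the positive and negative tests in this revision.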

test/Transforms/SimplifyCFG/div-rem-pairs.ll

 ; RUN: opt -simplifycfg -S < %s | FileCheck %s

-; FIXME: Hoist the sdiv because it's safe and free.
+; Hoist the sdiv because it's safe and free.
 ; PR31028 - https://bugs.llvm.org/show_bug.cgi?id=31028

 define i32 @hoist_sdiv(i32 %a, i32 %b) {
 ; CHECK-LABEL: @hoist_sdiv(
 ; CHECK-NEXT: entry:
 ; CHECK-NEXT: [[REM:%.*]] = srem i32 %a, %b
 ; CHECK-NEXT: [[CMP:%.*]] = icmp eq i32 [[REM]], 42
-; CHECK-NEXT: br i1 [[CMP]], label %if, label %end
-; CHECK: if:
 ; CHECK-NEXT: [[DIV:%.*]] = sdiv i32 %a, %b
-; CHECK-NEXT: br label %end
-; CHECK: end:
-; CHECK-NEXT: [[RET:%.*]] = phi i32 [ [[DIV]], %if ], [ 3, %entry ]
-; CHECK-NEXT: ret i32 [[RET]]
+; CHECK-NEXT: [[DIV_:%.*]] = select i1 [[CMP]], i32 [[DIV]], i32 3
+; CHECK-NEXT: ret i32 [[DIV_]]
 ;
 entry:
   %rem = srem i32 %a, %b
   %cmp = icmp eq i32 %rem, 42
   br i1 %cmp, label %if, label %end

 if:
   %div = sdiv i32 %a, %b
   br label %end

 end:
   %ret = phi i32 [ %div, %if ], [ 3, %entry ]
   ret i32 %ret
 }

-; FIXME: Hoist the udiv because it's safe and free.
+; Hoist the udiv because it's safe and free.

 define i64 @hoist_udiv(i64 %a, i64 %b) {
 ; CHECK-LABEL: @hoist_udiv(
 ; CHECK-NEXT: entry:
 ; CHECK-NEXT: [[REM:%.*]] = urem i64 %a, %b
 ; CHECK-NEXT: [[CMP:%.*]] = icmp eq i64 [[REM]], 42
-; CHECK-NEXT: br i1 [[CMP]], label %if, label %end
-; CHECK: if:
 ; CHECK-NEXT: [[DIV:%.*]] = udiv i64 %a, %b
-; CHECK-NEXT: br label %end
-; CHECK: end:
-; CHECK-NEXT: [[RET:%.*]] = phi i64 [ [[DIV]], %if ], [ 3, %entry ]
-; CHECK-NEXT: ret i64 [[RET]]
+; CHECK-NEXT: [[DIV_:%.*]] = select i1 [[CMP]], i64 [[DIV]], i64 3
+; CHECK-NEXT: ret i64 [[DIV_]]
 ;
 entry:
   %rem = urem i64 %a, %b
   %cmp = icmp eq i64 %rem, 42
   br i1 %cmp, label %if, label %end

 if:
   %div = udiv i64 %a, %b
   br label %end

 end:
   %ret = phi i64 [ %div, %if ], [ 3, %entry ]
   ret i64 %ret
 }

-; FIXME: Hoist the srem because it's safe and likely free.
+; Hoist the srem because it's safe and likely free.

 define i16 @hoist_srem(i16 %a, i16 %b) {
 ; CHECK-LABEL: @hoist_srem(
 ; CHECK-NEXT: entry:
 ; CHECK-NEXT: [[DIV:%.*]] = sdiv i16 %a, %b
 ; CHECK-NEXT: [[CMP:%.*]] = icmp eq i16 [[DIV]], 42
-; CHECK-NEXT: br i1 [[CMP]], label %if, label %end
-; CHECK: if:
 ; CHECK-NEXT: [[REM:%.*]] = srem i16 %a, %b
-; CHECK-NEXT: br label %end
-; CHECK: end:
-; CHECK-NEXT: [[RET:%.*]] = phi i16 [ [[REM]], %if ], [ 3, %entry ]
-; CHECK-NEXT: ret i16 [[RET]]
+; CHECK-NEXT: [[REM_:%.*]] = select i1 [[CMP]], i16 [[REM]], i16 3
+; CHECK-NEXT: ret i16 [[REM_]]
 ;
 entry:
   %div = sdiv i16 %a, %b
   %cmp = icmp eq i16 %div, 42
   br i1 %cmp, label %if, label %end

 if:
   %rem = srem i16 %a, %b
   br label %end

 end:
   %ret = phi i16 [ %rem, %if ], [ 3, %entry ]
   ret i16 %ret
 }

-; FIXME: Hoist the urem because it's safe and likely free.
+; Hoist the urem because it's safe and likely free.

 define i8 @hoist_urem(i8 %a, i8 %b) {
 ; CHECK-LABEL: @hoist_urem(
 ; CHECK-NEXT: entry:
 ; CHECK-NEXT: [[DIV:%.*]] = udiv i8 %a, %b
 ; CHECK-NEXT: [[CMP:%.*]] = icmp eq i8 [[DIV]], 42
-; CHECK-NEXT: br i1 [[CMP]], label %if, label %end
-; CHECK: if:
 ; CHECK-NEXT: [[REM:%.*]] = urem i8 %a, %b
-; CHECK-NEXT: br label %end
-; CHECK: end:
-; CHECK-NEXT: [[RET:%.*]] = phi i8 [ [[REM]], %if ], [ 3, %entry ]
-; CHECK-NEXT: ret i8 [[RET]]
+; CHECK-NEXT: [[REM_:%.*]] = select i1 [[CMP]], i8 [[REM]], i8 3
+; CHECK-NEXT: ret i8 [[REM_]]
 ;
 entry:
   %div = udiv i8 %a, %b
   %cmp = icmp eq i8 %div, 42
   br i1 %cmp, label %if, label %end

 if:
   %rem = urem i8 %a, %b
   br label %end

 end:
   %ret = phi i8 [ %rem, %if ], [ 3, %entry ]
   ret i8 %ret
 }

+; If the ops don't match, don't do anything: signedness.
+
+define i32 @dont_hoist_urem(i32 %a, i32 %b) {
+; CHECK-LABEL: @dont_hoist_urem(
+; CHECK-NEXT: entry:
+; CHECK-NEXT: [[DIV:%.*]] = sdiv i32 %a, %b
+; CHECK-NEXT: [[CMP:%.*]] = icmp eq i32 [[DIV]], 42
+; CHECK-NEXT: br i1 [[CMP]], label %if, label %end
+; CHECK: if:
+; CHECK-NEXT: [[REM:%.*]] = urem i32 %a, %b
+; CHECK-NEXT: br label %end
+; CHECK: end:
+; CHECK-NEXT: [[RET:%.*]] = phi i32 [ [[REM]], %if ], [ 3, %entry ]
+; CHECK-NEXT: ret i32 [[RET]]
+;
+entry:
+  %div = sdiv i32 %a, %b
+  %cmp = icmp eq i32 %div, 42
+  br i1 %cmp, label %if, label %end
+
+if:
+  %rem = urem i32 %a, %b
+  br label %end
+
+end:
+  %ret = phi i32 [ %rem, %if ], [ 3, %entry ]
+  ret i32 %ret
+}
+
+; If the ops don't match, don't do anything: operation.
+
+define i32 @dont_hoist_srem(i32 %a, i32 %b) {
+; CHECK-LABEL: @dont_hoist_srem(
+; CHECK-NEXT: entry:
+; CHECK-NEXT: [[REM:%.*]] = urem i32 %a, %b
+; CHECK-NEXT: [[CMP:%.*]] = icmp eq i32 [[REM]], 42
+; CHECK-NEXT: br i1 [[CMP]], label %if, label %end
+; CHECK: if:
+; CHECK-NEXT: [[REM2:%.*]] = srem i32 %a, %b
+; CHECK-NEXT: br label %end
+; CHECK: end:
+; CHECK-NEXT: [[RET:%.*]] = phi i32 [ [[REM2]], %if ], [ 3, %entry ]
+; CHECK-NEXT: ret i32 [[RET]]
+;
+entry:
+  %rem = urem i32 %a, %b
+  %cmp = icmp eq i32 %rem, 42
+  br i1 %cmp, label %if, label %end
+
+if:
+  %rem2 = srem i32 %a, %b
+  br label %end
+
+end:
+  %ret = phi i32 [ %rem2, %if ], [ 3, %entry ]
+  ret i32 %ret
+}
+
+; If the ops don't match, don't do anything: operands.
+
+define i32 @dont_hoist_sdiv(i32 %a, i32 %b, i32 %c) {
+; CHECK-LABEL: @dont_hoist_sdiv(
+; CHECK-NEXT: entry:
+; CHECK-NEXT: [[REM:%.*]] = srem i32 %a, %b
+; CHECK-NEXT: [[CMP:%.*]] = icmp eq i32 [[REM]], 42
+; CHECK-NEXT: br i1 [[CMP]], label %if, label %end
+; CHECK: if:
+; CHECK-NEXT: [[DIV:%.*]] = sdiv i32 %a, %c
+; CHECK-NEXT: br label %end
+; CHECK: end:
+; CHECK-NEXT: [[RET:%.*]] = phi i32 [ [[DIV]], %if ], [ 3, %entry ]
+; CHECK-NEXT: ret i32 [[RET]]
+;
+entry:
+  %rem = srem i32 %a, %b
+  %cmp = icmp eq i32 %rem, 42
+  br i1 %cmp, label %if, label %end
+
+if:
+  %div = sdiv i32 %a, %c
+  br label %end
+
+end:
+  %ret = phi i32 [ %div, %if ], [ 3, %entry ]
+  ret i32 %ret
+}
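
The signedness test can be motivated at the value level. One plausible reading (an assumption, not something the review states) is that a signed quotient cannot in general be reused to form an unsigned remainder of the same bit patterns, so the "free or cheap" cost argument from the patch's comment would not hold for a mixed pair. The sketch below uses 32-bit values and made-up helper names.

```cpp
#include <cassert>
#include <cstdint>

// Unsigned remainder computed directly on the bit patterns (urem).
uint32_t uremBits(int32_t a, int32_t b) {
  assert(b != 0 && "division by zero is undefined");
  return static_cast<uint32_t>(a) % static_cast<uint32_t>(b);
}

// Remainder rebuilt from the *signed* quotient, as one would do for a
// matching sdiv/srem pair: a - (a / b) * b, returned as raw bits.
uint32_t remFromSignedQuotient(int32_t a, int32_t b) {
  assert(b != 0 && "division by zero is undefined");
  return static_cast<uint32_t>(a - (a / b) * b);
}
```

For non-negative inputs the two agree, but for `a = -7, b = 2` the unsigned remainder is 1 while the value rebuilt from the signed quotient is -1 (all bits set), so an existing `sdiv` does not hand a `urem` its result for the price of a multiply and subtract.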