Details

Reviewers

yrouban
reames
tejohnson
xur
Carrot
skatkov
hfinkel

Commits

Summary

Using/updating a dominator tree to match math overflow patterns can be very expensive at compile time, so just handle the single-block case.
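
For readers unfamiliar with the pattern being matched, here is a minimal sketch of the two equivalent forms for unsigned add. It assumes a GCC or Clang toolchain, since `__builtin_add_overflow` is a compiler builtin; the function names `addOverflowsManual` and `addOverflowsFused` are made up for illustration and do not appear in the patch.

```cpp
#include <cassert>
#include <cstdint>

// Hand-written form: the add and the compare are separate operations.
// This mirrors the test IR shape:
//   %add = add i64 %b, %a
//   %cmp = icmp ugt i64 %b, %add
bool addOverflowsManual(uint64_t A, uint64_t B, uint64_t &Sum) {
  Sum = B + A;    // unsigned add wraps modulo 2^64
  return B > Sum; // true exactly when the add wrapped
}

// Fused form: one operation yields both the sum and the overflow bit,
// which is what an overflow intrinsic provides.
// __builtin_add_overflow is a GCC/Clang builtin (assumed available here).
bool addOverflowsFused(uint64_t A, uint64_t B, uint64_t &Sum) {
  return __builtin_add_overflow(B, A, &Sum);
}
```

Both functions return the same flag and the same wrapped sum for every input; the transform under review rewrites the first shape into the second when the target says it is profitable.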

Also, we were restarting the iterator loops when doing the overflow intrinsic transforms by marking the dominator tree for update. That was done to prevent iterating over a removed instruction. But we can postpone the deletion using the existing "RemovedInsts" structure, and that means we don't need to update the DT.
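
The deferred-deletion idea can be illustrated outside LLVM. The following is a minimal sketch, assuming a toy `Inst` type and a hypothetical `transformAll` helper (neither exists in CodeGenPrepare). It only mimics the role of "RemovedInsts": instructions are queued while the loop walks the block and erased after the walk, so no live iterator is invalidated.

```cpp
#include <cassert>
#include <list>
#include <set>
#include <string>
#include <vector>

// Toy stand-in for an IR instruction; not an LLVM type.
struct Inst {
  std::string Name;
  bool Dead; // hypothetical flag: the transform wants this one removed
};

// Walk the block once, queue dead instructions, and erase them afterwards.
// Returns the names of the erased instructions in block order.
std::vector<std::string> transformAll(std::list<Inst> &Block) {
  std::set<const Inst *> Removed; // plays the role of "RemovedInsts"
  for (Inst &I : Block) {
    if (I.Dead)
      Removed.insert(&I); // postpone: erasing here could break the walk
  }

  std::vector<std::string> Erased;
  for (auto It = Block.begin(); It != Block.end();) {
    if (Removed.count(&*It)) {
      Erased.push_back(It->Name);
      It = Block.erase(It); // erase returns the next valid iterator
    } else {
      ++It;
    }
  }
  assert(Erased.size() == Removed.size() && "every queued inst is erased");
  return Erased;
}
```

Because nothing is erased during the first loop, there is no need to restart iteration, which is the property the summary relies on to avoid touching the DT.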

See post-commit thread for rL354298 for more details:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20190422/646276.html

Diff Detail

Repository: rL LLVM

Event Timeline

spatel created this revision. Apr 24 2019, 8:59 AM

Herald added a project: Restricted Project. Apr 24 2019, 8:59 AM

Herald added subscribers: hiraditya, mcrosier.

I think I'm missing something basic here. CGP will iterate until done, including trivial DCE. Given that, why not just leave the instructions in the IR and let the next iteration remove them? It would trigger an extra iteration, which might be undesirable, but would it be incorrect? That would seem better than invalidating the DomTree, which just forces a new iteration anyways...

Actually, it doesn't look like it includes trivial DCE. Maybe it should?

Assuming the answer is yes, that would be correct, then this patch becomes a smallish optimization over an extra iteration, right?

Thinking through the logic, we don't need to worry about the cmp instruction since the caller advanced the iterator beyond it. So, the only concern is the binary operator. But correct me if I'm wrong, don't you require the BO to dominate the Cmp? If so, then haven't we already iterated past that instruction? How do we have an iterator invalidation problem at all?

In D61075#1478122, @reames wrote:

I think I'm missing something basic here. CGP will iterate until done, including trivial DCE. Given that, why not just leave the instructions in the IR and let the next iteration remove them? It would trigger an extra iteration, which might be undesirable, but would it be incorrect? That would seem better than invalidating the DomTree, which just forces a new iteration anyways...

Actually, it doesn't look like it includes trivial DCE. Maybe it should?

Assuming the answer is yes, that would be correct, then this patch becomes a smallish optimization over an extra iteration, right?

Yes, that sounds right. If CGP included DCE, then it would be easier to just rely on that for cleanup. Unfortunately, it doesn't. I don't know the full history here, but I suspect that since CGP was intended to be a temporary hack (see opening comment in the source file), nobody has bothered to invest in doing better.

But since this patch didn't solve the compile-time problem, I'm going to propose a bigger fix anyway...

Thinking through the logic, we don't need to worry about the cmp instruction since the caller advanced the iterator beyond it. So, the only concern is the binary operator. But correct me if I'm wrong, don't you require the BO to dominate the Cmp? If so, then haven't we already iterated past that instruction? How do we have an iterator invalidation problem at all?

No, the binop does not necessarily dominate the compare if they are within a single block. If they are independent ops (no def-use relationship), then we don't know which order they are in within that block. We have regression tests with those patterns.
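
To make the ordering point concrete, here is a minimal sketch with toy types (a block is just a list of instruction names, and `firstOfPair` is a hypothetical helper, not an LLVM API). It shows why a linear scan is needed to decide where to insert the intrinsic: two independent instructions in one block carry no ordering information of their own, so the only way to learn which comes first is to walk the block.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy block: instruction names in program order (not an LLVM type).
using Block = std::vector<std::string>;

// Return the index of whichever of A or B appears first in the block.
std::size_t firstOfPair(const Block &BB, const std::string &A,
                        const std::string &B) {
  for (std::size_t I = 0; I < BB.size(); ++I)
    if (BB[I] == A || BB[I] == B)
      return I; // first hit is the insertion point
  assert(false && "block did not contain either instruction");
  return BB.size();
}
```

With a block such as `%s = sub i32 0, %x` followed by `%ov = icmp ne i32 %x, 0`, the compare does not use the sub, so either order is legal IR and the scan has to handle both.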

The history of these overflow transforms is that we wanted to do these as canonicalizations in instcombine, but that caused perf regressions. So then the add overflow transform got dumped here in CGP with a TLI hook to allow it. Then, I noticed that we weren't getting that transform consistently with some constant operands or the subtract overflow sibling. So I improved the pattern matching, and idealistically/naively tried to allow the transforms in all cross-block cases as long as the target said it was legal.

But that still caused perf regressions, so the transform was limited in 2 ways: (1) avoid hoisting the binop into a dominating cmp block and (2) avoid hoisting either op unless we had a direct predecessor/successor relationship.

At this point, I'm going to propose that we just give up on any cross-block attempts. I don't have any perf data to show that we got any wins from that, and I don't think anyone has the will to fix the backend problems and compile-time issues.

Patch updated:
Don't try to match this pattern across blocks. That means we are not using a dominator tree at all, so if this doesn't restore compile-time perf, I'm not sure what would. :)

Limiting this transform to a single block means it really shouldn't be in CGP anymore. It could be implemented in SDAG. But I'd prefer to leave that to a follow-up patch once we confirm that this (1) restores compile-time and (2) does not cause perf regressions.

Ping.

hfinkel added a subscriber: hfinkel. May 2 2019, 10:13 AM

hfinkel added inline comments.

llvm/lib/CodeGen/CodeGenPrepare.cpp
1191 ↗	(On Diff #196682)	I'm actually not sure why this would be true, at least in the multiple-blocks case. If they're in the same block, then the dominance check can be expensive, but that's because of the scanning involved. And, if that's true, then an OBB cache might address the problem. In any case, I don't think that having point (3) here is useful; it could be misleading, and it isn't necessary given the first two points.
1206 ↗	(On Diff #196682)	If the DT was expensive, I'd actually suspect it was this scanning that made it so on large blocks. Does this happen multiple times such that an OBB cache would help?
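
The caching idea raised in these comments can be sketched as follows. This is an illustration only, assuming a toy block of instruction names and a hypothetical `OrderCache` class; it is not the API of any LLVM ordering cache. The block is numbered once, lazily, and every later "does A come before B" query is a lookup instead of a rescan.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical ordering cache over a toy block (names in program order).
class OrderCache {
  const std::vector<std::string> &Block;
  std::unordered_map<std::string, std::size_t> Index;
  bool Valid = false;
  std::size_t Scans = 0; // how many times the block was walked

public:
  explicit OrderCache(const std::vector<std::string> &BB) : Block(BB) {}

  // True if A appears before B in the block. Both must be in the block.
  bool comesBefore(const std::string &A, const std::string &B) {
    if (!Valid) { // number the whole block on the first query only
      ++Scans;
      for (std::size_t I = 0; I < Block.size(); ++I)
        Index[Block[I]] = I;
      Valid = true;
    }
    assert(Index.count(A) && Index.count(B) && "instruction not in block");
    return Index.at(A) < Index.at(B);
  }

  std::size_t scans() const { return Scans; }
};
```

Such a cache only pays off if the same block is queried repeatedly without being modified in between; any insertion or removal would require renumbering.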

spatel marked an inline comment as done. May 2 2019, 2:06 PM

spatel added inline comments.

llvm/lib/CodeGen/CodeGenPrepare.cpp
1206 ↗	(On Diff #196682)	I only have 1 test case to play with (I'll try to attach it to this review for reference), and it doesn't have large blocks. It's a ~66K line function with ~6K blocks. From what I can tell, the problem is that we reset the DT on any change made, up at line 472. That seems like overkill, but I don't have the motivation to sort out which of the transforms in the loop could bypass that reset.

code-gen-prepare-test.ll (5 MB): this is the test file provided by @yrouban that showed the compile-time regression and would improve by avoiding DT altogether as proposed in this patch.

hfinkel added inline comments. May 2 2019, 2:11 PM

llvm/lib/CodeGen/CodeGenPrepare.cpp
1206 ↗	(On Diff #196682)	Okay. If you're going to mention that DT is slow, please put in the comment something about the fact that it's slow because we need to recompute it a lot (and, perhaps, why).

Patch updated:
Added more details to the code comment about using the dominator tree. I.e., if anyone wants to use it for this transform (or possibly others), they are probably going to have to alter the function-level CGP loop to avoid resetting the DT on every change.
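
The cost described here can be shown with a small model. This is a sketch under stated assumptions: `LazyAnalysis` and `countRecomputes` are hypothetical names, the "analysis" is a placeholder value, and the loop only imitates the described behavior of a lazily built result that is thrown away after every change.

```cpp
#include <cassert>

// Stand-in for an expensive, lazily computed analysis such as a DT.
struct LazyAnalysis {
  bool Valid = false;
  int Recomputes = 0;

  // Build the result on demand and count how often that happens.
  void get() {
    if (!Valid) {
      ++Recomputes;
      Valid = true;
    }
  }
  void reset() { Valid = false; }
};

// Simulate NumChanges transform steps. Each step queries the analysis and
// then reports a change. ResetOnChange = true mimics a driver loop that
// discards the analysis after every change.
int countRecomputes(int NumChanges, bool ResetOnChange) {
  assert(NumChanges >= 0 && "number of changes cannot be negative");
  LazyAnalysis A;
  for (int I = 0; I < NumChanges; ++I) {
    A.get();
    if (ResetOnChange)
      A.reset();
  }
  return A.Recomputes;
}
```

In this model the number of rebuilds grows linearly with the number of changes when every change resets the analysis, and stays at one when the result can be kept. Whether a given transform really preserves the DT is exactly the question the comment says would have to be sorted out per transform.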

LGTM

This revision is now accepted and ready to land. May 2 2019, 5:24 PM

Closed by commit rL359879: [CodeGenPrepare] limit overflow intrinsic matching to a single basic block (authored by spatel). May 3 2019, 6:07 AM

This revision was automatically updated to reflect the committed changes.

Reopening because the commit was reverted at rL359908. Must've missed some detail about using the RemovedInsts structure.

This revision is now accepted and ready to land. May 3 2019, 10:43 AM

spatel planned changes to this revision. May 3 2019, 10:44 AM

This revision was not accepted when it landed; it landed in state Changes Planned. May 4 2019, 5:44 AM

Closed by commit rL359969: [CodeGenPrepare] limit overflow intrinsic matching to a single basic block (2nd… (authored by spatel).

This revision was automatically updated to reflect the committed changes.

Diff 198136

llvm/trunk/lib/CodeGen/CodeGenPrepare.cpp

[... 1,171 lines not shown ...]
   if (SrcVT != DstVT)
     return false;
 
   return SinkCast(CI);
 }
 
 bool CodeGenPrepare::replaceMathCmpWithIntrinsic(BinaryOperator *BO,
                                                  CmpInst *Cmp,
                                                  Intrinsic::ID IID) {
+  if (BO->getParent() != Cmp->getParent()) {
+    // We used to use a dominator tree here to allow multi-block optimization.
+    // But that was problematic because:
+    // 1. It could cause a perf regression by hoisting the math op into the
+    //    critical path.
+    // 2. It could cause a perf regression by creating a value that was live
+    //    across multiple blocks and increasing register pressure.
+    // 3. Use of a dominator tree could cause large compile-time regression.
+    //    This is because we recompute the DT on every change in the main CGP
+    //    run-loop. The recomputing is probably unnecessary in many cases, so if
+    //    that was fixed, using a DT here would be ok.
+    return false;
+  }
+
   // We allow matching the canonical IR (add X, C) back to (usubo X, -C).
   Value *Arg0 = BO->getOperand(0);
   Value *Arg1 = BO->getOperand(1);
   if (BO->getOpcode() == Instruction::Add &&
       IID == Intrinsic::usub_with_overflow) {
     assert(isa<Constant>(Arg1) && "Unexpected input for usubo");
     Arg1 = ConstantExpr::getNeg(cast<Constant>(Arg1));
   }
 
-  Instruction *InsertPt;
-  if (BO->hasOneUse() && BO->user_back() == Cmp) {
-    // If the math is only used by the compare, insert at the compare to keep
-    // the condition in the same block as its users. (CGP aggressively sinks
-    // compares to help out SDAG.)
-    InsertPt = Cmp;
-  } else {
-    // The math and compare may be independent instructions. Check dominance to
-    // determine the insertion point for the intrinsic.
-    bool MathDominates = getDT(*BO->getFunction()).dominates(BO, Cmp);
-    if (!MathDominates && !getDT(*BO->getFunction()).dominates(Cmp, BO))
-      return false;
-
-    BasicBlock *MathBB = BO->getParent(), *CmpBB = Cmp->getParent();
-    if (MathBB != CmpBB) {
-      // Avoid hoisting an extra op into a dominating block and creating a
-      // potentially longer critical path.
-      if (!MathDominates)
-        return false;
-      // Check that the insertion doesn't create a value that is live across
-      // more than two blocks, so to minimise the increase in register pressure.
-      BasicBlock *Dominator = MathDominates ? MathBB : CmpBB;
-      BasicBlock *Dominated = MathDominates ? CmpBB : MathBB;
-      auto Successors = successors(Dominator);
-      if (llvm::find(Successors, Dominated) == Successors.end())
-        return false;
-    }
-
-    InsertPt = MathDominates ? cast<Instruction>(BO) : cast<Instruction>(Cmp);
-  }
+  // Insert at the first instruction of the pair.
+  Instruction *InsertPt = nullptr;
+  for (Instruction &Iter : *Cmp->getParent()) {
+    if (&Iter == BO || &Iter == Cmp) {
+      InsertPt = &Iter;
+      break;
+    }
+  }
+  assert(InsertPt != nullptr && "Parent block did not contain cmp or binop");
 
   IRBuilder<> Builder(InsertPt);
   Value *MathOV = Builder.CreateBinaryIntrinsic(IID, Arg0, Arg1);
   Value *Math = Builder.CreateExtractValue(MathOV, 0, "math");
   Value *OV = Builder.CreateExtractValue(MathOV, 1, "ov");
   BO->replaceAllUsesWith(Math);
   Cmp->replaceAllUsesWith(OV);
   BO->eraseFromParent();
[... 6,095 lines not shown ...]

llvm/trunk/test/CodeGen/X86/cgp-usubo.ll

[... 115 lines not shown ...]
 ; CHECK-NEXT: movl %edi, (%rsi)
 ; CHECK-NEXT: retq
   %s = sub i32 0, %x
   %ov = icmp ne i32 %x, 0
   store i32 %s, i32* %p
   ret i1 %ov
 }
 
-; Verify insertion point for multi-BB.
+; This used to verify insertion point for multi-BB, but now we just bail out.
 
 declare void @call(i1)
 
 define i1 @usubo_ult_sub_dominates_i64(i64 %x, i64 %y, i64* %p, i1 %cond) nounwind {
 ; CHECK-LABEL: usubo_ult_sub_dominates_i64:
 ; CHECK: # %bb.0: # %entry
 ; CHECK-NEXT: testb $1, %cl
 ; CHECK-NEXT: je .LBB8_2
 ; CHECK-NEXT: # %bb.1: # %t
-; CHECK-NEXT: subq %rsi, %rdi
-; CHECK-NEXT: setb %al
-; CHECK-NEXT: movq %rdi, (%rdx)
+; CHECK-NEXT: movq %rdi, %rax
+; CHECK-NEXT: subq %rsi, %rax
+; CHECK-NEXT: movq %rax, (%rdx)
 ; CHECK-NEXT: testb $1, %cl
-; CHECK-NEXT: jne .LBB8_3
+; CHECK-NEXT: je .LBB8_2
+; CHECK-NEXT: # %bb.3: # %end
+; CHECK-NEXT: cmpq %rsi, %rdi
+; CHECK-NEXT: setb %al
+; CHECK-NEXT: retq
 ; CHECK-NEXT: .LBB8_2: # %f
 ; CHECK-NEXT: movl %ecx, %eax
-; CHECK-NEXT: .LBB8_3: # %end
 ; CHECK-NEXT: retq
 entry:
   br i1 %cond, label %t, label %f
 
 t:
   %s = sub i64 %x, %y
   store i64 %s, i64* %p
   br i1 %cond, label %end, label %f
[... 92 lines not shown ...]

llvm/trunk/test/Transforms/CodeGenPrepare/X86/optimizeSelect-DT.ll

 ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
 ; RUN: opt -S -codegenprepare < %s | FileCheck %s
 
 target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
 target triple = "x86_64-unknown-linux-gnu"
 
 define i1 @PR41004(i32 %x, i32 %y, i32 %t1) {
 ; CHECK-LABEL: @PR41004(
 ; CHECK-NEXT: entry:
 ; CHECK-NEXT: [[T0:%.*]] = icmp eq i32 [[Y:%.*]], 1
 ; CHECK-NEXT: br i1 [[T0]], label [[SELECT_TRUE_SINK:%.*]], label [[SELECT_END:%.*]]
 ; CHECK: select.true.sink:
 ; CHECK-NEXT: [[REM:%.*]] = srem i32 [[X:%.*]], 2
 ; CHECK-NEXT: br label [[SELECT_END]]
 ; CHECK: select.end:
 ; CHECK-NEXT: [[MUL:%.*]] = phi i32 [ [[REM]], [[SELECT_TRUE_SINK]] ], [ 0, [[ENTRY:%.*]] ]
-; CHECK-NEXT: [[TMP0:%.*]] = call { i32, i1 } @llvm.usub.with.overflow.i32(i32 [[T1:%.*]], i32 1)
-; CHECK-NEXT: [[MATH:%.*]] = extractvalue { i32, i1 } [[TMP0]], 0
-; CHECK-NEXT: [[OV:%.*]] = extractvalue { i32, i1 } [[TMP0]], 1
-; CHECK-NEXT: [[ADD:%.*]] = add i32 [[MATH]], [[MUL]]
-; CHECK-NEXT: ret i1 [[OV]]
+; CHECK-NEXT: [[NEG:%.*]] = add i32 [[T1:%.*]], -1
+; CHECK-NEXT: [[ADD:%.*]] = add i32 [[NEG]], [[MUL]]
+; CHECK-NEXT: [[TOBOOL:%.*]] = icmp eq i32 [[T1]], 0
+; CHECK-NEXT: ret i1 [[TOBOOL]]
 ;
 entry:
   %rem = srem i32 %x, 2
   %t0 = icmp eq i32 %y, 1
   %mul = select i1 %t0, i32 %rem, i32 0
   %neg = add i32 %t1, -1
   %add = add i32 %neg, %mul
   br label %if
 
 if:
   %tobool = icmp eq i32 %t1, 0
   ret i1 %tobool
 }

llvm/trunk/test/Transforms/CodeGenPrepare/X86/overflow-intrinsics.ll

[... 41 lines not shown ...]
 ; CHECK-NEXT: ret i64 [[Q]]
 ;
   %add = add i64 %b, %a
   %cmp = icmp ugt i64 %b, %add
   %Q = select i1 %cmp, i64 %b, i64 42
   ret i64 %Q
 }
 
+; TODO? CGP sinks the compare before we have a chance to form the overflow intrinsic.
+
 define i64 @uaddo4(i64 %a, i64 %b, i1 %c) nounwind ssp {
 ; CHECK-LABEL: @uaddo4(
 ; CHECK-NEXT: entry:
+; CHECK-NEXT: [[ADD:%.*]] = add i64 [[B:%.*]], [[A:%.*]]
 ; CHECK-NEXT: br i1 [[C:%.*]], label [[NEXT:%.*]], label [[EXIT:%.*]]
 ; CHECK: next:
-; CHECK-NEXT: [[TMP0:%.*]] = call { i64, i1 } @llvm.uadd.with.overflow.i64(i64 [[B:%.*]], i64 [[A:%.*]])
-; CHECK-NEXT: [[MATH:%.*]] = extractvalue { i64, i1 } [[TMP0]], 0
-; CHECK-NEXT: [[OV:%.*]] = extractvalue { i64, i1 } [[TMP0]], 1
-; CHECK-NEXT: [[Q:%.*]] = select i1 [[OV]], i64 [[B]], i64 42
+; CHECK-NEXT: [[TMP0:%.*]] = icmp ugt i64 [[B]], [[ADD]]
+; CHECK-NEXT: [[Q:%.*]] = select i1 [[TMP0]], i64 [[B]], i64 42
 ; CHECK-NEXT: ret i64 [[Q]]
 ; CHECK: exit:
 ; CHECK-NEXT: ret i64 0
 ;
 entry:
   %add = add i64 %b, %a
   %cmp = icmp ugt i64 %b, %add
   br i1 %c, label %next, label %exit
[... 290 lines not shown ...]
 ; CHECK-NEXT: ret i1 [[OV1]]
 ;
   %s = sub i32 0, %x
   %ov = icmp ne i32 %x, 0
   store i32 %s, i32* %p
   ret i1 %ov
 }
 
-; Verify insertion point for multi-BB.
+; This used to verify insertion point for multi-BB, but now we just bail out.
 
 declare void @call(i1)
 
 define i1 @usubo_ult_sub_dominates_i64(i64 %x, i64 %y, i64* %p, i1 %cond) {
 ; CHECK-LABEL: @usubo_ult_sub_dominates_i64(
 ; CHECK-NEXT: entry:
 ; CHECK-NEXT: br i1 [[COND:%.*]], label [[T:%.*]], label [[F:%.*]]
 ; CHECK: t:
-; CHECK-NEXT: [[TMP0:%.*]] = call { i64, i1 } @llvm.usub.with.overflow.i64(i64 [[X:%.*]], i64 [[Y:%.*]])
-; CHECK-NEXT: [[MATH:%.*]] = extractvalue { i64, i1 } [[TMP0]], 0
-; CHECK-NEXT: [[OV1:%.*]] = extractvalue { i64, i1 } [[TMP0]], 1
-; CHECK-NEXT: store i64 [[MATH]], i64* [[P:%.*]]
+; CHECK-NEXT: [[S:%.*]] = sub i64 [[X:%.*]], [[Y:%.*]]
+; CHECK-NEXT: store i64 [[S]], i64* [[P:%.*]]
 ; CHECK-NEXT: br i1 [[COND]], label [[END:%.*]], label [[F]]
 ; CHECK: f:
 ; CHECK-NEXT: ret i1 [[COND]]
 ; CHECK: end:
-; CHECK-NEXT: ret i1 [[OV1]]
+; CHECK-NEXT: [[OV:%.*]] = icmp ult i64 [[X]], [[Y]]
+; CHECK-NEXT: ret i1 [[OV]]
 ;
 entry:
   br i1 %cond, label %t, label %f
 
 t:
   %s = sub i64 %x, %y
   store i64 %s, i64* %p
   br i1 %cond, label %end, label %f
[... 118 lines not shown ...]
 true:
   %svalue = add i64 %key, -1
   store i64 %svalue, i64* %p64
   br label %exit
 
 exit:
   ret void
 }
 
+; This was crashing when trying to delay instruction removal/deletion.
+
+declare i64 @llvm.objectsize.i64.p0i8(i8*, i1 immarg, i1 immarg, i1 immarg) #0
+
+define hidden fastcc void @crash() {
+; CHECK-LABEL: @crash(
+; CHECK-NEXT: [[TMP1:%.*]] = call { i64, i1 } @llvm.uadd.with.overflow.i64(i64 undef, i64 undef)
+; CHECK-NEXT: [[MATH:%.*]] = extractvalue { i64, i1 } [[TMP1]], 0
+; CHECK-NEXT: [[OV:%.*]] = extractvalue { i64, i1 } [[TMP1]], 1
+; CHECK-NEXT: [[T2:%.*]] = select i1 undef, i1 undef, i1 [[OV]]
+; CHECK-NEXT: unreachable
+;
+  %t0 = add i64 undef, undef
+  %t1 = icmp ult i64 %t0, undef
+  %t2 = select i1 undef, i1 undef, i1 %t1
+  %t3 = call i64 @llvm.objectsize.i64.p0i8(i8* nonnull undef, i1 false, i1 false, i1 false)
+  %t4 = icmp ugt i64 %t3, 7
+  unreachable
+}
+
 ; Check that every instruction inserted by -codegenprepare has a debug location.
 ; DEBUG: CheckModuleDebugify: PASS

This is an archive of the discontinued LLVM Phabricator instance.

[CodeGenPrepare] limit overflow intrinsic matching to a single basic block
ClosedPublic
