This is an archive of the discontinued LLVM Phabricator instance.

Differential D96119

[Codegenprepare][X86] Use usub with overflow opt for IV increment
ClosedPublic

Authored by mkazantsev on Feb 5 2021, 4:16 AM.

Details

Reviewers

spatel
aqjune
reames
craig.topper

Commits

rG418c218efa95: Return "[Codegenprepare][X86] Use usub with overflow opt for IV increment"
rG3d15b7e7dfc3: [Codegenprepare][X86] Use usub with overflow opt for IV increment

Summary

Function replaceMathCmpWithIntrinsic artificially limits the scope
of the optimization by requiring the two instructions to be in
the same block, for two reasons:

1. using the DT for a more general check is costly in terms of compile time;
2. there is a risk of creating a new value that lives through multiple blocks.

Because of this, two semantically equivalent tests may or may not be
subject to this opt depending on where the binary operation is located.
See test/CodeGen/X86/usub_inc_iv.ll for motivation.

There is one important particular case where this limitation is too strict:
when the binary operation is the increment of the induction variable.
As a result, the application of this opt becomes fragile and highly reliant on
where other passes decide to place the IV increment. In most cases, they place
it at the end of the latch block, killing the opt opportunity (when in fact it
does not matter where the actual instruction is inserted).
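
The equivalence this relies on can be sketched in plain C++. This is only an illustrative model, not LLVM code: `usub_with_overflow`, `count_iterations_cmp` and `count_iterations_usub` are hypothetical names, and the loop body is reduced to a counter. The first loop compares the IV with zero and decrements it at the end of the iteration (written as `add iv, -1`, the canonical form); the second computes the decrement and the exit condition in one step, the way `llvm.usub.with.overflow` does.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Model of llvm.usub.with.overflow.i64: wrapped difference plus borrow flag.
inline std::pair<std::uint64_t, bool> usub_with_overflow(std::uint64_t a,
                                                         std::uint64_t b) {
  return {a - b, a < b};
}

// Original shape: compare in the header, decrement at the end of the latch.
inline std::uint64_t count_iterations_cmp(std::uint64_t len) {
  std::uint64_t iters = 0;
  std::uint64_t iv = len;
  while (true) {
    if (iv == 0)                              // icmp eq i64 %iv, 0
      break;
    ++iters;                                  // stand-in for the loop body
    iv = iv + static_cast<std::uint64_t>(-1); // add i64 %iv, -1
  }
  return iters;
}

// Transformed shape: the decrement is computed together with the exit test.
inline std::uint64_t count_iterations_usub(std::uint64_t len) {
  std::uint64_t iters = 0;
  std::uint64_t iv = len;
  while (true) {
    std::pair<std::uint64_t, bool> r = usub_with_overflow(iv, 1);
    if (r.second)                             // overflow bit replaces the icmp
      break;
    ++iters;                                  // stand-in for the loop body
    iv = r.first;                             // math result feeds the IV phi
  }
  return iters;
}
```

Both functions run the body the same number of times for any `len`, because `iv - 1` borrows exactly when `iv == 0`, and adding `-1` wraps to the same value as subtracting `1`.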

This patch handles this particular case separately.

- The detector does not use the dom tree and has constant cost;
- The value of IV or IV.next lives through the whole loop in any case, so this should not create a new unexpected long-living value.

As a result, the transform becomes more robust. It also seems to lead to
better code generation in some cases (see test/CodeGen/X86/lsr-loop-exit-cond.ll).
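
As a rough illustration of why the detector is cheap, the structural checks in the CodeGenPrepare.cpp hunk of this revision can be modeled over a toy CFG description. Everything below is hypothetical and far simpler than LLVM's classes: `ToyLoop`, `ToyBlock`, `ToyIncrement` and `isIVIncrementToy` are made-up names, loops are identified by index, and nested loops are ignored. The point is only that each check is a lookup, with no dominator tree query.

```cpp
#include <cassert>
#include <vector>

// Toy stand-ins (assumptions for illustration, not LLVM's API).
struct ToyLoop {
  int header; // index of the header block
  int latch;  // index of the unique latch block, or -1 if there is none
};

struct ToyBlock {
  int loop; // index of the innermost loop containing the block, or -1
};

struct ToyIncrement {
  int block;                 // block holding the add/sub
  int phiBlock;              // block of the PHI in operand 0, or -1 if not a PHI
  bool phiTakesIncFromLatch; // PHI's incoming value for the latch is this op
  int stepBlock;             // block defining the step, or -1 for a constant
  int numUses;               // number of uses of the increment
};

inline bool isIVIncrementToy(const ToyIncrement &inc, int cmpBlock,
                             const std::vector<ToyBlock> &blocks,
                             const std::vector<ToyLoop> &loops) {
  if (inc.phiBlock < 0)
    return false; // operand 0 must be a PHI
  int l = blocks[inc.block].loop;
  if (l < 0 || loops[l].header != inc.phiBlock || loops[l].latch < 0)
    return false; // PHI must sit in the header of a loop with a unique latch
  if (!inc.phiTakesIncFromLatch)
    return false; // the increment must feed the PHI over the backedge
  if (inc.stepBlock >= 0 && blocks[inc.stepBlock].loop == l)
    return false; // the step must not be computed inside the loop
  if (inc.numUses != 1)
    return false; // the PHI must be the only user
  return blocks[cmpBlock].loop == l; // the compare must be in the same loop
}
```

For a CFG with blocks `entry`, `loop`, `backedge`, `exit` (indices 0 to 3) and one loop with header 1 and latch 2, an increment in block 2 with a constant step and a single use is accepted when the compare is in block 1, and rejected if it has a second use or if the compare is outside the loop.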

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

mkazantsev created this revision. Feb 5 2021, 4:16 AM

Herald added subscribers: pengfei, jfb, hiraditya. Feb 5 2021, 4:16 AM

mkazantsev requested review of this revision. Feb 5 2021, 4:16 AM

Herald added a project: Restricted Project. Feb 5 2021, 4:16 AM

Herald added a subscriber: llvm-commits.

This seems fine, but I was curious what kind of interaction we should expect between CGP and global-isel, so I tried pushing the last test through llc:

$ llc usub.ll -o - -global-isel 
	.section	__TEXT,__text,regular,pure_instructions
	.build_version macos, 10, 15
LLVM ERROR: unable to legalize instruction: %4:_(s64), %5:_(s1) = G_USUBO %3:_, %6:_ (in function: test_02)

My understanding was that CGP is not needed with global-isel, but it's apparently running by default. Do we consider the current state of passes temporary and/or do we need to fix that crash before proceeding here?

Hi Sanjay,

I was able to reproduce this crash on this test even without my code changes. I don't know what the current status of Global ISel is, but apparently it (or the pipeline) already has a bug. I don't see why it should be a blocker for this one if we can see it on current trunk. Let's just file a separate bug.

I've filed https://bugs.llvm.org/show_bug.cgi?id=49087 to address it and merged the relevant test as 0fc1738eb75d613b9e16143b83e7cb80512e84eb. It fails on current trunk even on the test not affected by the patch, so I think it should not be an obstacle for this patch.

In D96119#2547901, @mkazantsev wrote:

I've filed https://bugs.llvm.org/show_bug.cgi?id=49087 to address it and merged the relevant test as 0fc1738eb75d613b9e16143b83e7cb80512e84eb. It fails on current trunk even on the test not affected by the patch, so I think it should not be an obstacle for this patch.

Thanks for filing the bug!
LGTM

This revision is now accepted and ready to land. Feb 8 2021, 11:38 AM

If there is a use of the original phi later in the loop, then performing the transformation does cause the live intervals of %iv and %iv.next to overlap when they originally didn't. I suspect this is still profitable, but it means the comments are slightly out of sync with reality. Example pseudo code:
{

%iv = ...
if (%iv == 0) break;
use(%iv)
%iv.next = add i32 %iv, 1
continue;

}

I'd suggest tweaking this patch to add the hasOneUse check on the phi. With that change, I'd also LGTM this.

As a separate patch, we can consider relaxing the hasOneUse again, but we'll need slightly different reasoning than taken here.

I'd suggest tweaking this patch to add the hasOneUse check on the phi. With that change, I'd also LGTM this.

It won't pass a motivating test. Effectively, it doesn't really create an intersection of the ranges of IV and IV.next, even if formally it does. In fact, the comparison itself is the point where we compute something equivalent to IV.next. I'll write an elaborate comment explaining this.

Our internal testing has found a functional bug with this patch. I'm putting this on hold while investigating.

Fixed bugs, added tests demonstrating them.

This revision is now accepted and ready to land. Feb 10 2021, 4:30 AM

I think this might need another round of the review, due to fixed bugs & comment change.

spatel added inline comments. Feb 10 2021, 9:32 AM

llvm/test/CodeGen/X86/usub_inc_iv.ll
130–132	I defer to @reames to continue the review (I don't have a good handle on the IV subtleties), but I think it would be easier to review if the test diffs are pre-committed, so we just see the minimal changes from the patch.

Still LGTM. Thanks for adding the requested one-use restriction, and the inner loop case is a good catch too.

I'd agree that landing the new tests, then rebasing over, then committing would make post-commit review and any potential reverts easier to understand. Optional, but encouraged.

This revision is now accepted and ready to land. Feb 10 2021, 9:47 AM

Ok, I'll check in the negative tests separately. Thanks!

Closed by commit rG3d15b7e7dfc3: [Codegenprepare][X86] Use usub with overflow opt for IV increment (authored by mkazantsev). Feb 10 2021, 9:00 PM

This revision was automatically updated to reflect the committed changes.

mkazantsev added a commit: rG3d15b7e7dfc3: [Codegenprepare][X86] Use usub with overflow opt for IV increment.

mkazantsev added a reverting change: rG90081f3020e3: Revert "[Codegenprepare][X86] Use usub with overflow opt for IV increment". Feb 11 2021, 2:52 AM

Patch reverted. I've missed one more important point: CmpInst should also dominate the latch to make the transform legal. I need to think about how to check that without the actual DT query.
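
Why the compare has to dominate the latch can be illustrated with a rough C++ model. This is an analogy under stated assumptions, not a reproduction of the reported failure: `trace_original` and `trace_moved` are hypothetical names, and `guard` is a made-up per-iteration flag standing for "the path through the compare was taken". In real IR, moving the increment next to a compare that does not dominate the latch would leave the PHI using a value whose definition does not dominate that use; the model shows the analogous effect, where iterations that bypass the compare lose the decrement.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Original shape: the decrement is in the latch and runs on every iteration.
// The compare only runs when guard is true, so it does not dominate the latch.
inline std::vector<std::uint64_t> trace_original(std::uint64_t start,
                                                 const std::vector<bool> &guard) {
  std::vector<std::uint64_t> seen;
  std::uint64_t iv = start;
  for (bool g : guard) {
    seen.push_back(iv);
    if (g && iv == 0) // compare on a conditional path
      break;
    iv = iv - 1;      // latch: always executed
  }
  return seen;
}

// Hypothetical mis-transform: the decrement is folded into the compare,
// so it only happens on iterations that reach the compare.
inline std::vector<std::uint64_t> trace_moved(std::uint64_t start,
                                              const std::vector<bool> &guard) {
  std::vector<std::uint64_t> seen;
  std::uint64_t iv = start;
  for (bool g : guard) {
    seen.push_back(iv);
    if (g) {
      bool overflow = iv < 1; // overflow bit of usub(iv, 1)
      if (overflow)
        break;
      iv = iv - 1;            // math result of usub(iv, 1)
    }
    // When g is false, nothing has decremented iv on this iteration.
  }
  return seen;
}
```

When every iteration reaches the compare, the two traces agree; as soon as one iteration skips it, they diverge, which is the situation the dominance requirement rules out.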

mkazantsev added a commit: rG418c218efa95: Return "[Codegenprepare][X86] Use usub with overflow opt for IV increment". Feb 11 2021, 4:50 AM

Revision Contents

Path                                                    Size
llvm/lib/CodeGen/CodeGenPrepare.cpp                     33 lines
llvm/test/CodeGen/X86/2020_12_02_decrementing_loop.ll   14 lines
llvm/test/CodeGen/X86/lsr-loop-exit-cond.ll             42 lines
llvm/test/CodeGen/X86/usub_inc_iv.ll                    17 lines

Diff 322902

llvm/lib/CodeGen/CodeGenPrepare.cpp

[earlier lines not shown] static bool OptimizeNoopCopyExpression(CastInst *CI, const TargetLowering &TLI,

   return SinkCast(CI);
 }

 bool CodeGenPrepare::replaceMathCmpWithIntrinsic(BinaryOperator *BO,
                                                  Value *Arg0, Value *Arg1,
                                                  CmpInst *Cmp,
                                                  Intrinsic::ID IID) {
-  if (BO->getParent() != Cmp->getParent()) {
+  auto isIVIncrement = [this, &Cmp](BinaryOperator *BO) {
+    auto *PN = dyn_cast<PHINode>(BO->getOperand(0));
+    if (!PN)
+      return false;
+    const Loop *L = LI->getLoopFor(BO->getParent());
+    if (!L || L->getHeader() != PN->getParent() || !L->getLoopLatch())
+      return false;
+    if (PN->getIncomingValueForBlock(L->getLoopLatch()) != BO)
+      return false;
+    if (auto *Step = dyn_cast<Instruction>(BO->getOperand(1)))
+      if (L->contains(Step->getParent()))
+        return false;
+    // IV increment may have other users than the IV. We do not want to make
+    // dominance queries to analyze the legality of moving it towards the cmp,
+    // so just check that there is no other users.
+    if (!BO->hasOneUse())
+      return false;
+    // Do not risk on moving increment into a child loop.
+    if (LI->getLoopFor(Cmp->getParent()) != L)
+      return false;
+    return true;
+  };
+  if (BO->getParent() != Cmp->getParent() && !isIVIncrement(BO)) {
     // We used to use a dominator tree here to allow multi-block optimization.
     // But that was problematic because:
     // 1. It could cause a perf regression by hoisting the math op into the
     //    critical path.
     // 2. It could cause a perf regression by creating a value that was live
     //    across multiple blocks and increasing register pressure.
     // 3. Use of a dominator tree could cause large compile-time regression.
     //    This is because we recompute the DT on every change in the main CGP
     //    run-loop. The recomputing is probably unnecessary in many cases, so if
     //    that was fixed, using a DT here would be ok.
+    //
+    // There is one important particular case we still want to handle: if BO is
+    // the IV increment. Important properties that make it profitable:
+    // - We can speculate IV increment anywhere in the loop (as long as the
+    //   indvar Phi is its only user);
+    // - Upon computing Cmp, we effectively compute something equivalent to the
+    //   IV increment (despite it loops differently in the IR). So moving it up
+    //   to the cmp point does not really increase register pressure.
     return false;
   }

   // We allow matching the canonical IR (add X, C) back to (usubo X, -C).
   if (BO->getOpcode() == Instruction::Add &&
       IID == Intrinsic::usub_with_overflow) {
     assert(isa<Constant>(Arg1) && "Unexpected input for usubo");
     Arg1 = ConstantExpr::getNeg(cast<Constant>(Arg1));
   }

   // Insert at the first instruction of the pair.
[remaining lines not shown]

llvm/test/CodeGen/X86/2020_12_02_decrementing_loop.ll

[earlier lines not shown]

 failure: ; preds = %backedge
   unreachable
 }

 define i32 @test_02(i32* %p, i64 %len, i32 %x) {
 ; CHECK-LABEL: test_02:
 ; CHECK: ## %bb.0: ## %entry
+; CHECK-NEXT: movq %rsi, %rax
 ; CHECK-NEXT: .p2align 4, 0x90
 ; CHECK-NEXT: LBB2_1: ## %loop
 ; CHECK-NEXT: ## =>This Inner Loop Header: Depth=1
-; CHECK-NEXT: testq %rsi, %rsi
-; CHECK-NEXT: je LBB2_4
+; CHECK-NEXT: subq $1, %rax
+; CHECK-NEXT: jb LBB2_4
 ; CHECK-NEXT: ## %bb.2: ## %backedge
 ; CHECK-NEXT: ## in Loop: Header=BB2_1 Depth=1
 ; CHECK-NEXT: cmpl %edx, -4(%rdi,%rsi,4)
-; CHECK-NEXT: leaq -1(%rsi), %rsi
+; CHECK-NEXT: movq %rax, %rsi
 ; CHECK-NEXT: jne LBB2_1
 ; CHECK-NEXT: ## %bb.3: ## %failure
 ; CHECK-NEXT: ud2
 ; CHECK-NEXT: LBB2_4: ## %exit
 ; CHECK-NEXT: movl $-1, %eax
 ; CHECK-NEXT: retq
 entry:
   %start = add i64 %len, -1
[18 lines not shown]

 failure: ; preds = %backedge
   unreachable
 }

 define i32 @test_03(i32* %p, i64 %len, i32 %x) {
 ; CHECK-LABEL: test_03:
 ; CHECK: ## %bb.0: ## %entry
+; CHECK-NEXT: movq %rsi, %rax
 ; CHECK-NEXT: .p2align 4, 0x90
 ; CHECK-NEXT: LBB3_1: ## %loop
 ; CHECK-NEXT: ## =>This Inner Loop Header: Depth=1
-; CHECK-NEXT: testq %rsi, %rsi
-; CHECK-NEXT: je LBB3_4
+; CHECK-NEXT: subq $1, %rax
+; CHECK-NEXT: jb LBB3_4
 ; CHECK-NEXT: ## %bb.2: ## %backedge
 ; CHECK-NEXT: ## in Loop: Header=BB3_1 Depth=1
 ; CHECK-NEXT: cmpl %edx, -4(%rdi,%rsi,4)
-; CHECK-NEXT: leaq -1(%rsi), %rsi
+; CHECK-NEXT: movq %rax, %rsi
 ; CHECK-NEXT: jne LBB3_1
 ; CHECK-NEXT: ## %bb.3: ## %failure
 ; CHECK-NEXT: ud2
 ; CHECK-NEXT: LBB3_4: ## %exit
 ; CHECK-NEXT: movl $-1, %eax
 ; CHECK-NEXT: retq
 entry:
   %start = add i64 %len, -100
[22 lines not shown]

llvm/test/CodeGen/X86/lsr-loop-exit-cond.ll

[10 lines not shown]
 ; GENERIC: ## %bb.0: ## %entry
 ; GENERIC-NEXT: pushq %rbp
 ; GENERIC-NEXT: pushq %r14
 ; GENERIC-NEXT: pushq %rbx
 ; GENERIC-NEXT: ## kill: def $ecx killed $ecx def $rcx
 ; GENERIC-NEXT: movl (%rdx), %eax
 ; GENERIC-NEXT: movl 4(%rdx), %ebx
 ; GENERIC-NEXT: decl %ecx
-; GENERIC-NEXT: leaq 20(%rdx), %r14
+; GENERIC-NEXT: leaq 20(%rdx), %r11
 ; GENERIC-NEXT: movq _Te0@{{.*}}(%rip), %r9
 ; GENERIC-NEXT: movq _Te1@{{.*}}(%rip), %r8
 ; GENERIC-NEXT: movq _Te3@{{.*}}(%rip), %r10
-; GENERIC-NEXT: movq %rcx, %r11
+; GENERIC-NEXT: movq %rcx, %r14
 ; GENERIC-NEXT: .p2align 4, 0x90
 ; GENERIC-NEXT: LBB0_1: ## %bb
 ; GENERIC-NEXT: ## =>This Inner Loop Header: Depth=1
 ; GENERIC-NEXT: movzbl %al, %edi
 ; GENERIC-NEXT: ## kill: def $eax killed $eax def $rax
 ; GENERIC-NEXT: shrl $24, %eax
 ; GENERIC-NEXT: movl %ebx, %ebp
 ; GENERIC-NEXT: shrl $16, %ebp
 ; GENERIC-NEXT: movzbl %bpl, %ebp
 ; GENERIC-NEXT: movl (%r8,%rbp,4), %ebp
 ; GENERIC-NEXT: xorl (%r9,%rax,4), %ebp
-; GENERIC-NEXT: xorl -12(%r14), %ebp
+; GENERIC-NEXT: xorl -12(%r11), %ebp
 ; GENERIC-NEXT: shrl $24, %ebx
 ; GENERIC-NEXT: movl (%r10,%rdi,4), %edi
 ; GENERIC-NEXT: xorl (%r9,%rbx,4), %edi
-; GENERIC-NEXT: xorl -8(%r14), %edi
+; GENERIC-NEXT: xorl -8(%r11), %edi
 ; GENERIC-NEXT: movl %ebp, %eax
 ; GENERIC-NEXT: shrl $24, %eax
 ; GENERIC-NEXT: movl (%r9,%rax,4), %eax
-; GENERIC-NEXT: testq %r11, %r11
-; GENERIC-NEXT: je LBB0_3
+; GENERIC-NEXT: subq $1, %r14
+; GENERIC-NEXT: jb LBB0_3
 ; GENERIC-NEXT: ## %bb.2: ## %bb1
 ; GENERIC-NEXT: ## in Loop: Header=BB0_1 Depth=1
 ; GENERIC-NEXT: movl %edi, %ebx
 ; GENERIC-NEXT: shrl $16, %ebx
 ; GENERIC-NEXT: movzbl %bl, %ebx
 ; GENERIC-NEXT: xorl (%r8,%rbx,4), %eax
-; GENERIC-NEXT: xorl -4(%r14), %eax
+; GENERIC-NEXT: xorl -4(%r11), %eax
 ; GENERIC-NEXT: shrl $24, %edi
 ; GENERIC-NEXT: movzbl %bpl, %ebx
 ; GENERIC-NEXT: movl (%r10,%rbx,4), %ebx
 ; GENERIC-NEXT: xorl (%r9,%rdi,4), %ebx
-; GENERIC-NEXT: xorl (%r14), %ebx
-; GENERIC-NEXT: decq %r11
-; GENERIC-NEXT: addq $16, %r14
+; GENERIC-NEXT: xorl (%r11), %ebx
+; GENERIC-NEXT: addq $16, %r11
 ; GENERIC-NEXT: jmp LBB0_1
 ; GENERIC-NEXT: LBB0_3: ## %bb2
 ; GENERIC-NEXT: shlq $4, %rcx
 ; GENERIC-NEXT: andl $-16777216, %eax ## imm = 0xFF000000
 ; GENERIC-NEXT: movl %edi, %ebx
 ; GENERIC-NEXT: shrl $16, %ebx
 ; GENERIC-NEXT: movzbl %bl, %ebx
 ; GENERIC-NEXT: movzbl 2(%r8,%rbx,4), %ebx
[27 lines not shown]
 ; ATOM: ## %bb.0: ## %entry
 ; ATOM-NEXT: pushq %rbp
 ; ATOM-NEXT: pushq %r15
 ; ATOM-NEXT: pushq %r14
 ; ATOM-NEXT: pushq %rbx
 ; ATOM-NEXT: ## kill: def $ecx killed $ecx def $rcx
 ; ATOM-NEXT: movl (%rdx), %r15d
 ; ATOM-NEXT: movl 4(%rdx), %eax
-; ATOM-NEXT: leaq 20(%rdx), %r14
+; ATOM-NEXT: leaq 20(%rdx), %r11
 ; ATOM-NEXT: movq _Te0@{{.*}}(%rip), %r9
 ; ATOM-NEXT: movq _Te1@{{.*}}(%rip), %r8
 ; ATOM-NEXT: movq _Te3@{{.*}}(%rip), %r10
 ; ATOM-NEXT: decl %ecx
-; ATOM-NEXT: movq %rcx, %r11
+; ATOM-NEXT: movq %rcx, %r14
 ; ATOM-NEXT: .p2align 4, 0x90
 ; ATOM-NEXT: LBB0_1: ## %bb
 ; ATOM-NEXT: ## =>This Inner Loop Header: Depth=1
 ; ATOM-NEXT: movl %eax, %edi
 ; ATOM-NEXT: movl %r15d, %ebp
 ; ATOM-NEXT: shrl $24, %eax
 ; ATOM-NEXT: shrl $16, %edi
 ; ATOM-NEXT: shrl $24, %ebp
 ; ATOM-NEXT: movzbl %dil, %edi
 ; ATOM-NEXT: movl (%r8,%rdi,4), %ebx
 ; ATOM-NEXT: movzbl %r15b, %edi
 ; ATOM-NEXT: xorl (%r9,%rbp,4), %ebx
 ; ATOM-NEXT: movl (%r10,%rdi,4), %edi
-; ATOM-NEXT: xorl -12(%r14), %ebx
+; ATOM-NEXT: xorl -12(%r11), %ebx
 ; ATOM-NEXT: xorl (%r9,%rax,4), %edi
 ; ATOM-NEXT: movl %ebx, %eax
-; ATOM-NEXT: xorl -8(%r14), %edi
+; ATOM-NEXT: xorl -8(%r11), %edi
 ; ATOM-NEXT: shrl $24, %eax
 ; ATOM-NEXT: movl (%r9,%rax,4), %r15d
-; ATOM-NEXT: testq %r11, %r11
+; ATOM-NEXT: subq $1, %r14
 ; ATOM-NEXT: movl %edi, %eax
-; ATOM-NEXT: je LBB0_3
+; ATOM-NEXT: jb LBB0_3
 ; ATOM-NEXT: ## %bb.2: ## %bb1
 ; ATOM-NEXT: ## in Loop: Header=BB0_1 Depth=1
 ; ATOM-NEXT: shrl $16, %eax
 ; ATOM-NEXT: shrl $24, %edi
-; ATOM-NEXT: decq %r11
-; ATOM-NEXT: movzbl %al, %ebp
+; ATOM-NEXT: movzbl %al, %eax
+; ATOM-NEXT: xorl (%r8,%rax,4), %r15d
 ; ATOM-NEXT: movzbl %bl, %eax
 ; ATOM-NEXT: movl (%r10,%rax,4), %eax
-; ATOM-NEXT: xorl (%r8,%rbp,4), %r15d
+; ATOM-NEXT: xorl -4(%r11), %r15d
 ; ATOM-NEXT: xorl (%r9,%rdi,4), %eax
-; ATOM-NEXT: xorl -4(%r14), %r15d
-; ATOM-NEXT: xorl (%r14), %eax
-; ATOM-NEXT: addq $16, %r14
+; ATOM-NEXT: xorl (%r11), %eax
+; ATOM-NEXT: addq $16, %r11
 ; ATOM-NEXT: jmp LBB0_1
 ; ATOM-NEXT: LBB0_3: ## %bb2
 ; ATOM-NEXT: shrl $16, %eax
 ; ATOM-NEXT: shrl $8, %edi
 ; ATOM-NEXT: movzbl %bl, %ebp
 ; ATOM-NEXT: andl $-16777216, %r15d ## imm = 0xFF000000
 ; ATOM-NEXT: shlq $4, %rcx
 ; ATOM-NEXT: movzbl %al, %eax
[remaining lines not shown]

llvm/test/CodeGen/X86/usub_inc_iv.ll

[earlier lines not shown]
 }

 ; TODO: We can use trick with usub here.
 define i32 @test_02(i32* %p, i64 %len, i32 %x) {
 ; CHECK-LABEL: @test_02(
 ; CHECK-NEXT: entry:
 ; CHECK-NEXT: br label [[LOOP:%.*]]
 ; CHECK: loop:
-; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[IV_NEXT:%.*]], [[BACKEDGE:%.*]] ], [ [[LEN:%.*]], [[ENTRY:%.*]] ]
-; CHECK-NEXT: [[COND_1:%.*]] = icmp eq i64 [[IV]], 0
-; CHECK-NEXT: br i1 [[COND_1]], label [[EXIT:%.*]], label [[BACKEDGE]]
+; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[MATH:%.*]], [[BACKEDGE:%.*]] ], [ [[LEN:%.*]], [[ENTRY:%.*]] ]
+; CHECK-NEXT: [[TMP0:%.*]] = call { i64, i1 } @llvm.usub.with.overflow.i64(i64 [[IV]], i64 1)
+; CHECK-NEXT: [[MATH]] = extractvalue { i64, i1 } [[TMP0]], 0
+; CHECK-NEXT: [[OV:%.*]] = extractvalue { i64, i1 } [[TMP0]], 1
+; CHECK-NEXT: br i1 [[OV]], label [[EXIT:%.*]], label [[BACKEDGE]]
 ; CHECK: backedge:
 ; CHECK-NEXT: [[SUNKADDR:%.*]] = mul i64 [[IV]], 4
-; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[P:%.*]] to i8*
-; CHECK-NEXT: [[SUNKADDR1:%.*]] = getelementptr i8, i8* [[TMP0]], i64 [[SUNKADDR]]
+; CHECK-NEXT: [[TMP1:%.*]] = bitcast i32* [[P:%.*]] to i8*
+; CHECK-NEXT: [[SUNKADDR1:%.*]] = getelementptr i8, i8* [[TMP1]], i64 [[SUNKADDR]]
 ; CHECK-NEXT: [[SUNKADDR2:%.*]] = getelementptr i8, i8* [[SUNKADDR1]], i64 -4
-; CHECK-NEXT: [[TMP1:%.*]] = bitcast i8* [[SUNKADDR2]] to i32*
-; CHECK-NEXT: [[LOADED:%.*]] = load atomic i32, i32* [[TMP1]] unordered, align 4
+; CHECK-NEXT: [[TMP2:%.*]] = bitcast i8* [[SUNKADDR2]] to i32*
+; CHECK-NEXT: [[LOADED:%.*]] = load atomic i32, i32* [[TMP2]] unordered, align 4
 ; CHECK-NEXT: [[COND_2:%.*]] = icmp eq i32 [[LOADED]], [[X:%.*]]
-; CHECK-NEXT: [[IV_NEXT]] = add i64 [[IV]], -1
 ; CHECK-NEXT: br i1 [[COND_2]], label [[FAILURE:%.*]], label [[LOOP]]
 ; CHECK: exit:
 ; CHECK-NEXT: ret i32 -1
 ; CHECK: failure:
 ; CHECK-NEXT: unreachable
 ;
 entry:
   %scevgep = getelementptr i32, i32* %p, i64 -1
   br label %loop

 loop: ; preds = %backedge, %entry
   %iv = phi i64 [ %iv.next, %backedge ], [ %len, %entry ]
   %cond_1 = icmp eq i64 %iv, 0
   br i1 %cond_1, label %exit, label %backedge

 backedge: ; preds = %loop
   %scevgep1 = getelementptr i32, i32* %scevgep, i64 %iv
   %loaded = load atomic i32, i32* %scevgep1 unordered, align 4
   %cond_2 = icmp eq i32 %loaded, %x
   %iv.next = add i64 %iv, -1
   br i1 %cond_2, label %failure, label %loop

 exit: ; preds = %loop
[remaining lines not shown]