This is an archive of the discontinued LLVM Phabricator instance.

Differential D24760

Failure to hoist constant out of loop
AbandonedPublic

Authored by avt77 on Sep 20 2016, 6:17 AM.

Details

Reviewers

spatel
qcolombet
RKSimon
reames
simon.f.whittaker
hfinkel

Summary

Poor code gen identified in Andreas Fredriksson's GDC 2016 Talk 'Taming the Jaguar: x86 Optimization at Insomniac Games':
http://schedule.gdconf.com/session/taming-the-jaguar-x86-optimization-at-insomniac-games

(see details in PR27136)

Diff Detail

Event Timeline

avt77 updated this revision to Diff 71919. Sep 20 2016, 6:17 AM

avt77 retitled this revision from to Failure to hoist constant out of loop.

avt77 updated this object.

avt77 added a reviewer: ABataev.

I updated the patch after the first review from Alexey Bataev.

Resolved another request from Alexey.

avt77 added reviewers: simon.f.whittaker, RKSimon, spatel. Sep 21 2016, 5:33 AM

Fixed the test according to RKSimon's request.

avt77 updated this object. Sep 22 2016, 7:16 AM

Metadata removed from the test.

The previously committed test was updated to mirror the changes in CodeGen. Please review this version.

This certainly seems like the right thing to do. Have you run the test suite, etc. to check for performance regressions?

In D24760#558980, @hfinkel wrote:

This certainly seems like the right thing to do. Have you run the test suite, etc. to check for performance regressions?

No, I did not run any special test suite. Could you give me a hint on how to do it?

djasper removed a reviewer: djasper. Oct 4 2016, 11:29 PM

In D24760#558980, @hfinkel wrote:

This certainly seems like the right thing to do. Have you run the test suite, etc. to check for performance regressions?

All tests from llvm-test-suite passed and there are no performance degradations. I used the following command:

lnt runtest nt -v --sandbox SANDBOX --cc WORKSPACE/build/bin/clang --test-suite ~/llvm-test-suite

Is that enough, or should I do more testing?

A couple of small comments, but fair warning: I am not a qualified reviewer for this code and won't be able to sign off.

lib/CodeGen/MachineLICM.cpp
1083	If I understand the issue correctly, this might be better phrased in terms of this predicate. Essentially, we know that we can rematerialize the constant without creating a copy even if there is a PHI use. I can see why this follows for loop exit PHIs. It is less clear to me that it follows for PHIs inside the loop.
test/CodeGen/X86/loop-search.ll
59	Just to make sure I understand the case you're trying to fix: the previous code fell over on this instruction because it is a use of the instruction that materializes the constant, right?

Moved the comment about the hoisted constant to the proper place.

avt77 added inline comments. Oct 11 2016, 4:04 AM

lib/CodeGen/MachineLICM.cpp
1083	I'm not sure I understand you. The idea is very simple, and you described it yourself: yes, we can rematerialize the constant without creating a copy, which is why it's profitable to hoist it. Is that OK?
test/CodeGen/X86/loop-search.ll
59	It seems I put my comment in the wrong place: it should be moved from line 28 to line 33. I'll fix it ASAP. The previous code prepared the TRUE result (a constant) inside the loop body (see lines 18 and 19 on the left, the old version). Now we materialize this constant outside the loop (see lines 32 and 33 on the right, the new version).

I created a simple test to check the performance:
#include <stdint.h>
#include <stdbool.h>

#define N 1000000000
#define M 100000000000000000

bool search(long needle, const uint32_t *haystack, int count) {
  for (int i = 0; i < count; ++i)
    if (needle == haystack[i])
      return true;
  return false;
}

long a [N];

int main () {
  int i;
  for (i = 0; i < N; i++) {
    a[i] = i;
  }
  for (long j = 0; j < M; j++) {
    search (j % N, (const uint32_t *)&a, N);
  }
}
I ran this test with and without my fix, like this:

time ./search.o

The improvement is about 8%.

Hal, could you give me your LGTM on this patch?

In D24760#570267, @avt77 wrote:

Hal, could you give me your LGTM on this patch?

Okay; LGTM. Can you also put together a patch to add your test to the test-suite? That way we can track the performance.

This revision is now accepted and ready to land. Oct 14 2016, 7:14 AM

Sorry, I don't understand: do you mean I should add my perf test to the patch? But it is not a LIT test, it's an application. How can I add it to the test suite? I'll do it in a follow-up patch (next week), but please explain to me how I should do it.

In D24760#570503, @avt77 wrote:

Sorry, I don't understand: do you mean I should add my perf test to the patch? But it is not a LIT test, it's an application. How can I add it to the test suite? I'll do it in a follow-up patch (next week), but please explain to me how I should do it.

Applications get added to the test suite (not the lit-run regression tests). You know how to run the test suite (you did it with lnt, as you indicated above). You could submit a patch to add your test to https://llvm.org/svn/llvm-project/test-suite/trunk, probably by placing it in the SingleSource/UnitTests subdirectory. The patch gets directed to llvm-commits, as with this one.

avt77 added a reviewer: kparzysz. Oct 18 2016, 11:59 PM

kparzysz, not long ago you added the test tail-dup-merge-loop-headers.ll (maybe you added some other tests as well at that time). This test (plus some others) now fails when I apply this patch to the trunk code. Could you review my really tiny patch and give me a hint about what goes wrong with your code when I apply it?

In D24760#573827, @avt77 wrote:

kparzysz, not long ago you added the test tail-dup-merge-loop-headers.ll (maybe you added some other tests as well at that time). This test (plus some others) now fails when I apply this patch to the trunk code. Could you review my really tiny patch and give me a hint about what goes wrong with your code when I apply it?

That was actually Kyle Butt (@iteratee) who added that test.

For tail-dup-merge-loop-headers.ll:

That is a layout test, and your change affected the final layout. Your change is innocuous because the test is really checking for the presence/absence of blocks.

In general:
By my count, you have 5 other tests to look at. Please look carefully at them to see if you can understand the intent of the test, and if the resulting code is in keeping with the intent of the test. If so, feel free to change the test to match. If you can't, try to get someone to look at the specific test.

avt77 edited edge metadata. Oct 20 2016, 7:13 AM

avt77 added subscribers: ab, jroelofs, logan.

logan, jroelofs, ab, iteratee, tnorthover: this patch fails your tests:
atomic-cmpxchg
cmpxchg-idioms
ifcvt-rescan-diamonds
But it seems the new version is better than the current one. Could you review the newly generated code and allow me to fix the tests?

Attachments:
atomic-cmpxchg.current.s (749 B)
atomic-cmpxchg.new.s (736 B)
tail-dup-merge-loop-headers.new.s (4 KB)
cmpxchg-idioms.current.s (1 KB)
ifcvt-rescan-diamonds.current.s (2 KB)
ifcvt-rescan-diamonds.new.s (2 KB)
cmpxchg-idioms.new.s (1 KB)
tail-dup-merge-loop-headers.current.s (4 KB)

In D24760#575410, @avt77 wrote:

logan, jroelofs, ab, iteratee, tnorthover: this patch fails your tests:
atomic-cmpxchg
cmpxchg-idioms
ifcvt-rescan-diamonds
But it seems the new version is better than the current one. Could you review the newly generated code and allow me to fix the tests?

Mind uploading these as full-context diffs? Phab's way of showing them as raw files isn't particularly helpful for reviewing or making comments.

For ifcvt-rescan-diamonds.ll: nothing is broken, the test just gets optimized away.

Change the 0 in the phi of %cond.end84 to a load and you'll be fine. Change the CHECK lines to match.

I've fixed all the failing tests. Test owners, please review my changes: they are rather small, so it should not take much of your time.

Herald added a subscriber: anna. Oct 26 2016, 7:35 AM

I don't see any changes to ifcvt-rescan-diamonds.ll

In D24760#580189, @iteratee wrote:

I don't see any changes to ifcvt-rescan-diamonds.ll

Yes, the problem has disappeared; I don't know why. Sorry for the noise.

Hi @avt77:

For atomic-cmpxchg.ll, I found that your output is the same as the version I originally committed. The test was updated by @danielcdh in D24818 / rL284757. You may wish to add him as reviewer.

Looks like the hoisting issue described in PR27136 is already fixed at head? My guess is that the fix is from https://reviews.llvm.org/rL284757

In D24760#584793, @danielcdh wrote:

Looks like the hoisting issue described in PR27136 is already fixed at head? My guess is that the fix is from https://reviews.llvm.org/rL284757

Yes, you're right: the current trunk does not have the issue with test/CodeGen/X86/loop-search.ll. It was fixed. But the hoisting issue is still here: this patch really improves the code for several tests. Should we continue with this patch, or should we simply close it because loop-search.ll was fixed?

kparzysz resigned from this revision. Nov 2 2016, 6:39 AM

kparzysz removed a reviewer: kparzysz.

danielcdh added inline comments. Nov 2 2016, 10:18 AM

test/CodeGen/X86/loop-search.ll
15	Looks like hoisting this instruction is not the best choice because it will be executed speculatively. More specifically, if the search fails, i.e. the loop terminates when i==count (no early exit), performance will be worse because the hoisted movb is redundant. OTOH, if the loop terminates with an early exit (needle == haystack[i]), there is no redundancy, but the live range of ax becomes much longer and overlaps with many other live ranges. This adds extra burden on RA. So it looks to me like hoisting is an overall loss here?

ABataev resigned from this revision. Feb 13 2017, 11:44 AM

Abandon this? It seems to be fixed in trunk: there are still some issues in PR27136, but none with constant hoisting.

This revision is now accepted and ready to land. Apr 7 2018, 9:17 AM

Herald added a subscriber: javed.absar. Apr 7 2018, 9:17 AM

Stripping approval

This revision now requires changes to proceed. Apr 7 2018, 9:18 AM

The trunk fixed the issue.

Revision Contents

Path                                             Size
lib/CodeGen/MachineLICM.cpp                      10 lines
test/CodeGen/AArch64/cmpxchg-idioms.ll           12 lines
test/CodeGen/ARM/2011-04-11-MachineLICMBug.ll    1 line
test/CodeGen/ARM/atomic-cmpxchg.ll               33 lines
test/CodeGen/X86/licm-nested.ll                  2 lines
test/CodeGen/X86/loop-search.ll                  19 lines
test/CodeGen/X86/tail-dup-merge-loop-headers.ll  1 line

Diff 75885

lib/CodeGen/MachineLICM.cpp

[... 1,056 lines not shown ...]
 }

 /// Return true if it is potentially profitable to hoist the given loop
 /// invariant.
 bool MachineLICM::IsProfitableToHoist(MachineInstr &MI) {
   if (MI.isImplicitDef())
     return true;

+  // Rematerializable instructions should always be hoisted since the register
+  // allocator can just pull them down again when needed.
+  if (TII->isTriviallyReMaterializable(MI, AA))
+    return true;
+
   // Besides removing computation from the loop, hoisting an instruction has
   // these effects:
   //
   // - The value defined by the instruction becomes live across the entire
   //   loop. This increases register pressure in the loop.
   //
   // - If the value is used by a PHI in the loop, a copy will be required for
   //   lowering the PHI after extending the live range.
   //
   // - When hoisting the last use of a value in the loop, that value no longer
   //   needs to be live in the loop. This lowers register pressure in the loop.

   bool CheapInstr = IsCheapInstruction(MI);
   bool CreatesCopy = HasLoopPHIUse(&MI);

   // Don't hoist a cheap instruction if it would create a copy in the loop.
   if (CheapInstr && CreatesCopy) {
     DEBUG(dbgs() << "Won't hoist cheap instr with loop PHI use: " << MI);
     return false;
   }

-  // Rematerializable instructions should always be hoisted since the register
-  // allocator can just pull them down again when needed.
-  if (TII->isTriviallyReMaterializable(MI, AA))
-    return true;
-
   // FIXME: If there are long latency loop-invariant instructions inside the
   // loop at this point, why didn't the optimizer's LICM hoist them?
   for (unsigned i = 0, e = MI.getDesc().getNumOperands(); i != e; ++i) {
     const MachineOperand &MO = MI.getOperand(i);
     if (!MO.isReg() || MO.isImplicit())
       continue;
     unsigned Reg = MO.getReg();
     if (!TargetRegisterInfo::isVirtualRegister(Reg))
[... 290 lines not shown ...]

test/CodeGen/AArch64/cmpxchg-idioms.ll

 ; RUN: llc -mtriple=aarch64-apple-ios7.0 -o - %s | FileCheck %s

 define i32 @test_return(i32* %p, i32 %oldval, i32 %newval) {
 ; CHECK-LABEL: test_return:

 ; CHECK: [[LOOP:LBB[0-9]+_[0-9]+]]:
-; CHECK: ldaxr [[LOADED:w[0-9]+]], [x0]
+; CHECK: ldaxr [[LOADED:w[0-9]+]], [x8]
 ; CHECK: cmp [[LOADED]], w1
 ; CHECK: b.ne [[FAILED:LBB[0-9]+_[0-9]+]]

-; CHECK: stlxr [[STATUS:w[0-9]+]], {{w[0-9]+}}, [x0]
+; CHECK: stlxr [[STATUS:w[0-9]+]], {{w[0-9]+}}, [x8]
 ; CHECK: cbnz [[STATUS]], [[LOOP]]

-; CHECK-NOT: cmp {{w[0-9]+}}, {{w[0-9]+}}
-; CHECK: orr w0, wzr, #0x1
-; CHECK: ret
-
 ; CHECK: [[FAILED]]:
 ; CHECK-NOT: cmp {{w[0-9]+}}, {{w[0-9]+}}
 ; CHECK: mov w0, wzr
 ; CHECK: ret

   %pair = cmpxchg i32* %p, i32 %oldval, i32 %newval seq_cst seq_cst
   %success = extractvalue { i32, i1 } %pair, 1
   %conv = zext i1 %success to i32
   ret i32 %conv
 }

 define i1 @test_return_bool(i8* %value, i8 %oldValue, i8 %newValue) {
 ; CHECK-LABEL: test_return_bool:

 ; CHECK: [[LOOP:LBB[0-9]+_[0-9]+]]:
 ; CHECK: ldaxrb [[LOADED:w[0-9]+]], [x0]
 ; CHECK: cmp [[LOADED]], w1, uxtb
 ; CHECK: b.ne [[FAILED:LBB[0-9]+_[0-9]+]]

 ; CHECK: stlxrb [[STATUS:w[0-9]+]], {{w[0-9]+}}, [x0]
 ; CHECK: cbnz [[STATUS]], [[LOOP]]

 ; CHECK-NOT: cmp {{w[0-9]+}}, {{w[0-9]+}}
-; FIXME: DAG combine should be able to deal with this.
-; CHECK: orr [[TMP:w[0-9]+]], wzr, #0x1
-; CHECK: eor w0, [[TMP]], #0x1
-; CHECK: ret

 ; CHECK: [[FAILED]]:
 ; CHECK-NOT: cmp {{w[0-9]+}}, {{w[0-9]+}}
 ; CHECK: mov [[TMP:w[0-9]+]], wzr
 ; CHECK: eor w0, [[TMP]], #0x1
 ; CHECK: ret

   %pair = cmpxchg i8* %value, i8 %oldValue, i8 %newValue acq_rel monotonic
[... 41 lines not shown ...]

test/CodeGen/ARM/2011-04-11-MachineLICMBug.ll

[... 10 lines not shown ...]
 for.cond:
   %0 = phi i32 [ 0, %entry ], [ %inc, %for.inc ]
   %cmp = icmp ult i32 %0, %size
   br i1 %cmp, label %for.body, label %return

 for.body:
 ; CHECK: %for.
 ; CHECK: mov{{.*}} r{{[0-9]+}}, #{{[01]}}
-; CHECK: mov{{.*}} r{{[0-9]+}}, #{{[01]}}
 ; CHECK-NOT: mov r{{[0-9]+}}, #{{[01]}}
   %arrayidx = getelementptr i32, i32* %A, i32 %0
   %tmp4 = load i32, i32* %arrayidx, align 4
   %cmp6 = icmp eq i32 %tmp4, %value
   br i1 %cmp6, label %return, label %for.inc

 for.inc:
   %inc = add i32 %0, 1
   br label %for.cond

 return:
   %retval.0 = phi i1 [ true, %for.body ], [ false, %for.cond ]
   ret i1 %retval.0
 }

test/CodeGen/ARM/atomic-cmpxchg.ll

[... 29 lines not shown ...]
 ; CHECK-THUMB: movs [[R2:r[0-9]+]], #0
 ; CHECK-THUMB: cmp [[R1]], {{r[0-9]+}}
 ; CHECK-THUMB: beq
 ; CHECK-THUMB: push {[[R2]]}
 ; CHECK-THUMB: pop {r0}

 ; CHECK-ARMV6-LABEL: test_cmpxchg_res_i8:
 ; CHECK-ARMV6-NEXT: .fnstart
+; CHECK-ARMV6-NEXT: mov [[REG:r[0-9]+]], r0
 ; CHECK-ARMV6-NEXT: uxtb [[DESIRED:r[0-9]+]], r1
+; CHECK-ARMV6-NEXT: mov r0, #0
 ; CHECK-ARMV6-NEXT: [[TRY:.LBB[0-9_]+]]:
-; CHECK-ARMV6-NEXT: ldrexb [[LD:r[0-9]+]], [r0]
+; CHECK-ARMV6-NEXT: ldrexb [[LD:r[0-9]+]], {{\[}}[[REG]]{{\]}}
 ; CHECK-ARMV6-NEXT: cmp [[LD]], [[DESIRED]]
-; CHECK-ARMV6-NEXT: movne [[RES:r[0-9]+]], #0
 ; CHECK-ARMV6-NEXT: bxne lr
-; CHECK-ARMV6-NEXT: strexb [[SUCCESS:r[0-9]+]], r2, [r0]
+; CHECK-ARMV6-NEXT: strexb [[SUCCESS:r[0-9]+]], r2, {{\[}}[[REG]]{{\]}}
 ; CHECK-ARMV6-NEXT: cmp [[SUCCESS]], #0
-; CHECK-ARMV6-NEXT: moveq [[RES]], #1
-; CHECK-ARMV6-NEXT: bxeq lr
-; CHECK-ARMV6-NEXT: b [[TRY]]
+; CHECK-ARMV6-NEXT: bne [[TRY]]
+; CHECK-ARMV6-NEXT: mov r0, #1
+; CHECK-ARMV6-NEXT: bx lr

 ; CHECK-THUMBV6-LABEL: test_cmpxchg_res_i8:
 ; CHECK-THUMBV6: mov [[EXPECTED:r[0-9]+]], r1
 ; CHECK-THUMBV6-NEXT: bl __sync_val_compare_and_swap_1
 ; CHECK-THUMBV6-NEXT: mov [[RES:r[0-9]+]], r0
 ; CHECK-THUMBV6-NEXT: movs r0, #1
 ; CHECK-THUMBV6-NEXT: movs [[ZERO:r[0-9]+]], #0
 ; CHECK-THUMBV6-NEXT: cmp [[RES]], [[EXPECTED]]
 ; CHECK-THUMBV6-NEXT: beq [[END:.LBB[0-9_]+]]
 ; CHECK-THUMBV6-NEXT: mov r0, [[ZERO]]
 ; CHECK-THUMBV6-NEXT: [[END]]:
 ; CHECK-THUMBV6-NEXT: pop {{.*}}pc}

 ; CHECK-ARMV7-LABEL: test_cmpxchg_res_i8:
 ; CHECK-ARMV7-NEXT: .fnstart
+; CHECK-ARMV7-NEXT: mov [[REG:r[0-9]+]], r0
 ; CHECK-ARMV7-NEXT: uxtb [[DESIRED:r[0-9]+]], r1
+; CHECK-ARMV7-NEXT: mov r0, #1
 ; CHECK-ARMV7-NEXT: b [[TRY:.LBB[0-9_]+]]
 ; CHECK-ARMV7-NEXT: [[HEAD:.LBB[0-9_]+]]:
-; CHECK-ARMV7-NEXT: strexb [[SUCCESS:r[0-9]+]], r2, [r0]
+; CHECK-ARMV7-NEXT: strexb [[SUCCESS:r[0-9]+]], r2, {{\[}}[[REG]]{{\]}}
 ; CHECK-ARMV7-NEXT: cmp [[SUCCESS]], #0
-; CHECK-ARMV7-NEXT: moveq [[RES:r[0-9]+]], #1
 ; CHECK-ARMV7-NEXT: bxeq lr
 ; CHECK-ARMV7-NEXT: [[TRY]]:
-; CHECK-ARMV7-NEXT: ldrexb [[LD:r[0-9]+]], [r0]
+; CHECK-ARMV7-NEXT: ldrexb [[LD:r[0-9]+]], {{\[}}[[REG]]{{\]}}
 ; CHECK-ARMV7-NEXT: cmp [[LD]], [[DESIRED]]
 ; CHECK-ARMV7-NEXT: beq [[HEAD]]
 ; CHECK-ARMV7-NEXT: clrex
-; CHECK-ARMV7-NEXT: mov [[RES]], #0
+; CHECK-ARMV7-NEXT: mov r0, #0
 ; CHECK-ARMV7-NEXT: bx lr

 ; CHECK-THUMBV7-LABEL: test_cmpxchg_res_i8:
 ; CHECK-THUMBV7-NEXT: .fnstart
-; CHECK-THUMBV7-NEXT: uxtb [[DESIRED:r[0-9]+]], r1
+; CHECK-THUMBV7-NEXT: mov [[REG:r[0-9]+]], r0
+; CHECK-THUMBV7-NEXT: uxtb{{.*}} [[DESIRED:r[0-9]+]], r1
+; CHECK-THUMBV7-NEXT: movs r0, #1
 ; CHECK-THUMBV7-NEXT: b [[TRYLD:.LBB[0-9_]+]]
 ; CHECK-THUMBV7-NEXT: [[TRYST:.LBB[0-9_]+]]:
-; CHECK-THUMBV7-NEXT: strexb [[SUCCESS:r[0-9]+]], r2, [r0]
+; CHECK-THUMBV7-NEXT: strexb [[SUCCESS:r[0-9]+]], r2, {{\[}}[[REG]]{{\]}}
 ; CHECK-THUMBV7-NEXT: cmp [[SUCCESS]], #0
-; CHECK-THUMBV7-NEXT: itt eq
-; CHECK-THUMBV7-NEXT: moveq r0, #1
+; CHECK-THUMBV7-NEXT: it eq
 ; CHECK-THUMBV7-NEXT: bxeq lr
 ; CHECK-THUMBV7-NEXT: [[TRYLD]]:
-; CHECK-THUMBV7-NEXT: ldrexb [[LD:r[0-9]+]], [r0]
+; CHECK-THUMBV7-NEXT: ldrexb [[LD:r[0-9]+]], {{\[}}[[REG]]{{\]}}
 ; CHECK-THUMBV7-NEXT: cmp [[LD]], [[DESIRED]]
 ; CHECK-THUMBV7-NEXT: beq [[TRYST:.LBB[0-9_]+]]
 ; CHECK-THUMBV7-NEXT: clrex
 ; CHECK-THUMBV7-NEXT: movs r0, #0
 ; CHECK-THUMBV7-NEXT: bx lr

test/CodeGen/X86/licm-nested.ll

 ; REQUIRES: asserts
-; RUN: llc -mtriple=x86_64-apple-darwin -march=x86-64 < %s -o /dev/null -stats -info-output-file - | grep "hoisted out of loops" | grep 4
+; RUN: llc -mtriple=x86_64-apple-darwin -march=x86-64 < %s -o /dev/null -stats -info-output-file - | grep "hoisted out of loops" | grep 6

 ; MachineLICM should be able to hoist the symbolic addresses out of
 ; the inner loops.

 @main.flags = internal global [8193 x i8] zeroinitializer, align 16 ; <[8193 x i8]*> [#uses=3]
 @.str = private constant [11 x i8] c"Count: %d\0A\00" ; <[11 x i8]*> [#uses=1]

 define i32 @main(i32 %argc, i8** nocapture %argv) nounwind ssp {
[... 80 lines not shown ...]

test/CodeGen/X86/loop-search.ll

-; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
 ; RUN: llc < %s -mtriple=x86_64-apple-darwin | FileCheck %s

 ; This test comes from PR27136
 ; We should hoist loop constant invariant

 define zeroext i1 @search(i32 %needle, i32* nocapture readonly %haystack, i32 %count) {
 ; CHECK-LABEL: search:
 ; CHECK: ## BB#0: ## %entry
 ; CHECK-NEXT: testl %edx, %edx
 ; CHECK-NEXT: jle LBB0_1
 ; CHECK-NEXT: ## BB#4: ## %for.body.preheader
-; CHECK-NEXT: movslq %edx, %rax
-; CHECK-NEXT: xorl %ecx, %ecx
+; CHECK-NEXT: movslq %edx, %rcx
+; ###### This loop invariant was hoisted from the for body ######
+; CHECK-NEXT: movb $1, %al
+; CHECK-NEXT: xorl %edx, %edx
 ; CHECK-NEXT: .p2align 4, 0x90
 ; CHECK-NEXT: LBB0_5: ## %for.body
 ; CHECK-NEXT: ## =>This Inner Loop Header: Depth=1
-; CHECK-NEXT: cmpl %edi, (%rsi,%rcx,4)
+; CHECK-NEXT: cmpl %edi, (%rsi,%rdx,4)
 ; CHECK-NEXT: je LBB0_6
 ; CHECK-NEXT: ## BB#2: ## %for.cond
 ; CHECK-NEXT: ## in Loop: Header=BB0_5 Depth=1
-; CHECK-NEXT: incq %rcx
-; CHECK-NEXT: cmpq %rax, %rcx
+; CHECK-NEXT: incq %rdx
+; CHECK-NEXT: cmpq %rcx, %rdx
 ; CHECK-NEXT: jl LBB0_5
-; ### FIXME: BB#3 and LBB0_1 should be merged
 ; CHECK-NEXT: ## BB#3:
 ; CHECK-NEXT: xorl %eax, %eax
 ; CHECK-NEXT: ## kill: %AL<def> %AL<kill> %EAX<kill>
 ; CHECK-NEXT: retq
 ; CHECK-NEXT: LBB0_1:
 ; CHECK-NEXT: xorl %eax, %eax
 ; CHECK-NEXT: ## kill: %AL<def> %AL<kill> %EAX<kill>
 ; CHECK-NEXT: retq
-; CHECK-NEXT: LBB0_6:
-; CHECK-NEXT: movb $1, %al
+; CHECK-NEXT: LBB0_6: ## %cleanup
 ; CHECK-NEXT: ## kill: %AL<def> %AL<kill> %EAX<kill>
 ; CHECK-NEXT: retq
-;
 entry:
   %cmp5 = icmp sgt i32 %count, 0
   br i1 %cmp5, label %for.body.preheader, label %cleanup

 for.body.preheader: ; preds = %entry
   %0 = sext i32 %count to i64
   br label %for.body

 for.cond: ; preds = %for.body
   %cmp = icmp slt i64 %indvars.iv.next, %0
   br i1 %cmp, label %for.body, label %cleanup.loopexit

 for.body: ; preds = %for.body.preheader, %for.cond
   %indvars.iv = phi i64 [ 0, %for.body.preheader ], [ %indvars.iv.next, %for.cond ]
   %arrayidx = getelementptr inbounds i32, i32* %haystack, i64 %indvars.iv
   %1 = load i32, i32* %arrayidx, align 4
   %cmp1 = icmp eq i32 %1, %needle
   %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
   br i1 %cmp1, label %cleanup.loopexit, label %for.cond

 cleanup.loopexit: ; preds = %for.cond, %for.body
   %.ph = phi i1 [ false, %for.cond ], [ true, %for.body ]
   br label %cleanup

 cleanup: ; preds = %cleanup.loopexit, %entry
   %2 = phi i1 [ false, %entry ], [ %.ph, %cleanup.loopexit ]
   ret i1 %2
 }

test/CodeGen/X86/tail-dup-merge-loop-headers.ll

[... 69 lines not shown ...]
 ; The rest of the blocks in the function are noise unfortunately. Bugpoint
 ; couldn't shrink the test any further.

 ; CHECK-LABEL: loop_shared_header
 ; CHECK: # %entry
 ; CHECK: # %shared_preheader
 ; CHECK: # %shared_loop_header
 ; CHECK: # %inner_loop_body
-; CHECK: # %outer_loop_latch
 ; CHECK: # %merge_predecessor_split
 ; CHECK: # %outer_loop_latch
 ; CHECK: # %cleanup
 define i32 @loop_shared_header(i8* %exe, i32 %exesz, i32 %headsize, i32 %min, i32 %wwprva, i32 %e_lfanew, i8* readonly %wwp, i32 %wwpsz, i16 zeroext %sects) local_unnamed_addr #0 {
 entry:
   %0 = load i32, i32* undef, align 4
   %mul = shl nsw i32 %0, 2
   br i1 undef, label %if.end19, label %cleanup
[... 104 lines not shown ...]