This is an archive of the discontinued LLVM Phabricator instance.

Allow constraining virtual register's class within reason
Needs ReviewPublic

Authored by alexey.zhikhar on Jul 30 2018, 10:06 AM.

Details

Summary

At the very end of instruction selection, in InstrEmitter, handle
overlapping register classes to eliminate redundant copy instructions.

Also, update various tests.

Given the following code:

a = def
c = CopyToReg a

The current implementation of InstrEmitter checks whether `a` and `c`
belong to the same register class and, if so, coalesces the CopyToReg away:

a = def
c = CopyToReg a
    =>
c = def

In pseudocode, the algorithm can be expressed as

if RegClass(c) == RegClass(a):
    make it "c = def"

However, in a case where the register classes are not exactly equal
but overlap, the CopyToReg is not eliminated. This patch checks
whether the two register classes overlap and whether the number of
registers in the overlap is at least MinRCSize. In pseudocode:

if |RegClass(c) ∩ RegClass(a)| ≥ MinRCSize:
    make it "c = def"
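
For reference, a minimal sketch of what the relaxed check could look like in the
user scan of InstrEmitter::CreateVirtualRegisters
(lib/CodeGen/SelectionDAG/InstrEmitter.cpp) is shown below. This is not the
actual diff; in particular, expressing the overlap test through
MachineRegisterInfo::constrainRegClass and the file-local MinRCSize constant is
an assumption about how the intersection is computed.

// Sketch only: when a node result feeds a CopyToReg whose destination is a
// virtual register, try to emit the result directly into that register.
// (The result-number check and the IsClone/IsCloned guards of the real code
// are omitted for brevity.)
for (SDNode *User : Node->uses()) {
  if (User->getOpcode() == ISD::CopyToReg &&
      User->getOperand(2).getNode() == Node) {
    unsigned Reg = cast<RegisterSDNode>(User->getOperand(1))->getReg();
    if (TargetRegisterInfo::isVirtualRegister(Reg)) {
      // Previously: reuse Reg only on an exact class match:
      //   if (MRI->getRegClass(Reg) == RC) { ... }
      // Relaxed: constrainRegClass succeeds only if the intersection of the
      // two classes still contains at least MinRCSize allocatable registers.
      if (MRI->constrainRegClass(Reg, RC, MinRCSize)) {
        VRBase = Reg;                          // the CopyToReg becomes redundant
        MIB.addReg(VRBase, RegState::Define);
        break;
      }
    }
  }
}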

Corresponding discussion on llvm-dev:

http://lists.llvm.org/pipermail/llvm-dev/2018-May/123663.html
https://groups.google.com/forum/#!topic/llvm-dev/BHFhRkYY2ng

Patch by Ulrich Weigand.

Diff Detail

Event Timeline

Probably needs tests. (Not that I know how to write one here)

alexey.zhikhar edited the summary of this revision. Jul 30 2018, 10:13 AM
alexey.zhikhar edited the summary of this revision.

Probably needs tests. (Not that I know how to write one here)

Unfortunately, our backend is out-of-tree, so we can't upstream test cases for this patch. Jonas, you previously mentioned that this patch helps Z, do you happen to have a test case? @jonpa

jonpa added a comment. Jul 31 2018, 1:48 AM

Probably needs tests. (Not that I know how to write one here)

Unfortunately, our backend is out-of-tree, so we can't upstream test cases for this patch. Jonas, you previously mentioned that this patch helps Z, do you happen to have a test case? @jonpa

IIRC, we tried this patch in an experimental setting (when evaluating a certain aspect of the SystemZ backend), so the test case we had then will not work on trunk :-/

I could derive a new test case quite easily, but I am personally not that sure exactly what to test for. I mean, after the coalescer, register allocator, etc., with all those complex interactions of multiple optimizations, what type of test should we have that expresses the thing we are trying to fix? Is there really a clear case that should be improved? Or is it more that some cases improve while others end up getting worse?

I made a test evaluation of this patch a while ago when I posted it, and it seemed then that the total number of COPYs marginally improved (decreased), while there was a very slight increase in spilling.

I could make a new test case that simply improves by e.g. having one COPY less, but I would like to know that this is really handling something specific and not just something that randomly ends up getting better...

Is this making sense, and what are your thoughts? What happens if you run your out-of-tree test cases with the SystemZ (-mcpu=z13) backend? Do you see any improvement? In that case, it seems like we have a test of real worth...

Is this making sense, and what are your thoughts? What happens if you run your out-of-tree test cases with the SystemZ (-mcpu=z13) backend? Do you see any improvement? In that case, it seems like we have a test of real worth...

The test cases that we have heavily rely on our backend's intrinsics, so they cannot be compiled for SystemZ.

Providing a reliable test case for this patch does not seem like an easy (or even worthwhile) task, but I'm open to suggestions.

This patch causes 14 test failures; however, it seems that the problem is not in the patch but in brittle test cases. I took a quick look at CodeGen/PowerPC/vsx.ll: the test case fails because it no longer finds copy instructions that were redundant and are now removed by this patch, as expected.

Failing Tests (14): 
    LLVM :: CodeGen/AArch64/and-sink.ll
    LLVM :: CodeGen/AArch64/arm64-atomic.ll
    LLVM :: CodeGen/AArch64/arm64-fast-isel-conversion-fallback.ll
    LLVM :: CodeGen/AArch64/combine-comparisons-by-cse.ll
    LLVM :: CodeGen/AArch64/optimize-cond-branch.ll
    LLVM :: CodeGen/AArch64/redundant-copy-elim-empty-mbb.ll
    LLVM :: CodeGen/AMDGPU/early-if-convert.ll
    LLVM :: CodeGen/ARM/2011-04-11-MachineLICMBug.ll
    LLVM :: CodeGen/ARM/2011-08-25-ldmia_ret.ll
    LLVM :: CodeGen/ARM/atomic-64bit.ll
    LLVM :: CodeGen/ARM/atomic-cmp.ll
    LLVM :: CodeGen/ARM/atomic-ops-v8.ll
    LLVM :: CodeGen/PowerPC/vsx.ll
    LLVM :: CodeGen/SystemZ/cond-move-03.ll
jonpa added a comment. Jul 31 2018, 6:42 AM

This is the test case we were working with. Compile with

llc -mcpu=z13 ./tc_2spill.ll

Looking at it now, it seems that with the patch there are fewer COPYs directly after instruction selection, but in the output there is the same number of register moves, with the undesirable difference that with the patch one of them is no longer hoisted out of the loop. So in this case the net result seems to be one less hoisted register move... :-/

This patch causes 14 test failures; however, it seems that the problem is not in the patch but in brittle test cases. I took a quick look at CodeGen/PowerPC/vsx.ll: the test case fails because it no longer finds copy instructions that were redundant and are now removed by this patch, as expected.

I'm currently working on fixing the failures.

@jonpa Jonas, I see different assembly in one of the SystemZ unit tests; my memory of Z is pretty fuzzy, so could you please take a look at the change and see whether it is reasonable?

Test name is CodeGen/SystemZ/cond-move-03.ll, test case f2().

Before the change:

        #APP
        dummy %r0 
        #NO_APP
        #APP
        stepa %r1 
        #NO_APP
        #APP
        stepb %r0 
        #NO_APP
        clijhe  %r2, 42, .LBB1_2
# %bb.1:
        risblg  %r0, %r1, 0, 159, 32
.LBB1_2:
        risbhg  %r1, %r0, 0, 159, 32
        #APP
        stepc %r1 
        #NO_APP
        #APP
        dummy %r0 
        #NO_APP
        br      %r14

After the change:

        #APP
        dummy %r0
        #NO_APP
        #APP
        stepa %r1
        #NO_APP
        #APP
        stepb %r0
        #NO_APP
        clfi    %r2, 42
        risbhg  %r2, %r0, 0, 159, 32
        locfhrl %r2, %r1
        #APP
        stepc %r2
        #NO_APP
        #APP
        dummy %r0
        #NO_APP
        br      %r14

The new code for f2 in cond-move-03.ll is in fact better, since it now actually uses the conditional move instruction instead of a branch ...

The new code for f2 in cond-move-03.ll is in fact better, since it now actually uses the conditional move instruction instead of a branch ...

Thanks, Ulrich. Does it fix "FIXME: We should commute the LOCRMux to save one move." or is it unrelated?

The new code for f2 in cond-move-03.ll is in fact better, since it now actually uses the conditional move instruction instead of a branch ...

Thanks, Ulrich. Does it fix "FIXME: We should commute the LOCRMux to save one move." or is it unrelated?

It doesn't so much fix it as make it no longer applicable. Given that we use the hi->hi conditional move, we must have a lo->hi move (the risbhg) anyway in this configuration.

So in short, yes, you can remove the FIXME now :-)

All the test failures are mismatches between the expected and the produced assembly. The one exception is CodeGen/AMDGPU/early-if-convert.ll, where the backend fails with an assertion:

llc: lib/Target/AMDGPU/SIInstrInfo.cpp:1794: virtual bool llvm::SIInstrInfo::canInsertSelect(const llvm::MachineBasicBlock&, llvm::ArrayRef<llvm::MachineOperand>, unsigned int, unsigned int, int&, int&, int&) const: Assertion `MRI.getRegClass(FalseReg) == RC' failed.

I will prioritize this higher.

alexey.zhikhar edited the summary of this revision.

The differential is updated with:

  • Fixed unit tests for SystemZ and PowerPC.
  • Bug fix for AMDGPU: the AMDGPU implementation of TargetInstrInfo::canInsertSelect() had an assertion that was too restrictive; missing switch cases were also fixed (see the sketch after this list).
  • Fixed unit tests for atomic operations on ARM: note that the number of mov instructions did not increase after applying the patch.
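
For illustration only, here is a hedged sketch of the kind of relaxation meant by the AMDGPU item above; the actual change is not reproduced here, and both the bail-out and the use of getCommonSubClass are assumptions. The assertion quoted earlier insists that both select inputs have exactly the same register class, which is too strict once overlapping classes can survive instruction selection:

// Hypothetical sketch inside SIInstrInfo::canInsertSelect()
// (lib/Target/AMDGPU/SIInstrInfo.cpp).
const MachineRegisterInfo &MRI = MBB.getParent()->getRegInfo();
const TargetRegisterClass *RC = MRI.getRegClass(TrueReg);

// Previously: assert(MRI.getRegClass(FalseReg) == RC);
// Sketch: refuse to form the conditional select instead of asserting when
// the two classes cannot be reconciled to a common subclass.
if (!RI.getCommonSubClass(RC, MRI.getRegClass(FalseReg)))
  return false;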

@t.p.northover @asl @rengolin

For some of the failing ARM/AArch64 tests, I see additional mov instructions being generated; for example, in CodeGen/AArch64/and-sink.ll: if you take a look at the assembly after applying the patch, you will see an additional mov on the path where %c (w1) equals zero. I'm not sure how important this is, so I would appreciate some feedback from ARM/AArch64 backend people. Please note that for performance-critical atomic compare-and-swap operations, performance is unchanged.

; RUN: llc -mtriple=aarch64-linux-gnu -verify-machineinstrs < %s | FileCheck %s

@A = global i32 zeroinitializer

; Test that and is sunk into cmp block to form tbz.
define i32 @and_sink1(i32 %a, i1 %c) {
  %and = and i32 %a, 4
  br i1 %c, label %bb0, label %bb2
bb0:
  %cmp = icmp eq i32 %and, 0
  store i32 0, i32* @A
  br i1 %cmp, label %bb1, label %bb2
bb1:
  ret i32 1
bb2:
  ret i32 0
}

Original assembly:

and_sink1:                              // @and_sink1
        .cfi_startproc
// %bb.0:
        tbz     w1, #0, .LBB0_3
// %bb.1:                               // %bb0
        adrp    x8, A
        str     wzr, [x8, :lo12:A]
        tbnz    w0, #2, .LBB0_3
// %bb.2:
        orr     w0, wzr, #0x1
        ret
.LBB0_3:                                // %bb2
        mov     w0, wzr
        ret
.Lfunc_end0:
        .size   and_sink1, .Lfunc_end0-and_sink1
        .cfi_endproc

Assembly after applying the patch:

and_sink1:                              // @and_sink1
        .cfi_startproc
// %bb.0:
        tbz     w1, #0, .LBB0_2
// %bb.1:                               // %bb0
        adrp    x8, A
        str     wzr, [x8, :lo12:A]
        orr     w8, wzr, #0x1
        tbz     w0, #2, .LBB0_3
.LBB0_2:                                // %bb2
        mov     w8, wzr
.LBB0_3:                                // %bb1
        mov     w0, w8
        ret
.Lfunc_end0:
        .size   and_sink1, .Lfunc_end0-and_sink1
        .cfi_endproc

Here's a list of ARM/AArch64 tests that are suspicious due to additional mov operations:

LLVM :: CodeGen/AArch64/redundant-copy-elim-empty-mbb.ll
LLVM :: CodeGen/ARM/2011-04-11-MachineLICMBug.ll
LLVM :: CodeGen/AArch64/and-sink.ll
LLVM :: CodeGen/AArch64/arm64-fast-isel-conversion-fallback.ll
LLVM :: CodeGen/AArch64/optimize-cond-branch.ll

Also, CodeGen/ARM/2011-08-25-ldmia_ret.ll spills one additional register.

I'd like an explanation for why the generated code is changing for AArch64... generating extra copies clearly seems like a downside. And there isn't any obvious reason for this change to impact register allocation: on AArch64, all i32 register classes contain exactly the same set of allocatable registers.

I'd like an explanation for why the generated code is changing for AArch64... generating extra copies clearly seems like a downside. And there isn't any obvious reason for this change to impact register allocation: on AArch64, all i32 register classes contain exactly the same set of allocatable registers.

Agreed. ARM people might have some thoughts on what the root cause might be, and I'd love to investigate the leads.

nhaehnle removed a subscriber: nhaehnle. Aug 8 2018, 5:09 AM