This is an archive of the discontinued LLVM Phabricator instance.

Differential D110069

AArch64: use `CAS` instead of `LDX`/`STX` for more ops if available
ClosedPublic

Authored by t.p.northover on Sep 20 2021, 6:27 AM.

Details

Reviewers

sebpop
dmgreen
ilinpv
tmatheson
efriedma

Summary

This covers 128-bit loads, and atomicrmw operations without a single native instruction. Using CAS saves a bit of code size and has a better chance of succeeding with high contention on some systems.
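For illustration, an `atomicrmw` operation with no single native instruction (such as `nand`) can be written as a compare-and-swap loop. The sketch below uses standard C++ `std::atomic` on a 64-bit value; the names `fetch_nand` and `fetch_nand_example` are hypothetical, and the code only models the shape of such an expansion, not the instructions LLVM emits.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical helper: atomically replaces p with ~(p & bits) and returns the
// previous value, using a compare-and-swap loop.
inline std::uint64_t fetch_nand(std::atomic<std::uint64_t> &p,
                                std::uint64_t bits) {
  std::uint64_t old = p.load(std::memory_order_relaxed);
  while (!p.compare_exchange_weak(old, ~(old & bits),
                                  std::memory_order_seq_cst,
                                  std::memory_order_relaxed)) {
    // On failure `old` now holds the value seen in memory; retry with it.
  }
  return old;
}

// Small usage example for the helper above.
inline void fetch_nand_example() {
  std::atomic<std::uint64_t> v{0xF};
  std::uint64_t previous = fetch_nand(v, 0x7);
  assert(previous == 0xF);
  assert(v.load() == ~std::uint64_t{0x7});
}
```

With a hardware CAS instruction each iteration is a single attempt that either succeeds or reports the current value, whereas an exclusive load/store pair can lose its reservation between the two instructions.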

Diff Detail

Unit Tests: Failed

Time	Test
90 ms	x64 debian > LLVM.Transforms/LoopFusion::guarded.ll
60,200 ms	x64 debian > SanitizerCommon-Unit._/Sanitizer-x86_64-Test/36::48
60,060 ms	x64 debian > SanitizerCommon-Unit._/Sanitizer-x86_64-Test/42::48

Event Timeline

t.p.northover created this revision. Sep 20 2021, 6:27 AM

Herald added subscribers: hiraditya, kristof.beyls, mcrosier. Sep 20 2021, 6:27 AM

t.p.northover requested review of this revision. Sep 20 2021, 6:27 AM

Herald added a project: Restricted Project. Sep 20 2021, 6:27 AM

Harbormaster completed remote builds in B124652: Diff 373571. Sep 20 2021, 7:12 AM

Ping.

The patch looks good to me.
Applying this patch solves this issue:
https://github.com/llvm/llvm-project/issues/55199
which is a reduced test from Rust generated code.

Could one of the ARM64 maintainers also approve this change?

This revision is now accepted and ready to land. Dec 12 2022, 11:51 AM

Herald added a project: Restricted Project. Dec 12 2022, 11:51 AM

sebpop added reviewers: dmgreen, ilinpv, tmatheson. Dec 12 2022, 12:05 PM

Didn't realize this was up for review; happened to spot it on the list.

Some of these sequences seem extremely long. It should be a little better on main, since we improved ccmp formation, but can we rearrange operations somehow so we need fewer mov operations in the fast path?

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
21876	A comment here might be useful.
21915–21918	80 cols. Comment needs to be updated.
llvm/test/CodeGen/AArch64/arm64-atomic-128.ll
774	This bug got fixed, right?
llvm/test/CodeGen/AArch64/atomicrmw-xchg-fp.ll
107	These moves seem very strange.

efriedma added inline comments. Dec 12 2022, 6:24 PM

llvm/test/CodeGen/AArch64/arm64-atomic-128.ll
774	Never mind, this isn't a bug; I got it confused with a different issue.

t.p.northover marked 2 inline comments as done. Dec 13 2022, 7:11 AM

t.p.northover added inline comments.

llvm/test/CodeGen/AArch64/atomicrmw-xchg-fp.ll
107	The first two are part of forming an `xseqregclass` thing from component registers; the second two are there because `CASP` clobbers its input but we want to compare against it afterwards. Still not ideal, but not completely out there.
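The need for the second pair of moves can be modelled in C++. In this sketch the hypothetical `cas_old` stands in for an instruction that overwrites its comparison operand with the value found in memory, so the caller has to keep a copy of the expected value to tell success from failure afterwards. All names are assumptions made for illustration, and a 64-bit value is used in place of the 128-bit register pair.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical model of a CAS instruction: `cmp` holds the expected value on
// entry and holds the value observed in memory on exit.
inline void cas_old(std::atomic<std::uint64_t> &mem, std::uint64_t &cmp,
                    std::uint64_t desired) {
  // On failure compare_exchange_strong stores the observed value in `cmp`;
  // on success `cmp` already equals the old memory contents.
  mem.compare_exchange_strong(cmp, desired);
}

// Returns true if the swap happened. The copy in `saved` plays the role of
// the extra mov: without it the expected value is gone after the CAS.
inline bool swap_if_equal(std::atomic<std::uint64_t> &mem,
                          std::uint64_t expected, std::uint64_t desired) {
  std::uint64_t saved = expected;  // copy made before the CAS clobbers it
  cas_old(mem, expected, desired);
  return expected == saved;        // equal only when the CAS succeeded
}

// Small usage example for the helpers above.
inline void swap_if_equal_example() {
  std::atomic<std::uint64_t> m{5};
  assert(swap_if_equal(m, 5, 9));   // matches, so 9 is stored
  assert(!swap_if_equal(m, 5, 1));  // memory is now 9, so nothing is stored
  assert(m.load() == 9);
}
```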

Updating diff to ToT.

Harbormaster completed remote builds in B202834: Diff 482464. Dec 13 2022, 7:59 AM

Please fix the commit message to remove the incorrect claim that this reduces code size; the only case that actually decreases in size is the atomic load.

Please file a followup bug for the extra mov instructions.

Otherwise LGTM, I guess; improving the performance under contention probably outweighs any inefficiency caused by the extra instructions.

(We might also want to mess with the code sequences for outlining, but that doesn't need to be this patch.)

Thanks Eli. I filed https://github.com/llvm/llvm-project/issues/59516 for the moves and pushed it with the changes you asked for (10d34f5538e0).

Revision Contents

Path	Size
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp	11 lines
llvm/test/CodeGen/AArch64/ (including GlobalISel/)	13 lines, 28 lines, 777 lines, 3 lines, 16 lines
Diff 482464

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp

[... 21,866 lines not shown ...]
AArch64TargetLowering::shouldExpandAtomicLoadInIR(LoadInst *LI) const {
// At -O0, fast-regalloc cannot cope with the live vregs necessary to		// At -O0, fast-regalloc cannot cope with the live vregs necessary to
// implement atomicrmw without spilling. If the target address is also on the		// implement atomicrmw without spilling. If the target address is also on the
// stack and close enough to the spill slot, this can lead to a situation		// stack and close enough to the spill slot, this can lead to a situation
// where the monitor always gets cleared and the atomic operation can never		// where the monitor always gets cleared and the atomic operation can never
// succeed. So at -O0 lower this operation to a CAS loop.		// succeed. So at -O0 lower this operation to a CAS loop.
if (getTargetMachine().getOptLevel() == CodeGenOpt::None)		if (getTargetMachine().getOptLevel() == CodeGenOpt::None)
return AtomicExpansionKind::CmpXChg;		return AtomicExpansionKind::CmpXChg;

return AtomicExpansionKind::LLSC;		// Using CAS for an atomic load has a better chance of succeeding under high
		// contention situations, and can save code size. So use it if available.
		[Inline comment, efriedma (Done): A comment here might be useful.]
		return Subtarget->hasLSE() ? AtomicExpansionKind::CmpXChg
		: AtomicExpansionKind::LLSC;
}		}

// For the real atomic operations, we have ldxr/stxr up to 128 bits,		// For the real atomic operations, we have ldxr/stxr up to 128 bits,
TargetLowering::AtomicExpansionKind		TargetLowering::AtomicExpansionKind
AArch64TargetLowering::shouldExpandAtomicRMWInIR(AtomicRMWInst *AI) const {		AArch64TargetLowering::shouldExpandAtomicRMWInIR(AtomicRMWInst *AI) const {
if (AI->isFloatingPointOperation())		if (AI->isFloatingPointOperation())
return AtomicExpansionKind::CmpXChg;		return AtomicExpansionKind::CmpXChg;

[... 20 lines not shown ...]
if (Subtarget->outlineAtomics()) {
}		}
}		}
}		}

// At -O0, fast-regalloc cannot cope with the live vregs necessary to		// At -O0, fast-regalloc cannot cope with the live vregs necessary to
// implement atomicrmw without spilling. If the target address is also on the		// implement atomicrmw without spilling. If the target address is also on the
// stack and close enough to the spill slot, this can lead to a situation		// stack and close enough to the spill slot, this can lead to a situation
// where the monitor always gets cleared and the atomic operation can never		// where the monitor always gets cleared and the atomic operation can never
// succeed. So at -O0 lower this operation to a CAS loop.		// succeed. So at -O0 lower this operation to a CAS loop. Also worthwhile if
if (getTargetMachine().getOptLevel() == CodeGenOpt::None)		// we have a single CAS instruction that can replace the loop.
		if (getTargetMachine().getOptLevel() == CodeGenOpt::None ||
		Subtarget->hasLSE())
		[Inline comment, efriedma (Done): 80 cols. Comment needs to be updated.]
return AtomicExpansionKind::CmpXChg;		return AtomicExpansionKind::CmpXChg;

return AtomicExpansionKind::LLSC;		return AtomicExpansionKind::LLSC;
}		}

TargetLowering::AtomicExpansionKind		TargetLowering::AtomicExpansionKind
AArch64TargetLowering::shouldExpandAtomicCmpXchgInIR(		AArch64TargetLowering::shouldExpandAtomicCmpXchgInIR(
AtomicCmpXchgInst *AI) const {		AtomicCmpXchgInst *AI) const {
[... 1,650 lines not shown ...]
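The decision made by the two changed functions can be summarised in a standalone model. This is only a sketch: `ExpansionKind`, `TargetInfo`, `expandAtomicRMW` and `expandAtomicLoad` are hypothetical stand-ins rather than LLVM's API, and the model covers only the lines visible in the hunk above, not the earlier checks (floating-point operations, outlined atomics and the lines that are not shown).

```cpp
#include <cassert>

// Hypothetical stand-ins for the LLVM types; not the real API.
enum class ExpansionKind { LLSC, CmpXChg };

struct TargetInfo {
  bool optNone;  // true when compiling at -O0
  bool hasLSE;   // true when the LSE extension (CAS instructions) is available
};

// After the patch: atomicrmw operations reaching this point use a CAS loop
// at -O0 or when LSE is available, and an LL/SC loop otherwise.
inline ExpansionKind expandAtomicRMW(const TargetInfo &t) {
  if (t.optNone || t.hasLSE)
    return ExpansionKind::CmpXChg;
  return ExpansionKind::LLSC;
}

// After the patch: atomic loads reaching this point follow the same rule,
// written as two separate checks as in the hunk.
inline ExpansionKind expandAtomicLoad(const TargetInfo &t) {
  if (t.optNone)
    return ExpansionKind::CmpXChg;
  return t.hasLSE ? ExpansionKind::CmpXChg : ExpansionKind::LLSC;
}

// Small usage example for the model above.
inline void expansion_example() {
  TargetInfo lseOptimized{false, true};
  TargetInfo plainOptimized{false, false};
  assert(expandAtomicRMW(lseOptimized) == ExpansionKind::CmpXChg);
  assert(expandAtomicRMW(plainOptimized) == ExpansionKind::LLSC);
  assert(expandAtomicLoad(lseOptimized) == ExpansionKind::CmpXChg);
}
```

Before the patch, both functions returned the LL/SC kind whenever the optimisation level was above -O0, regardless of LSE.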

llvm/test/CodeGen/AArch64/GlobalISel/arm64-atomic-128.ll

	[... 354 lines not shown ...]
	; CHECK-LLSC-O1-NEXT: // %bb.2: // %atomicrmw.end			; CHECK-LLSC-O1-NEXT: // %bb.2: // %atomicrmw.end
	; CHECK-LLSC-O1-NEXT: mov v0.d[0], x9			; CHECK-LLSC-O1-NEXT: mov v0.d[0], x9
	; CHECK-LLSC-O1-NEXT: mov v0.d[1], x8			; CHECK-LLSC-O1-NEXT: mov v0.d[1], x8
	; CHECK-LLSC-O1-NEXT: str q0, [x3]			; CHECK-LLSC-O1-NEXT: str q0, [x3]
	; CHECK-LLSC-O1-NEXT: ret			; CHECK-LLSC-O1-NEXT: ret
	;			;
	; CHECK-CAS-O1-LABEL: atomic_load_relaxed:			; CHECK-CAS-O1-LABEL: atomic_load_relaxed:
	; CHECK-CAS-O1: // %bb.0:			; CHECK-CAS-O1: // %bb.0:
	; CHECK-CAS-O1-NEXT: .LBB4_1: // %atomicrmw.start			; CHECK-CAS-O1-NEXT: mov x0, xzr
	; CHECK-CAS-O1-NEXT: // =>This Inner Loop Header: Depth=1			; CHECK-CAS-O1-NEXT: mov x1, xzr
	; CHECK-CAS-O1-NEXT: ldxp x9, x8, [x2]			; CHECK-CAS-O1-NEXT: casp x0, x1, x0, x1, [x2]
	; CHECK-CAS-O1-NEXT: stxp w10, x9, x8, [x2]			; CHECK-CAS-O1-NEXT: mov v0.d[0], x0
	; CHECK-CAS-O1-NEXT: cbnz w10, .LBB4_1			; CHECK-CAS-O1-NEXT: mov v0.d[1], x1
	; CHECK-CAS-O1-NEXT: // %bb.2: // %atomicrmw.end
	; CHECK-CAS-O1-NEXT: mov v0.d[0], x9
	; CHECK-CAS-O1-NEXT: mov v0.d[1], x8
	; CHECK-CAS-O1-NEXT: str q0, [x3]			; CHECK-CAS-O1-NEXT: str q0, [x3]
	; CHECK-CAS-O1-NEXT: ret			; CHECK-CAS-O1-NEXT: ret
	;			;
	; CHECK-LLSC-O0-LABEL: atomic_load_relaxed:			; CHECK-LLSC-O0-LABEL: atomic_load_relaxed:
	; CHECK-LLSC-O0: // %bb.0:			; CHECK-LLSC-O0: // %bb.0:
	; CHECK-LLSC-O0-NEXT: mov x11, xzr			; CHECK-LLSC-O0-NEXT: mov x11, xzr
	; CHECK-LLSC-O0-NEXT: .LBB4_1: // =>This Inner Loop Header: Depth=1			; CHECK-LLSC-O0-NEXT: .LBB4_1: // =>This Inner Loop Header: Depth=1
	; CHECK-LLSC-O0-NEXT: ldxp x9, x8, [x2]			; CHECK-LLSC-O0-NEXT: ldxp x9, x8, [x2]
	[... 110 lines not shown ...]
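The `casp x0, x1, x0, x1, [x2]` sequence in the CHECK-CAS-O1 lines above performs the load by attempting to swap zero for zero. A minimal C++ sketch of the same idea follows; it assumes a 64-bit value instead of the 128-bit pair so that it stays portable, and `load_via_cas` is a hypothetical name.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical helper: reads an atomic value using only compare-and-swap.
inline std::uint64_t load_via_cas(std::atomic<std::uint64_t> &mem) {
  std::uint64_t expected = 0;
  // If memory holds 0, the CAS stores 0 again (no visible change) and
  // `expected` stays 0. Otherwise the CAS fails and `expected` receives the
  // current contents. Either way `expected` ends up as the loaded value.
  mem.compare_exchange_strong(expected, 0, std::memory_order_relaxed,
                              std::memory_order_relaxed);
  return expected;
}

// Small usage example for the helper above.
inline void load_via_cas_example() {
  std::atomic<std::uint64_t> zero{0};
  std::atomic<std::uint64_t> value{42};
  assert(load_via_cas(zero) == 0);
  assert(load_via_cas(value) == 42);
  assert(value.load() == 42);  // the failed CAS left memory untouched
}
```

One consequence of this shape is that the location must be writable, since the CAS may perform a store even though the value does not change.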

llvm/test/CodeGen/AArch64/GlobalISel/arm64-atomic.ll

	[... 356 lines not shown ...]
	; CHECK-NOLSE-O0-NEXT: b LBB6_5			; CHECK-NOLSE-O0-NEXT: b LBB6_5
	; CHECK-NOLSE-O0-NEXT: LBB6_5: ; %atomicrmw.end			; CHECK-NOLSE-O0-NEXT: LBB6_5: ; %atomicrmw.end
	; CHECK-NOLSE-O0-NEXT: ldr w0, [sp, #12] ; 4-byte Folded Reload			; CHECK-NOLSE-O0-NEXT: ldr w0, [sp, #12] ; 4-byte Folded Reload
	; CHECK-NOLSE-O0-NEXT: add sp, sp, #32			; CHECK-NOLSE-O0-NEXT: add sp, sp, #32
	; CHECK-NOLSE-O0-NEXT: ret			; CHECK-NOLSE-O0-NEXT: ret
	;			;
	; CHECK-LSE-O1-LABEL: fetch_and_nand:			; CHECK-LSE-O1-LABEL: fetch_and_nand:
	; CHECK-LSE-O1: ; %bb.0:			; CHECK-LSE-O1: ; %bb.0:
				; CHECK-LSE-O1-NEXT: mov x8, x0
				; CHECK-LSE-O1-NEXT: ldr w0, [x0]
	; CHECK-LSE-O1-NEXT: LBB6_1: ; %atomicrmw.start			; CHECK-LSE-O1-NEXT: LBB6_1: ; %atomicrmw.start
	; CHECK-LSE-O1-NEXT: ; =>This Inner Loop Header: Depth=1			; CHECK-LSE-O1-NEXT: ; =>This Inner Loop Header: Depth=1
	; CHECK-LSE-O1-NEXT: ldxr w8, [x0]			; CHECK-LSE-O1-NEXT: mov x9, x0
	; CHECK-LSE-O1-NEXT: and w9, w8, #0x7			; CHECK-LSE-O1-NEXT: and w10, w0, #0x7
	; CHECK-LSE-O1-NEXT: mvn w9, w9			; CHECK-LSE-O1-NEXT: mvn w10, w10
	; CHECK-LSE-O1-NEXT: stlxr w10, w9, [x0]			; CHECK-LSE-O1-NEXT: casl w0, w10, [x8]
	; CHECK-LSE-O1-NEXT: cbnz w10, LBB6_1			; CHECK-LSE-O1-NEXT: cmp w0, w9
				; CHECK-LSE-O1-NEXT: b.ne LBB6_1
	; CHECK-LSE-O1-NEXT: ; %bb.2: ; %atomicrmw.end			; CHECK-LSE-O1-NEXT: ; %bb.2: ; %atomicrmw.end
	; CHECK-LSE-O1-NEXT: mov x0, x8
	; CHECK-LSE-O1-NEXT: ret			; CHECK-LSE-O1-NEXT: ret
	;			;
	; CHECK-LSE-O0-LABEL: fetch_and_nand:			; CHECK-LSE-O0-LABEL: fetch_and_nand:
	; CHECK-LSE-O0: ; %bb.0:			; CHECK-LSE-O0: ; %bb.0:
	; CHECK-LSE-O0-NEXT: sub sp, sp, #32			; CHECK-LSE-O0-NEXT: sub sp, sp, #32
	; CHECK-LSE-O0-NEXT: str x0, [sp, #16] ; 8-byte Folded Spill			; CHECK-LSE-O0-NEXT: str x0, [sp, #16] ; 8-byte Folded Spill
	; CHECK-LSE-O0-NEXT: ldr w8, [x0]			; CHECK-LSE-O0-NEXT: ldr w8, [x0]
	; CHECK-LSE-O0-NEXT: str w8, [sp, #28] ; 4-byte Folded Spill			; CHECK-LSE-O0-NEXT: str w8, [sp, #28] ; 4-byte Folded Spill
	[... 68 lines not shown ...]
	; CHECK-NOLSE-O0-NEXT: b LBB7_5			; CHECK-NOLSE-O0-NEXT: b LBB7_5
	; CHECK-NOLSE-O0-NEXT: LBB7_5: ; %atomicrmw.end			; CHECK-NOLSE-O0-NEXT: LBB7_5: ; %atomicrmw.end
	; CHECK-NOLSE-O0-NEXT: ldr x0, [sp, #8] ; 8-byte Folded Reload			; CHECK-NOLSE-O0-NEXT: ldr x0, [sp, #8] ; 8-byte Folded Reload
	; CHECK-NOLSE-O0-NEXT: add sp, sp, #32			; CHECK-NOLSE-O0-NEXT: add sp, sp, #32
	; CHECK-NOLSE-O0-NEXT: ret			; CHECK-NOLSE-O0-NEXT: ret
	;			;
	; CHECK-LSE-O1-LABEL: fetch_and_nand_64:			; CHECK-LSE-O1-LABEL: fetch_and_nand_64:
	; CHECK-LSE-O1: ; %bb.0:			; CHECK-LSE-O1: ; %bb.0:
				; CHECK-LSE-O1-NEXT: mov x8, x0
				; CHECK-LSE-O1-NEXT: ldr x0, [x0]
	; CHECK-LSE-O1-NEXT: LBB7_1: ; %atomicrmw.start			; CHECK-LSE-O1-NEXT: LBB7_1: ; %atomicrmw.start
	; CHECK-LSE-O1-NEXT: ; =>This Inner Loop Header: Depth=1			; CHECK-LSE-O1-NEXT: ; =>This Inner Loop Header: Depth=1
	; CHECK-LSE-O1-NEXT: ldaxr x8, [x0]			; CHECK-LSE-O1-NEXT: mov x9, x0
	; CHECK-LSE-O1-NEXT: and x9, x8, #0x7			; CHECK-LSE-O1-NEXT: and x10, x0, #0x7
	; CHECK-LSE-O1-NEXT: mvn x9, x9			; CHECK-LSE-O1-NEXT: mvn x10, x10
	; CHECK-LSE-O1-NEXT: stlxr w10, x9, [x0]			; CHECK-LSE-O1-NEXT: casal x0, x10, [x8]
	; CHECK-LSE-O1-NEXT: cbnz w10, LBB7_1			; CHECK-LSE-O1-NEXT: cmp x0, x9
				; CHECK-LSE-O1-NEXT: b.ne LBB7_1
	; CHECK-LSE-O1-NEXT: ; %bb.2: ; %atomicrmw.end			; CHECK-LSE-O1-NEXT: ; %bb.2: ; %atomicrmw.end
	; CHECK-LSE-O1-NEXT: mov x0, x8
	; CHECK-LSE-O1-NEXT: ret			; CHECK-LSE-O1-NEXT: ret
	;			;
	; CHECK-LSE-O0-LABEL: fetch_and_nand_64:			; CHECK-LSE-O0-LABEL: fetch_and_nand_64:
	; CHECK-LSE-O0: ; %bb.0:			; CHECK-LSE-O0: ; %bb.0:
	; CHECK-LSE-O0-NEXT: sub sp, sp, #32			; CHECK-LSE-O0-NEXT: sub sp, sp, #32
	; CHECK-LSE-O0-NEXT: str x0, [sp, #16] ; 8-byte Folded Spill			; CHECK-LSE-O0-NEXT: str x0, [sp, #16] ; 8-byte Folded Spill
	; CHECK-LSE-O0-NEXT: ldr x8, [x0]			; CHECK-LSE-O0-NEXT: ldr x8, [x0]
	; CHECK-LSE-O0-NEXT: str x8, [sp, #24] ; 8-byte Folded Spill			; CHECK-LSE-O0-NEXT: str x8, [sp, #24] ; 8-byte Folded Spill
	[... 2,493 lines not shown ...]

llvm/test/CodeGen/AArch64/arm64-atomic-128.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc < %s -mtriple=arm64-linux-gnu -verify-machineinstrs -mcpu=cyclone | FileCheck %s -check-prefixes=CHECK,NOOUTLINE			; RUN: llc < %s -mtriple=arm64-linux-gnu -verify-machineinstrs -mcpu=cyclone | FileCheck %s -check-prefixes=NOOUTLINE
	; RUN: llc < %s -mtriple=arm64-linux-gnu -verify-machineinstrs -mcpu=cyclone -mattr=+outline-atomics | FileCheck %s -check-prefixes=CHECK,OUTLINE			; RUN: llc < %s -mtriple=arm64-linux-gnu -verify-machineinstrs -mcpu=cyclone -mattr=+outline-atomics | FileCheck %s -check-prefixes=OUTLINE
	; RUN: llc < %s -mtriple=arm64-linux-gnu -verify-machineinstrs -mcpu=cyclone -mattr=+lse | FileCheck %s -check-prefixes=CHECK,LSE			; RUN: llc < %s -mtriple=arm64-linux-gnu -verify-machineinstrs -mcpu=cyclone -mattr=+lse | FileCheck %s -check-prefixes=LSE

	@var = global i128 0			@var = global i128 0

	define i128 @val_compare_and_swap(i128* %p, i128 %oldval, i128 %newval) {			define i128 @val_compare_and_swap(i128* %p, i128 %oldval, i128 %newval) {
	; NOOUTLINE-LABEL: val_compare_and_swap:			; NOOUTLINE-LABEL: val_compare_and_swap:
	; NOOUTLINE: // %bb.0:			; NOOUTLINE: // %bb.0:
	; NOOUTLINE-NEXT: .LBB0_1: // =>This Inner Loop Header: Depth=1			; NOOUTLINE-NEXT: .LBB0_1: // =>This Inner Loop Header: Depth=1
	; NOOUTLINE-NEXT: ldaxp x8, x1, [x0]			; NOOUTLINE-NEXT: ldaxp x8, x1, [x0]
	[... 192 lines not shown ...]
	; LSE-NEXT: mov x1, x3			; LSE-NEXT: mov x1, x3
	; LSE-NEXT: ret			; LSE-NEXT: ret
	%pair = cmpxchg i128* %p, i128 %oldval, i128 %newval monotonic monotonic			%pair = cmpxchg i128* %p, i128 %oldval, i128 %newval monotonic monotonic
	%val = extractvalue { i128, i1 } %pair, 0			%val = extractvalue { i128, i1 } %pair, 0
	ret i128 %val			ret i128 %val
	}			}

	define void @fetch_and_nand(i128* %p, i128 %bits) {			define void @fetch_and_nand(i128* %p, i128 %bits) {
	; CHECK-LABEL: fetch_and_nand:			; NOOUTLINE-LABEL: fetch_and_nand:
	; CHECK: // %bb.0:			; NOOUTLINE: // %bb.0:
	; CHECK-NEXT: .LBB4_1: // %atomicrmw.start			; NOOUTLINE-NEXT: .LBB4_1: // %atomicrmw.start
	; CHECK-NEXT: // =>This Inner Loop Header: Depth=1			; NOOUTLINE-NEXT: // =>This Inner Loop Header: Depth=1
	; CHECK-NEXT: ldxp x9, x8, [x0]			; NOOUTLINE-NEXT: ldxp x9, x8, [x0]
	; CHECK-NEXT: and x10, x9, x2			; NOOUTLINE-NEXT: and x10, x9, x2
	; CHECK-NEXT: and x11, x8, x3			; NOOUTLINE-NEXT: and x11, x8, x3
	; CHECK-NEXT: mvn x11, x11			; NOOUTLINE-NEXT: mvn x11, x11
	; CHECK-NEXT: mvn x10, x10			; NOOUTLINE-NEXT: mvn x10, x10
	; CHECK-NEXT: stlxp w12, x10, x11, [x0]			; NOOUTLINE-NEXT: stlxp w12, x10, x11, [x0]
	; CHECK-NEXT: cbnz w12, .LBB4_1			; NOOUTLINE-NEXT: cbnz w12, .LBB4_1
	; CHECK-NEXT: // %bb.2: // %atomicrmw.end			; NOOUTLINE-NEXT: // %bb.2: // %atomicrmw.end
	; CHECK-NEXT: adrp x10, :got:var			; NOOUTLINE-NEXT: adrp x10, :got:var
	; CHECK-NEXT: ldr x10, [x10, :got_lo12:var]			; NOOUTLINE-NEXT: ldr x10, [x10, :got_lo12:var]
	; CHECK-NEXT: stp x9, x8, [x10]			; NOOUTLINE-NEXT: stp x9, x8, [x10]
	; CHECK-NEXT: ret			; NOOUTLINE-NEXT: ret
				;
				; OUTLINE-LABEL: fetch_and_nand:
				; OUTLINE: // %bb.0:
				; OUTLINE-NEXT: .LBB4_1: // %atomicrmw.start
				; OUTLINE-NEXT: // =>This Inner Loop Header: Depth=1
				; OUTLINE-NEXT: ldxp x9, x8, [x0]
				; OUTLINE-NEXT: and x10, x9, x2
				; OUTLINE-NEXT: and x11, x8, x3
				; OUTLINE-NEXT: mvn x11, x11
				; OUTLINE-NEXT: mvn x10, x10
				; OUTLINE-NEXT: stlxp w12, x10, x11, [x0]
				; OUTLINE-NEXT: cbnz w12, .LBB4_1
				; OUTLINE-NEXT: // %bb.2: // %atomicrmw.end
				; OUTLINE-NEXT: adrp x10, :got:var
				; OUTLINE-NEXT: ldr x10, [x10, :got_lo12:var]
				; OUTLINE-NEXT: stp x9, x8, [x10]
				; OUTLINE-NEXT: ret
				;
				; LSE-LABEL: fetch_and_nand:
				; LSE: // %bb.0:
				; LSE-NEXT: ldp x4, x5, [x0]
				; LSE-NEXT: .LBB4_1: // %atomicrmw.start
				; LSE-NEXT: // =>This Inner Loop Header: Depth=1
				; LSE-NEXT: mov x7, x5
				; LSE-NEXT: mov x6, x4
				; LSE-NEXT: and x8, x7, x3
				; LSE-NEXT: and x9, x4, x2
				; LSE-NEXT: mvn x10, x9
				; LSE-NEXT: mvn x11, x8
				; LSE-NEXT: mov x4, x6
				; LSE-NEXT: mov x5, x7
				; LSE-NEXT: caspl x4, x5, x10, x11, [x0]
				; LSE-NEXT: cmp x5, x7
				; LSE-NEXT: ccmp x4, x6, #0, eq
				; LSE-NEXT: b.ne .LBB4_1
				; LSE-NEXT: // %bb.2: // %atomicrmw.end
				; LSE-NEXT: adrp x8, :got:var
				; LSE-NEXT: ldr x8, [x8, :got_lo12:var]
				; LSE-NEXT: stp x4, x5, [x8]
				; LSE-NEXT: ret

	%val = atomicrmw nand i128* %p, i128 %bits release			%val = atomicrmw nand i128* %p, i128 %bits release
	store i128 %val, i128* @var, align 16			store i128 %val, i128* @var, align 16
	ret void			ret void
	}			}

	define void @fetch_and_or(i128* %p, i128 %bits) {			define void @fetch_and_or(i128* %p, i128 %bits) {
	; CHECK-LABEL: fetch_and_or:			; NOOUTLINE-LABEL: fetch_and_or:
	; CHECK: // %bb.0:			; NOOUTLINE: // %bb.0:
	; CHECK-NEXT: .LBB5_1: // %atomicrmw.start			; NOOUTLINE-NEXT: .LBB5_1: // %atomicrmw.start
	; CHECK-NEXT: // =>This Inner Loop Header: Depth=1			; NOOUTLINE-NEXT: // =>This Inner Loop Header: Depth=1
	; CHECK-NEXT: ldaxp x9, x8, [x0]			; NOOUTLINE-NEXT: ldaxp x9, x8, [x0]
	; CHECK-NEXT: orr x10, x8, x3			; NOOUTLINE-NEXT: orr x10, x8, x3
	; CHECK-NEXT: orr x11, x9, x2			; NOOUTLINE-NEXT: orr x11, x9, x2
	; CHECK-NEXT: stlxp w12, x11, x10, [x0]			; NOOUTLINE-NEXT: stlxp w12, x11, x10, [x0]
	; CHECK-NEXT: cbnz w12, .LBB5_1			; NOOUTLINE-NEXT: cbnz w12, .LBB5_1
	; CHECK-NEXT: // %bb.2: // %atomicrmw.end			; NOOUTLINE-NEXT: // %bb.2: // %atomicrmw.end
	; CHECK-NEXT: adrp x10, :got:var			; NOOUTLINE-NEXT: adrp x10, :got:var
	; CHECK-NEXT: ldr x10, [x10, :got_lo12:var]			; NOOUTLINE-NEXT: ldr x10, [x10, :got_lo12:var]
	; CHECK-NEXT: stp x9, x8, [x10]			; NOOUTLINE-NEXT: stp x9, x8, [x10]
	; CHECK-NEXT: ret			; NOOUTLINE-NEXT: ret
				;
				; OUTLINE-LABEL: fetch_and_or:
				; OUTLINE: // %bb.0:
				; OUTLINE-NEXT: .LBB5_1: // %atomicrmw.start
				; OUTLINE-NEXT: // =>This Inner Loop Header: Depth=1
				; OUTLINE-NEXT: ldaxp x9, x8, [x0]
				; OUTLINE-NEXT: orr x10, x8, x3
				; OUTLINE-NEXT: orr x11, x9, x2
				; OUTLINE-NEXT: stlxp w12, x11, x10, [x0]
				; OUTLINE-NEXT: cbnz w12, .LBB5_1
				; OUTLINE-NEXT: // %bb.2: // %atomicrmw.end
				; OUTLINE-NEXT: adrp x10, :got:var
				; OUTLINE-NEXT: ldr x10, [x10, :got_lo12:var]
				; OUTLINE-NEXT: stp x9, x8, [x10]
				; OUTLINE-NEXT: ret
				;
				; LSE-LABEL: fetch_and_or:
				; LSE: // %bb.0:
				; LSE-NEXT: ldp x4, x5, [x0]
				; LSE-NEXT: .LBB5_1: // %atomicrmw.start
				; LSE-NEXT: // =>This Inner Loop Header: Depth=1
				; LSE-NEXT: mov x7, x5
				; LSE-NEXT: mov x6, x4
				; LSE-NEXT: orr x8, x4, x2
				; LSE-NEXT: orr x9, x7, x3
				; LSE-NEXT: mov x4, x6
				; LSE-NEXT: mov x5, x7
				; LSE-NEXT: caspal x4, x5, x8, x9, [x0]
				; LSE-NEXT: cmp x5, x7
				; LSE-NEXT: ccmp x4, x6, #0, eq
				; LSE-NEXT: b.ne .LBB5_1
				; LSE-NEXT: // %bb.2: // %atomicrmw.end
				; LSE-NEXT: adrp x8, :got:var
				; LSE-NEXT: ldr x8, [x8, :got_lo12:var]
				; LSE-NEXT: stp x4, x5, [x8]
				; LSE-NEXT: ret

	%val = atomicrmw or i128* %p, i128 %bits seq_cst			%val = atomicrmw or i128* %p, i128 %bits seq_cst
	store i128 %val, i128* @var, align 16			store i128 %val, i128* @var, align 16
	ret void			ret void
	}			}

	define void @fetch_and_add(i128* %p, i128 %bits) {			define void @fetch_and_add(i128* %p, i128 %bits) {
	; CHECK-LABEL: fetch_and_add:			; NOOUTLINE-LABEL: fetch_and_add:
	; CHECK: // %bb.0:			; NOOUTLINE: // %bb.0:
	; CHECK-NEXT: .LBB6_1: // %atomicrmw.start			; NOOUTLINE-NEXT: .LBB6_1: // %atomicrmw.start
	; CHECK-NEXT: // =>This Inner Loop Header: Depth=1			; NOOUTLINE-NEXT: // =>This Inner Loop Header: Depth=1
	; CHECK-NEXT: ldaxp x9, x8, [x0]			; NOOUTLINE-NEXT: ldaxp x9, x8, [x0]
	; CHECK-NEXT: adds x10, x9, x2			; NOOUTLINE-NEXT: adds x10, x9, x2
	; CHECK-NEXT: adc x11, x8, x3			; NOOUTLINE-NEXT: adc x11, x8, x3
	; CHECK-NEXT: stlxp w12, x10, x11, [x0]			; NOOUTLINE-NEXT: stlxp w12, x10, x11, [x0]
	; CHECK-NEXT: cbnz w12, .LBB6_1			; NOOUTLINE-NEXT: cbnz w12, .LBB6_1
	; CHECK-NEXT: // %bb.2: // %atomicrmw.end			; NOOUTLINE-NEXT: // %bb.2: // %atomicrmw.end
	; CHECK-NEXT: adrp x10, :got:var			; NOOUTLINE-NEXT: adrp x10, :got:var
	; CHECK-NEXT: ldr x10, [x10, :got_lo12:var]			; NOOUTLINE-NEXT: ldr x10, [x10, :got_lo12:var]
	; CHECK-NEXT: stp x9, x8, [x10]			; NOOUTLINE-NEXT: stp x9, x8, [x10]
	; CHECK-NEXT: ret			; NOOUTLINE-NEXT: ret
				;
				; OUTLINE-LABEL: fetch_and_add:
				; OUTLINE: // %bb.0:
				; OUTLINE-NEXT: .LBB6_1: // %atomicrmw.start
				; OUTLINE-NEXT: // =>This Inner Loop Header: Depth=1
				; OUTLINE-NEXT: ldaxp x9, x8, [x0]
				; OUTLINE-NEXT: adds x10, x9, x2
				; OUTLINE-NEXT: adc x11, x8, x3
				; OUTLINE-NEXT: stlxp w12, x10, x11, [x0]
				; OUTLINE-NEXT: cbnz w12, .LBB6_1
				; OUTLINE-NEXT: // %bb.2: // %atomicrmw.end
				; OUTLINE-NEXT: adrp x10, :got:var
				; OUTLINE-NEXT: ldr x10, [x10, :got_lo12:var]
				; OUTLINE-NEXT: stp x9, x8, [x10]
				; OUTLINE-NEXT: ret
				;
				; LSE-LABEL: fetch_and_add:
				; LSE: // %bb.0:
				; LSE-NEXT: ldp x4, x5, [x0]
				; LSE-NEXT: .LBB6_1: // %atomicrmw.start
				; LSE-NEXT: // =>This Inner Loop Header: Depth=1
				; LSE-NEXT: mov x7, x5
				; LSE-NEXT: mov x6, x4
				; LSE-NEXT: adds x8, x4, x2
				; LSE-NEXT: adc x9, x7, x3
				; LSE-NEXT: mov x4, x6
				; LSE-NEXT: mov x5, x7
				; LSE-NEXT: caspal x4, x5, x8, x9, [x0]
				; LSE-NEXT: cmp x5, x7
				; LSE-NEXT: ccmp x4, x6, #0, eq
				; LSE-NEXT: b.ne .LBB6_1
				; LSE-NEXT: // %bb.2: // %atomicrmw.end
				; LSE-NEXT: adrp x8, :got:var
				; LSE-NEXT: ldr x8, [x8, :got_lo12:var]
				; LSE-NEXT: stp x4, x5, [x8]
				; LSE-NEXT: ret
	%val = atomicrmw add i128* %p, i128 %bits seq_cst			%val = atomicrmw add i128* %p, i128 %bits seq_cst
	store i128 %val, i128* @var, align 16			store i128 %val, i128* @var, align 16
	ret void			ret void
	}			}

	define void @fetch_and_sub(i128* %p, i128 %bits) {			define void @fetch_and_sub(i128* %p, i128 %bits) {
	; CHECK-LABEL: fetch_and_sub:			; NOOUTLINE-LABEL: fetch_and_sub:
	; CHECK: // %bb.0:			; NOOUTLINE: // %bb.0:
	; CHECK-NEXT: .LBB7_1: // %atomicrmw.start			; NOOUTLINE-NEXT: .LBB7_1: // %atomicrmw.start
	; CHECK-NEXT: // =>This Inner Loop Header: Depth=1			; NOOUTLINE-NEXT: // =>This Inner Loop Header: Depth=1
	; CHECK-NEXT: ldaxp x9, x8, [x0]			; NOOUTLINE-NEXT: ldaxp x9, x8, [x0]
	; CHECK-NEXT: subs x10, x9, x2			; NOOUTLINE-NEXT: subs x10, x9, x2
	; CHECK-NEXT: sbc x11, x8, x3			; NOOUTLINE-NEXT: sbc x11, x8, x3
	; CHECK-NEXT: stlxp w12, x10, x11, [x0]			; NOOUTLINE-NEXT: stlxp w12, x10, x11, [x0]
	; CHECK-NEXT: cbnz w12, .LBB7_1			; NOOUTLINE-NEXT: cbnz w12, .LBB7_1
	; CHECK-NEXT: // %bb.2: // %atomicrmw.end			; NOOUTLINE-NEXT: // %bb.2: // %atomicrmw.end
	; CHECK-NEXT: adrp x10, :got:var			; NOOUTLINE-NEXT: adrp x10, :got:var
	; CHECK-NEXT: ldr x10, [x10, :got_lo12:var]			; NOOUTLINE-NEXT: ldr x10, [x10, :got_lo12:var]
	; CHECK-NEXT: stp x9, x8, [x10]			; NOOUTLINE-NEXT: stp x9, x8, [x10]
	; CHECK-NEXT: ret			; NOOUTLINE-NEXT: ret
				;
				; OUTLINE-LABEL: fetch_and_sub:
				; OUTLINE: // %bb.0:
				; OUTLINE-NEXT: .LBB7_1: // %atomicrmw.start
				; OUTLINE-NEXT: // =>This Inner Loop Header: Depth=1
				; OUTLINE-NEXT: ldaxp x9, x8, [x0]
				; OUTLINE-NEXT: subs x10, x9, x2
				; OUTLINE-NEXT: sbc x11, x8, x3
				; OUTLINE-NEXT: stlxp w12, x10, x11, [x0]
				; OUTLINE-NEXT: cbnz w12, .LBB7_1
				; OUTLINE-NEXT: // %bb.2: // %atomicrmw.end
				; OUTLINE-NEXT: adrp x10, :got:var
				; OUTLINE-NEXT: ldr x10, [x10, :got_lo12:var]
				; OUTLINE-NEXT: stp x9, x8, [x10]
				; OUTLINE-NEXT: ret
				;
				; LSE-LABEL: fetch_and_sub:
				; LSE: // %bb.0:
				; LSE-NEXT: ldp x4, x5, [x0]
				; LSE-NEXT: .LBB7_1: // %atomicrmw.start
				; LSE-NEXT: // =>This Inner Loop Header: Depth=1
				; LSE-NEXT: mov x7, x5
				; LSE-NEXT: mov x6, x4
				; LSE-NEXT: subs x8, x4, x2
				; LSE-NEXT: sbc x9, x7, x3
				; LSE-NEXT: mov x4, x6
				; LSE-NEXT: mov x5, x7
				; LSE-NEXT: caspal x4, x5, x8, x9, [x0]
				; LSE-NEXT: cmp x5, x7
				; LSE-NEXT: ccmp x4, x6, #0, eq
				; LSE-NEXT: b.ne .LBB7_1
				; LSE-NEXT: // %bb.2: // %atomicrmw.end
				; LSE-NEXT: adrp x8, :got:var
				; LSE-NEXT: ldr x8, [x8, :got_lo12:var]
				; LSE-NEXT: stp x4, x5, [x8]
				; LSE-NEXT: ret
	%val = atomicrmw sub i128* %p, i128 %bits seq_cst			%val = atomicrmw sub i128* %p, i128 %bits seq_cst
	store i128 %val, i128* @var, align 16			store i128 %val, i128* @var, align 16
	ret void			ret void
	}			}

	define void @fetch_and_min(i128* %p, i128 %bits) {			define void @fetch_and_min(i128* %p, i128 %bits) {
	; CHECK-LABEL: fetch_and_min:			; NOOUTLINE-LABEL: fetch_and_min:
	; CHECK: // %bb.0:			; NOOUTLINE: // %bb.0:
	; CHECK-NEXT: .LBB8_1: // %atomicrmw.start			; NOOUTLINE-NEXT: .LBB8_1: // %atomicrmw.start
	; CHECK-NEXT: // =>This Inner Loop Header: Depth=1			; NOOUTLINE-NEXT: // =>This Inner Loop Header: Depth=1
	; CHECK-NEXT: ldaxp x9, x8, [x0]			; NOOUTLINE-NEXT: ldaxp x9, x8, [x0]
	; CHECK-NEXT: cmp x2, x9			; NOOUTLINE-NEXT: cmp x2, x9
	; CHECK-NEXT: sbcs xzr, x3, x8			; NOOUTLINE-NEXT: sbcs xzr, x3, x8
	; CHECK-NEXT: csel x10, x8, x3, ge			; NOOUTLINE-NEXT: csel x10, x8, x3, ge
	; CHECK-NEXT: csel x11, x9, x2, ge			; NOOUTLINE-NEXT: csel x11, x9, x2, ge
	; CHECK-NEXT: stlxp w12, x11, x10, [x0]			; NOOUTLINE-NEXT: stlxp w12, x11, x10, [x0]
	; CHECK-NEXT: cbnz w12, .LBB8_1			; NOOUTLINE-NEXT: cbnz w12, .LBB8_1
	; CHECK-NEXT: // %bb.2: // %atomicrmw.end			; NOOUTLINE-NEXT: // %bb.2: // %atomicrmw.end
	; CHECK-NEXT: adrp x10, :got:var			; NOOUTLINE-NEXT: adrp x10, :got:var
	; CHECK-NEXT: ldr x10, [x10, :got_lo12:var]			; NOOUTLINE-NEXT: ldr x10, [x10, :got_lo12:var]
	; CHECK-NEXT: stp x9, x8, [x10]			; NOOUTLINE-NEXT: stp x9, x8, [x10]
	; CHECK-NEXT: ret			; NOOUTLINE-NEXT: ret
				;
				; OUTLINE-LABEL: fetch_and_min:
				; OUTLINE: // %bb.0:
				; OUTLINE-NEXT: .LBB8_1: // %atomicrmw.start
				; OUTLINE-NEXT: // =>This Inner Loop Header: Depth=1
				; OUTLINE-NEXT: ldaxp x9, x8, [x0]
				; OUTLINE-NEXT: cmp x2, x9
				; OUTLINE-NEXT: sbcs xzr, x3, x8
				; OUTLINE-NEXT: csel x10, x8, x3, ge
				; OUTLINE-NEXT: csel x11, x9, x2, ge
				; OUTLINE-NEXT: stlxp w12, x11, x10, [x0]
				; OUTLINE-NEXT: cbnz w12, .LBB8_1
				; OUTLINE-NEXT: // %bb.2: // %atomicrmw.end
				; OUTLINE-NEXT: adrp x10, :got:var
				; OUTLINE-NEXT: ldr x10, [x10, :got_lo12:var]
				; OUTLINE-NEXT: stp x9, x8, [x10]
				; OUTLINE-NEXT: ret
				;
				; LSE-LABEL: fetch_and_min:
				; LSE: // %bb.0:
				; LSE-NEXT: ldp x4, x5, [x0]
				; LSE-NEXT: .LBB8_1: // %atomicrmw.start
				; LSE-NEXT: // =>This Inner Loop Header: Depth=1
				; LSE-NEXT: mov x7, x5
				; LSE-NEXT: mov x6, x4
				; LSE-NEXT: cmp x2, x4
				; LSE-NEXT: sbcs xzr, x3, x7
				; LSE-NEXT: csel x9, x7, x3, ge
				; LSE-NEXT: csel x8, x4, x2, ge
				; LSE-NEXT: mov x4, x6
				; LSE-NEXT: mov x5, x7
				; LSE-NEXT: caspal x4, x5, x8, x9, [x0]
				; LSE-NEXT: cmp x5, x7
				; LSE-NEXT: ccmp x4, x6, #0, eq
				; LSE-NEXT: b.ne .LBB8_1
				; LSE-NEXT: // %bb.2: // %atomicrmw.end
				; LSE-NEXT: adrp x8, :got:var
				; LSE-NEXT: ldr x8, [x8, :got_lo12:var]
				; LSE-NEXT: stp x4, x5, [x8]
				; LSE-NEXT: ret
	%val = atomicrmw min i128* %p, i128 %bits seq_cst			%val = atomicrmw min i128* %p, i128 %bits seq_cst
	store i128 %val, i128* @var, align 16			store i128 %val, i128* @var, align 16
	ret void			ret void
	}			}

	define void @fetch_and_max(i128* %p, i128 %bits) {			define void @fetch_and_max(i128* %p, i128 %bits) {
	; CHECK-LABEL: fetch_and_max:			; NOOUTLINE-LABEL: fetch_and_max:
	; CHECK: // %bb.0:			; NOOUTLINE: // %bb.0:
	; CHECK-NEXT: .LBB9_1: // %atomicrmw.start			; NOOUTLINE-NEXT: .LBB9_1: // %atomicrmw.start
	; CHECK-NEXT: // =>This Inner Loop Header: Depth=1			; NOOUTLINE-NEXT: // =>This Inner Loop Header: Depth=1
	; CHECK-NEXT: ldaxp x9, x8, [x0]			; NOOUTLINE-NEXT: ldaxp x9, x8, [x0]
	; CHECK-NEXT: cmp x2, x9			; NOOUTLINE-NEXT: cmp x2, x9
	; CHECK-NEXT: sbcs xzr, x3, x8			; NOOUTLINE-NEXT: sbcs xzr, x3, x8
	; CHECK-NEXT: csel x10, x8, x3, lt			; NOOUTLINE-NEXT: csel x10, x8, x3, lt
	; CHECK-NEXT: csel x11, x9, x2, lt			; NOOUTLINE-NEXT: csel x11, x9, x2, lt
	; CHECK-NEXT: stlxp w12, x11, x10, [x0]			; NOOUTLINE-NEXT: stlxp w12, x11, x10, [x0]
	; CHECK-NEXT: cbnz w12, .LBB9_1			; NOOUTLINE-NEXT: cbnz w12, .LBB9_1
	; CHECK-NEXT: // %bb.2: // %atomicrmw.end			; NOOUTLINE-NEXT: // %bb.2: // %atomicrmw.end
	; CHECK-NEXT: adrp x10, :got:var			; NOOUTLINE-NEXT: adrp x10, :got:var
	; CHECK-NEXT: ldr x10, [x10, :got_lo12:var]			; NOOUTLINE-NEXT: ldr x10, [x10, :got_lo12:var]
	; CHECK-NEXT: stp x9, x8, [x10]			; NOOUTLINE-NEXT: stp x9, x8, [x10]
	; CHECK-NEXT: ret			; NOOUTLINE-NEXT: ret
				;
				; OUTLINE-LABEL: fetch_and_max:
				; OUTLINE: // %bb.0:
				; OUTLINE-NEXT: .LBB9_1: // %atomicrmw.start
				; OUTLINE-NEXT: // =>This Inner Loop Header: Depth=1
				; OUTLINE-NEXT: ldaxp x9, x8, [x0]
				; OUTLINE-NEXT: cmp x2, x9
				; OUTLINE-NEXT: sbcs xzr, x3, x8
				; OUTLINE-NEXT: csel x10, x8, x3, lt
				; OUTLINE-NEXT: csel x11, x9, x2, lt
				; OUTLINE-NEXT: stlxp w12, x11, x10, [x0]
				; OUTLINE-NEXT: cbnz w12, .LBB9_1
				; OUTLINE-NEXT: // %bb.2: // %atomicrmw.end
				; OUTLINE-NEXT: adrp x10, :got:var
				; OUTLINE-NEXT: ldr x10, [x10, :got_lo12:var]
				; OUTLINE-NEXT: stp x9, x8, [x10]
				; OUTLINE-NEXT: ret
				;
				; LSE-LABEL: fetch_and_max:
				; LSE: // %bb.0:
				; LSE-NEXT: ldp x4, x5, [x0]
				; LSE-NEXT: .LBB9_1: // %atomicrmw.start
				; LSE-NEXT: // =>This Inner Loop Header: Depth=1
				; LSE-NEXT: mov x7, x5
				; LSE-NEXT: mov x6, x4
				; LSE-NEXT: cmp x2, x4
				; LSE-NEXT: sbcs xzr, x3, x7
				; LSE-NEXT: csel x9, x7, x3, lt
				; LSE-NEXT: csel x8, x4, x2, lt
				; LSE-NEXT: mov x4, x6
				; LSE-NEXT: mov x5, x7
				; LSE-NEXT: caspal x4, x5, x8, x9, [x0]
				; LSE-NEXT: cmp x5, x7
				; LSE-NEXT: ccmp x4, x6, #0, eq
				; LSE-NEXT: b.ne .LBB9_1
				; LSE-NEXT: // %bb.2: // %atomicrmw.end
				; LSE-NEXT: adrp x8, :got:var
				; LSE-NEXT: ldr x8, [x8, :got_lo12:var]
				; LSE-NEXT: stp x4, x5, [x8]
				; LSE-NEXT: ret
	%val = atomicrmw max i128* %p, i128 %bits seq_cst			%val = atomicrmw max i128* %p, i128 %bits seq_cst
	store i128 %val, i128* @var, align 16			store i128 %val, i128* @var, align 16
	ret void			ret void
	}			}

	define void @fetch_and_umin(i128* %p, i128 %bits) {			define void @fetch_and_umin(i128* %p, i128 %bits) {
	; CHECK-LABEL: fetch_and_umin:			; NOOUTLINE-LABEL: fetch_and_umin:
	; CHECK: // %bb.0:			; NOOUTLINE: // %bb.0:
	; CHECK-NEXT: .LBB10_1: // %atomicrmw.start			; NOOUTLINE-NEXT: .LBB10_1: // %atomicrmw.start
	; CHECK-NEXT: // =>This Inner Loop Header: Depth=1			; NOOUTLINE-NEXT: // =>This Inner Loop Header: Depth=1
	; CHECK-NEXT: ldaxp x9, x8, [x0]			; NOOUTLINE-NEXT: ldaxp x9, x8, [x0]
	; CHECK-NEXT: cmp x2, x9			; NOOUTLINE-NEXT: cmp x2, x9
	; CHECK-NEXT: sbcs xzr, x3, x8			; NOOUTLINE-NEXT: sbcs xzr, x3, x8
	; CHECK-NEXT: csel x10, x8, x3, hs			; NOOUTLINE-NEXT: csel x10, x8, x3, hs
	; CHECK-NEXT: csel x11, x9, x2, hs			; NOOUTLINE-NEXT: csel x11, x9, x2, hs
	; CHECK-NEXT: stlxp w12, x11, x10, [x0]			; NOOUTLINE-NEXT: stlxp w12, x11, x10, [x0]
	; CHECK-NEXT: cbnz w12, .LBB10_1			; NOOUTLINE-NEXT: cbnz w12, .LBB10_1
	; CHECK-NEXT: // %bb.2: // %atomicrmw.end			; NOOUTLINE-NEXT: // %bb.2: // %atomicrmw.end
	; CHECK-NEXT: adrp x10, :got:var			; NOOUTLINE-NEXT: adrp x10, :got:var
	; CHECK-NEXT: ldr x10, [x10, :got_lo12:var]			; NOOUTLINE-NEXT: ldr x10, [x10, :got_lo12:var]
	; CHECK-NEXT: stp x9, x8, [x10]			; NOOUTLINE-NEXT: stp x9, x8, [x10]
	; CHECK-NEXT: ret			; NOOUTLINE-NEXT: ret
				;
				; OUTLINE-LABEL: fetch_and_umin:
				; OUTLINE: // %bb.0:
				; OUTLINE-NEXT: .LBB10_1: // %atomicrmw.start
				; OUTLINE-NEXT: // =>This Inner Loop Header: Depth=1
				; OUTLINE-NEXT: ldaxp x9, x8, [x0]
				; OUTLINE-NEXT: cmp x2, x9
				; OUTLINE-NEXT: sbcs xzr, x3, x8
				; OUTLINE-NEXT: csel x10, x8, x3, hs
				; OUTLINE-NEXT: csel x11, x9, x2, hs
				; OUTLINE-NEXT: stlxp w12, x11, x10, [x0]
				; OUTLINE-NEXT: cbnz w12, .LBB10_1
				; OUTLINE-NEXT: // %bb.2: // %atomicrmw.end
				; OUTLINE-NEXT: adrp x10, :got:var
				; OUTLINE-NEXT: ldr x10, [x10, :got_lo12:var]
				; OUTLINE-NEXT: stp x9, x8, [x10]
				; OUTLINE-NEXT: ret
				;
				; LSE-LABEL: fetch_and_umin:
				; LSE: // %bb.0:
				; LSE-NEXT: ldp x4, x5, [x0]
				; LSE-NEXT: .LBB10_1: // %atomicrmw.start
				; LSE-NEXT: // =>This Inner Loop Header: Depth=1
				; LSE-NEXT: mov x7, x5
				; LSE-NEXT: mov x6, x4
				; LSE-NEXT: cmp x2, x4
				; LSE-NEXT: sbcs xzr, x3, x7
				; LSE-NEXT: csel x9, x7, x3, hs
				; LSE-NEXT: csel x8, x4, x2, hs
				; LSE-NEXT: mov x4, x6
				; LSE-NEXT: mov x5, x7
				; LSE-NEXT: caspal x4, x5, x8, x9, [x0]
				; LSE-NEXT: cmp x5, x7
				; LSE-NEXT: ccmp x4, x6, #0, eq
				; LSE-NEXT: b.ne .LBB10_1
				; LSE-NEXT: // %bb.2: // %atomicrmw.end
				; LSE-NEXT: adrp x8, :got:var
				; LSE-NEXT: ldr x8, [x8, :got_lo12:var]
				; LSE-NEXT: stp x4, x5, [x8]
				; LSE-NEXT: ret
	%val = atomicrmw umin i128* %p, i128 %bits seq_cst			%val = atomicrmw umin i128* %p, i128 %bits seq_cst
	store i128 %val, i128* @var, align 16			store i128 %val, i128* @var, align 16
	ret void			ret void
	}			}

	define void @fetch_and_umax(i128* %p, i128 %bits) {			define void @fetch_and_umax(i128* %p, i128 %bits) {
	; CHECK-LABEL: fetch_and_umax:			; NOOUTLINE-LABEL: fetch_and_umax:
	; CHECK: // %bb.0:			; NOOUTLINE: // %bb.0:
	; CHECK-NEXT: .LBB11_1: // %atomicrmw.start			; NOOUTLINE-NEXT: .LBB11_1: // %atomicrmw.start
	; CHECK-NEXT: // =>This Inner Loop Header: Depth=1			; NOOUTLINE-NEXT: // =>This Inner Loop Header: Depth=1
	; CHECK-NEXT: ldaxp x9, x8, [x0]			; NOOUTLINE-NEXT: ldaxp x9, x8, [x0]
	; CHECK-NEXT: cmp x2, x9			; NOOUTLINE-NEXT: cmp x2, x9
	; CHECK-NEXT: sbcs xzr, x3, x8			; NOOUTLINE-NEXT: sbcs xzr, x3, x8
	; CHECK-NEXT: csel x10, x8, x3, lo			; NOOUTLINE-NEXT: csel x10, x8, x3, lo
	; CHECK-NEXT: csel x11, x9, x2, lo			; NOOUTLINE-NEXT: csel x11, x9, x2, lo
	; CHECK-NEXT: stlxp w12, x11, x10, [x0]			; NOOUTLINE-NEXT: stlxp w12, x11, x10, [x0]
	; CHECK-NEXT: cbnz w12, .LBB11_1			; NOOUTLINE-NEXT: cbnz w12, .LBB11_1
	; CHECK-NEXT: // %bb.2: // %atomicrmw.end			; NOOUTLINE-NEXT: // %bb.2: // %atomicrmw.end
	; CHECK-NEXT: adrp x10, :got:var			; NOOUTLINE-NEXT: adrp x10, :got:var
	; CHECK-NEXT: ldr x10, [x10, :got_lo12:var]			; NOOUTLINE-NEXT: ldr x10, [x10, :got_lo12:var]
	; CHECK-NEXT: stp x9, x8, [x10]			; NOOUTLINE-NEXT: stp x9, x8, [x10]
	; CHECK-NEXT: ret			; NOOUTLINE-NEXT: ret
				;
				; OUTLINE-LABEL: fetch_and_umax:
				; OUTLINE: // %bb.0:
				; OUTLINE-NEXT: .LBB11_1: // %atomicrmw.start
				; OUTLINE-NEXT: // =>This Inner Loop Header: Depth=1
				; OUTLINE-NEXT: ldaxp x9, x8, [x0]
				; OUTLINE-NEXT: cmp x2, x9
				; OUTLINE-NEXT: sbcs xzr, x3, x8
				; OUTLINE-NEXT: csel x10, x8, x3, lo
				; OUTLINE-NEXT: csel x11, x9, x2, lo
				; OUTLINE-NEXT: stlxp w12, x11, x10, [x0]
				; OUTLINE-NEXT: cbnz w12, .LBB11_1
				; OUTLINE-NEXT: // %bb.2: // %atomicrmw.end
				; OUTLINE-NEXT: adrp x10, :got:var
				; OUTLINE-NEXT: ldr x10, [x10, :got_lo12:var]
				; OUTLINE-NEXT: stp x9, x8, [x10]
				; OUTLINE-NEXT: ret
				;
				; LSE-LABEL: fetch_and_umax:
				; LSE: // %bb.0:
				; LSE-NEXT: ldp x4, x5, [x0]
				; LSE-NEXT: .LBB11_1: // %atomicrmw.start
				; LSE-NEXT: // =>This Inner Loop Header: Depth=1
				; LSE-NEXT: mov x7, x5
				; LSE-NEXT: mov x6, x4
				; LSE-NEXT: cmp x2, x4
				; LSE-NEXT: sbcs xzr, x3, x7
				; LSE-NEXT: csel x9, x7, x3, lo
				; LSE-NEXT: csel x8, x4, x2, lo
				; LSE-NEXT: mov x4, x6
				; LSE-NEXT: mov x5, x7
				; LSE-NEXT: caspal x4, x5, x8, x9, [x0]
				; LSE-NEXT: cmp x5, x7
				; LSE-NEXT: ccmp x4, x6, #0, eq
				; LSE-NEXT: b.ne .LBB11_1
				; LSE-NEXT: // %bb.2: // %atomicrmw.end
				; LSE-NEXT: adrp x8, :got:var
				; LSE-NEXT: ldr x8, [x8, :got_lo12:var]
				; LSE-NEXT: stp x4, x5, [x8]
				; LSE-NEXT: ret
	%val = atomicrmw umax i128* %p, i128 %bits seq_cst			%val = atomicrmw umax i128* %p, i128 %bits seq_cst
	store i128 %val, i128* @var, align 16			store i128 %val, i128* @var, align 16
	ret void			ret void
	}			}

	define i128 @atomic_load_seq_cst(i128* %p) {			define i128 @atomic_load_seq_cst(i128* %p) {
	; CHECK-LABEL: atomic_load_seq_cst:			; NOOUTLINE-LABEL: atomic_load_seq_cst:
	; CHECK: // %bb.0:			; NOOUTLINE: // %bb.0:
	; CHECK-NEXT: mov x8, x0			; NOOUTLINE-NEXT: mov x8, x0
	; CHECK-NEXT: .LBB12_1: // %atomicrmw.start			; NOOUTLINE-NEXT: .LBB12_1: // %atomicrmw.start
	; CHECK-NEXT: // =>This Inner Loop Header: Depth=1			; NOOUTLINE-NEXT: // =>This Inner Loop Header: Depth=1
	; CHECK-NEXT: ldaxp x0, x1, [x8]			; NOOUTLINE-NEXT: ldaxp x0, x1, [x8]
	; CHECK-NEXT: stlxp w9, x0, x1, [x8]			; NOOUTLINE-NEXT: stlxp w9, x0, x1, [x8]
	; CHECK-NEXT: cbnz w9, .LBB12_1			; NOOUTLINE-NEXT: cbnz w9, .LBB12_1
	; CHECK-NEXT: // %bb.2: // %atomicrmw.end			; NOOUTLINE-NEXT: // %bb.2: // %atomicrmw.end
	; CHECK-NEXT: ret			; NOOUTLINE-NEXT: ret
				;
				; OUTLINE-LABEL: atomic_load_seq_cst:
				; OUTLINE: // %bb.0:
				; OUTLINE-NEXT: mov x8, x0
				; OUTLINE-NEXT: .LBB12_1: // %atomicrmw.start
				; OUTLINE-NEXT: // =>This Inner Loop Header: Depth=1
				; OUTLINE-NEXT: ldaxp x0, x1, [x8]
				; OUTLINE-NEXT: stlxp w9, x0, x1, [x8]
				; OUTLINE-NEXT: cbnz w9, .LBB12_1
				; OUTLINE-NEXT: // %bb.2: // %atomicrmw.end
				; OUTLINE-NEXT: ret
				;
				; LSE-LABEL: atomic_load_seq_cst:
				; LSE: // %bb.0:
				; LSE-NEXT: mov x2, #0
				; LSE-NEXT: mov x3, #0
				; LSE-NEXT: caspal x2, x3, x2, x3, [x0]
				; LSE-NEXT: mov x0, x2
				; LSE-NEXT: mov x1, x3
				; LSE-NEXT: ret
	%r = load atomic i128, i128* %p seq_cst, align 16			%r = load atomic i128, i128* %p seq_cst, align 16
	ret i128 %r			ret i128 %r
	}			}

	define i128 @atomic_load_relaxed(i64, i64, i128* %p) {			define i128 @atomic_load_relaxed(i64, i64, i128* %p) {
	; CHECK-LABEL: atomic_load_relaxed:			; NOOUTLINE-LABEL: atomic_load_relaxed:
	; CHECK: // %bb.0:			; NOOUTLINE: // %bb.0:
	; CHECK-NEXT: .LBB13_1: // %atomicrmw.start			; NOOUTLINE-NEXT: .LBB13_1: // %atomicrmw.start
	; CHECK-NEXT: // =>This Inner Loop Header: Depth=1			; NOOUTLINE-NEXT: // =>This Inner Loop Header: Depth=1
	; CHECK-NEXT: ldxp x0, x1, [x2]			; NOOUTLINE-NEXT: ldxp x0, x1, [x2]
	; CHECK-NEXT: stxp w8, x0, x1, [x2]			; NOOUTLINE-NEXT: stxp w8, x0, x1, [x2]
	; CHECK-NEXT: cbnz w8, .LBB13_1			; NOOUTLINE-NEXT: cbnz w8, .LBB13_1
	; CHECK-NEXT: // %bb.2: // %atomicrmw.end			; NOOUTLINE-NEXT: // %bb.2: // %atomicrmw.end
	; CHECK-NEXT: ret			; NOOUTLINE-NEXT: ret
				;
				; OUTLINE-LABEL: atomic_load_relaxed:
				; OUTLINE: // %bb.0:
				; OUTLINE-NEXT: .LBB13_1: // %atomicrmw.start
				; OUTLINE-NEXT: // =>This Inner Loop Header: Depth=1
				; OUTLINE-NEXT: ldxp x0, x1, [x2]
				; OUTLINE-NEXT: stxp w8, x0, x1, [x2]
				; OUTLINE-NEXT: cbnz w8, .LBB13_1
				; OUTLINE-NEXT: // %bb.2: // %atomicrmw.end
				; OUTLINE-NEXT: ret
				;
				; LSE-LABEL: atomic_load_relaxed:
				; LSE: // %bb.0:
				; LSE-NEXT: mov x0, #0
				; LSE-NEXT: mov x1, #0
				; LSE-NEXT: casp x0, x1, x0, x1, [x2]
				; LSE-NEXT: ret
	%r = load atomic i128, i128* %p monotonic, align 16			%r = load atomic i128, i128* %p monotonic, align 16
	ret i128 %r			ret i128 %r
	}			}


	define void @atomic_store_seq_cst(i128 %in, i128* %p) {			define void @atomic_store_seq_cst(i128 %in, i128* %p) {
	; CHECK-LABEL: atomic_store_seq_cst:			; NOOUTLINE-LABEL: atomic_store_seq_cst:
	; CHECK: // %bb.0:			; NOOUTLINE: // %bb.0:
	; CHECK-NEXT: .LBB14_1: // %atomicrmw.start			; NOOUTLINE-NEXT: .LBB14_1: // %atomicrmw.start
	; CHECK-NEXT: // =>This Inner Loop Header: Depth=1			; NOOUTLINE-NEXT: // =>This Inner Loop Header: Depth=1
	; CHECK-NEXT: ldaxp xzr, x8, [x2]			; NOOUTLINE-NEXT: ldaxp xzr, x8, [x2]
	; CHECK-NEXT: stlxp w8, x0, x1, [x2]			; NOOUTLINE-NEXT: stlxp w8, x0, x1, [x2]
	; CHECK-NEXT: cbnz w8, .LBB14_1			; NOOUTLINE-NEXT: cbnz w8, .LBB14_1
	; CHECK-NEXT: // %bb.2: // %atomicrmw.end			; NOOUTLINE-NEXT: // %bb.2: // %atomicrmw.end
	; CHECK-NEXT: ret			; NOOUTLINE-NEXT: ret
				;
				; OUTLINE-LABEL: atomic_store_seq_cst:
				; OUTLINE: // %bb.0:
				; OUTLINE-NEXT: .LBB14_1: // %atomicrmw.start
				; OUTLINE-NEXT: // =>This Inner Loop Header: Depth=1
				; OUTLINE-NEXT: ldaxp xzr, x8, [x2]
				efriedma: This bug got fixed, right?
				efriedma: Nevermind, this isn't a bug; got it confused with a different issue.
				; OUTLINE-NEXT: stlxp w8, x0, x1, [x2]
				; OUTLINE-NEXT: cbnz w8, .LBB14_1
				; OUTLINE-NEXT: // %bb.2: // %atomicrmw.end
				; OUTLINE-NEXT: ret
				;
				; LSE-LABEL: atomic_store_seq_cst:
				; LSE: // %bb.0:
				; LSE-NEXT: // kill: def $x1 killed $x1 killed $x0_x1 def $x0_x1
				; LSE-NEXT: ldp x4, x5, [x2]
				; LSE-NEXT: // kill: def $x0 killed $x0 killed $x0_x1 def $x0_x1
				; LSE-NEXT: .LBB14_1: // %atomicrmw.start
				; LSE-NEXT: // =>This Inner Loop Header: Depth=1
				; LSE-NEXT: mov x6, x4
				; LSE-NEXT: mov x7, x5
				; LSE-NEXT: caspal x6, x7, x0, x1, [x2]
				; LSE-NEXT: cmp x7, x5
				; LSE-NEXT: ccmp x6, x4, #0, eq
				; LSE-NEXT: mov x4, x6
				; LSE-NEXT: mov x5, x7
				; LSE-NEXT: b.ne .LBB14_1
				; LSE-NEXT: // %bb.2: // %atomicrmw.end
				; LSE-NEXT: ret
	store atomic i128 %in, i128* %p seq_cst, align 16			store atomic i128 %in, i128* %p seq_cst, align 16
	ret void			ret void
	}			}

	define void @atomic_store_release(i128 %in, i128* %p) {			define void @atomic_store_release(i128 %in, i128* %p) {
	; CHECK-LABEL: atomic_store_release:			; NOOUTLINE-LABEL: atomic_store_release:
	; CHECK: // %bb.0:			; NOOUTLINE: // %bb.0:
	; CHECK-NEXT: .LBB15_1: // %atomicrmw.start			; NOOUTLINE-NEXT: .LBB15_1: // %atomicrmw.start
	; CHECK-NEXT: // =>This Inner Loop Header: Depth=1			; NOOUTLINE-NEXT: // =>This Inner Loop Header: Depth=1
	; CHECK-NEXT: ldxp xzr, x8, [x2]			; NOOUTLINE-NEXT: ldxp xzr, x8, [x2]
	; CHECK-NEXT: stlxp w8, x0, x1, [x2]			; NOOUTLINE-NEXT: stlxp w8, x0, x1, [x2]
	; CHECK-NEXT: cbnz w8, .LBB15_1			; NOOUTLINE-NEXT: cbnz w8, .LBB15_1
	; CHECK-NEXT: // %bb.2: // %atomicrmw.end			; NOOUTLINE-NEXT: // %bb.2: // %atomicrmw.end
	; CHECK-NEXT: ret			; NOOUTLINE-NEXT: ret
				;
				; OUTLINE-LABEL: atomic_store_release:
				; OUTLINE: // %bb.0:
				; OUTLINE-NEXT: .LBB15_1: // %atomicrmw.start
				; OUTLINE-NEXT: // =>This Inner Loop Header: Depth=1
				; OUTLINE-NEXT: ldxp xzr, x8, [x2]
				; OUTLINE-NEXT: stlxp w8, x0, x1, [x2]
				; OUTLINE-NEXT: cbnz w8, .LBB15_1
				; OUTLINE-NEXT: // %bb.2: // %atomicrmw.end
				; OUTLINE-NEXT: ret
				;
				; LSE-LABEL: atomic_store_release:
				; LSE: // %bb.0:
				; LSE-NEXT: // kill: def $x1 killed $x1 killed $x0_x1 def $x0_x1
				; LSE-NEXT: ldp x4, x5, [x2]
				; LSE-NEXT: // kill: def $x0 killed $x0 killed $x0_x1 def $x0_x1
				; LSE-NEXT: .LBB15_1: // %atomicrmw.start
				; LSE-NEXT: // =>This Inner Loop Header: Depth=1
				; LSE-NEXT: mov x6, x4
				; LSE-NEXT: mov x7, x5
				; LSE-NEXT: caspl x6, x7, x0, x1, [x2]
				; LSE-NEXT: cmp x7, x5
				; LSE-NEXT: ccmp x6, x4, #0, eq
				; LSE-NEXT: mov x4, x6
				; LSE-NEXT: mov x5, x7
				; LSE-NEXT: b.ne .LBB15_1
				; LSE-NEXT: // %bb.2: // %atomicrmw.end
				; LSE-NEXT: ret
	store atomic i128 %in, i128* %p release, align 16			store atomic i128 %in, i128* %p release, align 16
	ret void			ret void
	}			}

	define void @atomic_store_relaxed(i128 %in, i128* %p) {			define void @atomic_store_relaxed(i128 %in, i128* %p) {
	; CHECK-LABEL: atomic_store_relaxed:			; NOOUTLINE-LABEL: atomic_store_relaxed:
	; CHECK: // %bb.0:			; NOOUTLINE: // %bb.0:
	; CHECK-NEXT: .LBB16_1: // %atomicrmw.start			; NOOUTLINE-NEXT: .LBB16_1: // %atomicrmw.start
	; CHECK-NEXT: // =>This Inner Loop Header: Depth=1			; NOOUTLINE-NEXT: // =>This Inner Loop Header: Depth=1
	; CHECK-NEXT: ldxp xzr, x8, [x2]			; NOOUTLINE-NEXT: ldxp xzr, x8, [x2]
	; CHECK-NEXT: stxp w8, x0, x1, [x2]			; NOOUTLINE-NEXT: stxp w8, x0, x1, [x2]
	; CHECK-NEXT: cbnz w8, .LBB16_1			; NOOUTLINE-NEXT: cbnz w8, .LBB16_1
	; CHECK-NEXT: // %bb.2: // %atomicrmw.end			; NOOUTLINE-NEXT: // %bb.2: // %atomicrmw.end
	; CHECK-NEXT: ret			; NOOUTLINE-NEXT: ret
				;
				; OUTLINE-LABEL: atomic_store_relaxed:
				; OUTLINE: // %bb.0:
				; OUTLINE-NEXT: .LBB16_1: // %atomicrmw.start
				; OUTLINE-NEXT: // =>This Inner Loop Header: Depth=1
				; OUTLINE-NEXT: ldxp xzr, x8, [x2]
				; OUTLINE-NEXT: stxp w8, x0, x1, [x2]
				; OUTLINE-NEXT: cbnz w8, .LBB16_1
				; OUTLINE-NEXT: // %bb.2: // %atomicrmw.end
				; OUTLINE-NEXT: ret
				;
				; LSE-LABEL: atomic_store_relaxed:
				; LSE: // %bb.0:
				; LSE-NEXT: // kill: def $x1 killed $x1 killed $x0_x1 def $x0_x1
				; LSE-NEXT: ldp x4, x5, [x2]
				; LSE-NEXT: // kill: def $x0 killed $x0 killed $x0_x1 def $x0_x1
				; LSE-NEXT: .LBB16_1: // %atomicrmw.start
				; LSE-NEXT: // =>This Inner Loop Header: Depth=1
				; LSE-NEXT: mov x6, x4
				; LSE-NEXT: mov x7, x5
				; LSE-NEXT: casp x6, x7, x0, x1, [x2]
				; LSE-NEXT: cmp x7, x5
				; LSE-NEXT: ccmp x6, x4, #0, eq
				; LSE-NEXT: mov x4, x6
				; LSE-NEXT: mov x5, x7
				; LSE-NEXT: b.ne .LBB16_1
				; LSE-NEXT: // %bb.2: // %atomicrmw.end
				; LSE-NEXT: ret
	store atomic i128 %in, i128* %p unordered, align 16			store atomic i128 %in, i128* %p unordered, align 16
	ret void			ret void
	}			}

	; Since we store the original value to ensure no tearing for the unsuccessful			; Since we store the original value to ensure no tearing for the unsuccessful
	; case, the register used must not be xzr.			; case, the register used must not be xzr.
	define void @cmpxchg_dead(i128* %ptr, i128 %desired, i128 %new) {			define void @cmpxchg_dead(i128* %ptr, i128 %desired, i128 %new) {
	; NOOUTLINE-LABEL: cmpxchg_dead:			; NOOUTLINE-LABEL: cmpxchg_dead:

llvm/test/CodeGen/AArch64/atomic-ops-lse.ll


	; CHECK: ldeoral x0, x[[NEW:[0-9]+]], [x[[ADDR]]]			; CHECK: ldeoral x0, x[[NEW:[0-9]+]], [x[[ADDR]]]
	; CHECK-NOT: dmb			; CHECK-NOT: dmb
	ret void			ret void
	}			}

	define dso_local i128 @test_atomic_load_i128() nounwind {			define dso_local i128 @test_atomic_load_i128() nounwind {
	; CHECK-LABEL: test_atomic_load_i128:			; CHECK-LABEL: test_atomic_load_i128:
	; CHECK: ldxp			; CHECK: casp
	; CHECK: stxp

	; OUTLINE-ATOMICS-LABEL: test_atomic_load_i128:			; OUTLINE-ATOMICS-LABEL: test_atomic_load_i128:
	; OUTLINE-ATOMICS: ldxp			; OUTLINE-ATOMICS: ldxp
	; OUTLINE-ATOMICS: stxp			; OUTLINE-ATOMICS: stxp
	%pair = load atomic i128, i128* @var128 monotonic, align 16			%pair = load atomic i128, i128* @var128 monotonic, align 16
	ret i128 %pair			ret i128 %pair
	}			}
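
The change above replaces the `ldxp`/`stxp` loop for a 128-bit atomic load with a single `casp` whose compare and swap operands are both zero. A minimal C++ sketch of that idea follows. It assumes a 64-bit `std::atomic` stands in for the 128-bit location, and `load_via_cas` is a hypothetical helper name, not something from the patch.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical helper: read an atomic value using only compare-and-swap.
// The expected and desired values are both 0, so:
//  - if memory holds 0, the CAS succeeds and writes 0 back (no visible change);
//  - otherwise the CAS fails and copies the current value into `expected`.
// Either way `expected` ends up holding the value that was in memory.
inline std::uint64_t load_via_cas(std::atomic<std::uint64_t> &obj) {
  std::uint64_t expected = 0;
  obj.compare_exchange_strong(expected, 0,
                              std::memory_order_relaxed,
                              std::memory_order_relaxed);
  return expected;
}

// Small usage check for the sketch above.
inline void load_via_cas_example() {
  std::atomic<std::uint64_t> value{42};
  assert(load_via_cas(value) == 42);
  assert(value.load() == 42); // the location is left unchanged
}
```

Because a compare-and-swap is a write-capable access, a load expressed this way likely needs the location to be writable, unlike a plain load.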

llvm/test/CodeGen/AArch64/atomicrmw-xchg-fp.ll

	; NOLSE-NEXT: ldr q0, [sp], #32			; NOLSE-NEXT: ldr q0, [sp], #32
	; NOLSE-NEXT: ret			; NOLSE-NEXT: ret
	;			;
	; LSE-LABEL: test_rmw_xchg_f128:			; LSE-LABEL: test_rmw_xchg_f128:
	; LSE: // %bb.0:			; LSE: // %bb.0:
	; LSE-NEXT: sub sp, sp, #32			; LSE-NEXT: sub sp, sp, #32
	; LSE-NEXT: .cfi_def_cfa_offset 32			; LSE-NEXT: .cfi_def_cfa_offset 32
	; LSE-NEXT: str q0, [sp, #16]			; LSE-NEXT: str q0, [sp, #16]
	; LSE-NEXT: ldp x9, x8, [sp, #16]			; LSE-NEXT: ldp x2, x3, [sp, #16]
				; LSE-NEXT: ldp x4, x5, [x0]
	; LSE-NEXT: .LBB3_1: // %atomicrmw.start			; LSE-NEXT: .LBB3_1: // %atomicrmw.start
	; LSE-NEXT: // =>This Inner Loop Header: Depth=1			; LSE-NEXT: // =>This Inner Loop Header: Depth=1
	; LSE-NEXT: ldaxp x11, x10, [x0]			; LSE-NEXT: mov x7, x5
	; LSE-NEXT: stlxp w12, x9, x8, [x0]			; LSE-NEXT: mov x6, x4
	; LSE-NEXT: cbnz w12, .LBB3_1			; LSE-NEXT: mov x5, x7
				; LSE-NEXT: mov x4, x6
				efriedma: These moves seem very strange.
				t.p.northover (author): The first two are part of forming an `xseqregclass` thing from component registers; the second two are because `CASP` clobbers its input but we want to compare against it afterwards. Still not ideal, but not completely out there.
				; LSE-NEXT: caspal x4, x5, x2, x3, [x0]
				; LSE-NEXT: cmp x5, x7
				; LSE-NEXT: ccmp x4, x6, #0, eq
				; LSE-NEXT: b.ne .LBB3_1
	; LSE-NEXT: // %bb.2: // %atomicrmw.end			; LSE-NEXT: // %bb.2: // %atomicrmw.end
	; LSE-NEXT: stp x11, x10, [sp]			; LSE-NEXT: stp x4, x5, [sp]
	; LSE-NEXT: ldr q0, [sp], #32			; LSE-NEXT: ldr q0, [sp], #32
	; LSE-NEXT: ret			; LSE-NEXT: ret
	%res = atomicrmw xchg fp128* %dst, fp128 %new seq_cst			%res = atomicrmw xchg fp128* %dst, fp128 %new seq_cst
	ret fp128 %res			ret fp128 %res
	}			}
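
The inline reply above explains the extra `mov` instructions: `CASP` overwrites its compare operand with the value found in memory, so the loop keeps a copy to compare against afterwards. A rough C++ sketch of that loop shape is below. It assumes a 64-bit `std::atomic` in place of the 128-bit pair, and the names `cas_clobbering` and `exchange_via_cas` are hypothetical, chosen only for illustration.

```cpp
#include <atomic>
#include <cstdint>

// Models the CASP behaviour described above: `cmp` is overwritten with the
// value found in memory, and no success flag is returned to the caller.
inline void cas_clobbering(std::atomic<std::uint64_t> &obj,
                           std::uint64_t &cmp, std::uint64_t desired) {
  obj.compare_exchange_strong(cmp, desired, std::memory_order_seq_cst,
                              std::memory_order_seq_cst);
}

// Atomic exchange built from a compare-and-swap loop.
inline std::uint64_t exchange_via_cas(std::atomic<std::uint64_t> &obj,
                                      std::uint64_t desired) {
  // Initial plain read, corresponding to the `ldp` before the loop.
  std::uint64_t current = obj.load(std::memory_order_relaxed);
  for (;;) {
    // Keep a copy, because the CAS clobbers `current`.
    const std::uint64_t saved = current;
    cas_clobbering(obj, current, desired);
    // The swap happened only if memory still held the saved value.
    if (current == saved)
      return current; // old value, as an exchange returns
    // Otherwise `current` now holds the fresh value; retry with it.
  }
}
```

In portable C++ the explicit comparison is not needed, since `compare_exchange_strong` returns a `bool`; it is written out here only to mirror the `cmp`/`ccmp`/`b.ne` sequence in the generated code.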