This is an archive of the discontinued LLVM Phabricator instance.


Differential D103232

[AtomicExpandPass][AArch64] Promote xchg with floating-point types to integer ones
Closed, Public

Authored by LemonBoy on May 27 2021, 2:36 AM.


Details

Reviewers

efriedma
aemerson
lenary

Commits

rGb577ec495698: [AtomicExpandPass][AArch64] Promote xchg with floating-point types to integer…

Summary

Follow the same strategy used for atomic loads/stores by converting the operands to equally-sized integer types.
This change prevents the atomic expansion pass from generating illegal LL/SC pairs when targeting AArch64: expand-atomicrmw-xchg-fp.ll would previously instantiate intrinsics such as llvm.aarch64.ldaxr.p0f32 that cannot be lowered.
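The promotion described in the summary has a direct source-level analogue. The portable C++ sketch below only illustrates the idea and is not code from the pass. It reinterprets a `float` as a same-sized `std::uint32_t` with `std::memcpy` (playing the role of the IR `bitcast`), performs the atomic exchange on the integer, and converts the previous bits back. The helper name `exchange_float_bits` is hypothetical, and the sketch assumes a 32-bit `float`, which the `static_assert` checks.

```cpp
#include <atomic>
#include <cstdint>
#include <cstring>

// Assumption: float and uint32_t have the same size.
static_assert(sizeof(float) == sizeof(std::uint32_t), "float must be 32 bits");

// Hypothetical helper: atomically swap a float that is stored as its raw bit
// pattern. The exchange itself operates on an equally sized integer type,
// with a "bitcast" before and after it.
float exchange_float_bits(std::atomic<std::uint32_t> &slot, float desired) {
  std::uint32_t new_bits;
  std::memcpy(&new_bits, &desired, sizeof new_bits);   // like bitcast float to i32
  std::uint32_t old_bits = slot.exchange(new_bits, std::memory_order_seq_cst);
  float old_value;
  std::memcpy(&old_value, &old_bits, sizeof old_value); // like bitcast i32 to float
  return old_value;
}
```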

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

LemonBoy created this revision. May 27 2021, 2:36 AM

Herald added subscribers: danielkiss, pengfei, jfb and 2 others. May 27 2021, 2:36 AM

LemonBoy requested review of this revision. May 27 2021, 2:36 AM

Herald added a project: Restricted Project. May 27 2021, 2:36 AM

Herald added a subscriber: llvm-commits.

Harbormaster completed remote builds in B106460: Diff 348193. May 27 2021, 3:14 AM

tmatheson added a subscriber: tmatheson. May 27 2021, 6:31 AM

I don't really like unconditionally messing with the type of the atomic operation; not sure what kind of impact that will have.

Maybe it would make sense to add a new AtomicExpansionKind?

In D103232#2785963, @efriedma wrote:

I don't really like unconditionally messing with the type of the atomic operation; not sure what kind of impact that will have.

Maybe it would make sense to add a new AtomicExpansionKind?

The impact is minimal and, according to the test diff, it's quite positive: it fixes the LL/SC lowering for ARM and AArch64 targets (Hexagon, the other target to make use of this expansion, is already integer-expanding the operands) and, if you consider the lower-to-cmpxchg case, pulling the bitcast out of the CAS loop makes the code slimmer at -O0.
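The cmpxchg point can be illustrated the same way. In this hypothetical C++ sketch (an analogy, not the pass's output), an exchange of a `double` is emulated with a compare-and-swap loop on its 64-bit integer representation. The conversions happen once before and once after the loop, so the loop body only handles integers, which mirrors the shape of the updated X86 `expand-atomic-xchg-fp.ll` checks. `exchange_double_via_cas` is an illustrative name, and a 64-bit `double` is assumed.

```cpp
#include <atomic>
#include <cstdint>
#include <cstring>

// Assumption: double and uint64_t have the same size.
static_assert(sizeof(double) == sizeof(std::uint64_t), "double must be 64 bits");

// Hypothetical helper: emulate an atomic exchange of a double with a CAS loop
// on its 64-bit integer representation.
double exchange_double_via_cas(std::atomic<std::uint64_t> &slot, double desired) {
  // Convert the new value to integer bits once, outside the loop.
  std::uint64_t new_bits;
  std::memcpy(&new_bits, &desired, sizeof new_bits);

  // The loop only moves integers. On failure, compare_exchange_weak writes the
  // currently stored value into `loaded`, much like the phi in the expanded IR.
  std::uint64_t loaded = slot.load(std::memory_order_relaxed);
  while (!slot.compare_exchange_weak(loaded, new_bits,
                                     std::memory_order_seq_cst)) {
  }

  // Convert the previous bits back to double once, after the loop.
  double old_value;
  std::memcpy(&old_value, &loaded, sizeof old_value);
  return old_value;
}
```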

I'd be against adding a new AtomicExpansionKind; the TODO comment about adding a TLI hook to precisely control the expansion serves the same purpose, IMO.

The transform makes sense on targets that don't have atomic operations on floating-point registers, but that isn't all targets. In particular, the GPU targets have floating-point atomic operations, and bitcasting like this might get in the way of the natural lowering there.

If you think it makes sense to use a dedicated lowering hook, rather than extending AtomicExpansionKind, that's fine, I guess.

In D103232#2787468, @efriedma wrote:

The transform makes sense on targets that don't have atomic operations on floating-point registers, but that isn't all targets. In particular, the GPU targets have floating-point atomic operations, and bitcasting like this might get in the way of the natural lowering there.

If you think it makes sense to use a dedicated lowering hook, rather than extending AtomicExpansionKind, that's fine, I guess.

I haven't seen any regressions in the test suite, hence the unimplemented TODO comment. The same bitcasting is applied to atomic loads/stores and compare-exchange ops; if it ever becomes a problem, the hook can easily be implemented at a later stage, IMO.
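The per-target hook discussed in this exchange (letting each target decide whether a floating-point xchg is promoted, so that targets with native floating-point atomics, such as the GPU targets mentioned above, can keep the original type) could take roughly the following shape. This is a self-contained mock, not LLVM's API: `TargetHooks`, `shouldPromoteFPXchgToInt`, `GenericCPUHooks`, `GPULikeHooks`, `wouldPromote` and the `ValueKind` enum are all hypothetical names.

```cpp
// Hypothetical stand-in for the type of an atomicrmw value operand.
enum class ValueKind { Integer, Half, Float, Double, FP128 };

// Hypothetical target hook interface.
struct TargetHooks {
  virtual ~TargetHooks() = default;
  // Default: promote every floating-point xchg to a same-sized integer xchg,
  // which is what the patch does unconditionally.
  virtual bool shouldPromoteFPXchgToInt(ValueKind) const { return true; }
};

// A target without floating-point atomics keeps the default behavior.
struct GenericCPUHooks : TargetHooks {};

// A target with native float/double atomics opts out for those types only.
struct GPULikeHooks : TargetHooks {
  bool shouldPromoteFPXchgToInt(ValueKind K) const override {
    return !(K == ValueKind::Float || K == ValueKind::Double);
  }
};

// Mock of the decision point in the pass: only floating-point operands are
// candidates, and the target gets the final say.
bool wouldPromote(const TargetHooks &TLI, ValueKind ValTy) {
  bool IsFP = ValTy != ValueKind::Integer;
  return IsFP && TLI.shouldPromoteFPXchgToInt(ValTy);
}
```

Keeping the unconditional behavior as the default means existing targets would be unaffected until they override the hook.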

LGTM

I looked, and apparently we do have an AMDGPU test for xchg; if it isn't impacted, that's fine, I guess.

This revision is now accepted and ready to land. May 28 2021, 12:38 PM

This revision was landed with ongoing or failed builds. May 28 2021, 11:57 PM

Closed by commit rGb577ec495698: [AtomicExpandPass][AArch64] Promote xchg with floating-point types to integer… (authored by LemonBoy).

This revision was automatically updated to reflect the committed changes.

LemonBoy added a commit: rGb577ec495698: [AtomicExpandPass][AArch64] Promote xchg with floating-point types to integer….

tkf mentioned this in D124728: Allow pointer types for atomicrmw xchg. May 3 2022, 2:36 AM

Revision Contents

llvm/lib/CodeGen/AtomicExpandPass.cpp (38 lines)
llvm/test/CodeGen/AArch64/atomicrmw-xchg-fp.ll (112 lines)
llvm/test/CodeGen/X86/atomicf128.ll (15 lines)
llvm/test/Transforms/AtomicExpand/AArch64/expand-atomicrmw-xchg-fp.ll (50 lines)
llvm/test/Transforms/AtomicExpand/X86/expand-atomic-xchg-fp.ll (34 lines)

Diff 348618

llvm/lib/CodeGen/AtomicExpandPass.cpp

@@ 72 unchanged lines hidden @@ private:
   IntegerType *getCorrespondingIntegerType(Type *T, const DataLayout &DL);
   LoadInst *convertAtomicLoadToIntegerType(LoadInst *LI);
   bool tryExpandAtomicLoad(LoadInst *LI);
   bool expandAtomicLoadToLL(LoadInst *LI);
   bool expandAtomicLoadToCmpXchg(LoadInst *LI);
   StoreInst *convertAtomicStoreToIntegerType(StoreInst *SI);
   bool expandAtomicStore(StoreInst *SI);
   bool tryExpandAtomicRMW(AtomicRMWInst *AI);
+  AtomicRMWInst *convertAtomicXchgToIntegerType(AtomicRMWInst *RMWI);
   Value *
   insertRMWLLSCLoop(IRBuilder<> &Builder, Type *ResultTy, Value *Addr,
                     Align AddrAlign, AtomicOrdering MemOpOrder,
                     function_ref<Value *(IRBuilder<> &, Value *)> PerformOp);
   void expandAtomicOpToLLSC(
       Instruction *I, Type *ResultTy, Value *Addr, Align AddrAlign,
       AtomicOrdering MemOpOrder,
       function_ref<Value *(IRBuilder<> &, Value *)> PerformOp);
@@ 187 unchanged lines hidden @@ if (LI) {
     // There are two different ways of expanding RMW instructions:
     // - into a load if it is idempotent
     // - into a Cmpxchg/LL-SC loop otherwise
     // we try them in that order.

     if (isIdempotentRMW(RMWI) && simplifyIdempotentRMW(RMWI)) {
       MadeChange = true;
     } else {
+      AtomicRMWInst::BinOp Op = RMWI->getOperation();
+      if (Op == AtomicRMWInst::Xchg &&
+          RMWI->getValOperand()->getType()->isFloatingPointTy()) {
+        // TODO: add a TLI hook to control this so that each target can
+        // convert to lowering the original type one at a time.
+        RMWI = convertAtomicXchgToIntegerType(RMWI);
+        assert(RMWI->getValOperand()->getType()->isIntegerTy() &&
+               "invariant broken");
+        MadeChange = true;
+      }
       unsigned MinCASSize = TLI->getMinCmpXchgSizeInBits() / 8;
       unsigned ValueSize = getAtomicOpSize(RMWI);
-      AtomicRMWInst::BinOp Op = RMWI->getOperation();
       if (ValueSize < MinCASSize &&
           (Op == AtomicRMWInst::Or || Op == AtomicRMWInst::Xor ||
            Op == AtomicRMWInst::And)) {
         RMWI = widenPartwordAtomicRMW(RMWI);
         MadeChange = true;
       }

       MadeChange |= tryExpandAtomicRMW(RMWI);
@@ 63 unchanged lines hidden @@ LoadInst *AtomicExpand::convertAtomicLoadToIntegerType(LoadInst *LI) {
   LLVM_DEBUG(dbgs() << "Replaced " << *LI << " with " << *NewLI << "\n");

   Value *NewVal = Builder.CreateBitCast(NewLI, LI->getType());
   LI->replaceAllUsesWith(NewVal);
   LI->eraseFromParent();
   return NewLI;
 }

+AtomicRMWInst *
+AtomicExpand::convertAtomicXchgToIntegerType(AtomicRMWInst *RMWI) {
+  auto *M = RMWI->getModule();
+  Type *NewTy =
+      getCorrespondingIntegerType(RMWI->getType(), M->getDataLayout());
+
+  IRBuilder<> Builder(RMWI);
+
+  Value *Addr = RMWI->getPointerOperand();
+  Value *Val = RMWI->getValOperand();
+  Type *PT = PointerType::get(NewTy, RMWI->getPointerAddressSpace());
+  Value *NewAddr = Builder.CreateBitCast(Addr, PT);
+  Value *NewVal = Builder.CreateBitCast(Val, NewTy);
+
+  auto *NewRMWI =
+      Builder.CreateAtomicRMW(AtomicRMWInst::Xchg, NewAddr, NewVal,
+                              RMWI->getAlign(), RMWI->getOrdering());
+  NewRMWI->setVolatile(RMWI->isVolatile());
+  LLVM_DEBUG(dbgs() << "Replaced " << *RMWI << " with " << *NewRMWI << "\n");
+
+  Value *NewRVal = Builder.CreateBitCast(NewRMWI, RMWI->getType());
+  RMWI->replaceAllUsesWith(NewRVal);
+  RMWI->eraseFromParent();
+  return NewRMWI;
+}
+
 bool AtomicExpand::tryExpandAtomicLoad(LoadInst *LI) {
   switch (TLI->shouldExpandAtomicLoadInIR(LI)) {
   case TargetLoweringBase::AtomicExpansionKind::None:
     return false;
   case TargetLoweringBase::AtomicExpansionKind::LLSC:
     expandAtomicOpToLLSC(
         LI, LI->getType(), LI->getPointerOperand(), LI->getAlign(),
         LI->getOrdering(),
@@ 1,484 unchanged lines hidden @@

llvm/test/CodeGen/AArch64/atomicrmw-xchg-fp.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --force-update
				; RUN: llc -verify-machineinstrs -mtriple=aarch64-- -O1 -fast-isel=0 -global-isel=false %s -o - | FileCheck %s -check-prefix=NOLSE
				; RUN: llc -verify-machineinstrs -mtriple=aarch64-- -mattr=+lse -O1 -fast-isel=0 -global-isel=false %s -o - | FileCheck %s -check-prefix=LSE

				define half @test_rmw_xchg_f16(half* %dst, half %new) {
				; NOLSE-LABEL: test_rmw_xchg_f16:
				; NOLSE: // %bb.0:
				; NOLSE-NEXT: // kill: def $h0 killed $h0 def $s0
				; NOLSE-NEXT: fmov w8, s0
				; NOLSE-NEXT: .LBB0_1: // %atomicrmw.start
				; NOLSE-NEXT: // =>This Inner Loop Header: Depth=1
				; NOLSE-NEXT: ldaxrh w9, [x0]
				; NOLSE-NEXT: stlxrh w10, w8, [x0]
				; NOLSE-NEXT: cbnz w10, .LBB0_1
				; NOLSE-NEXT: // %bb.2: // %atomicrmw.end
				; NOLSE-NEXT: fmov s0, w9
				; NOLSE-NEXT: // kill: def $h0 killed $h0 killed $s0
				; NOLSE-NEXT: ret
				;
				; LSE-LABEL: test_rmw_xchg_f16:
				; LSE: // %bb.0:
				; LSE-NEXT: // kill: def $h0 killed $h0 def $s0
				; LSE-NEXT: fmov w8, s0
				; LSE-NEXT: swpalh w8, w8, [x0]
				; LSE-NEXT: fmov s0, w8
				; LSE-NEXT: // kill: def $h0 killed $h0 killed $s0
				; LSE-NEXT: ret
				%res = atomicrmw xchg half* %dst, half %new seq_cst
				ret half %res
				}

				define float @test_rmw_xchg_f32(float* %dst, float %new) {
				; NOLSE-LABEL: test_rmw_xchg_f32:
				; NOLSE: // %bb.0:
				; NOLSE-NEXT: fmov w9, s0
				; NOLSE-NEXT: .LBB1_1: // %atomicrmw.start
				; NOLSE-NEXT: // =>This Inner Loop Header: Depth=1
				; NOLSE-NEXT: ldaxr w8, [x0]
				; NOLSE-NEXT: stlxr w10, w9, [x0]
				; NOLSE-NEXT: cbnz w10, .LBB1_1
				; NOLSE-NEXT: // %bb.2: // %atomicrmw.end
				; NOLSE-NEXT: fmov s0, w8
				; NOLSE-NEXT: ret
				;
				; LSE-LABEL: test_rmw_xchg_f32:
				; LSE: // %bb.0:
				; LSE-NEXT: fmov w8, s0
				; LSE-NEXT: swpal w8, w8, [x0]
				; LSE-NEXT: fmov s0, w8
				; LSE-NEXT: ret
				%res = atomicrmw xchg float* %dst, float %new seq_cst
				ret float %res
				}

				define double @test_rmw_xchg_f64(double* %dst, double %new) {
				; NOLSE-LABEL: test_rmw_xchg_f64:
				; NOLSE: // %bb.0:
				; NOLSE-NEXT: fmov x8, d0
				; NOLSE-NEXT: .LBB2_1: // %atomicrmw.start
				; NOLSE-NEXT: // =>This Inner Loop Header: Depth=1
				; NOLSE-NEXT: ldaxr x9, [x0]
				; NOLSE-NEXT: stlxr w10, x8, [x0]
				; NOLSE-NEXT: cbnz w10, .LBB2_1
				; NOLSE-NEXT: // %bb.2: // %atomicrmw.end
				; NOLSE-NEXT: fmov d0, x9
				; NOLSE-NEXT: ret
				;
				; LSE-LABEL: test_rmw_xchg_f64:
				; LSE: // %bb.0:
				; LSE-NEXT: fmov x8, d0
				; LSE-NEXT: swpal x8, x8, [x0]
				; LSE-NEXT: fmov d0, x8
				; LSE-NEXT: ret
				%res = atomicrmw xchg double* %dst, double %new seq_cst
				ret double %res
				}

				define fp128 @test_rmw_xchg_f128(fp128* %dst, fp128 %new) {
				; NOLSE-LABEL: test_rmw_xchg_f128:
				; NOLSE: // %bb.0:
				; NOLSE-NEXT: sub sp, sp, #32 // =32
				; NOLSE-NEXT: .cfi_def_cfa_offset 32
				; NOLSE-NEXT: str q0, [sp, #16]
				; NOLSE-NEXT: ldp x9, x8, [sp, #16]
				; NOLSE-NEXT: .LBB3_1: // %atomicrmw.start
				; NOLSE-NEXT: // =>This Inner Loop Header: Depth=1
				; NOLSE-NEXT: ldaxp x11, x10, [x0]
				; NOLSE-NEXT: stlxp w12, x9, x8, [x0]
				; NOLSE-NEXT: cbnz w12, .LBB3_1
				; NOLSE-NEXT: // %bb.2: // %atomicrmw.end
				; NOLSE-NEXT: stp x11, x10, [sp]
				; NOLSE-NEXT: ldr q0, [sp], #32
				; NOLSE-NEXT: ret
				;
				; LSE-LABEL: test_rmw_xchg_f128:
				; LSE: // %bb.0:
				; LSE-NEXT: sub sp, sp, #32 // =32
				; LSE-NEXT: .cfi_def_cfa_offset 32
				; LSE-NEXT: str q0, [sp, #16]
				; LSE-NEXT: ldp x9, x8, [sp, #16]
				; LSE-NEXT: .LBB3_1: // %atomicrmw.start
				; LSE-NEXT: // =>This Inner Loop Header: Depth=1
				; LSE-NEXT: ldaxp x11, x10, [x0]
				; LSE-NEXT: stlxp w12, x9, x8, [x0]
				; LSE-NEXT: cbnz w12, .LBB3_1
				; LSE-NEXT: // %bb.2: // %atomicrmw.end
				; LSE-NEXT: stp x11, x10, [sp]
				; LSE-NEXT: ldr q0, [sp], #32
				; LSE-NEXT: ret
				%res = atomicrmw xchg fp128* %dst, fp128 %new seq_cst
				ret fp128 %res
				}

llvm/test/CodeGen/X86/atomicf128.ll

 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
 ; RUN: llc < %s -mtriple=x86_64-apple-macosx10.9 -verify-machineinstrs -mattr=cx16 | FileCheck %s
 ; RUN: llc < %s -mtriple=x86_64-apple-macosx10.9 -verify-machineinstrs -mattr=cx16 -mattr=-sse | FileCheck %s --check-prefix=NOSSE

 ; FIXME: This test has a fatal error in 32-bit mode

 @fsc128 = external global fp128

 define void @atomic_fetch_swapf128(fp128 %x) nounwind {
 ; CHECK-LABEL: atomic_fetch_swapf128:
 ; CHECK: ## %bb.0:
 ; CHECK-NEXT: pushq %rbx
+; CHECK-NEXT: movaps %xmm0, -{{[0-9]+}}(%rsp)
+; CHECK-NEXT: movq -{{[0-9]+}}(%rsp), %rbx
+; CHECK-NEXT: movq -{{[0-9]+}}(%rsp), %rcx
 ; CHECK-NEXT: movq _fsc128@{{.*}}(%rip), %rsi
-; CHECK-NEXT: movaps (%rsi), %xmm1
+; CHECK-NEXT: movq (%rsi), %rax
+; CHECK-NEXT: movq 8(%rsi), %rdx
 ; CHECK-NEXT: .p2align 4, 0x90
 ; CHECK-NEXT: LBB0_1: ## %atomicrmw.start
 ; CHECK-NEXT: ## =>This Inner Loop Header: Depth=1
-; CHECK-NEXT: movaps %xmm0, -{{[0-9]+}}(%rsp)
-; CHECK-NEXT: movq -{{[0-9]+}}(%rsp), %rbx
-; CHECK-NEXT: movq -{{[0-9]+}}(%rsp), %rcx
-; CHECK-NEXT: movaps %xmm1, -{{[0-9]+}}(%rsp)
-; CHECK-NEXT: movq -{{[0-9]+}}(%rsp), %rax
-; CHECK-NEXT: movq -{{[0-9]+}}(%rsp), %rdx
 ; CHECK-NEXT: lock cmpxchg16b (%rsi)
-; CHECK-NEXT: movq %rdx, -{{[0-9]+}}(%rsp)
-; CHECK-NEXT: movq %rax, -{{[0-9]+}}(%rsp)
-; CHECK-NEXT: movaps -{{[0-9]+}}(%rsp), %xmm1
 ; CHECK-NEXT: jne LBB0_1
 ; CHECK-NEXT: ## %bb.2: ## %atomicrmw.end
 ; CHECK-NEXT: popq %rbx
 ; CHECK-NEXT: retq
 ;
 ; NOSSE-LABEL: atomic_fetch_swapf128:
 ; NOSSE: ## %bb.0:
 ; NOSSE-NEXT: pushq %rbx
@@ 16 unchanged lines hidden @@

llvm/test/Transforms/AtomicExpand/AArch64/expand-atomicrmw-xchg-fp.ll

 ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
 ; RUN: opt -O1 -S -mtriple=aarch64-- -atomic-expand %s | FileCheck %s
 ; RUN: opt -O1 -S -mtriple=aarch64-- -mattr=+outline-atomics -atomic-expand %s | FileCheck %s --check-prefix=OUTLINE-ATOMICS

 define void @atomic_swap_f16(half* %ptr, half %val) nounwind {
 ; CHECK-LABEL: @atomic_swap_f16(
+; CHECK-NEXT: [[TMP1:%.*]] = bitcast half* [[PTR:%.*]] to i16*
+; CHECK-NEXT: [[TMP2:%.*]] = bitcast half [[VAL:%.*]] to i16
 ; CHECK-NEXT: br label [[ATOMICRMW_START:%.*]]
 ; CHECK: atomicrmw.start:
-; CHECK-NEXT: [[TMP1:%.*]] = call i64 @llvm.aarch64.ldaxr.p0f16(half* [[PTR:%.*]])
-; CHECK-NEXT: [[TMP2:%.*]] = trunc i64 [[TMP1]] to i16
-; CHECK-NEXT: [[TMP3:%.*]] = bitcast i16 [[TMP2]] to half
-; CHECK-NEXT: [[TMP4:%.*]] = bitcast half [[VAL:%.*]] to i16
-; CHECK-NEXT: [[TMP5:%.*]] = zext i16 [[TMP4]] to i64
-; CHECK-NEXT: [[TMP6:%.*]] = call i32 @llvm.aarch64.stxr.p0f16(i64 [[TMP5]], half* [[PTR]])
+; CHECK-NEXT: [[TMP3:%.*]] = call i64 @llvm.aarch64.ldaxr.p0i16(i16* [[TMP1]])
+; CHECK-NEXT: [[TMP4:%.*]] = trunc i64 [[TMP3]] to i16
+; CHECK-NEXT: [[TMP5:%.*]] = zext i16 [[TMP2]] to i64
+; CHECK-NEXT: [[TMP6:%.*]] = call i32 @llvm.aarch64.stxr.p0i16(i64 [[TMP5]], i16* [[TMP1]])
 ; CHECK-NEXT: [[TRYAGAIN:%.*]] = icmp ne i32 [[TMP6]], 0
 ; CHECK-NEXT: br i1 [[TRYAGAIN]], label [[ATOMICRMW_START]], label [[ATOMICRMW_END:%.*]]
 ; CHECK: atomicrmw.end:
+; CHECK-NEXT: [[TMP7:%.*]] = bitcast i16 [[TMP4]] to half
 ; CHECK-NEXT: ret void
 ;
 ; OUTLINE-ATOMICS-LABEL: @atomic_swap_f16(
-; OUTLINE-ATOMICS-NEXT: [[T1:%.*]] = atomicrmw xchg half* [[PTR:%.*]], half [[VAL:%.*]] acquire
+; OUTLINE-ATOMICS-NEXT: [[TMP1:%.*]] = bitcast half* [[PTR:%.*]] to i16*
+; OUTLINE-ATOMICS-NEXT: [[TMP2:%.*]] = bitcast half [[VAL:%.*]] to i16
+; OUTLINE-ATOMICS-NEXT: [[TMP3:%.*]] = atomicrmw xchg i16* [[TMP1]], i16 [[TMP2]] acquire, align 2
+; OUTLINE-ATOMICS-NEXT: [[TMP4:%.*]] = bitcast i16 [[TMP3]] to half
 ; OUTLINE-ATOMICS-NEXT: ret void
 ;
 %t1 = atomicrmw xchg half* %ptr, half %val acquire
 ret void
 }

 define void @atomic_swap_f32(float* %ptr, float %val) nounwind {
 ; CHECK-LABEL: @atomic_swap_f32(
+; CHECK-NEXT: [[TMP1:%.*]] = bitcast float* [[PTR:%.*]] to i32*
+; CHECK-NEXT: [[TMP2:%.*]] = bitcast float [[VAL:%.*]] to i32
 ; CHECK-NEXT: br label [[ATOMICRMW_START:%.*]]
 ; CHECK: atomicrmw.start:
-; CHECK-NEXT: [[TMP1:%.*]] = call i64 @llvm.aarch64.ldaxr.p0f32(float* [[PTR:%.*]])
-; CHECK-NEXT: [[TMP2:%.*]] = trunc i64 [[TMP1]] to i32
-; CHECK-NEXT: [[TMP3:%.*]] = bitcast i32 [[TMP2]] to float
-; CHECK-NEXT: [[TMP4:%.*]] = bitcast float [[VAL:%.*]] to i32
-; CHECK-NEXT: [[TMP5:%.*]] = zext i32 [[TMP4]] to i64
-; CHECK-NEXT: [[TMP6:%.*]] = call i32 @llvm.aarch64.stxr.p0f32(i64 [[TMP5]], float* [[PTR]])
+; CHECK-NEXT: [[TMP3:%.*]] = call i64 @llvm.aarch64.ldaxr.p0i32(i32* [[TMP1]])
+; CHECK-NEXT: [[TMP4:%.*]] = trunc i64 [[TMP3]] to i32
+; CHECK-NEXT: [[TMP5:%.*]] = zext i32 [[TMP2]] to i64
+; CHECK-NEXT: [[TMP6:%.*]] = call i32 @llvm.aarch64.stxr.p0i32(i64 [[TMP5]], i32* [[TMP1]])
 ; CHECK-NEXT: [[TRYAGAIN:%.*]] = icmp ne i32 [[TMP6]], 0
 ; CHECK-NEXT: br i1 [[TRYAGAIN]], label [[ATOMICRMW_START]], label [[ATOMICRMW_END:%.*]]
 ; CHECK: atomicrmw.end:
+; CHECK-NEXT: [[TMP7:%.*]] = bitcast i32 [[TMP4]] to float
 ; CHECK-NEXT: ret void
 ;
 ; OUTLINE-ATOMICS-LABEL: @atomic_swap_f32(
-; OUTLINE-ATOMICS-NEXT: [[T1:%.*]] = atomicrmw xchg float* [[PTR:%.*]], float [[VAL:%.*]] acquire
+; OUTLINE-ATOMICS-NEXT: [[TMP1:%.*]] = bitcast float* [[PTR:%.*]] to i32*
+; OUTLINE-ATOMICS-NEXT: [[TMP2:%.*]] = bitcast float [[VAL:%.*]] to i32
+; OUTLINE-ATOMICS-NEXT: [[TMP3:%.*]] = atomicrmw xchg i32* [[TMP1]], i32 [[TMP2]] acquire, align 4
+; OUTLINE-ATOMICS-NEXT: [[TMP4:%.*]] = bitcast i32 [[TMP3]] to float
 ; OUTLINE-ATOMICS-NEXT: ret void
 ;
 %t1 = atomicrmw xchg float* %ptr, float %val acquire
 ret void
 }

 define void @atomic_swap_f64(double* %ptr, double %val) nounwind {
 ; CHECK-LABEL: @atomic_swap_f64(
+; CHECK-NEXT: [[TMP1:%.*]] = bitcast double* [[PTR:%.*]] to i64*
+; CHECK-NEXT: [[TMP2:%.*]] = bitcast double [[VAL:%.*]] to i64
 ; CHECK-NEXT: br label [[ATOMICRMW_START:%.*]]
 ; CHECK: atomicrmw.start:
-; CHECK-NEXT: [[TMP1:%.*]] = call i64 @llvm.aarch64.ldaxr.p0f64(double* [[PTR:%.*]])
-; CHECK-NEXT: [[TMP2:%.*]] = bitcast i64 [[TMP1]] to double
-; CHECK-NEXT: [[TMP3:%.*]] = bitcast double [[VAL:%.*]] to i64
-; CHECK-NEXT: [[TMP4:%.*]] = call i32 @llvm.aarch64.stxr.p0f64(i64 [[TMP3]], double* [[PTR]])
+; CHECK-NEXT: [[TMP3:%.*]] = call i64 @llvm.aarch64.ldaxr.p0i64(i64* [[TMP1]])
+; CHECK-NEXT: [[TMP4:%.*]] = call i32 @llvm.aarch64.stxr.p0i64(i64 [[TMP2]], i64* [[TMP1]])
 ; CHECK-NEXT: [[TRYAGAIN:%.*]] = icmp ne i32 [[TMP4]], 0
 ; CHECK-NEXT: br i1 [[TRYAGAIN]], label [[ATOMICRMW_START]], label [[ATOMICRMW_END:%.*]]
 ; CHECK: atomicrmw.end:
+; CHECK-NEXT: [[TMP5:%.*]] = bitcast i64 [[TMP3]] to double
 ; CHECK-NEXT: ret void
 ;
 ; OUTLINE-ATOMICS-LABEL: @atomic_swap_f64(
-; OUTLINE-ATOMICS-NEXT: [[T1:%.*]] = atomicrmw xchg double* [[PTR:%.*]], double [[VAL:%.*]] acquire
+; OUTLINE-ATOMICS-NEXT: [[TMP1:%.*]] = bitcast double* [[PTR:%.*]] to i64*
+; OUTLINE-ATOMICS-NEXT: [[TMP2:%.*]] = bitcast double [[VAL:%.*]] to i64
+; OUTLINE-ATOMICS-NEXT: [[TMP3:%.*]] = atomicrmw xchg i64* [[TMP1]], i64 [[TMP2]] acquire, align 8
+; OUTLINE-ATOMICS-NEXT: [[TMP4:%.*]] = bitcast i64 [[TMP3]] to double
 ; OUTLINE-ATOMICS-NEXT: ret void
 ;
 %t1 = atomicrmw xchg double* %ptr, double %val acquire
 ret void
 }

llvm/test/Transforms/AtomicExpand/X86/expand-atomic-xchg-fp.ll

 ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
 ; RUN: opt -S -mtriple=i686-linux-gnu -atomic-expand %s | FileCheck %s

 define double @atomic_xchg_f64(double* %ptr) nounwind {
 ; CHECK-LABEL: @atomic_xchg_f64(
-; CHECK-NEXT: [[TMP1:%.*]] = load double, double* [[PTR:%.*]], align 8
+; CHECK-NEXT: [[TMP1:%.*]] = bitcast double* [[PTR:%.*]] to i64*
+; CHECK-NEXT: [[TMP2:%.*]] = load i64, i64* [[TMP1]], align 8
 ; CHECK-NEXT: br label [[ATOMICRMW_START:%.*]]
 ; CHECK: atomicrmw.start:
-; CHECK-NEXT: [[LOADED:%.*]] = phi double [ [[TMP1]], [[TMP0:%.*]] ], [ [[TMP5:%.*]], [[ATOMICRMW_START]] ]
-; CHECK-NEXT: [[TMP2:%.*]] = bitcast double* [[PTR]] to i64*
-; CHECK-NEXT: [[TMP3:%.*]] = bitcast double [[LOADED]] to i64
-; CHECK-NEXT: [[TMP4:%.*]] = cmpxchg i64* [[TMP2]], i64 [[TMP3]], i64 4616189618054758400 seq_cst seq_cst
-; CHECK-NEXT: [[SUCCESS:%.*]] = extractvalue { i64, i1 } [[TMP4]], 1
-; CHECK-NEXT: [[NEWLOADED:%.*]] = extractvalue { i64, i1 } [[TMP4]], 0
-; CHECK-NEXT: [[TMP5]] = bitcast i64 [[NEWLOADED]] to double
+; CHECK-NEXT: [[LOADED:%.*]] = phi i64 [ [[TMP2]], [[TMP0:%.*]] ], [ [[NEWLOADED:%.*]], [[ATOMICRMW_START]] ]
+; CHECK-NEXT: [[TMP3:%.*]] = cmpxchg i64* [[TMP1]], i64 [[LOADED]], i64 4616189618054758400 seq_cst seq_cst, align 8
+; CHECK-NEXT: [[SUCCESS:%.*]] = extractvalue { i64, i1 } [[TMP3]], 1
+; CHECK-NEXT: [[NEWLOADED]] = extractvalue { i64, i1 } [[TMP3]], 0
 ; CHECK-NEXT: br i1 [[SUCCESS]], label [[ATOMICRMW_END:%.*]], label [[ATOMICRMW_START]]
 ; CHECK: atomicrmw.end:
-; CHECK-NEXT: ret double [[TMP5]]
+; CHECK-NEXT: [[TMP4:%.*]] = bitcast i64 [[NEWLOADED]] to double
+; CHECK-NEXT: ret double [[TMP4]]
 ;
 %result = atomicrmw xchg double* %ptr, double 4.0 seq_cst
 ret double %result
 }

 define double @atomic_xchg_f64_as1(double addrspace(1)* %ptr) nounwind {
 ; CHECK-LABEL: @atomic_xchg_f64_as1(
-; CHECK-NEXT: [[TMP1:%.*]] = load double, double addrspace(1)* [[PTR:%.*]], align 8
+; CHECK-NEXT: [[TMP1:%.*]] = bitcast double addrspace(1)* [[PTR:%.*]] to i64 addrspace(1)*
+; CHECK-NEXT: [[TMP2:%.*]] = load i64, i64 addrspace(1)* [[TMP1]], align 8
 ; CHECK-NEXT: br label [[ATOMICRMW_START:%.*]]
 ; CHECK: atomicrmw.start:
-; CHECK-NEXT: [[LOADED:%.*]] = phi double [ [[TMP1]], [[TMP0:%.*]] ], [ [[TMP5:%.*]], [[ATOMICRMW_START]] ]
-; CHECK-NEXT: [[TMP2:%.*]] = bitcast double addrspace(1)* [[PTR]] to i64 addrspace(1)*
-; CHECK-NEXT: [[TMP3:%.*]] = bitcast double [[LOADED]] to i64
-; CHECK-NEXT: [[TMP4:%.*]] = cmpxchg i64 addrspace(1)* [[TMP2]], i64 [[TMP3]], i64 4616189618054758400 seq_cst seq_cst
-; CHECK-NEXT: [[SUCCESS:%.*]] = extractvalue { i64, i1 } [[TMP4]], 1
-; CHECK-NEXT: [[NEWLOADED:%.*]] = extractvalue { i64, i1 } [[TMP4]], 0
-; CHECK-NEXT: [[TMP5]] = bitcast i64 [[NEWLOADED]] to double
+; CHECK-NEXT: [[LOADED:%.*]] = phi i64 [ [[TMP2]], [[TMP0:%.*]] ], [ [[NEWLOADED:%.*]], [[ATOMICRMW_START]] ]
+; CHECK-NEXT: [[TMP3:%.*]] = cmpxchg i64 addrspace(1)* [[TMP1]], i64 [[LOADED]], i64 4616189618054758400 seq_cst seq_cst, align 8
+; CHECK-NEXT: [[SUCCESS:%.*]] = extractvalue { i64, i1 } [[TMP3]], 1
+; CHECK-NEXT: [[NEWLOADED]] = extractvalue { i64, i1 } [[TMP3]], 0
 ; CHECK-NEXT: br i1 [[SUCCESS]], label [[ATOMICRMW_END:%.*]], label [[ATOMICRMW_START]]
 ; CHECK: atomicrmw.end:
-; CHECK-NEXT: ret double [[TMP5]]
+; CHECK-NEXT: [[TMP4:%.*]] = bitcast i64 [[NEWLOADED]] to double
+; CHECK-NEXT: ret double [[TMP4]]
 ;
 %result = atomicrmw xchg double addrspace(1)* %ptr, double 4.0 seq_cst
 ret double %result
 }
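The integer operand `4616189618054758400` in the cmpxchg checks (both before and after the change) is the IEEE-754 binary64 bit pattern of the `double 4.0` being exchanged, `0x4010000000000000`: sign 0, biased exponent 1025 (`0x401`), zero mantissa. A quick standalone check in C++, assuming `double` is IEEE-754 binary64; `double_bits` is an illustrative helper name:

```cpp
#include <cstdint>
#include <cstring>

// Return the raw bit pattern of a double as a 64-bit integer, i.e. the value a
// `bitcast double ... to i64` would produce in LLVM IR.
std::uint64_t double_bits(double d) {
  static_assert(sizeof(double) == sizeof(std::uint64_t), "double must be 64 bits");
  std::uint64_t bits;
  std::memcpy(&bits, &d, sizeof bits);
  return bits;
}
```

For example, `double_bits(4.0)` yields `0x4010000000000000`, which is `4616189618054758400` in decimal.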