PNaCl needs this so that optimization/TargetMachine passes don't reorder the initial load when translating to a native target. This also matches what the function comment specifies.
Event Timeline
This isn't just for PNaCl, so the description should be updated. You should mention something along the lines of:
On weak memory systems the CPU can speculate on subsequent loads (e.g. the cmpxchg) and observe them without honoring the happens-before ordering of the corresponding stores. This is the "load buffering" problem in the literature, and it occurs on ARM and POWER.
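For reference, a minimal litmus-style sketch of the load-buffering pattern being referred to (an illustration with made-up thread functions, not code from the patch):

#include <atomic>
#include <thread>

// Classic LB litmus test: each thread loads one location, then stores to the
// other. With relaxed atomics the outcome r1 == 1 && r2 == 1 is allowed, and
// it is observable on ARM/POWER because each CPU may let its load drift past
// the subsequent store to a different address.
std::atomic<int> x{0}, y{0};
int r1 = 0, r2 = 0;

void t1() {
  r1 = x.load(std::memory_order_relaxed);
  y.store(1, std::memory_order_relaxed);
}

void t2() {
  r2 = y.load(std::memory_order_relaxed);
  x.store(1, std::memory_order_relaxed);
}

int main() {
  std::thread a(t1), b(t2);
  a.join();
  b.join();
  return 0;
}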
lib/CodeGen/AtomicExpandPass.cpp:556
I think this should be an unconditional InitLoaded->setOrdering(AtomicCmpXchgInst::getStrongestFailureOrdering(MemOpOrder)); and there shouldn't be an IsRelaxed parameter. This will give the leading load a memory order that's valid for loads, and corresponds to the ordering the RMW instruction had.
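For what it's worth, a hedged sketch of what that suggestion could look like at the point where the expansion emits the initial load (Builder, Addr, MemOpOrder, and the value name are the usual conventions in this area of the pass; treat this as an illustration rather than the exact patch):

LoadInst *InitLoaded = Builder.CreateLoad(Addr, "init.loaded");
// Always make the leading load atomic; getStrongestFailureOrdering maps the
// RMW's ordering to the strongest ordering that is legal for a load
// (e.g. acq_rel becomes acquire, seq_cst stays seq_cst).
InitLoaded->setOrdering(
    AtomicCmpXchgInst::getStrongestFailureOrdering(MemOpOrder));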
Unconditionally make the initial load atomic and fix the resulting X86 test failures.
test/Transforms/AtomicExpand/X86/expand-atomic-rmw-initial-load.ll:7
Could you also test the other memory orderings in separate test functions? I mainly want to ensure that an acq_rel RMW operation gets a load acquire (the others should have the same ordering).
test/Transforms/AtomicExpand/X86/expand-atomic-rmw-initial-load.ll:7
Also relaxed, acquire, release (which should stay as-is).
test/Transforms/AtomicExpand/X86/expand-atomic-rmw-initial-load.ll:7
Sorry, I didn't realize you had asked for the others until after I checked Done. Also, s/relaxed/monotonic/.
I don't completely understand your argument here. The load buffering pattern (LB) occurs when there is a load from one location followed by a store to a different location (and I agree that in that case the store can be executed before the load on ARM/Power). Here there is a load followed by a cmpxchg to the same location. No cache-coherent processor I know of can do anything tricky in such a case (accesses to the same location are always globally ordered, in a way compatible with program order). The only thing I could see happening is the load returning stale data, which would only cause the cmpxchg to fail and lead to an extra iteration of the loop. Am I missing something (likely, considering it is 1 am)?
So I agree that this deserves a comment explaining why it is safe, but I still believe the original code is safe, and I would rather avoid the extra cmpxchg8b/16b your change introduces unless they are necessary.
Extra remark: if the risk is indeed an LB pattern, then x86 is immune to it anyway, which makes me doubt the utility of these tests. Although in that case I don't have any better ones to suggest, since x86 is the only backend that exercises this function.
After thinking some more about it, I realized that having the load stay non-atomic is actually unsound because there might be a race, which would make it undefined behaviour.
Sadly, making it monotonic would still introduce these redundant cmpxchg8b/16b since monotonic requires the access to be atomic.
I don't know the semantics of unordered accesses well enough to tell whether they would do the trick, and the LLVM reference manual's description of them is fairly light on details.
@reames: do you know if unordered has the right strength to allow races without forcing the backend to use cmpxchg8b/16b instead of two movs? If not, do you have a better suggestion?
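To make the cost concrete, here is a small standalone illustration (not from the patch) of why an atomic 128-bit load is unattractive on x86-64: depending on flags such as -mcx16, it is typically lowered to a cmpxchg16b-based sequence or a libatomic call rather than two plain 64-bit movs.

#include <atomic>

// Even a relaxed atomic load of a 16-byte value cannot be done as two
// ordinary movs if it must be a single atomic access.
__int128 load128(std::atomic<__int128> &v) {
  return v.load(std::memory_order_relaxed);
}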
Thanks for chiming in. Maybe @jyasskin has an opinion from an LLVM memory model point of view.
From http://llvm.org/docs/Atomics.html#unordered, "note that an unordered load or store cannot be split into multiple instructions."
You can't use a normal access because that allows a later pass to change it to:
%1 = load %ptr            ; the original initial load
%2 = op %1                ; new value computed from %1
%oops = load %ptr         ; a second load that a later pass may legally introduce
cmpxchg %ptr, %oops, %2   ; the expected value now comes from %oops rather than %1,
                          ; so the cmpxchg can succeed with a new value computed
                          ; from a different observation of %ptr
Bleh. With the current instructions, I don't think there's a good answer here.
I talked to @jyasskin in person: signal fence would work but isn't quite what we want. We would want a new type of atomic that participates in ordering, but is allowed to tear. This doesn't exist, but we can emulate it by:
1. Change the pass to be parameterized with the "largest efficient atomic operation for this target". This is effectively the widest is_lock_free value.
2. Pass this number when initializing the target's pass list.
3. Have the pass explicitly tear too-wide loads into sub-loads of the size from step 1, and reassemble the value from those.
This would manually create the tearing (tests would be the same as before) but the load would be atomic.
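To make step 3 more concrete, here is a rough, hypothetical IRBuilder-style sketch of tearing a 128-bit load into two 64-bit atomic loads and reassembling the value (names like Builder, Addr, WideTy, and the ordering choice are illustrative only):

// Sketch only: assumes WideTy is i128, Addr points at it, and the target's
// largest efficient atomic width is 64 bits.
Type *HalfTy = Type::getInt64Ty(Ctx);
Value *HalfPtr = Builder.CreateBitCast(Addr, HalfTy->getPointerTo());
LoadInst *Lo = Builder.CreateLoad(HalfPtr, "lo");
LoadInst *Hi = Builder.CreateLoad(Builder.CreateConstGEP1_32(HalfPtr, 1), "hi");
// Each piece is atomic on its own; the ordering here is a placeholder, the
// real choice would follow the RMW's ordering as discussed above.
Lo->setAtomic(AtomicOrdering::Monotonic);
Hi->setAtomic(AtomicOrdering::Monotonic);
// Reassemble the wide value (little-endian layout assumed for the illustration).
Value *Wide = Builder.CreateZExt(Lo, WideTy);
Value *HiPart = Builder.CreateShl(Builder.CreateZExt(Hi, WideTy),
                                  ConstantInt::get(WideTy, 64));
Value *InitLoaded = Builder.CreateOr(Wide, HiPart, "init.loaded");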
@morisset, does that sound good?
It sounds great to me (although a lot of work for such a corner case, but I don't see any better solution).
@jyasskin I took a look at implementing what we discussed, and it's quite a bit uglier than I thought it would be:
- We want the same value that factors into ATOMIC_*_LOCK_FREE, which is MaxAtomicInlineWidth from tools/clang/lib/Basic/Targets.cpp (each Target should define a value there, or it gets 0 by default).
- We can pass this to the backend through LLVM's TargetOptions, which clang's BackendUtil.cpp can initialize.
- AtomicExpandPass has a TargetMachine, which has TargetOptions.
This will work for code that comes straight from C++, but won't work for code that only uses opt: clang has the information we need, and LLVM doesn't. This is pretty much something that should be in the DataLayout instead, which is a *very* involved change! WDYT? Am I missing something obvious?
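To spell out the plumbing, a purely hypothetical sketch (the MaxAtomicInlineWidth field on TargetOptions does not exist; it is exactly the piece being proposed, and ValTy/DL stand in for the value type and DataLayout in the pass):

// In clang's BackendUtil.cpp (hypothetical): forward the value clang already
// computes per target into the llvm::TargetOptions used to build the
// TargetMachine.
Options.MaxAtomicInlineWidth = Target.getMaxAtomicInlineWidth();

// In AtomicExpandPass (hypothetical): read it back via the TargetMachine.
unsigned MaxAtomicBits = TM->Options.MaxAtomicInlineWidth;
if (DL.getTypeStoreSizeInBits(ValTy) > MaxAtomicBits) {
  // tear the initial load into MaxAtomicBits-sized atomic pieces, as above
}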
Richard and I don't think you want that value. On x86-64, the maximum width here is actually 128 bits, but you want to split loads down to 64 bits.
> - We can pass this to the backend through LLVM's TargetOptions, which clang's BackendUtil.cpp can initialize.
> - AtomicExpandPass has a TargetMachine, which has TargetOptions.
> This will work for code that comes straight from C++, but won't work for code that only uses opt: clang has the information we need, and LLVM doesn't. This is pretty much something that should be in the DataLayout instead, which is a *very* involved change! WDYT? Am I missing something obvious?
You're probably right, but I don't know enough about the backends to be sure. It does seem like the backend should know the widths of atomics it supports.
It seems to me that this is an x86-64-only issue, in that only x86-64 wants to split >64-bit atomic loads in two. Perhaps we could add a load attribute, e.g. tearable, that would signal to the backend that this particular load can be torn without violating frontend assumptions? The ability to tear this initial load seems to me to be a property of this particular transform, and not of general IR (obviously). WDYT?
Given this: https://blog.chromium.org/2017/05/goodbye-pnacl-hello-webassembly.html
I think this patch can be closed as "won't fix".