This is an archive of the discontinued LLVM Phabricator instance.

[Trivial Dead] Consider any non volatile load as trivially dead independent on ordering
Needs Review · Public

Authored by skatkov on Apr 22 2022, 4:23 AM.

Details

Reviewers

reames
fhahn
nikic
rampitec
efriedma
timshen
t-tye

Summary

A non-volatile load can be considered trivially dead regardless of its atomic
ordering. This is based on the fact that release-acquire synchronization
happens only if the load reads the value written by a store-release operation.
If the load instruction has no uses, nobody can check which value was
actually read. So the optimizer may assume that the load read the value from
before the store-release happened, and therefore that no synchronization
happened. This allows us to simply remove the load.

Event Timeline

skatkov created this revision. Apr 22 2022, 4:23 AM

Herald added a project: Restricted Project. Apr 22 2022, 4:23 AM

Herald added subscribers: kerbowa, hiraditya, jvesely, nemanjai.

skatkov requested review of this revision. Apr 22 2022, 4:23 AM

Harbormaster completed remote builds in B160821: Diff 424429. Apr 22 2022, 4:59 AM

nikic mentioned this in D124241: [Local] Consider atomic loads from constant global as dead. Apr 22 2022, 5:24 AM

lkail added a subscriber: lkail. Apr 22 2022, 5:27 AM

rampitec added a reviewer: t-tye. Apr 22 2022, 10:30 AM

This isn't trivially correct.

For example, say you have a sequence "%load1 = load seq_cst %a; %load2 = load relaxed %a". I think the later load can sort of "inherit" the sequential consistency of the earlier load; the values it can contain are restricted. (I mean, maybe I'm missing something, but it's not as simple as "no one will check what actually value has been read".)

In D124247#3468179, @efriedma wrote:

This isn't trivially correct.

For example, say you have a sequence "%load1 = load seq_cst %a; %load2 = load relaxed %a". I think the later load can sort of "inherit" the sequential consistency of the earlier load; the values it can contain are restricted. (I mean, maybe I'm missing something, but it's not as simple as "no one will check what actually value has been read".)

I'm sorry, I do not follow your example. First of all, reading the same memory both atomically and non-atomically is generally not a good idea, however...

If in a real execution the order is load1, then a store-release in another thread, then load2, then load2 does not introduce any release-acquire synchronization. So it would be incorrect to talk about any inheritance.
Maybe I am missing something as well...

I think I wrote it backwards. Say you have something like the following:

// thread 1
store seq_cst 1, %a

// thread 2
%load1 = load monotonic %a
%load2 = load seq_cst %a
if (%load1 == 1) {
  // If %load1 produces 1, %load2 must also produce 1.
  // So %load2 acts as an acquire barrier.
}

In this example, I think it isn't legal to erase %load2.
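For readers more used to C++ atomics, the pattern above can be sketched roughly as follows. This is only an illustration, assuming LLVM's monotonic ordering corresponds to std::memory_order_relaxed; the names Shared, payload, writer and reader are hypothetical and do not come from the patch or the thread.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical shared state: `a` plays the role of %a, `payload` is ordinary
// (non-atomic) data that the writer publishes before setting `a`.
struct Shared {
  int payload = 0;
  std::atomic<int> a{0};
};

// Thread 1: write the payload, then "store seq_cst 1, %a".
void writer(Shared &s) {
  s.payload = 42;
  s.a.store(1, std::memory_order_seq_cst);
}

// Thread 2: returns true and fills `out` if the flag was observed.
bool reader(Shared &s, int &out) {
  int load1 = s.a.load(std::memory_order_relaxed);  // %load1 (monotonic)
  int load2 = s.a.load(std::memory_order_seq_cst);  // %load2, result unused
  (void)load2;
  if (load1 == 1) {
    // Coherence forces load2 to observe 1 as well, so load2 synchronizes
    // with the writer's store and this plain read of payload is race-free.
    // If load2 were deleted, this read would be a data race.
    out = s.payload;
    return true;
  }
  return false;
}
```

In a real program writer and reader would run on separate threads; called one after the other on a single thread they simply exercise both paths of the reader.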

Hi Eli, nice example. Do I understand correctly that here is the problem with exactly using of monotonic? If we used unordered (aka relaxed) then load2 can be safely removed?

The bad thing here is that no analysis will help me. Consider the function:
f(int x) {
  load acq a
  if (x == 42) {
    // Synchronization may happen here: the incoming parameter x might be
    // the result of a monotonic load of a.
  }
}
So generally we should check all callers for this pattern which is likely not a good idea.

However if we could say that our language/runtime does not use monotonic atomic instructions (or do not use them for pointers from some addrspace or say for gc pointers) we are safe to remove such load?
Or do you have some other example in your pocket? :)

Do I understand correctly that here is the problem with exactly using of monotonic? If we used unordered (aka relaxed) then load2 can be safely removed?

In that example, yes. If the first load isn't atomic, you'd have a race, which has undefined behavior. If the first load is seq_cst, the second load is redundant, so it can be removed.

So generally we should check all callers for this pattern which is likely not a good idea.

Right; it's very hard to detect cases where we can actually prove the safety.

However if we could say that our language/runtime does not use monotonic atomic instructions (or do not use them for pointers from some addrspace or say for gc pointers) we are safe to remove such load?

If all memory ops in the program are seq_cst, it ends up being okay: every operation in the program is globally sequenced anyway, so a "dead" atomic op can't impose any additional ordering restrictions. If you mix in non-atomic ops, it's more difficult to reason about, but I think you end up with a non-deterministic race in any other case where the "dead" atomic might matter.

I don't want to think about what happens if you mix in monotonic ops in other address-spaces.

Thanks for your comment. Let me add details about my reasoning.
Say we have a load atomic %a with acquire or seq_cst ordering, and this load has no uses.
According to the specification, this load has ordering semantics only if it observes a value stored with release semantics to the same memory location.
If there are no uses, then an explicit check of the loaded value is impossible. But there can be an implicit check, as in the example you mentioned.
To do this implicit check, we would have to load a value before our load and check it. If our load must observe the same value as the previous load, the previously loaded value can be used for such a check.
Side note: let's assume there is only one store-release, to avoid mentioning "observes the release store or a later store" all the time.

So what can this previous load be?

Non-atomic load - not interesting, because the specification declares that an atomic release store racing with a non-atomic load is undefined behavior.
Atomic seq_cst or acquire load - if the previous load already observes the store-release, our load is not required, because synchronization has already happened, and our load can simply be removed. If the previous load does not observe the store-release, then it does not help with the implicit check.
Atomic monotonic load - according to your example, definitely a problem. If there is a monotonic load before our load, a check of its value implicitly checks our load, and we cannot eliminate our load.
Atomic unordered load - generally our acquire load does not prohibit moving an unordered load after our load (certainly for a different memory location, and I guess for the same location as well), so it does not help with an implicit check of our load.

So it looks like only a monotonic load before our load can be used for an implicit check.
If that is true, then we can probably add the following check to the compiler:

If a monotonic atomic load cannot be used to access the memory location used by our load, then we can remove our load.

It can be done on different levels:

Add a global flag saying that this compiler does not use monotonic atomics.
Say that for a given addrspace the compiler does not use monotonic atomics.
Say that for the current GC, GC pointers cannot be accessed with monotonic atomics.

In all cases the verifier should be updated to ensure that monotonic loads are really not used for this type of pointer.

This is an idea. Am I missing anything?
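The proposed rule can be modelled as a small standalone predicate. This is a sketch only, not LLVM code: the Ordering enum, the DeadLoadQuery struct and the monotonicAccessExcluded field are hypothetical stand-ins for whatever flag, address-space property or GC strategy would carry the guarantee. Treating a dead monotonic load as non-removable is an extra conservative assumption that the comment above does not address.

```cpp
#include <cassert>

// Hypothetical model of load orderings, weakest to strongest.
enum class Ordering { NotAtomic, Unordered, Monotonic, Acquire, SeqCst };

// Hypothetical description of the load being considered for removal.
struct DeadLoadQuery {
  bool isVolatile;
  bool hasUses;
  Ordering ordering;
  // Assumption: some flag, address space or GC strategy guarantees that the
  // loaded location is never accessed by a monotonic atomic load.
  bool monotonicAccessExcluded;
};

// Returns true if the load may be deleted under the rule proposed above.
bool canRemoveLoad(const DeadLoadQuery &q) {
  if (q.isVolatile || q.hasUses)
    return false;
  switch (q.ordering) {
  case Ordering::NotAtomic:
  case Ordering::Unordered:
    return true;  // no ordering constraints to preserve
  case Ordering::Monotonic:
    return false; // not covered by the proposal; kept conservatively
  case Ordering::Acquire:
  case Ordering::SeqCst:
    // Removable only if no monotonic load can implicitly check the value.
    return q.monotonicAccessExcluded;
  }
  return false;
}
```

Keeping the guarantee as an explicit input makes it visible that the decision depends on a whole-program property rather than on the load alone.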

kpn added a subscriber: kpn. Apr 26 2022, 6:41 AM

The part of that analysis I'm most unsure about is the handling of "acquire" loads. Overall sequential consistency goes beyond just sequencing between "release" stores and "acquire" loads.

Generally, I'd rather not go significantly outside the transforms that have been formally proven; I don't trust my intuition here.

Sequential consistency adds total order of all seq_cst operations to release-acquire semantic. However If load is not used its total order does not matter, so only acquire semantic plays role here.

But I understand your point.
All this stuff related to synchronization is complicated. The worst thing is that in this area it is easier to reject something with an example than to prove that such an example is not possible.

Don't you think that adding the implementation above under the flag is more or less safe if it is off by default and we can consider it switching on some day in future. At least we can try to play with it...
I mean, I'd like to add it only if there is no visible flaw in my analysis.

And thanks for the discussion anyway - the topic is pretty complex, and discussing it with someone is really helpful.

Sequential consistency adds total order of all seq_cst operations to release-acquire semantic. However If load is not used its total order does not matter, so only acquire semantic plays role here.

I'm not sure this is right, but I don't really have time to dig into it. (Maybe you could end up in a situation where the result of an "acquire" load contradicts the total order, even if the seq_cst load is not used.)

Don't you think that adding the implementation above under the flag is more or less safe if it is off by default and we can consider it switching on some day in future. At least we can try to play with it...

Under your proposal, we'd need some way to specify that some particular set of loads is safe to optimize more aggressively. Given that existing IR wouldn't have those markings, there's not really any point to turning off the feature by default?

More generally, if C++11 atomics aren't suitable for your use-case, it would make sense to look at other possibilities. If you're interested in pursuing this, please start a thread on Discourse.

Revision Contents

Path                                             Size
llvm/lib/Transforms/Utils/Local.cpp              10 lines
llvm/test/CodeGen/AMDGPU/noclobber-barrier.ll    6 lines
llvm/test/CodeGen/PowerPC/atomics-constant.ll    4 lines
llvm/test/Transforms/EarlyCSE/atomics.ll         19 lines
llvm/test/Transforms/EarlyCSE/basic.ll           15 lines
llvm/test/Transforms/InstCombine/atomic.ll       2 lines
llvm/test/Transforms/InstCombine/store.ll        1 line

Diff 424429

llvm/lib/Transforms/Utils/Local.cpp

   if (DbgLabelInst *DLI = dyn_cast<DbgLabelInst>(I)) {
     if (DLI->getLabel())
       return false;
     return true;
   }

   if (!I->willReturn())
     return false;

+  // None volatile load can be considered as trivial dead independent on atomic
+  // semantic. This is based on the fact that release-acquire synchronization
+  // happens only if load reads the value written by store-release operation.
+  // As soone as load instruction does not have uses, so no one will check what
+  // actually value has been read. So optimizer may suggest that load reads the
+  // value was before store release happened and so no synchronization happened.
+  // This allows us simply to remove this load.
+  if (LoadInst *LI = dyn_cast<LoadInst>(I))
+    return !LI->isVolatile();
+
   if (!I->mayHaveSideEffects())
     return true;

   // Special case intrinsics that "may have side effects" but can be deleted
   // when dead.
   if (IntrinsicInst *II = dyn_cast<IntrinsicInst>(I)) {
     // Safe to delete llvm.stacksave and launder.invariant.group if dead.
     if (II->getIntrinsicID() == Intrinsic::stacksave ||

llvm/test/CodeGen/AMDGPU/noclobber-barrier.ll

 define amdgpu_kernel void @clobber_by_atomic_load(i32 addrspace(1)* %arg) {
 ; CHECK-LABEL: @clobber_by_atomic_load(
 ; CHECK-NEXT: bb:
 ; CHECK-NEXT: [[I:%.*]] = load i32, i32 addrspace(1)* [[ARG:%.*]], align 4, !amdgpu.noclobber !0
 ; CHECK-NEXT: [[GEP:%.*]] = getelementptr inbounds i32, i32 addrspace(1)* [[ARG]], i64 2, !amdgpu.uniform !0
 ; CHECK-NEXT: [[VAL:%.*]] = load atomic i32, i32 addrspace(1)* [[GEP]] seq_cst, align 4, !amdgpu.noclobber !0
 ; CHECK-NEXT: [[I1:%.*]] = getelementptr inbounds i32, i32 addrspace(1)* [[ARG]], i64 3, !amdgpu.uniform !0
 ; CHECK-NEXT: [[I2:%.*]] = load i32, i32 addrspace(1)* [[I1]], align 4
-; CHECK-NEXT: [[I3:%.*]] = add i32 [[I2]], [[I]]
+; CHECK-NEXT: [[I3_1:%.*]] = add i32 [[I2]], [[I]]
+; CHECK-NEXT: [[I3:%.*]] = add i32 [[I3_1]], [[VAL]]
 ; CHECK-NEXT: [[I4:%.*]] = getelementptr inbounds i32, i32 addrspace(1)* [[ARG]], i64 4
 ; CHECK-NEXT: store i32 [[I3]], i32 addrspace(1)* [[I4]], align 4
 ; CHECK-NEXT: ret void
 ;
 bb:
   %i = load i32, i32 addrspace(1)* %arg, align 4
   %gep = getelementptr inbounds i32, i32 addrspace(1)* %arg, i64 2
   %val = load atomic i32, i32 addrspace(1)* %gep seq_cst, align 4
   %i1 = getelementptr inbounds i32, i32 addrspace(1)* %arg, i64 3
   %i2 = load i32, i32 addrspace(1)* %i1, align 4
-  %i3 = add i32 %i2, %i
+  %i3_1 = add i32 %i2, %i
+  %i3 = add i32 %i3_1, %val
   %i4 = getelementptr inbounds i32, i32 addrspace(1)* %arg, i64 4
   store i32 %i3, i32 addrspace(1)* %i4, align 4
   ret void
 }

 ; GCN-LABEL: {{^}}no_alias_store:
 ; GCN: ds_write_b32
 ; GCN: s_barrier

llvm/test/CodeGen/PowerPC/atomics-constant.ll

 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
 ; RUN: llc < %s | FileCheck %s

 target triple = "powerpc64le-unknown-linux-gnu"

 @a = dso_local constant i64 zeroinitializer

 define i64 @foo() {
 ; CHECK-LABEL: foo:
 ; CHECK: # %bb.0: # %entry
 ; CHECK-NEXT: li 4, 0
-; CHECK-NEXT: addis 3, 2, a@toc@ha
-; CHECK-NEXT: ld 3, a@toc@l(3)
-; CHECK-NEXT: cmpd 7, 4, 4
 ; CHECK-NEXT: li 3, 0
+; CHECK-NEXT: cmpd 7, 4, 4
 ; CHECK-NEXT: bne- 7, .+4
 ; CHECK-NEXT: isync
 ; CHECK-NEXT: blr
 entry:
   %value = load atomic i64, i64* @a acquire, align 8
   ret i64 %value
 }

llvm/test/Transforms/EarlyCSE/atomics.ll

 ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
 ; RUN: opt < %s -S -early-cse -earlycse-debug-hash | FileCheck %s
 ; RUN: opt < %s -S -basic-aa -early-cse-memssa | FileCheck %s
+; RUN: opt < %s -S -passes=early-cse | FileCheck %s

 define i32 @test12(i1 %B, i32* %P1, i32* %P2) {
 ; CHECK-LABEL: @test12(
 ; CHECK-NEXT: [[LOAD0:%.*]] = load i32, i32* [[P1:%.*]], align 4
+; CHECK-NEXT: ret i32 [[LOAD0]]
+;
+  %load0 = load i32, i32* %P1
+  %1 = load atomic i32, i32* %P2 seq_cst, align 4
+  %load1 = load i32, i32* %P1
+  %sel = select i1 %B, i32 %load0, i32 %load1
+  ret i32 %sel
+}
+
+define i32 @test12_2(i1 %B, i32* %P1, i32* %P2) {
+; CHECK-LABEL: @test12_2(
+; CHECK-NEXT: [[LOAD0:%.*]] = load i32, i32* [[P1:%.*]], align 4
 ; CHECK-NEXT: [[TMP1:%.*]] = load atomic i32, i32* [[P2:%.*]] seq_cst, align 4
 ; CHECK-NEXT: [[LOAD1:%.*]] = load i32, i32* [[P1]], align 4
 ; CHECK-NEXT: [[SEL:%.*]] = select i1 [[B:%.*]], i32 [[LOAD0]], i32 [[LOAD1]]
-; CHECK-NEXT: ret i32 [[SEL]]
+; CHECK-NEXT: [[RES:%.*]] = add i32 [[SEL]], [[TMP1]]
+; CHECK-NEXT: ret i32 [[RES]]
 ;
   %load0 = load i32, i32* %P1
   %1 = load atomic i32, i32* %P2 seq_cst, align 4
   %load1 = load i32, i32* %P1
   %sel = select i1 %B, i32 %load0, i32 %load1
-  ret i32 %sel
+  %res = add i32 %sel, %1
+  ret i32 %res
 }

 ; atomic to non-atomic forwarding is legal
 define i32 @test13(i1 %B, i32* %P1) {
 ; CHECK-LABEL: @test13(
 ; CHECK-NEXT: [[A:%.*]] = load atomic i32, i32* [[P1:%.*]] seq_cst, align 4
 ; CHECK-NEXT: ret i32 0
 ;

llvm/test/Transforms/EarlyCSE/basic.ll

 ;
   store i32 42, i32* %P
   store i32 43, i32* %P
   store i32 44, i32* %P
   store i32 45, i32* %P
   ret void
 }

-define i32 @test12(i1 %B, i32* %P1, i32* %P2) {
-; CHECK-LABEL: @test12(
-; CHECK-NEXT: [[LOAD0:%.*]] = load i32, i32* [[P1:%.*]], align 4
-; CHECK-NEXT: [[TMP1:%.*]] = load atomic i32, i32* [[P2:%.*]] seq_cst, align 4
-; CHECK-NEXT: [[LOAD1:%.*]] = load i32, i32* [[P1]], align 4
-; CHECK-NEXT: [[SEL:%.*]] = select i1 [[B:%.*]], i32 [[LOAD0]], i32 [[LOAD1]]
-; CHECK-NEXT: ret i32 [[SEL]]
-;
-  %load0 = load i32, i32* %P1
-  %1 = load atomic i32, i32* %P2 seq_cst, align 4
-  %load1 = load i32, i32* %P1
-  %sel = select i1 %B, i32 %load0, i32 %load1
-  ret i32 %sel
-}
-
 define void @dse1(i32 *%P) {
 ; CHECK-LABEL: @dse1(
 ; CHECK-NEXT: [[V:%.*]] = load i32, i32* [[P:%.*]], align 4
 ; CHECK-NEXT: ret void
 ;
   %v = load i32, i32* %P
   store i32 %v, i32* %P
   ret void

llvm/test/Transforms/InstCombine/atomic.ll

 ;
   store atomic i64 %1, i64* %2 unordered, align 8
   ret void
 }

 @c = constant i32 42

 define i32 @atomic_load_from_constant_global() {
 ; CHECK-LABEL: @atomic_load_from_constant_global(
-; CHECK-NEXT: [[V:%.*]] = load atomic i32, i32* @c seq_cst, align 4
 ; CHECK-NEXT: ret i32 42
 ;
   %v = load atomic i32, i32* @c seq_cst, align 4
   ret i32 %v
 }

 define i8 @atomic_load_from_constant_global_bitcast() {
 ; CHECK-LABEL: @atomic_load_from_constant_global_bitcast(
-; CHECK-NEXT: [[V:%.*]] = load atomic i8, i8* bitcast (i32* @c to i8*) seq_cst, align 1
 ; CHECK-NEXT: ret i8 42
 ;
   %v = load atomic i8, i8* bitcast (i32* @c to i8*) seq_cst, align 1
   ret i8 %v
 }

 define void @volatile_load_from_constant_global() {
 ; CHECK-LABEL: @volatile_load_from_constant_global(
 ; CHECK-NEXT: [[TMP1:%.*]] = load volatile i32, i32* @c, align 4
 ; CHECK-NEXT: ret void
 ;
   load volatile i32, i32* @c, align 4
   ret void
 }

 attributes #0 = { null_pointer_is_valid }

llvm/test/Transforms/InstCombine/store.ll

 ;
   %v = load atomic i32, i32* %p unordered, align 4
   store atomic i32 %v, i32* %p seq_cst, align 4
   ret void
 }

 define void @write_back6(i32* %p) {
 ; CHECK-LABEL: @write_back6(
-; CHECK-NEXT: [[V:%.*]] = load atomic i32, i32* [[P:%.*]] seq_cst, align 4
 ; CHECK-NEXT: ret void
 ;
   %v = load atomic i32, i32* %p seq_cst, align 4
   store atomic i32 %v, i32* %p unordered, align 4
   ret void
 }

 define void @write_back7(i32* %p) {