This is an archive of the discontinued LLVM Phabricator instance.

[WoA] Use fences for sequentially consistent stores/writes
ClosedPublic

Authored by mnadeem on Jan 13 2023, 8:48 PM.

Details

Summary

LLVM currently uses LDAR/STLR and variants for acquire/release
as well as seq_cst operations. This is fine as long as all code uses
this convention.

Normally LDAR/STLR act as one-way barriers, but when used in
combination they provide a sequentially consistent model: when an
LDAR appears after an STLR in program order, the STLR acts as a
two-way fence and the store will be observed before the load.

The problem is that a normal load (unlike an LDAR) that appears
after an STLR in program order can be observed before the STLR
(if my understanding is correct), possibly providing weaker than
expected guarantees when such loads are used for ordered atomic
operations.

Unfortunately, the Microsoft Visual Studio STL implements seq_cst
loads/stores using normal loads/stores plus explicit fences:
store: dmb ish + str + dmb ish
load:  ldr + dmb ish

This patch uses fences for the MSVC target whenever we write to
memory in a sequentially consistent way, so that we don't rely on
the assumption that just using LDAR/STLR will give us sequentially
consistent ordering.
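
As a concrete illustration of the mismatch described above (a sketch only; names are made up and exact codegen varies by compiler and version), here is a seq_cst store and load on a std::atomic, with the two lowerings the summary refers to shown as comments:

#include <atomic>

std::atomic<int> flag{0};

void writer() {
  // Current LLVM convention:  stlr
  // MSVC STL convention:      dmb ish + str + dmb ish
  flag.store(1, std::memory_order_seq_cst);
}

int reader() {
  // Current LLVM convention:  ldar
  // MSVC STL convention:      ldr + dmb ish
  return flag.load(std::memory_order_seq_cst);
}

The hazard arises when the two conventions meet within one thread's program order, e.g. a store lowered to STLR followed by a load implemented as a plain LDR + DMB ISH; the second litmus test later in this thread shows that this combination is not sequentially consistent.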

Diff Detail

Event Timeline

mnadeem created this revision. Jan 13 2023, 8:48 PM
Herald added a project: Restricted Project. · View Herald Transcript Jan 13 2023, 8:48 PM
mnadeem requested review of this revision. Jan 13 2023, 8:48 PM
mnadeem retitled this revision from [WoA] Use fences for sequentially consistent stores to [WoA] Use fences for sequentially consistent stores/writes. Jan 13 2023, 8:53 PM

The explanation makes sense.

Can we make this specifically apply to _Interlocked* functions, instead of all atomics on Windows targets? I'd prefer not to impose this performance penalty on other users of atomics if we can avoid it.

It looks like for atomic rmw ops, MSVC generates one barrier, not two; can we do the same?

The explanation makes sense.

Can we make this specifically apply to _Interlocked* functions, instead of all atomics on Windows targets? I'd prefer not to impose this performance penalty on other users of atomics if we can avoid it.

To be clear, this is about the ABI of std::atomic, and we are likely to change that when we get to break it Someday. Interlocked doesn't come with plain load/store ops (because, I assume, it was assumed plain volatiles were sufficient for that).

It looks like for atomic rmw ops, MSVC generates one barrier, not two; can we do the same?

Let me poke our backend folks...

It looks like if MSVC is implementing sequentially consistent atomic operations in the manner described, then we will need to handle stores (but not loads) in the same way, so this patch is doing the right thing. Reasoning below:

Using the memory model simulator at http://diy.inria.fr/www/?record=aarch64, the following input has three threads purely using LDAR/STLR

AArch64 SeqCst
{
0:X1=x; 1:X1=x; 2:X1=x;
0:X3=y; 1:X3=y; 2:X3=y;
}
 P0            | P1            | P2 ;
 MOV W0, #1    | MOV W2, #1    | LDAR W2, [X3] ;
 STLR W0, [X1] | STLR W2, [X3] | LDAR W0, [X1] ;
 LDAR W2, [X3] | LDAR W0, [X1] | ;
exists(2:X2=1 /\ 2:X0=0 /\ 0:X2=0 /\ 1:X0=1)

gives the result "No", which means it's not possible for P2 to see X2=1 and X0=0 if P0 sees X2=0 and P1 sees X0=1; in other words, using LDAR/STLR gives sequentially consistent behaviour.
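
For readers more comfortable with C++ than the litmus syntax, below is an illustrative rendering of the same test (not part of the patch), assuming each litmus location maps to a std::atomic<int> and each destination register to a plain int result variable; the r* names are made up for this sketch.

#include <atomic>

std::atomic<int> x{0}, y{0};
int r0_y, r1_x, r2_y, r2_x;

void p0() {
  x.store(1, std::memory_order_seq_cst);    // STLR W0, [X1]
  r0_y = y.load(std::memory_order_seq_cst); // LDAR W2, [X3]
}

void p1() {
  y.store(1, std::memory_order_seq_cst);    // STLR W2, [X3]
  r1_x = x.load(std::memory_order_seq_cst); // LDAR W0, [X1]
}

void p2() {
  r2_y = y.load(std::memory_order_seq_cst); // LDAR W2, [X3]
  r2_x = x.load(std::memory_order_seq_cst); // LDAR W0, [X1]
}

The "exists" clause asks whether, after running p0/p1/p2 concurrently, the outcome r2_y == 1 && r2_x == 0 && r0_y == 0 && r1_x == 1 is reachable; under seq_cst, and with the pure LDAR/STLR mapping, it is not.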

The following has thread P0 do an STLR followed by an MSVC-style sequentially consistent load

AArch64 SeqCst
{
0:X1=x; 1:X1=x; 2:X1=x;
0:X3=y; 1:X3=y; 2:X3=y;
}
 P0            | P1            | P2 ;
 MOV W0, #1    | MOV W2, #1    | LDAR W2, [X3] ;
 STLR W0, [X1] | STLR W2, [X3] | LDAR W0, [X1] ;
 LDR W2, [X3]  | LDAR W0, [X1] | ;
 DMB ISH       |               | ;
exists(2:X2=1 /\ 2:X0=0 /\ 0:X2=0 /\ 1:X0=1)

this gives the result "OK", meaning it is possible for the threads to observe that outcome, so we don't have sequentially consistent behaviour.

For the MSVC-style sequentially consistent store followed by LDAR

AArch64 SeqCst
{
0:X1=x; 1:X1=x; 2:X1=x;
0:X3=y; 1:X3=y; 2:X3=y;
}
 P0            | P1            | P2 ;
 MOV W0, #1    | MOV W2, #1    | LDAR W2, [X3] ;
 DMB ISH       |               | ;
 STR W0, [X1]  | STLR W2, [X3] | LDAR W0, [X1] ;
 DMB ISH       |               | ;
 LDAR W2, [X3] | LDAR W0, [X1] | ;
exists(2:X2=1 /\ 2:X0=0 /\ 0:X2=0 /\ 1:X0=1)

we get "No", so we still do get sequentially consistent behaviour in this case.

The explanation makes sense.

Can we make this specifically apply to _Interlocked* functions, instead of all atomics on Windows targets? I'd prefer not to impose this performance penalty on other users of atomics if we can avoid it.

IIUC we must add a fence after every sequentially consistent write to memory.

It looks like for atomic rmw ops, MSVC generates one barrier, not two; can we do the same?

They generate acquire/release instructions with implicit fences + one trailing explicit fence, while this patch generates normal monotonic st/ld, so two explicit fences are needed but the fences are outside the loop.
We could do something similar to MSVC if it helps performance. It would only involve adding a trailing fence while keeping the current acquire/release instructions.
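
Written out at the source level, the two expansion strategies being compared look roughly like the sketch below. This is only an illustration of fence placement (the function names are made up and the asm comments are approximate), not an exact C++ equivalence or the actual IR-level expansion.

#include <atomic>

// (a) What the current patch's expansion corresponds to: a monotonic
//     (relaxed) RMW loop bracketed by two explicit fences, both outside
//     the loop.
int fetch_add_two_fences(std::atomic<int> &obj, int v) {
  std::atomic_thread_fence(std::memory_order_seq_cst);    // leading dmb ish
  int old = obj.fetch_add(v, std::memory_order_relaxed);  // plain ldxr/stxr loop
  std::atomic_thread_fence(std::memory_order_seq_cst);    // trailing dmb ish
  return old;
}

// (b) The MSVC-like alternative: keep the acquire/release RMW (implicit
//     fences in ldaxr/stlxr) and add a single trailing explicit fence.
int fetch_add_one_fence(std::atomic<int> &obj, int v) {
  int old = obj.fetch_add(v, std::memory_order_acq_rel);  // ldaxr/stlxr loop
  std::atomic_thread_fence(std::memory_order_seq_cst);    // trailing dmb ish
  return old;
}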

I can think of a few possible ways to do this:

  1. Modify clang to add a seq_cst fence after the _Interlocked* intrinsics. I think we would also need to modify any other builtins that end up generating seq_cst writes to memory, so it might get messy.
  2. Add something like TLI->shouldInsertTrailingFenceForAtomic(I) to use in AtomicExpandPass and generate a trailing fence while keeping the original memory ordering (see the sketch after this list). With the current patch we set the ordering to Monotonic and thus also need a leading fence.
  3. Add a fence during lowering. Not sure how much effort this will take.
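
As a rough source-level sketch of what option 2 amounts to for a seq_cst store (the helper name is hypothetical and the real change would live in the backend, not in user code): keep the ordered store, which still lowers to STLR, and emit one trailing fence instead of bracketing a plain store with two fences.

#include <atomic>

void seq_cst_store_with_trailing_fence(std::atomic<int> &obj, int value) {
  obj.store(value, std::memory_order_seq_cst);           // still lowers to stlr
  std::atomic_thread_fence(std::memory_order_seq_cst);   // trailing dmb ish
}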

The explanation makes sense.

Can we make this specifically apply to _Interlocked* functions, instead of all atomics on Windows targets? I'd prefer not to impose this performance penalty on other users of atomics if we can avoid it.

IIUC we must add a fence after every sequentially consistent write to memory.

The scenario I wanted to avoid is that someone has an existing codebase that isn't using Microsoft's STL, and is working correctly, but we pessimize the performance to fix a bug that never affected it in the first place. But I guess there are two issues with that:

Maybe we can expose the optimal sequence as a command-line option for codebases that don't need ABI compatibility.

It looks like for atomic rmw ops, MSVC generates one barrier, not two; can we do the same?

They generate acquire/release instructions with implicit fences + one trailing explicit fence, while this patch generates normal monotonic st/ld, so two explicit fences are needed but the fences are outside the loop.
We could do something similar to MSVC if it helps performance.

I assume fewer fences help performance, although I haven't tried it. Using the acquire/release instructions also means we do the right thing if there's code using the normal ARMv8 atomic sequences.

  1. Add something like TLI->shouldInsertTrailingFenceForAtomic(I) to use in AtomicExpandPass and generate a Trailing Fence while keeping the original memory ordering. With the current patch we set the ordering to Monotonic and thus also need a leading fence.

This seems fine.

mnadeem updated this revision to Diff 490328. Jan 18 2023, 4:02 PM
  • Use fewer barriers and existing acquire/release instruction sequence.
  • Add a shouldInsertTrailingFenceForAtomicStore() function.
efriedma accepted this revision. Jan 18 2023, 4:12 PM

Code LGTM; maybe wait to see if Microsoft has any more feedback before we merge.

This revision is now accepted and ready to land. Jan 18 2023, 4:12 PM
This revision was landed with ongoing or failed builds. Jan 23 2023, 4:10 PM
This revision was automatically updated to reflect the committed changes.