This is an archive of the discontinued LLVM Phabricator instance.

[ARM] Fix ARM backend to correctly use atomic expansion routines.
ClosedPublic

Authored by efriedma on Feb 17 2022, 2:04 AM.

Details

Summary

Without this patch, clang would generate calls to __sync_* routines on targets where it does not make sense; we can't assume the routines exist on unknown targets. Linux has special implementations of the routines that work on old ARM targets; other targets have no such routines. In general, atomics operations which aren't natively supported should go through libatomic (__atomic_*) APIs, which can support arbitrary atomics through locks.
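
For illustration (a sketch, not taken from the patch): on a pre-v6 target such as armv5te, an ordinary atomic read-modify-write like the one below previously lowered to a __sync_* call that may not exist outside Linux; with this change it is routed through the __atomic_* libatomic interface instead, which is allowed to fall back to locks.

```cpp
#include <atomic>

// armv5te has no hardware CAS, so this cannot be inlined.
// Before the patch: a call to __sync_fetch_and_add_4 (which may not exist
// outside Linux).  After the patch: a libatomic call, e.g.
// __atomic_fetch_add_4, which may take a lock internally.
int bump(std::atomic<int> &counter) {
    return counter.fetch_add(1, std::memory_order_seq_cst);
}
```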

ARM targets older than v6, where this patch makes a difference, are rare in practice, but not completely extinct. See, for example, discussion on D116088.

This also affects Cortex-M0, but I don't think __sync_* routines actually exist in any Cortex-M0 libraries. So in practice this just leads to a slightly different linker error for those cases, I think.

Mechanically, this patch does the following:

  1. Ensures we run atomic expansion unconditionally; it never makes sense to completely skip it.
  2. Fixes getMaxAtomicSizeInBitsSupported() so it returns an appropriate number on all ARM subtargets.
  3. Fixes shouldExpandAtomicRMWInIR() and shouldExpandAtomicCmpXchgInIR() to correctly handle subtargets that don't have atomic instructions.

Diff Detail

Event Timeline

efriedma created this revision. Feb 17 2022, 2:04 AM
efriedma requested review of this revision. Feb 17 2022, 2:04 AM
Herald added a project: Restricted Project. Feb 17 2022, 2:04 AM
efriedma edited the summary of this revision. Feb 17 2022, 2:05 AM
efriedma added a reviewer: jyknight.
lkail added a subscriber: lkail. Feb 17 2022, 2:26 AM
joerg added a comment. Feb 17 2022, 4:29 AM

I'm ambivalent about whether to use __sync_* or __atomic_*, but the last time I looked at this, we generated the generic unsized variant a lot, even when the arguments have a known static size.
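
For reference, the distinction joerg describes can be illustrated with a small example (a sketch, not taken from the review): when the object has a known 4-byte size, the lowering can use the sized libcall rather than the generic one that takes an explicit size argument.

```cpp
#include <atomic>

// With a known 4-byte object, lowering can use the sized libcall
// __atomic_compare_exchange_4 rather than the generic, size-taking
// __atomic_compare_exchange; joerg's observation is that the generic form
// was often emitted even in cases like this.
bool cas_int(std::atomic<int> &p, int expected, int desired) {
    return p.compare_exchange_strong(expected, desired);
}
```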

Herald added a project: Restricted Project. Mar 3 2022, 10:31 AM
john.brawn accepted this revision. Mar 16 2022, 10:26 AM

LGTM, but please adjust the comment in ARMISelLowering.cpp as suggested by the clang-format linter before committing.

This revision is now accepted and ready to land. Mar 16 2022, 10:26 AM
This revision was landed with ongoing or failed builds. Mar 18 2022, 12:44 PM
This revision was automatically updated to reflect the committed changes.
aykevl added a subscriber: aykevl. Mar 22 2022, 7:46 AM

> This also affects Cortex-M0, but I don't think __sync_* routines actually exist in any Cortex-M0 libraries. So in practice this just leads to a slightly different linker error for those cases, I think.

TinyGo implements these __sync_* routines so that atomic operations work. See https://github.com/tinygo-org/tinygo/blob/v0.22.0/src/runtime/atomics_critical.go#L188. I believe the __sync_* variants only exist for Cortex-M / ARM; other targets use __atomic_*. That said, I'm generally in favor of this change because it looks more correct and aligns with atomics on other LLVM backends.

I suspect Rust Embedded also implements these routines (I think I've seen them somewhere) but I can't find it easily.

Related: https://reviews.llvm.org/D61052
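
For readers unfamiliar with these shims, a rough sketch of the approach follows (illustrative only, not TinyGo's actual code): on a single-core Cortex-M0 the only way to make the operation atomic is to mask interrupts around it.

```cpp
#include <cstdint>

// Critical-section-based __sync_* shim for a single-core Cortex-M0
// (sketch).  Masking interrupts is the only way to get atomicity on this
// core, which is exactly why compiler-rt doesn't want to ship such an
// implementation behind the user's back.
static inline uint32_t save_and_disable_irq() {
    uint32_t primask;
    asm volatile("mrs %0, PRIMASK" : "=r"(primask));
    asm volatile("cpsid i" ::: "memory");
    return primask;
}

static inline void restore_irq(uint32_t primask) {
    // Restore the previous mask rather than blindly re-enabling interrupts.
    asm volatile("msr PRIMASK, %0" :: "r"(primask) : "memory");
}

// Signature follows the usual __sync_* libcall convention.
extern "C" uint32_t __sync_val_compare_and_swap_4(uint32_t *ptr,
                                                  uint32_t oldval,
                                                  uint32_t newval) {
    uint32_t primask = save_and_disable_irq();
    uint32_t observed = *ptr;
    if (observed == oldval)
        *ptr = newval;
    restore_irq(primask);
    return observed;
}
```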

llvm/test/CodeGen/ARM/atomic-op.ll
414

This looks like a regression. I'm pretty sure that these loads and stores are always atomic. Is there a reason why this has to go through a library function?

efriedma added inline comments. Mar 22 2022, 11:08 AM
llvm/test/CodeGen/ARM/atomic-op.ll
414

In general, __atomic_* routines are not guaranteed to be lock-free. Suppose __atomic_compare_exchange_4 is running at the same time on a different thread; if we store directly to the address, we're ignoring whatever lock __atomic_compare_exchange_4 uses internally.

Because of that, unless we have some prior knowledge that libatomic supports a lock-free cmpxchg of the given width, all atomic operations have to go through libatomic. It doesn't matter whether load or store ops are atomic at the hardware level.

This isn't a problem with __sync_* because they're guaranteed to be lock-free.

See also https://llvm.org/docs/Atomics.html#libcalls-atomic.
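
To make the hazard concrete, here is a sketch (not the real compiler-rt code) of a lock-based CAS alongside a plain store that bypasses the lock:

```cpp
#include <atomic>
#include <mutex>

// Hypothetical lock-based libatomic CAS: the lock only protects callers
// that go through the library.
static std::mutex g_atomic_lock;   // stand-in for libatomic's internal lock

bool locked_cas_4(int *ptr, int *expected, int desired) {
    std::lock_guard<std::mutex> guard(g_atomic_lock);
    if (*ptr == *expected) { *ptr = desired; return true; }
    *expected = *ptr;
    return false;
}

// A direct store like this ignores g_atomic_lock entirely, so it can land
// in the middle of locked_cas_4's read-compare-write sequence on another
// thread.  That is why atomic loads and stores must also become libcalls
// once the CAS may be lock-based.
void plain_store(int *ptr, int value) { *ptr = value; }
```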

Interesting: this commit results in testing timeouts in our downstream compiler for several C++ compliance test suites (ACE, plumhall, perennial), but only for Cortex-M0/M0plus. I confess I don't understand the nature of this commit or how it would impact runtime here, other than that you indicate it could impact Cortex-M0. Do you have any pointers as to what I could verify here?

The impact on Cortex-M0 is that instead of using __sync_* calls and inlined atomic load/store ops, it uses __atomic_* calls. I can't imagine that causing timeouts unless you're somehow using a bad implementation of __atomic_*.

> The impact on Cortex-M0 is that instead of using __sync_* calls and inlined atomic load/store ops, it uses __atomic_* calls. I can't imagine that causing timeouts unless you're somehow using a bad implementation of __atomic_*.

Hmm. It appears to be getting stuck in the lock and is spinning forever. Something isn't being built correctly in the builtins, maybe?

By "the builtins", do you mean you're using compiler-rt with -DCOMPILER_RT_EXCLUDE_ATOMIC_BUILTIN=Off? That implementation probably doesn't work on a Cortex-M0 unless you've hacked it somehow; it assumes width-1 lock-free atomics are available.

> By "the builtins", do you mean you're using compiler-rt with -DCOMPILER_RT_EXCLUDE_ATOMIC_BUILTIN=Off? That implementation probably doesn't work on a Cortex-M0 unless you've hacked it somehow; it assumes width-1 lock-free atomics are available.

Yes, that's exactly what we're doing. What should we be doing instead for Cortex-M0? I don't know a lot about the history of this implementation, but if it doesn't work for M0, why is your change considered acceptable for M0?

> By "the builtins", do you mean you're using compiler-rt with -DCOMPILER_RT_EXCLUDE_ATOMIC_BUILTIN=Off? That implementation probably doesn't work on a Cortex-M0 unless you've hacked it somehow; it assumes width-1 lock-free atomics are available.

> Yes, that's exactly what we're doing. What should we be doing instead for Cortex-M0? I don't know a lot about the history of this implementation, but if it doesn't work for M0, why is your change considered acceptable for M0?

There is no way to implement an "atomic" operation on an M0 besides turning off interrupts, and we don't want the builtins library to mess with the interrupt mask. I'd expect the OS/user to provide the functionality, if they need it. compiler-rt doesn't provide __sync_* for cortex-m0, for the same reason.

I'm not sure how your setup was working before. I guess maybe if you only need atomics for C++ static local vars, and you built libc++abi with LIBCXXABI_HAS_NO_THREADS? If you don't actually need threading/atomics, you can use -fno-threadsafe-statics and/or -mthread-model single.
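
As a small illustration of the static-local case mentioned above (a sketch, not from the review): thread-safe initialization of a function-local static goes through a guard variable (__cxa_guard_acquire/__cxa_guard_release in the C++ ABI library), and the fast-path guard check is typically an acquire load of the guard byte. -fno-threadsafe-statics, or an ABI library built without threads, drops the atomic/locking part of that machinery.

```cpp
int compute() { return 42; }   // stands in for a non-constant initializer

// With threadsafe statics enabled, the first call performs an atomic guard
// check and, on the slow path, calls __cxa_guard_acquire/release.  With
// -fno-threadsafe-statics the atomic/locking part of the guard is omitted,
// which is fine for single-threaded code.
int &counter() {
    static int value = compute();
    return value;
}
```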

> There is no way to implement an "atomic" operation on an M0 besides turning off interrupts, and we don't want the builtins library to mess with the interrupt mask. I'd expect the OS/user to provide the functionality, if they need it. compiler-rt doesn't provide __sync_* for cortex-m0, for the same reason.

> I'm not sure how your setup was working before. I guess maybe if you only need atomics for C++ static local vars, and you built libc++abi with LIBCXXABI_HAS_NO_THREADS? If you don't actually need threading/atomics, you can use -fno-threadsafe-statics and/or -mthread-model single.

OK thank you for answering my questions. Our downstream compiler targets primarily baremetal systems on embedded devices, so we can't rely on an OS provided implementation of atomics. I suspect we're not the only downstream compiler impacted. I will try the options you suggest and let you know, but we may have to do something else to work around the effects of this commit. We weren't doing anything special other than what was out of the box, so this change took us by surprise.

It probably makes sense to add #error to atomics.c, I guess, so that particular mistake gets caught at compile-time, not runtime.

Beyond that, though, I'm not sure what else I can do here. I guess I could set setMaxAtomicSizeInBitsSupported(32) for Cortex-M0, even though the target doesn't really have such atomics, and document "users must provide __sync_*_{1,2,4}". That seems a bit weird, though; like I mentioned before, we have no way to actually provide them. (I guess we could provide them off-by-default, or something, but again, I don't want to mess with the interrupt mask behind the user's back.)

nikic added a subscriber: nikic. Jul 23 2022, 11:42 AM
nikic added inline comments.
llvm/test/CodeGen/ARM/atomic-op.ll
413

Is the dmb still needed if we're going through __atomic?

efriedma added inline comments. Jul 23 2022, 3:10 PM
llvm/test/CodeGen/ARM/atomic-op.ll
413

__atomic_load_4 is just supposed to be a sequentially consistent load, and a sequentially consistent load doesn't imply a full fence. So there isn't any obvious rule that would allow that transform.

See also https://github.com/llvm/llvm-project/issues/29472 and https://github.com/llvm/llvm-project/issues/56450.
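
For illustration (a sketch matching the explanation above, not the test itself): an explicit fence next to a seq_cst load has to stay, because the load alone does not provide a full barrier.

```cpp
#include <atomic>

int load_then_fence(std::atomic<int> &x) {
    // On a target without native atomics this load may lower to a call to
    // __atomic_load_4 ...
    int v = x.load(std::memory_order_seq_cst);
    // ... and this separate, explicit fence still needs its own barrier;
    // it cannot be folded into the load.
    std::atomic_thread_fence(std::memory_order_seq_cst);
    return v;
}
```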

nikic added inline comments. Jul 23 2022, 3:27 PM
llvm/test/CodeGen/ARM/atomic-op.ll
413

Uh sorry, I missed that there was an explicit IR fence instruction in this test. The preceding two functions don't have an IR fence, and also no longer emit dmb, so everything is good here.

nikic added a comment. Jul 24 2022, 3:48 AM

I think we're going to need an option to restore the previous behavior for Rust. The context here is that Rust targets can separately specify up to which size they support atomic load/store, and whether they support atomic CAS. Historically, the thumbv6m target allowed atomic load/store and disabled CAS, relying on the previous LLVM behavior of emitting atomic load/store as a memory barrier + simple load/store.

Of course, this makes Rust code using atomics incompatible with any C code using atomic CAS via libatomic (or I guess any other CAS implementation without OS support). There is an implicit assumption here that Rust code for this target will never get linked against C code using libatomic. I think this assumption was made by accident, but it's probably a fairly reasonable assumption to make for common use-cases, especially for baremetal targets.

Based on the feedback I received, removing support for atomic load/store for these targets would be a major breaking change for the embedded ecosystem. The other bit of context here is that Rust atomics are required to be lock-free -- I guess the alternative would be to go back on that and actually use libatomic, but that would probably come with its own complications, especially for this kind of target.

My thinking here is that we could add a target feature which basically says "I promise I won't use atomic CAS", in which case atomic load/store can be correctly lowered as before. Does that sound reasonable? It's worth noting that there is also interest in something like this for other targets, such as riscv without the A extension.

I see two possible approaches to restore the functionality you want.

One, you could add a target feature that says "I have 32-bit (or 64-bit) atomics". That would override the normal rule for setMaxAtomicSizeInBitsSupported. The target environment would be expected to provide whatever __sync_* functions are necessary. If you don't actually use any atomic operations that require CAS, you won't see any calls, so everything works even if the functions don't actually exist. (Or it's possible to write an operating system that implements lock-free CAS on Cortex-M0, the same way the Linux kernel implements atomics on old ARM cores.)

Two, you could relax the constraint that atomic load/store are required to be atomic with respect to cmpxchg. We can add a value to syncscope to represent this. This would allow mixing code that uses load-store atomics with code that requires "real" atomics.

That said, I'm not sure how you use load-store atomics in practice. I mean, I guess you can use Dekker's algorithm, or some kinds of lock-free queues, but that seems a bit exotic. Or do you use some sort of lock implemented by disabling interrupts?

nikic added a comment. Jul 25 2022, 6:32 AM

> I see two possible approaches to restore the functionality you want.

> One, you could add a target feature that says "I have 32-bit (or 64-bit) atomics". That would override the normal rule for setMaxAtomicSizeInBitsSupported. The target environment would be expected to provide whatever __sync_* functions are necessary. If you don't actually use any atomic operations that require CAS, you won't see any calls, so everything works even if the functions don't actually exist. (Or it's possible to write an operating system that implements lock-free CAS on Cortex-M0, the same way the Linux kernel implements atomics on old ARM cores.)

I think this is the solution we want. I've put up https://reviews.llvm.org/D130480 to implement this.

> Two, you could relax the constraint that atomic load/store are required to be atomic with respect to cmpxchg. We can add a value to syncscope to represent this. This would allow mixing code that uses load-store atomics with code that requires "real" atomics.

> That said, I'm not sure how you use load-store atomics in practice. I mean, I guess you can use Dekker's algorithm, or some kinds of lock-free queues, but that seems a bit exotic. Or do you use some sort of lock implemented by disabling interrupts?

Some use cases are mentioned starting from here: https://github.com/rust-lang/rust/issues/99668#issuecomment-1193417939. It seems to be mostly about synchronization with interrupt handlers, lock-free SPSC queues, and the ability to use atomics as safe mutable globals (a Rust-specific issue). And yes, disabling interrupts for critical sections seems to be common practice in this context as well.
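
For context, a typical load/store-only pattern from that discussion looks roughly like this (a sketch; the ISR and helper names are hypothetical). On a single-core MCU the handler cannot run concurrently with itself, so plain atomic loads and stores are enough and no CAS or read-modify-write is needed.

```cpp
#include <atomic>
#include <cstdint>

uint32_t read_adc();             // hypothetical device read
void process(uint32_t sample);   // hypothetical consumer

// Flag set by an interrupt handler and polled by the main loop.  Only
// atomic load and store are used, so a target that guarantees lock-free
// load/store (but nothing more) suffices.
std::atomic<bool> data_ready{false};
std::atomic<uint32_t> latest_sample{0};

extern "C" void ADC_IRQHandler() {   // hypothetical ISR name
    latest_sample.store(read_adc(), std::memory_order_relaxed);
    data_ready.store(true, std::memory_order_release);
}

void poll_once() {
    if (data_ready.load(std::memory_order_acquire)) {
        data_ready.store(false, std::memory_order_relaxed);
        process(latest_sample.load(std::memory_order_relaxed));
    }
}
```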