This is an archive of the discontinued LLVM Phabricator instance.

[AArch64][RegAllocFast] Add findSpillBefore to TargetRegisterInfo
Abandoned · Public

Authored by tmatheson on Jan 18 2021, 11:25 PM.

Details

Summary

Due to interactions between RegAllocFast and the expansion of atomicrmw at -O0,
both the ARM and AArch64 backends would emit stores between ldrex and strex,
clearing the exclusive access monitor.

atomicrmw instructions are expanded to loops, where the main MachineBasicBlock
includes an ldrex/strex pair and then branches conditionally on whether the
atomic operation succeeded. Because of this, the register loaded by ldrex is
LiveOut, and RegAllocFast therefore spills it. The issue is that the spill lands
between the ldrex and the strex, which invalidates the monitor.
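For illustration, a simplified sketch of what this looks like after RegAllocFast
at -O0 (opcodes, registers, and stack offsets are illustrative, not taken from a
real test case):

  .LBB0_1:                     @ atomicrmw loop header
    ldrex   r1, [r0]           @ exclusive load; sets the monitor
    str     r1, [sp, #4]       @ spill inserted by RegAllocFast; clears the monitor
    add     r2, r1, r3
    strex   r12, r2, [r0]      @ exclusive store; now always fails
    cmp     r12, #0
    bne     .LBB0_1            @ so the loop never terminates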

I tried several ways of fixing this, all of which have problems:

  • Adding a pass after RegAllocFast which moves the str instructions (spills) to after the strex. For more complex sequences, like those generated for 64-bit atomics with e.g. nand, this becomes difficult to do.
  • Adding new pseudo instructions for atomicrmw which are expanded after register allocation. This would involve duplicating all of the loop creation code. A similar approach has been used before for cmpxchg: https://reviews.llvm.org/D16239?id=52861
  • Stopping RegAllocFast from spilling these registers for these instructions. However, other instructions between ldrex and strex can also spill, and it is hard to catch them all.
  • Moving the spill to after the strex. This is the approach taken.

To spill after the strex, I have added a new function to TargetRegisterInfo which
returns an appropriate point at which to spill for a given instruction. For all
backends except ARM/AArch64 this just returns the next instruction; for ARM/AArch64
it returns the strex.
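The shape of the hook is roughly as follows (a hedged sketch: the exact
signature and the isExclusiveStore predicate are illustrative, not copied from
the patch):

  // Default in TargetRegisterInfo: spill immediately after MI.
  virtual MachineBasicBlock::iterator
  findSpillBefore(MachineBasicBlock::iterator MI) const {
    return std::next(MI);
  }

  // ARM/AArch64 override (sketch): when MI lies between an exclusive load
  // and its matching exclusive store, return the point just past the store
  // so the spill cannot clear the monitor. A real implementation would
  // first check that MI actually sits inside such a region.
  MachineBasicBlock::iterator
  AArch64RegisterInfo::findSpillBefore(MachineBasicBlock::iterator MI) const {
    for (auto I = MI, E = MI->getParent()->end(); I != E; ++I)
      if (isExclusiveStore(*I)) // hypothetical predicate, e.g. STXRW/STLXRX
        return std::next(I);
    return std::next(MI);
  }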

Alternatives:

  • A similar approach could have been applied to calcSpillCost instead, to set a high spill cost between ldrex/strex
  • Pseudo instructions could have been used instead

It is also possible that the cmpxchg pseudoinstructions could be removed, and the same technique used for them.

Diff Detail

Event Timeline

tmatheson created this revision. Jan 18 2021, 11:25 PM
tmatheson requested review of this revision. Jan 18 2021, 11:25 PM
Herald added a project: Restricted Project. Jan 18 2021, 11:25 PM

AMDGPU has a similar problem/mechanism, handled with TII::isBasicBlockPrologue. I'm not really satisfied with it, and this looks similarly ad hoc. I'm not really sure what the right solution is, but I've been considering something like a new type of label pseudo.

llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp
820 ↗(On Diff #317469)

Technically 0 is TargetOpcode::PHI

llvm/lib/Target/ARM/ARMRegisterInfo.cpp
22–23 ↗(On Diff #317469)

Could you use a bundle for this?

Use Optional rather than returning 0 (==TargetOpcode::PHI).
Rename isExclusiveLoad to IsExclusiveLoad.

tmatheson marked an inline comment as done. Jan 27 2021, 12:30 AM
tmatheson added inline comments.
llvm/lib/Target/ARM/ARMRegisterInfo.cpp
22–23 ↗(On Diff #317469)

Potentially; can you elaborate on what you mean exactly? One approach could be to create a bundle in AtomicExpand::insertRMWLLSCLoop to bundle the results of emitLoadLinked and emitStoreConditional. However, this would also need some sort of hook from AtomicExpandPass in order to do this in a target-dependent way.

mtrofin added inline comments. Jan 27 2021, 8:40 AM
llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp
791 ↗(On Diff #319483)

Nit: "Is" prefix suggests returning a boolean. How about GetExclusiveLoadOpcode?

llvm/lib/Target/ARM/ARMRegisterInfo.cpp
25 ↗(On Diff #319483)

How similar are the two implementations, the one in AArch64RegisterInfo and this one? Can there be some factoring to improve maintainability?

26 ↗(On Diff #319483)

Same comment about the name as in AArch64RegisterInfo.cpp.

tmatheson updated this revision to Diff 323935. Feb 16 2021, 2:41 AM

This is quite a substantial change in approach; please take a look.

[ARM][AtomicExpand] Bundle exclusive loads and stores created by AtomicExpandPass

AtomicExpandPass expands atomicrmw instructions to loop structures. On
ARM/AArch64, these make use of exclusive load/store instructions. Any additional
store that occurs between these instructions will invalidate the exclusive
access monitor, and potentially cause an infinite loop. Therefore the register
allocator must be prevented from inserting spills between these two points.

The approach taken here is to create a bundle containing all the instructions
between the exclusive load and store. This prevents the register allocator from
inserting spills.
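
A minimal sketch of the bundling step, assuming the per-target
isExclusiveLoad/isExclusiveStore predicates named later in this review and
LLVM's existing finalizeBundle helper (the loop structure here is illustrative,
not the patch's actual AtomicLoopBundler):

  #include "llvm/CodeGen/MachineInstrBundle.h"

  bool bundleAtomicLoops(MachineBasicBlock &MBB) {
    bool Changed = false;
    auto MI = MBB.instr_begin(), E = MBB.instr_end();
    while (MI != E) {
      if (!isExclusiveLoad(*MI)) { // target predicate, e.g. LDREX/LDXRW
        ++MI;
        continue;
      }
      // Find the matching exclusive store in the same block.
      auto St = std::next(MI);
      while (St != E && !isExclusiveStore(*St))
        ++St;
      if (St == E)
        break;
      // Bundle everything in [load, store]; finalizeBundle takes a
      // half-open range and creates the BUNDLE header instruction.
      finalizeBundle(MBB, MI, std::next(St));
      MI = std::next(St);
      Changed = true;
    }
    return Changed;
  }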

This exposed an issue with RegAllocFast, wherein a virtual register defined
inside the bundle might be assigned the same physical register as a virtual
register with a use that occurs after the def. For example:

%0 = something global
BUNDLE implicit-def %1, implicit %0 {
  %1 = MOVi 123
  store %0, ...
}

In the above example it was possible to allocate the same physical register to
both %0 and %1. RegAllocFast has been updated to avoid this. RegAllocGreedy does
not have a similar problem, since it uses liveness analysis.

Finally, UnpackMachineBundles is added after register allocation for ARM/AArch64
to remove the bundles.
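
In the pass pipeline this might be wired up roughly as follows (the insertion
point and the always-true gating predicate are assumptions, not the patch's
exact code):

  // In e.g. ARMPassConfig/AArch64PassConfig, immediately after register
  // allocation and before the late pseudo-expansion passes create their
  // own bundles:
  addPass(createUnpackMachineBundles(
      [](const MachineFunction &MF) { return true; }));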

tmatheson marked 3 inline comments as done. Feb 16 2021, 2:43 AM
tmatheson added inline comments.
llvm/lib/Target/ARM/ARMRegisterInfo.cpp
25 ↗(On Diff #319483)

I have moved the common code into a common base class in the updated version.

tmatheson marked an inline comment as done. Feb 22 2021, 5:58 AM

Ping

Ping. @arsenm, any comments on the new approach using bundles, and the regalloc changes?

lenary added a subscriber: lenary. Mar 1 2021, 7:00 AM

Some comments below, in addition to these questions:

  • Have you tested this with both the new and the legacy pass managers?

I think the most important question to be answered about the approach is whether the backends use bundles anywhere else. If they do, this is probably too brittle, and pseudo-instructions are a better approach, even though they add duplication of loop insertion.

llvm/include/llvm/CodeGen/AtomicLoopBundler.h
43

Can you document the use of Derived::IsLoadInstr and Derived::IsStoreInstr? It's not clear from a quick scan of the class that they are required to use the pass.

58–59

Don't you want to find the store *after* the Load? So maybe start at LdIter?

llvm/lib/Target/AArch64/AArch64TargetMachine.cpp
613

Does the backend use bundles anywhere else? We would need to make sure we're not unpacking other bundles at this point by mistake.

llvm/lib/Target/ARM/ARMAtomicLoopBundler.cpp
19

Here you're looking for instructions as introduced by atomic loop expansion, so the predicate should match that (I think).

Alternatively, you could just use MI.mayLoad() if you're looking for any kind of load?

tmatheson updated this revision to Diff 331240. Mar 17 2021, 5:23 AM
tmatheson marked 3 inline comments as done.
  • Address review comments and clang-tidy
  • Rename isLoadInstr/isStoreInstr to isExclusiveLoad/isExclusiveStore

@lenary sorry for the delay in responding to comments.

I think the most important question to be answered about the approach is whether the backends use bundles anywhere else. If they do, this is probably too brittle, and pseudo-instructions are a better approach, even though they add duplication of loop insertion.

This is only applied to the AArch64 and ARM backends. The other uses of bundles in these backends (or in common passes used by these backends) that I'm aware of are:

  • MVEVPTBlockPass
  • Thumb2ITBlockPass
  • ARMExpandPseudoInsts
  • AArch64ExpandPseudoInsts

These are all added after register allocation, so unbundling before register allocation should not affect them.

There are other uses of bundles to prevent insertion of instructions (D79792) and to prevent reordering (D91048).

foad added a subscriber: foad. Mar 17 2021, 6:25 AM
foad added inline comments.
llvm/lib/CodeGen/RegAllocFast.cpp
1098

Naive question: shouldn't whatever added the operands to the BUNDLE have set the IsEarlyClobber flag appropriately, so you don't need to special-case bundles here?

arsenm added inline comments. Mar 17 2021, 6:34 AM
llvm/include/llvm/CodeGen/AtomicLoopBundler.h
35

I think running selection and then trying to interpret the instructions to figure out what it did is fraught with peril. You would be better off expanding a pseudoinstruction after register allocation.

43

You absolutely cannot rely on machine basic block names.

llvm/lib/CodeGen/RegAllocFast.cpp
1093

The order of instructions in the bundle doesn't actually matter. It's not accurate to say a def happens before or after a use inside the bundle.

1098

Yes

tmatheson added inline comments. Mar 17 2021, 6:40 AM
llvm/lib/CodeGen/RegAllocFast.cpp
1098

I looked into that, specifically marking any defs which are followed by a use inside the bundle as early-clobber. For example, you might have a bundle that defines %1 and then uses %2. The idea is that RegAllocFast sees only the bundle instruction, and within the bundle these defs/uses act like early-clobbers in that the def must have its own separate register.

This works well for RegAllocFast, but RegAllocGreedy actually looks at the live ranges, which do not see bundles. At some point it would hit an assertion failure, because it would see an early-clobber register (on the bundle instruction) whose live range didn't start at an early-clobber slot (the range was copied from the instruction inside the bundle and started at the r slot).

Trying to avoid this problem seemed like it would require breaking the live-range semantics, and didn't seem like a good path to go down.
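
Roughly, the mismatch looks like this (hedged pseudo-MIR, in the style of the
example in the summary):

BUNDLE early-clobber implicit-def %1, implicit %2 {
  %1 = MOVi 123     // LiveIntervals builds %1's range from this inner
  store %2, ...     // instruction, so the range starts at a regular 'r' slot
}
// ...while the early-clobber flag on the BUNDLE header requires the range
// to start at an 'e' (early-clobber) slot, hence the assertion failure.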

tmatheson abandoned this revision. Mar 23 2021, 4:41 AM

Thank you everyone for the comments; they have been very useful. From the discussions here and internally, it seems that neither of these approaches (a new target hook for the spill location, or using bundles) is the right way forward, and I should look into pseudo instructions. Since this review has grown quite large, I will abandon it and open a new review when I have something working with pseudo expansion.