This is an archive of the discontinued LLVM Phabricator instance.

lib/Target/AMDGPU/AMDGPUMachineModuleInfo.h
38 ↗	(On Diff #113586)	Suggest more descriptive comment: /// \returns \p SSID's position in the total ordering of sync scopes such that a wider scope has a higher value than a narrower scope.
51 ↗	(On Diff #113586)	Would be better to `reportUnknownSynchScope(MI);` since this code would need to be updated if the target introduced another sync scope value.
lib/Target/AMDGPU/SIMemoryLegalizer.cpp
204 ↗	(On Diff #113586)	Should there be a TODO saying that this logic does not check whether the MMOs, if present, cover the entire set of locations accessed by the memory instruction? If they only partially cover it, the code would need to fall back to the conservative assumption of sequentially consistent ordering at system scope.
227–230 ↗	(On Diff #113586)	This logic seems to belong in `constructFromMI` which claims to create a SIMemOpInfo from a machine instruction. So it ought to do that for any machine instruction, including those with no MMOs. Same comment for other cases.
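
One way to read the two SIMemoryLegalizer.cpp comments above is as a request for a single entry point that covers both the no-MMO case and the multi-MMO case. The following is a minimal, self-contained sketch under stated assumptions, not the code under review: `Scope`, `Ordering`, `MemOperand`, `MemOpInfo`, `mergeOperands` and `constructFromOperands` are hypothetical stand-ins for the LLVM types and functions. Only load orderings are modeled, because they form a simple chain; LLVM's `isStrongerThan` works on a lattice in which acquire and release are not comparable.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-ins for SyncScope::ID and AtomicOrdering. Enumerators are
// listed from narrowest/weakest to widest/strongest, so std::max picks the
// more conservative value.
enum class Scope { SingleThread, Wavefront, Workgroup, Agent, System };
enum class Ordering { NotAtomic, Unordered, Monotonic, Acquire,
                      SequentiallyConsistent };

struct MemOperand { Scope scope; Ordering ordering; };
struct MemOpInfo { Scope scope; Ordering ordering; };

// Merges one or more memory operands into a single conservative summary:
// the widest scope and the strongest ordering seen on any operand.
MemOpInfo mergeOperands(const std::vector<MemOperand> &mmos) {
  assert(!mmos.empty() && "caller must handle the no-MMO case");
  MemOpInfo info{Scope::SingleThread, Ordering::NotAtomic};
  for (const MemOperand &mmo : mmos) {
    info.scope = std::max(info.scope, mmo.scope);
    info.ordering = std::max(info.ordering, mmo.ordering);
  }
  return info;
}

// Single entry point: also covers instructions that carry no memory operands,
// for which nothing is known and the most conservative answer is returned.
MemOpInfo constructFromOperands(const std::vector<MemOperand> &mmos) {
  if (mmos.empty())
    return {Scope::System, Ordering::SequentiallyConsistent};
  return mergeOperands(mmos);
}
```

With this shape, a workgroup seq_cst operand merged with an agent unordered operand yields agent scope with seq_cst ordering, which is the combination the MIR test in this revision exercises.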

Why does this need to handle multiple mem operands? The only instructions where that really makes sense are for ds_read2_b32/ds_write2_b32 which this doesn't need to do anything with

In D37397#859430, @arsenm wrote:

Why does this need to handle multiple mem operands? The only instructions where that really makes sense are for ds_read2_b32/ds_write2_b32 which this doesn't need to do anything with

So what are the rules for when MMOs are allowed and will be preserved? For example, is it guaranteed that no memory operation that originates from an atomic LLVM IR instruction will be combined with another memory instruction, resulting in multiple MMOs? Technically the memory model does allow this in some cases, but are we promising that it will never happen?

Would it be possible to require that every memory operation that was atomic has an MMO that must be preserved through all optimizations? The current default of treating memory instructions without any MMO as atomic seems uncomfortable, and it is fragile when multiple MMOs are present, since no check is made that the MMOs cover the entire set of memory locations. Should memory instructions have a property to indicate whether they are atomic, so that the MMO could then provide additional information about what kind of atomic?

Since instructions can have multiple MMOs, can it be required that any instruction with multiple MMOs does not originate from an atomic instruction?

It would be helpful to document the rules on when MMOs are allowed/required.

In D37397#859455, @t-tye wrote:

In D37397#859430, @arsenm wrote:

Why does this need to handle multiple mem operands? The only instructions where that really makes sense are for ds_read2_b32/ds_write2_b32 which this doesn't need to do anything with

So what are the rules for when MMOs are allowed and will be preserved? For example, is it guaranteed that no memory operation that originates from an atomic LLVM IR instruction will be combined with another memory instruction, resulting in multiple MMOs? Technically the memory model does allow this in some cases, but are we promising that it will never happen?

Would it be possible to require that every memory operation that was atomic has an MMO that must be preserved through all optimizations? The current default of treating memory instructions without any MMO as atomic seems uncomfortable, and it is fragile when multiple MMOs are present, since no check is made that the MMOs cover the entire set of memory locations. Should memory instructions have a property to indicate whether they are atomic, so that the MMO could then provide additional information about what kind of atomic?

Since instructions can have multiple MMOs, can it be required that any instruction with multiple MMOs does not originate from an atomic instruction?

It would be helpful to document the rules on when MMOs are allowed/required.

Combining multiple MMOs that refer to the same location is probably just broken. We don't really have instructions that can access multiple addresses at the same time (except for the ds read2/write2 and, I suppose, the direct load to LDS buffer mode). It doesn't really make sense to merge atomic operations. I could imagine an architecture with an atomic load whose addressing modes span multiple operands, though.

kzhuravl mentioned this in D37396: AMDGPU: Cleanup/refactor SIMemoryLegalizer [3]:. Sep 5 2017, 9:44 AM

Address review feedback.

lib/Target/AMDGPU/AMDGPUMachineModuleInfo.h
51 ↗	(On Diff #113586)	Discussed a different approach offline.
lib/Target/AMDGPU/SIMemoryLegalizer.cpp
204 ↗	(On Diff #113586)	Discussed offline: this should be in the validator. I have added a comment.
227–230 ↗	(On Diff #113586)	Discussed offline.

LGTM

Should there be an MIR test with multiple MMOs?

This revision is now accepted and ready to land. Sep 5 2017, 5:40 PM

Changes discussed offline:

ErrorOr -> Optional
Minor renaming

kzhuravl mentioned this in D36862: AMDGPU: Handle non-temporal loads and stores. Sep 6 2017, 12:36 PM

LGTM with one suggested name change.

lib/Target/AMDGPU/SIMemoryLegalizer.cpp
190 ↗	(On Diff #114050)	Suggest renaming to constructFromMIWithMMO, since it can only be used when the instruction has MMOs.

Closed by commit rL312725: AMDGPU: Handle more than one memory operand in SIMemoryLegalizer (authored by kzhuravl). Sep 7 2017, 9:15 AM

This revision was automatically updated to reflect the committed changes.

kzhuravl marked an inline comment as done.

Revision Contents

Path	Size
llvm/trunk/lib/Target/AMDGPU/AMDGPUMachineModuleInfo.h	40 lines
llvm/trunk/lib/Target/AMDGPU/SIMemoryLegalizer.cpp	163 lines
llvm/trunk/test/CodeGen/MIR/AMDGPU/memory-legalizer-multiple-mem-operands-atomics.mir	163 lines

Diff 114189

llvm/trunk/lib/Target/AMDGPU/AMDGPUMachineModuleInfo.h

	[10 lines not shown]
	/// \brief AMDGPU Machine Module Info.			/// \brief AMDGPU Machine Module Info.
	///			///
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	#ifndef LLVM_LIB_TARGET_AMDGPU_AMDGPUMACHINEMODULEINFO_H			#ifndef LLVM_LIB_TARGET_AMDGPU_AMDGPUMACHINEMODULEINFO_H
	#define LLVM_LIB_TARGET_AMDGPU_AMDGPUMACHINEMODULEINFO_H			#define LLVM_LIB_TARGET_AMDGPU_AMDGPUMACHINEMODULEINFO_H

				#include "llvm/ADT/None.h"
				#include "llvm/ADT/Optional.h"
	#include "llvm/CodeGen/MachineModuleInfo.h"			#include "llvm/CodeGen/MachineModuleInfo.h"
	#include "llvm/CodeGen/MachineModuleInfoImpls.h"			#include "llvm/CodeGen/MachineModuleInfoImpls.h"
	#include "llvm/IR/LLVMContext.h"			#include "llvm/IR/LLVMContext.h"

	namespace llvm {			namespace llvm {

	class AMDGPUMachineModuleInfo final : public MachineModuleInfoELF {			class AMDGPUMachineModuleInfo final : public MachineModuleInfoELF {
	private:			private:

	// All supported memory/synchronization scopes can be found here:			// All supported memory/synchronization scopes can be found here:
	// http://llvm.org/docs/AMDGPUUsage.html#memory-scopes			// http://llvm.org/docs/AMDGPUUsage.html#memory-scopes

	/// \brief Agent synchronization scope ID.			/// \brief Agent synchronization scope ID.
	SyncScope::ID AgentSSID;			SyncScope::ID AgentSSID;
	/// \brief Workgroup synchronization scope ID.			/// \brief Workgroup synchronization scope ID.
	SyncScope::ID WorkgroupSSID;			SyncScope::ID WorkgroupSSID;
	/// \brief Wavefront synchronization scope ID.			/// \brief Wavefront synchronization scope ID.
	SyncScope::ID WavefrontSSID;			SyncScope::ID WavefrontSSID;

				/// \brief In AMDGPU target synchronization scopes are inclusive, meaning a
				/// larger synchronization scope is inclusive of a smaller synchronization
				/// scope.
				///
				/// \returns \p SSID's inclusion ordering, or "None" if \p SSID is not
				/// supported by the AMDGPU target.
				Optional<uint8_t> getSyncScopeInclusionOrdering(SyncScope::ID SSID) const {
				if (SSID == SyncScope::SingleThread)
				return 0;
				else if (SSID == getWavefrontSSID())
				return 1;
				else if (SSID == getWorkgroupSSID())
				return 2;
				else if (SSID == getAgentSSID())
				return 3;
				else if (SSID == SyncScope::System)
				return 4;

				return None;
				}

	public:			public:
	AMDGPUMachineModuleInfo(const MachineModuleInfo &MMI);			AMDGPUMachineModuleInfo(const MachineModuleInfo &MMI);

	/// \returns Agent synchronization scope ID.			/// \returns Agent synchronization scope ID.
	SyncScope::ID getAgentSSID() const {			SyncScope::ID getAgentSSID() const {
	return AgentSSID;			return AgentSSID;
	}			}
	/// \returns Workgroup synchronization scope ID.			/// \returns Workgroup synchronization scope ID.
	SyncScope::ID getWorkgroupSSID() const {			SyncScope::ID getWorkgroupSSID() const {
	return WorkgroupSSID;			return WorkgroupSSID;
	}			}
	/// \returns Wavefront synchronization scope ID.			/// \returns Wavefront synchronization scope ID.
	SyncScope::ID getWavefrontSSID() const {			SyncScope::ID getWavefrontSSID() const {
	return WavefrontSSID;			return WavefrontSSID;
	}			}

				/// \brief In AMDGPU target synchronization scopes are inclusive, meaning a
				/// larger synchronization scope is inclusive of a smaller synchronization
				/// scope.
				///
				/// \returns True if synchronization scope \p A is larger than or equal to
				/// synchronization scope \p B, false if synchronization scope \p A is smaller
				/// than synchronization scope \p B, or "None" if either synchronization scope
				/// \p A or \p B is not supported by the AMDGPU target.
				Optional<bool> isSyncScopeInclusion(SyncScope::ID A, SyncScope::ID B) const {
				const auto &AIO = getSyncScopeInclusionOrdering(A);
				const auto &BIO = getSyncScopeInclusionOrdering(B);
				if (!AIO || !BIO)
				return None;

				return AIO.getValue() > BIO.getValue();
				}
	};			};

	} // end namespace llvm			} // end namespace llvm

	#endif // LLVM_LIB_TARGET_AMDGPU_AMDGPUMACHINEMODULEINFO_H			#endif // LLVM_LIB_TARGET_AMDGPU_AMDGPUMACHINEMODULEINFO_H
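
The inclusion ordering added to this header can be tried outside LLVM. The sketch below is only an illustration under assumptions: it uses C++17 `std::optional` in place of `llvm::Optional`, and `Scope`, `inclusionOrdering`, `isScopeInclusion` and `widerScope` are hypothetical names. It follows the doc comment ("larger than or equal to"), whereas the diff as shown compares with `>`. For the merge loop in `constructFromMIWithMMO` the two give the same result, because equal scopes are interchangeable.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical stand-in for SyncScope::ID; Unknown models a scope the target
// does not support.
enum class Scope { SingleThread, Wavefront, Workgroup, Agent, System, Unknown };

// Position of a scope in the inclusion ordering, or nullopt if unsupported.
std::optional<std::uint8_t> inclusionOrdering(Scope s) {
  switch (s) {
  case Scope::SingleThread: return std::uint8_t{0};
  case Scope::Wavefront:    return std::uint8_t{1};
  case Scope::Workgroup:    return std::uint8_t{2};
  case Scope::Agent:        return std::uint8_t{3};
  case Scope::System:       return std::uint8_t{4};
  default:                  return std::nullopt;
  }
}

// True if scope a is at least as wide as scope b, false if it is narrower,
// nullopt if either scope is unsupported.
std::optional<bool> isScopeInclusion(Scope a, Scope b) {
  const std::optional<std::uint8_t> ao = inclusionOrdering(a);
  const std::optional<std::uint8_t> bo = inclusionOrdering(b);
  if (!ao || !bo)
    return std::nullopt;
  return *ao >= *bo;
}

// Picks the wider of two supported scopes, as the merge loop does per operand.
Scope widerScope(Scope a, Scope b) {
  const std::optional<bool> aIncludesB = isScopeInclusion(a, b);
  assert(aIncludesB.has_value() && "both scopes must be supported");
  return *aIncludesB ? a : b;
}
```

Returning an empty optional rather than asserting lets the caller decide how to report an unsupported scope, which matches how the legalizer emits a diagnostic and bails out.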

llvm/trunk/lib/Target/AMDGPU/SIMemoryLegalizer.cpp

[54 lines not shown]	private:

SIMemOpInfo(SyncScope::ID SSID, AtomicOrdering Ordering)		SIMemOpInfo(SyncScope::ID SSID, AtomicOrdering Ordering)
: SSID(SSID), Ordering(Ordering) {}		: SSID(SSID), Ordering(Ordering) {}

SIMemOpInfo(SyncScope::ID SSID, AtomicOrdering Ordering,		SIMemOpInfo(SyncScope::ID SSID, AtomicOrdering Ordering,
AtomicOrdering FailureOrdering)		AtomicOrdering FailureOrdering)
: SSID(SSID), Ordering(Ordering), FailureOrdering(FailureOrdering) {}		: SSID(SSID), Ordering(Ordering), FailureOrdering(FailureOrdering) {}

		/// \returns Info constructed from \p MI, which has at least machine memory
		/// operand.
		static Optional<SIMemOpInfo> constructFromMIWithMMO(
		const MachineBasicBlock::iterator &MI);

public:		public:
/// \returns Synchronization scope ID of the machine instruction used to		/// \returns Synchronization scope ID of the machine instruction used to
/// create this SIMemOpInfo.		/// create this SIMemOpInfo.
SyncScope::ID getSSID() const {		SyncScope::ID getSSID() const {
return SSID;		return SSID;
}		}
/// \returns Ordering constraint of the machine instruction used to		/// \returns Ordering constraint of the machine instruction used to
/// create this SIMemOpInfo.		/// create this SIMemOpInfo.
[25 lines not shown]	public:
/// \returns Atomic cmpxchg info if \p MI is an atomic cmpxchg operation,		/// \returns Atomic cmpxchg info if \p MI is an atomic cmpxchg operation,
/// "None" otherwise.		/// "None" otherwise.
static Optional<SIMemOpInfo> getAtomicCmpxchgInfo(		static Optional<SIMemOpInfo> getAtomicCmpxchgInfo(
const MachineBasicBlock::iterator &MI);		const MachineBasicBlock::iterator &MI);
/// \returns Atomic rmw info if \p MI is an atomic rmw operation,		/// \returns Atomic rmw info if \p MI is an atomic rmw operation,
/// "None" otherwise.		/// "None" otherwise.
static Optional<SIMemOpInfo> getAtomicRmwInfo(		static Optional<SIMemOpInfo> getAtomicRmwInfo(
const MachineBasicBlock::iterator &MI);		const MachineBasicBlock::iterator &MI);

		/// \brief Reports unknown synchronization scope used in \p MI to LLVM
		/// context.
		static void reportUnknownSyncScope(
		const MachineBasicBlock::iterator &MI);
};		};

class SIMemoryLegalizer final : public MachineFunctionPass {		class SIMemoryLegalizer final : public MachineFunctionPass {
private:		private:
/// \brief LLVM context.
LLVMContext *CTX = nullptr;

/// \brief Machine module info.		/// \brief Machine module info.
const AMDGPUMachineModuleInfo *MMI = nullptr;		const AMDGPUMachineModuleInfo *MMI = nullptr;

/// \brief Instruction info.		/// \brief Instruction info.
const SIInstrInfo *TII = nullptr;		const SIInstrInfo *TII = nullptr;

/// \brief Immediate for "vmcnt(0)".		/// \brief Immediate for "vmcnt(0)".
unsigned Vmcnt0Immediate = 0;		unsigned Vmcnt0Immediate = 0;
[16 lines not shown]	private:
/// \brief Sets GLC bit if present in \p MI. Returns true if \p MI is		/// \brief Sets GLC bit if present in \p MI. Returns true if \p MI is
/// modified, false otherwise.		/// modified, false otherwise.
bool setGLC(const MachineBasicBlock::iterator &MI) const;		bool setGLC(const MachineBasicBlock::iterator &MI) const;

/// \brief Removes all processed atomic pseudo instructions from the current		/// \brief Removes all processed atomic pseudo instructions from the current
/// function. Returns true if current function is modified, false otherwise.		/// function. Returns true if current function is modified, false otherwise.
bool removeAtomicPseudoMIs();		bool removeAtomicPseudoMIs();

/// \brief Reports unknown synchronization scope used in \p MI to LLVM
/// context.
void reportUnknownSynchScope(const MachineBasicBlock::iterator &MI);

/// \brief Expands load operation \p MI. Returns true if instructions are		/// \brief Expands load operation \p MI. Returns true if instructions are
/// added/deleted or \p MI is modified, false otherwise.		/// added/deleted or \p MI is modified, false otherwise.
bool expandLoad(const SIMemOpInfo &MOI,		bool expandLoad(const SIMemOpInfo &MOI,
MachineBasicBlock::iterator &MI);		MachineBasicBlock::iterator &MI);
/// \brief Expands store operation \p MI. Returns true if instructions are		/// \brief Expands store operation \p MI. Returns true if instructions are
/// added/deleted or \p MI is modified, false otherwise.		/// added/deleted or \p MI is modified, false otherwise.
bool expandStore(const SIMemOpInfo &MOI,		bool expandStore(const SIMemOpInfo &MOI,
MachineBasicBlock::iterator &MI);		MachineBasicBlock::iterator &MI);
[25 lines not shown]	public:
}		}

bool runOnMachineFunction(MachineFunction &MF) override;		bool runOnMachineFunction(MachineFunction &MF) override;
};		};

} // end namespace anonymous		} // end namespace anonymous

/* static */		/* static */
		Optional<SIMemOpInfo> SIMemOpInfo::constructFromMIWithMMO(
		const MachineBasicBlock::iterator &MI) {
		assert(MI->getNumMemOperands() > 0);

		const MachineFunction *MF = MI->getParent()->getParent();
		const AMDGPUMachineModuleInfo *MMI =
		&MF->getMMI().getObjFileInfo<AMDGPUMachineModuleInfo>();

		SyncScope::ID SSID = SyncScope::SingleThread;
		AtomicOrdering Ordering = AtomicOrdering::NotAtomic;
		AtomicOrdering FailureOrdering = AtomicOrdering::NotAtomic;

		// Validator should check whether or not MMOs cover the entire set of
		// locations accessed by the memory instruction.
		for (const auto &MMO : MI->memoperands()) {
		const auto &IsSyncScopeInclusion =
		MMI->isSyncScopeInclusion(SSID, MMO->getSyncScopeID());
		if (!IsSyncScopeInclusion) {
		reportUnknownSyncScope(MI);
		return None;
		}

		SSID = IsSyncScopeInclusion.getValue() ? SSID : MMO->getSyncScopeID();
		Ordering =
		isStrongerThan(Ordering, MMO->getOrdering()) ?
		Ordering : MMO->getOrdering();
		FailureOrdering =
		isStrongerThan(FailureOrdering, MMO->getFailureOrdering()) ?
		FailureOrdering : MMO->getFailureOrdering();
		}

		return SIMemOpInfo(SSID, Ordering, FailureOrdering);
		}

		/* static */
Optional<SIMemOpInfo> SIMemOpInfo::getLoadInfo(		Optional<SIMemOpInfo> SIMemOpInfo::getLoadInfo(
const MachineBasicBlock::iterator &MI) {		const MachineBasicBlock::iterator &MI) {
assert(MI->getDesc().TSFlags & SIInstrFlags::maybeAtomic);		assert(MI->getDesc().TSFlags & SIInstrFlags::maybeAtomic);

if (!(MI->mayLoad() && !MI->mayStore()))		if (!(MI->mayLoad() && !MI->mayStore()))
return None;		return None;
if (!MI->hasOneMemOperand())
		// Be conservative if there are no memory operands.
		if (MI->getNumMemOperands() == 0)
return SIMemOpInfo(SyncScope::System,		return SIMemOpInfo(SyncScope::System,
AtomicOrdering::SequentiallyConsistent);		AtomicOrdering::SequentiallyConsistent);

const MachineMemOperand MMO = MI->memoperands_begin();		return SIMemOpInfo::constructFromMIWithMMO(MI);
return SIMemOpInfo(MMO->getSyncScopeID(), MMO->getOrdering());
}		}

/* static */		/* static */
Optional<SIMemOpInfo> SIMemOpInfo::getStoreInfo(		Optional<SIMemOpInfo> SIMemOpInfo::getStoreInfo(
const MachineBasicBlock::iterator &MI) {		const MachineBasicBlock::iterator &MI) {
assert(MI->getDesc().TSFlags & SIInstrFlags::maybeAtomic);		assert(MI->getDesc().TSFlags & SIInstrFlags::maybeAtomic);

if (!(!MI->mayLoad() && MI->mayStore()))		if (!(!MI->mayLoad() && MI->mayStore()))
return None;		return None;
if (!MI->hasOneMemOperand())
		// Be conservative if there are no memory operands.
		if (MI->getNumMemOperands() == 0)
return SIMemOpInfo(SyncScope::System,		return SIMemOpInfo(SyncScope::System,
AtomicOrdering::SequentiallyConsistent);		AtomicOrdering::SequentiallyConsistent);

const MachineMemOperand MMO = MI->memoperands_begin();		return SIMemOpInfo::constructFromMIWithMMO(MI);
return SIMemOpInfo(MMO->getSyncScopeID(), MMO->getOrdering());
}		}

/* static */		/* static */
Optional<SIMemOpInfo> SIMemOpInfo::getAtomicFenceInfo(		Optional<SIMemOpInfo> SIMemOpInfo::getAtomicFenceInfo(
const MachineBasicBlock::iterator &MI) {		const MachineBasicBlock::iterator &MI) {
assert(MI->getDesc().TSFlags & SIInstrFlags::maybeAtomic);		assert(MI->getDesc().TSFlags & SIInstrFlags::maybeAtomic);

if (MI->getOpcode() != AMDGPU::ATOMIC_FENCE)		if (MI->getOpcode() != AMDGPU::ATOMIC_FENCE)
return None;		return None;

SyncScope::ID SSID =		SyncScope::ID SSID =
static_cast<SyncScope::ID>(MI->getOperand(1).getImm());		static_cast<SyncScope::ID>(MI->getOperand(1).getImm());
AtomicOrdering Ordering =		AtomicOrdering Ordering =
static_cast<AtomicOrdering>(MI->getOperand(0).getImm());		static_cast<AtomicOrdering>(MI->getOperand(0).getImm());
return SIMemOpInfo(SSID, Ordering);		return SIMemOpInfo(SSID, Ordering);
}		}

/* static */		/* static */
Optional<SIMemOpInfo> SIMemOpInfo::getAtomicCmpxchgInfo(		Optional<SIMemOpInfo> SIMemOpInfo::getAtomicCmpxchgInfo(
const MachineBasicBlock::iterator &MI) {		const MachineBasicBlock::iterator &MI) {
assert(MI->getDesc().TSFlags & SIInstrFlags::maybeAtomic);		assert(MI->getDesc().TSFlags & SIInstrFlags::maybeAtomic);

if (!(MI->mayLoad() && MI->mayStore()))		if (!(MI->mayLoad() && MI->mayStore()))
return None;		return None;
if (!MI->hasOneMemOperand())
		// Be conservative if there are no memory operands.
		if (MI->getNumMemOperands() == 0)
return SIMemOpInfo(SyncScope::System,		return SIMemOpInfo(SyncScope::System,
AtomicOrdering::SequentiallyConsistent,		AtomicOrdering::SequentiallyConsistent,
AtomicOrdering::SequentiallyConsistent);		AtomicOrdering::SequentiallyConsistent);

const MachineMemOperand MMO = MI->memoperands_begin();		return SIMemOpInfo::constructFromMIWithMMO(MI);
return SIMemOpInfo(MMO->getSyncScopeID(), MMO->getOrdering(),
MMO->getFailureOrdering());
}		}

/* static */		/* static */
Optional<SIMemOpInfo> SIMemOpInfo::getAtomicRmwInfo(		Optional<SIMemOpInfo> SIMemOpInfo::getAtomicRmwInfo(
const MachineBasicBlock::iterator &MI) {		const MachineBasicBlock::iterator &MI) {
assert(MI->getDesc().TSFlags & SIInstrFlags::maybeAtomic);		assert(MI->getDesc().TSFlags & SIInstrFlags::maybeAtomic);

if (!(MI->mayLoad() && MI->mayStore()))		if (!(MI->mayLoad() && MI->mayStore()))
return None;		return None;
if (!MI->hasOneMemOperand())
		// Be conservative if there are no memory operands.
		if (MI->getNumMemOperands() == 0)
return SIMemOpInfo(SyncScope::System,		return SIMemOpInfo(SyncScope::System,
AtomicOrdering::SequentiallyConsistent);		AtomicOrdering::SequentiallyConsistent);

const MachineMemOperand MMO = MI->memoperands_begin();		return SIMemOpInfo::constructFromMIWithMMO(MI);
return SIMemOpInfo(MMO->getSyncScopeID(), MMO->getOrdering());		}

		/* static */
		void SIMemOpInfo::reportUnknownSyncScope(
		const MachineBasicBlock::iterator &MI) {
		DiagnosticInfoUnsupported Diag(*MI->getParent()->getParent()->getFunction(),
		"Unsupported synchronization scope");
		LLVMContext *CTX = &MI->getParent()->getParent()->getFunction()->getContext();
		CTX->diagnose(Diag);
}		}

bool SIMemoryLegalizer::insertBufferWbinvl1Vol(MachineBasicBlock::iterator &MI,		bool SIMemoryLegalizer::insertBufferWbinvl1Vol(MachineBasicBlock::iterator &MI,
bool Before) const {		bool Before) const {
MachineBasicBlock &MBB = *MI->getParent();		MachineBasicBlock &MBB = *MI->getParent();
DebugLoc DL = MI->getDebugLoc();		DebugLoc DL = MI->getDebugLoc();

if (!Before)		if (!Before)
[42 lines not shown]	bool SIMemoryLegalizer::removeAtomicPseudoMIs() {

for (auto &MI : AtomicPseudoMIs)		for (auto &MI : AtomicPseudoMIs)
MI->eraseFromParent();		MI->eraseFromParent();

AtomicPseudoMIs.clear();		AtomicPseudoMIs.clear();
return true;		return true;
}		}

void SIMemoryLegalizer::reportUnknownSynchScope(
const MachineBasicBlock::iterator &MI) {
DiagnosticInfoUnsupported Diag(*MI->getParent()->getParent()->getFunction(),
"Unsupported synchronization scope");
CTX->diagnose(Diag);
}

bool SIMemoryLegalizer::expandLoad(const SIMemOpInfo &MOI,		bool SIMemoryLegalizer::expandLoad(const SIMemOpInfo &MOI,
MachineBasicBlock::iterator &MI) {		MachineBasicBlock::iterator &MI) {
assert(MI->mayLoad() && !MI->mayStore());		assert(MI->mayLoad() && !MI->mayStore());

bool Changed = false;		bool Changed = false;

if (MOI.isAtomic()) {		if (MOI.isAtomic()) {
if (MOI.getSSID() == SyncScope::System ||		if (MOI.getSSID() == SyncScope::System ||
MOI.getSSID() == MMI->getAgentSSID()) {		MOI.getSSID() == MMI->getAgentSSID()) {
if (MOI.getOrdering() == AtomicOrdering::Acquire ||		if (MOI.getOrdering() == AtomicOrdering::Acquire ||
MOI.getOrdering() == AtomicOrdering::SequentiallyConsistent)		MOI.getOrdering() == AtomicOrdering::SequentiallyConsistent)
Changed |= setGLC(MI);		Changed |= setGLC(MI);

if (MOI.getOrdering() == AtomicOrdering::SequentiallyConsistent)		if (MOI.getOrdering() == AtomicOrdering::SequentiallyConsistent)
Changed |= insertWaitcntVmcnt0(MI);		Changed |= insertWaitcntVmcnt0(MI);

if (MOI.getOrdering() == AtomicOrdering::Acquire ||		if (MOI.getOrdering() == AtomicOrdering::Acquire ||
MOI.getOrdering() == AtomicOrdering::SequentiallyConsistent) {		MOI.getOrdering() == AtomicOrdering::SequentiallyConsistent) {
Changed |= insertWaitcntVmcnt0(MI, false);		Changed |= insertWaitcntVmcnt0(MI, false);
Changed |= insertBufferWbinvl1Vol(MI, false);		Changed |= insertBufferWbinvl1Vol(MI, false);
}		}

return Changed;		return Changed;
} else if (MOI.getSSID() == SyncScope::SingleThread ||		}

		if (MOI.getSSID() == SyncScope::SingleThread ||
MOI.getSSID() == MMI->getWorkgroupSSID() ||		MOI.getSSID() == MMI->getWorkgroupSSID() ||
MOI.getSSID() == MMI->getWavefrontSSID()) {		MOI.getSSID() == MMI->getWavefrontSSID()) {
return Changed;		return Changed;
} else {
reportUnknownSynchScope(MI);
return Changed;
}		}

		llvm_unreachable("Unsupported synchronization scope");
}		}

return Changed;		return Changed;
}		}
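
The decision structure of `expandLoad` above can be summarized without any LLVM types. This is a rough sketch, assuming that "atomic" means an ordering other than not-atomic; `Scope`, `Ordering` and `planAtomicLoad` are hypothetical names, and the returned strings merely label the helper calls that appear in the diff.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for SyncScope::ID and the load-relevant orderings.
enum class Scope { SingleThread, Wavefront, Workgroup, Agent, System };
enum class Ordering { NotAtomic, Unordered, Monotonic, Acquire,
                      SequentiallyConsistent };

// Lists, in order, the changes made around an atomic load for a given merged
// scope and ordering. Narrow scopes and non-atomic loads need no changes.
std::vector<std::string> planAtomicLoad(Scope scope, Ordering ordering) {
  std::vector<std::string> steps;
  if (ordering == Ordering::NotAtomic)
    return steps;
  if (scope != Scope::System && scope != Scope::Agent)
    return steps;

  const bool isSeqCst = ordering == Ordering::SequentiallyConsistent;
  const bool acquires = isSeqCst || ordering == Ordering::Acquire;
  if (acquires)
    steps.push_back("set GLC on the load");
  if (isSeqCst)
    steps.push_back("wait for vmcnt(0) before the load");
  if (acquires) {
    steps.push_back("wait for vmcnt(0) after the load");
    steps.push_back("invalidate the cache after the load");
  }
  return steps;
}

// Usage: the merged (agent, seq_cst) case takes all four steps.
inline void exampleUsage() {
  const std::vector<std::string> steps =
      planAtomicLoad(Scope::Agent, Ordering::SequentiallyConsistent);
  assert(steps.size() == 4);
}
```

The seq_cst agent-scope case corresponds to the wait, load, wait, invalidate sequence that the MIR test in this revision checks for.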

bool SIMemoryLegalizer::expandStore(const SIMemOpInfo &MOI,		bool SIMemoryLegalizer::expandStore(const SIMemOpInfo &MOI,
MachineBasicBlock::iterator &MI) {		MachineBasicBlock::iterator &MI) {
assert(!MI->mayLoad() && MI->mayStore());		assert(!MI->mayLoad() && MI->mayStore());

bool Changed = false;		bool Changed = false;

if (MOI.isAtomic()) {		if (MOI.isAtomic()) {
if (MOI.getSSID() == SyncScope::System ||		if (MOI.getSSID() == SyncScope::System ||
MOI.getSSID() == MMI->getAgentSSID()) {		MOI.getSSID() == MMI->getAgentSSID()) {
if (MOI.getOrdering() == AtomicOrdering::Release ||		if (MOI.getOrdering() == AtomicOrdering::Release ||
MOI.getOrdering() == AtomicOrdering::SequentiallyConsistent)		MOI.getOrdering() == AtomicOrdering::SequentiallyConsistent)
Changed |= insertWaitcntVmcnt0(MI);		Changed |= insertWaitcntVmcnt0(MI);

return Changed;		return Changed;
} else if (MOI.getSSID() == SyncScope::SingleThread ||		}

		if (MOI.getSSID() == SyncScope::SingleThread ||
MOI.getSSID() == MMI->getWorkgroupSSID() ||		MOI.getSSID() == MMI->getWorkgroupSSID() ||
MOI.getSSID() == MMI->getWavefrontSSID()) {		MOI.getSSID() == MMI->getWavefrontSSID()) {
return Changed;		return Changed;
} else {
reportUnknownSynchScope(MI);
return Changed;
}		}

		llvm_unreachable("Unsupported synchronization scope");
}		}

return Changed;		return Changed;
}		}

bool SIMemoryLegalizer::expandAtomicFence(const SIMemOpInfo &MOI,		bool SIMemoryLegalizer::expandAtomicFence(const SIMemOpInfo &MOI,
MachineBasicBlock::iterator &MI) {		MachineBasicBlock::iterator &MI) {
assert(MI->getOpcode() == AMDGPU::ATOMIC_FENCE);		assert(MI->getOpcode() == AMDGPU::ATOMIC_FENCE);
[11 lines not shown]	if (MOI.getSSID() == SyncScope::System ||

if (MOI.getOrdering() == AtomicOrdering::Acquire ||		if (MOI.getOrdering() == AtomicOrdering::Acquire ||
MOI.getOrdering() == AtomicOrdering::AcquireRelease ||		MOI.getOrdering() == AtomicOrdering::AcquireRelease ||
MOI.getOrdering() == AtomicOrdering::SequentiallyConsistent)		MOI.getOrdering() == AtomicOrdering::SequentiallyConsistent)
Changed |= insertBufferWbinvl1Vol(MI);		Changed |= insertBufferWbinvl1Vol(MI);

AtomicPseudoMIs.push_back(MI);		AtomicPseudoMIs.push_back(MI);
return Changed;		return Changed;
} else if (MOI.getSSID() == SyncScope::SingleThread ||		}

		if (MOI.getSSID() == SyncScope::SingleThread ||
MOI.getSSID() == MMI->getWorkgroupSSID() ||		MOI.getSSID() == MMI->getWorkgroupSSID() ||
MOI.getSSID() == MMI->getWavefrontSSID()) {		MOI.getSSID() == MMI->getWavefrontSSID()) {
AtomicPseudoMIs.push_back(MI);		AtomicPseudoMIs.push_back(MI);
return Changed;		return Changed;
} else {
reportUnknownSynchScope(MI);
return Changed;
}		}

		SIMemOpInfo::reportUnknownSyncScope(MI);
}		}

return Changed;		return Changed;
}		}

bool SIMemoryLegalizer::expandAtomicCmpxchg(const SIMemOpInfo &MOI,		bool SIMemoryLegalizer::expandAtomicCmpxchg(const SIMemOpInfo &MOI,
MachineBasicBlock::iterator &MI) {		MachineBasicBlock::iterator &MI) {
assert(MI->mayLoad() && MI->mayStore());		assert(MI->mayLoad() && MI->mayStore());
[14 lines not shown]	if (MOI.getSSID() == SyncScope::System ||
MOI.getOrdering() == AtomicOrdering::SequentiallyConsistent ||		MOI.getOrdering() == AtomicOrdering::SequentiallyConsistent ||
MOI.getFailureOrdering() == AtomicOrdering::Acquire ||		MOI.getFailureOrdering() == AtomicOrdering::Acquire ||
MOI.getFailureOrdering() == AtomicOrdering::SequentiallyConsistent) {		MOI.getFailureOrdering() == AtomicOrdering::SequentiallyConsistent) {
Changed |= insertWaitcntVmcnt0(MI, false);		Changed |= insertWaitcntVmcnt0(MI, false);
Changed |= insertBufferWbinvl1Vol(MI, false);		Changed |= insertBufferWbinvl1Vol(MI, false);
}		}

return Changed;		return Changed;
} else if (MOI.getSSID() == SyncScope::SingleThread ||		}

		if (MOI.getSSID() == SyncScope::SingleThread ||
MOI.getSSID() == MMI->getWorkgroupSSID() ||		MOI.getSSID() == MMI->getWorkgroupSSID() ||
MOI.getSSID() == MMI->getWavefrontSSID()) {		MOI.getSSID() == MMI->getWavefrontSSID()) {
Changed |= setGLC(MI);		Changed |= setGLC(MI);
return Changed;		return Changed;
} else {
reportUnknownSynchScope(MI);
return Changed;
}		}

		llvm_unreachable("Unsupported synchronization scope");
}		}

return Changed;		return Changed;
}		}

bool SIMemoryLegalizer::expandAtomicRmw(const SIMemOpInfo &MOI,		bool SIMemoryLegalizer::expandAtomicRmw(const SIMemOpInfo &MOI,
MachineBasicBlock::iterator &MI) {		MachineBasicBlock::iterator &MI) {
assert(MI->mayLoad() && MI->mayStore());		assert(MI->mayLoad() && MI->mayStore());
[11 lines not shown]	if (MOI.getSSID() == SyncScope::System ||
if (MOI.getOrdering() == AtomicOrdering::Acquire ||		if (MOI.getOrdering() == AtomicOrdering::Acquire ||
MOI.getOrdering() == AtomicOrdering::AcquireRelease ||		MOI.getOrdering() == AtomicOrdering::AcquireRelease ||
MOI.getOrdering() == AtomicOrdering::SequentiallyConsistent) {		MOI.getOrdering() == AtomicOrdering::SequentiallyConsistent) {
Changed |= insertWaitcntVmcnt0(MI, false);		Changed |= insertWaitcntVmcnt0(MI, false);
Changed |= insertBufferWbinvl1Vol(MI, false);		Changed |= insertBufferWbinvl1Vol(MI, false);
}		}

return Changed;		return Changed;
} else if (MOI.getSSID() == SyncScope::SingleThread ||		}

		if (MOI.getSSID() == SyncScope::SingleThread ||
MOI.getSSID() == MMI->getWorkgroupSSID() ||		MOI.getSSID() == MMI->getWorkgroupSSID() ||
MOI.getSSID() == MMI->getWavefrontSSID()) {		MOI.getSSID() == MMI->getWavefrontSSID()) {
Changed |= setGLC(MI);		Changed |= setGLC(MI);
return Changed;		return Changed;
} else {
reportUnknownSynchScope(MI);
return Changed;
}		}

		llvm_unreachable("Unsupported synchronization scope");
}		}

return Changed;		return Changed;
}		}

bool SIMemoryLegalizer::runOnMachineFunction(MachineFunction &MF) {		bool SIMemoryLegalizer::runOnMachineFunction(MachineFunction &MF) {
bool Changed = false;		bool Changed = false;
const SISubtarget &ST = MF.getSubtarget<SISubtarget>();		const SISubtarget &ST = MF.getSubtarget<SISubtarget>();
const IsaInfo::IsaVersion IV = IsaInfo::getIsaVersion(ST.getFeatureBits());		const IsaInfo::IsaVersion IV = IsaInfo::getIsaVersion(ST.getFeatureBits());

CTX = &MF.getFunction()->getContext();
MMI = &MF.getMMI().getObjFileInfo<AMDGPUMachineModuleInfo>();		MMI = &MF.getMMI().getObjFileInfo<AMDGPUMachineModuleInfo>();
TII = ST.getInstrInfo();		TII = ST.getInstrInfo();

Vmcnt0Immediate =		Vmcnt0Immediate =
AMDGPU::encodeWaitcnt(IV, 0, getExpcntBitMask(IV), getLgkmcntBitMask(IV));		AMDGPU::encodeWaitcnt(IV, 0, getExpcntBitMask(IV), getLgkmcntBitMask(IV));
Wbinvl1Opcode = ST.getGeneration() <= AMDGPUSubtarget::SOUTHERN_ISLANDS ?		Wbinvl1Opcode = ST.getGeneration() <= AMDGPUSubtarget::SOUTHERN_ISLANDS ?
AMDGPU::BUFFER_WBINVL1 : AMDGPU::BUFFER_WBINVL1_VOL;		AMDGPU::BUFFER_WBINVL1 : AMDGPU::BUFFER_WBINVL1_VOL;

[30 lines not shown]
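
The `Vmcnt0Immediate` value computed in `runOnMachineFunction` can be reproduced by hand, which helps when reading the `S_WAITCNT 3952` lines in the MIR test for this revision. The sketch below is an illustration, not LLVM's `encodeWaitcnt`: `encodeWaitcntGfx8` and `vmcnt0Immediate` are hypothetical helpers, and the field layout (vmcnt in bits [3:0], expcnt in bits [6:4], lgkmcnt in bits [11:8]) is an assumption for gfx8-class targets such as the gfx803 used by the test.

```cpp
#include <cassert>
#include <cstdint>

// Packs the three wait counters into one S_WAITCNT immediate, assuming the
// gfx8-class layout: vmcnt [3:0], expcnt [6:4], lgkmcnt [11:8].
std::uint32_t encodeWaitcntGfx8(std::uint32_t vmcnt, std::uint32_t expcnt,
                                std::uint32_t lgkmcnt) {
  assert(vmcnt <= 0xF && expcnt <= 0x7 && lgkmcnt <= 0xF);
  return vmcnt | (expcnt << 4) | (lgkmcnt << 8);
}

// Waits only on vmcnt: the other two counters are set to their maximum value,
// which is the effect of passing the full bit masks for them.
std::uint32_t vmcnt0Immediate() {
  return encodeWaitcntGfx8(0, 0x7, 0xF);
}
```

Under these assumptions `vmcnt0Immediate()` evaluates to 0xF70, that is 3952, matching the immediate in the test's CHECK lines.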

llvm/trunk/test/CodeGen/MIR/AMDGPU/memory-legalizer-multiple-mem-operands-atomics.mir

				# RUN: llc -march=amdgcn -mcpu=gfx803 -run-pass si-memory-legalizer %s -o - | FileCheck %s

				--- |
				; ModuleID = 'memory-legalizer-multiple-mem-operands.ll'
				source_filename = "memory-legalizer-multiple-mem-operands.ll"
				target datalayout = "e-p:32:32-p1:64:64-p2:64:64-p3:32:32-p4:64:64-p5:32:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64"

				define amdgpu_kernel void @multiple_mem_operands(i32 addrspace(1)* %out, i32 %cond, i32 %if_offset, i32 %else_offset) #0 {
				entry:
				%scratch0 = alloca [8192 x i32]
				%scratch1 = alloca [8192 x i32]
				%scratchptr01 = bitcast [8192 x i32]* %scratch0 to i32*
				store i32 1, i32* %scratchptr01
				%scratchptr12 = bitcast [8192 x i32]* %scratch1 to i32*
				store i32 2, i32* %scratchptr12
				%cmp = icmp eq i32 %cond, 0
				br i1 %cmp, label %if, label %else, !structurizecfg.uniform !0, !amdgpu.uniform !0

				if: ; preds = %entry
				%if_ptr = getelementptr [8192 x i32], [8192 x i32]* %scratch0, i32 0, i32 %if_offset, !amdgpu.uniform !0
				%if_value = load atomic i32, i32* %if_ptr syncscope("workgroup") seq_cst, align 4
				br label %done, !structurizecfg.uniform !0

				else: ; preds = %entry
				%else_ptr = getelementptr [8192 x i32], [8192 x i32]* %scratch1, i32 0, i32 %else_offset, !amdgpu.uniform !0
				%else_value = load atomic i32, i32* %else_ptr syncscope("agent") unordered, align 4
				br label %done, !structurizecfg.uniform !0

				done: ; preds = %else, %if
				%value = phi i32 [ %if_value, %if ], [ %else_value, %else ]
				store i32 %value, i32 addrspace(1)* %out
				ret void
				}

				; Function Attrs: convergent nounwind
				declare { i1, i64 } @llvm.amdgcn.if(i1) #1

				; Function Attrs: convergent nounwind
				declare { i1, i64 } @llvm.amdgcn.else(i64) #1

				; Function Attrs: convergent nounwind readnone
				declare i64 @llvm.amdgcn.break(i64) #2

				; Function Attrs: convergent nounwind readnone
				declare i64 @llvm.amdgcn.if.break(i1, i64) #2

				; Function Attrs: convergent nounwind readnone
				declare i64 @llvm.amdgcn.else.break(i64, i64) #2

				; Function Attrs: convergent nounwind
				declare i1 @llvm.amdgcn.loop(i64) #1

				; Function Attrs: convergent nounwind
				declare void @llvm.amdgcn.end.cf(i64) #1

				attributes #0 = { "target-cpu"="gfx803" }
				attributes #1 = { convergent nounwind }
				attributes #2 = { convergent nounwind readnone }

				!0 = !{}

				...
				---

				# CHECK-LABEL: name: multiple_mem_operands

				# CHECK-LABEL: bb.3.done:
				# CHECK: S_WAITCNT 3952
				# CHECK-NEXT: BUFFER_LOAD_DWORD_OFFEN
				# CHECK-NEXT: S_WAITCNT 3952
				# CHECK-NEXT: BUFFER_WBINVL1_VOL

				name: multiple_mem_operands
				alignment: 0
				exposesReturnsTwice: false
				legalized: false
				regBankSelected: false
				selected: false
				tracksRegLiveness: true
				registers:
				liveins:
				- { reg: '%sgpr0_sgpr1', virtual-reg: '' }
				- { reg: '%sgpr3', virtual-reg: '' }
				frameInfo:
				isFrameAddressTaken: false
				isReturnAddressTaken: false
				hasStackMap: false
				hasPatchPoint: false
				stackSize: 65540
				offsetAdjustment: 0
				maxAlignment: 4
				adjustsStack: false
				hasCalls: false
				stackProtector: ''
				maxCallFrameSize: 0
				hasOpaqueSPAdjustment: false
				hasVAStart: false
				hasMustTailInVarArgFunc: false
				savePoint: ''
				restorePoint: ''
				fixedStack:
				- { id: 0, type: default, offset: 0, size: 4, alignment: 4, stack-id: 0,
				isImmutable: false, isAliased: false, callee-saved-register: '' }
				stack:
				- { id: 0, name: scratch0, type: default, offset: 4, size: 32768, alignment: 4,
				stack-id: 0, callee-saved-register: '', local-offset: 0, di-variable: '',
				di-expression: '', di-location: '' }
				- { id: 1, name: scratch1, type: default, offset: 32772, size: 32768,
				alignment: 4, stack-id: 0, callee-saved-register: '', local-offset: 32768,
				di-variable: '', di-expression: '', di-location: '' }
				constants:
				body: |
				bb.0.entry:
				successors: %bb.1.if(0x30000000), %bb.2.else(0x50000000)
				liveins: %sgpr0_sgpr1, %sgpr3

				%sgpr2 = S_LOAD_DWORD_IMM %sgpr0_sgpr1, 44, 0 :: (non-temporal dereferenceable invariant load 4 from `i32 addrspace(2)* undef`)
				%sgpr8 = S_MOV_B32 $SCRATCH_RSRC_DWORD0, implicit-def %sgpr8_sgpr9_sgpr10_sgpr11
				%sgpr4_sgpr5 = S_LOAD_DWORDX2_IMM %sgpr0_sgpr1, 36, 0 :: (non-temporal dereferenceable invariant load 8 from `i64 addrspace(2)* undef`)
				%sgpr9 = S_MOV_B32 $SCRATCH_RSRC_DWORD1, implicit-def %sgpr8_sgpr9_sgpr10_sgpr11
				%sgpr10 = S_MOV_B32 4294967295, implicit-def %sgpr8_sgpr9_sgpr10_sgpr11
				%sgpr11 = S_MOV_B32 15204352, implicit-def %sgpr8_sgpr9_sgpr10_sgpr11
				%vgpr0 = V_MOV_B32_e32 1, implicit %exec
				BUFFER_STORE_DWORD_OFFSET killed %vgpr0, %sgpr8_sgpr9_sgpr10_sgpr11, %sgpr3, 4, 0, 0, 0, implicit %exec :: (store 4 into %ir.scratchptr01)
				S_WAITCNT 127
				S_CMP_LG_U32 killed %sgpr2, 0, implicit-def %scc
				S_WAITCNT 3855
				%vgpr0 = V_MOV_B32_e32 2, implicit %exec
				%vgpr1 = V_MOV_B32_e32 32772, implicit %exec
				BUFFER_STORE_DWORD_OFFEN killed %vgpr0, killed %vgpr1, %sgpr8_sgpr9_sgpr10_sgpr11, %sgpr3, 0, 0, 0, 0, implicit %exec :: (store 4 into %ir.scratchptr12)
				S_CBRANCH_SCC0 %bb.1.if, implicit killed %scc

				bb.2.else:
				successors: %bb.3.done(0x80000000)
				liveins: %sgpr0_sgpr1, %sgpr4_sgpr5, %sgpr3, %sgpr8_sgpr9_sgpr10_sgpr11

				%sgpr0 = S_LOAD_DWORD_IMM killed %sgpr0_sgpr1, 52, 0 :: (non-temporal dereferenceable invariant load 4 from `i32 addrspace(2)* undef`)
				S_WAITCNT 3855
				%vgpr0 = V_MOV_B32_e32 32772, implicit %exec
				S_BRANCH %bb.3.done

				bb.1.if:
				successors: %bb.3.done(0x80000000)
				liveins: %sgpr0_sgpr1, %sgpr4_sgpr5, %sgpr3, %sgpr8_sgpr9_sgpr10_sgpr11

				%sgpr0 = S_LOAD_DWORD_IMM killed %sgpr0_sgpr1, 48, 0 :: (non-temporal dereferenceable invariant load 4 from `i32 addrspace(2)* undef`)
				S_WAITCNT 3855
				%vgpr0 = V_MOV_B32_e32 4, implicit %exec

				bb.3.done:
				liveins: %sgpr3, %sgpr4_sgpr5, %sgpr8_sgpr9_sgpr10_sgpr11, %vgpr0, %sgpr0

				S_WAITCNT 127
				%sgpr0 = S_LSHL_B32 killed %sgpr0, 2, implicit-def dead %scc
				%vgpr0 = V_ADD_I32_e32 killed %sgpr0, killed %vgpr0, implicit-def dead %vcc, implicit %exec
				%vgpr0 = BUFFER_LOAD_DWORD_OFFEN killed %vgpr0, killed %sgpr8_sgpr9_sgpr10_sgpr11, %sgpr3, 0, 0, 0, 0, implicit %exec :: (load syncscope("agent") unordered 4 from %ir.else_ptr), (load syncscope("workgroup") seq_cst 4 from %ir.if_ptr)
				%vgpr1 = V_MOV_B32_e32 %sgpr4, implicit %exec, implicit-def %vgpr1_vgpr2, implicit %sgpr4_sgpr5
				%vgpr2 = V_MOV_B32_e32 killed %sgpr5, implicit %exec, implicit %sgpr4_sgpr5, implicit %exec
				S_WAITCNT 3952
				FLAT_STORE_DWORD killed %vgpr1_vgpr2, killed %vgpr0, 0, 0, 0, implicit %exec, implicit %flat_scr :: (store 4 into %ir.out)
				S_ENDPGM

				...
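
The `BUFFER_LOAD_DWORD_OFFEN` in `bb.3.done` carries two memory operands (`agent`/`unordered` and `workgroup`/`seq_cst`), and the CHECK lines expect a wait before and after it plus a cache invalidate, which is consistent with treating the instruction as the widest scope and strongest ordering of its operands. The following is a rough sketch of such a merge, not the code from this revision: `Scope`, `Ordering`, `MemOpInfo`, `joinOrdering`, and `mergeMemOperands` are hypothetical stand-ins, and the enum values are simplified rather than LLVM's `SyncScope::ID` and `AtomicOrdering`. The fallback for an instruction with no memory operands follows the conservative assumption mentioned in the review (sequentially consistent, system scope).

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stand-ins, ordered so that a wider scope or stronger
// ordering has a higher value.
enum class Scope { SingleThread, Wavefront, Workgroup, Agent, System };
enum class Ordering {
  NotAtomic, Unordered, Monotonic, Acquire, Release,
  AcquireRelease, SequentiallyConsistent
};

struct MemOpInfo {
  Scope SyncScope;
  Ordering Order;
};

// Acquire and release are not comparable with each other, so their
// combination is acquire-release; every other pair is ordered by enum value.
inline Ordering joinOrdering(Ordering A, Ordering B) {
  if ((A == Ordering::Acquire && B == Ordering::Release) ||
      (A == Ordering::Release && B == Ordering::Acquire))
    return Ordering::AcquireRelease;
  return std::max(A, B);
}

// Folds all memory operands into one conservative description.
inline MemOpInfo mergeMemOperands(const std::vector<MemOpInfo> &MMOs) {
  // Assumption: with no operands, nothing is known about the access.
  if (MMOs.empty())
    return {Scope::System, Ordering::SequentiallyConsistent};

  MemOpInfo Result = MMOs.front();
  for (const MemOpInfo &MMO : MMOs) {
    Result.SyncScope = std::max(Result.SyncScope, MMO.SyncScope);
    Result.Order = joinOrdering(Result.Order, MMO.Order);
  }
  return Result;
}
```

Applied to the two operands in the test, this sketch yields `agent` scope with `seq_cst` ordering.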


AMDGPU: Handle more than one memory operand in SIMemoryLegalizer
Closed, Public


Diff 114189

llvm/trunk/lib/Target/AMDGPU/AMDGPUMachineModuleInfo.h

llvm/trunk/lib/Target/AMDGPU/SIMemoryLegalizer.cpp

llvm/trunk/test/CodeGen/MIR/AMDGPU/memory-legalizer-multiple-mem-operands-atomics.mir
