This is an archive of the discontinued LLVM Phabricator instance.

Differential D99128

[AMDGPU] Removed unnecessary cache invalidations.
Needs Review · Public

Authored by s-perron on Mar 22 2021, 5:14 PM.

Details

Reviewers

piotr
t-tye
sameerds

Summary

The SPIR-V memory barriers are translated in a very pessimistic way by
LLPC. When translating, LLPC does not know where memory will be stored, so
it will introduce a fence instruction. The fence will eventually be
turned into an instruction to invalidate the memory cache.

If the barrier was for workgroup shared memory and all workgroup shared
variables are allocated to LDS, then the L1 cache does not have to be
invalidated. However, the code will still invalidate it.

This commit modifies the SI-Insert-waitcnts pass to remove cache
invalidation instructions it can prove will not be needed. If no store
or load to memory that is cached reaches the cache invalidation instruction
without passing through another cache invalidation instruction, then it is
safe to remove.
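
The removal rule above can be illustrated on a single straight-line block. The following is a minimal sketch and not the code in this patch: the `Op` enum and `removeRedundantInvalidates` are hypothetical names, and the sketch assumes that LDS accesses never leave data in the cache (a point the review below questions for some targets). The real pass works on `MachineInstr` and also has to merge state across basic blocks.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified instruction kinds. The real pass classifies
// MachineInstr objects through wait events, not through an enum like this.
enum class Op { CachedAccess, LdsAccess, Invalidate, Other };

// Returns a copy of Block without the invalidations that no cached access
// reaches. StaleOnEntry says whether a cached access may already have
// happened on some path into the block.
std::vector<Op> removeRedundantInvalidates(const std::vector<Op> &Block,
                                           bool StaleOnEntry) {
  std::vector<Op> Result;
  bool MayBeStale = StaleOnEntry;
  for (Op I : Block) {
    if (I == Op::Invalidate) {
      if (!MayBeStale)
        continue;         // No cached access reached this point: drop it.
      MayBeStale = false; // A kept invalidation leaves the cache clean.
    } else if (I == Op::CachedAccess) {
      MayBeStale = true;  // Later invalidations are needed again.
    }
    Result.push_back(I);
  }
  assert(Result.size() <= Block.size());
  return Result;
}
```

With `StaleOnEntry` set to false, an invalidation that follows only LDS accesses is dropped, while one that follows a cached access is kept. Treating the entry state as stale is the conservative choice when nothing is known about the predecessors.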

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

	Time	Test
	80 ms	x64 windows > LLVM.CodeGen/AMDGPU::llvm.amdgcn.buffer.wbinvl1.ll
	80 ms	x64 windows > LLVM.CodeGen/AMDGPU::llvm.amdgcn.buffer.wbinvl1.sc.ll

Event Timeline

s-perron created this revision. Mar 22 2021, 5:14 PM

Herald added subscribers: kerbowa, jfb, hiraditya and 8 others. Mar 22 2021, 5:14 PM

s-perron requested review of this revision. Mar 22 2021, 5:14 PM

Herald added a project: Restricted Project. Mar 22 2021, 5:14 PM

Herald added subscribers: llvm-commits, wdng.

Harbormaster completed remote builds in B95128: Diff 332477. Mar 22 2021, 6:16 PM

s-perron added a reviewer: piotr. Mar 23 2021, 6:00 AM

s-perron edited the summary of this revision.

arsenm added reviewers: t-tye, sameerds. Mar 23 2021, 6:12 AM

arsenm added inline comments.

llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
1852–1853	Range loop?

"When translating, LLPC does know"

Did you mean "does not know"?

foad added inline comments. Mar 23 2021, 6:59 AM

llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
329	Typo "pontentially".
333	Typo "pontentially".

sameerds added inline comments. Mar 23 2021, 8:18 AM

llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
1629	This condition and the previous one need to be captured as a function with a meaningful name. Or if TableGen is involved in this enum, then perhaps a property on the instruction.

Fix tests and clean up code

Fix store instructions in tests.
Small refactoring to simplify removing instructions.

s-perron edited the summary of this revision. Mar 23 2021, 11:45 AM

Fix typos.

In D99128#2644718, @foad wrote:

"When translating, LLPC does know"

Did you mean "does not know"?

Yes that is what I meant. I've fixed it up.

t-tye requested changes to this revision. Mar 23 2021, 1:57 PM

t-tye added inline comments.

llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
92	gfx90a also has L2 cache control. See https://llvm.org/docs/AMDGPUUsage.html#memory-model-gfx90a .
199–213	Should this be a query on an instruction as the glc, et al bit can control the caches affected? There is also buffer_invl2. Should this be driven by a table gen property? Should cache writeback also be tracked? GFX90A has L2 writeback controls.
503	Invalidating is about removing stale data. Writeback is about removing dirty data. So suggest renaming these operations.
506	Also have similar for removing unnecessary writeback instructions.
729	Is GDS needed here? Seems this is only dealing with VMEM. GDS is like LDS and not part of VMEM. Should SMEM writes be considered? Some targets support that.
730–731	Should the cache bypass of the instructions be considered? When the cache is bypassed the data is not left in the cache and so does not cause the need for an invalidate. But the cache bypass can specify which caches it is bypassing. For example, performing a series of relaxed atomics at system scope would require no invalidates. Should the impact of writes be considered in deciding on the need for a cache writeback? Again there can be cache bypass, and some caches are readonly or write-through.
736	LDS has no cache control so should it be considered? Why only level 0 as loading into the L0 can also cause data to be loaded into the other caches.
798	Seems this is tracking possible stale data, not dirty data.
942	Is this correct? These instructions do not ensure the waitcnt is 0. This is also missing GFX90A cache instructions.
1736	Should hasPending be generalized to include all pending "things"? Seems merge has been generalized to merge all the "things".
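
The distinction raised in these comments (invalidation removes stale data, writeback removes dirty data) and the question about merging all pending state could be modelled roughly as follows. This is a sketch under assumptions, not code from the patch: `CacheState`, `NumLevels`, and every method name are hypothetical. It assumes a store leaves a line resident in the cache, so the level becomes both dirty and potentially stale; that does not hold for accesses that bypass a cache or for write-through caches.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical numbering: 0 = L0, 1 = L1, 2 = L2.
constexpr std::size_t NumLevels = 3;

struct CacheState {
  std::array<bool, NumLevels> Stale{}; // level may hold out-of-date lines
  std::array<bool, NumLevels> Dirty{}; // level may hold lines not written back

  void noteLoad(std::size_t Level) {
    assert(Level < NumLevels);
    Stale[Level] = true;
  }

  // Assumption: the store allocates a line, so the level is dirty and the
  // line may later become stale as well.
  void noteStore(std::size_t Level) {
    assert(Level < NumLevels);
    Stale[Level] = true;
    Dirty[Level] = true;
  }

  void invalidate(std::size_t Level) {
    assert(Level < NumLevels);
    Stale[Level] = false;
  }

  void writeback(std::size_t Level) {
    assert(Level < NumLevels);
    Dirty[Level] = false;
  }

  // True if any level still needs an invalidate or a writeback.
  bool hasPending() const {
    for (std::size_t L = 0; L < NumLevels; ++L)
      if (Stale[L] || Dirty[L])
        return true;
    return false;
  }

  // Join at a control-flow merge: a level is stale or dirty if it is so on
  // either incoming path. Returns true if this state changed.
  bool merge(const CacheState &Other) {
    bool Changed = false;
    for (std::size_t L = 0; L < NumLevels; ++L) {
      bool NewStale = Stale[L] || Other.Stale[L];
      bool NewDirty = Dirty[L] || Other.Dirty[L];
      Changed |= (NewStale != Stale[L]) || (NewDirty != Dirty[L]);
      Stale[L] = NewStale;
      Dirty[L] = NewDirty;
    }
    return Changed;
  }
};
```

Because `merge` only ever turns bits on, repeating it until it returns false terminates, which is what a fixed-point iteration over the basic blocks would rely on. Keeping `Stale` and `Dirty` separate lets an invalidate and a writeback be judged unnecessary independently.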

This revision now requires changes to proceed. Mar 23 2021, 1:57 PM

Harbormaster completed remote builds in B95315: Diff 332746. Mar 23 2021, 3:02 PM

Harbormaster completed remote builds in B95317: Diff 332748. Mar 23 2021, 3:31 PM

Fix lint issues
Use "stale" in place of "dirty" to describe the cache.

I've made some changes. I will look into some of the improvements that have been suggested.

llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
729	"Is GDS needed here?" Does a load from GDS go through the L1 cache? I was under the impression that it does, but I'm not all that familiar with the architecture. If it does not, then it is not needed. "Should SMEM writes be considered?" The SMEM_ACCESS event covers both reads and writes to SMEM, so this should already be covered. See the comment by the definition of the enum.
730–731	"Should the cache bypass of the instructions be considered?" I did not know about the cache bypass. I'll see what I can do about that. "some caches are readonly or write-through?" That could be useful. Is this something that can be easily queried? If determining the behaviour of the cache would require a lot of new code, I would prefer to make that change in a subsequent patch. The analysis would still be correct, even if it is a bit conservative. The same goes for the cache bypass.
736	I guess I misunderstood how the arch is designed. Would a read from LDS load data into the L1 cache? I was told "Another subtlety is that on gfx10 in WGP mode, the L0 cache needs to be invalidated even for LDS." I'm not being precise enough here, I should be checking which mode we are in and possibly which GFX version. I also want to ask about the export instructions. I made a conservative guess that they would require the L0 cache invalidation, but I am not sure how the output buffer is implemented.
942	I do not know. I just tried to keep this functionally the same. I could revert my change to this, and then it could be fixed or removed in a different patch. I don't want to make completely unrelated fixes in this patch.

Harbormaster completed remote builds in B95493: Diff 332994. Mar 24 2021, 1:11 PM

t-tye added inline comments. Mar 24 2021, 10:18 PM

llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
729	"Is GDS needed here?" "Does a load from GDS go through the L1 cache? I was under the impression that it does, but I'm not all that familiar with the architecture. If it does not then it is not needed." No it does not. "Should SMEM writes be considered?" "The SMEM_ACCESS event covers both reads and writes to SMEM, so this should already be covered. See the comment by the definition of the enum." Still seems SMEM reads and writes should be separate.
736	"I guess I misunderstood how the arch is designed. Would a read from LDS load data into the L1 cache? I was told 'Another subtlety is that on gfx10 in WGP mode, the L0 cache needs to be invalidated even for LDS.' I'm not being precise enough here, I should be checking which mode we are in and possibly which GFX version. I also want to ask about the export instructions. I made a conservative guess that they would require the L0 cache invalidation, but I am not sure how the output buffer is implemented." It would probably be easier to just talk through how the architecture works so you can then make updates.

kuhar added a subscriber: kuhar. Apr 21 2021, 8:16 AM

Revision Contents

Path	Size
llvm/
lib/
Target/
AMDGPU/
SIInsertWaitcnts.cpp	178 lines
test/
CodeGen/
AMDGPU/
GlobalISel/
memory-legalizer-atomic-fence.ll	23 lines
cache_invalidate.mir	229 lines
llvm.amdgcn.buffer.wbinvl1.ll	45 lines
llvm.amdgcn.buffer.wbinvl1.sc.ll	23 lines
memory-legalizer-fence.ll	536 lines

Diff 332994

llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp

//===- SIInsertWaitcnts.cpp - Insert Wait Instructions --------------------===//		//===- SIInsertWaitcnts.cpp - Insert Wait Instructions --------------------===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
//		//
/// \file		/// \file
/// Insert wait instructions for memory reads and writes.		/// Insert wait instructions for memory reads and writes.
///		///
/// Memory reads and writes are issued asynchronously, so we need to insert		/// Memory reads and writes are issued asynchronously, so we need to insert
/// S_WAITCNT instructions when we want to access any of their results or		/// S_WAITCNT instructions when we want to access any of their results or
/// overwrite any register that's used asynchronously.		/// overwrite any register that's used asynchronously.
///		///
		/// This pass will remove cache invalidation instructions if it can prove that
		/// they are unnecessary.
		///
/// TODO: This pass currently keeps one timeline per hardware counter. A more		/// TODO: This pass currently keeps one timeline per hardware counter. A more
/// finely-grained approach that keeps one timeline per event type could		/// finely-grained approach that keeps one timeline per event type could
/// sometimes get away with generating weaker s_waitcnt instructions. For		/// sometimes get away with generating weaker s_waitcnt instructions. For
/// example, when both SMEM and LDS are in flight and we need to wait for		/// example, when both SMEM and LDS are in flight and we need to wait for
/// the i-th-last LDS instruction, then an lgkmcnt(i) is actually sufficient,		/// the i-th-last LDS instruction, then an lgkmcnt(i) is actually sufficient,
/// but the pass will currently generate a conservative lgkmcnt(0) because		/// but the pass will currently generate a conservative lgkmcnt(0) because
/// multiple event types are in flight.		/// multiple event types are in flight.
//		//
[... 54 lines not shown ...]

enum InstCounterType { VM_CNT = 0, LGKM_CNT, EXP_CNT, VS_CNT, NUM_INST_CNTS };		enum InstCounterType { VM_CNT = 0, LGKM_CNT, EXP_CNT, VS_CNT, NUM_INST_CNTS };

iterator_range<enum_iterator<InstCounterType>> inst_counter_types() {		iterator_range<enum_iterator<InstCounterType>> inst_counter_types() {
return make_range(enum_iterator<InstCounterType>(VM_CNT),		return make_range(enum_iterator<InstCounterType>(VM_CNT),
enum_iterator<InstCounterType>(NUM_INST_CNTS));		enum_iterator<InstCounterType>(NUM_INST_CNTS));
}		}

		enum MemoryCacheLevel {
		MEM_CACHE_LVL_BEGIN = 0,
		MEM_CACHE_LVL_0 = MEM_CACHE_LVL_BEGIN,
		MEM_CACHE_LVL_1,
		[Inline comment, t-tye, Done] gfx90a also has L2 cache control. See https://llvm.org/docs/AMDGPUUsage.html#memory-model-gfx90a .
		MEM_CACHE_LVL_2,
		MEM_CACHE_LVL_END
		};

		iterator_range<enum_iterator<MemoryCacheLevel>> memoryCacheLevels() {
		return make_range(enum_iterator<MemoryCacheLevel>(MEM_CACHE_LVL_BEGIN),
		enum_iterator<MemoryCacheLevel>(MEM_CACHE_LVL_END));
		}

using RegInterval = std::pair<int, int>;		using RegInterval = std::pair<int, int>;

struct {		struct {
unsigned VmcntMax;		unsigned VmcntMax;
unsigned ExpcntMax;		unsigned ExpcntMax;
unsigned LgkmcntMax;		unsigned LgkmcntMax;
unsigned VscntMax;		unsigned VscntMax;
} HardwareLimits;		} HardwareLimits;
[... 81 lines not shown ...]	void addWait(AMDGPU::Waitcnt &Wait, InstCounterType T, unsigned Count) {
case VS_CNT:		case VS_CNT:
Wait.VsCnt = std::min(Wait.VsCnt, Count);		Wait.VsCnt = std::min(Wait.VsCnt, Count);
break;		break;
default:		default:
llvm_unreachable("bad InstCounterType");		llvm_unreachable("bad InstCounterType");
}		}
}		}

		bool instructionInvalidatesL1Cache(unsigned OpCode) {
		switch (OpCode) {
		case AMDGPU::BUFFER_WBINVL1:
		case AMDGPU::BUFFER_WBINVL1_SC:
		case AMDGPU::BUFFER_WBINVL1_VOL:
		case AMDGPU::BUFFER_GL1_INV:
		return true;
		default:
		return false;
		}
		}

		bool instructionInvalidatesL0Cache(unsigned OpCode) {
		return OpCode == AMDGPU::BUFFER_GL0_INV;
		}
		[Inline comment, t-tye, Not Done] Should this be a query on an instruction as the glc, et al bit can control the caches affected? There is also buffer_invl2. Should this be driven by a table gen property? Should cache writeback also be tracked? GFX90A has L2 writeback controls.

// This objects maintains the current score brackets of each wait counter, and		// This objects maintains the current score brackets of each wait counter, and
// a per-register scoreboard for each wait counter.		// a per-register scoreboard for each wait counter.
//		//
// We also maintain the latest score for every event type that can change the		// We also maintain the latest score for every event type that can change the
// waitcnt in order to know if there are multiple types of events within		// waitcnt in order to know if there are multiple types of events within
// the brackets. When multiple types of event happen in the bracket,		// the brackets. When multiple types of event happen in the bracket,
// wait count may get decreased out of order, therefore we need to put in		// wait count may get decreased out of order, therefore we need to put in
// "s_waitcnt 0" before use.		// "s_waitcnt 0" before use.
[... 58 lines not shown ...]	public:
bool simplifyWaitcnt(InstCounterType T, unsigned &Count) const;		bool simplifyWaitcnt(InstCounterType T, unsigned &Count) const;
void determineWait(InstCounterType T, unsigned ScoreToWait,		void determineWait(InstCounterType T, unsigned ScoreToWait,
AMDGPU::Waitcnt &Wait) const;		AMDGPU::Waitcnt &Wait) const;
void applyWaitcnt(const AMDGPU::Waitcnt &Wait);		void applyWaitcnt(const AMDGPU::Waitcnt &Wait);
void applyWaitcnt(InstCounterType T, unsigned Count);		void applyWaitcnt(InstCounterType T, unsigned Count);
void updateByEvent(const SIInstrInfo *TII, const SIRegisterInfo *TRI,		void updateByEvent(const SIInstrInfo *TII, const SIRegisterInfo *TRI,
const MachineRegisterInfo *MRI, WaitEventType E,		const MachineRegisterInfo *MRI, WaitEventType E,
MachineInstr &MI);		MachineInstr &MI);
		void updatePotentiallyStaleCacheByEvent(WaitEventType E);

		bool hasPending() const {
		return PendingEvents != 0 || hasPotentiallyStaleCache();
		}

bool hasPending() const { return PendingEvents != 0; }
bool hasPendingEvent(WaitEventType E) const {		bool hasPendingEvent(WaitEventType E) const {
return PendingEvents & (1 << E);		return PendingEvents & (1 << E);
}		}

bool hasMixedPendingEvents(InstCounterType T) const {		bool hasMixedPendingEvents(InstCounterType T) const {
unsigned Events = PendingEvents & WaitEventMaskForInst[T];		unsigned Events = PendingEvents & WaitEventMaskForInst[T];
// Return true if more than one bit is set in Events.		// Return true if more than one bit is set in Events.
return Events & (Events - 1);		return Events & (Events - 1);
}		}

bool hasPendingFlat() const {		bool hasPendingFlat() const {
return ((LastFlat[LGKM_CNT] > ScoreLBs[LGKM_CNT] &&		return ((LastFlat[LGKM_CNT] > ScoreLBs[LGKM_CNT] &&
LastFlat[LGKM_CNT] <= ScoreUBs[LGKM_CNT]) ||		LastFlat[LGKM_CNT] <= ScoreUBs[LGKM_CNT]) ||
(LastFlat[VM_CNT] > ScoreLBs[VM_CNT] &&		(LastFlat[VM_CNT] > ScoreLBs[VM_CNT] &&
LastFlat[VM_CNT] <= ScoreUBs[VM_CNT]));		LastFlat[VM_CNT] <= ScoreUBs[VM_CNT]));
}		}

void setPendingFlat() {		void setPendingFlat() {
LastFlat[VM_CNT] = ScoreUBs[VM_CNT];		LastFlat[VM_CNT] = ScoreUBs[VM_CNT];
LastFlat[LGKM_CNT] = ScoreUBs[LGKM_CNT];		LastFlat[LGKM_CNT] = ScoreUBs[LGKM_CNT];
}		}

		bool hasPotentiallyStaleCache() const {
		for (MemoryCacheLevel Level : memoryCacheLevels()) {
		if (hasPotentiallyStaleCacheAtLevel(Level))
		return true;
		}
		return false;
		}

		bool hasPotentiallyStaleCacheAtLevel(MemoryCacheLevel Level) const {
		return HasPotentiallyStaleCache[Level];
		}

		void setPotentiallyStaleCacheAtLevel(MemoryCacheLevel Level) {
		[Inline comment, foad, Done] Typo "pontentially".
		HasPotentiallyStaleCache[Level] = true;
		}

		void clearPotentiallyStaleCacheAtLevel(MemoryCacheLevel Level) {
		[Inline comment, foad, Done] Typo "pontentially".
		HasPotentiallyStaleCache[Level] = false;
		}

// Return true if there might be pending writes to the specified vgpr by VMEM		// Return true if there might be pending writes to the specified vgpr by VMEM
// instructions with types different from V.		// instructions with types different from V.
bool hasOtherPendingVmemTypes(int GprNo, VmemType V) const {		bool hasOtherPendingVmemTypes(int GprNo, VmemType V) const {
assert(GprNo < NUM_ALL_VGPRS);		assert(GprNo < NUM_ALL_VGPRS);
return VgprVmemTypes[GprNo] & ~(1 << V);		return VgprVmemTypes[GprNo] & ~(1 << V);
}		}

void clearVgprVmemTypes(int GprNo) {		void clearVgprVmemTypes(int GprNo) {
[... 55 lines not shown ...]	private:
int VgprUB = -1;		int VgprUB = -1;
int SgprUB = -1;		int SgprUB = -1;
unsigned VgprScores[NUM_INST_CNTS][NUM_ALL_VGPRS] = {{0}};		unsigned VgprScores[NUM_INST_CNTS][NUM_ALL_VGPRS] = {{0}};
// Wait cnt scores for every sgpr, only lgkmcnt is relevant.		// Wait cnt scores for every sgpr, only lgkmcnt is relevant.
unsigned SgprScores[SQ_MAX_PGM_SGPRS] = {0};		unsigned SgprScores[SQ_MAX_PGM_SGPRS] = {0};
// Bitmask of the VmemTypes of VMEM instructions that might have a pending		// Bitmask of the VmemTypes of VMEM instructions that might have a pending
// write to each vgpr.		// write to each vgpr.
unsigned char VgprVmemTypes[NUM_ALL_VGPRS] = {0};		unsigned char VgprVmemTypes[NUM_ALL_VGPRS] = {0};
		// Keeps track of whether or not the cache at each level is potentially stale.
		bool HasPotentiallyStaleCache[MEM_CACHE_LVL_END] = {};
};		};

class SIInsertWaitcnts : public MachineFunctionPass {		class SIInsertWaitcnts : public MachineFunctionPass {
private:		private:
const GCNSubtarget *ST = nullptr;		const GCNSubtarget *ST = nullptr;
const SIInstrInfo *TII = nullptr;		const SIInstrInfo *TII = nullptr;
const SIRegisterInfo *TRI = nullptr;		const SIRegisterInfo *TRI = nullptr;
const MachineRegisterInfo *MRI = nullptr;		const MachineRegisterInfo *MRI = nullptr;
[... 73 lines not shown ...]
#endif // NDEBUG		#endif // NDEBUG
}		}

bool mayAccessVMEMThroughFlat(const MachineInstr &MI) const;		bool mayAccessVMEMThroughFlat(const MachineInstr &MI) const;
bool mayAccessLDSThroughFlat(const MachineInstr &MI) const;		bool mayAccessLDSThroughFlat(const MachineInstr &MI) const;
bool generateWaitcntInstBefore(MachineInstr &MI,		bool generateWaitcntInstBefore(MachineInstr &MI,
WaitcntBrackets &ScoreBrackets,		WaitcntBrackets &ScoreBrackets,
MachineInstr *OldWaitcntInstr);		MachineInstr *OldWaitcntInstr);
void updateEventWaitcntAfter(MachineInstr &Inst,		void updateWaitcntBracketAfter(MachineInstr &Inst,
WaitcntBrackets *ScoreBrackets);		WaitcntBrackets *ScoreBrackets);
bool insertWaitcntInBlock(MachineFunction &MF, MachineBasicBlock &Block,		bool insertWaitcntInBlock(MachineFunction &MF, MachineBasicBlock &Block,
WaitcntBrackets &ScoreBrackets);		WaitcntBrackets &ScoreBrackets);
		void clearStaleCacheLevelsIfIsCacheInvalidationInstruction(
		[Inline comment, t-tye, Not Done] Invalidating is about removing stale data. Writeback is about removing dirty data. So suggest renaming these operations.
		MachineInstr &MI, WaitcntBrackets &Brackets);

		bool removeUnnecessaryMemoryCacheInvalidationInstructions();
		[Inline comment, t-tye, Not Done] Also have similar for removing unnecessary writeback instructions.
		bool removeIfUnnecessaryMemoryCacheInvalidationInBlock(
		MachineBasicBlock &BB, const WaitcntBrackets &Bracket);
		bool isUnnecessaryMemoryCacheInvalidation(const MachineInstr &Inst,
		WaitcntBrackets &Brackets);
};		};

} // end anonymous namespace		} // end anonymous namespace

RegInterval WaitcntBrackets::getRegInterval(const MachineInstr *MI,		RegInterval WaitcntBrackets::getRegInterval(const MachineInstr *MI,
const SIInstrInfo *TII,		const SIInstrInfo *TII,
const MachineRegisterInfo *MRI,		const MachineRegisterInfo *MRI,
const SIRegisterInfo *TRI,		const SIRegisterInfo *TRI,
[... 54 lines not shown ...]	void WaitcntBrackets::updateByEvent(const SIInstrInfo *TII,
unsigned CurrScore = getScoreUB(T) + 1;		unsigned CurrScore = getScoreUB(T) + 1;
if (CurrScore == 0)		if (CurrScore == 0)
report_fatal_error("InsertWaitcnt score wraparound");		report_fatal_error("InsertWaitcnt score wraparound");
// PendingEvents and ScoreUB need to be update regardless if this event		// PendingEvents and ScoreUB need to be update regardless if this event
// changes the score of a register or not.		// changes the score of a register or not.
// Examples including vm_cnt when buffer-store or lgkm_cnt when send-message.		// Examples including vm_cnt when buffer-store or lgkm_cnt when send-message.
PendingEvents |= 1 << E;		PendingEvents |= 1 << E;
setScoreUB(T, CurrScore);		setScoreUB(T, CurrScore);
		updatePotentiallyStaleCacheByEvent(E);

if (T == EXP_CNT) {		if (T == EXP_CNT) {
// Put score on the source vgprs. If this is a store, just use those		// Put score on the source vgprs. If this is a store, just use those
// specific register(s).		// specific register(s).
if (TII->isDS(Inst) && (Inst.mayStore() || Inst.mayLoad())) {		if (TII->isDS(Inst) && (Inst.mayStore() || Inst.mayLoad())) {
int AddrOpIdx =		int AddrOpIdx =
AMDGPU::getNamedOperandIdx(Inst.getOpcode(), AMDGPU::OpName::addr);		AMDGPU::getNamedOperandIdx(Inst.getOpcode(), AMDGPU::OpName::addr);
// All GDS operations must protect their address register (same as		// All GDS operations must protect their address register (same as
[... 125 lines not shown ...]	for (unsigned I = 0, E = Inst.getNumOperands(); I != E; ++I) {
}		}
}		}
if (TII->isDS(Inst) && Inst.mayStore()) {		if (TII->isDS(Inst) && Inst.mayStore()) {
setRegScore(SQ_MAX_PGM_VGPRS + EXTRA_VGPR_LDS, T, CurrScore);		setRegScore(SQ_MAX_PGM_VGPRS + EXTRA_VGPR_LDS, T, CurrScore);
}		}
}		}
}		}

		void WaitcntBrackets::updatePotentiallyStaleCacheByEvent(WaitEventType E) {
		switch (E) {
		case VMEM_ACCESS:
		case VMEM_READ_ACCESS:
		case VMEM_WRITE_ACCESS:
		case SMEM_ACCESS:
		case GDS_ACCESS:
		[Inline comment, t-tye, Not Done] Is GDS needed here? Seems this is only dealing with VMEM. GDS is like LDS and not part of VMEM. Should SMEM writes be considered? Some targets support that.
		[Inline comment, s-perron (author), Done] "Is GDS needed here?" Does a load from GDS go through the L1 cache? I was under the impression that it does, but I'm not all that familiar with the architecture. If it does not then it is not needed. "Should SMEM writes be considered?" The SMEM_ACCESS event covers both reads and writes to SMEM, so this should already be covered. See the comment by the definition of the enum.
		[Inline comment, t-tye, Not Done] "Is GDS needed here?" "Does a load from GDS go through the L1 cache? I was under the impression that it does, but I'm not all that familiar with the architecture. If it does not then it is not needed." No it does not. "Should SMEM writes be considered?" "The SMEM_ACCESS event covers both reads and writes to SMEM, so this should already be covered. See the comment by the definition of the enum." Still seems SMEM reads and writes should be separate.
		setPotentiallyStaleCacheAtLevel(MEM_CACHE_LVL_0);
		setPotentiallyStaleCacheAtLevel(MEM_CACHE_LVL_1);
		[Inline comment, t-tye, Not Done] Should the cache bypass of the instructions be considered? When the cache is bypassed the data is not left in the cache and so does not cause the need for an invalidate. But the cache bypass can specify which caches it is bypassing. For example, performing a series of relaxed atomics at system scope would require no invalidates. Should the impact of writes be considered in deciding on the need for a cache writeback? Again there can be cache bypass, and some caches are readonly or write-through.
		[Inline comment, s-perron (author), Done] "Should the cache bypass of the instructions be considered?" I did not know about the cache bypass. I'll see what I can do about that. "some caches are readonly or write-through?" That could be useful. Is this something that can be easily queried? If determining the behaviour of the cache would require a lot of new code, I would prefer to make that change in a subsequent patch. The analysis would still be correct, even if it is a bit conservative. The same for the cache bypass.
		break;
		case LDS_ACCESS:
		case EXP_POS_ACCESS:
		case EXP_PARAM_ACCESS:
		setPotentiallyStaleCacheAtLevel(MEM_CACHE_LVL_0);
		[Inline comment, t-tye, Not Done] LDS has no cache control so should it be considered? Why only level 0 as loading into the L0 can also cause data to be loaded into the other caches.
		[Inline comment, s-perron (author), Done] I guess I misunderstood how the arch is designed. Would a read from LDS load data into the L1 cache? I was told "Another subtlety is that on gfx10 in WGP mode, the L0 cache needs to be invalidated even for LDS." I'm not being precise enough here, I should be checking which mode we are in and possibly which GFX version. I also want to ask about the export instructions. I made a conservative guess that they would require the L0 cache invalidation, but I am not sure how the output buffer is implemented.
		[Inline comment, t-tye, Not Done] "I guess I misunderstood how the arch is designed. Would a read from LDS load data into the L1 cache? I was told 'Another subtlety is that on gfx10 in WGP mode, the L0 cache needs to be invalidated even for LDS.' I'm not being precise enough here, I should be checking which mode we are in and possibly which GFX version. I also want to ask about the export instructions. I made a conservative guess that they would require the L0 cache invalidation, but I am not sure how the output buffer is implemented." It would probably be easier to just talk through how the architecture works so you can then make updates.
		break;
		case SQ_MESSAGE:
		case EXP_GPR_LOCK:
		case GDS_GPR_LOCK:
		case VMW_GPR_LOCK:
		case NUM_WAIT_EVENTS:
		break;
		}
		}

void WaitcntBrackets::print(raw_ostream &OS) {		void WaitcntBrackets::print(raw_ostream &OS) {
OS << '\n';		OS << '\n';
for (auto T : inst_counter_types()) {		for (auto T : inst_counter_types()) {
unsigned LB = getScoreLB(T);		unsigned LB = getScoreLB(T);
unsigned UB = getScoreUB(T);		unsigned UB = getScoreUB(T);

switch (T) {		switch (T) {
case VM_CNT:		case VM_CNT:
[... 34 lines not shown ...]	if (LB < UB) {
continue;		continue;
unsigned RelScore = RegScore - LB - 1;		unsigned RelScore = RegScore - LB - 1;
OS << RelScore << ":s" << J << " ";		OS << RelScore << ":s" << J << " ";
}		}
}		}
}		}
OS << '\n';		OS << '\n';
}		}
		for (MemoryCacheLevel Level : memoryCacheLevels()) {
		OS << "Has potentially stale L" << Level
		[Inline comment, t-tye, Done] Seems this is tracking possible stale data, not dirty data.
		<< " cache: " << hasPotentiallyStaleCacheAtLevel(Level) << '\n';
		}
OS << '\n';		OS << '\n';
}		}

/// Simplify the waitcnt, in the sense of removing redundant counts, and return		/// Simplify the waitcnt, in the sense of removing redundant counts, and return
/// whether a waitcnt instruction is needed at all.		/// whether a waitcnt instruction is needed at all.
bool WaitcntBrackets::simplifyWaitcnt(AMDGPU::Waitcnt &Wait) const {		bool WaitcntBrackets::simplifyWaitcnt(AMDGPU::Waitcnt &Wait) const {
return simplifyWaitcnt(VM_CNT, Wait.VmCnt) |		return simplifyWaitcnt(VM_CNT, Wait.VmCnt) |
simplifyWaitcnt(EXP_CNT, Wait.ExpCnt) |		simplifyWaitcnt(EXP_CNT, Wait.ExpCnt) |
[... 123 lines not shown ...]	bool SIInsertWaitcnts::generateWaitcntInstBefore(

if (MI.isMetaInstruction())		if (MI.isMetaInstruction())
return false;		return false;

AMDGPU::Waitcnt Wait;		AMDGPU::Waitcnt Wait;

// See if this instruction has a forced S_WAITCNT VM.		// See if this instruction has a forced S_WAITCNT VM.
// TODO: Handle other cases of NeedsWaitcntVmBefore()		// TODO: Handle other cases of NeedsWaitcntVmBefore()
if (MI.getOpcode() == AMDGPU::BUFFER_WBINVL1 ||		if (instructionInvalidatesL0Cache(MI.getOpcode()) ||
MI.getOpcode() == AMDGPU::BUFFER_WBINVL1_SC ||		instructionInvalidatesL1Cache(MI.getOpcode())) {
MI.getOpcode() == AMDGPU::BUFFER_WBINVL1_VOL ||
MI.getOpcode() == AMDGPU::BUFFER_GL0_INV ||
MI.getOpcode() == AMDGPU::BUFFER_GL1_INV) {
Wait.VmCnt = 0;		Wait.VmCnt = 0;
		[Inline comment, t-tye, Not Done] Is this correct? These instructions do not ensure the waitcnt is 0. This is also missing GFX90A cache instructions.
		[Inline comment, s-perron (author), Done] I do not know. I just tried to keep this functionally the same. I could revert my change to this, and then it could be fixed or removed in a different patch. I don't want to make completely unrelated fixes in this patch.
}		}

// All waits must be resolved at call return.		// All waits must be resolved at call return.
// NOTE: this could be improved with knowledge of all call sites or		// NOTE: this could be improved with knowledge of all call sites or
// with knowledge of the called routines.		// with knowledge of the called routines.
if (MI.getOpcode() == AMDGPU::SI_RETURN_TO_EPILOG ||		if (MI.getOpcode() == AMDGPU::SI_RETURN_TO_EPILOG ||
MI.getOpcode() == AMDGPU::S_SETPC_B64_return ||		MI.getOpcode() == AMDGPU::S_SETPC_B64_return ||
(MI.isReturn() && MI.isCall() && !callWaitsOnFunctionEntry(MI))) {		(MI.isReturn() && MI.isCall() && !callWaitsOnFunctionEntry(MI))) {
[... 367 lines not shown ...]	for (const MachineMemOperand *Memop : MI.memoperands()) {
unsigned AS = Memop->getAddrSpace();		unsigned AS = Memop->getAddrSpace();
if (AS == AMDGPUAS::LOCAL_ADDRESS || AS == AMDGPUAS::FLAT_ADDRESS)		if (AS == AMDGPUAS::LOCAL_ADDRESS || AS == AMDGPUAS::FLAT_ADDRESS)
return true;		return true;
}		}

return false;		return false;
}		}

void SIInsertWaitcnts::updateEventWaitcntAfter(MachineInstr &Inst,		void SIInsertWaitcnts::updateWaitcntBracketAfter(
WaitcntBrackets *ScoreBrackets) {		MachineInstr &Inst, WaitcntBrackets *ScoreBrackets) {
		clearStaleCacheLevelsIfIsCacheInvalidationInstruction(Inst, *ScoreBrackets);

// Now look at the instruction opcode. If it is a memory access		// Now look at the instruction opcode. If it is a memory access
// instruction, update the upper-bound of the appropriate counter's		// instruction, update the upper-bound of the appropriate counter's
// bracket and the destination operand scores.		// bracket and the destination operand scores.
// TODO: Use the (TSFlags & SIInstrFlags::LGKM_CNT) property everywhere.		// TODO: Use the (TSFlags & SIInstrFlags::LGKM_CNT) property everywhere.
if (TII->isDS(Inst) && TII->usesLGKM_CNT(Inst)) {		if (TII->isDS(Inst) && TII->usesLGKM_CNT(Inst)) {
if (TII->isAlwaysGDS(Inst.getOpcode()) ||		if (TII->isAlwaysGDS(Inst.getOpcode()) ||
TII->hasModifiersSet(Inst, AMDGPU::OpName::gds)) {		TII->hasModifiersSet(Inst, AMDGPU::OpName::gds)) {
ScoreBrackets->updateByEvent(TII, TRI, MRI, GDS_ACCESS, Inst);		ScoreBrackets->updateByEvent(TII, TRI, MRI, GDS_ACCESS, Inst);
[... 26 lines not shown ...]	if (TII->isDS(Inst) && TII->usesLGKM_CNT(Inst)) {

// This is a flat memory operation that access both VMEM and LDS, so note it		// This is a flat memory operation that access both VMEM and LDS, so note it
// - it will require that both the VM and LGKM be flushed to zero if it is		// - it will require that both the VM and LGKM be flushed to zero if it is
// pending when a VM or LGKM dependency occurs.		// pending when a VM or LGKM dependency occurs.
if (FlatASCount > 1)		if (FlatASCount > 1)
ScoreBrackets->setPendingFlat();		ScoreBrackets->setPendingFlat();
} else if (SIInstrInfo::isVMEM(Inst) &&		} else if (SIInstrInfo::isVMEM(Inst) &&
// TODO: get a better carve out.		// TODO: get a better carve out.
Inst.getOpcode() != AMDGPU::BUFFER_WBINVL1 &&		!instructionInvalidatesL1Cache(Inst.getOpcode()) &&
Inst.getOpcode() != AMDGPU::BUFFER_WBINVL1_SC &&		!instructionInvalidatesL0Cache(Inst.getOpcode())) {
Inst.getOpcode() != AMDGPU::BUFFER_WBINVL1_VOL &&
Inst.getOpcode() != AMDGPU::BUFFER_GL0_INV &&
Inst.getOpcode() != AMDGPU::BUFFER_GL1_INV) {
if (!ST->hasVscnt())		if (!ST->hasVscnt())
ScoreBrackets->updateByEvent(TII, TRI, MRI, VMEM_ACCESS, Inst);		ScoreBrackets->updateByEvent(TII, TRI, MRI, VMEM_ACCESS, Inst);
else if ((Inst.mayLoad() && !SIInstrInfo::isAtomicNoRet(Inst)) ||		else if ((Inst.mayLoad() && !SIInstrInfo::isAtomicNoRet(Inst)) ||
/* IMAGE_GET_RESINFO / IMAGE_GET_LOD */		/* IMAGE_GET_RESINFO / IMAGE_GET_LOD */
(TII->isMIMG(Inst) && !Inst.mayLoad() && !Inst.mayStore()))		(TII->isMIMG(Inst) && !Inst.mayLoad() && !Inst.mayStore()))
ScoreBrackets->updateByEvent(TII, TRI, MRI, VMEM_READ_ACCESS, Inst);		ScoreBrackets->updateByEvent(TII, TRI, MRI, VMEM_READ_ACCESS, Inst);
else if (Inst.mayStore())		else if (Inst.mayStore())
ScoreBrackets->updateByEvent(TII, TRI, MRI, VMEM_WRITE_ACCESS, Inst);		ScoreBrackets->updateByEvent(TII, TRI, MRI, VMEM_WRITE_ACCESS, Inst);
[... 48 lines not shown ...]
///		///
/// Returns whether the merge resulted in a change that requires tighter waits		/// Returns whether the merge resulted in a change that requires tighter waits
/// (i.e. the merged brackets strictly dominate the original brackets).		/// (i.e. the merged brackets strictly dominate the original brackets).
bool WaitcntBrackets::merge(const WaitcntBrackets &Other) {		bool WaitcntBrackets::merge(const WaitcntBrackets &Other) {
bool StrictDom = false;		bool StrictDom = false;

VgprUB = std::max(VgprUB, Other.VgprUB);		VgprUB = std::max(VgprUB, Other.VgprUB);
SgprUB = std::max(SgprUB, Other.SgprUB);		SgprUB = std::max(SgprUB, Other.SgprUB);
		for (MemoryCacheLevel Level : memoryCacheLevels()) {
		HasPotentiallyStaleCache[Level] |= Other.HasPotentiallyStaleCache[Level];
		}

for (auto T : inst_counter_types()) {		for (auto T : inst_counter_types()) {
// Merge event flags for this counter		// Merge event flags for this counter
const bool OldOutOfOrder = counterOutOfOrder(T);		const bool OldOutOfOrder = counterOutOfOrder(T);
const unsigned OldEvents = PendingEvents & WaitEventMaskForInst[T];		const unsigned OldEvents = PendingEvents & WaitEventMaskForInst[T];
const unsigned OtherEvents = Other.PendingEvents & WaitEventMaskForInst[T];		const unsigned OtherEvents = Other.PendingEvents & WaitEventMaskForInst[T];
if (OtherEvents & ~OldEvents)		if (OtherEvents & ~OldEvents)
StrictDom = true;		StrictDom = true;
[... 128 lines not shown ...]	if (TII->isSMRD(Inst)) {
SLoadAddresses.insert(std::make_pair(Ptr, Inst.getParent()));		SLoadAddresses.insert(std::make_pair(Ptr, Inst.getParent()));
}		}
if (ST->hasReadVCCZBug()) {		if (ST->hasReadVCCZBug()) {
// This smem read could complete and clobber vccz at any time.		// This smem read could complete and clobber vccz at any time.
VCCZCorrect = false;		VCCZCorrect = false;
}		}
}		}

updateEventWaitcntAfter(Inst, &ScoreBrackets);		updateWaitcntBracketAfter(Inst, &ScoreBrackets);

#if 0 // TODO: implement resource type check controlled by options with ub = LB.		#if 0 // TODO: implement resource type check controlled by options with ub = LB.
// If this instruction generates a S_SETVSKIP because it is an		// If this instruction generates a S_SETVSKIP because it is an
// indexed resource, and we are on Tahiti, then it will also force		// indexed resource, and we are on Tahiti, then it will also force
// an S_WAITCNT vmcnt(0)		// an S_WAITCNT vmcnt(0)
if (RequireCheckResourceType(Inst, context)) {		if (RequireCheckResourceType(Inst, context)) {
// Force the score to as if an S_WAITCNT vmcnt(0) is emitted.		// Force the score to as if an S_WAITCNT vmcnt(0) is emitted.
ScoreBrackets->setScoreLB(VM_CNT,		ScoreBrackets->setScoreLB(VM_CNT,
[... 21 lines not shown ...]	#endif
}		}

++Iter;		++Iter;
}		}

return Modified;		return Modified;
}		}

		void SIInsertWaitcnts::clearStaleCacheLevelsIfIsCacheInvalidationInstruction(
		MachineInstr &MI, WaitcntBrackets &Brackets) {
		if (instructionInvalidatesL0Cache(MI.getOpcode()))
		Brackets.clearPotentiallyStaleCacheAtLevel(MEM_CACHE_LVL_0);

		if (instructionInvalidatesL1Cache(MI.getOpcode()))
		Brackets.clearPotentiallyStaleCacheAtLevel(MEM_CACHE_LVL_1);
		sameerds (inline comment, done): This condition and the previous one need to be captured as a function with a meaningful name. Or, if TableGen is involved in this enum, then perhaps as a property on the instruction.
		}

		bool SIInsertWaitcnts::removeUnnecessaryMemoryCacheInvalidationInstructions() {
		bool InstructionRemoved = false;
		for (auto &BlockInfo : BlockInfos) {
		WaitcntBrackets Bracket =
		(BlockInfo.second.Incoming ? *BlockInfo.second.Incoming
		: WaitcntBrackets(ST));
		InstructionRemoved |= removeIfUnnecessaryMemoryCacheInvalidationInBlock(
		*BlockInfo.first, Bracket);
		}
		return InstructionRemoved;
		}

		bool SIInsertWaitcnts::removeIfUnnecessaryMemoryCacheInvalidationInBlock(
		MachineBasicBlock &BB, const WaitcntBrackets &Bracket) {
		WaitcntBrackets LocalBracket = Bracket;
		SmallVector<MachineInstr *> InstructionsToErase;
		for (MachineInstr &Inst : BB) {
		if (isUnnecessaryMemoryCacheInvalidation(Inst, LocalBracket))
		InstructionsToErase.push_back(&Inst);
		updateWaitcntBracketAfter(Inst, &LocalBracket);
		}

		for (MachineInstr *Inst : InstructionsToErase) {
		Inst->eraseFromParent();
		}
		return !InstructionsToErase.empty();
		}

		bool SIInsertWaitcnts::isUnnecessaryMemoryCacheInvalidation(
		const MachineInstr &Inst, WaitcntBrackets &Brackets) {
		auto OpCode = Inst.getOpcode();
		if (instructionInvalidatesL0Cache(OpCode) &&
		!Brackets.hasPotentiallyStaleCacheAtLevel(MEM_CACHE_LVL_0)) {
		return true;
		} else if (instructionInvalidatesL1Cache(OpCode) &&
		!Brackets.hasPotentiallyStaleCacheAtLevel(MEM_CACHE_LVL_1)) {
		return true;
		}
		return false;
		}
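
Note for readers of this review: the rule from the summary (an invalidation is removable when no cached load or store reaches it without first passing through another invalidation) can be sketched in isolation. The block below is a minimal standalone illustration, not the patch's code. `Op`, `StaleState`, `isRedundantInvalidate`, and `pruneBlock` are hypothetical names, and treating every VMEM access as making both cache levels potentially stale is a simplifying assumption made only for this sketch.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for machine instructions; not LLVM opcodes.
enum class Op { VMemLoad, VMemStore, LdsAccess, InvL0, InvL1 };

enum CacheLevel : std::size_t { Level0 = 0, Level1 = 1, NumLevels = 2 };

// Per-program-point state: may a cache level hold data that an
// invalidation would need to discard?
struct StaleState {
  std::array<bool, NumLevels> MayBeStale{}; // all false at function entry

  bool stale(CacheLevel L) const {
    assert(L < NumLevels && "unknown cache level");
    return MayBeStale[L];
  }

  // Join at a control-flow merge: stale if stale on any incoming path.
  void merge(const StaleState &Other) {
    for (std::size_t L = 0; L < NumLevels; ++L)
      MayBeStale[L] = MayBeStale[L] || Other.MayBeStale[L];
  }

  // Transfer function for one instruction.
  void update(Op O) {
    switch (O) {
    case Op::VMemLoad:
    case Op::VMemStore:
      MayBeStale.fill(true); // assumption: cached accesses touch both levels
      break;
    case Op::InvL0:
      MayBeStale[Level0] = false;
      break;
    case Op::InvL1:
      MayBeStale[Level1] = false;
      break;
    case Op::LdsAccess:
      break; // LDS traffic does not go through these caches
    }
  }
};

// An invalidation is redundant if its level cannot be stale on entry to it.
bool isRedundantInvalidate(Op O, const StaleState &S) {
  if (O == Op::InvL0)
    return !S.stale(Level0);
  if (O == Op::InvL1)
    return !S.stale(Level1);
  return false;
}

// Walk a block from its incoming state and keep only the needed instructions.
std::vector<Op> pruneBlock(const std::vector<Op> &Block, StaleState In) {
  std::vector<Op> Kept;
  for (Op O : Block) {
    if (!isRedundantInvalidate(O, In))
      Kept.push_back(O);
    In.update(O); // the state advances even for dropped instructions
  }
  return Kept;
}
```

In this sketch, a block containing only an LDS access followed by both invalidations loses both invalidations, while a block that first performs a VMEM store keeps the first invalidation of each level. The `merge` step mirrors the bitwise-or join added to `WaitcntBrackets::merge` in this diff.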

bool SIInsertWaitcnts::runOnMachineFunction(MachineFunction &MF) {		bool SIInsertWaitcnts::runOnMachineFunction(MachineFunction &MF) {
ST = &MF.getSubtarget<GCNSubtarget>();		ST = &MF.getSubtarget<GCNSubtarget>();
TII = ST->getInstrInfo();		TII = ST->getInstrInfo();
TRI = &TII->getRegisterInfo();		TRI = &TII->getRegisterInfo();
MRI = &MF.getRegInfo();		MRI = &MF.getRegInfo();
IV = AMDGPU::getIsaVersion(ST->getCPU());		IV = AMDGPU::getIsaVersion(ST->getCPU());
const SIMachineFunctionInfo *MFI = MF.getInfo<SIMachineFunctionInfo>();		const SIMachineFunctionInfo *MFI = MF.getInfo<SIMachineFunctionInfo>();
PDT = &getAnalysis<MachinePostDominatorTree>();		PDT = &getAnalysis<MachinePostDominatorTree>();
[... 47 lines not shown ...]	for (auto BII = BlockInfos.begin(), BIE = BlockInfos.end(); BII != BIE;
Brackets = std::make_unique<WaitcntBrackets>(ST);		Brackets = std::make_unique<WaitcntBrackets>(ST);
else		else
*Brackets = WaitcntBrackets(ST);		*Brackets = WaitcntBrackets(ST);
}		}

Modified |= insertWaitcntInBlock(MF, BI.MBB, Brackets);		Modified |= insertWaitcntInBlock(MF, BI.MBB, Brackets);
BI.Dirty = false;		BI.Dirty = false;

if (Brackets->hasPending()) {		if (Brackets->hasPending()) {
		t-tye (inline comment, done): Should hasPending be generalized to include all pending "things"? It seems merge has been generalized to merge all the "things".
BlockInfo *MoveBracketsToSucc = nullptr;		BlockInfo *MoveBracketsToSucc = nullptr;
for (MachineBasicBlock *Succ : BI.MBB->successors()) {		for (MachineBasicBlock *Succ : BI.MBB->successors()) {
auto SuccBII = BlockInfos.find(Succ);		auto SuccBII = BlockInfos.find(Succ);
BlockInfo &SuccBI = SuccBII->second;		BlockInfo &SuccBI = SuccBII->second;
if (!SuccBI.Incoming) {		if (!SuccBI.Incoming) {
SuccBI.Dirty = true;		SuccBI.Dirty = true;
if (SuccBII <= BII)		if (SuccBII <= BII)
Repeat = true;		Repeat = true;
[... 9 lines not shown ...]	for (auto BII = BlockInfos.begin(), BIE = BlockInfos.end(); BII != BIE;
}		}
}		}
if (MoveBracketsToSucc)		if (MoveBracketsToSucc)
MoveBracketsToSucc->Incoming = std::move(Brackets);		MoveBracketsToSucc->Incoming = std::move(Brackets);
}		}
}		}
} while (Repeat);		} while (Repeat);

		removeUnnecessaryMemoryCacheInvalidationInstructions();

SmallVector<MachineBasicBlock *, 4> EndPgmBlocks;		SmallVector<MachineBasicBlock *, 4> EndPgmBlocks;

bool HaveScalarStores = false;		bool HaveScalarStores = false;

for (MachineFunction::iterator BI = MF.begin(), BE = MF.end(); BI != BE;		for (MachineFunction::iterator BI = MF.begin(), BE = MF.end(); BI != BE;
++BI) {		++BI) {
MachineBasicBlock &MBB = *BI;		MachineBasicBlock &MBB = *BI;

[... 54 lines not shown ...]	if (ST->hasVscnt())
BuildMI(EntryBB, I, DebugLoc(), TII->get(AMDGPU::S_WAITCNT_VSCNT))		BuildMI(EntryBB, I, DebugLoc(), TII->get(AMDGPU::S_WAITCNT_VSCNT))
.addReg(AMDGPU::SGPR_NULL, RegState::Undef)		.addReg(AMDGPU::SGPR_NULL, RegState::Undef)
.addImm(0);		.addImm(0);

Modified = true;		Modified = true;
}		}

return Modified;		return Modified;
}		}
		arsenm (inline comment, done): Range loop?

llvm/test/CodeGen/AMDGPU/GlobalISel/memory-legalizer-atomic-fence.ll

	; RUN: llc -global-isel -mtriple=amdgcn-amd- -mcpu=gfx600 -verify-machineinstrs < %s | FileCheck -check-prefixes=FUNC,GCN,GFX6,GFX68 %s			; RUN: llc -global-isel -mtriple=amdgcn-amd- -mcpu=gfx600 -verify-machineinstrs < %s | FileCheck -check-prefixes=FUNC,GCN,GFX6,GFX68 %s
	; RUN: llc -global-isel -mtriple=amdgcn-amd- -mcpu=gfx803 -verify-machineinstrs < %s | FileCheck -check-prefixes=FUNC,GCN,GFX8,GFX68 %s			; RUN: llc -global-isel -mtriple=amdgcn-amd- -mcpu=gfx803 -verify-machineinstrs < %s | FileCheck -check-prefixes=FUNC,GCN,GFX8,GFX68 %s
	; RUN: llc -global-isel -mtriple=amdgcn-amd-amdhsa -mcpu=gfx803 -verify-machineinstrs < %s | FileCheck -check-prefixes=FUNC,GCN,GFX8,GFX68 %s			; RUN: llc -global-isel -mtriple=amdgcn-amd-amdhsa -mcpu=gfx803 -verify-machineinstrs < %s | FileCheck -check-prefixes=FUNC,GCN,GFX8,GFX68 %s
	; RUN: llc -global-isel -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1010 -verify-machineinstrs < %s | FileCheck -check-prefixes=FUNC,GCN,GFX10,GFX10WGP %s			; RUN: llc -global-isel -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1010 -verify-machineinstrs < %s | FileCheck -check-prefixes=FUNC,GCN,GFX10,GFX10WGP %s
	; RUN: llc -global-isel -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1010 -mattr=+cumode -verify-machineinstrs < %s | FileCheck -check-prefixes=FUNC,GCN,GFX10,GFX10CU %s			; RUN: llc -global-isel -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1010 -mattr=+cumode -verify-machineinstrs < %s | FileCheck -check-prefixes=FUNC,GCN,GFX10,GFX10CU %s

				; Stores to this variable are used to stop the si-insert-waitcnt pass from removing the cache invalidation instructions.
				@gint = external addrspace(1) global i32, align 4

	; FUNC-LABEL: {{^}}system_one_as_acquire:			; FUNC-LABEL: {{^}}system_one_as_acquire:
	; GCN: %bb.0			; GCN: %bb.0
	; GCN-NOT: ATOMIC_FENCE			; GCN-NOT: ATOMIC_FENCE
	; GFX6: s_waitcnt vmcnt(0){{$}}			; GFX6: s_waitcnt vmcnt(0){{$}}
	; GFX6-NEXT: buffer_wbinvl1{{$}}			; GFX6-NEXT: buffer_wbinvl1{{$}}
	; GFX8: s_waitcnt vmcnt(0){{$}}			; GFX8: s_waitcnt vmcnt(0){{$}}
	; GFX8-NEXT: buffer_wbinvl1_vol{{$}}			; GFX8-NEXT: buffer_wbinvl1_vol{{$}}
	; GFX10: s_waitcnt vmcnt(0){{$}}			; GFX10: s_waitcnt vmcnt(0){{$}}
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0{{$}}			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0{{$}}
	; GFX10-NEXT: buffer_gl0_inv{{$}}			; GFX10-NEXT: buffer_gl0_inv{{$}}
	; GFX10-NEXT: buffer_gl1_inv{{$}}			; GFX10-NEXT: buffer_gl1_inv{{$}}
	; GCN: s_endpgm			; GCN: s_endpgm
	; GFX10: .amdhsa_kernel system_one_as_acquire			; GFX10: .amdhsa_kernel system_one_as_acquire
	; GFX10WGP-NOT: .amdhsa_workgroup_processor_mode 0			; GFX10WGP-NOT: .amdhsa_workgroup_processor_mode 0
	; GFX10CU: .amdhsa_workgroup_processor_mode 0			; GFX10CU: .amdhsa_workgroup_processor_mode 0
	; GFX10-NOT: .amdhsa_memory_ordered 0			; GFX10-NOT: .amdhsa_memory_ordered 0
	define amdgpu_kernel void @system_one_as_acquire() {			define amdgpu_kernel void @system_one_as_acquire() {
	entry:			entry:
				store i32 0, i32 addrspace(1)* @gint
	fence syncscope("one-as") acquire			fence syncscope("one-as") acquire
	ret void			ret void
	}			}

	; FUNC-LABEL: {{^}}system_one_as_release:			; FUNC-LABEL: {{^}}system_one_as_release:
	; GCN: %bb.0			; GCN: %bb.0
	; GCN-NOT: ATOMIC_FENCE			; GCN-NOT: ATOMIC_FENCE
	; GCN: s_waitcnt vmcnt(0){{$}}			; GCN: s_waitcnt vmcnt(0){{$}}
	[... 20 lines not shown ...]
	; GFX10-NEXT: buffer_gl1_inv{{$}}			; GFX10-NEXT: buffer_gl1_inv{{$}}
	; GCN: s_endpgm			; GCN: s_endpgm
	; GFX10: .amdhsa_kernel system_one_as_acq_rel			; GFX10: .amdhsa_kernel system_one_as_acq_rel
	; GFX10WGP-NOT: .amdhsa_workgroup_processor_mode 0			; GFX10WGP-NOT: .amdhsa_workgroup_processor_mode 0
	; GFX10CU: .amdhsa_workgroup_processor_mode 0			; GFX10CU: .amdhsa_workgroup_processor_mode 0
	; GFX10-NOT: .amdhsa_memory_ordered 0			; GFX10-NOT: .amdhsa_memory_ordered 0
	define amdgpu_kernel void @system_one_as_acq_rel() {			define amdgpu_kernel void @system_one_as_acq_rel() {
	entry:			entry:
				store i32 0, i32 addrspace(1)* @gint
	fence syncscope("one-as") acq_rel			fence syncscope("one-as") acq_rel
	ret void			ret void
	}			}

	; FUNC-LABEL: {{^}}system_one_as_seq_cst:			; FUNC-LABEL: {{^}}system_one_as_seq_cst:
	; GCN: %bb.0			; GCN: %bb.0
	; GCN-NOT: ATOMIC_FENCE			; GCN-NOT: ATOMIC_FENCE
	; GCN: s_waitcnt vmcnt(0){{$}}			; GCN: s_waitcnt vmcnt(0){{$}}
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0{{$}}			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0{{$}}
	; GFX6: buffer_wbinvl1{{$}}			; GFX6: buffer_wbinvl1{{$}}
	; GFX8: buffer_wbinvl1_vol{{$}}			; GFX8: buffer_wbinvl1_vol{{$}}
	; GFX10-NEXT: buffer_gl0_inv{{$}}			; GFX10-NEXT: buffer_gl0_inv{{$}}
	; GFX10-NEXT: buffer_gl1_inv{{$}}			; GFX10-NEXT: buffer_gl1_inv{{$}}
	; GCN: s_endpgm			; GCN: s_endpgm
	; GFX10: .amdhsa_kernel system_one_as_seq_cst			; GFX10: .amdhsa_kernel system_one_as_seq_cst
	; GFX10WGP-NOT: .amdhsa_workgroup_processor_mode 0			; GFX10WGP-NOT: .amdhsa_workgroup_processor_mode 0
	; GFX10CU: .amdhsa_workgroup_processor_mode 0			; GFX10CU: .amdhsa_workgroup_processor_mode 0
	; GFX10-NOT: .amdhsa_memory_ordered 0			; GFX10-NOT: .amdhsa_memory_ordered 0
	define amdgpu_kernel void @system_one_as_seq_cst() {			define amdgpu_kernel void @system_one_as_seq_cst() {
	entry:			entry:
				store i32 0, i32 addrspace(1)* @gint
	fence syncscope("one-as") seq_cst			fence syncscope("one-as") seq_cst
	ret void			ret void
	}			}

	; FUNC-LABEL: {{^}}singlethread_one_as_acquire:			; FUNC-LABEL: {{^}}singlethread_one_as_acquire:
	; GCN: %bb.0			; GCN: %bb.0
	; GCN-NOT: ATOMIC_FENCE			; GCN-NOT: ATOMIC_FENCE
	; GCN: s_endpgm			; GCN: s_endpgm
	[... 62 lines not shown ...]
	; GFX10-NEXT: buffer_gl1_inv{{$}}			; GFX10-NEXT: buffer_gl1_inv{{$}}
	; GCN: s_endpgm			; GCN: s_endpgm
	; GFX10: .amdhsa_kernel agent_one_as_acquire			; GFX10: .amdhsa_kernel agent_one_as_acquire
	; GFX10WGP-NOT: .amdhsa_workgroup_processor_mode 0			; GFX10WGP-NOT: .amdhsa_workgroup_processor_mode 0
	; GFX10CU: .amdhsa_workgroup_processor_mode 0			; GFX10CU: .amdhsa_workgroup_processor_mode 0
	; GFX10-NOT: .amdhsa_memory_ordered 0			; GFX10-NOT: .amdhsa_memory_ordered 0
	define amdgpu_kernel void @agent_one_as_acquire() {			define amdgpu_kernel void @agent_one_as_acquire() {
	entry:			entry:
				store i32 0, i32 addrspace(1)* @gint
	fence syncscope("agent-one-as") acquire			fence syncscope("agent-one-as") acquire
	ret void			ret void
	}			}

	; FUNC-LABEL: {{^}}agent_one_as_release:			; FUNC-LABEL: {{^}}agent_one_as_release:
	; GCN: %bb.0			; GCN: %bb.0
	; GCN-NOT: ATOMIC_FENCE			; GCN-NOT: ATOMIC_FENCE
	; GCN: s_waitcnt vmcnt(0){{$}}			; GCN: s_waitcnt vmcnt(0){{$}}
	[... 20 lines not shown ...]
	; GFX10-NEXT: buffer_gl1_inv{{$}}			; GFX10-NEXT: buffer_gl1_inv{{$}}
	; GCN: s_endpgm			; GCN: s_endpgm
	; GFX10: .amdhsa_kernel agent_one_as_acq_rel			; GFX10: .amdhsa_kernel agent_one_as_acq_rel
	; GFX10WGP-NOT: .amdhsa_workgroup_processor_mode 0			; GFX10WGP-NOT: .amdhsa_workgroup_processor_mode 0
	; GFX10CU: .amdhsa_workgroup_processor_mode 0			; GFX10CU: .amdhsa_workgroup_processor_mode 0
	; GFX10-NOT: .amdhsa_memory_ordered 0			; GFX10-NOT: .amdhsa_memory_ordered 0
	define amdgpu_kernel void @agent_one_as_acq_rel() {			define amdgpu_kernel void @agent_one_as_acq_rel() {
	entry:			entry:
				store i32 0, i32 addrspace(1)* @gint
	fence syncscope("agent-one-as") acq_rel			fence syncscope("agent-one-as") acq_rel
	ret void			ret void
	}			}

	; FUNC-LABEL: {{^}}agent_one_as_seq_cst:			; FUNC-LABEL: {{^}}agent_one_as_seq_cst:
	; GCN: %bb.0			; GCN: %bb.0
	; GCN-NOT: ATOMIC_FENCE			; GCN-NOT: ATOMIC_FENCE
	; GCN: s_waitcnt vmcnt(0){{$}}			; GCN: s_waitcnt vmcnt(0){{$}}
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0{{$}}			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0{{$}}
	; GFX6: buffer_wbinvl1{{$}}			; GFX6: buffer_wbinvl1{{$}}
	; GFX8: buffer_wbinvl1_vol{{$}}			; GFX8: buffer_wbinvl1_vol{{$}}
	; GFX10-NEXT: buffer_gl0_inv{{$}}			; GFX10-NEXT: buffer_gl0_inv{{$}}
	; GFX10-NEXT: buffer_gl1_inv{{$}}			; GFX10-NEXT: buffer_gl1_inv{{$}}
	; GCN: s_endpgm			; GCN: s_endpgm
	; GFX10: .amdhsa_kernel agent_one_as_seq_cst			; GFX10: .amdhsa_kernel agent_one_as_seq_cst
	; GFX10WGP-NOT: .amdhsa_workgroup_processor_mode 0			; GFX10WGP-NOT: .amdhsa_workgroup_processor_mode 0
	; GFX10CU: .amdhsa_workgroup_processor_mode 0			; GFX10CU: .amdhsa_workgroup_processor_mode 0
	; GFX10-NOT: .amdhsa_memory_ordered 0			; GFX10-NOT: .amdhsa_memory_ordered 0
	define amdgpu_kernel void @agent_one_as_seq_cst() {			define amdgpu_kernel void @agent_one_as_seq_cst() {
	entry:			entry:
				store i32 0, i32 addrspace(1)* @gint
	fence syncscope("agent-one-as") seq_cst			fence syncscope("agent-one-as") seq_cst
	ret void			ret void
	}			}

	; FUNC-LABEL: {{^}}workgroup_one_as_acquire:			; FUNC-LABEL: {{^}}workgroup_one_as_acquire:
	; GCN: %bb.0			; GCN: %bb.0
	; GFX68-NOT: s_waitcnt vmcnt(0){{$}}			; GFX68-NOT: s_waitcnt vmcnt(0){{$}}
	; GFX10WGP: s_waitcnt vmcnt(0){{$}}			; GFX10WGP: s_waitcnt vmcnt(0){{$}}
	; GFX10WGP-NEXT: s_waitcnt_vscnt null, 0x0{{$}}			; GFX10WGP-NEXT: s_waitcnt_vscnt null, 0x0{{$}}
	; GFX10CU-NOT: s_waitcnt vmcnt(0){{$}}			; GFX10CU-NOT: s_waitcnt vmcnt(0){{$}}
	; GFX10CU-NOT: s_waitcnt_vscnt null, 0x0{{$}}			; GFX10CU-NOT: s_waitcnt_vscnt null, 0x0{{$}}
	; GFX10WGP-NEXT: buffer_gl0_inv{{$}}			; GFX10WGP-NEXT: buffer_gl0_inv{{$}}
	; GFX10CU-NOT: buffer_gl0_inv{{$}}			; GFX10CU-NOT: buffer_gl0_inv{{$}}
	; GCN-NOT: ATOMIC_FENCE			; GCN-NOT: ATOMIC_FENCE
	; GCN: s_endpgm			; GCN: s_endpgm
	; GFX10: .amdhsa_kernel workgroup_one_as_acquire			; GFX10: .amdhsa_kernel workgroup_one_as_acquire
	; GFX10WGP-NOT: .amdhsa_workgroup_processor_mode 0			; GFX10WGP-NOT: .amdhsa_workgroup_processor_mode 0
	; GFX10CU: .amdhsa_workgroup_processor_mode 0			; GFX10CU: .amdhsa_workgroup_processor_mode 0
	; GFX10-NOT: .amdhsa_memory_ordered 0			; GFX10-NOT: .amdhsa_memory_ordered 0
	define amdgpu_kernel void @workgroup_one_as_acquire() {			define amdgpu_kernel void @workgroup_one_as_acquire() {
	entry:			entry:
				store i32 0, i32 addrspace(1)* @gint
	fence syncscope("workgroup-one-as") acquire			fence syncscope("workgroup-one-as") acquire
	ret void			ret void
	}			}

	; FUNC-LABEL: {{^}}workgroup_one_as_release:			; FUNC-LABEL: {{^}}workgroup_one_as_release:
	; GCN: %bb.0			; GCN: %bb.0
	; GFX68-NOT: s_waitcnt vmcnt(0){{$}}			; GFX68-NOT: s_waitcnt vmcnt(0){{$}}
	; GFX10WGP: s_waitcnt vmcnt(0){{$}}			; GFX10WGP: s_waitcnt vmcnt(0){{$}}
	; GFX10WGP-NEXT: s_waitcnt_vscnt null, 0x0{{$}}			; GFX10WGP-NEXT: s_waitcnt_vscnt null, 0x0{{$}}
	; GFX10CU-NOT: s_waitcnt vmcnt(0){{$}}			; GFX10CU-NOT: s_waitcnt vmcnt(0){{$}}
	; GFX10CU-NOT: s_waitcnt_vscnt null, 0x0{{$}}			; GFX10CU-NOT: s_waitcnt_vscnt null, 0x0{{$}}
	; GFX10-NOT: buffer_gl0_inv			; GFX10-NOT: buffer_gl0_inv
	; GCN-NOT: ATOMIC_FENCE			; GCN-NOT: ATOMIC_FENCE
	; GCN: s_endpgm			; GCN: s_endpgm
	; GFX10: .amdhsa_kernel workgroup_one_as_release			; GFX10: .amdhsa_kernel workgroup_one_as_release
	; GFX10WGP-NOT: .amdhsa_workgroup_processor_mode 0			; GFX10WGP-NOT: .amdhsa_workgroup_processor_mode 0
	; GFX10CU: .amdhsa_workgroup_processor_mode 0			; GFX10CU: .amdhsa_workgroup_processor_mode 0
	; GFX10-NOT: .amdhsa_memory_ordered 0			; GFX10-NOT: .amdhsa_memory_ordered 0
	define amdgpu_kernel void @workgroup_one_as_release() {			define amdgpu_kernel void @workgroup_one_as_release() {
	entry:			entry:
				store i32 0, i32 addrspace(1)* @gint
	fence syncscope("workgroup-one-as") release			fence syncscope("workgroup-one-as") release
	ret void			ret void
	}			}

	; FUNC-LABEL: {{^}}workgroup_one_as_acq_rel:			; FUNC-LABEL: {{^}}workgroup_one_as_acq_rel:
	; GCN: %bb.0			; GCN: %bb.0
	; GFX68-NOT: s_waitcnt vmcnt(0){{$}}			; GFX68-NOT: s_waitcnt vmcnt(0){{$}}
	; GFX10WGP: s_waitcnt vmcnt(0){{$}}			; GFX10WGP: s_waitcnt vmcnt(0){{$}}
	; GFX10WGP-NEXT: s_waitcnt_vscnt null, 0x0{{$}}			; GFX10WGP-NEXT: s_waitcnt_vscnt null, 0x0{{$}}
	; GFX10WGP-NEXT: buffer_gl0_inv{{$}}			; GFX10WGP-NEXT: buffer_gl0_inv{{$}}
	; GFX10CU-NOT: s_waitcnt vmcnt(0){{$}}			; GFX10CU-NOT: s_waitcnt vmcnt(0){{$}}
	; GFX10CU-NOT: s_waitcnt_vscnt null, 0x0{{$}}			; GFX10CU-NOT: s_waitcnt_vscnt null, 0x0{{$}}
	; GFX10CU-NOT: buffer_gl0_inv{{$}}			; GFX10CU-NOT: buffer_gl0_inv{{$}}
	; GCN-NOT: ATOMIC_FENCE			; GCN-NOT: ATOMIC_FENCE
	; GCN: s_endpgm			; GCN: s_endpgm
	; GFX10: .amdhsa_kernel workgroup_one_as_acq_rel			; GFX10: .amdhsa_kernel workgroup_one_as_acq_rel
	; GFX10WGP-NOT: .amdhsa_workgroup_processor_mode 0			; GFX10WGP-NOT: .amdhsa_workgroup_processor_mode 0
	; GFX10CU: .amdhsa_workgroup_processor_mode 0			; GFX10CU: .amdhsa_workgroup_processor_mode 0
	; GFX10-NOT: .amdhsa_memory_ordered 0			; GFX10-NOT: .amdhsa_memory_ordered 0
	define amdgpu_kernel void @workgroup_one_as_acq_rel() {			define amdgpu_kernel void @workgroup_one_as_acq_rel() {
	entry:			entry:
				store i32 0, i32 addrspace(1)* @gint
	fence syncscope("workgroup-one-as") acq_rel			fence syncscope("workgroup-one-as") acq_rel
	ret void			ret void
	}			}

	; FUNC-LABEL: {{^}}workgroup_one_as_seq_cst:			; FUNC-LABEL: {{^}}workgroup_one_as_seq_cst:
	; GCN: %bb.0			; GCN: %bb.0
	; GFX68-NOT: s_waitcnt vmcnt(0){{$}}			; GFX68-NOT: s_waitcnt vmcnt(0){{$}}
	; GFX10WGP: s_waitcnt vmcnt(0){{$}}			; GFX10WGP: s_waitcnt vmcnt(0){{$}}
	; GFX10WGP-NEXT: s_waitcnt_vscnt null, 0x0{{$}}			; GFX10WGP-NEXT: s_waitcnt_vscnt null, 0x0{{$}}
	; GFX10WGP-NEXT: buffer_gl0_inv{{$}}			; GFX10WGP-NEXT: buffer_gl0_inv{{$}}
	; GFX10CU-NOT: s_waitcnt vmcnt(0){{$}}			; GFX10CU-NOT: s_waitcnt vmcnt(0){{$}}
	; GFX10CU-NOT: s_waitcnt_vscnt null, 0x0{{$}}			; GFX10CU-NOT: s_waitcnt_vscnt null, 0x0{{$}}
	; GFX10CU-NOT: buffer_gl0_inv{{$}}			; GFX10CU-NOT: buffer_gl0_inv{{$}}
	; GCN-NOT: ATOMIC_FENCE			; GCN-NOT: ATOMIC_FENCE
	; GCN: s_endpgm			; GCN: s_endpgm
	; GFX10: .amdhsa_kernel workgroup_one_as_seq_cst			; GFX10: .amdhsa_kernel workgroup_one_as_seq_cst
	; GFX10WGP-NOT: .amdhsa_workgroup_processor_mode 0			; GFX10WGP-NOT: .amdhsa_workgroup_processor_mode 0
	; GFX10CU: .amdhsa_workgroup_processor_mode 0			; GFX10CU: .amdhsa_workgroup_processor_mode 0
	; GFX10-NOT: .amdhsa_memory_ordered 0			; GFX10-NOT: .amdhsa_memory_ordered 0
	define amdgpu_kernel void @workgroup_one_as_seq_cst() {			define amdgpu_kernel void @workgroup_one_as_seq_cst() {
	entry:			entry:
				store i32 0, i32 addrspace(1)* @gint
	fence syncscope("workgroup-one-as") seq_cst			fence syncscope("workgroup-one-as") seq_cst
	ret void			ret void
	}			}

	; FUNC-LABEL: {{^}}wavefront_one_as_acquire:			; FUNC-LABEL: {{^}}wavefront_one_as_acquire:
	; GCN: %bb.0			; GCN: %bb.0
	; GCN-NOT: ATOMIC_FENCE			; GCN-NOT: ATOMIC_FENCE
	; GCN: s_endpgm			; GCN: s_endpgm
	[... 62 lines not shown ...]
	; GFX10-NEXT: buffer_gl1_inv{{$}}			; GFX10-NEXT: buffer_gl1_inv{{$}}
	; GCN: s_endpgm			; GCN: s_endpgm
	; GFX10: .amdhsa_kernel system_acquire			; GFX10: .amdhsa_kernel system_acquire
	; GFX10WGP-NOT: .amdhsa_workgroup_processor_mode 0			; GFX10WGP-NOT: .amdhsa_workgroup_processor_mode 0
	; GFX10CU: .amdhsa_workgroup_processor_mode 0			; GFX10CU: .amdhsa_workgroup_processor_mode 0
	; GFX10-NOT: .amdhsa_memory_ordered 0			; GFX10-NOT: .amdhsa_memory_ordered 0
	define amdgpu_kernel void @system_acquire() {			define amdgpu_kernel void @system_acquire() {
	entry:			entry:
				store i32 0, i32 addrspace(1)* @gint
	fence acquire			fence acquire
	ret void			ret void
	}			}

	; FUNC-LABEL: {{^}}system_release:			; FUNC-LABEL: {{^}}system_release:
	; GCN: %bb.0			; GCN: %bb.0
	; GCN-NOT: ATOMIC_FENCE			; GCN-NOT: ATOMIC_FENCE
	; GFX6: s_waitcnt vmcnt(0) lgkmcnt(0){{$}}			; GFX6: s_waitcnt vmcnt(0) lgkmcnt(0){{$}}
	[... 24 lines not shown ...]
	; GFX10-NEXT: buffer_gl1_inv{{$}}			; GFX10-NEXT: buffer_gl1_inv{{$}}
	; GCN: s_endpgm			; GCN: s_endpgm
	; GFX10: .amdhsa_kernel system_acq_rel			; GFX10: .amdhsa_kernel system_acq_rel
	; GFX10WGP-NOT: .amdhsa_workgroup_processor_mode 0			; GFX10WGP-NOT: .amdhsa_workgroup_processor_mode 0
	; GFX10CU: .amdhsa_workgroup_processor_mode 0			; GFX10CU: .amdhsa_workgroup_processor_mode 0
	; GFX10-NOT: .amdhsa_memory_ordered 0			; GFX10-NOT: .amdhsa_memory_ordered 0
	define amdgpu_kernel void @system_acq_rel() {			define amdgpu_kernel void @system_acq_rel() {
	entry:			entry:
				store i32 0, i32 addrspace(1)* @gint
	fence acq_rel			fence acq_rel
	ret void			ret void
	}			}

	; FUNC-LABEL: {{^}}system_seq_cst:			; FUNC-LABEL: {{^}}system_seq_cst:
	; GCN: %bb.0			; GCN: %bb.0
	; GCN-NOT: ATOMIC_FENCE			; GCN-NOT: ATOMIC_FENCE
	; GFX6: s_waitcnt vmcnt(0) lgkmcnt(0){{$}}			; GFX6: s_waitcnt vmcnt(0) lgkmcnt(0){{$}}
	; GFX8: s_waitcnt vmcnt(0) lgkmcnt(0){{$}}			; GFX8: s_waitcnt vmcnt(0) lgkmcnt(0){{$}}
	; GFX10: s_waitcnt vmcnt(0) lgkmcnt(0){{$}}			; GFX10: s_waitcnt vmcnt(0) lgkmcnt(0){{$}}
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0{{$}}			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0{{$}}
	; GFX6: buffer_wbinvl1{{$}}			; GFX6: buffer_wbinvl1{{$}}
	; GFX8: buffer_wbinvl1_vol{{$}}			; GFX8: buffer_wbinvl1_vol{{$}}
	; GFX10-NEXT: buffer_gl0_inv{{$}}			; GFX10-NEXT: buffer_gl0_inv{{$}}
	; GFX10-NEXT: buffer_gl1_inv{{$}}			; GFX10-NEXT: buffer_gl1_inv{{$}}
	; GCN: s_endpgm			; GCN: s_endpgm
	; GFX10: .amdhsa_kernel system_seq_cst			; GFX10: .amdhsa_kernel system_seq_cst
	; GFX10WGP-NOT: .amdhsa_workgroup_processor_mode 0			; GFX10WGP-NOT: .amdhsa_workgroup_processor_mode 0
	; GFX10CU: .amdhsa_workgroup_processor_mode 0			; GFX10CU: .amdhsa_workgroup_processor_mode 0
	; GFX10-NOT: .amdhsa_memory_ordered 0			; GFX10-NOT: .amdhsa_memory_ordered 0
	define amdgpu_kernel void @system_seq_cst() {			define amdgpu_kernel void @system_seq_cst() {
	entry:			entry:
				store i32 0, i32 addrspace(1)* @gint
	fence seq_cst			fence seq_cst
	ret void			ret void
	}			}

	; FUNC-LABEL: {{^}}singlethread_acquire:			; FUNC-LABEL: {{^}}singlethread_acquire:
	; GCN: %bb.0			; GCN: %bb.0
	; GCN-NOT: ATOMIC_FENCE			; GCN-NOT: ATOMIC_FENCE
	; GCN: s_endpgm			; GCN: s_endpgm
	[... 62 lines not shown ...]
	; GFX10-NEXT: buffer_gl1_inv{{$}}			; GFX10-NEXT: buffer_gl1_inv{{$}}
	; GCN: s_endpgm			; GCN: s_endpgm
	; GFX10: .amdhsa_kernel agent_acquire			; GFX10: .amdhsa_kernel agent_acquire
	; GFX10WGP-NOT: .amdhsa_workgroup_processor_mode 0			; GFX10WGP-NOT: .amdhsa_workgroup_processor_mode 0
	; GFX10CU: .amdhsa_workgroup_processor_mode 0			; GFX10CU: .amdhsa_workgroup_processor_mode 0
	; GFX10-NOT: .amdhsa_memory_ordered 0			; GFX10-NOT: .amdhsa_memory_ordered 0
	define amdgpu_kernel void @agent_acquire() {			define amdgpu_kernel void @agent_acquire() {
	entry:			entry:
				store i32 0, i32 addrspace(1)* @gint
	fence syncscope("agent") acquire			fence syncscope("agent") acquire
	ret void			ret void
	}			}

	; FUNC-LABEL: {{^}}agent_release:			; FUNC-LABEL: {{^}}agent_release:
	; GCN: %bb.0			; GCN: %bb.0
	; GCN-NOT: ATOMIC_FENCE			; GCN-NOT: ATOMIC_FENCE
	; GFX6: s_waitcnt vmcnt(0) lgkmcnt(0){{$}}			; GFX6: s_waitcnt vmcnt(0) lgkmcnt(0){{$}}
	[... 24 lines not shown ...]
	; GFX10-NEXT: buffer_gl1_inv{{$}}			; GFX10-NEXT: buffer_gl1_inv{{$}}
	; GCN: s_endpgm			; GCN: s_endpgm
	; GFX10: .amdhsa_kernel agent_acq_rel			; GFX10: .amdhsa_kernel agent_acq_rel
	; GFX10WGP-NOT: .amdhsa_workgroup_processor_mode 0			; GFX10WGP-NOT: .amdhsa_workgroup_processor_mode 0
	; GFX10CU: .amdhsa_workgroup_processor_mode 0			; GFX10CU: .amdhsa_workgroup_processor_mode 0
	; GFX10-NOT: .amdhsa_memory_ordered 0			; GFX10-NOT: .amdhsa_memory_ordered 0
	define amdgpu_kernel void @agent_acq_rel() {			define amdgpu_kernel void @agent_acq_rel() {
	entry:			entry:
				store i32 0, i32 addrspace(1)* @gint
	fence syncscope("agent") acq_rel			fence syncscope("agent") acq_rel
	ret void			ret void
	}			}

	; FUNC-LABEL: {{^}}agent_seq_cst:			; FUNC-LABEL: {{^}}agent_seq_cst:
	; GCN: %bb.0			; GCN: %bb.0
	; GCN-NOT: ATOMIC_FENCE			; GCN-NOT: ATOMIC_FENCE
	; GFX6: s_waitcnt vmcnt(0) lgkmcnt(0){{$}}			; GFX6: s_waitcnt vmcnt(0) lgkmcnt(0){{$}}
	; GFX8: s_waitcnt vmcnt(0) lgkmcnt(0){{$}}			; GFX8: s_waitcnt vmcnt(0) lgkmcnt(0){{$}}
	; GFX10: s_waitcnt vmcnt(0) lgkmcnt(0){{$}}			; GFX10: s_waitcnt vmcnt(0) lgkmcnt(0){{$}}
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0{{$}}			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0{{$}}
	; GFX6: buffer_wbinvl1{{$}}			; GFX6: buffer_wbinvl1{{$}}
	; GFX8: buffer_wbinvl1_vol{{$}}			; GFX8: buffer_wbinvl1_vol{{$}}
	; GFX10-NEXT: buffer_gl0_inv{{$}}			; GFX10-NEXT: buffer_gl0_inv{{$}}
	; GFX10-NEXT: buffer_gl1_inv{{$}}			; GFX10-NEXT: buffer_gl1_inv{{$}}
	; GCN: s_endpgm			; GCN: s_endpgm
	; GFX10: .amdhsa_kernel agent_seq_cst			; GFX10: .amdhsa_kernel agent_seq_cst
	; GFX10WGP-NOT: .amdhsa_workgroup_processor_mode 0			; GFX10WGP-NOT: .amdhsa_workgroup_processor_mode 0
	; GFX10CU: .amdhsa_workgroup_processor_mode 0			; GFX10CU: .amdhsa_workgroup_processor_mode 0
	; GFX10-NOT: .amdhsa_memory_ordered 0			; GFX10-NOT: .amdhsa_memory_ordered 0
	define amdgpu_kernel void @agent_seq_cst() {			define amdgpu_kernel void @agent_seq_cst() {
	entry:			entry:
				store i32 0, i32 addrspace(1)* @gint
	fence syncscope("agent") seq_cst			fence syncscope("agent") seq_cst
	ret void			ret void
	}			}

	; FUNC-LABEL: {{^}}workgroup_acquire:			; FUNC-LABEL: {{^}}workgroup_acquire:
	; GCN: %bb.0			; GCN: %bb.0
	; GFX68-NOT: s_waitcnt vmcnt(0){{$}}			; GFX68-NOT: s_waitcnt vmcnt(0){{$}}
	; GFX10WGP: s_waitcnt vmcnt(0) lgkmcnt(0){{$}}			; GFX10WGP: s_waitcnt vmcnt(0) lgkmcnt(0){{$}}
	; GFX10WGP-NEXT: s_waitcnt_vscnt null, 0x0{{$}}			; GFX10WGP-NEXT: s_waitcnt_vscnt null, 0x0{{$}}
	; GFX10CU-NOT: s_waitcnt vmcnt(0){{$}}			; GFX10CU-NOT: s_waitcnt vmcnt(0){{$}}
	; GFX10CU-NOT: s_waitcnt_vscnt null, 0x0{{$}}			; GFX10CU-NOT: s_waitcnt_vscnt null, 0x0{{$}}
	; GFX10WGP-NEXT: buffer_gl0_inv{{$}}			; GFX10WGP-NEXT: buffer_gl0_inv{{$}}
	; GFX10CU-NOT: buffer_gl0_inv{{$}}			; GFX10CU-NOT: buffer_gl0_inv{{$}}
	; GCN-NOT: ATOMIC_FENCE			; GCN-NOT: ATOMIC_FENCE
	; GCN: s_endpgm			; GCN: s_endpgm
	; GFX10: .amdhsa_kernel workgroup_acquire			; GFX10: .amdhsa_kernel workgroup_acquire
	; GFX10WGP-NOT: .amdhsa_workgroup_processor_mode 0			; GFX10WGP-NOT: .amdhsa_workgroup_processor_mode 0
	; GFX10CU: .amdhsa_workgroup_processor_mode 0			; GFX10CU: .amdhsa_workgroup_processor_mode 0
	; GFX10-NOT: .amdhsa_memory_ordered 0			; GFX10-NOT: .amdhsa_memory_ordered 0
	define amdgpu_kernel void @workgroup_acquire() {			define amdgpu_kernel void @workgroup_acquire() {
	entry:			entry:
				store i32 0, i32 addrspace(1)* @gint
	fence syncscope("workgroup") acquire			fence syncscope("workgroup") acquire
	ret void			ret void
	}			}

	; FUNC-LABEL: {{^}}workgroup_release:			; FUNC-LABEL: {{^}}workgroup_release:
	; GCN: %bb.0			; GCN: %bb.0
	; GFX68-NOT: s_waitcnt vmcnt(0){{$}}			; GFX68-NOT: s_waitcnt vmcnt(0){{$}}
	; GFX10WGP: s_waitcnt vmcnt(0) lgkmcnt(0){{$}}			; GFX10WGP: s_waitcnt vmcnt(0) lgkmcnt(0){{$}}
	; GFX10WGP-NEXT: s_waitcnt_vscnt null, 0x0{{$}}			; GFX10WGP-NEXT: s_waitcnt_vscnt null, 0x0{{$}}
	; GFX10CU-NOT: s_waitcnt vmcnt(0){{$}}			; GFX10CU-NOT: s_waitcnt vmcnt(0){{$}}
	; GFX10CU-NOT: s_waitcnt_vscnt null, 0x0{{$}}			; GFX10CU-NOT: s_waitcnt_vscnt null, 0x0{{$}}
	; GFX10-NOT: buffer_gl0_inv			; GFX10-NOT: buffer_gl0_inv
	; GCN-NOT: ATOMIC_FENCE			; GCN-NOT: ATOMIC_FENCE
	; GCN: s_endpgm			; GCN: s_endpgm
	; GFX10: .amdhsa_kernel workgroup_release			; GFX10: .amdhsa_kernel workgroup_release
	; GFX10WGP-NOT: .amdhsa_workgroup_processor_mode 0			; GFX10WGP-NOT: .amdhsa_workgroup_processor_mode 0
	; GFX10CU: .amdhsa_workgroup_processor_mode 0			; GFX10CU: .amdhsa_workgroup_processor_mode 0
	; GFX10-NOT: .amdhsa_memory_ordered 0			; GFX10-NOT: .amdhsa_memory_ordered 0
	define amdgpu_kernel void @workgroup_release() {			define amdgpu_kernel void @workgroup_release() {
	entry:			entry:
				store i32 0, i32 addrspace(1)* @gint
	fence syncscope("workgroup") release			fence syncscope("workgroup") release
	ret void			ret void
	}			}

	; FUNC-LABEL: {{^}}workgroup_acq_rel:			; FUNC-LABEL: {{^}}workgroup_acq_rel:
	; GCN: %bb.0			; GCN: %bb.0
	; GFX68-NOT: s_waitcnt vmcnt(0){{$}}			; GFX68-NOT: s_waitcnt vmcnt(0){{$}}
	; GFX10WGP: s_waitcnt vmcnt(0) lgkmcnt(0){{$}}			; GFX10WGP: s_waitcnt vmcnt(0) lgkmcnt(0){{$}}
	; GFX10WGP-NEXT: s_waitcnt_vscnt null, 0x0{{$}}			; GFX10WGP-NEXT: s_waitcnt_vscnt null, 0x0{{$}}
	; GFX10WGP-NEXT: buffer_gl0_inv{{$}}			; GFX10WGP-NEXT: buffer_gl0_inv{{$}}
	; GFX10CU-NOT: s_waitcnt vmcnt(0){{$}}			; GFX10CU-NOT: s_waitcnt vmcnt(0){{$}}
	; GFX10CU-NOT: s_waitcnt_vscnt null, 0x0{{$}}			; GFX10CU-NOT: s_waitcnt_vscnt null, 0x0{{$}}
	; GFX10CU-NOT: buffer_gl0_inv{{$}}			; GFX10CU-NOT: buffer_gl0_inv{{$}}
	; GCN-NOT: ATOMIC_FENCE			; GCN-NOT: ATOMIC_FENCE
	; GCN: s_endpgm			; GCN: s_endpgm
	; GFX10: .amdhsa_kernel workgroup_acq_rel			; GFX10: .amdhsa_kernel workgroup_acq_rel
	; GFX10WGP-NOT: .amdhsa_workgroup_processor_mode 0			; GFX10WGP-NOT: .amdhsa_workgroup_processor_mode 0
	; GFX10CU: .amdhsa_workgroup_processor_mode 0			; GFX10CU: .amdhsa_workgroup_processor_mode 0
	; GFX10-NOT: .amdhsa_memory_ordered 0			; GFX10-NOT: .amdhsa_memory_ordered 0
	define amdgpu_kernel void @workgroup_acq_rel() {			define amdgpu_kernel void @workgroup_acq_rel() {
	entry:			entry:
				store i32 0, i32 addrspace(1)* @gint
	fence syncscope("workgroup") acq_rel			fence syncscope("workgroup") acq_rel
	ret void			ret void
	}			}

	; FUNC-LABEL: {{^}}workgroup_seq_cst:			; FUNC-LABEL: {{^}}workgroup_seq_cst:
	; GCN: %bb.0			; GCN: %bb.0
	; GFX68-NOT: s_waitcnt vmcnt(0){{$}}			; GFX68-NOT: s_waitcnt vmcnt(0){{$}}
	; GFX10WGP: s_waitcnt vmcnt(0) lgkmcnt(0){{$}}			; GFX10WGP: s_waitcnt vmcnt(0) lgkmcnt(0){{$}}
	; GFX10WGP-NEXT: s_waitcnt_vscnt null, 0x0{{$}}			; GFX10WGP-NEXT: s_waitcnt_vscnt null, 0x0{{$}}
	; GFX10WGP-NEXT: buffer_gl0_inv{{$}}			; GFX10WGP-NEXT: buffer_gl0_inv{{$}}
	; GFX10CU-NOT: s_waitcnt vmcnt(0){{$}}			; GFX10CU-NOT: s_waitcnt vmcnt(0){{$}}
	; GFX10CU-NOT: s_waitcnt_vscnt null, 0x0{{$}}			; GFX10CU-NOT: s_waitcnt_vscnt null, 0x0{{$}}
	; GFX10CU-NOT: buffer_gl0_inv{{$}}			; GFX10CU-NOT: buffer_gl0_inv{{$}}
	; GCN-NOT: ATOMIC_FENCE			; GCN-NOT: ATOMIC_FENCE
	; GCN: s_endpgm			; GCN: s_endpgm
	; GFX10: .amdhsa_kernel workgroup_seq_cst			; GFX10: .amdhsa_kernel workgroup_seq_cst
	; GFX10WGP-NOT: .amdhsa_workgroup_processor_mode 0			; GFX10WGP-NOT: .amdhsa_workgroup_processor_mode 0
	; GFX10CU: .amdhsa_workgroup_processor_mode 0			; GFX10CU: .amdhsa_workgroup_processor_mode 0
	; GFX10-NOT: .amdhsa_memory_ordered 0			; GFX10-NOT: .amdhsa_memory_ordered 0
	define amdgpu_kernel void @workgroup_seq_cst() {			define amdgpu_kernel void @workgroup_seq_cst() {
	entry:			entry:
				store i32 0, i32 addrspace(1)* @gint
	fence syncscope("workgroup") seq_cst			fence syncscope("workgroup") seq_cst
	ret void			ret void
	}			}

	; FUNC-LABEL: {{^}}wavefront_acquire:			; FUNC-LABEL: {{^}}wavefront_acquire:
	; GCN: %bb.0			; GCN: %bb.0
	; GCN-NOT: ATOMIC_FENCE			; GCN-NOT: ATOMIC_FENCE
	; GCN: s_endpgm			; GCN: s_endpgm
	[51 lines not shown]

llvm/test/CodeGen/AMDGPU/cache_invalidate.mir

This file was added.

				# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
				# RUN: llc -march=amdgcn -mcpu=gfx900 -run-pass si-insert-waitcnts %s -o - | FileCheck -check-prefixes=CHECK %s

				--- |
				@lds = addrspace(3) global float undef, align 16

				define amdgpu_kernel void @no_stores() {ret void}
				define amdgpu_kernel void @lds_store() {ret void}
				define amdgpu_kernel void @smem_store() {ret void}
				define amdgpu_kernel void @smem_store_with_multiple_invalidations() {ret void}
				define amdgpu_kernel void @smem_store_with_invalidations_in_side_nodes() {ret void}
				define amdgpu_kernel void @smem_load_in_if_block_with_invalidations_in_side_nodes() {ret void}
				define amdgpu_kernel void @smem_load_in_else_block_with_invalidations_in_side_nodes() {ret void}
				define amdgpu_kernel void @smem_load_at_end_of_loop() {ret void}
				...

				---
				name: no_stores

				body: |
				bb.0:
				; CHECK-LABEL: name: no_stores
				; CHECK: S_WAITCNT 0
				; CHECK-NOT: BUFFER_WBINVL1
				; CHECK: S_ENDPGM 0
				BUFFER_WBINVL1 implicit $exec
				S_ENDPGM 0
				...

				---
				name: lds_store

				body: |
				bb.0:
				; CHECK-LABEL: name: lds_store
				; CHECK: S_WAITCNT 0
				; CHECK: renamable $vgpr0 = V_MOV_B32_e32 0, implicit $exec
				; CHECK: DS_WRITE_B32_gfx9 killed renamable $vgpr0, renamable $vgpr0, 0, 0, implicit $exec :: (store 4 into @lds, align 16, addrspace 3)
				; CHECK-NOT: BUFFER_WBINVL1
				; CHECK: S_ENDPGM 0
				renamable $vgpr0 = V_MOV_B32_e32 0, implicit $exec
				DS_WRITE_B32_gfx9 killed renamable $vgpr0, renamable $vgpr0, 0, 0, implicit $exec :: (store 4 into @lds, align 16, addrspace 3)
				BUFFER_WBINVL1 implicit $exec
				S_ENDPGM 0
				...

				---
				name: smem_store

				body: |
				bb.0:
				liveins: $sgpr2
				; CHECK-LABEL: name: smem_store
				; CHECK: S_WAITCNT 0
				; CHECK: BUFFER_STORE_DWORD_OFFSET killed renamable $vgpr0, killed renamable $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, implicit $exec
				; CHECK: S_WAITCNT 3952
				; CHECK: BUFFER_WBINVL1 implicit $exec
				; CHECK: S_ENDPGM 0
				BUFFER_STORE_DWORD_OFFSET killed renamable $vgpr0, killed renamable $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, implicit $exec
				BUFFER_WBINVL1 implicit $exec
				S_ENDPGM 0
				...

				---
				name: smem_store_with_multiple_invalidations

				body: |
				bb.0:
				liveins: $sgpr2
				; CHECK-LABEL: name: smem_store_with_multiple_invalidations
				; CHECK: S_WAITCNT 0
				; CHECK: BUFFER_STORE_DWORD_OFFSET killed renamable $vgpr0, killed renamable $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, implicit $exec
				; CHECK: S_WAITCNT 3952
				; CHECK: BUFFER_WBINVL1 implicit $exec
				; CHECK-NOT: BUFFER_WBINVL1
				; CHECK: S_ENDPGM 0
				BUFFER_STORE_DWORD_OFFSET killed renamable $vgpr0, killed renamable $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, implicit $exec
				BUFFER_WBINVL1 implicit $exec
				BUFFER_WBINVL1 implicit $exec
				S_ENDPGM 0
				...

				---
				name: smem_store_with_invalidations_in_side_nodes

				# The BUFFER_WBINVL1 instructions should be left in both paths.

				body: |
				; CHECK-LABEL: name: smem_store_with_invalidations_in_side_nodes
				; CHECK: bb.0:
				; CHECK: successors: %bb.2(0x40000000), %bb.1(0x40000000)
				; CHECK: S_WAITCNT 0
				; CHECK: renamable $vgpr0 = BUFFER_LOAD_DWORD_OFFSET renamable $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, implicit $exec
				; CHECK: S_WAITCNT 3952
				; CHECK: V_CMP_NE_U32_e32 0, killed $vgpr0, implicit-def $vcc, implicit $exec
				; CHECK: $vcc = S_AND_B64 $exec, killed $vcc, implicit-def dead $scc
				; CHECK: S_CBRANCH_VCCNZ %bb.2, implicit killed $vcc
				; CHECK: bb.1:
				; CHECK: BUFFER_WBINVL1 implicit $exec
				; CHECK: S_ENDPGM 0
				; CHECK: bb.2:
				; CHECK: BUFFER_WBINVL1 implicit $exec
				; CHECK: S_ENDPGM 0
				bb.0:
				renamable $vgpr0 = BUFFER_LOAD_DWORD_OFFSET renamable $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, implicit $exec
				V_CMP_NE_U32_e32 0, killed $vgpr0, implicit-def $vcc, implicit $exec
				$vcc = S_AND_B64 $exec, killed $vcc, implicit-def dead $scc
				S_CBRANCH_VCCNZ %bb.2, implicit killed $vcc

				bb.1:
				BUFFER_WBINVL1 implicit $exec
				S_ENDPGM 0

				bb.2:
				BUFFER_WBINVL1 implicit $exec
				S_ENDPGM 0
				...

				---
				name: smem_load_in_if_block_with_invalidations_in_side_nodes

				# The BUFFER_WBINVL1 instruction in the else block can be removed because no buffer load or store reaches it.

				body: |
				; CHECK-LABEL: name: smem_load_in_if_block_with_invalidations_in_side_nodes
				; CHECK: bb.0:
				; CHECK: successors: %bb.2(0x40000000), %bb.1(0x40000000)
				; CHECK: S_WAITCNT 0
				; CHECK: V_CMP_NE_U32_e32 0, killed $vgpr0, implicit-def $vcc, implicit $exec
				; CHECK: $vcc = S_AND_B64 $exec, killed $vcc, implicit-def dead $scc
				; CHECK: S_CBRANCH_VCCNZ %bb.2, implicit killed $vcc
				; CHECK: bb.1:
				; CHECK: renamable $vgpr0 = BUFFER_LOAD_DWORD_OFFSET renamable $sgpr4_sgpr5_sgpr6_sgpr7, 0, 0, 0, 0, 0, implicit $exec
				; CHECK: S_WAITCNT 3952
				; CHECK: BUFFER_WBINVL1 implicit $exec
				; CHECK: S_ENDPGM 0
				; CHECK: bb.2:
				; CHECK: S_ENDPGM 0
				bb.0:
				V_CMP_NE_U32_e32 0, killed $vgpr0, implicit-def $vcc, implicit $exec
				$vcc = S_AND_B64 $exec, killed $vcc, implicit-def dead $scc
				S_CBRANCH_VCCNZ %bb.2, implicit killed $vcc

				bb.1:
				renamable $vgpr0 = BUFFER_LOAD_DWORD_OFFSET renamable $sgpr4_sgpr5_sgpr6_sgpr7, 0, 0, 0, 0, 0, implicit $exec
				BUFFER_WBINVL1 implicit $exec
				S_ENDPGM 0

				bb.2:
				BUFFER_WBINVL1 implicit $exec
				S_ENDPGM 0
				...

				---
				name: smem_load_in_else_block_with_invalidations_in_side_nodes

				# The BUFFER_WBINVL1 instruction in the if block can be removed because no buffer load or store reaches it.

				body: |
				; CHECK-LABEL: name: smem_load_in_else_block_with_invalidations_in_side_nodes
				; CHECK: bb.0:
				; CHECK: successors: %bb.2(0x40000000), %bb.1(0x40000000)
				; CHECK: S_WAITCNT 0
				; CHECK: V_CMP_NE_U32_e32 0, killed $vgpr0, implicit-def $vcc, implicit $exec
				; CHECK: $vcc = S_AND_B64 $exec, killed $vcc, implicit-def dead $scc
				; CHECK: S_CBRANCH_VCCNZ %bb.2, implicit killed $vcc
				; CHECK: bb.1:
				; CHECK: S_ENDPGM 0
				; CHECK: bb.2:
				; CHECK: renamable $vgpr0 = BUFFER_LOAD_DWORD_OFFSET renamable $sgpr4_sgpr5_sgpr6_sgpr7, 0, 0, 0, 0, 0, implicit $exec
				; CHECK: S_WAITCNT 3952
				; CHECK: BUFFER_WBINVL1 implicit $exec
				; CHECK: S_ENDPGM 0
				bb.0:
				V_CMP_NE_U32_e32 0, killed $vgpr0, implicit-def $vcc, implicit $exec
				$vcc = S_AND_B64 $exec, killed $vcc, implicit-def dead $scc
				S_CBRANCH_VCCNZ %bb.2, implicit killed $vcc

				bb.1:
				BUFFER_WBINVL1 implicit $exec
				S_ENDPGM 0

				bb.2:
				renamable $vgpr0 = BUFFER_LOAD_DWORD_OFFSET renamable $sgpr4_sgpr5_sgpr6_sgpr7, 0, 0, 0, 0, 0, implicit $exec
				BUFFER_WBINVL1 implicit $exec
				S_ENDPGM 0
				...

				---
				name: smem_load_at_end_of_loop

				# The BUFFER_WBINVL1 instruction at the start of the loop should not be removed, because the load at the end of the loop reaches it through the back edge.

				body: |
				; CHECK-LABEL: name: smem_load_at_end_of_loop
				; CHECK: bb.0:
				; CHECK: successors: %bb.2(0x80000000)
				; CHECK: S_WAITCNT 0
				; CHECK: S_BRANCH %bb.2
				; CHECK: bb.1:
				; CHECK: successors: %bb.2(0x80000000)
				; CHECK: renamable $vgpr0 = BUFFER_LOAD_DWORD_OFFSET renamable $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, implicit $exec
				; CHECK: S_BRANCH %bb.2
				; CHECK: bb.2:
				; CHECK: successors: %bb.1(0x40000000), %bb.3(0x40000000)
				; CHECK: S_WAITCNT 3952
				; CHECK: V_CMP_EQ_U32_e32 0, killed $vgpr0, implicit-def $vcc, implicit $exec
				; CHECK: $vcc = S_AND_B64 $exec, killed $vcc, implicit-def dead $scc
				; CHECK: BUFFER_WBINVL1 implicit $exec
				; CHECK: S_CBRANCH_VCCZ %bb.1, implicit killed $vcc
				; CHECK: bb.3:
				; CHECK: S_ENDPGM 0
				bb.0:
				S_BRANCH %bb.1

				bb.3:
				renamable $vgpr0 = BUFFER_LOAD_DWORD_OFFSET renamable $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, implicit $exec
				S_BRANCH %bb.1

				bb.1:
				V_CMP_EQ_U32_e32 0, killed $vgpr0, implicit-def $vcc, implicit $exec
				$vcc = S_AND_B64 $exec, killed $vcc, implicit-def dead $scc
				BUFFER_WBINVL1 implicit $exec
				S_CBRANCH_VCCZ %bb.3, implicit killed $vcc

				bb.2:
				S_ENDPGM 0
				...
				---
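
The cases in cache_invalidate.mir can be read as a forward dataflow problem: an invalidation is removable when no cached memory access reaches it without first passing another invalidation. The following is a minimal Python sketch of that rule, not the code in SIInsertWaitcnts.cpp. The function name `removable_invalidations`, the instruction kinds `"cached"`, `"inv"` and `"lds"`, and the dictionary-based CFG are all assumptions made for illustration. It also assumes that no cached access is pending at kernel entry, which matches the `no_stores` test.

```python
from collections import deque


def removable_invalidations(blocks, succs):
    """Return the set of (block, index) pairs naming "inv" instructions
    that no "cached" access can reach without passing another "inv".

    blocks: dict mapping block name -> list of instruction kinds.
            "cached" models a buffer load/store, "inv" models
            BUFFER_WBINVL1; any other kind (e.g. "lds") is ignored.
    succs:  dict mapping block name -> list of successor block names.
    """
    preds = {b: [] for b in blocks}
    for b, targets in succs.items():
        for t in targets:
            preds[t].append(b)

    def transfer(block, dirty, record=None):
        # Walk one block. "dirty" means a cached access has happened
        # since the last invalidation on some path to this point.
        for i, kind in enumerate(blocks[block]):
            if kind == "cached":
                dirty = True
            elif kind == "inv":
                if record is not None and not dirty:
                    record.add((block, i))
                dirty = False
        return dirty

    # Iterate to a fixed point. The state only ever moves from clean
    # to dirty, so each block's output changes at most once.
    out = {b: False for b in blocks}
    work = deque(blocks)
    while work:
        b = work.popleft()
        new_out = transfer(b, any(out[p] for p in preds[b]))
        if new_out != out[b]:
            out[b] = new_out
            work.extend(succs.get(b, []))

    # Final walk: record invalidations that are reached in a clean state.
    removable = set()
    for b in blocks:
        transfer(b, any(out[p] for p in preds[b]), removable)
    return removable
```

Modelling `smem_load_in_if_block_with_invalidations_in_side_nodes` as `{"bb.0": [], "bb.1": ["cached", "inv"], "bb.2": ["inv"]}` with `bb.0` branching to both other blocks marks only the invalidation in `bb.2` as removable. In the loop test the load in the latch block makes the loop header dirty on the back edge, so its invalidation is kept.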

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.buffer.wbinvl1.ll

	; RUN: llc -march=amdgcn -mcpu=tahiti -show-mc-encoding < %s | FileCheck -check-prefix=GCN -check-prefix=SI %s			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -march=amdgcn -mcpu=fiji -show-mc-encoding < %s | FileCheck -check-prefix=GCN -check-prefix=VI %s			; RUN: llc -march=amdgcn -mcpu=tahiti -show-mc-encoding < %s | FileCheck -check-prefix=SI %s
				; RUN: llc -march=amdgcn -mcpu=fiji -show-mc-encoding < %s | FileCheck -check-prefix=VI %s

	declare void @llvm.amdgcn.buffer.wbinvl1() #0			declare void @llvm.amdgcn.buffer.wbinvl1() #0
				@gint = external addrspace(1) global i32, align 4

	; GCN-LABEL: {{^}}test_buffer_wbinvl1:			define amdgpu_kernel void @test_buffer_wbinvl1() #0 {
	; GCN-NEXT: ; %bb.0:			; SI-LABEL: test_buffer_wbinvl1:
				; SI: ; %bb.0:
				; SI-NEXT: s_getpc_b64 s[0:1] ; encoding: [0x00,0x1f,0x80,0xbe]
				; SI-NEXT: s_add_u32 s0, s0, gint@gotpcrel32@lo+4 ; encoding: [0x00,0xff,0x00,0x80,A,A,A,A]
				; SI-NEXT: ; fixup A - offset: 4, value: gint@gotpcrel32@lo+4, kind: FK_PCRel_4
				; SI-NEXT: s_addc_u32 s1, s1, gint@gotpcrel32@hi+12 ; encoding: [0x01,0xff,0x01,0x82,A,A,A,A]
				; SI-NEXT: ; fixup A - offset: 4, value: gint@gotpcrel32@hi+12, kind: FK_PCRel_4
				; SI-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0 ; encoding: [0x00,0x01,0x40,0xc0]
				; SI-NEXT: s_mov_b32 s3, 0xf000 ; encoding: [0xff,0x03,0x83,0xbe,0x00,0xf0,0x00,0x00]
				; SI-NEXT: s_mov_b32 s2, -1 ; encoding: [0xc1,0x03,0x82,0xbe]
				; SI-NEXT: v_mov_b32_e32 v0, 0 ; encoding: [0x80,0x02,0x00,0x7e]
				; SI-NEXT: s_waitcnt lgkmcnt(0) ; encoding: [0x7f,0x00,0x8c,0xbf]
				; SI-NEXT: buffer_store_dword v0, off, s[0:3], 0 ; encoding: [0x00,0x00,0x70,0xe0,0x00,0x00,0x00,0x80]
				; SI-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x0f,0x8c,0xbf]
	; SI-NEXT: buffer_wbinvl1 ; encoding: [0x00,0x00,0xc4,0xe1,0x00,0x00,0x00,0x00]			; SI-NEXT: buffer_wbinvl1 ; encoding: [0x00,0x00,0xc4,0xe1,0x00,0x00,0x00,0x00]
				; SI-NEXT: s_endpgm ; encoding: [0x00,0x00,0x81,0xbf]
				;
				; VI-LABEL: test_buffer_wbinvl1:
				; VI: ; %bb.0:
				; VI-NEXT: s_getpc_b64 s[0:1] ; encoding: [0x00,0x1c,0x80,0xbe]
				; VI-NEXT: s_add_u32 s0, s0, gint@gotpcrel32@lo+4 ; encoding: [0x00,0xff,0x00,0x80,A,A,A,A]
				; VI-NEXT: ; fixup A - offset: 4, value: gint@gotpcrel32@lo+4, kind: FK_PCRel_4
				; VI-NEXT: s_addc_u32 s1, s1, gint@gotpcrel32@hi+12 ; encoding: [0x01,0xff,0x01,0x82,A,A,A,A]
				; VI-NEXT: ; fixup A - offset: 4, value: gint@gotpcrel32@hi+12, kind: FK_PCRel_4
				; VI-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0 ; encoding: [0x00,0x00,0x06,0xc0,0x00,0x00,0x00,0x00]
				; VI-NEXT: v_mov_b32_e32 v2, 0 ; encoding: [0x80,0x02,0x04,0x7e]
				; VI-NEXT: s_waitcnt lgkmcnt(0) ; encoding: [0x7f,0x00,0x8c,0xbf]
				; VI-NEXT: v_mov_b32_e32 v0, s0 ; encoding: [0x00,0x02,0x00,0x7e]
				; VI-NEXT: v_mov_b32_e32 v1, s1 ; encoding: [0x01,0x02,0x02,0x7e]
				; VI-NEXT: flat_store_dword v[0:1], v2 ; encoding: [0x00,0x00,0x70,0xdc,0x00,0x02,0x00,0x00]
				; VI-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x0f,0x8c,0xbf]
	; VI-NEXT: buffer_wbinvl1 ; encoding: [0x00,0x00,0xf8,0xe0,0x00,0x00,0x00,0x00]			; VI-NEXT: buffer_wbinvl1 ; encoding: [0x00,0x00,0xf8,0xe0,0x00,0x00,0x00,0x00]
	; GCN-NEXT: s_endpgm			; VI-NEXT: s_endpgm ; encoding: [0x00,0x00,0x81,0xbf]
	define amdgpu_kernel void @test_buffer_wbinvl1() #0 {			store i32 0, i32 addrspace(1)* @gint
	call void @llvm.amdgcn.buffer.wbinvl1()			call void @llvm.amdgcn.buffer.wbinvl1()
	ret void			ret void
	}			}

	attributes #0 = { nounwind }			attributes #0 = { nounwind }

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.buffer.wbinvl1.sc.ll

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -march=amdgcn -mcpu=tahiti -show-mc-encoding < %s | FileCheck -check-prefix=SI %s			; RUN: llc -march=amdgcn -mcpu=tahiti -show-mc-encoding < %s | FileCheck -check-prefix=SI %s

	declare void @llvm.amdgcn.buffer.wbinvl1.sc() #0			declare void @llvm.amdgcn.buffer.wbinvl1.sc() #0
				@gint = external addrspace(1) global i32, align 4

	; SI-LABEL: {{^}}test_buffer_wbinvl1_sc:
	; SI-NEXT: ; %bb.0:
	; SI-NEXT: buffer_wbinvl1_sc ; encoding: [0x00,0x00,0xc0,0xe1,0x00,0x00,0x00,0x00]
	; SI-NEXT: s_endpgm
	define amdgpu_kernel void @test_buffer_wbinvl1_sc() #0 {			define amdgpu_kernel void @test_buffer_wbinvl1_sc() #0 {
				; SI-LABEL: test_buffer_wbinvl1_sc:
				; SI: ; %bb.0:
				; SI-NEXT: s_getpc_b64 s[0:1] ; encoding: [0x00,0x1f,0x80,0xbe]
				; SI-NEXT: s_add_u32 s0, s0, gint@gotpcrel32@lo+4 ; encoding: [0x00,0xff,0x00,0x80,A,A,A,A]
				; SI-NEXT: ; fixup A - offset: 4, value: gint@gotpcrel32@lo+4, kind: FK_PCRel_4
				; SI-NEXT: s_addc_u32 s1, s1, gint@gotpcrel32@hi+12 ; encoding: [0x01,0xff,0x01,0x82,A,A,A,A]
				; SI-NEXT: ; fixup A - offset: 4, value: gint@gotpcrel32@hi+12, kind: FK_PCRel_4
				; SI-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0 ; encoding: [0x00,0x01,0x40,0xc0]
				; SI-NEXT: s_mov_b32 s3, 0xf000 ; encoding: [0xff,0x03,0x83,0xbe,0x00,0xf0,0x00,0x00]
				; SI-NEXT: s_mov_b32 s2, -1 ; encoding: [0xc1,0x03,0x82,0xbe]
				; SI-NEXT: v_mov_b32_e32 v0, 0 ; encoding: [0x80,0x02,0x00,0x7e]
				; SI-NEXT: s_waitcnt lgkmcnt(0) ; encoding: [0x7f,0x00,0x8c,0xbf]
				; SI-NEXT: buffer_store_dword v0, off, s[0:3], 0 ; encoding: [0x00,0x00,0x70,0xe0,0x00,0x00,0x00,0x80]
				; SI-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x0f,0x8c,0xbf]
				; SI-NEXT: buffer_wbinvl1_sc ; encoding: [0x00,0x00,0xc0,0xe1,0x00,0x00,0x00,0x00]
				; SI-NEXT: s_endpgm ; encoding: [0x00,0x00,0x81,0xbf]
				store i32 0, i32 addrspace(1)* @gint
	call void @llvm.amdgcn.buffer.wbinvl1.sc()			call void @llvm.amdgcn.buffer.wbinvl1.sc()
	ret void			ret void
	}			}

	attributes #0 = { nounwind }			attributes #0 = { nounwind }

llvm/test/CodeGen/AMDGPU/memory-legalizer-fence.ll

	[29 lines not shown]
	;			;
	; GFX90A-NOTTGSPLIT-LABEL: singlethread_acquire_fence:			; GFX90A-NOTTGSPLIT-LABEL: singlethread_acquire_fence:
	; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry			; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
	; GFX90A-NOTTGSPLIT-NEXT: s_endpgm			; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
	;			;
	; GFX90A-TGSPLIT-LABEL: singlethread_acquire_fence:			; GFX90A-TGSPLIT-LABEL: singlethread_acquire_fence:
	; GFX90A-TGSPLIT: ; %bb.0: ; %entry			; GFX90A-TGSPLIT: ; %bb.0: ; %entry
	; GFX90A-TGSPLIT-NEXT: s_endpgm			; GFX90A-TGSPLIT-NEXT: s_endpgm
	;
	;
	entry:			entry:
	fence syncscope("singlethread") acquire			fence syncscope("singlethread") acquire
	ret void			ret void
	}			}

	define amdgpu_kernel void @singlethread_release_fence() {			define amdgpu_kernel void @singlethread_release_fence() {
	; GFX6-LABEL: singlethread_release_fence:			; GFX6-LABEL: singlethread_release_fence:
	; GFX6: ; %bb.0: ; %entry			; GFX6: ; %bb.0: ; %entry
	[17 lines not shown]
	;			;
	; GFX90A-NOTTGSPLIT-LABEL: singlethread_release_fence:			; GFX90A-NOTTGSPLIT-LABEL: singlethread_release_fence:
	; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry			; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
	; GFX90A-NOTTGSPLIT-NEXT: s_endpgm			; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
	;			;
	; GFX90A-TGSPLIT-LABEL: singlethread_release_fence:			; GFX90A-TGSPLIT-LABEL: singlethread_release_fence:
	; GFX90A-TGSPLIT: ; %bb.0: ; %entry			; GFX90A-TGSPLIT: ; %bb.0: ; %entry
	; GFX90A-TGSPLIT-NEXT: s_endpgm			; GFX90A-TGSPLIT-NEXT: s_endpgm
	;
	;
	entry:			entry:
	fence syncscope("singlethread") release			fence syncscope("singlethread") release
	ret void			ret void
	}			}

	define amdgpu_kernel void @singlethread_acq_rel_fence() {			define amdgpu_kernel void @singlethread_acq_rel_fence() {
	; GFX6-LABEL: singlethread_acq_rel_fence:			; GFX6-LABEL: singlethread_acq_rel_fence:
	; GFX6: ; %bb.0: ; %entry			; GFX6: ; %bb.0: ; %entry
	[17 lines not shown]
	;			;
	; GFX90A-NOTTGSPLIT-LABEL: singlethread_acq_rel_fence:			; GFX90A-NOTTGSPLIT-LABEL: singlethread_acq_rel_fence:
	; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry			; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
	; GFX90A-NOTTGSPLIT-NEXT: s_endpgm			; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
	;			;
	; GFX90A-TGSPLIT-LABEL: singlethread_acq_rel_fence:			; GFX90A-TGSPLIT-LABEL: singlethread_acq_rel_fence:
	; GFX90A-TGSPLIT: ; %bb.0: ; %entry			; GFX90A-TGSPLIT: ; %bb.0: ; %entry
	; GFX90A-TGSPLIT-NEXT: s_endpgm			; GFX90A-TGSPLIT-NEXT: s_endpgm
	;
	;
	entry:			entry:
	fence syncscope("singlethread") acq_rel			fence syncscope("singlethread") acq_rel
	ret void			ret void
	}			}

	define amdgpu_kernel void @singlethread_seq_cst_fence() {			define amdgpu_kernel void @singlethread_seq_cst_fence() {
	; GFX6-LABEL: singlethread_seq_cst_fence:			; GFX6-LABEL: singlethread_seq_cst_fence:
	; GFX6: ; %bb.0: ; %entry			; GFX6: ; %bb.0: ; %entry
	[17 lines not shown]
	;			;
	; GFX90A-NOTTGSPLIT-LABEL: singlethread_seq_cst_fence:			; GFX90A-NOTTGSPLIT-LABEL: singlethread_seq_cst_fence:
	; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry			; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
	; GFX90A-NOTTGSPLIT-NEXT: s_endpgm			; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
	;			;
	; GFX90A-TGSPLIT-LABEL: singlethread_seq_cst_fence:			; GFX90A-TGSPLIT-LABEL: singlethread_seq_cst_fence:
	; GFX90A-TGSPLIT: ; %bb.0: ; %entry			; GFX90A-TGSPLIT: ; %bb.0: ; %entry
	; GFX90A-TGSPLIT-NEXT: s_endpgm			; GFX90A-TGSPLIT-NEXT: s_endpgm
	;
	;
	entry:			entry:
	fence syncscope("singlethread") seq_cst			fence syncscope("singlethread") seq_cst
	ret void			ret void
	}			}

	define amdgpu_kernel void @singlethread_one_as_acquire_fence() {			define amdgpu_kernel void @singlethread_one_as_acquire_fence() {
	; GFX6-LABEL: singlethread_one_as_acquire_fence:			; GFX6-LABEL: singlethread_one_as_acquire_fence:
	; GFX6: ; %bb.0: ; %entry			; GFX6: ; %bb.0: ; %entry
	[17 lines not shown]
	;			;
	; GFX90A-NOTTGSPLIT-LABEL: singlethread_one_as_acquire_fence:			; GFX90A-NOTTGSPLIT-LABEL: singlethread_one_as_acquire_fence:
	; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry			; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
	; GFX90A-NOTTGSPLIT-NEXT: s_endpgm			; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
	;			;
	; GFX90A-TGSPLIT-LABEL: singlethread_one_as_acquire_fence:			; GFX90A-TGSPLIT-LABEL: singlethread_one_as_acquire_fence:
	; GFX90A-TGSPLIT: ; %bb.0: ; %entry			; GFX90A-TGSPLIT: ; %bb.0: ; %entry
	; GFX90A-TGSPLIT-NEXT: s_endpgm			; GFX90A-TGSPLIT-NEXT: s_endpgm
	;
	;
	entry:			entry:
	fence syncscope("singlethread-one-as") acquire			fence syncscope("singlethread-one-as") acquire
	ret void			ret void
	}			}

	define amdgpu_kernel void @singlethread_one_as_release_fence() {			define amdgpu_kernel void @singlethread_one_as_release_fence() {
	; GFX6-LABEL: singlethread_one_as_release_fence:			; GFX6-LABEL: singlethread_one_as_release_fence:
	; GFX6: ; %bb.0: ; %entry			; GFX6: ; %bb.0: ; %entry
	[17 lines not shown]
	;			;
	; GFX90A-NOTTGSPLIT-LABEL: singlethread_one_as_release_fence:			; GFX90A-NOTTGSPLIT-LABEL: singlethread_one_as_release_fence:
	; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry			; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
	; GFX90A-NOTTGSPLIT-NEXT: s_endpgm			; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
	;			;
	; GFX90A-TGSPLIT-LABEL: singlethread_one_as_release_fence:			; GFX90A-TGSPLIT-LABEL: singlethread_one_as_release_fence:
	; GFX90A-TGSPLIT: ; %bb.0: ; %entry			; GFX90A-TGSPLIT: ; %bb.0: ; %entry
	; GFX90A-TGSPLIT-NEXT: s_endpgm			; GFX90A-TGSPLIT-NEXT: s_endpgm
	;
	;
	entry:			entry:
	fence syncscope("singlethread-one-as") release			fence syncscope("singlethread-one-as") release
	ret void			ret void
	}			}

	define amdgpu_kernel void @singlethread_one_as_acq_rel_fence() {			define amdgpu_kernel void @singlethread_one_as_acq_rel_fence() {
	; GFX6-LABEL: singlethread_one_as_acq_rel_fence:			; GFX6-LABEL: singlethread_one_as_acq_rel_fence:
	; GFX6: ; %bb.0: ; %entry			; GFX6: ; %bb.0: ; %entry
	[17 lines not shown]
	;			;
	; GFX90A-NOTTGSPLIT-LABEL: singlethread_one_as_acq_rel_fence:			; GFX90A-NOTTGSPLIT-LABEL: singlethread_one_as_acq_rel_fence:
	; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry			; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
	; GFX90A-NOTTGSPLIT-NEXT: s_endpgm			; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
	;			;
	; GFX90A-TGSPLIT-LABEL: singlethread_one_as_acq_rel_fence:			; GFX90A-TGSPLIT-LABEL: singlethread_one_as_acq_rel_fence:
	; GFX90A-TGSPLIT: ; %bb.0: ; %entry			; GFX90A-TGSPLIT: ; %bb.0: ; %entry
	; GFX90A-TGSPLIT-NEXT: s_endpgm			; GFX90A-TGSPLIT-NEXT: s_endpgm
	;
	;
	entry:			entry:
	fence syncscope("singlethread-one-as") acq_rel			fence syncscope("singlethread-one-as") acq_rel
	ret void			ret void
	}			}

	define amdgpu_kernel void @singlethread_one_as_seq_cst_fence() {			define amdgpu_kernel void @singlethread_one_as_seq_cst_fence() {
	; GFX6-LABEL: singlethread_one_as_seq_cst_fence:			; GFX6-LABEL: singlethread_one_as_seq_cst_fence:
	; GFX6: ; %bb.0: ; %entry			; GFX6: ; %bb.0: ; %entry
	[17 lines not shown]
	;			;
	; GFX90A-NOTTGSPLIT-LABEL: singlethread_one_as_seq_cst_fence:			; GFX90A-NOTTGSPLIT-LABEL: singlethread_one_as_seq_cst_fence:
	; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry			; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
	; GFX90A-NOTTGSPLIT-NEXT: s_endpgm			; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
	;			;
	; GFX90A-TGSPLIT-LABEL: singlethread_one_as_seq_cst_fence:			; GFX90A-TGSPLIT-LABEL: singlethread_one_as_seq_cst_fence:
	; GFX90A-TGSPLIT: ; %bb.0: ; %entry			; GFX90A-TGSPLIT: ; %bb.0: ; %entry
	; GFX90A-TGSPLIT-NEXT: s_endpgm			; GFX90A-TGSPLIT-NEXT: s_endpgm
	;
	;
	entry:			entry:
	fence syncscope("singlethread-one-as") seq_cst			fence syncscope("singlethread-one-as") seq_cst
	ret void			ret void
	}			}

	define amdgpu_kernel void @wavefront_acquire_fence() {			define amdgpu_kernel void @wavefront_acquire_fence() {
	; GFX6-LABEL: wavefront_acquire_fence:			; GFX6-LABEL: wavefront_acquire_fence:
	; GFX6: ; %bb.0: ; %entry			; GFX6: ; %bb.0: ; %entry
	[17 lines not shown]
	;			;
	; GFX90A-NOTTGSPLIT-LABEL: wavefront_acquire_fence:			; GFX90A-NOTTGSPLIT-LABEL: wavefront_acquire_fence:
	; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry			; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
	; GFX90A-NOTTGSPLIT-NEXT: s_endpgm			; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
	;			;
	; GFX90A-TGSPLIT-LABEL: wavefront_acquire_fence:			; GFX90A-TGSPLIT-LABEL: wavefront_acquire_fence:
	; GFX90A-TGSPLIT: ; %bb.0: ; %entry			; GFX90A-TGSPLIT: ; %bb.0: ; %entry
	; GFX90A-TGSPLIT-NEXT: s_endpgm			; GFX90A-TGSPLIT-NEXT: s_endpgm
	;
	;
	entry:			entry:
	fence syncscope("wavefront") acquire			fence syncscope("wavefront") acquire
	ret void			ret void
	}			}

	define amdgpu_kernel void @wavefront_release_fence() {			define amdgpu_kernel void @wavefront_release_fence() {
	; GFX6-LABEL: wavefront_release_fence:			; GFX6-LABEL: wavefront_release_fence:
	; GFX6: ; %bb.0: ; %entry			; GFX6: ; %bb.0: ; %entry
	[17 lines not shown]
	;			;
	; GFX90A-NOTTGSPLIT-LABEL: wavefront_release_fence:			; GFX90A-NOTTGSPLIT-LABEL: wavefront_release_fence:
	; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry			; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
	; GFX90A-NOTTGSPLIT-NEXT: s_endpgm			; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
	;			;
	; GFX90A-TGSPLIT-LABEL: wavefront_release_fence:			; GFX90A-TGSPLIT-LABEL: wavefront_release_fence:
	; GFX90A-TGSPLIT: ; %bb.0: ; %entry			; GFX90A-TGSPLIT: ; %bb.0: ; %entry
	; GFX90A-TGSPLIT-NEXT: s_endpgm			; GFX90A-TGSPLIT-NEXT: s_endpgm
	;
	;
	entry:			entry:
	fence syncscope("wavefront") release			fence syncscope("wavefront") release
	ret void			ret void
	}			}

	define amdgpu_kernel void @wavefront_acq_rel_fence() {			define amdgpu_kernel void @wavefront_acq_rel_fence() {
	; GFX6-LABEL: wavefront_acq_rel_fence:			; GFX6-LABEL: wavefront_acq_rel_fence:
	; GFX6: ; %bb.0: ; %entry			; GFX6: ; %bb.0: ; %entry
	[17 lines not shown]
	;			;
	; GFX90A-NOTTGSPLIT-LABEL: wavefront_acq_rel_fence:			; GFX90A-NOTTGSPLIT-LABEL: wavefront_acq_rel_fence:
	; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry			; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
	; GFX90A-NOTTGSPLIT-NEXT: s_endpgm			; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
	;			;
	; GFX90A-TGSPLIT-LABEL: wavefront_acq_rel_fence:			; GFX90A-TGSPLIT-LABEL: wavefront_acq_rel_fence:
	; GFX90A-TGSPLIT: ; %bb.0: ; %entry			; GFX90A-TGSPLIT: ; %bb.0: ; %entry
	; GFX90A-TGSPLIT-NEXT: s_endpgm			; GFX90A-TGSPLIT-NEXT: s_endpgm
	;
	;
	entry:			entry:
	fence syncscope("wavefront") acq_rel			fence syncscope("wavefront") acq_rel
	ret void			ret void
	}			}

	define amdgpu_kernel void @wavefront_seq_cst_fence() {			define amdgpu_kernel void @wavefront_seq_cst_fence() {
	; GFX6-LABEL: wavefront_seq_cst_fence:			; GFX6-LABEL: wavefront_seq_cst_fence:
	; GFX6: ; %bb.0: ; %entry			; GFX6: ; %bb.0: ; %entry
	[17 lines not shown]
	;			;
	; GFX90A-NOTTGSPLIT-LABEL: wavefront_seq_cst_fence:			; GFX90A-NOTTGSPLIT-LABEL: wavefront_seq_cst_fence:
	; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry			; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
	; GFX90A-NOTTGSPLIT-NEXT: s_endpgm			; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
	;			;
	; GFX90A-TGSPLIT-LABEL: wavefront_seq_cst_fence:			; GFX90A-TGSPLIT-LABEL: wavefront_seq_cst_fence:
	; GFX90A-TGSPLIT: ; %bb.0: ; %entry			; GFX90A-TGSPLIT: ; %bb.0: ; %entry
	; GFX90A-TGSPLIT-NEXT: s_endpgm			; GFX90A-TGSPLIT-NEXT: s_endpgm
	;
	;
	entry:			entry:
	fence syncscope("wavefront") seq_cst			fence syncscope("wavefront") seq_cst
	ret void			ret void
	}			}

	define amdgpu_kernel void @wavefront_one_as_acquire_fence() {			define amdgpu_kernel void @wavefront_one_as_acquire_fence() {
	; GFX6-LABEL: wavefront_one_as_acquire_fence:			; GFX6-LABEL: wavefront_one_as_acquire_fence:
	; GFX6: ; %bb.0: ; %entry			; GFX6: ; %bb.0: ; %entry
	; ... (17 unchanged lines collapsed in the diff view)
	;			;
	; GFX90A-NOTTGSPLIT-LABEL: wavefront_one_as_acquire_fence:			; GFX90A-NOTTGSPLIT-LABEL: wavefront_one_as_acquire_fence:
	; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry			; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
	; GFX90A-NOTTGSPLIT-NEXT: s_endpgm			; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
	;			;
	; GFX90A-TGSPLIT-LABEL: wavefront_one_as_acquire_fence:			; GFX90A-TGSPLIT-LABEL: wavefront_one_as_acquire_fence:
	; GFX90A-TGSPLIT: ; %bb.0: ; %entry			; GFX90A-TGSPLIT: ; %bb.0: ; %entry
	; GFX90A-TGSPLIT-NEXT: s_endpgm			; GFX90A-TGSPLIT-NEXT: s_endpgm
	;
	;
	entry:			entry:
	fence syncscope("wavefront-one-as") acquire			fence syncscope("wavefront-one-as") acquire
	ret void			ret void
	}			}

	define amdgpu_kernel void @wavefront_one_as_release_fence() {			define amdgpu_kernel void @wavefront_one_as_release_fence() {
	; GFX6-LABEL: wavefront_one_as_release_fence:			; GFX6-LABEL: wavefront_one_as_release_fence:
	; GFX6: ; %bb.0: ; %entry			; GFX6: ; %bb.0: ; %entry
	; ... (17 unchanged lines collapsed in the diff view)
	;			;
	; GFX90A-NOTTGSPLIT-LABEL: wavefront_one_as_release_fence:			; GFX90A-NOTTGSPLIT-LABEL: wavefront_one_as_release_fence:
	; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry			; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
	; GFX90A-NOTTGSPLIT-NEXT: s_endpgm			; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
	;			;
	; GFX90A-TGSPLIT-LABEL: wavefront_one_as_release_fence:			; GFX90A-TGSPLIT-LABEL: wavefront_one_as_release_fence:
	; GFX90A-TGSPLIT: ; %bb.0: ; %entry			; GFX90A-TGSPLIT: ; %bb.0: ; %entry
	; GFX90A-TGSPLIT-NEXT: s_endpgm			; GFX90A-TGSPLIT-NEXT: s_endpgm
	;
	;
	entry:			entry:
	fence syncscope("wavefront-one-as") release			fence syncscope("wavefront-one-as") release
	ret void			ret void
	}			}

	define amdgpu_kernel void @wavefront_one_as_acq_rel_fence() {			define amdgpu_kernel void @wavefront_one_as_acq_rel_fence() {
	; GFX6-LABEL: wavefront_one_as_acq_rel_fence:			; GFX6-LABEL: wavefront_one_as_acq_rel_fence:
	; GFX6: ; %bb.0: ; %entry			; GFX6: ; %bb.0: ; %entry
	; ... (17 unchanged lines collapsed in the diff view)
	;			;
	; GFX90A-NOTTGSPLIT-LABEL: wavefront_one_as_acq_rel_fence:			; GFX90A-NOTTGSPLIT-LABEL: wavefront_one_as_acq_rel_fence:
	; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry			; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
	; GFX90A-NOTTGSPLIT-NEXT: s_endpgm			; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
	;			;
	; GFX90A-TGSPLIT-LABEL: wavefront_one_as_acq_rel_fence:			; GFX90A-TGSPLIT-LABEL: wavefront_one_as_acq_rel_fence:
	; GFX90A-TGSPLIT: ; %bb.0: ; %entry			; GFX90A-TGSPLIT: ; %bb.0: ; %entry
	; GFX90A-TGSPLIT-NEXT: s_endpgm			; GFX90A-TGSPLIT-NEXT: s_endpgm
	;
	;
	entry:			entry:
	fence syncscope("wavefront-one-as") acq_rel			fence syncscope("wavefront-one-as") acq_rel
	ret void			ret void
	}			}

	define amdgpu_kernel void @wavefront_one_as_seq_cst_fence() {			define amdgpu_kernel void @wavefront_one_as_seq_cst_fence() {
	; GFX6-LABEL: wavefront_one_as_seq_cst_fence:			; GFX6-LABEL: wavefront_one_as_seq_cst_fence:
	; GFX6: ; %bb.0: ; %entry			; GFX6: ; %bb.0: ; %entry
	; ... (17 unchanged lines collapsed in the diff view)
	;			;
	; GFX90A-NOTTGSPLIT-LABEL: wavefront_one_as_seq_cst_fence:			; GFX90A-NOTTGSPLIT-LABEL: wavefront_one_as_seq_cst_fence:
	; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry			; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
	; GFX90A-NOTTGSPLIT-NEXT: s_endpgm			; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
	;			;
	; GFX90A-TGSPLIT-LABEL: wavefront_one_as_seq_cst_fence:			; GFX90A-TGSPLIT-LABEL: wavefront_one_as_seq_cst_fence:
	; GFX90A-TGSPLIT: ; %bb.0: ; %entry			; GFX90A-TGSPLIT: ; %bb.0: ; %entry
	; GFX90A-TGSPLIT-NEXT: s_endpgm			; GFX90A-TGSPLIT-NEXT: s_endpgm
	;
	;
	entry:			entry:
	fence syncscope("wavefront-one-as") seq_cst			fence syncscope("wavefront-one-as") seq_cst
	ret void			ret void
	}			}

	define amdgpu_kernel void @workgroup_acquire_fence() {			define amdgpu_kernel void @workgroup_acquire_fence() {
	; GFX6-LABEL: workgroup_acquire_fence:			; GFX6-LABEL: workgroup_acquire_fence:
	; GFX6: ; %bb.0: ; %entry			; GFX6: ; %bb.0: ; %entry
	; GFX6-NEXT: s_waitcnt lgkmcnt(0)			; GFX6-NEXT: s_waitcnt lgkmcnt(0)
	; GFX6-NEXT: s_endpgm			; GFX6-NEXT: s_endpgm
	;			;
	; GFX7-LABEL: workgroup_acquire_fence:			; GFX7-LABEL: workgroup_acquire_fence:
	; GFX7: ; %bb.0: ; %entry			; GFX7: ; %bb.0: ; %entry
	; GFX7-NEXT: s_waitcnt lgkmcnt(0)			; GFX7-NEXT: s_waitcnt lgkmcnt(0)
	; GFX7-NEXT: s_endpgm			; GFX7-NEXT: s_endpgm
	;			;
	; GFX10-WGP-LABEL: workgroup_acquire_fence:			; GFX10-WGP-LABEL: workgroup_acquire_fence:
	; GFX10-WGP: ; %bb.0: ; %entry			; GFX10-WGP: ; %bb.0: ; %entry
	; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-WGP-NEXT: buffer_gl0_inv
	; GFX10-WGP-NEXT: s_endpgm			; GFX10-WGP-NEXT: s_endpgm
	;			;
	; GFX10-CU-LABEL: workgroup_acquire_fence:			; GFX10-CU-LABEL: workgroup_acquire_fence:
	; GFX10-CU: ; %bb.0: ; %entry			; GFX10-CU: ; %bb.0: ; %entry
	; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-CU-NEXT: s_endpgm			; GFX10-CU-NEXT: s_endpgm
	;			;
	; SKIP-CACHE-INV-LABEL: workgroup_acquire_fence:			; SKIP-CACHE-INV-LABEL: workgroup_acquire_fence:
	; SKIP-CACHE-INV: ; %bb.0: ; %entry			; SKIP-CACHE-INV: ; %bb.0: ; %entry
	; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)			; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
	; SKIP-CACHE-INV-NEXT: s_endpgm			; SKIP-CACHE-INV-NEXT: s_endpgm
	;			;
	; GFX90A-NOTTGSPLIT-LABEL: workgroup_acquire_fence:			; GFX90A-NOTTGSPLIT-LABEL: workgroup_acquire_fence:
	; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry			; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
	; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)			; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
	; GFX90A-NOTTGSPLIT-NEXT: s_endpgm			; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
	;			;
	; GFX90A-TGSPLIT-LABEL: workgroup_acquire_fence:			; GFX90A-TGSPLIT-LABEL: workgroup_acquire_fence:
	; GFX90A-TGSPLIT: ; %bb.0: ; %entry			; GFX90A-TGSPLIT: ; %bb.0: ; %entry
	; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
	; GFX90A-TGSPLIT-NEXT: s_endpgm			; GFX90A-TGSPLIT-NEXT: s_endpgm
	;
	;
	entry:			entry:
	fence syncscope("workgroup") acquire			fence syncscope("workgroup") acquire
	ret void			ret void
	}			}

	define amdgpu_kernel void @workgroup_release_fence() {			define amdgpu_kernel void @workgroup_release_fence() {
	; GFX6-LABEL: workgroup_release_fence:			; GFX6-LABEL: workgroup_release_fence:
	; GFX6: ; %bb.0: ; %entry			; GFX6: ; %bb.0: ; %entry
	; ... (25 unchanged lines collapsed in the diff view)
	; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry			; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
	; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)			; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
	; GFX90A-NOTTGSPLIT-NEXT: s_endpgm			; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
	;			;
	; GFX90A-TGSPLIT-LABEL: workgroup_release_fence:			; GFX90A-TGSPLIT-LABEL: workgroup_release_fence:
	; GFX90A-TGSPLIT: ; %bb.0: ; %entry			; GFX90A-TGSPLIT: ; %bb.0: ; %entry
	; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX90A-TGSPLIT-NEXT: s_endpgm			; GFX90A-TGSPLIT-NEXT: s_endpgm
	;
	;
	entry:			entry:
	fence syncscope("workgroup") release			fence syncscope("workgroup") release
	ret void			ret void
	}			}

	define amdgpu_kernel void @workgroup_acq_rel_fence() {			define amdgpu_kernel void @workgroup_acq_rel_fence() {
	; GFX6-LABEL: workgroup_acq_rel_fence:			; GFX6-LABEL: workgroup_acq_rel_fence:
	; GFX6: ; %bb.0: ; %entry			; GFX6: ; %bb.0: ; %entry
	; GFX6-NEXT: s_waitcnt lgkmcnt(0)			; GFX6-NEXT: s_waitcnt lgkmcnt(0)
	; GFX6-NEXT: s_endpgm			; GFX6-NEXT: s_endpgm
	;			;
	; GFX7-LABEL: workgroup_acq_rel_fence:			; GFX7-LABEL: workgroup_acq_rel_fence:
	; GFX7: ; %bb.0: ; %entry			; GFX7: ; %bb.0: ; %entry
	; GFX7-NEXT: s_waitcnt lgkmcnt(0)			; GFX7-NEXT: s_waitcnt lgkmcnt(0)
	; GFX7-NEXT: s_endpgm			; GFX7-NEXT: s_endpgm
	;			;
	; GFX10-WGP-LABEL: workgroup_acq_rel_fence:			; GFX10-WGP-LABEL: workgroup_acq_rel_fence:
	; GFX10-WGP: ; %bb.0: ; %entry			; GFX10-WGP: ; %bb.0: ; %entry
	; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-WGP-NEXT: buffer_gl0_inv
	; GFX10-WGP-NEXT: s_endpgm			; GFX10-WGP-NEXT: s_endpgm
	;			;
	; GFX10-CU-LABEL: workgroup_acq_rel_fence:			; GFX10-CU-LABEL: workgroup_acq_rel_fence:
	; GFX10-CU: ; %bb.0: ; %entry			; GFX10-CU: ; %bb.0: ; %entry
	; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-CU-NEXT: s_endpgm			; GFX10-CU-NEXT: s_endpgm
	;			;
	; SKIP-CACHE-INV-LABEL: workgroup_acq_rel_fence:			; SKIP-CACHE-INV-LABEL: workgroup_acq_rel_fence:
	; SKIP-CACHE-INV: ; %bb.0: ; %entry			; SKIP-CACHE-INV: ; %bb.0: ; %entry
	; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)			; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
	; SKIP-CACHE-INV-NEXT: s_endpgm			; SKIP-CACHE-INV-NEXT: s_endpgm
	;			;
	; GFX90A-NOTTGSPLIT-LABEL: workgroup_acq_rel_fence:			; GFX90A-NOTTGSPLIT-LABEL: workgroup_acq_rel_fence:
	; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry			; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
	; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)			; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
	; GFX90A-NOTTGSPLIT-NEXT: s_endpgm			; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
	;			;
	; GFX90A-TGSPLIT-LABEL: workgroup_acq_rel_fence:			; GFX90A-TGSPLIT-LABEL: workgroup_acq_rel_fence:
	; GFX90A-TGSPLIT: ; %bb.0: ; %entry			; GFX90A-TGSPLIT: ; %bb.0: ; %entry
	; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
	; GFX90A-TGSPLIT-NEXT: s_endpgm			; GFX90A-TGSPLIT-NEXT: s_endpgm
	;
	;
	entry:			entry:
	fence syncscope("workgroup") acq_rel			fence syncscope("workgroup") acq_rel
	ret void			ret void
	}			}

	define amdgpu_kernel void @workgroup_seq_cst_fence() {			define amdgpu_kernel void @workgroup_seq_cst_fence() {
	; GFX6-LABEL: workgroup_seq_cst_fence:			; GFX6-LABEL: workgroup_seq_cst_fence:
	; GFX6: ; %bb.0: ; %entry			; GFX6: ; %bb.0: ; %entry
	; GFX6-NEXT: s_waitcnt lgkmcnt(0)			; GFX6-NEXT: s_waitcnt lgkmcnt(0)
	; GFX6-NEXT: s_endpgm			; GFX6-NEXT: s_endpgm
	;			;
	; GFX7-LABEL: workgroup_seq_cst_fence:			; GFX7-LABEL: workgroup_seq_cst_fence:
	; GFX7: ; %bb.0: ; %entry			; GFX7: ; %bb.0: ; %entry
	; GFX7-NEXT: s_waitcnt lgkmcnt(0)			; GFX7-NEXT: s_waitcnt lgkmcnt(0)
	; GFX7-NEXT: s_endpgm			; GFX7-NEXT: s_endpgm
	;			;
	; GFX10-WGP-LABEL: workgroup_seq_cst_fence:			; GFX10-WGP-LABEL: workgroup_seq_cst_fence:
	; GFX10-WGP: ; %bb.0: ; %entry			; GFX10-WGP: ; %bb.0: ; %entry
	; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-WGP-NEXT: buffer_gl0_inv
	; GFX10-WGP-NEXT: s_endpgm			; GFX10-WGP-NEXT: s_endpgm
	;			;
	; GFX10-CU-LABEL: workgroup_seq_cst_fence:			; GFX10-CU-LABEL: workgroup_seq_cst_fence:
	; GFX10-CU: ; %bb.0: ; %entry			; GFX10-CU: ; %bb.0: ; %entry
	; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-CU-NEXT: s_endpgm			; GFX10-CU-NEXT: s_endpgm
	;			;
	; SKIP-CACHE-INV-LABEL: workgroup_seq_cst_fence:			; SKIP-CACHE-INV-LABEL: workgroup_seq_cst_fence:
	; SKIP-CACHE-INV: ; %bb.0: ; %entry			; SKIP-CACHE-INV: ; %bb.0: ; %entry
	; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)			; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
	; SKIP-CACHE-INV-NEXT: s_endpgm			; SKIP-CACHE-INV-NEXT: s_endpgm
	;			;
	; GFX90A-NOTTGSPLIT-LABEL: workgroup_seq_cst_fence:			; GFX90A-NOTTGSPLIT-LABEL: workgroup_seq_cst_fence:
	; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry			; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
	; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)			; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
	; GFX90A-NOTTGSPLIT-NEXT: s_endpgm			; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
	;			;
	; GFX90A-TGSPLIT-LABEL: workgroup_seq_cst_fence:			; GFX90A-TGSPLIT-LABEL: workgroup_seq_cst_fence:
	; GFX90A-TGSPLIT: ; %bb.0: ; %entry			; GFX90A-TGSPLIT: ; %bb.0: ; %entry
	; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
	; GFX90A-TGSPLIT-NEXT: s_endpgm			; GFX90A-TGSPLIT-NEXT: s_endpgm
	;
	;
	entry:			entry:
	fence syncscope("workgroup") seq_cst			fence syncscope("workgroup") seq_cst
	ret void			ret void
	}			}

	define amdgpu_kernel void @workgroup_one_as_acquire_fence() {			define amdgpu_kernel void @workgroup_one_as_acquire_fence() {
	; GFX6-LABEL: workgroup_one_as_acquire_fence:			; GFX6-LABEL: workgroup_one_as_acquire_fence:
	; GFX6: ; %bb.0: ; %entry			; GFX6: ; %bb.0: ; %entry
	; GFX6-NEXT: s_endpgm			; GFX6-NEXT: s_endpgm
	;			;
	; GFX7-LABEL: workgroup_one_as_acquire_fence:			; GFX7-LABEL: workgroup_one_as_acquire_fence:
	; GFX7: ; %bb.0: ; %entry			; GFX7: ; %bb.0: ; %entry
	; GFX7-NEXT: s_endpgm			; GFX7-NEXT: s_endpgm
	;			;
	; GFX10-WGP-LABEL: workgroup_one_as_acquire_fence:			; GFX10-WGP-LABEL: workgroup_one_as_acquire_fence:
	; GFX10-WGP: ; %bb.0: ; %entry			; GFX10-WGP: ; %bb.0: ; %entry
	; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)			; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
	; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-WGP-NEXT: buffer_gl0_inv
	; GFX10-WGP-NEXT: s_endpgm			; GFX10-WGP-NEXT: s_endpgm
	;			;
	; GFX10-CU-LABEL: workgroup_one_as_acquire_fence:			; GFX10-CU-LABEL: workgroup_one_as_acquire_fence:
	; GFX10-CU: ; %bb.0: ; %entry			; GFX10-CU: ; %bb.0: ; %entry
	; GFX10-CU-NEXT: s_endpgm			; GFX10-CU-NEXT: s_endpgm
	;			;
	; SKIP-CACHE-INV-LABEL: workgroup_one_as_acquire_fence:			; SKIP-CACHE-INV-LABEL: workgroup_one_as_acquire_fence:
	; SKIP-CACHE-INV: ; %bb.0: ; %entry			; SKIP-CACHE-INV: ; %bb.0: ; %entry
	; SKIP-CACHE-INV-NEXT: s_endpgm			; SKIP-CACHE-INV-NEXT: s_endpgm
	;			;
	; GFX90A-NOTTGSPLIT-LABEL: workgroup_one_as_acquire_fence:			; GFX90A-NOTTGSPLIT-LABEL: workgroup_one_as_acquire_fence:
	; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry			; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
	; GFX90A-NOTTGSPLIT-NEXT: s_endpgm			; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
	;			;
	; GFX90A-TGSPLIT-LABEL: workgroup_one_as_acquire_fence:			; GFX90A-TGSPLIT-LABEL: workgroup_one_as_acquire_fence:
	; GFX90A-TGSPLIT: ; %bb.0: ; %entry			; GFX90A-TGSPLIT: ; %bb.0: ; %entry
	; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)			; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
	; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
	; GFX90A-TGSPLIT-NEXT: s_endpgm			; GFX90A-TGSPLIT-NEXT: s_endpgm
	;
	;
	entry:			entry:
	fence syncscope("workgroup-one-as") acquire			fence syncscope("workgroup-one-as") acquire
	ret void			ret void
	}			}

	define amdgpu_kernel void @workgroup_one_as_release_fence() {			define amdgpu_kernel void @workgroup_one_as_release_fence() {
	; GFX6-LABEL: workgroup_one_as_release_fence:			; GFX6-LABEL: workgroup_one_as_release_fence:
	; GFX6: ; %bb.0: ; %entry			; GFX6: ; %bb.0: ; %entry
	; ... (20 unchanged lines collapsed in the diff view)
	; GFX90A-NOTTGSPLIT-LABEL: workgroup_one_as_release_fence:			; GFX90A-NOTTGSPLIT-LABEL: workgroup_one_as_release_fence:
	; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry			; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
	; GFX90A-NOTTGSPLIT-NEXT: s_endpgm			; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
	;			;
	; GFX90A-TGSPLIT-LABEL: workgroup_one_as_release_fence:			; GFX90A-TGSPLIT-LABEL: workgroup_one_as_release_fence:
	; GFX90A-TGSPLIT: ; %bb.0: ; %entry			; GFX90A-TGSPLIT: ; %bb.0: ; %entry
	; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)			; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
	; GFX90A-TGSPLIT-NEXT: s_endpgm			; GFX90A-TGSPLIT-NEXT: s_endpgm
	;
	;
	entry:			entry:
	fence syncscope("workgroup-one-as") release			fence syncscope("workgroup-one-as") release
	ret void			ret void
	}			}

	define amdgpu_kernel void @workgroup_one_as_acq_rel_fence() {			define amdgpu_kernel void @workgroup_one_as_acq_rel_fence() {
	; GFX6-LABEL: workgroup_one_as_acq_rel_fence:			; GFX6-LABEL: workgroup_one_as_acq_rel_fence:
	; GFX6: ; %bb.0: ; %entry			; GFX6: ; %bb.0: ; %entry
	; GFX6-NEXT: s_endpgm			; GFX6-NEXT: s_endpgm
	;			;
	; GFX7-LABEL: workgroup_one_as_acq_rel_fence:			; GFX7-LABEL: workgroup_one_as_acq_rel_fence:
	; GFX7: ; %bb.0: ; %entry			; GFX7: ; %bb.0: ; %entry
	; GFX7-NEXT: s_endpgm			; GFX7-NEXT: s_endpgm
	;			;
	; GFX10-WGP-LABEL: workgroup_one_as_acq_rel_fence:			; GFX10-WGP-LABEL: workgroup_one_as_acq_rel_fence:
	; GFX10-WGP: ; %bb.0: ; %entry			; GFX10-WGP: ; %bb.0: ; %entry
	; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)			; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
	; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-WGP-NEXT: buffer_gl0_inv
	; GFX10-WGP-NEXT: s_endpgm			; GFX10-WGP-NEXT: s_endpgm
	;			;
	; GFX10-CU-LABEL: workgroup_one_as_acq_rel_fence:			; GFX10-CU-LABEL: workgroup_one_as_acq_rel_fence:
	; GFX10-CU: ; %bb.0: ; %entry			; GFX10-CU: ; %bb.0: ; %entry
	; GFX10-CU-NEXT: s_endpgm			; GFX10-CU-NEXT: s_endpgm
	;			;
	; SKIP-CACHE-INV-LABEL: workgroup_one_as_acq_rel_fence:			; SKIP-CACHE-INV-LABEL: workgroup_one_as_acq_rel_fence:
	; SKIP-CACHE-INV: ; %bb.0: ; %entry			; SKIP-CACHE-INV: ; %bb.0: ; %entry
	; SKIP-CACHE-INV-NEXT: s_endpgm			; SKIP-CACHE-INV-NEXT: s_endpgm
	;			;
	; GFX90A-NOTTGSPLIT-LABEL: workgroup_one_as_acq_rel_fence:			; GFX90A-NOTTGSPLIT-LABEL: workgroup_one_as_acq_rel_fence:
	; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry			; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
	; GFX90A-NOTTGSPLIT-NEXT: s_endpgm			; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
	;			;
	; GFX90A-TGSPLIT-LABEL: workgroup_one_as_acq_rel_fence:			; GFX90A-TGSPLIT-LABEL: workgroup_one_as_acq_rel_fence:
	; GFX90A-TGSPLIT: ; %bb.0: ; %entry			; GFX90A-TGSPLIT: ; %bb.0: ; %entry
	; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)			; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
	; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
	; GFX90A-TGSPLIT-NEXT: s_endpgm			; GFX90A-TGSPLIT-NEXT: s_endpgm
	;
	;
	entry:			entry:
	fence syncscope("workgroup-one-as") acq_rel			fence syncscope("workgroup-one-as") acq_rel
	ret void			ret void
	}			}

	define amdgpu_kernel void @workgroup_one_as_seq_cst_fence() {			define amdgpu_kernel void @workgroup_one_as_seq_cst_fence() {
	; GFX6-LABEL: workgroup_one_as_seq_cst_fence:			; GFX6-LABEL: workgroup_one_as_seq_cst_fence:
	; GFX6: ; %bb.0: ; %entry			; GFX6: ; %bb.0: ; %entry
	; GFX6-NEXT: s_endpgm			; GFX6-NEXT: s_endpgm
	;			;
	; GFX7-LABEL: workgroup_one_as_seq_cst_fence:			; GFX7-LABEL: workgroup_one_as_seq_cst_fence:
	; GFX7: ; %bb.0: ; %entry			; GFX7: ; %bb.0: ; %entry
	; GFX7-NEXT: s_endpgm			; GFX7-NEXT: s_endpgm
	;			;
	; GFX10-WGP-LABEL: workgroup_one_as_seq_cst_fence:			; GFX10-WGP-LABEL: workgroup_one_as_seq_cst_fence:
	; GFX10-WGP: ; %bb.0: ; %entry			; GFX10-WGP: ; %bb.0: ; %entry
	; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)			; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
	; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-WGP-NEXT: buffer_gl0_inv
	; GFX10-WGP-NEXT: s_endpgm			; GFX10-WGP-NEXT: s_endpgm
	;			;
	; GFX10-CU-LABEL: workgroup_one_as_seq_cst_fence:			; GFX10-CU-LABEL: workgroup_one_as_seq_cst_fence:
	; GFX10-CU: ; %bb.0: ; %entry			; GFX10-CU: ; %bb.0: ; %entry
	; GFX10-CU-NEXT: s_endpgm			; GFX10-CU-NEXT: s_endpgm
	;			;
	; SKIP-CACHE-INV-LABEL: workgroup_one_as_seq_cst_fence:			; SKIP-CACHE-INV-LABEL: workgroup_one_as_seq_cst_fence:
	; SKIP-CACHE-INV: ; %bb.0: ; %entry			; SKIP-CACHE-INV: ; %bb.0: ; %entry
	; SKIP-CACHE-INV-NEXT: s_endpgm			; SKIP-CACHE-INV-NEXT: s_endpgm
	;			;
	; GFX90A-NOTTGSPLIT-LABEL: workgroup_one_as_seq_cst_fence:			; GFX90A-NOTTGSPLIT-LABEL: workgroup_one_as_seq_cst_fence:
	; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry			; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
	; GFX90A-NOTTGSPLIT-NEXT: s_endpgm			; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
	;			;
	; GFX90A-TGSPLIT-LABEL: workgroup_one_as_seq_cst_fence:			; GFX90A-TGSPLIT-LABEL: workgroup_one_as_seq_cst_fence:
	; GFX90A-TGSPLIT: ; %bb.0: ; %entry			; GFX90A-TGSPLIT: ; %bb.0: ; %entry
	; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)			; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
	; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
	; GFX90A-TGSPLIT-NEXT: s_endpgm			; GFX90A-TGSPLIT-NEXT: s_endpgm
	;
	;
	entry:			entry:
	fence syncscope("workgroup-one-as") seq_cst			fence syncscope("workgroup-one-as") seq_cst
	ret void			ret void
	}			}

	define amdgpu_kernel void @agent_acquire_fence() {			define amdgpu_kernel void @agent_acquire_fence(i8 addrspace(1)* %ptr) {
	; GFX6-LABEL: agent_acquire_fence:			; GFX6-LABEL: agent_acquire_fence:
	; GFX6: ; %bb.0: ; %entry			; GFX6: ; %bb.0: ; %entry
				; GFX6-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
				; GFX6-NEXT: s_mov_b32 s3, 0x100f000
				; GFX6-NEXT: s_mov_b32 s2, -1
				; GFX6-NEXT: v_mov_b32_e32 v0, 0
				; GFX6-NEXT: s_waitcnt lgkmcnt(0)
				; GFX6-NEXT: buffer_store_byte v0, off, s[0:3], 0
	; GFX6-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX6-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX6-NEXT: buffer_wbinvl1			; GFX6-NEXT: buffer_wbinvl1
	; GFX6-NEXT: s_endpgm			; GFX6-NEXT: s_endpgm
	;			;
	; GFX7-LABEL: agent_acquire_fence:			; GFX7-LABEL: agent_acquire_fence:
	; GFX7: ; %bb.0: ; %entry			; GFX7: ; %bb.0: ; %entry
				; GFX7-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
				; GFX7-NEXT: v_mov_b32_e32 v2, 0
				; GFX7-NEXT: s_waitcnt lgkmcnt(0)
				; GFX7-NEXT: v_mov_b32_e32 v0, s0
				; GFX7-NEXT: v_mov_b32_e32 v1, s1
				; GFX7-NEXT: flat_store_byte v[0:1], v2
	; GFX7-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX7-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX7-NEXT: buffer_wbinvl1_vol			; GFX7-NEXT: buffer_wbinvl1_vol
	; GFX7-NEXT: s_endpgm			; GFX7-NEXT: s_endpgm
	;			;
	; GFX10-WGP-LABEL: agent_acquire_fence:			; GFX10-WGP-LABEL: agent_acquire_fence:
	; GFX10-WGP: ; %bb.0: ; %entry			; GFX10-WGP: ; %bb.0: ; %entry
				; GFX10-WGP-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
				; GFX10-WGP-NEXT: v_mov_b32_e32 v0, 0
				; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
				; GFX10-WGP-NEXT: global_store_byte v0, v0, s[0:1]
	; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-WGP-NEXT: buffer_gl0_inv			; GFX10-WGP-NEXT: buffer_gl0_inv
	; GFX10-WGP-NEXT: buffer_gl1_inv			; GFX10-WGP-NEXT: buffer_gl1_inv
	; GFX10-WGP-NEXT: s_endpgm			; GFX10-WGP-NEXT: s_endpgm
	;			;
	; GFX10-CU-LABEL: agent_acquire_fence:			; GFX10-CU-LABEL: agent_acquire_fence:
	; GFX10-CU: ; %bb.0: ; %entry			; GFX10-CU: ; %bb.0: ; %entry
				; GFX10-CU-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
				; GFX10-CU-NEXT: v_mov_b32_e32 v0, 0
				; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
				; GFX10-CU-NEXT: global_store_byte v0, v0, s[0:1]
	; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-CU-NEXT: buffer_gl0_inv			; GFX10-CU-NEXT: buffer_gl0_inv
	; GFX10-CU-NEXT: buffer_gl1_inv			; GFX10-CU-NEXT: buffer_gl1_inv
	; GFX10-CU-NEXT: s_endpgm			; GFX10-CU-NEXT: s_endpgm
	;			;
	; SKIP-CACHE-INV-LABEL: agent_acquire_fence:			; SKIP-CACHE-INV-LABEL: agent_acquire_fence:
	; SKIP-CACHE-INV: ; %bb.0: ; %entry			; SKIP-CACHE-INV: ; %bb.0: ; %entry
				; SKIP-CACHE-INV-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x9
				; SKIP-CACHE-INV-NEXT: s_mov_b32 s3, 0xf000
				; SKIP-CACHE-INV-NEXT: s_mov_b32 s2, -1
				; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, 0
				; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
				; SKIP-CACHE-INV-NEXT: buffer_store_byte v0, off, s[0:3], 0
	; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; SKIP-CACHE-INV-NEXT: s_endpgm			; SKIP-CACHE-INV-NEXT: s_endpgm
	;			;
	; GFX90A-NOTTGSPLIT-LABEL: agent_acquire_fence:			; GFX90A-NOTTGSPLIT-LABEL: agent_acquire_fence:
	; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry			; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
				; GFX90A-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
				; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
				; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
				; GFX90A-NOTTGSPLIT-NEXT: global_store_byte v0, v0, s[0:1]
	; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX90A-NOTTGSPLIT-NEXT: buffer_wbinvl1_vol			; GFX90A-NOTTGSPLIT-NEXT: buffer_wbinvl1_vol
	; GFX90A-NOTTGSPLIT-NEXT: s_endpgm			; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
	;			;
	; GFX90A-TGSPLIT-LABEL: agent_acquire_fence:			; GFX90A-TGSPLIT-LABEL: agent_acquire_fence:
	; GFX90A-TGSPLIT: ; %bb.0: ; %entry			; GFX90A-TGSPLIT: ; %bb.0: ; %entry
				; GFX90A-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
				; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
				; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
				; GFX90A-TGSPLIT-NEXT: global_store_byte v0, v0, s[0:1]
	; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol			; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
	; GFX90A-TGSPLIT-NEXT: s_endpgm			; GFX90A-TGSPLIT-NEXT: s_endpgm
	;
	;
	entry:			entry:
				store i8 0, i8 addrspace(1)* %ptr, align 1
	fence syncscope("agent") acquire			fence syncscope("agent") acquire
	ret void			ret void
	}			}

	define amdgpu_kernel void @agent_release_fence() {			define amdgpu_kernel void @agent_release_fence() {
	; GFX6-LABEL: agent_release_fence:			; GFX6-LABEL: agent_release_fence:
	; GFX6: ; %bb.0: ; %entry			; GFX6: ; %bb.0: ; %entry
	; GFX6-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX6-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; ... (25 unchanged lines collapsed in the diff view)
	; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry			; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
	; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX90A-NOTTGSPLIT-NEXT: s_endpgm			; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
	;			;
	; GFX90A-TGSPLIT-LABEL: agent_release_fence:			; GFX90A-TGSPLIT-LABEL: agent_release_fence:
	; GFX90A-TGSPLIT: ; %bb.0: ; %entry			; GFX90A-TGSPLIT: ; %bb.0: ; %entry
	; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX90A-TGSPLIT-NEXT: s_endpgm			; GFX90A-TGSPLIT-NEXT: s_endpgm
	;
	;
	entry:			entry:
	fence syncscope("agent") release			fence syncscope("agent") release
	ret void			ret void
	}			}

	define amdgpu_kernel void @agent_acq_rel_fence() {			define amdgpu_kernel void @agent_acq_rel_fence(i8 addrspace(1)* %ptr) {
	; GFX6-LABEL: agent_acq_rel_fence:			; GFX6-LABEL: agent_acq_rel_fence:
	; GFX6: ; %bb.0: ; %entry			; GFX6: ; %bb.0: ; %entry
				; GFX6-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
				; GFX6-NEXT: s_mov_b32 s3, 0x100f000
				; GFX6-NEXT: s_mov_b32 s2, -1
				; GFX6-NEXT: v_mov_b32_e32 v0, 0
				; GFX6-NEXT: s_waitcnt lgkmcnt(0)
				; GFX6-NEXT: buffer_store_byte v0, off, s[0:3], 0
	; GFX6-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX6-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX6-NEXT: buffer_wbinvl1			; GFX6-NEXT: buffer_wbinvl1
	; GFX6-NEXT: s_endpgm			; GFX6-NEXT: s_endpgm
	;			;
	; GFX7-LABEL: agent_acq_rel_fence:			; GFX7-LABEL: agent_acq_rel_fence:
	; GFX7: ; %bb.0: ; %entry			; GFX7: ; %bb.0: ; %entry
				; GFX7-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
				; GFX7-NEXT: v_mov_b32_e32 v2, 0
				; GFX7-NEXT: s_waitcnt lgkmcnt(0)
				; GFX7-NEXT: v_mov_b32_e32 v0, s0
				; GFX7-NEXT: v_mov_b32_e32 v1, s1
				; GFX7-NEXT: flat_store_byte v[0:1], v2
	; GFX7-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX7-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX7-NEXT: buffer_wbinvl1_vol			; GFX7-NEXT: buffer_wbinvl1_vol
	; GFX7-NEXT: s_endpgm			; GFX7-NEXT: s_endpgm
	;			;
	; GFX10-WGP-LABEL: agent_acq_rel_fence:			; GFX10-WGP-LABEL: agent_acq_rel_fence:
	; GFX10-WGP: ; %bb.0: ; %entry			; GFX10-WGP: ; %bb.0: ; %entry
				; GFX10-WGP-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
				; GFX10-WGP-NEXT: v_mov_b32_e32 v0, 0
				; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
				; GFX10-WGP-NEXT: global_store_byte v0, v0, s[0:1]
	; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-WGP-NEXT: buffer_gl0_inv			; GFX10-WGP-NEXT: buffer_gl0_inv
	; GFX10-WGP-NEXT: buffer_gl1_inv			; GFX10-WGP-NEXT: buffer_gl1_inv
	; GFX10-WGP-NEXT: s_endpgm			; GFX10-WGP-NEXT: s_endpgm
	;			;
	; GFX10-CU-LABEL: agent_acq_rel_fence:			; GFX10-CU-LABEL: agent_acq_rel_fence:
	; GFX10-CU: ; %bb.0: ; %entry			; GFX10-CU: ; %bb.0: ; %entry
				; GFX10-CU-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
				; GFX10-CU-NEXT: v_mov_b32_e32 v0, 0
				; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
				; GFX10-CU-NEXT: global_store_byte v0, v0, s[0:1]
	; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-CU-NEXT: buffer_gl0_inv			; GFX10-CU-NEXT: buffer_gl0_inv
	; GFX10-CU-NEXT: buffer_gl1_inv			; GFX10-CU-NEXT: buffer_gl1_inv
	; GFX10-CU-NEXT: s_endpgm			; GFX10-CU-NEXT: s_endpgm
	;			;
	; SKIP-CACHE-INV-LABEL: agent_acq_rel_fence:			; SKIP-CACHE-INV-LABEL: agent_acq_rel_fence:
	; SKIP-CACHE-INV: ; %bb.0: ; %entry			; SKIP-CACHE-INV: ; %bb.0: ; %entry
				; SKIP-CACHE-INV-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x9
				; SKIP-CACHE-INV-NEXT: s_mov_b32 s3, 0xf000
				; SKIP-CACHE-INV-NEXT: s_mov_b32 s2, -1
				; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, 0
				; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
				; SKIP-CACHE-INV-NEXT: buffer_store_byte v0, off, s[0:3], 0
	; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; SKIP-CACHE-INV-NEXT: s_endpgm			; SKIP-CACHE-INV-NEXT: s_endpgm
	;			;
	; GFX90A-NOTTGSPLIT-LABEL: agent_acq_rel_fence:			; GFX90A-NOTTGSPLIT-LABEL: agent_acq_rel_fence:
	; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry			; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
				; GFX90A-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
				; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
				; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
				; GFX90A-NOTTGSPLIT-NEXT: global_store_byte v0, v0, s[0:1]
	; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX90A-NOTTGSPLIT-NEXT: buffer_wbinvl1_vol			; GFX90A-NOTTGSPLIT-NEXT: buffer_wbinvl1_vol
	; GFX90A-NOTTGSPLIT-NEXT: s_endpgm			; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
	;			;
	; GFX90A-TGSPLIT-LABEL: agent_acq_rel_fence:			; GFX90A-TGSPLIT-LABEL: agent_acq_rel_fence:
	; GFX90A-TGSPLIT: ; %bb.0: ; %entry			; GFX90A-TGSPLIT: ; %bb.0: ; %entry
				; GFX90A-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
				; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
				; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
				; GFX90A-TGSPLIT-NEXT: global_store_byte v0, v0, s[0:1]
	; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol			; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
	; GFX90A-TGSPLIT-NEXT: s_endpgm			; GFX90A-TGSPLIT-NEXT: s_endpgm
	;
	;
	entry:			entry:
				store i8 0, i8 addrspace(1)* %ptr, align 1
	fence syncscope("agent") acq_rel			fence syncscope("agent") acq_rel
	ret void			ret void
	}			}

	define amdgpu_kernel void @agent_seq_cst_fence() {			define amdgpu_kernel void @agent_seq_cst_fence(i8 addrspace(1)* %ptr) {
	; GFX6-LABEL: agent_seq_cst_fence:			; GFX6-LABEL: agent_seq_cst_fence:
	; GFX6: ; %bb.0: ; %entry			; GFX6: ; %bb.0: ; %entry
				; GFX6-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
				; GFX6-NEXT: s_mov_b32 s3, 0x100f000
				; GFX6-NEXT: s_mov_b32 s2, -1
				; GFX6-NEXT: v_mov_b32_e32 v0, 0
				; GFX6-NEXT: s_waitcnt lgkmcnt(0)
				; GFX6-NEXT: buffer_store_byte v0, off, s[0:3], 0
	; GFX6-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX6-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX6-NEXT: buffer_wbinvl1			; GFX6-NEXT: buffer_wbinvl1
	; GFX6-NEXT: s_endpgm			; GFX6-NEXT: s_endpgm
	;			;
	; GFX7-LABEL: agent_seq_cst_fence:			; GFX7-LABEL: agent_seq_cst_fence:
	; GFX7: ; %bb.0: ; %entry			; GFX7: ; %bb.0: ; %entry
				; GFX7-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
				; GFX7-NEXT: v_mov_b32_e32 v2, 0
				; GFX7-NEXT: s_waitcnt lgkmcnt(0)
				; GFX7-NEXT: v_mov_b32_e32 v0, s0
				; GFX7-NEXT: v_mov_b32_e32 v1, s1
				; GFX7-NEXT: flat_store_byte v[0:1], v2
	; GFX7-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX7-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX7-NEXT: buffer_wbinvl1_vol			; GFX7-NEXT: buffer_wbinvl1_vol
	; GFX7-NEXT: s_endpgm			; GFX7-NEXT: s_endpgm
	;			;
	; GFX10-WGP-LABEL: agent_seq_cst_fence:			; GFX10-WGP-LABEL: agent_seq_cst_fence:
	; GFX10-WGP: ; %bb.0: ; %entry			; GFX10-WGP: ; %bb.0: ; %entry
				; GFX10-WGP-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
				; GFX10-WGP-NEXT: v_mov_b32_e32 v0, 0
				; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
				; GFX10-WGP-NEXT: global_store_byte v0, v0, s[0:1]
	; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-WGP-NEXT: buffer_gl0_inv			; GFX10-WGP-NEXT: buffer_gl0_inv
	; GFX10-WGP-NEXT: buffer_gl1_inv			; GFX10-WGP-NEXT: buffer_gl1_inv
	; GFX10-WGP-NEXT: s_endpgm			; GFX10-WGP-NEXT: s_endpgm
	;			;
	; GFX10-CU-LABEL: agent_seq_cst_fence:			; GFX10-CU-LABEL: agent_seq_cst_fence:
	; GFX10-CU: ; %bb.0: ; %entry			; GFX10-CU: ; %bb.0: ; %entry
				; GFX10-CU-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
				; GFX10-CU-NEXT: v_mov_b32_e32 v0, 0
				; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
				; GFX10-CU-NEXT: global_store_byte v0, v0, s[0:1]
	; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-CU-NEXT: buffer_gl0_inv			; GFX10-CU-NEXT: buffer_gl0_inv
	; GFX10-CU-NEXT: buffer_gl1_inv			; GFX10-CU-NEXT: buffer_gl1_inv
	; GFX10-CU-NEXT: s_endpgm			; GFX10-CU-NEXT: s_endpgm
	;			;
	; SKIP-CACHE-INV-LABEL: agent_seq_cst_fence:			; SKIP-CACHE-INV-LABEL: agent_seq_cst_fence:
	; SKIP-CACHE-INV: ; %bb.0: ; %entry			; SKIP-CACHE-INV: ; %bb.0: ; %entry
				; SKIP-CACHE-INV-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x9
				; SKIP-CACHE-INV-NEXT: s_mov_b32 s3, 0xf000
				; SKIP-CACHE-INV-NEXT: s_mov_b32 s2, -1
				; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, 0
				; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
				; SKIP-CACHE-INV-NEXT: buffer_store_byte v0, off, s[0:3], 0
	; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; SKIP-CACHE-INV-NEXT: s_endpgm			; SKIP-CACHE-INV-NEXT: s_endpgm
	;			;
	; GFX90A-NOTTGSPLIT-LABEL: agent_seq_cst_fence:			; GFX90A-NOTTGSPLIT-LABEL: agent_seq_cst_fence:
	; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry			; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
				; GFX90A-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
				; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
				; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
				; GFX90A-NOTTGSPLIT-NEXT: global_store_byte v0, v0, s[0:1]
	; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX90A-NOTTGSPLIT-NEXT: buffer_wbinvl1_vol			; GFX90A-NOTTGSPLIT-NEXT: buffer_wbinvl1_vol
	; GFX90A-NOTTGSPLIT-NEXT: s_endpgm			; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
	;			;
	; GFX90A-TGSPLIT-LABEL: agent_seq_cst_fence:			; GFX90A-TGSPLIT-LABEL: agent_seq_cst_fence:
	; GFX90A-TGSPLIT: ; %bb.0: ; %entry			; GFX90A-TGSPLIT: ; %bb.0: ; %entry
				; GFX90A-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
				; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
				; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
				; GFX90A-TGSPLIT-NEXT: global_store_byte v0, v0, s[0:1]
	; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol			; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
	; GFX90A-TGSPLIT-NEXT: s_endpgm			; GFX90A-TGSPLIT-NEXT: s_endpgm
	;
	;
	entry:			entry:
				store i8 0, i8 addrspace(1)* %ptr, align 1
	fence syncscope("agent") seq_cst			fence syncscope("agent") seq_cst
	ret void			ret void
	}			}

	define amdgpu_kernel void @agent_one_as_acquire_fence() {			define amdgpu_kernel void @agent_one_as_acquire_fence(i8 addrspace(1)* %ptr) {
	; GFX6-LABEL: agent_one_as_acquire_fence:			; GFX6-LABEL: agent_one_as_acquire_fence:
	; GFX6: ; %bb.0: ; %entry			; GFX6: ; %bb.0: ; %entry
				; GFX6-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
				; GFX6-NEXT: s_mov_b32 s3, 0x100f000
				; GFX6-NEXT: s_mov_b32 s2, -1
				; GFX6-NEXT: v_mov_b32_e32 v0, 0
				; GFX6-NEXT: s_waitcnt lgkmcnt(0)
				; GFX6-NEXT: buffer_store_byte v0, off, s[0:3], 0
	; GFX6-NEXT: s_waitcnt vmcnt(0)			; GFX6-NEXT: s_waitcnt vmcnt(0)
	; GFX6-NEXT: buffer_wbinvl1			; GFX6-NEXT: buffer_wbinvl1
	; GFX6-NEXT: s_endpgm			; GFX6-NEXT: s_endpgm
	;			;
	; GFX7-LABEL: agent_one_as_acquire_fence:			; GFX7-LABEL: agent_one_as_acquire_fence:
	; GFX7: ; %bb.0: ; %entry			; GFX7: ; %bb.0: ; %entry
				; GFX7-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
				; GFX7-NEXT: v_mov_b32_e32 v2, 0
				; GFX7-NEXT: s_waitcnt lgkmcnt(0)
				; GFX7-NEXT: v_mov_b32_e32 v0, s0
				; GFX7-NEXT: v_mov_b32_e32 v1, s1
				; GFX7-NEXT: flat_store_byte v[0:1], v2
	; GFX7-NEXT: s_waitcnt vmcnt(0)			; GFX7-NEXT: s_waitcnt vmcnt(0)
	; GFX7-NEXT: buffer_wbinvl1_vol			; GFX7-NEXT: buffer_wbinvl1_vol
	; GFX7-NEXT: s_endpgm			; GFX7-NEXT: s_endpgm
	;			;
	; GFX10-WGP-LABEL: agent_one_as_acquire_fence:			; GFX10-WGP-LABEL: agent_one_as_acquire_fence:
	; GFX10-WGP: ; %bb.0: ; %entry			; GFX10-WGP: ; %bb.0: ; %entry
				; GFX10-WGP-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
				; GFX10-WGP-NEXT: v_mov_b32_e32 v0, 0
				; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
				; GFX10-WGP-NEXT: global_store_byte v0, v0, s[0:1]
	; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)			; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
	; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-WGP-NEXT: buffer_gl0_inv			; GFX10-WGP-NEXT: buffer_gl0_inv
	; GFX10-WGP-NEXT: buffer_gl1_inv			; GFX10-WGP-NEXT: buffer_gl1_inv
	; GFX10-WGP-NEXT: s_endpgm			; GFX10-WGP-NEXT: s_endpgm
	;			;
	; GFX10-CU-LABEL: agent_one_as_acquire_fence:			; GFX10-CU-LABEL: agent_one_as_acquire_fence:
	; GFX10-CU: ; %bb.0: ; %entry			; GFX10-CU: ; %bb.0: ; %entry
				; GFX10-CU-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
				; GFX10-CU-NEXT: v_mov_b32_e32 v0, 0
				; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
				; GFX10-CU-NEXT: global_store_byte v0, v0, s[0:1]
	; GFX10-CU-NEXT: s_waitcnt vmcnt(0)			; GFX10-CU-NEXT: s_waitcnt vmcnt(0)
	; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-CU-NEXT: buffer_gl0_inv			; GFX10-CU-NEXT: buffer_gl0_inv
	; GFX10-CU-NEXT: buffer_gl1_inv			; GFX10-CU-NEXT: buffer_gl1_inv
	; GFX10-CU-NEXT: s_endpgm			; GFX10-CU-NEXT: s_endpgm
	;			;
	; SKIP-CACHE-INV-LABEL: agent_one_as_acquire_fence:			; SKIP-CACHE-INV-LABEL: agent_one_as_acquire_fence:
	; SKIP-CACHE-INV: ; %bb.0: ; %entry			; SKIP-CACHE-INV: ; %bb.0: ; %entry
				; SKIP-CACHE-INV-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x9
				; SKIP-CACHE-INV-NEXT: s_mov_b32 s3, 0xf000
				; SKIP-CACHE-INV-NEXT: s_mov_b32 s2, -1
				; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, 0
				; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
				; SKIP-CACHE-INV-NEXT: buffer_store_byte v0, off, s[0:3], 0
	; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0)			; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0)
	; SKIP-CACHE-INV-NEXT: s_endpgm			; SKIP-CACHE-INV-NEXT: s_endpgm
	;			;
	; GFX90A-NOTTGSPLIT-LABEL: agent_one_as_acquire_fence:			; GFX90A-NOTTGSPLIT-LABEL: agent_one_as_acquire_fence:
	; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry			; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
				; GFX90A-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
				; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
				; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
				; GFX90A-NOTTGSPLIT-NEXT: global_store_byte v0, v0, s[0:1]
	; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)			; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
	; GFX90A-NOTTGSPLIT-NEXT: buffer_wbinvl1_vol			; GFX90A-NOTTGSPLIT-NEXT: buffer_wbinvl1_vol
	; GFX90A-NOTTGSPLIT-NEXT: s_endpgm			; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
	;			;
	; GFX90A-TGSPLIT-LABEL: agent_one_as_acquire_fence:			; GFX90A-TGSPLIT-LABEL: agent_one_as_acquire_fence:
	; GFX90A-TGSPLIT: ; %bb.0: ; %entry			; GFX90A-TGSPLIT: ; %bb.0: ; %entry
				; GFX90A-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
				; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
				; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
				; GFX90A-TGSPLIT-NEXT: global_store_byte v0, v0, s[0:1]
	; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)			; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
	; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol			; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
	; GFX90A-TGSPLIT-NEXT: s_endpgm			; GFX90A-TGSPLIT-NEXT: s_endpgm
	;
	;
	entry:			entry:
				store i8 0, i8 addrspace(1)* %ptr, align 1
	fence syncscope("agent-one-as") acquire			fence syncscope("agent-one-as") acquire
	ret void			ret void
	}			}

	define amdgpu_kernel void @agent_one_as_release_fence() {			define amdgpu_kernel void @agent_one_as_release_fence() {
	; GFX6-LABEL: agent_one_as_release_fence:			; GFX6-LABEL: agent_one_as_release_fence:
	; GFX6: ; %bb.0: ; %entry			; GFX6: ; %bb.0: ; %entry
	; GFX6-NEXT: s_waitcnt vmcnt(0)			; GFX6-NEXT: s_waitcnt vmcnt(0)
	; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry			; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
	; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)			; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
	; GFX90A-NOTTGSPLIT-NEXT: s_endpgm			; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
	;			;
	; GFX90A-TGSPLIT-LABEL: agent_one_as_release_fence:			; GFX90A-TGSPLIT-LABEL: agent_one_as_release_fence:
	; GFX90A-TGSPLIT: ; %bb.0: ; %entry			; GFX90A-TGSPLIT: ; %bb.0: ; %entry
	; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)			; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
	; GFX90A-TGSPLIT-NEXT: s_endpgm			; GFX90A-TGSPLIT-NEXT: s_endpgm
	;
	;
	entry:			entry:
	fence syncscope("agent-one-as") release			fence syncscope("agent-one-as") release
	ret void			ret void
	}			}

	define amdgpu_kernel void @agent_one_as_acq_rel_fence() {			define amdgpu_kernel void @agent_one_as_acq_rel_fence(i8 addrspace(1)* %ptr) {
	; GFX6-LABEL: agent_one_as_acq_rel_fence:			; GFX6-LABEL: agent_one_as_acq_rel_fence:
	; GFX6: ; %bb.0: ; %entry			; GFX6: ; %bb.0: ; %entry
				; GFX6-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
				; GFX6-NEXT: s_mov_b32 s3, 0x100f000
				; GFX6-NEXT: s_mov_b32 s2, -1
				; GFX6-NEXT: v_mov_b32_e32 v0, 0
				; GFX6-NEXT: s_waitcnt lgkmcnt(0)
				; GFX6-NEXT: buffer_store_byte v0, off, s[0:3], 0
	; GFX6-NEXT: s_waitcnt vmcnt(0)			; GFX6-NEXT: s_waitcnt vmcnt(0)
	; GFX6-NEXT: buffer_wbinvl1			; GFX6-NEXT: buffer_wbinvl1
	; GFX6-NEXT: s_endpgm			; GFX6-NEXT: s_endpgm
	;			;
	; GFX7-LABEL: agent_one_as_acq_rel_fence:			; GFX7-LABEL: agent_one_as_acq_rel_fence:
	; GFX7: ; %bb.0: ; %entry			; GFX7: ; %bb.0: ; %entry
				; GFX7-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
				; GFX7-NEXT: v_mov_b32_e32 v2, 0
				; GFX7-NEXT: s_waitcnt lgkmcnt(0)
				; GFX7-NEXT: v_mov_b32_e32 v0, s0
				; GFX7-NEXT: v_mov_b32_e32 v1, s1
				; GFX7-NEXT: flat_store_byte v[0:1], v2
	; GFX7-NEXT: s_waitcnt vmcnt(0)			; GFX7-NEXT: s_waitcnt vmcnt(0)
	; GFX7-NEXT: buffer_wbinvl1_vol			; GFX7-NEXT: buffer_wbinvl1_vol
	; GFX7-NEXT: s_endpgm			; GFX7-NEXT: s_endpgm
	;			;
	; GFX10-WGP-LABEL: agent_one_as_acq_rel_fence:			; GFX10-WGP-LABEL: agent_one_as_acq_rel_fence:
	; GFX10-WGP: ; %bb.0: ; %entry			; GFX10-WGP: ; %bb.0: ; %entry
				; GFX10-WGP-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
				; GFX10-WGP-NEXT: v_mov_b32_e32 v0, 0
				; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
				; GFX10-WGP-NEXT: global_store_byte v0, v0, s[0:1]
	; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)			; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
	; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-WGP-NEXT: buffer_gl0_inv			; GFX10-WGP-NEXT: buffer_gl0_inv
	; GFX10-WGP-NEXT: buffer_gl1_inv			; GFX10-WGP-NEXT: buffer_gl1_inv
	; GFX10-WGP-NEXT: s_endpgm			; GFX10-WGP-NEXT: s_endpgm
	;			;
	; GFX10-CU-LABEL: agent_one_as_acq_rel_fence:			; GFX10-CU-LABEL: agent_one_as_acq_rel_fence:
	; GFX10-CU: ; %bb.0: ; %entry			; GFX10-CU: ; %bb.0: ; %entry
				; GFX10-CU-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
				; GFX10-CU-NEXT: v_mov_b32_e32 v0, 0
				; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
				; GFX10-CU-NEXT: global_store_byte v0, v0, s[0:1]
	; GFX10-CU-NEXT: s_waitcnt vmcnt(0)			; GFX10-CU-NEXT: s_waitcnt vmcnt(0)
	; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-CU-NEXT: buffer_gl0_inv			; GFX10-CU-NEXT: buffer_gl0_inv
	; GFX10-CU-NEXT: buffer_gl1_inv			; GFX10-CU-NEXT: buffer_gl1_inv
	; GFX10-CU-NEXT: s_endpgm			; GFX10-CU-NEXT: s_endpgm
	;			;
	; SKIP-CACHE-INV-LABEL: agent_one_as_acq_rel_fence:			; SKIP-CACHE-INV-LABEL: agent_one_as_acq_rel_fence:
	; SKIP-CACHE-INV: ; %bb.0: ; %entry			; SKIP-CACHE-INV: ; %bb.0: ; %entry
				; SKIP-CACHE-INV-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x9
				; SKIP-CACHE-INV-NEXT: s_mov_b32 s3, 0xf000
				; SKIP-CACHE-INV-NEXT: s_mov_b32 s2, -1
				; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, 0
				; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
				; SKIP-CACHE-INV-NEXT: buffer_store_byte v0, off, s[0:3], 0
	; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0)			; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0)
	; SKIP-CACHE-INV-NEXT: s_endpgm			; SKIP-CACHE-INV-NEXT: s_endpgm
	;			;
	; GFX90A-NOTTGSPLIT-LABEL: agent_one_as_acq_rel_fence:			; GFX90A-NOTTGSPLIT-LABEL: agent_one_as_acq_rel_fence:
	; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry			; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
				; GFX90A-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
				; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
				; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
				; GFX90A-NOTTGSPLIT-NEXT: global_store_byte v0, v0, s[0:1]
	; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)			; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
	; GFX90A-NOTTGSPLIT-NEXT: buffer_wbinvl1_vol			; GFX90A-NOTTGSPLIT-NEXT: buffer_wbinvl1_vol
	; GFX90A-NOTTGSPLIT-NEXT: s_endpgm			; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
	;			;
	; GFX90A-TGSPLIT-LABEL: agent_one_as_acq_rel_fence:			; GFX90A-TGSPLIT-LABEL: agent_one_as_acq_rel_fence:
	; GFX90A-TGSPLIT: ; %bb.0: ; %entry			; GFX90A-TGSPLIT: ; %bb.0: ; %entry
				; GFX90A-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
				; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
				; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
				; GFX90A-TGSPLIT-NEXT: global_store_byte v0, v0, s[0:1]
	; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)			; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
	; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol			; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
	; GFX90A-TGSPLIT-NEXT: s_endpgm			; GFX90A-TGSPLIT-NEXT: s_endpgm
	;
	;
	entry:			entry:
				store i8 0, i8 addrspace(1)* %ptr, align 1
	fence syncscope("agent-one-as") acq_rel			fence syncscope("agent-one-as") acq_rel
	ret void			ret void
	}			}

	define amdgpu_kernel void @agent_one_as_seq_cst_fence() {			define amdgpu_kernel void @agent_one_as_seq_cst_fence(i8 addrspace(1)* %ptr) {
	; GFX6-LABEL: agent_one_as_seq_cst_fence:			; GFX6-LABEL: agent_one_as_seq_cst_fence:
	; GFX6: ; %bb.0: ; %entry			; GFX6: ; %bb.0: ; %entry
				; GFX6-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
				; GFX6-NEXT: s_mov_b32 s3, 0x100f000
				; GFX6-NEXT: s_mov_b32 s2, -1
				; GFX6-NEXT: v_mov_b32_e32 v0, 0
				; GFX6-NEXT: s_waitcnt lgkmcnt(0)
				; GFX6-NEXT: buffer_store_byte v0, off, s[0:3], 0
	; GFX6-NEXT: s_waitcnt vmcnt(0)			; GFX6-NEXT: s_waitcnt vmcnt(0)
	; GFX6-NEXT: buffer_wbinvl1			; GFX6-NEXT: buffer_wbinvl1
	; GFX6-NEXT: s_endpgm			; GFX6-NEXT: s_endpgm
	;			;
	; GFX7-LABEL: agent_one_as_seq_cst_fence:			; GFX7-LABEL: agent_one_as_seq_cst_fence:
	; GFX7: ; %bb.0: ; %entry			; GFX7: ; %bb.0: ; %entry
				; GFX7-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
				; GFX7-NEXT: v_mov_b32_e32 v2, 0
				; GFX7-NEXT: s_waitcnt lgkmcnt(0)
				; GFX7-NEXT: v_mov_b32_e32 v0, s0
				; GFX7-NEXT: v_mov_b32_e32 v1, s1
				; GFX7-NEXT: flat_store_byte v[0:1], v2
	; GFX7-NEXT: s_waitcnt vmcnt(0)			; GFX7-NEXT: s_waitcnt vmcnt(0)
	; GFX7-NEXT: buffer_wbinvl1_vol			; GFX7-NEXT: buffer_wbinvl1_vol
	; GFX7-NEXT: s_endpgm			; GFX7-NEXT: s_endpgm
	;			;
	; GFX10-WGP-LABEL: agent_one_as_seq_cst_fence:			; GFX10-WGP-LABEL: agent_one_as_seq_cst_fence:
	; GFX10-WGP: ; %bb.0: ; %entry			; GFX10-WGP: ; %bb.0: ; %entry
				; GFX10-WGP-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
				; GFX10-WGP-NEXT: v_mov_b32_e32 v0, 0
				; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
				; GFX10-WGP-NEXT: global_store_byte v0, v0, s[0:1]
	; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)			; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
	; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-WGP-NEXT: buffer_gl0_inv			; GFX10-WGP-NEXT: buffer_gl0_inv
	; GFX10-WGP-NEXT: buffer_gl1_inv			; GFX10-WGP-NEXT: buffer_gl1_inv
	; GFX10-WGP-NEXT: s_endpgm			; GFX10-WGP-NEXT: s_endpgm
	;			;
	; GFX10-CU-LABEL: agent_one_as_seq_cst_fence:			; GFX10-CU-LABEL: agent_one_as_seq_cst_fence:
	; GFX10-CU: ; %bb.0: ; %entry			; GFX10-CU: ; %bb.0: ; %entry
				; GFX10-CU-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
				; GFX10-CU-NEXT: v_mov_b32_e32 v0, 0
				; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
				; GFX10-CU-NEXT: global_store_byte v0, v0, s[0:1]
	; GFX10-CU-NEXT: s_waitcnt vmcnt(0)			; GFX10-CU-NEXT: s_waitcnt vmcnt(0)
	; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-CU-NEXT: buffer_gl0_inv			; GFX10-CU-NEXT: buffer_gl0_inv
	; GFX10-CU-NEXT: buffer_gl1_inv			; GFX10-CU-NEXT: buffer_gl1_inv
	; GFX10-CU-NEXT: s_endpgm			; GFX10-CU-NEXT: s_endpgm
	;			;
	; SKIP-CACHE-INV-LABEL: agent_one_as_seq_cst_fence:			; SKIP-CACHE-INV-LABEL: agent_one_as_seq_cst_fence:
	; SKIP-CACHE-INV: ; %bb.0: ; %entry			; SKIP-CACHE-INV: ; %bb.0: ; %entry
				; SKIP-CACHE-INV-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x9
				; SKIP-CACHE-INV-NEXT: s_mov_b32 s3, 0xf000
				; SKIP-CACHE-INV-NEXT: s_mov_b32 s2, -1
				; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, 0
				; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
				; SKIP-CACHE-INV-NEXT: buffer_store_byte v0, off, s[0:3], 0
	; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0)			; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0)
	; SKIP-CACHE-INV-NEXT: s_endpgm			; SKIP-CACHE-INV-NEXT: s_endpgm
	;			;
	; GFX90A-NOTTGSPLIT-LABEL: agent_one_as_seq_cst_fence:			; GFX90A-NOTTGSPLIT-LABEL: agent_one_as_seq_cst_fence:
	; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry			; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
				; GFX90A-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
				; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
				; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
				; GFX90A-NOTTGSPLIT-NEXT: global_store_byte v0, v0, s[0:1]
	; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)			; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
	; GFX90A-NOTTGSPLIT-NEXT: buffer_wbinvl1_vol			; GFX90A-NOTTGSPLIT-NEXT: buffer_wbinvl1_vol
	; GFX90A-NOTTGSPLIT-NEXT: s_endpgm			; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
	;			;
	; GFX90A-TGSPLIT-LABEL: agent_one_as_seq_cst_fence:			; GFX90A-TGSPLIT-LABEL: agent_one_as_seq_cst_fence:
	; GFX90A-TGSPLIT: ; %bb.0: ; %entry			; GFX90A-TGSPLIT: ; %bb.0: ; %entry
				; GFX90A-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
				; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
				; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
				; GFX90A-TGSPLIT-NEXT: global_store_byte v0, v0, s[0:1]
	; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)			; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
	; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol			; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
	; GFX90A-TGSPLIT-NEXT: s_endpgm			; GFX90A-TGSPLIT-NEXT: s_endpgm
	;
	;
	entry:			entry:
				store i8 0, i8 addrspace(1)* %ptr, align 1
	fence syncscope("agent-one-as") seq_cst			fence syncscope("agent-one-as") seq_cst
	ret void			ret void
	}			}

	define amdgpu_kernel void @system_acquire_fence() {			define amdgpu_kernel void @system_acquire_fence(i8 addrspace(1)* %ptr) {
	; GFX6-LABEL: system_acquire_fence:			; GFX6-LABEL: system_acquire_fence:
	; GFX6: ; %bb.0: ; %entry			; GFX6: ; %bb.0: ; %entry
				; GFX6-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
				; GFX6-NEXT: s_mov_b32 s3, 0x100f000
				; GFX6-NEXT: s_mov_b32 s2, -1
				; GFX6-NEXT: v_mov_b32_e32 v0, 0
				; GFX6-NEXT: s_waitcnt lgkmcnt(0)
				; GFX6-NEXT: buffer_store_byte v0, off, s[0:3], 0
	; GFX6-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX6-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX6-NEXT: buffer_wbinvl1			; GFX6-NEXT: buffer_wbinvl1
	; GFX6-NEXT: s_endpgm			; GFX6-NEXT: s_endpgm
	;			;
	; GFX7-LABEL: system_acquire_fence:			; GFX7-LABEL: system_acquire_fence:
	; GFX7: ; %bb.0: ; %entry			; GFX7: ; %bb.0: ; %entry
				; GFX7-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
				; GFX7-NEXT: v_mov_b32_e32 v2, 0
				; GFX7-NEXT: s_waitcnt lgkmcnt(0)
				; GFX7-NEXT: v_mov_b32_e32 v0, s0
				; GFX7-NEXT: v_mov_b32_e32 v1, s1
				; GFX7-NEXT: flat_store_byte v[0:1], v2
	; GFX7-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX7-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX7-NEXT: buffer_wbinvl1_vol			; GFX7-NEXT: buffer_wbinvl1_vol
	; GFX7-NEXT: s_endpgm			; GFX7-NEXT: s_endpgm
	;			;
	; GFX10-WGP-LABEL: system_acquire_fence:			; GFX10-WGP-LABEL: system_acquire_fence:
	; GFX10-WGP: ; %bb.0: ; %entry			; GFX10-WGP: ; %bb.0: ; %entry
				; GFX10-WGP-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
				; GFX10-WGP-NEXT: v_mov_b32_e32 v0, 0
				; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
				; GFX10-WGP-NEXT: global_store_byte v0, v0, s[0:1]
	; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-WGP-NEXT: buffer_gl0_inv			; GFX10-WGP-NEXT: buffer_gl0_inv
	; GFX10-WGP-NEXT: buffer_gl1_inv			; GFX10-WGP-NEXT: buffer_gl1_inv
	; GFX10-WGP-NEXT: s_endpgm			; GFX10-WGP-NEXT: s_endpgm
	;			;
	; GFX10-CU-LABEL: system_acquire_fence:			; GFX10-CU-LABEL: system_acquire_fence:
	; GFX10-CU: ; %bb.0: ; %entry			; GFX10-CU: ; %bb.0: ; %entry
				; GFX10-CU-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
				; GFX10-CU-NEXT: v_mov_b32_e32 v0, 0
				; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
				; GFX10-CU-NEXT: global_store_byte v0, v0, s[0:1]
	; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-CU-NEXT: buffer_gl0_inv			; GFX10-CU-NEXT: buffer_gl0_inv
	; GFX10-CU-NEXT: buffer_gl1_inv			; GFX10-CU-NEXT: buffer_gl1_inv
	; GFX10-CU-NEXT: s_endpgm			; GFX10-CU-NEXT: s_endpgm
	;			;
	; SKIP-CACHE-INV-LABEL: system_acquire_fence:			; SKIP-CACHE-INV-LABEL: system_acquire_fence:
	; SKIP-CACHE-INV: ; %bb.0: ; %entry			; SKIP-CACHE-INV: ; %bb.0: ; %entry
				; SKIP-CACHE-INV-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x9
				; SKIP-CACHE-INV-NEXT: s_mov_b32 s3, 0xf000
				; SKIP-CACHE-INV-NEXT: s_mov_b32 s2, -1
				; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, 0
				; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
				; SKIP-CACHE-INV-NEXT: buffer_store_byte v0, off, s[0:3], 0
	; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; SKIP-CACHE-INV-NEXT: s_endpgm			; SKIP-CACHE-INV-NEXT: s_endpgm
	;			;
	; GFX90A-NOTTGSPLIT-LABEL: system_acquire_fence:			; GFX90A-NOTTGSPLIT-LABEL: system_acquire_fence:
	; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry			; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
				; GFX90A-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
				; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
				; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
				; GFX90A-NOTTGSPLIT-NEXT: global_store_byte v0, v0, s[0:1]
	; GFX90A-NOTTGSPLIT-NEXT: buffer_wbl2			; GFX90A-NOTTGSPLIT-NEXT: buffer_wbl2
	; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX90A-NOTTGSPLIT-NEXT: buffer_invl2			; GFX90A-NOTTGSPLIT-NEXT: buffer_invl2
	; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)			; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
	; GFX90A-NOTTGSPLIT-NEXT: buffer_wbinvl1_vol			; GFX90A-NOTTGSPLIT-NEXT: buffer_wbinvl1_vol
	; GFX90A-NOTTGSPLIT-NEXT: s_endpgm			; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
	;			;
	; GFX90A-TGSPLIT-LABEL: system_acquire_fence:			; GFX90A-TGSPLIT-LABEL: system_acquire_fence:
	; GFX90A-TGSPLIT: ; %bb.0: ; %entry			; GFX90A-TGSPLIT: ; %bb.0: ; %entry
				; GFX90A-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
				; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
				; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
				; GFX90A-TGSPLIT-NEXT: global_store_byte v0, v0, s[0:1]
	; GFX90A-TGSPLIT-NEXT: buffer_wbl2			; GFX90A-TGSPLIT-NEXT: buffer_wbl2
	; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX90A-TGSPLIT-NEXT: buffer_invl2			; GFX90A-TGSPLIT-NEXT: buffer_invl2
	; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)			; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
	; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol			; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
	; GFX90A-TGSPLIT-NEXT: s_endpgm			; GFX90A-TGSPLIT-NEXT: s_endpgm
	;
	;
	entry:			entry:
				store i8 0, i8 addrspace(1)* %ptr, align 1
	fence acquire			fence acquire
	ret void			ret void
	}			}

	define amdgpu_kernel void @system_release_fence() {			define amdgpu_kernel void @system_release_fence() {
	; GFX6-LABEL: system_release_fence:			; GFX6-LABEL: system_release_fence:
	; GFX6: ; %bb.0: ; %entry			; GFX6: ; %bb.0: ; %entry
	; GFX6-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX6-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX90A-NOTTGSPLIT-NEXT: s_endpgm			; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
	;			;
	; GFX90A-TGSPLIT-LABEL: system_release_fence:			; GFX90A-TGSPLIT-LABEL: system_release_fence:
	; GFX90A-TGSPLIT: ; %bb.0: ; %entry			; GFX90A-TGSPLIT: ; %bb.0: ; %entry
	; GFX90A-TGSPLIT-NEXT: buffer_wbl2			; GFX90A-TGSPLIT-NEXT: buffer_wbl2
	; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX90A-TGSPLIT-NEXT: s_endpgm			; GFX90A-TGSPLIT-NEXT: s_endpgm
	;
	;
	entry:			entry:
	fence release			fence release
	ret void			ret void
	}			}

	define amdgpu_kernel void @system_acq_rel_fence() {			define amdgpu_kernel void @system_acq_rel_fence(i8 addrspace(1)* %ptr) {
	; GFX6-LABEL: system_acq_rel_fence:			; GFX6-LABEL: system_acq_rel_fence:
	; GFX6: ; %bb.0: ; %entry			; GFX6: ; %bb.0: ; %entry
				; GFX6-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
				; GFX6-NEXT: s_mov_b32 s3, 0x100f000
				; GFX6-NEXT: s_mov_b32 s2, -1
				; GFX6-NEXT: v_mov_b32_e32 v0, 0
				; GFX6-NEXT: s_waitcnt lgkmcnt(0)
				; GFX6-NEXT: buffer_store_byte v0, off, s[0:3], 0
	; GFX6-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX6-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX6-NEXT: buffer_wbinvl1			; GFX6-NEXT: buffer_wbinvl1
	; GFX6-NEXT: s_endpgm			; GFX6-NEXT: s_endpgm
	;			;
	; GFX7-LABEL: system_acq_rel_fence:			; GFX7-LABEL: system_acq_rel_fence:
	; GFX7: ; %bb.0: ; %entry			; GFX7: ; %bb.0: ; %entry
				; GFX7-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
				; GFX7-NEXT: v_mov_b32_e32 v2, 0
				; GFX7-NEXT: s_waitcnt lgkmcnt(0)
				; GFX7-NEXT: v_mov_b32_e32 v0, s0
				; GFX7-NEXT: v_mov_b32_e32 v1, s1
				; GFX7-NEXT: flat_store_byte v[0:1], v2
	; GFX7-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX7-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX7-NEXT: buffer_wbinvl1_vol			; GFX7-NEXT: buffer_wbinvl1_vol
	; GFX7-NEXT: s_endpgm			; GFX7-NEXT: s_endpgm
	;			;
	; GFX10-WGP-LABEL: system_acq_rel_fence:			; GFX10-WGP-LABEL: system_acq_rel_fence:
	; GFX10-WGP: ; %bb.0: ; %entry			; GFX10-WGP: ; %bb.0: ; %entry
				; GFX10-WGP-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
				; GFX10-WGP-NEXT: v_mov_b32_e32 v0, 0
				; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
				; GFX10-WGP-NEXT: global_store_byte v0, v0, s[0:1]
	; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-WGP-NEXT: buffer_gl0_inv			; GFX10-WGP-NEXT: buffer_gl0_inv
	; GFX10-WGP-NEXT: buffer_gl1_inv			; GFX10-WGP-NEXT: buffer_gl1_inv
	; GFX10-WGP-NEXT: s_endpgm			; GFX10-WGP-NEXT: s_endpgm
	;			;
	; GFX10-CU-LABEL: system_acq_rel_fence:			; GFX10-CU-LABEL: system_acq_rel_fence:
	; GFX10-CU: ; %bb.0: ; %entry			; GFX10-CU: ; %bb.0: ; %entry
				; GFX10-CU-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
				; GFX10-CU-NEXT: v_mov_b32_e32 v0, 0
				; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
				; GFX10-CU-NEXT: global_store_byte v0, v0, s[0:1]
	; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-CU-NEXT: buffer_gl0_inv			; GFX10-CU-NEXT: buffer_gl0_inv
	; GFX10-CU-NEXT: buffer_gl1_inv			; GFX10-CU-NEXT: buffer_gl1_inv
	; GFX10-CU-NEXT: s_endpgm			; GFX10-CU-NEXT: s_endpgm
	;			;
	; SKIP-CACHE-INV-LABEL: system_acq_rel_fence:			; SKIP-CACHE-INV-LABEL: system_acq_rel_fence:
	; SKIP-CACHE-INV: ; %bb.0: ; %entry			; SKIP-CACHE-INV: ; %bb.0: ; %entry
				; SKIP-CACHE-INV-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x9
				; SKIP-CACHE-INV-NEXT: s_mov_b32 s3, 0xf000
				; SKIP-CACHE-INV-NEXT: s_mov_b32 s2, -1
				; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, 0
				; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
				; SKIP-CACHE-INV-NEXT: buffer_store_byte v0, off, s[0:3], 0
	; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; SKIP-CACHE-INV-NEXT: s_endpgm			; SKIP-CACHE-INV-NEXT: s_endpgm
	;			;
	; GFX90A-NOTTGSPLIT-LABEL: system_acq_rel_fence:			; GFX90A-NOTTGSPLIT-LABEL: system_acq_rel_fence:
	; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry			; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
				; GFX90A-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
				; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
				; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
				; GFX90A-NOTTGSPLIT-NEXT: global_store_byte v0, v0, s[0:1]
	; GFX90A-NOTTGSPLIT-NEXT: buffer_wbl2			; GFX90A-NOTTGSPLIT-NEXT: buffer_wbl2
	; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX90A-NOTTGSPLIT-NEXT: buffer_invl2			; GFX90A-NOTTGSPLIT-NEXT: buffer_invl2
	; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)			; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
	; GFX90A-NOTTGSPLIT-NEXT: buffer_wbinvl1_vol			; GFX90A-NOTTGSPLIT-NEXT: buffer_wbinvl1_vol
	; GFX90A-NOTTGSPLIT-NEXT: s_endpgm			; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
	;			;
	; GFX90A-TGSPLIT-LABEL: system_acq_rel_fence:			; GFX90A-TGSPLIT-LABEL: system_acq_rel_fence:
	; GFX90A-TGSPLIT: ; %bb.0: ; %entry			; GFX90A-TGSPLIT: ; %bb.0: ; %entry
				; GFX90A-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
				; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
				; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
				; GFX90A-TGSPLIT-NEXT: global_store_byte v0, v0, s[0:1]
	; GFX90A-TGSPLIT-NEXT: buffer_wbl2			; GFX90A-TGSPLIT-NEXT: buffer_wbl2
	; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX90A-TGSPLIT-NEXT: buffer_invl2			; GFX90A-TGSPLIT-NEXT: buffer_invl2
	; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)			; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
	; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol			; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
	; GFX90A-TGSPLIT-NEXT: s_endpgm			; GFX90A-TGSPLIT-NEXT: s_endpgm
	;
	;
	entry:			entry:
				store i8 0, i8 addrspace(1)* %ptr, align 1
	fence acq_rel			fence acq_rel
	ret void			ret void
	}			}

	define amdgpu_kernel void @system_seq_cst_fence() {			define amdgpu_kernel void @system_seq_cst_fence(i8 addrspace(1)* %ptr) {
	; GFX6-LABEL: system_seq_cst_fence:			; GFX6-LABEL: system_seq_cst_fence:
	; GFX6: ; %bb.0: ; %entry			; GFX6: ; %bb.0: ; %entry
				; GFX6-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
				; GFX6-NEXT: s_mov_b32 s3, 0x100f000
				; GFX6-NEXT: s_mov_b32 s2, -1
				; GFX6-NEXT: v_mov_b32_e32 v0, 0
				; GFX6-NEXT: s_waitcnt lgkmcnt(0)
				; GFX6-NEXT: buffer_store_byte v0, off, s[0:3], 0
	; GFX6-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX6-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX6-NEXT: buffer_wbinvl1			; GFX6-NEXT: buffer_wbinvl1
	; GFX6-NEXT: s_endpgm			; GFX6-NEXT: s_endpgm
	;			;
	; GFX7-LABEL: system_seq_cst_fence:			; GFX7-LABEL: system_seq_cst_fence:
	; GFX7: ; %bb.0: ; %entry			; GFX7: ; %bb.0: ; %entry
				; GFX7-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
				; GFX7-NEXT: v_mov_b32_e32 v2, 0
				; GFX7-NEXT: s_waitcnt lgkmcnt(0)
				; GFX7-NEXT: v_mov_b32_e32 v0, s0
				; GFX7-NEXT: v_mov_b32_e32 v1, s1
				; GFX7-NEXT: flat_store_byte v[0:1], v2
	; GFX7-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX7-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX7-NEXT: buffer_wbinvl1_vol			; GFX7-NEXT: buffer_wbinvl1_vol
	; GFX7-NEXT: s_endpgm			; GFX7-NEXT: s_endpgm
	;			;
	; GFX10-WGP-LABEL: system_seq_cst_fence:			; GFX10-WGP-LABEL: system_seq_cst_fence:
	; GFX10-WGP: ; %bb.0: ; %entry			; GFX10-WGP: ; %bb.0: ; %entry
				; GFX10-WGP-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
				; GFX10-WGP-NEXT: v_mov_b32_e32 v0, 0
				; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
				; GFX10-WGP-NEXT: global_store_byte v0, v0, s[0:1]
	; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX10-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-WGP-NEXT: buffer_gl0_inv			; GFX10-WGP-NEXT: buffer_gl0_inv
	; GFX10-WGP-NEXT: buffer_gl1_inv			; GFX10-WGP-NEXT: buffer_gl1_inv
	; GFX10-WGP-NEXT: s_endpgm			; GFX10-WGP-NEXT: s_endpgm
	;			;
	; GFX10-CU-LABEL: system_seq_cst_fence:			; GFX10-CU-LABEL: system_seq_cst_fence:
	; GFX10-CU: ; %bb.0: ; %entry			; GFX10-CU: ; %bb.0: ; %entry
				; GFX10-CU-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
				; GFX10-CU-NEXT: v_mov_b32_e32 v0, 0
				; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
				; GFX10-CU-NEXT: global_store_byte v0, v0, s[0:1]
	; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX10-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-CU-NEXT: buffer_gl0_inv			; GFX10-CU-NEXT: buffer_gl0_inv
	; GFX10-CU-NEXT: buffer_gl1_inv			; GFX10-CU-NEXT: buffer_gl1_inv
	; GFX10-CU-NEXT: s_endpgm			; GFX10-CU-NEXT: s_endpgm
	;			;
	; SKIP-CACHE-INV-LABEL: system_seq_cst_fence:			; SKIP-CACHE-INV-LABEL: system_seq_cst_fence:
	; SKIP-CACHE-INV: ; %bb.0: ; %entry			; SKIP-CACHE-INV: ; %bb.0: ; %entry
				; SKIP-CACHE-INV-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x9
				; SKIP-CACHE-INV-NEXT: s_mov_b32 s3, 0xf000
				; SKIP-CACHE-INV-NEXT: s_mov_b32 s2, -1
				; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, 0
				; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
				; SKIP-CACHE-INV-NEXT: buffer_store_byte v0, off, s[0:3], 0
	; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; SKIP-CACHE-INV-NEXT: s_endpgm			; SKIP-CACHE-INV-NEXT: s_endpgm
	;			;
	; GFX90A-NOTTGSPLIT-LABEL: system_seq_cst_fence:			; GFX90A-NOTTGSPLIT-LABEL: system_seq_cst_fence:
	; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry			; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
				; GFX90A-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
				; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
				; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
				; GFX90A-NOTTGSPLIT-NEXT: global_store_byte v0, v0, s[0:1]
	; GFX90A-NOTTGSPLIT-NEXT: buffer_wbl2			; GFX90A-NOTTGSPLIT-NEXT: buffer_wbl2
	; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX90A-NOTTGSPLIT-NEXT: buffer_invl2			; GFX90A-NOTTGSPLIT-NEXT: buffer_invl2
	; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)			; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
	; GFX90A-NOTTGSPLIT-NEXT: buffer_wbinvl1_vol			; GFX90A-NOTTGSPLIT-NEXT: buffer_wbinvl1_vol
	; GFX90A-NOTTGSPLIT-NEXT: s_endpgm			; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
	;			;
	; GFX90A-TGSPLIT-LABEL: system_seq_cst_fence:			; GFX90A-TGSPLIT-LABEL: system_seq_cst_fence:
	; GFX90A-TGSPLIT: ; %bb.0: ; %entry			; GFX90A-TGSPLIT: ; %bb.0: ; %entry
				; GFX90A-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
				; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
				; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
				; GFX90A-TGSPLIT-NEXT: global_store_byte v0, v0, s[0:1]
	; GFX90A-TGSPLIT-NEXT: buffer_wbl2			; GFX90A-TGSPLIT-NEXT: buffer_wbl2
	; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX90A-TGSPLIT-NEXT: buffer_invl2			; GFX90A-TGSPLIT-NEXT: buffer_invl2
	; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)			; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
	; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol			; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
	; GFX90A-TGSPLIT-NEXT: s_endpgm			; GFX90A-TGSPLIT-NEXT: s_endpgm
	;
	;
	entry:			entry:
				store i8 0, i8 addrspace(1)* %ptr, align 1
	fence seq_cst			fence seq_cst
	ret void			ret void
	}			}

	define amdgpu_kernel void @system_one_as_acquire_fence() {			define amdgpu_kernel void @system_one_as_acquire_fence(i8 addrspace(1)* %ptr) {
	; GFX6-LABEL: system_one_as_acquire_fence:			; GFX6-LABEL: system_one_as_acquire_fence:
	; GFX6: ; %bb.0: ; %entry			; GFX6: ; %bb.0: ; %entry
				; GFX6-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
				; GFX6-NEXT: s_mov_b32 s3, 0x100f000
				; GFX6-NEXT: s_mov_b32 s2, -1
				; GFX6-NEXT: v_mov_b32_e32 v0, 0
				; GFX6-NEXT: s_waitcnt lgkmcnt(0)
				; GFX6-NEXT: buffer_store_byte v0, off, s[0:3], 0
	; GFX6-NEXT: s_waitcnt vmcnt(0)			; GFX6-NEXT: s_waitcnt vmcnt(0)
	; GFX6-NEXT: buffer_wbinvl1			; GFX6-NEXT: buffer_wbinvl1
	; GFX6-NEXT: s_endpgm			; GFX6-NEXT: s_endpgm
	;			;
	; GFX7-LABEL: system_one_as_acquire_fence:			; GFX7-LABEL: system_one_as_acquire_fence:
	; GFX7: ; %bb.0: ; %entry			; GFX7: ; %bb.0: ; %entry
				; GFX7-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
				; GFX7-NEXT: v_mov_b32_e32 v2, 0
				; GFX7-NEXT: s_waitcnt lgkmcnt(0)
				; GFX7-NEXT: v_mov_b32_e32 v0, s0
				; GFX7-NEXT: v_mov_b32_e32 v1, s1
				; GFX7-NEXT: flat_store_byte v[0:1], v2
	; GFX7-NEXT: s_waitcnt vmcnt(0)			; GFX7-NEXT: s_waitcnt vmcnt(0)
	; GFX7-NEXT: buffer_wbinvl1_vol			; GFX7-NEXT: buffer_wbinvl1_vol
	; GFX7-NEXT: s_endpgm			; GFX7-NEXT: s_endpgm
	;			;
	; GFX10-WGP-LABEL: system_one_as_acquire_fence:			; GFX10-WGP-LABEL: system_one_as_acquire_fence:
	; GFX10-WGP: ; %bb.0: ; %entry			; GFX10-WGP: ; %bb.0: ; %entry
				; GFX10-WGP-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
				; GFX10-WGP-NEXT: v_mov_b32_e32 v0, 0
				; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
				; GFX10-WGP-NEXT: global_store_byte v0, v0, s[0:1]
	; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)			; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
	; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-WGP-NEXT: buffer_gl0_inv			; GFX10-WGP-NEXT: buffer_gl0_inv
	; GFX10-WGP-NEXT: buffer_gl1_inv			; GFX10-WGP-NEXT: buffer_gl1_inv
	; GFX10-WGP-NEXT: s_endpgm			; GFX10-WGP-NEXT: s_endpgm
	;			;
	; GFX10-CU-LABEL: system_one_as_acquire_fence:			; GFX10-CU-LABEL: system_one_as_acquire_fence:
	; GFX10-CU: ; %bb.0: ; %entry			; GFX10-CU: ; %bb.0: ; %entry
				; GFX10-CU-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
				; GFX10-CU-NEXT: v_mov_b32_e32 v0, 0
				; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
				; GFX10-CU-NEXT: global_store_byte v0, v0, s[0:1]
	; GFX10-CU-NEXT: s_waitcnt vmcnt(0)			; GFX10-CU-NEXT: s_waitcnt vmcnt(0)
	; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-CU-NEXT: buffer_gl0_inv			; GFX10-CU-NEXT: buffer_gl0_inv
	; GFX10-CU-NEXT: buffer_gl1_inv			; GFX10-CU-NEXT: buffer_gl1_inv
	; GFX10-CU-NEXT: s_endpgm			; GFX10-CU-NEXT: s_endpgm
	;			;
	; SKIP-CACHE-INV-LABEL: system_one_as_acquire_fence:			; SKIP-CACHE-INV-LABEL: system_one_as_acquire_fence:
	; SKIP-CACHE-INV: ; %bb.0: ; %entry			; SKIP-CACHE-INV: ; %bb.0: ; %entry
				; SKIP-CACHE-INV-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x9
				; SKIP-CACHE-INV-NEXT: s_mov_b32 s3, 0xf000
				; SKIP-CACHE-INV-NEXT: s_mov_b32 s2, -1
				; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, 0
				; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
				; SKIP-CACHE-INV-NEXT: buffer_store_byte v0, off, s[0:3], 0
	; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0)			; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0)
	; SKIP-CACHE-INV-NEXT: s_endpgm			; SKIP-CACHE-INV-NEXT: s_endpgm
	;			;
	; GFX90A-NOTTGSPLIT-LABEL: system_one_as_acquire_fence:			; GFX90A-NOTTGSPLIT-LABEL: system_one_as_acquire_fence:
	; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry			; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
				; GFX90A-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
				; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
				; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
				; GFX90A-NOTTGSPLIT-NEXT: global_store_byte v0, v0, s[0:1]
	; GFX90A-NOTTGSPLIT-NEXT: buffer_wbl2			; GFX90A-NOTTGSPLIT-NEXT: buffer_wbl2
	; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)			; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
	; GFX90A-NOTTGSPLIT-NEXT: buffer_invl2			; GFX90A-NOTTGSPLIT-NEXT: buffer_invl2
	; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)			; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
	; GFX90A-NOTTGSPLIT-NEXT: buffer_wbinvl1_vol			; GFX90A-NOTTGSPLIT-NEXT: buffer_wbinvl1_vol
	; GFX90A-NOTTGSPLIT-NEXT: s_endpgm			; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
	;			;
	; GFX90A-TGSPLIT-LABEL: system_one_as_acquire_fence:			; GFX90A-TGSPLIT-LABEL: system_one_as_acquire_fence:
	; GFX90A-TGSPLIT: ; %bb.0: ; %entry			; GFX90A-TGSPLIT: ; %bb.0: ; %entry
				; GFX90A-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
				; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
				; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
				; GFX90A-TGSPLIT-NEXT: global_store_byte v0, v0, s[0:1]
	; GFX90A-TGSPLIT-NEXT: buffer_wbl2			; GFX90A-TGSPLIT-NEXT: buffer_wbl2
	; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)			; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
	; GFX90A-TGSPLIT-NEXT: buffer_invl2			; GFX90A-TGSPLIT-NEXT: buffer_invl2
	; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)			; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
	; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol			; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
	; GFX90A-TGSPLIT-NEXT: s_endpgm			; GFX90A-TGSPLIT-NEXT: s_endpgm
	;
	;
	entry:			entry:
				store i8 0, i8 addrspace(1)* %ptr, align 1
	fence syncscope("one-as") acquire			fence syncscope("one-as") acquire
	ret void			ret void
	}			}

	define amdgpu_kernel void @system_one_as_release_fence() {			define amdgpu_kernel void @system_one_as_release_fence() {
	; GFX6-LABEL: system_one_as_release_fence:			; GFX6-LABEL: system_one_as_release_fence:
	; GFX6: ; %bb.0: ; %entry			; GFX6: ; %bb.0: ; %entry
	; GFX6-NEXT: s_waitcnt vmcnt(0)			; GFX6-NEXT: s_waitcnt vmcnt(0)
	; [... 27 lines not shown ...]
	; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)			; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
	; GFX90A-NOTTGSPLIT-NEXT: s_endpgm			; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
	;			;
	; GFX90A-TGSPLIT-LABEL: system_one_as_release_fence:			; GFX90A-TGSPLIT-LABEL: system_one_as_release_fence:
	; GFX90A-TGSPLIT: ; %bb.0: ; %entry			; GFX90A-TGSPLIT: ; %bb.0: ; %entry
	; GFX90A-TGSPLIT-NEXT: buffer_wbl2			; GFX90A-TGSPLIT-NEXT: buffer_wbl2
	; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)			; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
	; GFX90A-TGSPLIT-NEXT: s_endpgm			; GFX90A-TGSPLIT-NEXT: s_endpgm
	;
	;
	entry:			entry:
	fence syncscope("one-as") release			fence syncscope("one-as") release
	ret void			ret void
	}			}

	define amdgpu_kernel void @system_one_as_acq_rel_fence() {			define amdgpu_kernel void @system_one_as_acq_rel_fence(i8 addrspace(1)* %ptr) {
	; GFX6-LABEL: system_one_as_acq_rel_fence:			; GFX6-LABEL: system_one_as_acq_rel_fence:
	; GFX6: ; %bb.0: ; %entry			; GFX6: ; %bb.0: ; %entry
				; GFX6-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
				; GFX6-NEXT: s_mov_b32 s3, 0x100f000
				; GFX6-NEXT: s_mov_b32 s2, -1
				; GFX6-NEXT: v_mov_b32_e32 v0, 0
				; GFX6-NEXT: s_waitcnt lgkmcnt(0)
				; GFX6-NEXT: buffer_store_byte v0, off, s[0:3], 0
	; GFX6-NEXT: s_waitcnt vmcnt(0)			; GFX6-NEXT: s_waitcnt vmcnt(0)
	; GFX6-NEXT: buffer_wbinvl1			; GFX6-NEXT: buffer_wbinvl1
	; GFX6-NEXT: s_endpgm			; GFX6-NEXT: s_endpgm
	;			;
	; GFX7-LABEL: system_one_as_acq_rel_fence:			; GFX7-LABEL: system_one_as_acq_rel_fence:
	; GFX7: ; %bb.0: ; %entry			; GFX7: ; %bb.0: ; %entry
				; GFX7-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
				; GFX7-NEXT: v_mov_b32_e32 v2, 0
				; GFX7-NEXT: s_waitcnt lgkmcnt(0)
				; GFX7-NEXT: v_mov_b32_e32 v0, s0
				; GFX7-NEXT: v_mov_b32_e32 v1, s1
				; GFX7-NEXT: flat_store_byte v[0:1], v2
	; GFX7-NEXT: s_waitcnt vmcnt(0)			; GFX7-NEXT: s_waitcnt vmcnt(0)
	; GFX7-NEXT: buffer_wbinvl1_vol			; GFX7-NEXT: buffer_wbinvl1_vol
	; GFX7-NEXT: s_endpgm			; GFX7-NEXT: s_endpgm
	;			;
	; GFX10-WGP-LABEL: system_one_as_acq_rel_fence:			; GFX10-WGP-LABEL: system_one_as_acq_rel_fence:
	; GFX10-WGP: ; %bb.0: ; %entry			; GFX10-WGP: ; %bb.0: ; %entry
				; GFX10-WGP-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
				; GFX10-WGP-NEXT: v_mov_b32_e32 v0, 0
				; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
				; GFX10-WGP-NEXT: global_store_byte v0, v0, s[0:1]
	; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)			; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
	; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-WGP-NEXT: buffer_gl0_inv			; GFX10-WGP-NEXT: buffer_gl0_inv
	; GFX10-WGP-NEXT: buffer_gl1_inv			; GFX10-WGP-NEXT: buffer_gl1_inv
	; GFX10-WGP-NEXT: s_endpgm			; GFX10-WGP-NEXT: s_endpgm
	;			;
	; GFX10-CU-LABEL: system_one_as_acq_rel_fence:			; GFX10-CU-LABEL: system_one_as_acq_rel_fence:
	; GFX10-CU: ; %bb.0: ; %entry			; GFX10-CU: ; %bb.0: ; %entry
				; GFX10-CU-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
				; GFX10-CU-NEXT: v_mov_b32_e32 v0, 0
				; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
				; GFX10-CU-NEXT: global_store_byte v0, v0, s[0:1]
	; GFX10-CU-NEXT: s_waitcnt vmcnt(0)			; GFX10-CU-NEXT: s_waitcnt vmcnt(0)
	; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-CU-NEXT: buffer_gl0_inv			; GFX10-CU-NEXT: buffer_gl0_inv
	; GFX10-CU-NEXT: buffer_gl1_inv			; GFX10-CU-NEXT: buffer_gl1_inv
	; GFX10-CU-NEXT: s_endpgm			; GFX10-CU-NEXT: s_endpgm
	;			;
	; SKIP-CACHE-INV-LABEL: system_one_as_acq_rel_fence:			; SKIP-CACHE-INV-LABEL: system_one_as_acq_rel_fence:
	; SKIP-CACHE-INV: ; %bb.0: ; %entry			; SKIP-CACHE-INV: ; %bb.0: ; %entry
				; SKIP-CACHE-INV-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x9
				; SKIP-CACHE-INV-NEXT: s_mov_b32 s3, 0xf000
				; SKIP-CACHE-INV-NEXT: s_mov_b32 s2, -1
				; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, 0
				; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
				; SKIP-CACHE-INV-NEXT: buffer_store_byte v0, off, s[0:3], 0
	; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0)			; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0)
	; SKIP-CACHE-INV-NEXT: s_endpgm			; SKIP-CACHE-INV-NEXT: s_endpgm
	;			;
	; GFX90A-NOTTGSPLIT-LABEL: system_one_as_acq_rel_fence:			; GFX90A-NOTTGSPLIT-LABEL: system_one_as_acq_rel_fence:
	; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry			; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
				; GFX90A-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
				; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
				; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
				; GFX90A-NOTTGSPLIT-NEXT: global_store_byte v0, v0, s[0:1]
	; GFX90A-NOTTGSPLIT-NEXT: buffer_wbl2			; GFX90A-NOTTGSPLIT-NEXT: buffer_wbl2
	; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)			; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
	; GFX90A-NOTTGSPLIT-NEXT: buffer_invl2			; GFX90A-NOTTGSPLIT-NEXT: buffer_invl2
	; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)			; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
	; GFX90A-NOTTGSPLIT-NEXT: buffer_wbinvl1_vol			; GFX90A-NOTTGSPLIT-NEXT: buffer_wbinvl1_vol
	; GFX90A-NOTTGSPLIT-NEXT: s_endpgm			; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
	;			;
	; GFX90A-TGSPLIT-LABEL: system_one_as_acq_rel_fence:			; GFX90A-TGSPLIT-LABEL: system_one_as_acq_rel_fence:
	; GFX90A-TGSPLIT: ; %bb.0: ; %entry			; GFX90A-TGSPLIT: ; %bb.0: ; %entry
				; GFX90A-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
				; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
				; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
				; GFX90A-TGSPLIT-NEXT: global_store_byte v0, v0, s[0:1]
	; GFX90A-TGSPLIT-NEXT: buffer_wbl2			; GFX90A-TGSPLIT-NEXT: buffer_wbl2
	; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)			; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
	; GFX90A-TGSPLIT-NEXT: buffer_invl2			; GFX90A-TGSPLIT-NEXT: buffer_invl2
	; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)			; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
	; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol			; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
	; GFX90A-TGSPLIT-NEXT: s_endpgm			; GFX90A-TGSPLIT-NEXT: s_endpgm
	;
	;
	entry:			entry:
				store i8 0, i8 addrspace(1)* %ptr, align 1
	fence syncscope("one-as") acq_rel			fence syncscope("one-as") acq_rel
	ret void			ret void
	}			}

	define amdgpu_kernel void @system_one_as_seq_cst_fence() {			define amdgpu_kernel void @system_one_as_seq_cst_fence(i8 addrspace(1)* %ptr) {
	; GFX6-LABEL: system_one_as_seq_cst_fence:			; GFX6-LABEL: system_one_as_seq_cst_fence:
	; GFX6: ; %bb.0: ; %entry			; GFX6: ; %bb.0: ; %entry
				; GFX6-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
				; GFX6-NEXT: s_mov_b32 s3, 0x100f000
				; GFX6-NEXT: s_mov_b32 s2, -1
				; GFX6-NEXT: v_mov_b32_e32 v0, 0
				; GFX6-NEXT: s_waitcnt lgkmcnt(0)
				; GFX6-NEXT: buffer_store_byte v0, off, s[0:3], 0
	; GFX6-NEXT: s_waitcnt vmcnt(0)			; GFX6-NEXT: s_waitcnt vmcnt(0)
	; GFX6-NEXT: buffer_wbinvl1			; GFX6-NEXT: buffer_wbinvl1
	; GFX6-NEXT: s_endpgm			; GFX6-NEXT: s_endpgm
	;			;
	; GFX7-LABEL: system_one_as_seq_cst_fence:			; GFX7-LABEL: system_one_as_seq_cst_fence:
	; GFX7: ; %bb.0: ; %entry			; GFX7: ; %bb.0: ; %entry
				; GFX7-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
				; GFX7-NEXT: v_mov_b32_e32 v2, 0
				; GFX7-NEXT: s_waitcnt lgkmcnt(0)
				; GFX7-NEXT: v_mov_b32_e32 v0, s0
				; GFX7-NEXT: v_mov_b32_e32 v1, s1
				; GFX7-NEXT: flat_store_byte v[0:1], v2
	; GFX7-NEXT: s_waitcnt vmcnt(0)			; GFX7-NEXT: s_waitcnt vmcnt(0)
	; GFX7-NEXT: buffer_wbinvl1_vol			; GFX7-NEXT: buffer_wbinvl1_vol
	; GFX7-NEXT: s_endpgm			; GFX7-NEXT: s_endpgm
	;			;
	; GFX10-WGP-LABEL: system_one_as_seq_cst_fence:			; GFX10-WGP-LABEL: system_one_as_seq_cst_fence:
	; GFX10-WGP: ; %bb.0: ; %entry			; GFX10-WGP: ; %bb.0: ; %entry
				; GFX10-WGP-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
				; GFX10-WGP-NEXT: v_mov_b32_e32 v0, 0
				; GFX10-WGP-NEXT: s_waitcnt lgkmcnt(0)
				; GFX10-WGP-NEXT: global_store_byte v0, v0, s[0:1]
	; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)			; GFX10-WGP-NEXT: s_waitcnt vmcnt(0)
	; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-WGP-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-WGP-NEXT: buffer_gl0_inv			; GFX10-WGP-NEXT: buffer_gl0_inv
	; GFX10-WGP-NEXT: buffer_gl1_inv			; GFX10-WGP-NEXT: buffer_gl1_inv
	; GFX10-WGP-NEXT: s_endpgm			; GFX10-WGP-NEXT: s_endpgm
	;			;
	; GFX10-CU-LABEL: system_one_as_seq_cst_fence:			; GFX10-CU-LABEL: system_one_as_seq_cst_fence:
	; GFX10-CU: ; %bb.0: ; %entry			; GFX10-CU: ; %bb.0: ; %entry
				; GFX10-CU-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
				; GFX10-CU-NEXT: v_mov_b32_e32 v0, 0
				; GFX10-CU-NEXT: s_waitcnt lgkmcnt(0)
				; GFX10-CU-NEXT: global_store_byte v0, v0, s[0:1]
	; GFX10-CU-NEXT: s_waitcnt vmcnt(0)			; GFX10-CU-NEXT: s_waitcnt vmcnt(0)
	; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-CU-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-CU-NEXT: buffer_gl0_inv			; GFX10-CU-NEXT: buffer_gl0_inv
	; GFX10-CU-NEXT: buffer_gl1_inv			; GFX10-CU-NEXT: buffer_gl1_inv
	; GFX10-CU-NEXT: s_endpgm			; GFX10-CU-NEXT: s_endpgm
	;			;
	; SKIP-CACHE-INV-LABEL: system_one_as_seq_cst_fence:			; SKIP-CACHE-INV-LABEL: system_one_as_seq_cst_fence:
	; SKIP-CACHE-INV: ; %bb.0: ; %entry			; SKIP-CACHE-INV: ; %bb.0: ; %entry
				; SKIP-CACHE-INV-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x9
				; SKIP-CACHE-INV-NEXT: s_mov_b32 s3, 0xf000
				; SKIP-CACHE-INV-NEXT: s_mov_b32 s2, -1
				; SKIP-CACHE-INV-NEXT: v_mov_b32_e32 v0, 0
				; SKIP-CACHE-INV-NEXT: s_waitcnt lgkmcnt(0)
				; SKIP-CACHE-INV-NEXT: buffer_store_byte v0, off, s[0:3], 0
	; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0)			; SKIP-CACHE-INV-NEXT: s_waitcnt vmcnt(0)
	; SKIP-CACHE-INV-NEXT: s_endpgm			; SKIP-CACHE-INV-NEXT: s_endpgm
	;			;
	; GFX90A-NOTTGSPLIT-LABEL: system_one_as_seq_cst_fence:			; GFX90A-NOTTGSPLIT-LABEL: system_one_as_seq_cst_fence:
	; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry			; GFX90A-NOTTGSPLIT: ; %bb.0: ; %entry
				; GFX90A-NOTTGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
				; GFX90A-NOTTGSPLIT-NEXT: v_mov_b32_e32 v0, 0
				; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
				; GFX90A-NOTTGSPLIT-NEXT: global_store_byte v0, v0, s[0:1]
	; GFX90A-NOTTGSPLIT-NEXT: buffer_wbl2			; GFX90A-NOTTGSPLIT-NEXT: buffer_wbl2
	; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)			; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
	; GFX90A-NOTTGSPLIT-NEXT: buffer_invl2			; GFX90A-NOTTGSPLIT-NEXT: buffer_invl2
	; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)			; GFX90A-NOTTGSPLIT-NEXT: s_waitcnt vmcnt(0)
	; GFX90A-NOTTGSPLIT-NEXT: buffer_wbinvl1_vol			; GFX90A-NOTTGSPLIT-NEXT: buffer_wbinvl1_vol
	; GFX90A-NOTTGSPLIT-NEXT: s_endpgm			; GFX90A-NOTTGSPLIT-NEXT: s_endpgm
	;			;
	; GFX90A-TGSPLIT-LABEL: system_one_as_seq_cst_fence:			; GFX90A-TGSPLIT-LABEL: system_one_as_seq_cst_fence:
	; GFX90A-TGSPLIT: ; %bb.0: ; %entry			; GFX90A-TGSPLIT: ; %bb.0: ; %entry
				; GFX90A-TGSPLIT-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
				; GFX90A-TGSPLIT-NEXT: v_mov_b32_e32 v0, 0
				; GFX90A-TGSPLIT-NEXT: s_waitcnt lgkmcnt(0)
				; GFX90A-TGSPLIT-NEXT: global_store_byte v0, v0, s[0:1]
	; GFX90A-TGSPLIT-NEXT: buffer_wbl2			; GFX90A-TGSPLIT-NEXT: buffer_wbl2
	; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)			; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
	; GFX90A-TGSPLIT-NEXT: buffer_invl2			; GFX90A-TGSPLIT-NEXT: buffer_invl2
	; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)			; GFX90A-TGSPLIT-NEXT: s_waitcnt vmcnt(0)
	; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol			; GFX90A-TGSPLIT-NEXT: buffer_wbinvl1_vol
	; GFX90A-TGSPLIT-NEXT: s_endpgm			; GFX90A-TGSPLIT-NEXT: s_endpgm
	;
	;
	entry:			entry:
				store i8 0, i8 addrspace(1)* %ptr, align 1
	fence syncscope("one-as") seq_cst			fence syncscope("one-as") seq_cst
	ret void			ret void
	}			}